# Audio Separator API
REST API for separating audio into vocal and instrumental stems using ML models.
## Quick Start
```bash
# Clone and install
git clone <repo-url>
cd sep
chmod +x install.sh test.sh
sudo ./install.sh
# Run tests
./test.sh
# Start the API
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000
```
## Requirements
- Python 3.10+
- FFmpeg
- 10GB+ disk space (for models)
- NVIDIA GPU with CUDA (optional, but recommended)
## API Endpoints
### Health Check
```bash
curl http://localhost:8000/health
```
Response:
```json
{
  "status": "healthy",
  "cuda_available": true,
  "cuda_device": "NVIDIA GeForce RTX 5090"
}
```
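For scripted checks, the same endpoint can be queried from Python. This is a minimal sketch using only the standard library; the helper names `summarize_health` and `check_health` are illustrative and not part of the project, and the handling of any status other than `"healthy"` is an assumption, since only the healthy response is documented above.

```python
import json
import urllib.request


def summarize_health(payload: dict) -> str:
    """Turn a /health payload into a one-line summary."""
    # Assumption: any status other than "healthy" means the service is degraded.
    if payload.get("status") != "healthy":
        return f"unhealthy: {payload.get('status', 'unknown')}"
    if payload.get("cuda_available"):
        return f"healthy, GPU: {payload.get('cuda_device', 'unknown device')}"
    return "healthy, CPU only"


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> str:
    """Fetch /health and summarize it. Raises URLError if the API is unreachable."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return summarize_health(json.load(resp))
```

Calling `check_health()` against a running instance returns the summary string; `summarize_health` can also be used on its own with an already-decoded response.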
### Separate Audio
```bash
curl -X POST http://localhost:8000/separate \
  -F "file=@song.mp3" \
  -F "output_format=mp3"
```
Response:
```json
{
  "job_id": "a1b2c3d4",
  "status": "completed",
  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
}
```
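The curl call above sends a `multipart/form-data` request with a `file` part and an `output_format` field. A rough Python equivalent using only the standard library might look like the sketch below. The helper names `build_multipart` and `separate` are hypothetical, and the sketch assumes the server accepts `application/octet-stream` as the content type of the file part.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header). Field values and the
    filename are not escaped, so keep them free of quotes and newlines.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def separate(path: str, output_format: str = "mp3", base_url: str = "http://localhost:8000") -> dict:
    """POST an audio file to /separate and return the decoded JSON response."""
    audio = Path(path)
    body, content_type = build_multipart(
        {"output_format": output_format}, "file", audio.name, audio.read_bytes()
    )
    request = urllib.request.Request(
        f"{base_url}/separate", data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With the API running, `separate("song.mp3")` should return a dictionary with the same keys as the JSON response shown above.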
### Download Stems
```bash
# Quote the URL: unquoted parentheses are a syntax error in bash
curl -O "http://localhost:8000/download/song_(Vocals)_model_bs_roformer.mp3"
```
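Stem filenames contain parentheses (and may contain spaces if the uploaded file did), so it is safest to percent-encode the path before requesting it from a script. The sketch below combines a base URL with a `vocals_url` or `instrumental_url` value; `download_url` is an illustrative helper name, and it assumes the server decodes percent-encoded paths in the usual way.

```python
from urllib.parse import quote, urljoin


def download_url(base_url: str, stem_path: str) -> str:
    """Join the API base URL with a stem path and percent-encode unsafe characters."""
    # safe="/" keeps path separators intact; parentheses and spaces are encoded.
    return urljoin(base_url, quote(stem_path, safe="/"))
```

The resulting string can be passed directly to `urllib.request.urlretrieve` or any other HTTP client.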
### List Models
```bash
curl http://localhost:8000/models
```
## Configuration
### Output Formats
- `mp3` (default) - Good compression, iOS compatible
- `wav` - Lossless, larger files
- `flac` - Lossless compression
### Models
| Model | Quality | Speed | Best For |
|-------|---------|-------|----------|
| BS-RoFormer (default) | Highest | Slow | Production use |
| UVR_MDXNET_KARA_2 | Good | Fast | Karaoke |
| Kim_Vocal_2 | Good | Medium | Vocal isolation |
## VM Deployment
### Using systemd (Linux)
The install script creates a systemd service:
```bash
sudo systemctl enable audio-separator
sudo systemctl start audio-separator
sudo systemctl status audio-separator
```
### Manual Start
```bash
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
```
Note: Use `--workers 1` because the ML model is not thread-safe.
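To illustrate what "not thread-safe" implies inside a single worker, the sketch below serializes calls to a model with a lock so that concurrent requests never run inference at the same time. This is only an illustration of the general technique, not a description of how `app.py` is implemented; `SerializedModel` and `CountingModel` are hypothetical names, and `CountingModel` is a stand-in for a real separator.

```python
import threading
import time


class SerializedModel:
    """Wrap a callable so that only one thread runs it at a time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Other threads block here until the current call finishes.
        with self._lock:
            return self._model(*args, **kwargs)


class CountingModel:
    """Hypothetical stand-in that records how many calls overlap."""

    def __init__(self):
        self.active = 0
        self.max_active = 0

    def __call__(self, name: str) -> str:
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # simulate inference work
        self.active -= 1
        return f"{name}: done"
```

Wrapping the stand-in as `SerializedModel(CountingModel())` and calling it from several threads keeps `max_active` at 1, at the cost of requests queuing behind each other.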
## GPU Support
The API automatically detects CUDA GPUs. To verify:
```bash
./test.sh
```
Look for:
```
[PASS] CUDA available: NVIDIA GeForce RTX 5090 (32.0GB VRAM)
```
### CUDA Installation (Ubuntu)
```bash
# Add NVIDIA repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-1
```
## iOS Integration
The API returns MP3 files by default, which are natively supported on iOS.
Example Swift code:
```swift
func separateAudio(fileURL: URL) async throws -> (vocals: URL, instrumental: URL) {
    var request = URLRequest(url: URL(string: "http://your-vm:8000/separate")!)
    request.httpMethod = "POST"
    // Upload file and get response with download URLs
    // ...
}
```
## File Cleanup
Uploaded and output files are automatically deleted after 5 minutes.
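Clients should therefore download both stems promptly after a job completes. For readers curious how an age-based cleanup of this kind can work, here is a minimal sketch. It is an assumption about the general approach, not the project's actual cleanup code, and the function name `delete_expired` and its parameters are hypothetical.

```python
import time
from pathlib import Path


def delete_expired(directory, max_age_seconds: float = 300.0, now=None) -> list:
    """Delete regular files last modified more than max_age_seconds ago.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).iterdir():
        # Compare the file's modification time against the cutoff (5 minutes by default).
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A function like this would typically be run on a timer or background task; the directory it scans depends on where the service stores uploads and outputs.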
## Troubleshooting
### "CUDA not available"
1. Check NVIDIA drivers: `nvidia-smi`
2. Reinstall PyTorch with a CUDA build that matches your installed driver (the command below uses the CUDA 12.1 wheels):
```bash
uv pip install torch --index-url https://download.pytorch.org/whl/cu121
```
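After reinstalling, it can help to confirm what PyTorch itself reports, independently of the API. The sketch below uses `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; the function name `cuda_report` is illustrative, and the script should be run with the project's virtual environment (for example `.venv/bin/python`).

```python
def cuda_report() -> str:
    """Describe whether PyTorch can see a CUDA device in this environment."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed but CUDA is not available"
    # Device 0 is the first GPU visible to this process.
    return f"CUDA available: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(cuda_report())
```

If this reports that CUDA is not available while `nvidia-smi` works, the installed PyTorch wheel is likely a CPU-only build or one built for a different CUDA version.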
### "Model download failed"
Check network access to huggingface.co and github.com.
### "Out of memory"
Reduce batch size or use a smaller model like `UVR_MDXNET_KARA_2`.