# Audio Separator API

REST API for separating audio into vocal and instrumental stems using ML models.

## Quick Start

```bash
# Clone and install
git clone <repo-url>
cd sep
chmod +x install.sh test.sh
sudo ./install.sh

# Run tests
./test.sh

# Start the API
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000
```

## Requirements

- Python 3.10+
- FFmpeg
- 10GB+ disk space (for models)
- NVIDIA GPU with CUDA (optional, but recommended)

## API Endpoints

### Health Check

```bash
curl http://localhost:8000/health
```

Response:
```json
{
  "status": "healthy",
  "cuda_available": true,
  "cuda_device": "NVIDIA GeForce RTX 5090"
}
```
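
A client can gate uploads on this endpoint. The following is a minimal Python sketch using only the standard library. The helper names `fetch_health` and `describe_health` are illustrative and not part of this project; the field names are taken from the sample response above.

```python
import json
import urllib.request


def fetch_health(base_url="http://localhost:8000", timeout=5.0):
    """GET /health and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def describe_health(payload):
    """Summarize a /health payload; missing fields are treated as unknown."""
    if payload.get("status") != "healthy":
        return "unhealthy"
    if payload.get("cuda_available"):
        return f"healthy (GPU: {payload.get('cuda_device', 'unknown')})"
    return "healthy (CPU only)"
```

With the server running, `describe_health(fetch_health())` gives a one-line summary suitable for logs or a startup check.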

### Separate Audio

```bash
curl -X POST http://localhost:8000/separate \
  -F "file=@song.mp3" \
  -F "output_format=mp3"
```

Response:
```json
{
  "job_id": "a1b2c3d4",
  "status": "completed",
  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
}
```
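
The same request can be made from Python. This is a sketch using only the standard library, assuming the endpoint accepts the two form fields shown in the curl example (`file` and `output_format`). `encode_multipart` and `separate` are hypothetical helper names, not functions provided by this project.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body. Returns (content_type, body_bytes)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Plain text form fields, one part each
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The file part: headers, raw bytes, then the closing boundary
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


def separate(path, base_url="http://localhost:8000", output_format="mp3"):
    """POST an audio file to /separate and return the decoded JSON response."""
    path = Path(path)
    content_type, body = encode_multipart(
        {"output_format": output_format}, "file", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{base_url}/separate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `separate("song.mp3")` should return a dict shaped like the response above. The file is read fully into memory, which is acceptable for typical songs but not for very large uploads.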

### Download Stems

```bash
# Quote the URL: unquoted parentheses are a shell syntax error
curl -O "http://localhost:8000/download/song_(Vocals)_model_bs_roformer.mp3"
```
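
Stem filenames contain parentheses, so clients that build download URLs programmatically may want to percent-encode them. A small Python sketch, assuming the `/download/<filename>` layout shown above; `stem_url` and `download_stem` are illustrative names.

```python
import urllib.request
from pathlib import Path
from urllib.parse import quote


def stem_url(base_url, download_path):
    """Join the server base URL with a path from the /separate response.

    Characters such as parentheses and spaces are percent-encoded;
    slashes are left as-is.
    """
    return base_url.rstrip("/") + quote(download_path)


def download_stem(base_url, download_path, dest_dir="."):
    """Fetch one stem and save it under its original filename."""
    dest = Path(dest_dir) / Path(download_path).name
    with urllib.request.urlopen(stem_url(base_url, download_path)) as resp:
        dest.write_bytes(resp.read())
    return dest
```

For example, `download_stem("http://localhost:8000", response["vocals_url"])` would save the vocal stem to the current directory, where `response` is the JSON returned by `/separate`.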

### List Models

```bash
curl http://localhost:8000/models
```

## Configuration

### Output Formats

- `mp3` (default) - Good compression, iOS compatible
- `wav` - Lossless, larger files
- `flac` - Lossless compression

### Models

| Model | Quality | Speed | Best For |
|-------|---------|-------|----------|
| BS-RoFormer (default) | Highest | Slow | Production use |
| UVR_MDXNET_KARA_2 | Good | Fast | Karaoke |
| Kim_Vocal_2 | Good | Medium | Vocal isolation |

## VM Deployment

### Using systemd (Linux)

The install script creates a systemd service:

```bash
sudo systemctl enable audio-separator
sudo systemctl start audio-separator
sudo systemctl status audio-separator
```

### Manual Start

```bash
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
```

Note: Use `--workers 1` because the ML model is not thread-safe.
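
If the separation call can still be reached from more than one thread inside that single worker (for example through a thread pool), a lock is one way to keep calls from overlapping. The sketch below is illustrative only: `SerializedRunner` is a hypothetical wrapper, and nothing here is taken from this project's `app.py`.

```python
import threading


class SerializedRunner:
    """Wrap a callable so that only one call runs at a time in this process."""

    def __init__(self, func):
        self._func = func
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Other threads block here until the current call finishes
        with self._lock:
            return self._func(*args, **kwargs)
```

Wrapping the function that invokes the model, as in `run = SerializedRunner(run_model)` with `run_model` standing in for the real call, serializes access within one process. It does not coordinate across multiple worker processes.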

## GPU Support

The API automatically detects CUDA GPUs. To verify:

```bash
./test.sh
```

Look for:
```
[PASS] CUDA available: NVIDIA GeForce RTX 5090 (32.0GB VRAM)
```
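
The same check can be made directly from Python inside the project's virtual environment. This is a minimal sketch using standard PyTorch calls; `cuda_summary` is an illustrative name, and its output only loosely mimics what `test.sh` prints.

```python
def cuda_summary():
    """Return a one-line description of the CUDA device PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "CUDA not available (running on CPU)"
    # Device 0 is assumed to be the GPU used for inference
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    return f"CUDA available: {props.name} ({vram_gb:.1f}GB VRAM)"
```

Run it with `.venv/bin/python` so the result reflects the PyTorch build the API actually uses.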

### CUDA Installation (Ubuntu)

```bash
# Add NVIDIA repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-1
```

## iOS Integration

The API returns MP3 files by default, which are natively supported on iOS.

Example Swift code:

```swift
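import Foundation

func separateAudio(fileURL: URL) async throws -> (vocals: URL, instrumental: URL) {
    let base = URL(string: "http://your-vm:8000")!
    var request = URLRequest(url: base.appendingPathComponent("separate"))
    request.httpMethod = "POST"
    let boundary = UUID().uuidString  // multipart/form-data body with one "file" part
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
    var body = Data("--\(boundary)\r\nContent-Disposition: form-data; name=\"file\"; filename=\"\(fileURL.lastPathComponent)\"\r\n\r\n".utf8)
    body.append(try Data(contentsOf: fileURL))
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))
    let (data, _) = try await URLSession.shared.upload(for: request, from: body)
    let json = try JSONDecoder().decode([String: String].self, from: data)  // assumes every JSON value is a string
    guard let vocals = json["vocals_url"].flatMap({ URL(string: $0, relativeTo: base) }),
          let instrumental = json["instrumental_url"].flatMap({ URL(string: $0, relativeTo: base) })
    else { throw URLError(.badServerResponse) }
    return (vocals, instrumental)
}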
```

## File Cleanup

Uploaded and output files are automatically deleted after 5 minutes.
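
One way to picture this behaviour is a periodic sweep that removes files older than a cutoff. The sketch below is an assumption about how such a sweep might look, not this project's code. `remove_expired` and its directory argument are hypothetical, and the 300-second default mirrors the 5-minute window stated above.

```python
import time
from pathlib import Path


def remove_expired(directory, max_age_seconds=300, now=None):
    """Delete regular files in `directory` last modified before the cutoff.

    Returns the names of the removed files, sorted.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

For clients, the practical consequence is that stems should be downloaded soon after `/separate` returns, since the download URLs stop working once the files are removed.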

## Troubleshooting

### "CUDA not available"

1. Check NVIDIA drivers: `nvidia-smi`
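2. Reinstall PyTorch with a CUDA build that matches your GPU and driver. The command below installs the CUDA 12.1 wheels; newer GPUs such as the RTX 50 series need a newer build (for example `cu128`):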
   ```bash
   uv pip install torch --index-url https://download.pytorch.org/whl/cu121
   ```

### "Model download failed"

Check network access to huggingface.co and github.com.
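
A quick reachability check can be scripted in Python. This is a rough sketch with illustrative names (`can_reach`, `check_model_hosts`). It only confirms that a TCP connection to port 443 succeeds, so it will not detect problems caused by HTTP proxies or TLS interception.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


def check_model_hosts(hosts=("huggingface.co", "github.com")):
    """Map each host to whether it accepted a TCP connection on port 443."""
    return {host: can_reach(host) for host in hosts}
```

If `check_model_hosts()` reports `False` for either host, the problem is likely DNS, a firewall, or outbound network policy on the VM rather than the API itself.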

### "Out of memory"

Reduce batch size or use a smaller model like `UVR_MDXNET_KARA_2`.