sep/README.md
2026-01-23 15:06:41 -05:00

Audio Separator API

REST API for separating audio into vocal and instrumental stems using ML models.

Quick Start

# Clone and install
git clone <repo-url>
cd sep
chmod +x install.sh test.sh
sudo ./install.sh

# Run tests
./test.sh

# Start the API
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000

Requirements

  • Python 3.10+
  • FFmpeg
  • 10GB+ disk space (for models)
  • NVIDIA GPU with CUDA (optional, but recommended)

API Endpoints

Health Check

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "cuda_available": true,
  "cuda_device": "NVIDIA GeForce RTX 5090"
}
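
The same check can be scripted. The following is a minimal Python sketch using only the standard library; the helper names `fetch_health` and `summarize_health` are illustrative and not part of the project, the field names are taken from the response above, and the base URL assumes the default local port.

```python
import json
from urllib.request import urlopen


def summarize_health(payload: dict) -> str:
    """Turn a /health response into a one-line summary."""
    if payload.get("status") != "healthy":
        return f"unhealthy: status={payload.get('status')!r}"
    if payload.get("cuda_available"):
        return f"healthy, GPU: {payload.get('cuda_device', 'unknown')}"
    return "healthy, CPU only"


def fetch_health(base_url: str = "http://localhost:8000") -> dict:
    """GET /health and decode the JSON body (assumes the API is running)."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)
```

With the API running, `print(summarize_health(fetch_health()))` prints a single status line suitable for a deployment script.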

Separate Audio

curl -X POST http://localhost:8000/separate \
  -F "file=@song.mp3" \
  -F "output_format=mp3"

Response:

{
  "job_id": "a1b2c3d4",
  "status": "completed",
  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
}
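
For clients that cannot shell out to curl, the upload can be sketched in Python with the standard library alone. The form field names (`file`, `output_format`) come from the curl example above; the helper names `build_multipart` and `separate` are hypothetical, and the sketch assumes the request blocks until the job finishes, as the "completed" status in the sample response suggests.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request, urlopen


def build_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        "\r\n\r\n".encode()
    )
    parts.append(data)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def separate(path: str, output_format: str = "mp3",
             base_url: str = "http://localhost:8000") -> dict:
    """POST a local audio file to /separate and return the decoded JSON."""
    file_path = Path(path)
    body, content_type = build_multipart(
        {"output_format": output_format}, "file",
        file_path.name, file_path.read_bytes(),
    )
    req = Request(f"{base_url}/separate", data=body,
                  headers={"Content-Type": content_type}, method="POST")
    with urlopen(req) as resp:  # no timeout: separation may take a while
        return json.load(resp)
```

Calling `separate("song.mp3")` against a running server should return a dictionary with the `vocals_url` and `instrumental_url` keys shown above.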

Download Stems

# Quote the URL: unquoted parentheses are a shell syntax error
curl -O "http://localhost:8000/download/song_(Vocals)_model_bs_roformer.mp3"
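
Stem filenames contain parentheses (and may contain spaces if the uploaded file did), so a programmatic client is safer percent-encoding the path before requesting it. A minimal Python sketch follows; `stem_url` and `download_stem` are hypothetical helper names, and the sketch assumes the server returns the path unencoded, as in the sample response.

```python
import os
import posixpath
from urllib.parse import quote, urljoin
from urllib.request import urlretrieve


def stem_url(base_url: str, relative_url: str) -> str:
    """Percent-encode a relative download path and join it to the base URL."""
    return urljoin(base_url, quote(relative_url))


def download_stem(base_url: str, relative_url: str, dest_dir: str = ".") -> str:
    """Download one stem into dest_dir and return the local file path."""
    local_path = os.path.join(dest_dir, posixpath.basename(relative_url))
    urlretrieve(stem_url(base_url, relative_url), local_path)
    return local_path
```

If the server ever returns paths that are already percent-encoded, drop the `quote` call to avoid encoding them twice.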

List Models

curl http://localhost:8000/models

Configuration

Output Formats

  • mp3 (default) - Good compression, iOS compatible
  • wav - Lossless, larger files
  • flac - Lossless compression

Models

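| Model                 | Quality | Speed  | Best For        |
|-----------------------|---------|--------|-----------------|
| BS-RoFormer (default) | Highest | Slow   | Production use  |
| UVR_MDXNET_KARA_2     | Good    | Fast   | Karaoke         |
| Kim_Vocal_2           | Good    | Medium | Vocal isolation |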

VM Deployment

Using systemd (Linux)

The install script creates a systemd service:

sudo systemctl enable audio-separator
sudo systemctl start audio-separator
sudo systemctl status audio-separator

Manual Start

.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1

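Note: Use --workers 1. The ML model is not thread-safe, and each additional uvicorn worker is a separate process with its own copy of the application, so extra workers multiply memory use.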

GPU Support

The API automatically detects CUDA GPUs. To verify:

./test.sh

Look for:

[PASS] CUDA available: NVIDIA GeForce RTX 5090 (32.0GB VRAM)
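
The same information can be read directly from PyTorch. This is a sketch under assumptions: how test.sh actually computes and formats its line is not shown here, the division by 1024**3 is a guess at how the VRAM figure is derived, and the `[SKIP]` and `[FAIL]` labels are invented for illustration.

```python
def format_cuda_line(name: str, total_memory_bytes: int) -> str:
    """Format a device name and memory size like the sample line above."""
    gib = total_memory_bytes / 1024**3
    return f"[PASS] CUDA available: {name} ({gib:.1f}GB VRAM)"


def cuda_report() -> str:
    """Query PyTorch for the first CUDA device, if any."""
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return "[SKIP] PyTorch is not installed"
    if not torch.cuda.is_available():
        return "[FAIL] CUDA not available"
    props = torch.cuda.get_device_properties(0)
    return format_cuda_line(props.name, props.total_memory)
```

Run it with the project's interpreter, for example `.venv/bin/python -c "from check import cuda_report; print(cuda_report())"`, where `check.py` is a hypothetical file holding the snippet.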

CUDA Installation (Ubuntu)

# Add the NVIDIA CUDA repository (this keyring URL targets Ubuntu 22.04, x86_64)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-1

iOS Integration

The API returns MP3 files by default, which are natively supported on iOS.

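Example Swift skeleton (the multipart upload and response decoding are left out):

func separateAudio(fileURL: URL) async throws -> (vocals: URL, instrumental: URL) {
    var request = URLRequest(url: URL(string: "http://your-vm:8000/separate")!)
    request.httpMethod = "POST"

    // Upload fileURL as multipart/form-data under the "file" field,
    // then decode "vocals_url" and "instrumental_url" from the JSON response.
    // ...
}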

File Cleanup

Uploaded and output files are automatically deleted after 5 minutes, so download the stems promptly once separation completes.
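
A retention policy like this can be sketched as a periodic sweep that removes files older than the window. The code below is an illustration only, not the project's implementation: the function name `delete_expired`, the use of file modification time, and the directory layout are all assumptions.

```python
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 5 * 60  # the 5-minute retention window stated above


def delete_expired(directory: str, max_age: float = MAX_AGE_SECONDS,
                   now: Optional[float] = None) -> list:
    """Delete regular files older than max_age seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for entry in Path(directory).iterdir():
        # Age is measured from the last modification time (an assumption).
        if entry.is_file() and now - entry.stat().st_mtime > max_age:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Passing `now` explicitly makes the sweep easy to test without waiting for real time to elapse.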

Troubleshooting

"CUDA not available"

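  1. Check NVIDIA drivers: nvidia-smi
  2. Reinstall PyTorch with CUDA, using the index URL that matches your CUDA version (cu121 is CUDA 12.1). Recent GPUs may need a newer build; the RTX 50 series, for example, requires CUDA 12.8 or later.
    uv pip install torch --index-url https://download.pytorch.org/whl/cu121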

"Model download failed"

Check network access to huggingface.co and github.com.

"Out of memory"

Reduce batch size or use a smaller model like UVR_MDXNET_KARA_2.