# Audio Separator API

REST API for separating audio into vocal and instrumental stems using ML models.

## Quick Start

```bash
# Clone and install
git clone <repo-url>
cd sep
chmod +x install.sh test.sh
sudo ./install.sh

# Run tests
./test.sh

# Start the API
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000
```

## Requirements

- Python 3.10+
- FFmpeg
- 10GB+ disk space (for models)
- NVIDIA GPU with CUDA (optional, but recommended)

## API Endpoints

### Health Check

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "healthy",
  "cuda_available": true,
  "cuda_device": "NVIDIA GeForce RTX 5090"
}
```

### Separate Audio

```bash
curl -X POST http://localhost:8000/separate \
  -F "file=@song.mp3" \
  -F "output_format=mp3"
```

Response:

```json
{
  "job_id": "a1b2c3d4",
  "status": "completed",
  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
}
```

### Download Stems

Quote the URL, because the shell would otherwise try to interpret the parentheses in the filename:

```bash
curl -O "http://localhost:8000/download/song_(Vocals)_model_bs_roformer.mp3"
```

### List Models

```bash
curl http://localhost:8000/models
```

## Configuration

### Output Formats

- `mp3` (default) - Good compression, iOS compatible
- `wav` - Lossless, larger files
- `flac` - Lossless compression

### Models

| Model | Quality | Speed | Best For |
|-------|---------|-------|----------|
| BS-RoFormer (default) | Highest | Slow | Production use |
| UVR_MDXNET_KARA_2 | Good | Fast | Karaoke |
| Kim_Vocal_2 | Good | Medium | Vocal isolation |

## VM Deployment

### Using systemd (Linux)

The install script creates a systemd service:

```bash
sudo systemctl enable audio-separator
sudo systemctl start audio-separator
sudo systemctl status audio-separator
```

### Manual Start

```bash
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
```

Note: Use `--workers 1` because the ML model is not thread-safe.

## GPU Support

The API automatically detects CUDA GPUs.
To verify:

```bash
./test.sh
```

Look for:

```
[PASS] CUDA available: NVIDIA GeForce RTX 5090 (32.0GB VRAM)
```

### CUDA Installation (Ubuntu)

```bash
# Add NVIDIA repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-1
```

## iOS Integration

The API returns MP3 files by default, which are natively supported on iOS.

Example Swift code:

```swift
func separateAudio(fileURL: URL) async throws -> (vocals: URL, instrumental: URL) {
    var request = URLRequest(url: URL(string: "http://your-vm:8000/separate")!)
    request.httpMethod = "POST"
    // Upload file and get response with download URLs
    // ...
}
```

## File Cleanup

Uploaded and output files are automatically deleted after 5 minutes.

## Troubleshooting

### "CUDA not available"

1. Check NVIDIA drivers: `nvidia-smi`
2. Reinstall PyTorch with CUDA:
   ```bash
   uv pip install torch --index-url https://download.pytorch.org/whl/cu121
   ```

### "Model download failed"

Check network access to huggingface.co and github.com.

### "Out of memory"

Reduce batch size or use a smaller model like `UVR_MDXNET_KARA_2`.
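
## Example: Resolving Download URLs (Python)

The `/separate` response shown under API Endpoints returns relative download paths whose filenames contain parentheses. A client might turn those into absolute, percent-encoded URLs before fetching the stems. The sketch below is illustrative only: the helper name `stem_urls` and the `BASE_URL` constant are hypothetical, and it assumes the server accepts percent-encoded paths (common for ASGI servers, but not confirmed by this README).

```python
import json
from urllib.parse import quote, urljoin

# Assumption: the API runs on the default host/port from Quick Start.
BASE_URL = "http://localhost:8000"


def stem_urls(response_body: str, base_url: str = BASE_URL) -> dict[str, str]:
    """Map stem name -> absolute download URL from a /separate JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "completed":
        raise RuntimeError(
            f"job {payload.get('job_id')} is not completed: {payload.get('status')}"
        )
    urls = {}
    for key in ("vocals_url", "instrumental_url"):
        # Percent-encode characters such as parentheses; keep path separators.
        path = quote(payload[key], safe="/")
        urls[key.removesuffix("_url")] = urljoin(base_url, path)
    return urls


# Sample body copied from the "Separate Audio" response above.
EXAMPLE_RESPONSE = """
{
  "job_id": "a1b2c3d4",
  "status": "completed",
  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
}
"""

for stem, url in stem_urls(EXAMPLE_RESPONSE).items():
    print(f"{stem}: {url}")
```

Each resulting URL could then be fetched with, for example, `urllib.request.urlretrieve(url, filename)`. Because output files are deleted after 5 minutes, a client should download the stems promptly after the job completes.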