mirror of
https://github.com/harivansh-afk/sep.git
synced 2026-04-15 06:04:43 +00:00
Audio Separator API
REST API for separating audio into vocal and instrumental stems using ML models.
Quick Start
# Clone and install
git clone <repo-url>
cd sep
chmod +x install.sh test.sh
sudo ./install.sh
# Run tests
./test.sh
# Start the API
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000
Requirements
- Python 3.10+
- FFmpeg
- 10GB+ disk space (for models)
- NVIDIA GPU with CUDA (optional, but recommended)
API Endpoints
Health Check
curl http://localhost:8000/health
Response:
{
  "status": "healthy",
  "cuda_available": true,
  "cuda_device": "NVIDIA GeForce RTX 5090"
}
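A client can check this endpoint before uploading anything. The following is a minimal sketch in Python using only the standard library. It assumes the response carries exactly the fields shown in the sample above; `fetch_health` and `describe_health` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request


def fetch_health(base_url="http://localhost:8000", timeout=5):
    """GET /health and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def describe_health(payload):
    """Summarize a /health payload.

    Assumes the field names from the sample response: status,
    cuda_available, cuda_device.
    """
    if payload.get("status") != "healthy":
        return "unhealthy"
    if payload.get("cuda_available"):
        return f"healthy (GPU: {payload.get('cuda_device', 'unknown')})"
    return "healthy (CPU only)"
```

With the API running, `describe_health(fetch_health())` gives a one-line summary that a deployment script could log before sending work.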
Separate Audio
curl -X POST http://localhost:8000/separate \
-F "file=@song.mp3" \
-F "output_format=mp3"
Response:
{
  "job_id": "a1b2c3d4",
  "status": "completed",
  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
}
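The curl call above sends a multipart/form-data request with a `file` part and an `output_format` field. The same request can be sketched in Python with only the standard library. The helper names `encode_multipart` and `separate` and the 600-second timeout are illustrative assumptions; only the endpoint path and the two field names come from the example above.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields, e.g. output_format=mp3
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    # The uploaded file itself
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    # Closing boundary
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def separate(base_url, audio_path, output_format="mp3", timeout=600):
    """POST an audio file to /separate and return the decoded JSON response."""
    path = Path(audio_path)
    body, content_type = encode_multipart(
        {"output_format": output_format}, "file", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{base_url}/separate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `separate("http://localhost:8000", "song.mp3")` should return a dict shaped like the sample response, from which `vocals_url` and `instrumental_url` can be read. The timeout is deliberately generous because the request appears to block until separation finishes.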
Download Stems
curl -O "http://localhost:8000/download/song_(Vocals)_model_bs_roformer.mp3"
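The download paths contain parentheses, and may contain spaces if the uploaded file name did, so a programmatic client should percent-encode them before requesting. The sketch below is one way to do that with the Python standard library. `stem_url` and `download_stem` are hypothetical helpers, and the sketch assumes the server decodes percent-encoded paths as HTTP servers normally do.

```python
import urllib.request
from urllib.parse import quote, urljoin


def stem_url(base_url, download_path):
    """Turn a relative path such as vocals_url into an absolute, encoded URL."""
    return urljoin(base_url, quote(download_path))


def download_stem(base_url, download_path, dest):
    """Stream one stem to the local file path dest and return dest."""
    url = stem_url(base_url, download_path)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        # Copy in 64 KiB chunks so large WAV or FLAC stems are not held in memory.
        while chunk := resp.read(64 * 1024):
            out.write(chunk)
    return dest
```

Because stems are deleted a few minutes after they are produced, a client should download both URLs promptly after the separation request returns.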
List Models
curl http://localhost:8000/models
Configuration
Output Formats
- mp3 (default) - Good compression, iOS compatible
- wav - Lossless, larger files
- flac - Lossless compression
Models
| Model | Quality | Speed | Best For |
|---|---|---|---|
| BS-RoFormer (default) | Highest | Slow | Production use |
| UVR_MDXNET_KARA_2 | Good | Fast | Karaoke |
| Kim_Vocal_2 | Good | Medium | Vocal isolation |
VM Deployment
Using systemd (Linux)
The install script creates a systemd service:
sudo systemctl enable audio-separator
sudo systemctl start audio-separator
sudo systemctl status audio-separator
Manual Start
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
Note: Use --workers 1 because the ML model is not thread-safe.
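For illustration, code that is not thread-safe is commonly guarded by letting only one caller run it at a time. The sketch below shows that general pattern in Python. It is not taken from app.py, and `SerializedModel` is a hypothetical wrapper name.

```python
import threading


class SerializedModel:
    """Wrap a callable so that only one thread can run it at a time."""

    def __init__(self, run_fn):
        self._run_fn = run_fn          # the non-thread-safe function to protect
        self._lock = threading.Lock()  # held for the duration of each call

    def __call__(self, *args, **kwargs):
        with self._lock:
            return self._run_fn(*args, **kwargs)
```

A lock like this only serializes calls inside one process. Running a single worker keeps all requests in that one process.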
GPU Support
The API automatically detects CUDA GPUs. To verify:
./test.sh
Look for:
[PASS] CUDA available: NVIDIA GeForce RTX 5090 (32.0GB VRAM)
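The same information can be read directly from PyTorch inside the project's virtual environment. The sketch below shows one way to do it. Whether test.sh or app.py performs the check exactly like this is an assumption, and `cuda_report` is a hypothetical name.

```python
def cuda_report():
    """Report CUDA availability, using the field names from the /health response."""
    try:
        import torch  # assumed to be installed in .venv; not present on a bare interpreter
    except ImportError:
        return {"cuda_available": False, "cuda_device": None}

    if not torch.cuda.is_available():
        return {"cuda_available": False, "cuda_device": None}

    props = torch.cuda.get_device_properties(0)  # first visible GPU
    return {
        "cuda_available": True,
        "cuda_device": props.name,
        "vram_gb": round(props.total_memory / 1024**3, 1),
    }
```

Running `.venv/bin/python -c "..."` with a call to this function is a quick way to tell a driver problem from a CPU-only PyTorch build: in both cases `cuda_available` is false, but `nvidia-smi` still lists the GPU when only the PyTorch build is at fault.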
CUDA Installation (Ubuntu)
# Add NVIDIA repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-1
iOS Integration
The API returns MP3 files by default, which are natively supported on iOS.
Example Swift code:
func separateAudio(fileURL: URL) async throws -> (vocals: URL, instrumental: URL) {
    var request = URLRequest(url: URL(string: "http://your-vm:8000/separate")!)
    request.httpMethod = "POST"
    // Upload file and get response with download URLs
    // ...
}
File Cleanup
Uploaded and output files are automatically deleted after 5 minutes.
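A retention rule like this can be implemented by comparing each file's modification time against a cutoff. The sketch below illustrates the idea in Python. It is not the project's actual cleanup code, and `remove_expired` and `MAX_AGE_SECONDS` are hypothetical names.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 5 * 60  # the 5-minute retention period stated above


def remove_expired(directory, max_age=MAX_AGE_SECONDS, now=None):
    """Delete regular files in directory older than max_age seconds.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink(missing_ok=True)  # tolerate a concurrent delete
            removed.append(path.name)
    return sorted(removed)
```

For clients, the practical consequence is that download URLs stop working once the retention period has passed, so stems should be fetched right after separation completes.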
Troubleshooting
"CUDA not available"
- Check NVIDIA drivers: nvidia-smi
- Reinstall PyTorch with CUDA: uv pip install torch --index-url https://download.pytorch.org/whl/cu121
"Model download failed"
Check network access to huggingface.co and github.com.
"Out of memory"
Reduce batch size or use a smaller model like UVR_MDXNET_KARA_2.