# Audio Separator API

REST API for separating audio into vocal and instrumental stems using ML models.

## Quick Start

```bash
# Clone and install
git clone <repo-url>
cd sep
chmod +x install.sh test.sh
sudo ./install.sh

# Run tests
./test.sh

# Start the API
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000
```

## Requirements

- Python 3.10+
- FFmpeg
- 10GB+ disk space (for models)
- NVIDIA GPU with CUDA (optional, but recommended)
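
A quick pre-flight check can confirm the first three requirements before running `install.sh`. The sketch below is a hypothetical helper, not part of the repository: it checks the Python version, looks for `ffmpeg` on `PATH`, and compares free disk space against the 10 GB guideline, treating 1 GB as 10^9 bytes (an assumption). It does not look for a GPU, since that requirement is optional.

```python
import shutil
import sys


def check_requirements(path: str = ".", min_free_gb: float = 10.0) -> dict:
    """Return {requirement: passed} for the README's Python, FFmpeg and disk requirements.

    Hypothetical helper; GB is assumed to mean 10**9 bytes.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        # Only checks that an ffmpeg executable is on PATH, not its version
        "ffmpeg": shutil.which("ffmpeg") is not None,
        f"disk>={min_free_gb:g}GB": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"[{'PASS' if ok else 'FAIL'}] {name}")
```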

## API Endpoints

### Health Check

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "healthy",
  "cuda_available": true,
  "cuda_device": "NVIDIA GeForce RTX 5090"
}
```
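
From a client, the health payload can be turned into a one-line summary. This standard-library sketch uses hypothetical helper names (`check_health`, `describe_health`) and assumes `cuda_device` may be absent when `cuda_available` is `false`, since only the GPU case is shown in the example response.

```python
import json
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """GET /health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def describe_health(health: dict) -> str:
    """Summarize a /health payload with the keys shown in the example response."""
    if health.get("status") != "healthy":
        return f"unhealthy: {health.get('status')!r}"
    if health.get("cuda_available"):
        # Assumption: cuda_device may be missing, so fall back to a placeholder
        return f"healthy on GPU ({health.get('cuda_device', 'unknown device')})"
    return "healthy on CPU (CUDA not available)"
```

With the server running, `print(describe_health(check_health()))` prints a summary such as `healthy on GPU (NVIDIA GeForce RTX 5090)`.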

### Separate Audio

Upload an audio file as multipart form data. The optional `output_format` field selects the stem format (`mp3` by default).

```bash
curl -X POST http://localhost:8000/separate \
  -F "file=@song.mp3" \
  -F "output_format=mp3"
```

Response:

```json
{
  "job_id": "a1b2c3d4",
  "status": "completed",
  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
}
```
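
The same upload can be made from Python without third-party packages by encoding the multipart body by hand. This is a sketch rather than a client shipped with the API: `build_multipart` and `separate` are hypothetical names, the client-side format check simply mirrors the Output Formats list, and the long default timeout is an assumption based on the example response already reporting `"completed"`.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Values from the "Output Formats" list; checked client-side as a convenience.
OUTPUT_FORMATS = {"mp3", "wav", "flac"}


def build_multipart(filename: str, data: bytes, output_format: str = "mp3"):
    """Encode the `file` and `output_format` fields as multipart/form-data, like curl -F.

    Returns (body_bytes, content_type_header).
    """
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="output_format"\r\n\r\n'
         f"{output_format}\r\n").encode(),
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {file_type}\r\n\r\n").encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def separate(path: str, base_url: str = "http://localhost:8000",
             output_format: str = "mp3", timeout: float = 600.0) -> dict:
    """POST an audio file to /separate and return the decoded JSON response."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(os.path.basename(path), f.read(), output_format)
    request = urllib.request.Request(
        f"{base_url}/separate", data=body, method="POST",
        headers={"Content-Type": content_type},
    )
    # Generous timeout: the example response is already "completed", which suggests
    # the call blocks until separation finishes (600 s is an arbitrary choice).
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `separate("song.mp3", output_format="flac")` returns a dict with the same keys as the JSON response shown for this endpoint.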
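
### Download Stems

The `*_url` fields in the response are paths relative to the server. Quote the URL so the shell does not interpret the parentheses:

```bash
curl -O "http://localhost:8000/download/song_(Vocals)_model_bs_roformer.mp3"
```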
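
In a script, the relative `*_url` paths must be joined with the server address before downloading. The sketch below uses hypothetical helpers (`stem_urls`, `download`) built on the standard library, and assumes the server returns raw, un-encoded paths: it percent-encodes characters such as spaces while keeping `/` and the parentheses. Since files are deleted after 5 minutes, download the stems promptly.

```python
import os
import shutil
import urllib.request
from urllib.parse import quote, unquote, urljoin


def stem_urls(response: dict, base_url: str = "http://localhost:8000") -> dict:
    """Map each *_url field of a /separate response to an absolute URL.

    Assumes raw (not percent-encoded) paths; quote() escapes spaces and similar
    characters while leaving "/" and "()" untouched.
    """
    return {
        key[: -len("_url")]: urljoin(base_url, quote(path, safe="/()"))
        for key, path in response.items()
        if key.endswith("_url")
    }


def download(url: str, dest_dir: str = ".", timeout: float = 60.0) -> str:
    """Stream one stem into dest_dir, named after the last URL path segment."""
    dest = os.path.join(dest_dir, unquote(url.rsplit("/", 1)[-1]))
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

With `response` holding the parsed JSON from `/separate`, `for url in stem_urls(response).values(): download(url)` saves both stems to the current directory.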

### List Models

```bash
curl http://localhost:8000/models
```

## Configuration

### Output Formats

- `mp3` (default) - Good compression, iOS compatible
- `wav` - Lossless, larger files
- `flac` - Lossless compression

### Models

| Model | Quality | Speed | Best For |
|-------|---------|-------|----------|
| BS-RoFormer (default) | Highest | Slow | Production use |
| UVR_MDXNET_KARA_2 | Good | Fast | Karaoke |
| Kim_Vocal_2 | Good | Medium | Vocal isolation |

## VM Deployment

### Using systemd (Linux)

The install script creates a systemd service named `audio-separator`. Enable it at boot, start it, and check its status with:

```bash
sudo systemctl enable audio-separator
sudo systemctl start audio-separator
sudo systemctl status audio-separator
```
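
### Manual Start

```bash
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
```

Note: Use `--workers 1`. Uvicorn workers are separate processes, so each extra worker would hold its own copy of the ML model and run it concurrently on the same GPU, and the model is not safe for concurrent use.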

## GPU Support

The API automatically detects CUDA GPUs. To verify:

```bash
./test.sh
```

Look for a line like this (the device name and VRAM will match your GPU):

```text
[PASS] CUDA available: NVIDIA GeForce RTX 5090 (32.0GB VRAM)
```
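
The same check can be run directly against PyTorch, the ML backend named in Troubleshooting. This sketch mimics the `test.sh` output format but is not taken from `test.sh`; `cuda_report` is a hypothetical name, and it reports VRAM in GiB (1024^3 bytes), which may differ slightly from what the script prints.

```python
def cuda_report() -> str:
    """Return a one-line CUDA status in the style of the test.sh output."""
    try:
        import torch
    except ImportError:
        return "[FAIL] PyTorch is not installed"
    if not torch.cuda.is_available():
        return "[FAIL] CUDA not available"
    props = torch.cuda.get_device_properties(0)  # first visible GPU
    vram_gib = props.total_memory / 1024**3
    return f"[PASS] CUDA available: {props.name} ({vram_gib:.1f}GB VRAM)"


if __name__ == "__main__":
    print(cuda_report())
```

Run it with the project interpreter (`.venv/bin/python`) so it sees the same PyTorch build as the API.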

### CUDA Installation (Ubuntu 22.04)

These commands add NVIDIA's repository for Ubuntu 22.04 on x86_64 (`ubuntu2204/x86_64` in the URL) and install the CUDA 12.1 toolkit:

```bash
# Add NVIDIA repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# Install the CUDA 12.1 toolkit
sudo apt-get install -y cuda-toolkit-12-1
```
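
## iOS Integration

The API returns MP3 files by default, which are natively supported on iOS.

Example Swift code (requires iOS 15+ for the async `URLSession` API). App Transport Security blocks plain `http://` requests by default, so serve the API over HTTPS or add an ATS exception for the host:

```swift
import Foundation
struct SeparateResponse: Decodable { let vocals_url: String; let instrumental_url: String }
func separateAudio(fileURL: URL, server: URL = URL(string: "http://your-vm:8000")!) async throws -> (vocals: URL, instrumental: URL) {
    let boundary = UUID().uuidString
    var request = URLRequest(url: server.appendingPathComponent("separate"))
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
    // Multipart body with a single "file" field (output_format is omitted, so the API defaults to mp3)
    var body = Data("--\(boundary)\r\nContent-Disposition: form-data; name=\"file\"; filename=\"\(fileURL.lastPathComponent)\"\r\n\r\n".utf8)
    try body.append(Data(contentsOf: fileURL))
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))
    let (data, _) = try await URLSession.shared.upload(for: request, from: body)
    let res = try JSONDecoder().decode(SeparateResponse.self, from: data)
    guard let vocals = URL(string: res.vocals_url, relativeTo: server), let instrumental = URL(string: res.instrumental_url, relativeTo: server) else { throw URLError(.badURL) }
    return (vocals.absoluteURL, instrumental.absoluteURL)  // download paths are relative to the server
}
```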

## File Cleanup

Uploaded and output files are automatically deleted after 5 minutes, so download the stems soon after a job completes.
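
How the cleanup is implemented is not described here. One common approach, sketched below as an assumption rather than the app's actual code, is a background task that periodically deletes files whose modification time is older than the 5-minute window. The function names and directory arguments are hypothetical.

```python
import os
import time
from typing import Iterable, List, Optional


def delete_expired(directory: str, max_age_s: float = 300.0,
                   now: Optional[float] = None) -> List[str]:
    """Delete regular files in `directory` last modified more than max_age_s ago.

    300 s matches the 5-minute window. Returns the removed paths.
    """
    now = time.time() if now is None else now
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot the listing before deleting anything
    removed = []
    for entry in entries:
        if entry.is_file() and now - entry.stat().st_mtime > max_age_s:
            os.remove(entry.path)
            removed.append(entry.path)
    return removed


def cleanup_forever(directories: Iterable[str], interval_s: float = 60.0) -> None:
    """Sweep each directory once per interval; meant to run in a background thread."""
    dirs = list(directories)
    while True:
        for d in dirs:
            delete_expired(d)
        time.sleep(interval_s)
```

For example, `threading.Thread(target=cleanup_forever, args=(["uploads", "outputs"],), daemon=True).start()` would sweep two hypothetical directories every minute. Keying on modification time means a file's 5 minutes start when it is last written.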
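
## Troubleshooting

### "CUDA not available"

1. Check that the NVIDIA driver is installed and detects the GPU: `nvidia-smi`
2. Reinstall PyTorch with CUDA support. `--reinstall` is needed because an already installed CPU-only `torch` satisfies the requirement, so a plain install would do nothing:

   ```bash
   uv pip install --reinstall torch --index-url https://download.pytorch.org/whl/cu121
   ```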

### "Model download failed"

Check network access to huggingface.co and github.com.

### "Out of memory"

Reduce the batch size or switch to a lighter model such as `UVR_MDXNET_KARA_2`.