mirror of https://github.com/harivansh-afk/sep.git synced 2026-04-15 06:04:43 +00:00

Harivansh Rathi 2c6d447c18 fix		2026-01-24 14:55:51 -05:00
.DS_Store	init	2026-01-23 15:06:41 -05:00
.gitignore	init	2026-01-23 15:06:41 -05:00
.python-version	init	2026-01-23 15:06:41 -05:00
app.py	fix	2026-01-24 14:55:51 -05:00
install.sh	init	2026-01-23 15:06:41 -05:00
pyproject.toml	init	2026-01-23 15:06:41 -05:00
README.md	init	2026-01-23 15:06:41 -05:00
test.sh	init	2026-01-23 15:06:41 -05:00
uv.lock	init	2026-01-23 15:06:41 -05:00

README.md

Audio Separator API

REST API for separating audio into vocal and instrumental stems using ML models.

Quick Start

# Clone and install
git clone <repo-url>
cd sep
chmod +x install.sh test.sh
sudo ./install.sh

# Run tests
./test.sh

# Start the API
.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000

Requirements

Python 3.10+
FFmpeg
10GB+ disk space (for models)
NVIDIA GPU with CUDA (optional, but recommended)

API Endpoints

Health Check

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "cuda_available": true,
  "cuda_device": "NVIDIA GeForce RTX 5090"
}
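
The same check can be scripted with Python's standard library. This is a minimal sketch, not part of the repository: the helper names `fetch_health` and `describe_health` are made up for illustration, and the code assumes only the three response fields shown above.

```python
import json
from urllib.request import urlopen


def describe_health(payload: dict) -> str:
    """Summarize a /health response using only the fields shown above."""
    if payload.get("status") != "healthy":
        return "unhealthy"
    if payload.get("cuda_available"):
        return f"healthy, GPU: {payload.get('cuda_device', 'unknown')}"
    return "healthy, CPU only"


def fetch_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body (requires the API to be running)."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)
```

With the API running, `print(describe_health(fetch_health()))` prints a one-line summary that a deployment script could log or act on.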

Separate Audio

curl -X POST http://localhost:8000/separate \
  -F "file=@song.mp3" \
  -F "output_format=mp3"

Response:

{
  "job_id": "a1b2c3d4",
  "status": "completed",
  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
}
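
For clients that cannot shell out to curl, the upload can be reproduced with Python's standard library. The sketch below builds the multipart body by hand. The helper names `encode_multipart` and `separate` are hypothetical, and the file part is sent as `application/octet-stream` on the assumption that the server does not inspect the part's content type. Field values and filenames are not escaped, so treat it as an illustration rather than a hardened client.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request, urlopen


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Build a multipart/form-data body; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # Plain form fields, e.g. output_format=mp3
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # The uploaded file goes in the "file" field, as in the curl example
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def separate(path, base_url="http://localhost:8000", output_format="mp3"):
    """POST a file to /separate and return the decoded JSON response."""
    path = Path(path)
    body, content_type = encode_multipart(
        {"output_format": output_format}, path.name, path.read_bytes()
    )
    request = Request(
        f"{base_url}/separate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # No timeout is set here because separation can take a while.
    with urlopen(request) as response:
        return json.load(response)
```

Calling `separate("song.mp3")` against a running server should return a dictionary with the `vocals_url` and `instrumental_url` keys shown in the response above.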

Download Stems

curl -O "http://localhost:8000/download/song_(Vocals)_model_bs_roformer.mp3"
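
Stem filenames contain parentheses, which shells and some URL handling treat specially. The following sketch shows one way to fetch a stem from Python with the path percent-encoded. The helper names `stem_url` and `download_stem` are hypothetical, and the code assumes the API returns download paths unencoded, as in the example response.

```python
import shutil
from urllib.parse import quote, urljoin
from urllib.request import urlopen


def stem_url(base_url: str, path: str) -> str:
    """Join a relative download path onto the base URL, percent-encoding it."""
    return urljoin(base_url, quote(path))


def download_stem(base_url: str, path: str, destination: str) -> str:
    """Stream one stem to `destination` and return that local path."""
    with urlopen(stem_url(base_url, path)) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

For example, `download_stem("http://localhost:8000", result["vocals_url"], "vocals.mp3")` would save the vocal stem, where `result` is the JSON returned by the separate endpoint.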

List Models

curl http://localhost:8000/models

Configuration

Output Formats

mp3 (default) - Good compression, iOS compatible
wav - Lossless, larger files
flac - Lossless compression

Models

Model	Quality	Speed	Best For
BS-RoFormer (default)	Highest	Slow	Production use
UVR_MDXNET_KARA_2	Good	Fast	Karaoke
Kim_Vocal_2	Good	Medium	Vocal isolation

VM Deployment

Using systemd (Linux)

The install script creates a systemd service:

sudo systemctl enable audio-separator
sudo systemctl start audio-separator
sudo systemctl status audio-separator

Manual Start

.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1

Note: Run with --workers 1. The ML model is not thread-safe, and each additional uvicorn worker is a separate process that would load its own copy of the model.
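
If request handlers can run concurrently inside the single worker, calls into the model still need to be serialized. One common pattern is a process-wide lock. This is a sketch of that pattern only: `run_exclusive` and `separate_fn` are hypothetical names, and nothing here is taken from app.py.

```python
import threading

# Hypothetical module-level lock: one per process, shared by all request threads.
_model_lock = threading.Lock()


def run_exclusive(separate_fn, *args, **kwargs):
    """Call `separate_fn` while holding the lock, so only one call runs at a time."""
    with _model_lock:
        return separate_fn(*args, **kwargs)
```

The trade-off is throughput: concurrent requests queue behind the lock instead of running in parallel.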

GPU Support

The API automatically detects CUDA GPUs. To verify:

./test.sh

Look for:

[PASS] CUDA available: NVIDIA GeForce RTX 5090 (32.0GB VRAM)
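
A comparable check can be run directly from Python. The sketch below is an assumption about how such a line could be produced with PyTorch's `torch.cuda` API, not a copy of what test.sh does. The function names are illustrative.

```python
def format_vram(total_bytes: int) -> str:
    """Format a byte count as gigabytes (GiB) with one decimal place."""
    return f"{total_bytes / 1024**3:.1f}GB"


def cuda_summary() -> str:
    """Describe the first CUDA device, or explain why none is usable."""
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    props = torch.cuda.get_device_properties(0)
    return f"CUDA available: {props.name} ({format_vram(props.total_memory)} VRAM)"
```

Running `.venv/bin/python -c "..."` with a call to `cuda_summary()` inside the project's virtual environment checks the same PyTorch install the API uses.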

CUDA Installation (Ubuntu 22.04)

# Add NVIDIA repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-1

iOS Integration

The API returns MP3 files by default, which are natively supported on iOS.

Example Swift code:

func separateAudio(fileURL: URL) async throws -> (vocals: URL, instrumental: URL) {
    var request = URLRequest(url: URL(string: "http://your-vm:8000/separate")!)
    request.httpMethod = "POST"

    // Upload file and get response with download URLs
    // ...
}

File Cleanup

Uploaded and output files are automatically deleted after 5 minutes, so download the stems before then.
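
An age-based cleanup of this kind can be sketched as follows. This illustrates the general technique and is not the code in app.py. The function name `delete_expired`, the use of file modification time, and the 300-second default are assumptions chosen to match the 5-minute window.

```python
import time
from pathlib import Path


def delete_expired(directory, max_age_seconds=300, now=None):
    """Delete regular files last modified more than `max_age_seconds` ago.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

A server would typically call a function like this on a timer or in a background task, once for the upload directory and once for the output directory.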

Troubleshooting

"CUDA not available"

Check NVIDIA drivers: nvidia-smi

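Reinstall PyTorch with a CUDA-enabled build. The index URL below targets CUDA 12.1 (cu121); choose the build that matches your installed CUDA version, since newer GPUs may require a newer one: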

uv pip install torch --index-url https://download.pytorch.org/whl/cu121

"Model download failed"

Check network access to huggingface.co and github.com.
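
A quick way to test reachability from the VM is a plain TCP connection to port 443 on each host. The sketch below uses hypothetical helper names. It only confirms that a connection can be opened, so it will not detect problems with TLS interception or an HTTP proxy.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def check_model_hosts(hosts=("huggingface.co", "github.com")) -> dict:
    """Map each host to whether it is reachable on port 443."""
    return {host: can_reach(host) for host in hosts}
```

If `check_model_hosts()` reports `False` for either host, check DNS, firewall rules, and outbound proxy settings before retrying the model download.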

"Out of memory"

Reduce batch size or use a smaller model like UVR_MDXNET_KARA_2.