init

2026-04-16 16:01:05 +00:00 · 2026-01-23 15:06:41 -05:00 · 2026-01-23 15:06:41 -05:00 · 9d85ca1ebb
commit 9d85ca1ebb
9 changed files with 2928 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,170 @@
+# Audio Separator API
+
+REST API for separating audio into vocal and instrumental stems using ML models.
+
+## Quick Start
+
+```bash
+# Clone and install
+git clone <repo-url>
+cd sep
+chmod +x install.sh test.sh
+sudo ./install.sh
+
+# Run tests
+./test.sh
+
+# Start the API
+.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000
+```
+
+## Requirements
+
+- Python 3.10+
+- FFmpeg
+- 10 GB+ of free disk space (for models)
+- NVIDIA GPU with CUDA (optional, but recommended)
+
+## API Endpoints
+
+### Health Check
+
+```bash
+curl http://localhost:8000/health
+```
+
+Response:
+```json
+{
+  "status": "healthy",
+  "cuda_available": true,
+  "cuda_device": "NVIDIA GeForce RTX 5090"
+}
+```
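+
+For scripted checks, the response can be reduced to a one-line summary. The following is a minimal Python sketch, assuming the response carries exactly the fields shown above; `describe_health` and `fetch_health` are hypothetical helper names, not part of the API:
+
+```python
+import json
+from urllib.request import urlopen
+
+
+def describe_health(payload: dict) -> str:
+    """Summarize a /health response in one line."""
+    if payload.get("status") != "healthy":
+        return f"unhealthy: {payload.get('status', 'unknown')}"
+    if payload.get("cuda_available"):
+        return f"healthy, GPU: {payload.get('cuda_device', 'unknown')}"
+    return "healthy, CPU only"
+
+
+def fetch_health(base_url: str = "http://localhost:8000") -> dict:
+    """GET /health and decode the JSON body (assumes the server is running)."""
+    with urlopen(f"{base_url}/health", timeout=5) as resp:
+        return json.load(resp)
+```
+
+With the server running, `describe_health(fetch_health())` gives the summary for the local instance.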
+
+### Separate Audio
+
+```bash
+curl -X POST http://localhost:8000/separate \
+  -F "file=@song.mp3" \
+  -F "output_format=mp3"
+```
+
+Response:
+```json
+{
+  "job_id": "a1b2c3d4",
+  "status": "completed",
+  "vocals_url": "/download/song_(Vocals)_model_bs_roformer.mp3",
+  "instrumental_url": "/download/song_(Instrumental)_model_bs_roformer.mp3"
+}
+```
+
+### Download Stems
+
+```bash
+# Quote the URL: unquoted parentheses are a syntax error in bash
+curl -O "http://localhost:8000/download/song_(Vocals)_model_bs_roformer.mp3"
+```
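+
+The `vocals_url` and `instrumental_url` fields in the `/separate` response are relative paths that contain parentheses, so a client has to join them with the server address before downloading. The following is a minimal Python sketch of that step; `stem_urls` is a hypothetical helper, and it assumes the server accepts percent-encoded paths:
+
+```python
+from urllib.parse import quote, urljoin
+
+
+def stem_urls(base_url: str, response: dict) -> dict:
+    """Turn the relative *_url fields into absolute, percent-encoded URLs."""
+    urls = {}
+    for key in ("vocals_url", "instrumental_url"):
+        # Encode everything except "/" so "(" and ")" are safe in any shell or client
+        path = quote(response[key], safe="/")
+        urls[key] = urljoin(base_url, path)
+    return urls
+```
+
+The resulting URLs can be passed to `urllib.request.urlretrieve` or to `curl -O` without extra quoting.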
+
+### List Models
+
+```bash
+curl http://localhost:8000/models
+```
+
+## Configuration
+
+### Output Formats
+
+- `mp3` (default) - Good compression, iOS compatible
+- `wav` - Lossless, larger files
+- `flac` - Lossless, compressed (smaller than `wav`)
+
+### Models
+
+| Model | Quality | Speed | Best For |
+|-------|---------|-------|----------|
+| BS-RoFormer (default) | Highest | Slow | Production use |
+| UVR_MDXNET_KARA_2 | Good | Fast | Karaoke |
+| Kim_Vocal_2 | Good | Medium | Vocal isolation |
+
+## VM Deployment
+
+### Using systemd (Linux)
+
+The install script creates a systemd service:
+
+```bash
+sudo systemctl enable audio-separator
+sudo systemctl start audio-separator
+sudo systemctl status audio-separator
+```
+
+### Manual Start
+
+```bash
+.venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
+```
+
+Note: Use `--workers 1` because the ML model is not thread-safe.
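+
+Within that single worker process, one common way to protect a model that is not thread-safe is to hold a lock around each inference call. This is a sketch of the idea, not necessarily how `app.py` handles it; `run_exclusive` and the `separate` callable are hypothetical names:
+
+```python
+import threading
+
+# One lock per process; every inference call must acquire it first
+_model_lock = threading.Lock()
+
+
+def run_exclusive(separate, *args, **kwargs):
+    """Call `separate` while holding the lock so calls never overlap."""
+    with _model_lock:
+        return separate(*args, **kwargs)
+```
+
+Requests that arrive while a separation is running wait for the lock instead of entering the model concurrently.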
+
+## GPU Support
+
+The API automatically detects CUDA GPUs. To verify:
+
+```bash
+./test.sh
+```
+
+Look for:
+```
+[PASS] CUDA available: NVIDIA GeForce RTX 5090 (32.0GB VRAM)
+```
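+
+The same check can be run directly from Python inside the virtual environment. The following is a minimal sketch using PyTorch's CUDA queries; `cuda_summary` is a hypothetical helper, and the message format only approximates what `test.sh` prints:
+
+```python
+def cuda_summary() -> str:
+    """Report whether PyTorch can see a CUDA device."""
+    try:
+        import torch  # imported lazily so the check also works without PyTorch
+    except ImportError:
+        return "PyTorch not installed"
+    if not torch.cuda.is_available():
+        return "CUDA not available"
+    props = torch.cuda.get_device_properties(0)
+    vram_gb = props.total_memory / 1024**3
+    return f"CUDA available: {props.name} ({vram_gb:.1f}GB VRAM)"
+
+
+if __name__ == "__main__":
+    print(cuda_summary())
+```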
+
+### CUDA Installation (Ubuntu)
+
+```bash
+# Add NVIDIA repo
+wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
+sudo dpkg -i cuda-keyring_1.1-1_all.deb
+sudo apt-get update
+sudo apt-get install -y cuda-toolkit-12-1
+```
+
+## iOS Integration
+
+The API returns MP3 files by default, which are natively supported on iOS.
+
+Example Swift code:
+
+```swift
+func separateAudio(fileURL: URL) async throws -> (vocals: URL, instrumental: URL) {
+    var request = URLRequest(url: URL(string: "http://your-vm:8000/separate")!)
+    request.httpMethod = "POST"
+
+    // Upload file and get response with download URLs
+    // ...
+}
+```
+
+## File Cleanup
+
+Uploaded and output files are automatically deleted after 5 minutes.
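+
+The cleanup behaviour can be pictured as a periodic sweep that removes files older than the retention window. This is a sketch under that assumption, not the service's actual implementation; `delete_expired` and `MAX_AGE_SECONDS` are hypothetical names:
+
+```python
+import time
+from pathlib import Path
+from typing import List, Optional
+
+MAX_AGE_SECONDS = 5 * 60  # the 5-minute retention described above
+
+
+def delete_expired(directory: Path, now: Optional[float] = None) -> List[Path]:
+    """Delete regular files in `directory` last modified more than MAX_AGE_SECONDS ago."""
+    now = time.time() if now is None else now
+    removed = []
+    for path in directory.iterdir():
+        if path.is_file() and now - path.stat().st_mtime > MAX_AGE_SECONDS:
+            path.unlink()
+            removed.append(path)
+    return removed
+```
+
+Clients should therefore download both stems soon after `/separate` returns rather than storing the URLs for later.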
+
+## Troubleshooting
+
+### "CUDA not available"
+
+1. Check NVIDIA drivers: `nvidia-smi`
+2. Reinstall PyTorch with a CUDA build that matches your driver and GPU (CUDA 12.1 wheels shown here):
+   ```bash
+   uv pip install torch --index-url https://download.pytorch.org/whl/cu121
+   ```
+
+### "Model download failed"
+
+Check network access to huggingface.co and github.com.
+
+### "Out of memory"
+
+Reduce batch size or use a smaller model like `UVR_MDXNET_KARA_2`.