Enhanced fork of ACE-Step UI – AI Music Production Suite for Windows
ProdIA-MAX is a fork and extension of ACE-Step UI by fspecii.
All original UI code, architecture, and design belong to their respective authors.
This project adds Windows-specific tooling and production-focused enhancements on top of their work.
| Component | Author | License | Link |
|---|---|---|---|
| ACE-Step UI (base UI) | fspecii | MIT | github.com/fspecii/ace-step-ui |
| ACE-Step 1.5 (AI model) | ACE-Step Team | MIT | github.com/ace-step/ACE-Step-1.5 |
| ProdIA-MAX (this fork) | ElWalki | MIT | – |
| i18n system & translations | scruffynerf | MIT | PR #1 |
ProdIA-MAX is a Windows-optimized fork of ACE-Step UI that bundles extra production tools: BPM/key detection, automatic lyric transcription, vocal/instrumental separation, LoRA training preparation, and one-click launchers, all wired to the ACE-Step 1.5 AI music generation engine.
| Feature | Status | Description |
|---|---|---|
| Vocal Separation (Demucs) | ✅ Beta | Separate vocals/instrumental from any song |
| Vocal Reference Tab | ✅ | Dedicated vocal reference panel |
| Independent Audio Strengths | ✅ | Separate reference + source strength sliders |
| Audio Codes System | ✅ | Full semantic audio code pipeline: extract, apply & condition generation |
| Mic Recorder + Audio Codes | ✅ | Record voice → auto-extract Audio Codes + optional Whisper transcription |
| Whisper Model Selector | ✅ | Choose Whisper model (tiny→turbo) with download status indicators |
| Process + Whisper / Solo Procesar | ✅ | Two-button workflow: extract codes with or without lyric transcription |
| Chord Progression Editor | ✅ | Interactive chord builder with scale-aware suggestions and audio preview |
| 1/4 Time Signature | ✅ | Added 1/4 time signature option to all time signature selectors |
| Feature | Status | Description |
|---|---|---|
| Chat Assistant | ✅ | Built-in AI assistant (OpenRouter/local LLM) for music production guidance |
| LLM Provider Selector | ✅ | Switch between OpenRouter models or local LLM endpoints |
| Language Selector | ✅ | Full i18n: switch UI language (ES/EN/+more) |
| Sidebar Info Panel | ✅ | Click info icons → scrollable info panel in sidebar (no clipping) |
| Professional Audio Player | ✅ | Enhanced player with playback speed, shuffle, repeat modes |
| Song Cards & Drag-Drop | ✅ | Visual song cards with drag-and-drop to playlists |
| Resizable Chat Panel | ✅ | Drag to resize the chat assistant panel |
| Style Editor | ✅ | Edit and manage generation style presets |
| Feature | Status | Description |
|---|---|---|
| Audio Metadata Tagging | ✅ | ID3 tags (title, artist, BPM, key) in MP3s |
| Edit Metadata | ✅ | Edit BPM, key, time signature, title in-app |
| LoRA Quick Unload | ✅ | One-click unload on collapsed LoRA panel |
| Time Signature Labels | ✅ | Proper 1/4, 2/4, 3/4, 4/4, 5/4, 6/8, 7/8, 8/8 notation |
| Prepare for Training | ✅ Beta | Quick button to prep songs for LoRA training |
| VRAM Safety | ✅ | Generation locked during Demucs separation |
| BPM & Key Detection | ✅ | `detectar_bpm_clave.py` – batch detect BPM/key |
| Lyric Transcription | ✅ | `transcribir_letras.py` – Whisper-based transcription |
| Caption Tools | ✅ | Apply/truncate training captions automatically |
| One-click Windows launchers | ✅ | `.bat` scripts for setup, launch, and cleanup |
IMPORTANT: Audio generation quality is dramatically better with the base model at 124 inference steps than in turbo mode (8 steps). Turbo is faster (~3-5s per song), but the base model at 124 steps produces significantly richer, more coherent, higher-fidelity audio.
| Setting | Turbo (fast) | Base (recommended) |
|---|---|---|
| Model | `turbo` | `base` |
| Inference Steps | 8 | 124 |
| Speed (30s song) | ~3-5s | ~45-90s |
| Quality | Good | Excellent ✅ |
| Coherence | Decent | Very high |
| Recommended for | Quick previews | Final production |
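The speed gap in this table tracks step count almost linearly. A rough sketch of that back-of-envelope estimate (the linear-scaling assumption is mine; it ignores model load and other fixed overhead):

```python
# Back-of-envelope: diffusion sampling time grows roughly linearly with
# the number of inference steps (an assumption; real timings also depend
# on model size, GPU, and fixed startup overhead).

def estimate_seconds(turbo_seconds: float, turbo_steps: int = 8,
                     target_steps: int = 124) -> float:
    """Scale a measured turbo-mode generation time to another step count."""
    return turbo_seconds / turbo_steps * target_steps

# If a 30s song takes ~4s in turbo mode (8 steps)...
print(estimate_seconds(4.0))  # -> 62.0 seconds at 124 steps, inside the ~45-90s range
```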
| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 / Linux / macOS | Windows 11 |
| GPU VRAM | 4 GB | 12 GB+ |
| Python | 3.10 | 3.11 |
| Node.js | 18 | 20+ |
| CUDA | – | 12.8 |
| Disk Space | ~15 GB (required models) | ~40 GB (all models) |
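Before the first run, it helps to confirm the disk-space numbers above. A minimal standard-library check (the 15 GB threshold mirrors the table's minimum; raise it to ~40 GB if you plan to download all models):

```python
# Check free disk space before downloading models (standard library only).
import shutil

def free_gb(path: str = ".") -> float:
    """Free space on the drive containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30

if free_gb() < 15:
    print("Warning: less than 15 GB free - model downloads may fail.")
else:
    print(f"OK: {free_gb():.1f} GiB free")
```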
You only need 3 things installed before running ProdIA-MAX. Everything else is automatic.
Download from python.org. During installation, check "Add Python to PATH" – this is critical.

⚠️ Python 3.11 is strongly recommended. The `pyproject.toml` locks to `==3.11.*`, and some dependencies (like `flash-attn`) only have pre-built wheels for 3.11.
Download from nodejs.org (LTS recommended). This powers the backend server and both frontends.
You need an NVIDIA GPU with updated drivers. CUDA 12.8 compatible drivers are required – PyTorch will be installed with CUDA 12.8 support automatically. Without an NVIDIA GPU, the app cannot generate music.
💡 Git is only needed if you clone the repo. If you download the ZIP, Git is not required.
```bat
python --version    REM Should show 3.11.x
node --version      REM Should show v18+ or v20+
nvidia-smi          REM Should show your GPU and driver version
```

```bat
REM Option 1: Launch everything (Classic UI + Pro UI + AI engine)
iniciar_todo.bat

REM Option 2: Launch Pro UI only (recommended)
iniciar_pro.bat
```

```bash
chmod +x iniciar_todo.sh
./iniciar_todo.sh
```

| Launcher | URL | Description |
|---|---|---|
| `iniciar_todo.bat` | http://localhost:3000 | Classic UI (+ Pro UI on :3002) |
| `iniciar_pro.bat` | http://localhost:3002 | Pro UI only (recommended) |
The first time you run `iniciar_todo.bat` or `iniciar_pro.bat`, the script will automatically:
| Step | What it does | Time (est.) |
|---|---|---|
| 1️⃣ | Creates the Python virtual environment (`.venv` inside `ACE-Step-1.5_/`) | ~10 sec |
| 2️⃣ | Installs Python dependencies via `pip install -r requirements.txt` – PyTorch (CUDA 12.8), Transformers, Gradio, etc. | ~5-15 min |
| 3️⃣ | Installs Node.js dependencies via `npm install` in 3 folders (`ace-step-ui`, `ace-step-ui/server`, `ace-step-ui-pro`) | ~2-5 min |
| 4️⃣ | Downloads AI models from HuggingFace (~9.6 GB for required models) | ~10-30 min |
| 5️⃣ | Starts all services and opens your browser | ~30 sec |
⏱️ Total first-run time: ~20-50 minutes depending on your internet speed. Subsequent runs start in ~30 seconds.
A marker file `.deps_installed` is created after a successful installation – subsequent runs skip the install step unless `requirements.txt` changes.
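The marker-file pattern is simple to sketch. The real logic lives in the `.bat` launchers; this Python version is only illustrative, and hashing `requirements.txt` is one possible way to implement the "unless requirements.txt changes" check:

```python
# Illustrative sketch of the .deps_installed marker-file pattern.
# Storing a hash of requirements.txt lets the launcher re-install
# whenever the dependency list changes.
import hashlib
from pathlib import Path

MARKER = Path(".deps_installed")

def requirements_hash(req: Path = Path("requirements.txt")) -> str:
    return hashlib.sha256(req.read_bytes()).hexdigest()

def needs_install() -> bool:
    # Install if the marker is missing or requirements.txt changed.
    return not MARKER.exists() or MARKER.read_text().strip() != requirements_hash()

def mark_installed() -> None:
    MARKER.write_text(requirements_hash())
```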
All models are downloaded from HuggingFace automatically on first run. They are stored in ACE-Step-1.5_/checkpoints/.
| Model | Size | VRAM | Purpose |
|---|---|---|---|
| `acestep-v15-turbo` | ~4.5 GB | ~4 GB | Default generation model (diffusion transformer) |
| `vae` | ~500 MB | ~1 GB | Audio encoder/decoder |
| `Qwen3-Embedding-0.6B` | ~1.2 GB | ~1 GB | Text encoder for lyrics/tags |
| `acestep-5Hz-lm-1.7B` | ~3.4 GB | ~4 GB | Language model for music structure |
| **Total** | **~9.6 GB** | | |
Use `verificar_modelos.bat` to check which models you have and download extras:
| Model | Size | VRAM | Purpose |
|---|---|---|---|
| `acestep-v15-base` | ~4.5 GB | ~4 GB | Base model – 124 steps, highest quality ✅ |
| `acestep-v15-sft` | ~4.5 GB | ~4 GB | Fine-tuned variant |
| `acestep-v15-turbo-shift1` | ~4.5 GB | ~4 GB | Turbo variant 1 |
| `acestep-v15-turbo-shift3` | ~4.5 GB | ~4 GB | Turbo variant 3 |
| `acestep-v15-turbo-continuous` | ~4.5 GB | ~4 GB | Continuous generation mode |
| `acestep-5Hz-lm-0.6B` | ~1.2 GB | ~3 GB | Small LM (less VRAM) |
| `acestep-5Hz-lm-4B` | ~7.8 GB | ~12 GB | Large LM (best quality lyrics) |
💡 VRAM recommendation: 8 GB VRAM for turbo mode, 12 GB+ for the base model at 124 steps with the 4B language model.
All models come from HuggingFace repos under `ACE-Step/` – no account or token needed.
```
ProdIA-MAX/
│
├── ACE-Step-1.5_/               # ACE-Step 1.5 AI engine (original, MIT)
│   ├── acestep/                 # Core Python inference code
│   ├── requirements.txt         # Python dependencies (PyTorch, Gradio, etc.)
│   └── ...
│
├── ace-step-ui/                 # Base UI fork from fspecii (original, MIT)
│   ├── server/                  # Express + SQLite backend (Node.js)
│   ├── package.json             # Node.js dependencies (backend + frontend)
│   └── ...
│
├── ace-step-ui-pro/             # ProdIA-MAX Pro UI (React 19 + Vite + Tailwind v4)
│   ├── src/                     # TypeScript source code
│   │   ├── components/          # React components (views, create, ui, layout)
│   │   ├── services/            # API clients, chord service, etc.
│   │   ├── i18n.ts              # Internationalization (EN/ES)
│   │   └── App.tsx              # Main application entry
│   ├── package.json             # Node.js dependencies (frontend)
│   └── ...
│
├── i18n/                        # i18n utilities & locale files
│   ├── locales/                 # Translation JSON files
│   └── utils.py                 # i18n helper scripts
│
├── IMAGES/                      # Screenshots & product images
│   └── main.png                 # Main product screenshot
│
├── SecurityScan/                # Security tools
│   ├── escanear_seguridad.bat   # Run security scan (Windows)
│   ├── limpiar_kms.bat          # KMS cleanup
│   └── LEEME.md                 # Security scan documentation
│
├── backups/                     # Session backups
│
│  ── MAX Production Tools ──────────────────────────────
├── aplicar_captions_v3.py       # Auto-apply training captions
├── detectar_bpm_clave.py        # BPM & key detection (librosa)
├── transcribir_letras.py        # Lyric transcription (Whisper)
├── transcribir_letras_v2.py     # Lyric transcription v2
├── truncar_captions.py          # Caption truncation helper
├── check_genres.py              # Genre validator
├── check_tensors.py             # Tensor/model inspector
│
│  ── Launchers (Windows) ───────────────────────────────
├── iniciar_todo.bat             # One-click launcher (backend + frontend + AI)
├── iniciar_pro.bat              # Launch Pro UI only
├── verificar_modelos.bat        # Model verification
├── limpiar_datos_usuario.bat    # Clean user data
├── desinstalar.bat              # Uninstall / cleanup
├── transcribir_letras.bat       # Transcribe lyrics launcher
│
│  ── Launchers (Linux/macOS) ───────────────────────────
├── setup.sh                     # First-time setup
├── iniciar_todo.sh              # One-click launcher
├── iniciar_pro.sh               # Launch Pro UI only
├── iniciar_acestep_ui.sh        # Launch base UI
├── verificar_modelos.sh         # Model verification
├── limpiar_datos_usuario.sh     # Clean user data
├── desinstalar.sh               # Uninstall / cleanup
├── detectar_bpm_clave.sh        # BPM & key detection
├── transcribir_letras.sh        # Transcribe lyrics
│
│  ── Documentation ─────────────────────────────────────
├── README.md                    # This file
├── SINGER_LIBRARY_SPEC.md       # Singer library specification
├── i18nHowTo.md                 # i18n guide (English)
└── i18nHowTo_es.md              # i18n guide (Spanish)
```
ACE-Step uses semantic audio codes – tokens at 5 Hz that encode melody, rhythm, structure, and timbre holistically. Each code (`<|audio_code_N|>`, N = 0-63999) represents 200 ms of audio.
How to use:
- Load audio in Source Audio → click "Convert to Codes" → codes appear in the Audio Codes field
- Or use the Voice Recorder → "Process + Whisper" extracts codes automatically
- Write a different text prompt (e.g., change genre/style)
- Adjust `audio_cover_strength` (0.3-0.5 = new style with original structure)
- Generate → the model follows the code structure but with your new style
| Property | Value |
|---|---|
| Rate | 5 codes/second (200ms each) |
| Codebook size | 64,000 (FSQ: [8,8,8,5,5,5]) |
| 30s song | ~150 codes |
| 60s song | ~300 codes |
| Quantizer | Finite Scalar Quantization (1 quantizer, 6D) |
Note: Individual codes encode ALL musical properties simultaneously (melody + rhythm + timbre). You cannot isolate rhythm from melody at the code level, but you can use codes as structural guidance while the text prompt controls style/instrumentation.
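The counts in the table above follow directly from the 5 Hz rate and can be sanity-checked in a few lines:

```python
# Audio code arithmetic: 5 codes per second, 200 ms each.
CODE_RATE_HZ = 5
CODEBOOK_SIZE = 64_000  # FSQ levels [8, 8, 8, 5, 5, 5]

def codes_for(seconds: float) -> int:
    return int(seconds * CODE_RATE_HZ)

assert 8 * 8 * 8 * 5 * 5 * 5 == CODEBOOK_SIZE  # codebook size checks out
print(codes_for(30), codes_for(60))  # -> 150 300, matching the table
```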
- Record your voice/humming/beatbox
- Choose a Whisper model (tiny→turbo) – download status shown with ✅/❌ indicators
- Click "Procesar + Whisper" → extracts Audio Codes AND transcribes lyrics
- Or click "Solo Procesar" → extracts Audio Codes only (faster, no transcription)
- Click "Aplicar" → codes + lyrics are applied to the Create Panel
- The generation will use your voice structure as a musical guide
| Script | What it does |
|---|---|
| `iniciar_todo.bat` | Full launch: installs deps if needed → starts AI engine (port 8001) → backend (port 3001) → Classic UI (port 3000) → Pro UI (port 3002) → opens browser |
| `iniciar_pro.bat` | Pro launch: same as above but skips the Classic UI – only AI engine + backend + Pro UI |
| `iniciar_todo.sh` | Linux/macOS equivalent of `iniciar_todo.bat` |
| Script | What it does |
|---|---|
| `verificar_modelos.bat` | Checks which AI models are installed and offers to download missing ones from HuggingFace |
| `transcribir_letras.bat` | Launches the lyrics transcription tool (Demucs for stem separation + Whisper for transcription); 6 quality modes available |
| `limpiar_datos_usuario.bat` | Deletes user data only (database + generated audio); keeps models, code, and dependencies |
| `desinstalar.bat` | Full uninstall – removes `.venv`, all `node_modules`, the database, and generated audio; keeps models and source code |
| Script | What it does |
|---|---|
| `_start_gradio_api.bat` | Starts only the Gradio AI engine on port 8001 |
| `_start_backend.bat` | Starts only the Node.js backend on port 3001 |
| `_start_frontend.bat` | Starts only the Classic UI on port 3000 |
| `iniciar_acestep.bat` | Interactive menu: standalone Gradio UI (port 7860), API mode, model download, advanced config |
| Script | What it does |
|---|---|
| `detectar_bpm_clave.py` | Batch-detects BPM and musical key of audio files |
| `transcribir_letras.py` | Transcribes lyrics from audio using Whisper |
| `aplicar_captions_v3.py` | Applies training captions to audio files for LoRA preparation |
| `truncar_captions.py` | Truncates captions to fit training requirements |
| `check_genres.py` | Validates genre tags in audio files |
| `check_tensors.py` | Inspects model tensor files |
| Port | Service | Description |
|---|---|---|
| 8001 | Gradio API | ACE-Step AI music generation engine |
| 3001 | Express.js Backend | Auth, model management, audio serving, database |
| 3000 | Classic Frontend | Legacy React UI (Vite dev server) |
| 3002 | Pro Frontend | ProdIA-MAX Pro UI (Vite + React 19 + Tailwind v4) |
| 7860 | Gradio Native UI | Only used with iniciar_acestep.bat standalone mode |
All ports are freed automatically before launch to avoid conflicts. The scripts handle this for you.
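If you suspect a conflict, you can check the ports yourself with the standard library (illustrative only; the launchers already do their own cleanup):

```python
# Report which of the app's ports are currently in use.
import socket

PORTS = {8001: "Gradio API", 3001: "Backend", 3000: "Classic UI", 3002: "Pro UI"}

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    # connect_ex returns 0 when something is already listening on the port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

for port, name in PORTS.items():
    print(f"{port:>5} ({name}): {'IN USE' if port_in_use(port) else 'free'}")
```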
- Make sure Python 3.11 is installed and added to PATH
- Run `python --version` in a terminal to verify
- If you have multiple Python versions, ensure 3.11 is the default
- Install Node.js from nodejs.org
- Restart your terminal after installing
- Run `node --version` to verify
- Use a smaller language model (`0.6B` instead of `1.7B` or `4B`)
- Use turbo mode instead of the base model
- Close other GPU-intensive applications
- Check your VRAM with `nvidia-smi`
- Check your internet connection
- Run `verificar_modelos.bat` to manually download models
- Models are downloaded from HuggingFace – no VPN restrictions needed
- If a download was interrupted, delete the incomplete folder in `ACE-Step-1.5_/checkpoints/` and retry
- The launcher scripts auto-kill processes on required ports
- If you still get errors, manually close any process using ports 8001, 3001, 3000, or 3002
- On Windows: `netstat -ano | findstr :PORT_NUMBER`, then `taskkill /PID <PID> /F`
- Ensure the Gradio API started successfully (check the terminal window titled "Gradio API")
- Wait for the message "Running on local URL: http://127.0.0.1:8001" before trying to generate
- The first generation takes longer as PyTorch compiles kernels
```bat
REM Remove all installed dependencies (keeps models and code)
desinstalar.bat

REM Then relaunch - everything reinstalls automatically
iniciar_todo.bat
```

This project is distributed under the MIT License.
- The original ACE-Step UI code remains © fspecii under MIT.
- The ACE-Step 1.5 model and backend remain © ACE-Step Team under MIT.
- New additions and modifications in this fork are © ElWalki under MIT.
See ACE-Step-1.5_/LICENSE for the full license text.
Built on top of the amazing work of fspecii and the ACE-Step team.
i18n system & translations contributed by scruffynerf.
Stop paying subscriptions. Start creating with ACE-Step.
