
ElWalki/ProdIA_Max-Ace-Step-UI_Ace-Step-v1.5

🎛️ ProdIA-MAX

Enhanced fork of ACE-Step UI: AI Music Production Suite for Windows

Version MIT License Windows ACE-Step 1.5

ProdIA-MAX Screenshot


🙏 Credits

ProdIA-MAX is a fork and extension of ACE-Step UI by fspecii.
All original UI code, architecture, and design belong to their respective authors.
This project adds Windows-specific tooling and production-focused enhancements on top of their work.

| Component | Author | License | Link |
|---|---|---|---|
| ACE-Step UI (base UI) | fspecii | MIT | github.com/fspecii/ace-step-ui |
| ACE-Step 1.5 (AI model) | ACE-Step Team | MIT | github.com/ace-step/ACE-Step-1.5 |
| ProdIA-MAX (this fork) | ElWalki | MIT | - |
| i18n system & translations | scruffynerf | MIT | PR #1 |

🚀 What is ProdIA-MAX?

ProdIA-MAX is a Windows-optimized fork of ACE-Step UI that bundles extra production tools (BPM/key detection, automatic lyric transcription, vocal/instrumental separation, LoRA training preparation, and one-click launchers), all connected to the ACE-Step 1.5 AI music generation engine.


✨ MAX Additions

🎡 Generation & Audio

| Feature | Status | Description |
|---|---|---|
| Vocal Separation (Demucs) | ✅ Beta | Separate vocals/instrumental from any song |
| Vocal Reference Tab | ✅ | Dedicated vocal reference panel |
| Independent Audio Strengths | ✅ | Separate reference and source strength sliders |
| Audio Codes System | ✅ | Full semantic audio code pipeline: extract, apply, and condition generation |
| Mic Recorder + Audio Codes | ✅ | Record voice → auto-extract Audio Codes + optional Whisper transcription |
| Whisper Model Selector | ✅ | Choose a Whisper model (tiny→turbo) with download status indicators |
| Process + Whisper / Solo Procesar (Process only) | ✅ | Two-button workflow: extract codes with or without lyric transcription |
| Chord Progression Editor | ✅ | Interactive chord builder with scale-aware suggestions and audio preview |
| 1/4 Time Signature | ✅ | Adds a 1/4 option to all time signature selectors |

🤖 AI Assistant & UX

| Feature | Status | Description |
|---|---|---|
| Chat Assistant | ✅ | Built-in AI assistant (OpenRouter/local LLM) for music production guidance |
| LLM Provider Selector | ✅ | Switch between OpenRouter models or local LLM endpoints |
| Language Selector | ✅ | Full i18n: switch UI language (ES/EN/+more) |
| Sidebar Info Panel | ✅ | Click info icons → scrollable info panel in the sidebar (no clipping) |
| Professional Audio Player | ✅ | Enhanced player with playback speed, shuffle, and repeat modes |
| Song Cards & Drag-Drop | ✅ | Visual song cards with drag-and-drop to playlists |
| Resizable Chat Panel | ✅ | Drag to resize the chat assistant panel |
| Style Editor | ✅ | Edit and manage generation style presets |

🔧 Production Tools

| Feature | Status | Description |
|---|---|---|
| Audio Metadata Tagging | ✅ | ID3 tags (title, artist, BPM, key) in MP3s |
| Edit Metadata | ✅ | Edit BPM, key, time signature, and title in-app |
| LoRA Quick Unload | ✅ | One-click unload on the collapsed LoRA panel |
| Time Signature Labels | ✅ | Proper 1/4, 2/4, 3/4, 4/4, 5/4, 6/8, 7/8, 8/8 notation |
| Prepare for Training | ✅ Beta | Quick button to prep songs for LoRA training |
| VRAM Safety | ✅ | Generation locked during Demucs separation |
| BPM & Key Detection | ✅ | detectar_bpm_clave.py: batch detect BPM/key |
| Lyric Transcription | ✅ | transcribir_letras.py: Whisper-based transcription |
| Caption Tools | ✅ | Apply/truncate training captions automatically |
| One-click Windows launchers | ✅ | .bat scripts for setup, launch, and cleanup |

🏆 Best Quality Settings

IMPORTANT: Audio generation quality is dramatically better with the base model at 124 inference steps than with turbo mode (8 steps). Turbo is faster (~3-5 s per 30-second song), but the base model at 124 steps produces significantly richer, more coherent, higher-fidelity audio.

| Setting | Turbo (fast) | Base (recommended) |
|---|---|---|
| Model | turbo | base |
| Inference Steps | 8 | 124 |
| Speed (30 s song) | ~3-5 s | ~45-90 s |
| Quality | Good | Excellent ⭐ |
| Coherence | Decent | Very high |
| Recommended for | Quick previews | Final production |

📋 Requirements

| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 / Linux / macOS | Windows 11 |
| GPU VRAM | 4 GB | 12 GB+ |
| Python | 3.10 | 3.11 |
| Node.js | 18 | 20+ |
| CUDA | - | 12.8 |
| Disk Space | ~15 GB (required models) | ~40 GB (all models) |

🔧 Installation

What you MUST install manually

You only need three things installed before running ProdIA-MAX. Everything else is automatic.

1. Python 3.11

Download from python.org. During installation, check "Add Python to PATH"; this is critical.

⚠️ Python 3.11 is strongly recommended. The pyproject.toml pins the version to ==3.11.*, and some dependencies (like flash-attn) only ship pre-built wheels for 3.11.

2. Node.js 18+

Download from nodejs.org (LTS recommended). Node.js powers the backend server and both frontends.

3. NVIDIA GPU + CUDA Drivers

You need an NVIDIA GPU with up-to-date drivers that support CUDA 12.8; PyTorch is installed with CUDA 12.8 support automatically. Without an NVIDIA GPU, the app cannot generate music.

💡 Git is only needed if you clone the repo. If you download the ZIP, Git is not required.

Verify installation

REM Should show 3.11.x
python --version
REM Should show v18 or later
node --version
REM Should show your GPU and driver version
nvidia-smi
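The same three checks can be scripted. The Python sketch below is not part of ProdIA-MAX (all helper names are hypothetical); it compares what it finds against this README's requirements: Python 3.11 (the version pyproject.toml pins), Node.js 18 or later, and an nvidia-smi that runs successfully.

```python
import re
import shutil
import subprocess
import sys

# Requirements from this README (hypothetical checker, not shipped with ProdIA-MAX).
REQUIRED_PYTHON = (3, 11)  # pyproject.toml pins ==3.11.*
MIN_NODE = (18, 0)


def parse_version(text: str) -> tuple:
    """Extract the first dotted version number, e.g. 'v20.11.1' -> (20, 11, 1)."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(found: tuple, minimum: tuple) -> bool:
    """Compare version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    padded_found = found + (0,) * (width - len(found))
    padded_min = minimum + (0,) * (width - len(minimum))
    return padded_found >= padded_min


def check_environment() -> dict:
    results = {"python 3.11": sys.version_info[:2] == REQUIRED_PYTHON}

    node = shutil.which("node")
    if node is None:
        results["node 18+"] = False
    else:
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        results["node 18+"] = meets_minimum(parse_version(out), MIN_NODE)

    # nvidia-smi exits with a non-zero status when it cannot talk to an NVIDIA driver.
    smi = shutil.which("nvidia-smi")
    results["nvidia-smi"] = smi is not None and subprocess.run(
        [smi], capture_output=True
    ).returncode == 0
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'OK' if ok else 'MISSING OR WRONG VERSION'}")
```

Run it with the same python you added to PATH so the Python check reflects the interpreter the launchers will use.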

⚑ Quick Start

Windows

REM Option 1: Launch everything (Classic UI + Pro UI + AI engine)
iniciar_todo.bat

REM Option 2: Launch Pro UI only (recommended)
iniciar_pro.bat

Linux / macOS

chmod +x iniciar_todo.sh
./iniciar_todo.sh

What opens

| Launcher | URL | Description |
|---|---|---|
| iniciar_todo.bat | http://localhost:3000 | Classic UI (+ Pro UI on :3002) |
| iniciar_pro.bat | http://localhost:3002 | Pro UI only (recommended) |

🚀 First Run: What Happens Automatically

The first time you run iniciar_todo.bat or iniciar_pro.bat, the script automatically performs these steps:

| Step | What it does | Time (est.) |
|---|---|---|
| 1️⃣ | Creates a Python virtual environment (.venv inside ACE-Step-1.5_/) | ~10 s |
| 2️⃣ | Installs Python dependencies via pip install -r requirements.txt: PyTorch (CUDA 12.8), Transformers, Gradio, etc. | ~5-15 min |
| 3️⃣ | Installs Node.js dependencies via npm install in 3 folders (ace-step-ui, ace-step-ui/server, ace-step-ui-pro) | ~2-5 min |
| 4️⃣ | Downloads AI models from HuggingFace (~9.6 GB for required models) | ~10-30 min |
| 5️⃣ | Starts all services and opens your browser | ~30 s |

⏱️ Total first-run time: ~20-50 minutes, depending on your internet speed. Subsequent runs start in ~30 seconds.

A marker file, .deps_installed, is created after a successful installation; subsequent runs skip the install step unless requirements.txt changes.
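This README does not say how the launcher detects that requirements.txt changed. As a minimal Python sketch, assuming the marker stores a SHA-256 digest of requirements.txt and sits next to it in ACE-Step-1.5_/ (both assumptions, and the function names are hypothetical), the skip logic could look like this:

```python
import hashlib
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """SHA-256 of requirements.txt; any edit to the file changes the digest."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()


def needs_install(requirements: Path, marker: Path) -> bool:
    """True when there is no marker yet or it was written for different requirements."""
    if not marker.exists():
        return True
    return marker.read_text(encoding="utf-8").strip() != requirements_digest(requirements)


def mark_installed(requirements: Path, marker: Path) -> None:
    """Call only after pip install succeeded, so a failed install is retried next run."""
    marker.write_text(requirements_digest(requirements), encoding="utf-8")


if __name__ == "__main__":
    root = Path("ACE-Step-1.5_")  # assumed location of requirements.txt and the marker
    req, marker = root / "requirements.txt", root / ".deps_installed"
    if needs_install(req, marker):
        print("Dependencies missing or requirements.txt changed: run pip install")
    else:
        print("Dependencies up to date: skipping install")
```

Hashing the contents, rather than comparing modification times, avoids a reinstall when the file is merely touched or re-extracted from a ZIP.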


🧠 AI Models

All models are downloaded from HuggingFace automatically on first run. They are stored in ACE-Step-1.5_/checkpoints/.

Required models (auto-downloaded)

| Model | Size | VRAM | Purpose |
|---|---|---|---|
| acestep-v15-turbo | ~4.5 GB | ~4 GB | Default generation model (diffusion transformer) |
| vae | ~500 MB | ~1 GB | Audio encoder/decoder |
| Qwen3-Embedding-0.6B | ~1.2 GB | ~1 GB | Text encoder for lyrics/tags |
| acestep-5Hz-lm-1.7B | ~3.4 GB | ~4 GB | Language model for music structure |
| Total | ~9.6 GB | | |

Optional models (manual download)

Use verificar_modelos.bat to check which models you have and download extras:

| Model | Size | VRAM | Purpose |
|---|---|---|---|
| acestep-v15-base | ~4.5 GB | ~4 GB | Base model: 124 steps, highest quality ⭐ |
| acestep-v15-sft | ~4.5 GB | ~4 GB | Fine-tuned variant |
| acestep-v15-turbo-shift1 | ~4.5 GB | ~4 GB | Turbo variant 1 |
| acestep-v15-turbo-shift3 | ~4.5 GB | ~4 GB | Turbo variant 3 |
| acestep-v15-turbo-continuous | ~4.5 GB | ~4 GB | Continuous generation mode |
| acestep-5Hz-lm-0.6B | ~1.2 GB | ~3 GB | Small LM (less VRAM) |
| acestep-5Hz-lm-4B | ~7.8 GB | ~12 GB | Large LM (best quality lyrics) |

💡 VRAM recommendation: 8 GB for turbo mode; 12 GB+ for the base model at 124 steps with the 4B language model.
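The VRAM recommendation above, together with the out-of-memory tips under Troubleshooting, can be folded into a small lookup. This Python sketch is hypothetical: the thresholds come from this README, the model names from the tables above, and total VRAM is read with nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits, which prints one value in MiB per GPU.

```python
import shutil
import subprocess


def recommend_models(vram_gb: float) -> tuple:
    """Map total VRAM to (diffusion model, language model) following this README."""
    if vram_gb >= 12:
        return "acestep-v15-base", "acestep-5Hz-lm-4B"  # 124-step base + 4B LM
    if vram_gb >= 8:
        return "acestep-v15-turbo", "acestep-5Hz-lm-1.7B"  # default turbo setup
    return "acestep-v15-turbo", "acestep-5Hz-lm-0.6B"  # smaller LM to avoid CUDA OOM


def total_vram_gb(smi_output: str) -> float:
    """Parse nvidia-smi memory.total output (MiB, one line per GPU); uses the first GPU."""
    first_line = smi_output.strip().splitlines()[0]
    return int(first_line.strip()) / 1024


if __name__ == "__main__":
    smi = shutil.which("nvidia-smi")
    if smi is None:
        print("nvidia-smi not found: an NVIDIA GPU and driver are required")
    else:
        out = subprocess.run(
            [smi, "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        vram = total_vram_gb(out)
        print(f"{vram:.1f} GB VRAM ->", *recommend_models(vram))
```

Cards sold as "12 GB" sometimes report slightly less than 12288 MiB, so treat the thresholds as rough guides rather than hard limits.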

All models come from HuggingFace repos under ACE-Step/; no account or token is needed.
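verificar_modelos.bat is the supported way to check your models. For illustration only, the Python sketch below lists which of the models named in the tables above are present. It assumes each model lives in a folder with the same name under ACE-Step-1.5_/checkpoints/ (the real layout may differ), and the helper name is hypothetical. Empty folders are reported as missing.

```python
from pathlib import Path

# Model names from the tables above; folder names are assumed to match them exactly.
REQUIRED = ["acestep-v15-turbo", "vae", "Qwen3-Embedding-0.6B", "acestep-5Hz-lm-1.7B"]
OPTIONAL = [
    "acestep-v15-base",
    "acestep-v15-sft",
    "acestep-v15-turbo-shift1",
    "acestep-v15-turbo-shift3",
    "acestep-v15-turbo-continuous",
    "acestep-5Hz-lm-0.6B",
    "acestep-5Hz-lm-4B",
]


def model_status(checkpoints: Path, names: list) -> dict:
    """A model counts as present only if its folder exists and is not empty."""
    status = {}
    for name in names:
        folder = checkpoints / name
        status[name] = folder.is_dir() and any(folder.iterdir())
    return status


if __name__ == "__main__":
    ckpt = Path("ACE-Step-1.5_") / "checkpoints"
    for group, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, present in model_status(ckpt, names).items():
            print(f"[{group}] {name:30s} {'present' if present else 'missing'}")
```

A non-empty folder can still hold an interrupted download, so this check is weaker than re-running verificar_modelos.bat.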


📁 Project Structure

ProdIA-MAX/
│
├── ACE-Step-1.5_/              # ACE-Step 1.5 AI engine (original, MIT)
│   ├── acestep/                #   Core Python inference code
│   ├── requirements.txt        #   ★ Python dependencies (PyTorch, Gradio, etc.)
│   └── ...
│
├── ace-step-ui/                # Base UI fork from fspecii (original, MIT)
│   ├── server/                 #   Express + SQLite backend (Node.js)
│   ├── package.json            #   Node.js dependencies (backend + frontend)
│   └── ...
│
├── ace-step-ui-pro/            # ProdIA-MAX Pro UI (React 19 + Vite + Tailwind v4)
│   ├── src/                    #   TypeScript source code
│   │   ├── components/         #     React components (views, create, ui, layout)
│   │   ├── services/           #     API clients, chord service, etc.
│   │   ├── i18n.ts             #     Internationalization (EN/ES)
│   │   └── App.tsx             #     Main application entry
│   ├── package.json            #   Node.js dependencies (frontend)
│   └── ...
│
├── i18n/                       # i18n utilities & locale files
│   ├── locales/                #   Translation JSON files
│   └── utils.py                #   i18n helper scripts
│
├── IMAGES/                     # Screenshots & product images
│   └── main.png                #   Main product screenshot
│
├── SecurityScan/               # Security tools
│   ├── escanear_seguridad.bat  #   Run security scan (Windows)
│   ├── limpiar_kms.bat         #   KMS cleanup
│   └── LEEME.md                #   Security scan documentation
│
├── backups/                    # Session backups
│
│── ★ MAX Production Tools ──────────────────────────────
├── aplicar_captions_v3.py      # Auto-apply training captions
├── detectar_bpm_clave.py       # BPM & key detection (librosa)
├── transcribir_letras.py       # Lyric transcription (Whisper)
├── transcribir_letras_v2.py    # Lyric transcription v2
├── truncar_captions.py         # Caption truncation helper
├── check_genres.py             # Genre validator
├── check_tensors.py            # Tensor/model inspector
│
│── ★ Launchers (Windows) ───────────────────────────────
├── iniciar_todo.bat            # One-click launcher (backend + frontend + AI)
├── iniciar_pro.bat             # Launch Pro UI only
├── verificar_modelos.bat       # Model verification
├── limpiar_datos_usuario.bat   # Clean user data
├── desinstalar.bat             # Uninstall / cleanup
├── transcribir_letras.bat      # Transcribe lyrics launcher
│
│── ★ Launchers (Linux/macOS) ───────────────────────────
├── setup.sh                    # First-time setup
├── iniciar_todo.sh             # One-click launcher
├── iniciar_pro.sh              # Launch Pro UI only
├── iniciar_acestep_ui.sh       # Launch base UI
├── verificar_modelos.sh        # Model verification
├── limpiar_datos_usuario.sh    # Clean user data
├── desinstalar.sh              # Uninstall / cleanup
├── detectar_bpm_clave.sh       # BPM & key detection
├── transcribir_letras.sh       # Transcribe lyrics
│
│── ★ Documentation ─────────────────────────────────────
├── README.md                   # This file
├── SINGER_LIBRARY_SPEC.md      # Singer library specification
├── i18nHowTo.md                # i18n guide (English)
└── i18nHowTo_es.md             # i18n guide (Spanish)

🎼 Audio Codes System

ACE-Step uses semantic audio codes: tokens at 5 Hz that holistically encode melody, rhythm, structure, and timbre. Each code (<|audio_code_N|>, N = 0–63999) represents 200 ms of audio.
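To make the token format concrete, here is a minimal Python sketch (helper names are hypothetical, not ProdIA-MAX APIs) that pulls the N values out of a string of <|audio_code_N|> tokens, checks that they fall in 0–63999, and converts the count to seconds at 5 codes per second.

```python
import re

CODE_RATE_HZ = 5          # 5 codes per second -> 200 ms per code
CODEBOOK_SIZE = 64_000    # valid indices are 0..63999
TOKEN_RE = re.compile(r"<\|audio_code_(\d+)\|>")


def parse_audio_codes(text: str) -> list:
    """Return the code indices in order, rejecting any outside the codebook."""
    codes = [int(n) for n in TOKEN_RE.findall(text)]
    bad = [c for c in codes if not 0 <= c < CODEBOOK_SIZE]
    if bad:
        raise ValueError(f"codes outside 0..{CODEBOOK_SIZE - 1}: {bad[:5]}")
    return codes


def duration_seconds(codes: list) -> float:
    """Each code covers 1 / CODE_RATE_HZ seconds of audio."""
    return len(codes) / CODE_RATE_HZ


def format_audio_codes(codes: list) -> str:
    """Inverse of parse_audio_codes: build the token string."""
    return "".join(f"<|audio_code_{c}|>" for c in codes)


if __name__ == "__main__":
    sample = "<|audio_code_12|><|audio_code_63999|><|audio_code_0|>"
    codes = parse_audio_codes(sample)
    print(codes, duration_seconds(codes), "s")  # [12, 63999, 0] 0.6 s
```

The regex ignores anything between tokens, so whitespace or line breaks in the Audio Codes field would not affect the parsed sequence.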

How to use:

  1. Load audio in Source Audio → click "Convert to Codes" → the codes appear in the Audio Codes field
  2. Or use the Voice Recorder: "Process + Whisper" extracts codes automatically
  3. Write a different text prompt (e.g., change genre/style)
  4. Adjust audio_cover_strength (0.3–0.5 applies the new style while keeping the original structure)
  5. Generate → the model follows the code structure but with your new style

| Property | Value |
|---|---|
| Rate | 5 codes/second (200 ms each) |
| Codebook size | 64,000 (FSQ: [8,8,8,5,5,5]) |
| 30 s song | ~150 codes |
| 60 s song | ~300 codes |
| Quantizer | Finite Scalar Quantization (1 quantizer, 6D) |
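The codebook size follows directly from the FSQ levels: 8 × 8 × 8 × 5 × 5 × 5 = 64,000. Each code index can therefore be read as a six-digit mixed-radix number with one digit per quantized dimension. The Python sketch below converts between the two forms; the digit order (dimension 0 least significant) is an assumption for illustration and may not match ACE-Step's internal convention.

```python
from math import prod

# FSQ levels per dimension, from the table above.
LEVELS = [8, 8, 8, 5, 5, 5]
CODEBOOK_SIZE = prod(LEVELS)  # 64000


def index_to_digits(index: int) -> list:
    """Split a code index into one level index per dimension (dimension 0 least significant)."""
    if not 0 <= index < CODEBOOK_SIZE:
        raise ValueError(f"index {index} outside 0..{CODEBOOK_SIZE - 1}")
    digits = []
    for n in LEVELS:
        index, digit = divmod(index, n)
        digits.append(digit)
    return digits


def digits_to_index(digits: list) -> int:
    """Inverse of index_to_digits: sum of digit * (product of the lower radices)."""
    if len(digits) != len(LEVELS):
        raise ValueError("expected one digit per dimension")
    index, place = 0, 1
    for digit, n in zip(digits, LEVELS):
        if not 0 <= digit < n:
            raise ValueError(f"digit {digit} outside 0..{n - 1}")
        index += digit * place
        place *= n
    return index


if __name__ == "__main__":
    print(CODEBOOK_SIZE)           # 64000
    print(index_to_digits(63999))  # [7, 7, 7, 4, 4, 4]
```

This also illustrates the note below: a single index packs all six dimensions together, so no one digit corresponds to "rhythm" or "melody" on its own.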

Note: Individual codes encode ALL musical properties simultaneously (melody + rhythm + timbre). You cannot isolate rhythm from melody at the code level, but you can use codes as structural guidance while the text prompt controls style/instrumentation.


🎙️ Voice Recorder Workflow

  1. Record your voice, humming, or beatbox
  2. Choose a Whisper model (tiny→turbo); download status is shown with ✓/↓ indicators
  3. Click "Procesar + Whisper" (Process + Whisper) → extracts Audio Codes AND transcribes lyrics
  4. Or click "Solo Procesar" (Process only) → extracts Audio Codes only (faster, no transcription)
  5. Click "Aplicar" (Apply) → codes + lyrics are applied to the Create Panel
  6. The generation uses your voice structure as a musical guide

Scripts Reference

Main Launchers

| Script | What it does |
|---|---|
| iniciar_todo.bat | Full launch: installs deps if needed → starts AI engine (port 8001) → backend (port 3001) → Classic UI (port 3000) → Pro UI (port 3002) → opens browser |
| iniciar_pro.bat | Pro launch: same as above but skips the Classic UI (only AI engine + backend + Pro UI) |
| iniciar_todo.sh | Linux/macOS equivalent of iniciar_todo.bat |
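iniciar_todo.bat brings the services up in a fixed order: AI engine on 8001, then backend on 3001, Classic UI on 3000, and Pro UI on 3002. One way a launcher can respect such an order is to poll each port until it accepts TCP connections before starting the next service. The Python sketch below shows only that polling step; it is an illustration, not the launchers' actual mechanism, and STARTUP_ORDER and wait_for_port are hypothetical names.

```python
import socket
import time

# Startup order and ports as listed in the table above.
STARTUP_ORDER = [("AI engine", 8001), ("backend", 3001), ("Classic UI", 3000), ("Pro UI", 3002)]


def wait_for_port(port: int, host: str = "localhost", timeout: float = 60.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    "localhost" may resolve to both ::1 and 127.0.0.1; create_connection tries each
    address, which matters for dev servers that listen on IPv6 only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, after starting the AI engine, a script would call wait_for_port(8001) and only start the backend once it returns True. The AI engine loads models on startup, so its timeout generally needs to be much longer than the frontends'.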

Utility Scripts

| Script | What it does |
|---|---|
| verificar_modelos.bat | Checks which AI models are installed and offers to download missing ones from HuggingFace |
| transcribir_letras.bat | Launches the lyrics transcription tool (Demucs for stem separation + Whisper for transcription); 6 quality modes available |
| limpiar_datos_usuario.bat | Deletes user data only (database + generated audio); keeps models, code, and dependencies |
| desinstalar.bat | Full uninstall: removes .venv, all node_modules, the database, and generated audio; keeps models and source code |

Internal Scripts (ACE-Step-1.5_/)

| Script | What it does |
|---|---|
| _start_gradio_api.bat | Starts only the Gradio AI engine on port 8001 |
| _start_backend.bat | Starts only the Node.js backend on port 3001 |
| _start_frontend.bat | Starts only the Classic UI on port 3000 |
| iniciar_acestep.bat | Interactive menu: standalone Gradio UI (port 7860), API mode, model download, advanced config |

Production Tools

| Script | What it does |
|---|---|
| detectar_bpm_clave.py | Batch detect BPM and musical key of audio files |
| transcribir_letras.py | Transcribe lyrics from audio using Whisper |
| aplicar_captions_v3.py | Apply training captions to audio files for LoRA preparation |
| truncar_captions.py | Truncate captions to fit training requirements |
| check_genres.py | Validate genre tags in audio files |
| check_tensors.py | Inspect model tensor files |
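detectar_bpm_clave.py uses librosa (see the project structure), but its exact key-finding method is not documented here. One standard approach is Krumhansl-Schmuckler key finding: average a 12-bin chroma vector over the track, then pick the major or minor key whose rotated Krumhansl-Kessler profile correlates best with it. The pure-Python sketch below implements only that matching step and is not the script's actual code; estimate_key is a hypothetical name.

```python
# Krumhansl-Kessler key profiles (probe-tone ratings), indexed from the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(x: list, y: list) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def estimate_key(chroma) -> str:
    """Return e.g. 'A minor' for a 12-bin chroma vector ordered C, C#, ..., B."""
    chroma = list(chroma)  # accept tuples or NumPy arrays; slicing below assumes a list
    if len(chroma) != 12:
        raise ValueError("expected 12 chroma bins")
    best_score, best_key = float("-inf"), ""
    for tonic in range(12):
        # Rotate so index i holds the energy i semitones above the candidate tonic.
        rotated = chroma[tonic:] + chroma[:tonic]
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            score = pearson(rotated, profile)
            if score > best_score:
                best_score, best_key = score, f"{NOTES[tonic]} {mode}"
    return best_key
```

With librosa, a suitable input is librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1). Relative keys (such as C major and A minor) share most of their notes, so they are the most common confusion for this method.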

🔌 Ports

| Port | Service | Description |
|---|---|---|
| 8001 | Gradio API | ACE-Step AI music generation engine |
| 3001 | Express.js Backend | Auth, model management, audio serving, database |
| 3000 | Classic Frontend | Legacy React UI (Vite dev server) |
| 3002 | Pro Frontend | ProdIA-MAX Pro UI (Vite + React 19 + Tailwind v4) |
| 7860 | Gradio Native UI | Only used in iniciar_acestep.bat standalone mode |

Any process already using these ports is killed automatically before launch to avoid conflicts; the scripts handle this for you.
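To see which ProdIA-MAX ports are already taken before launching, you can use a short Python check. This is a hypothetical helper, not part of the launcher scripts. It treats a port as "in use" when something accepts a TCP connection on it, which is the situation the launchers resolve by killing that process.

```python
import socket

# Ports from the table above.
PRODIA_PORTS = {
    8001: "Gradio API",
    3001: "Express.js Backend",
    3000: "Classic Frontend",
    3002: "Pro Frontend",
    7860: "Gradio Native UI",
}


def port_in_use(port: int, host: str = "localhost", timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port.

    "localhost" may resolve to both ::1 and 127.0.0.1; create_connection tries each.
    The short timeout keeps the check fast when connections are silently dropped.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for port, service in PRODIA_PORTS.items():
        state = "IN USE" if port_in_use(port) else "free"
        print(f"{port:5d}  {service:20s} {state}")
```

This only tells you that a port is busy. To find the owning process on Windows, use the netstat command listed under Troubleshooting.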


❓ Troubleshooting

"Python not found"

  • Make sure Python 3.11 is installed and added to PATH
  • Run python --version in a terminal to verify
  • If you have multiple Python versions, ensure 3.11 is the default

"Node.js not found"

  • Install Node.js from nodejs.org
  • Restart your terminal after installing
  • Run node --version to verify

"CUDA out of memory"

  • Use a smaller language model (0.6B instead of 1.7B or 4B)
  • Use turbo mode instead of base model
  • Close other GPU-intensive applications
  • Check your VRAM with nvidia-smi

Models not downloading

  • Check your internet connection
  • Run verificar_modelos.bat to manually download models
  • Models are downloaded from HuggingFace; no VPN is needed
  • If a download was interrupted, delete the incomplete folder in ACE-Step-1.5_/checkpoints/ and retry

Port already in use

  • The launcher scripts auto-kill processes on required ports
  • If you still get errors, manually close any process using ports 8001, 3001, 3000, or 3002
  • On Windows: netstat -ano | findstr :PORT_NUMBER then taskkill /PID <PID> /F

Audio doesn't generate

Fresh start

REM Remove all installed dependencies (keeps models and code)
desinstalar.bat

REM Then relaunch; everything reinstalls automatically
iniciar_todo.bat

📄 License

This project is distributed under the MIT License.

  • The original ACE-Step UI code remains © fspecii under MIT.
  • The ACE-Step 1.5 model and backend remain © ACE-Step Team under MIT.
  • New additions and modifications in this fork are © ElWalki under MIT.

See ACE-Step-1.5_/LICENSE for the full license text.


Built on top of the amazing work of fspecii and the ACE-Step team.
i18n system & translations contributed by scruffynerf.
Stop paying subscriptions. Start creating with ACE-Step.

About

ProdIA-MAX is an advanced fork of fspecii/ace-step-ui paired with ACE-Step v1.5. It adds an AI Music Chat Assistant, a visual Chord Progression Editor, a floating LoRA Manager, Demucs vocal separation, automatic ID3 tagging, full i18n support (EN/ES/ZH/JA/KO), extended presets, section-by-section generation, and multiple backend reliability fixes.
