ACE-Step-1.5 by ace-step

Advanced open-source music generation model

Created 7 months ago

8,392 stars

Top 6.1% on SourcePulse

Project Summary

ACE-Step 1.5 is an open-source music generation model designed to deliver commercial-grade audio quality on consumer hardware. It targets music artists, producers, and content creators, offering a fast, efficient, and locally runnable solution that significantly enhances creative workflows. The model provides advanced control and personalization capabilities, democratizing high-fidelity music synthesis.

How It Works

The project employs a hybrid architecture in which a Language Model (LM) acts as an omni-capable planner. The LM transforms user queries into detailed song blueprints, synthesizing metadata and lyrics via Chain-of-Thought to guide a Diffusion Transformer (DiT). A key innovation is its alignment mechanism, which uses intrinsic reinforcement learning based on the model's internal states rather than external reward models or human preference data, avoiding the biases those can introduce. This approach enables precise stylistic control and versatile editing.

Quick Start & Requirements

Primary Install/Run:
- Windows Portable Package (Recommended): Download and extract ACE-Step-1.5.7z. Launch the Gradio Web UI via start_gradio_ui.bat or the REST API Server via start_api_server.bat.
- Standard Installation: Install the uv package manager (via curl/PowerShell script). Clone the repository (git clone https://github.com/ACE-Step/ACE-Step-1.5.git), navigate into the directory, and run uv sync. Launch via uv run acestep (Gradio UI) or uv run acestep-api (REST API).
Prerequisites: Python 3.11. CUDA GPU is recommended for performance; CPU/MPS are supported but slower. The portable package specifies CUDA 12.8.
Resource Footprint: Inference runs locally with <4GB VRAM. LoRA training needs ~12GB VRAM (about 1 hour for 8 songs on an RTX 3090).
Documentation/Demos: Links to Hugging Face, ModelScope, Space Demo, Discord, and Technical Report are available.
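
The Standard Installation steps above can be scripted. The following is a minimal sketch, not part of the project: it assumes git and uv are already installed and on PATH (installing uv itself is left out), and the helper names build_steps and run_steps are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/ACE-Step/ACE-Step-1.5.git"
REPO_DIR = "ACE-Step-1.5"  # directory name git derives from the clone URL


def build_steps(mode="ui", workdir="."):
    """Return (command, cwd) pairs for the standard installation.

    mode="ui" launches the Gradio UI, mode="api" the REST API server.
    Hypothetical helper for illustration only.
    """
    if mode not in ("ui", "api"):
        raise ValueError("mode must be 'ui' or 'api'")
    repo = str(Path(workdir) / REPO_DIR)
    entry = "acestep" if mode == "ui" else "acestep-api"
    return [
        (["git", "clone", REPO_URL], str(workdir)),  # clone the repository
        (["uv", "sync"], repo),                      # install dependencies
        (["uv", "run", entry], repo),                # launch the UI or API
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd, cwd in steps:
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run_steps(build_steps("ui"))  # dry run: prints the three commands
```

Calling run_steps(build_steps("api"), dry_run=False) would actually clone the repository, sync dependencies, and start the REST API server, stopping at the first command that fails.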

Highlighted Details

Performance: Achieves ultra-fast generation (under 10 seconds on an RTX 3090) and runs locally with <4GB VRAM.
Quality: Delivers commercial-grade output, outperforming many commercial alternatives.
Versatility: Supports flexible durations (10s to 10min), multi-language prompts (50+), and advanced editing features like cover generation, repainting, and vocal-to-BGM conversion.
Personalization: Enables lightweight LoRA training from just a few songs to capture user-specific styles.

Maintenance & Community

The project is co-led by ACE Studio and StepFun. A Discord server is available for community interaction.

Licensing & Compatibility

ACE-Step 1.5 is released under the MIT license, which permits broad use, including commercial applications and integration into closed-source projects, provided the copyright and license notice is retained.

Limitations & Caveats

While functional on CPU/MPS, performance is significantly reduced. Intel GPU support is experimental, with potential speed limitations for longer audio and lack of specific acceleration features. The project also warns against fake domains, directing users exclusively to its official GitHub Pages site.

Health Check

Last Commit: 13 hours ago
Responsiveness: Inactive
Pull Requests (30d): 134
Issues (30d): 140

Star History

1,609 stars in the last 30 days