HeartMuLa-Studio by fspecii

AI music studio for professional audio creation

Created 4 months ago

583 stars

Top 55.1% on SourcePulse

Project Summary

HeartMuLa Studio is a Suno-like AI music generation platform for creators who want features such as reference audio style transfer and LLM-powered lyric generation. It targets engineers, researchers, and power users, and produces complete songs with vocals, instrumentals, and customizable styles. It also includes options for reducing VRAM usage and speeding up inference.

How It Works

The studio uses the HeartLib AI engine (MuQ, MuLan, HeartCodec) for core music generation. It can create full songs of up to four minutes as well as instrumental tracks, with style defined via tags. A key differentiator is its experimental reference audio style transfer: users upload any audio file to influence the generated music, adjust the intensity, and select a precise region of the file in a waveform visualizer. Lyrics are generated by LLMs through either a local Ollama instance or the cloud-based OpenRouter service, with topic-based generation, style suggestions, and prompt enhancement. The architecture combines a React/TypeScript frontend with a FastAPI backend.
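
The region selection described above amounts to mapping a start and end time onto sample indices. The following is a minimal sketch of that mapping, not the project's implementation: the function name `select_region`, its parameters, and the clamping behaviour are all assumptions made for illustration.

```python
from typing import Sequence


def select_region(samples: Sequence[float], sample_rate: int,
                  start_s: float, end_s: float) -> Sequence[float]:
    """Return the samples between start_s and end_s (in seconds).

    Hypothetical helper: times outside the clip are clamped to its bounds.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    total = len(samples)
    # Convert seconds to sample indices, then clamp to [0, total].
    start = max(0, min(total, int(round(start_s * sample_rate))))
    end = max(0, min(total, int(round(end_s * sample_rate))))
    return samples[start:end]


# One second of silence at 8 kHz; select the window from 0.25 s to 0.75 s.
clip = [0.0] * 8000
region = select_region(clip, 8000, 0.25, 0.75)
print(len(region))  # 4000 samples, i.e. half a second
```

How the selected region and the influence setting are then fed to the model is not described in the summary, so the sketch stops at the slicing step.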

Quick Start & Requirements

Install either with the ./start.sh script or with Docker (the recommended route).

Script Install: Requires Python 3.10+, Node.js 18+, and a CUDA-enabled NVIDIA GPU with 10GB+ VRAM. Triton (pip install triton or triton-windows) is needed for torch.compile.
Docker Install: Requires Docker with NVIDIA Container Toolkit and an NVIDIA GPU with 10GB+ VRAM.
Resource Footprint: Initial setup involves downloading ~5GB of AI models. The Docker image is ~10GB. VRAM requirements range from ~3GB with 4-bit quantization to 10GB+ for optimal performance.
Links: The README serves as primary documentation. No direct demo URL is provided.
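
The script-install requirements above can be checked before running ./start.sh. The following is a rough preflight sketch using only the Python standard library. The helper names (`parse_version`, `meets_minimum`, `preflight`) are hypothetical and are not part of the project. It only checks that the tools are on PATH, not GPU VRAM.

```python
import re
import shutil
import sys


def parse_version(text: str) -> tuple:
    """Extract leading numeric components from strings such as 'v18.17.0'."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(version_text: str, minimum: tuple) -> bool:
    """Compare a version string against a minimum such as (18,) or (3, 10)."""
    return parse_version(version_text) >= minimum


def preflight() -> dict:
    """Report which of the listed prerequisites appear to be present."""
    return {
        "python>=3.10": sys.version_info[:2] >= (3, 10),
        "node on PATH": shutil.which("node") is not None,
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'ok     ' if ok else 'missing'}  {check}")
```

For example, `meets_minimum("v18.17.0", (18,))` is true and `meets_minimum("3.9.18", (3, 10))` is false. To apply it to Node.js, pass it the output of `node --version`.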

Highlighted Details

Performance Optimizations: Features 4-bit quantization (reducing VRAM from ~11GB to ~3GB), Flash Attention for compatible NVIDIA GPUs (SM 7.0+), and experimental torch.compile for up to 2x faster inference.
Reference Audio Style Transfer: Offers professional waveform visualization, draggable region selection for precise style sampling, and an adjustable influence slider.
LLM Integration: Integrates with Ollama (local) and OpenRouter (cloud) for AI-driven lyric generation.
Multi-GPU Support: Automatically detects and configures multiple GPUs, assigning the main model to the fastest GPU and the audio codec to the GPU with the most VRAM.
Coming Soon: LoRA Voice Training is under development; the project claims that early tests show better voice consistency than Suno.
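
The multi-GPU placement rule above (main model on the fastest GPU, audio codec on the GPU with the most VRAM) can be sketched as follows. The summary does not say how "fastest" is measured, so this example uses CUDA compute capability as a stand-in. That choice, the `GpuInfo` type, and `assign_devices` are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GpuInfo:
    index: int
    name: str
    vram_gb: float
    # (major, minor); used here as an assumed proxy for speed.
    compute_capability: Tuple[int, int]


def assign_devices(gpus: List[GpuInfo]) -> Dict[str, int]:
    """Pick a device index for the main model and for the audio codec."""
    if not gpus:
        raise ValueError("no GPUs detected")
    # Ties on the primary key fall back to the secondary key.
    fastest = max(gpus, key=lambda g: (g.compute_capability, g.vram_gb))
    largest = max(gpus, key=lambda g: (g.vram_gb, g.compute_capability))
    return {"main_model": fastest.index, "audio_codec": largest.index}


# Hypothetical two-GPU machine: a newer 12 GB card and an older 24 GB card.
gpus = [
    GpuInfo(index=0, name="gpu-a", vram_gb=12.0, compute_capability=(8, 9)),
    GpuInfo(index=1, name="gpu-b", vram_gb=24.0, compute_capability=(8, 6)),
]
print(assign_devices(gpus))  # {'main_model': 0, 'audio_codec': 1}
```

On a single-GPU machine both roles resolve to the same index.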

Maintenance & Community

The project is actively developed by fspecii/HeartMuLa. No specific details regarding core maintainers, sponsorships, or dedicated community channels (like Discord/Slack) are provided in the README.

Licensing & Compatibility

The project is released under the permissive MIT License, which allows commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

HeartMuLa Studio is listed as unsupported on systems with less than 10GB of VRAM; systems with 10-14GB of VRAM require model swapping, which slows generation. The reference audio style transfer feature is marked as experimental. Initial model downloads and torch.compile make the first run slower. Flash Attention is disabled on older NVIDIA GPUs (SM 6.x and older) and on AMD GPUs, where overall compatibility varies.
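
The hardware rules in this section can be written as two small predicates. This is a sketch under stated assumptions rather than the project's own logic: the function names are hypothetical, the handling of the exact 10GB and 14GB boundaries is a guess, and the tiers ignore the 4-bit quantization option.

```python
from typing import Tuple


def flash_attention_supported(vendor: str, compute_capability: Tuple[int, int]) -> bool:
    """True for NVIDIA GPUs at SM 7.0 or newer, per the limits listed above."""
    return vendor.lower() == "nvidia" and compute_capability >= (7, 0)


def vram_tier(vram_gb: float) -> str:
    """Classify a GPU by VRAM using the thresholds stated in this section."""
    if vram_gb < 10:
        return "unsupported"
    if vram_gb <= 14:
        # Assumed inclusive upper bound for the 10-14GB range.
        return "model swapping"
    return "full"


print(flash_attention_supported("NVIDIA", (7, 5)))  # True
print(flash_attention_supported("NVIDIA", (6, 1)))  # False
print(vram_tier(12))                                # model swapping
```

In a PyTorch environment the capability tuple could be read with `torch.cuda.get_device_capability()`, which returns a `(major, minor)` pair.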

Explore Similar Projects

SongGen by LiuZH-19

stable-audio-3 by Stability-AI

awesome-ai-voice by wildminder

TCSinger by AaronZ345

acestep.cpp by ServeurpersoCom

ultimate-rvc by JackismyShephard

alexandria-audiobook by Finrandojin

SongGeneration by tencent-ailab

ultravox by fixie-ai

higgs-audio by boson-ai

DiffSinger by MoonInTheRiver

whisper-vits-svc by PlayVoice