HeartMuLa-Studio  by fspecii

AI music studio for professional audio creation

Created 1 month ago
477 stars

Top 64.2% on SourcePulse

GitHubView on GitHub
Project Summary

HeartMuLa Studio is a professional, Suno-like AI music generation platform designed for creators seeking advanced features like reference audio style transfer and LLM-powered lyric generation. It targets engineers, researchers, and power users, offering a powerful toolset for producing complete songs with vocals, instrumentals, and customizable styles, while optimizing for performance and VRAM usage.

How It Works

The studio leverages the HeartLib AI engine (MuQ, MuLan, HeartCodec) for core music generation, enabling full song creation up to four minutes, instrumental tracks, and style definition via tags. A key differentiator is its experimental reference audio style transfer, allowing users to upload any audio file to influence the generated music, with adjustable intensity and precise region selection via a waveform visualizer. AI-powered lyrics are generated using LLMs, supporting both local Ollama and cloud-based OpenRouter, with features for topic-based generation, style suggestions, and prompt enhancement. The architecture combines a React/TypeScript frontend with a FastAPI backend.

Quick Start & Requirements

Installation is streamlined via a ./start.sh script or a recommended Docker setup.

  • Script Install: Requires Python 3.10+, Node.js 18+, and a CUDA-enabled NVIDIA GPU with 10GB+ VRAM. Triton (pip install triton or triton-windows) is needed for torch.compile.
  • Docker Install: Requires Docker with NVIDIA Container Toolkit and an NVIDIA GPU with 10GB+ VRAM.
  • Resource Footprint: Initial setup involves downloading ~5GB of AI models. The Docker image is ~10GB. VRAM requirements range from ~3GB with 4-bit quantization to 10GB+ for optimal performance.
  • Links: The README serves as primary documentation. No direct demo URL is provided.

Highlighted Details

  • Performance Optimizations: Features 4-bit quantization (reducing VRAM from ~11GB to ~3GB), Flash Attention for compatible NVIDIA GPUs (SM 7.0+), and experimental torch.compile for up to 2x faster inference.
  • Reference Audio Style Transfer: Offers professional waveform visualization, draggable region selection for precise style sampling, and an adjustable influence slider.
  • LLM Integration: Seamlessly integrates with Ollama (local) and OpenRouter (cloud) for AI-driven lyric generation.
  • Multi-GPU Support: Automatically detects and configures multiple GPUs, assigning the main model to the fastest GPU and the audio codec to the GPU with the most VRAM.
  • Coming Soon: LoRA Voice Training is under development, with early tests claiming superior voice consistency compared to Suno.

Maintenance & Community

The project is actively developed by fspecii/HeartMuLa. No specific details regarding core maintainers, sponsorships, or dedicated community channels (like Discord/Slack) are provided in the README.

Licensing & Compatibility

The project is released under the permissive MIT License, which generally allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

HeartMuLa Studio is not supported on systems with less than 10GB of VRAM; systems with 10-14GB VRAM require model swapping, impacting generation speed. The reference audio style transfer feature is marked as experimental. Initial model downloads and torch.compile can lead to slower first-run performance. Flash Attention is disabled on older NVIDIA GPUs (SM 6.x and older) and AMD GPUs, with compatibility varying for the latter.

Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
5
Issues (30d)
12
Star History
227 stars in the last 30 days

Explore Similar Projects

Starred by Thomas Wolf Thomas Wolf(Cofounder of Hugging Face), Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), and
5 more.

ultravox by fixie-ai

0.1%
4k
Multimodal LLM for real-time voice interactions
Created 1 year ago
Updated 2 months ago
Feedback? Help us improve.