YuE by multimodal-art-projection

Open-source tool for generating full songs from lyrics

created 6 months ago
5,267 stars

Top 9.7% on sourcepulse

Project Summary

YuE is a series of open-source foundation models that generate full-length songs from lyrics, producing both vocal and accompaniment tracks. It supports diverse genres, languages, and vocal techniques, and aims to democratize AI music creation for artists and researchers.

How It Works

YuE generates songs in multiple stages, turning lyrics into a complete song with vocals and instrumental backing. It supports two modes: standard "Chain-of-Thought" (CoT) generation, and "In-Context Learning" (ICL), which conditions on reference audio for style transfer or voice cloning. Together, the two modes offer both creative control and adherence to a reference style.
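The difference between the two modes can be sketched as a small request object. This is an illustration only: GenerationRequest, its fields, and the validation rules are hypothetical names invented here, not part of the YuE codebase. The one point it captures from the description above is that ICL conditions on reference audio while standard CoT generation starts from lyrics alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    """Hypothetical request object; not an actual YuE class."""

    lyrics: str
    mode: str = "cot"                      # "cot" (standard) or "icl" (reference-conditioned)
    reference_audio: Optional[str] = None  # path to a reference clip, used only by ICL

    def validate(self) -> None:
        """Raise ValueError if the mode and the inputs do not fit together."""
        if self.mode not in ("cot", "icl"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.mode == "icl" and not self.reference_audio:
            raise ValueError("ICL mode needs reference audio to condition on")
        if self.mode == "cot" and self.reference_audio:
            raise ValueError("CoT mode does not take reference audio")


# Lyrics-only generation passes validation.
GenerationRequest(lyrics="la la la", mode="cot").validate()

# Style transfer or voice cloning needs a reference clip (the path is a placeholder).
GenerationRequest(lyrics="la la la", mode="icl", reference_audio="ref.wav").validate()
```

Rejecting reference audio in CoT mode is a design choice of this sketch, made so that a mismatched request fails loudly instead of silently ignoring the clip.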

Quick Start & Requirements

  • Install: Set up the environment with conda (recommended), then install dependencies via pip install -r requirements.txt. FlashAttention 2 is mandatory for memory efficiency.
  • Prerequisites: Python >= 3.8, CUDA >= 11.8, PyTorch with matching cudatoolkit, git-lfs.
  • Models: Download model weights from Hugging Face (e.g., m-a-p/YuE-s1-7B-anneal-en-cot).
  • Resources: Requires significant GPU memory; 24GB recommended for basic use, 80GB+ for extensive generation.
  • Demo/UI: Gradio interfaces (YuE-UI, YuE-exllamav2-UI, YuEGP) and a Windows installer (Pinokio) are available.
  • Docs: Prompt engineering guide available.
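The version and memory figures listed above can be checked before downloading any weights. The following is a minimal sketch using only the Python standard library. The function names (parse_version, check_prerequisites, memory_tier) and the tier labels are assumptions made for illustration; they do not come from the YuE repository. The CUDA version is passed in as a string, which the caller might read from torch.version.cuda.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)   # from the prerequisites list
MIN_CUDA = (11, 8)    # from the prerequisites list


def parse_version(text: str) -> tuple:
    """Turn '12.1' or '12.1.105' into (12, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:2])


def check_prerequisites(cuda_version: str) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    if parse_version(cuda_version) < MIN_CUDA:
        problems.append("CUDA >= 11.8 is required")
    if shutil.which("git-lfs") is None:
        problems.append("git-lfs was not found on PATH")
    return problems


def memory_tier(vram_gb: float) -> str:
    """Map GPU memory to the usage levels quoted above (labels are hypothetical)."""
    if vram_gb >= 80:
        return "extensive"
    if vram_gb >= 24:
        return "basic"
    return "below recommended"


if __name__ == "__main__":
    for problem in check_prerequisites("12.1"):
        print("warning:", problem)
    print("24 GB GPU ->", memory_tier(24))
```

Comparing versions as integer tuples avoids the string-comparison trap where "11.10" would sort before "11.8".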

Highlighted Details

  • Supports incremental song generation and music continuation.
  • Offers dual-track ICL for advanced style transfer and voice cloning.
  • Optimized for lower VRAM GPUs (e.g., 8GB) via quantized models and community UIs.
  • Generates multi-minute songs with distinct vocal and accompaniment tracks.

Maintenance & Community

  • Active development with recent updates on incremental generation and ICL modes.
  • Community support via Discord.
  • Project co-led by HKUST and M-A-P, with support from industry partners.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Commercial use and monetization of generated outputs are encouraged, with attribution to "YuE by HKUST/M-A-P".

Limitations & Caveats

The model requires substantial GPU resources for full-song generation. While community optimizations exist for lower VRAM, they may impact musicality. The "intro" label is noted as less stable.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 3
  • Star history: 412 stars in the last 90 days
