SongGeneration by tencent-ailab

AI framework for high-fidelity song generation

Created 4 months ago
870 stars

Top 41.2% on SourcePulse

Project Summary

LeVo SongGeneration is a language-model (LM) based framework for high-quality AI music creation, aimed at researchers and developers. It generates songs with both vocals and accompaniment, performs competitively with industry-standard systems, and improves on existing open-source solutions.

How It Works

The system pairs a language model, LeLM, with a music codec. LeLM models two kinds of tokens: mixed tokens, which represent vocals and accompaniment combined and support overall harmony, and dual-track tokens, which represent vocals and accompaniment separately for finer control. The music codec then reconstructs the dual-track tokens into high-fidelity audio.
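To make the dual-track idea concrete, here is a toy sketch in Python. It is not the project's code: the names DualTrackTokens, toy_decode, and toy_reconstruct, the tiny codebook, and the "sum the two tracks" mixing step are all assumptions made purely for illustration. A real music codec is a learned neural decoder, not a lookup table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DualTrackTokens:
    """Hypothetical container: one token stream per track, aligned frame by frame."""
    vocal: List[int]
    accompaniment: List[int]


def toy_decode(tokens: List[int], codebook: List[float]) -> List[float]:
    """Stand-in for a codec decoder: map each token id to one sample value."""
    return [codebook[t] for t in tokens]


def toy_reconstruct(dual: DualTrackTokens, codebook: List[float]) -> List[float]:
    """Decode both tracks separately, then mix them by summation (an assumption)."""
    if len(dual.vocal) != len(dual.accompaniment):
        raise ValueError("vocal and accompaniment streams must have equal length")
    vocal_audio = toy_decode(dual.vocal, codebook)
    accomp_audio = toy_decode(dual.accompaniment, codebook)
    return [v + a for v, a in zip(vocal_audio, accomp_audio)]


# Illustrative values only.
codebook = [0.0, 0.5, -0.5, 1.0]
dual = DualTrackTokens(vocal=[1, 3], accompaniment=[2, 0])
mixed_audio = toy_reconstruct(dual, codebook)
```

Because each track keeps its own stream, the vocal or the accompaniment can be decoded on its own, which is the kind of control the dual-track representation is described as enabling.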

Quick Start & Requirements

Install the Python dependencies with pip install -r requirements.txt (and potentially requirements_nodeps.txt), then install Flash Attention from source. Docker images are also available: docker pull juhayna/song-generation-levo:hf0613.

Prerequisites are Python >= 3.8.12 and CUDA >= 11.8; Flash Attention is recommended. Generation needs at least 10GB of GPU memory, or 16GB when audio prompts are used.

Relevant links:

  • ComfyUI integration: https://github.com/smthemex/ComfyUI_SongGeneration
  • Windows installer video: http://bilibili.com/video/BV1ATK8zQE8L/?vd_source=22cfc54298226c4161b1aff457d17585
  • ComfyUI on CNB: https://cnb.cool/tencent/tencent-ailab/examples/SongGeneration-comfyui
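The version and memory prerequisites in this section can be checked before installing anything. The following is a minimal sketch of such a pre-flight check, not something shipped with the repository: the function check_requirements, the helper parse_version, and the constant names are hypothetical, and only the thresholds (Python 3.8.12, CUDA 11.8, 10GB and 16GB) come from the text above. Versions and memory are passed in as plain values, so how you detect them on your machine is left open.

```python
from typing import List, Tuple

# Thresholds taken from the prerequisites stated above.
MIN_PYTHON = (3, 8, 12)
MIN_CUDA = (11, 8)
MIN_GPU_GB = 10
MIN_GPU_GB_WITH_AUDIO_PROMPT = 16


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.8.12' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def check_requirements(
    python_version: str,
    cuda_version: str,
    gpu_memory_gb: float,
    use_audio_prompt: bool = False,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems: List[str] = []
    if parse_version(python_version) < MIN_PYTHON:
        problems.append(f"Python {python_version} is older than 3.8.12")
    if parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 11.8")
    # Audio prompts raise the stated memory requirement from 10GB to 16GB.
    needed_gb = MIN_GPU_GB_WITH_AUDIO_PROMPT if use_audio_prompt else MIN_GPU_GB
    if gpu_memory_gb < needed_gb:
        problems.append(f"GPU has {gpu_memory_gb}GB, but {needed_gb}GB is required")
    return problems


ok_report = check_requirements("3.10.4", "12.1", 24.0)
bad_report = check_requirements("3.8.10", "11.7", 12.0, use_audio_prompt=True)
```

In practice you might feed it platform.python_version() for the Python version and whatever your GPU tooling reports for CUDA and memory.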

Highlighted Details

  • Low Memory Footprint: Operates with as little as 10GB of GPU memory.
  • Flexible Output: Generates pure music, pure vocals, or separate vocal/accompaniment tracks.
  • Advanced Data Pipeline: Includes a data processing pipeline for song structure/lyric analysis, outperforming SOTA models on benchmarks like SSLD-200.
  • Multi-Modal Control: Supports generation guided by descriptive text attributes (genre, emotion, etc.) and/or reference audio prompts.

Maintenance & Community

The repository indicates ongoing development with several items on its TODO list. Specific details on community channels, active contributors, or a public roadmap are not present in the README.

Licensing & Compatibility

The code and weights are released under a license specified in the repository's LICENSE file. The exact license type and its implications for commercial use or closed-source integration are not detailed in the provided README text.

Limitations & Caveats

Several model versions and features, including English-enhanced models and finetuning scripts, are marked as "Coming soon" or are pending updates. Users may need to manually install Flash Attention or disable it if unsupported (--not_use_flash_attn). Conflicting prompt audio and descriptive text inputs can degrade generation quality.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 14
  • Star History: 114 stars in the last 30 days

Explore Similar Projects

Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

muzic by microsoft

0.1%
5k
AI research project for music understanding and generation
Created 4 years ago
Updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App), and 11 more.

jukebox by openai

0.0%
8k
Generative model for music research paper
Created 5 years ago
Updated 1 year ago