SongGeneration by tencent-ailab

AI framework for high-fidelity song generation

Created 8 months ago

1,355 stars

Top 29.3% on SourcePulse

Project Summary

LeVo SongGeneration is an LM-based framework for high-quality AI music creation, targeting researchers and developers. It generates music with vocals and accompaniment, offering competitive performance against industry-standard systems and improving upon existing open-source solutions.

How It Works

The system pairs a language model, LeLM, with a music codec. LeLM models two kinds of tokens: mixed tokens, which represent vocals and accompaniment combined and keep the two in harmony, and dual-track tokens, which encode vocals and accompaniment separately for detailed control. The music codec then reconstructs the dual-track tokens into high-fidelity audio.

Quick Start & Requirements

Installation: Run pip install -r requirements.txt and, where required, install requirements_nodeps.txt as well, then install Flash Attention from source.
Docker: A prebuilt image is available via docker pull juhayna/song-generation-levo:hf0613.
Prerequisites: Python >= 3.8.12 and CUDA >= 11.8; Flash Attention is recommended. At least 10GB of GPU memory is required, or 16GB when audio prompts are used.
Links: ComfyUI integration at https://github.com/smthemex/ComfyUI_SongGeneration, a Windows installer video at http://bilibili.com/video/BV1ATK8zQE8L/?vd_source=22cfc54298226c4161b1aff457d17585, and ComfyUI on CNB at https://cnb.cool/tencent/tencent-ailab/examples/SongGeneration-comfyui.
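
The prerequisites listed here can be checked before installing. The following is a minimal sketch in Python, not part of the SongGeneration repository: the function name check_requirements and its parameters are hypothetical, and the caller is assumed to supply the CUDA version and GPU memory figures (for example, read from nvidia-smi). Only the thresholds come from the text above.

```python
# Thresholds taken from the prerequisites above; everything else is illustrative.
MIN_PYTHON = (3, 8, 12)
MIN_CUDA = (11, 8)
MIN_GPU_GB = 10               # text-only generation
MIN_GPU_GB_WITH_PROMPT = 16   # generation with an audio prompt


def parse_version(text):
    """Turn a dotted version string such as '11.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_requirements(python_version, cuda_version, gpu_mem_gb, use_audio_prompt=False):
    """Hypothetical helper: return a list of unmet requirements (empty if all are met)."""
    problems = []
    if parse_version(python_version) < MIN_PYTHON:
        problems.append(f"Python {python_version} is older than 3.8.12")
    if parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 11.8")
    needed_gb = MIN_GPU_GB_WITH_PROMPT if use_audio_prompt else MIN_GPU_GB
    if gpu_mem_gb < needed_gb:
        problems.append(f"{gpu_mem_gb}GB of GPU memory is below the {needed_gb}GB minimum")
    return problems


# Example: a 12GB GPU is enough for text-only use but not for audio prompts.
print(check_requirements("3.10.4", "12.1", 12, use_audio_prompt=True))
```

An empty list means the stated minimums are met; it does not guarantee that the installation itself will succeed.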

Highlighted Details

Low Memory Footprint: Operates with as little as 10GB of GPU memory.
Flexible Output: Generates pure music, pure vocals, or separate vocal/accompaniment tracks.
Advanced Data Pipeline: Includes a data processing pipeline for song structure and lyric analysis, which is reported to outperform state-of-the-art (SOTA) models on benchmarks such as SSLD-200.
Multi-Modal Control: Supports generation guided by descriptive text attributes (genre, emotion, etc.) and/or reference audio prompts.

Maintenance & Community

The repository indicates ongoing development with several items on its TODO list. Specific details on community channels, active contributors, or a public roadmap are not present in the README.

Licensing & Compatibility

The code and weights are released under a license specified in the repository's LICENSE file. The exact license type and its implications for commercial use or closed-source integration are not detailed in the provided README text.

Limitations & Caveats

Several model versions and features, including English-enhanced models and finetuning scripts, are marked as "Coming soon" or are pending updates. Users may need to install Flash Attention manually or, if it is unsupported on their hardware, disable it with the --not_use_flash_attn flag. Prompt audio and descriptive text that conflict with each other can degrade generation quality.
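
The Flash Attention fallback can be automated when scripting around the generator. The sketch below assumes the Python package is importable as flash_attn and that the caller appends the returned flags to whatever generation command the repository provides; both helper names are hypothetical. Only the --not_use_flash_attn flag comes from the README summary above.

```python
import importlib.util


def flash_attn_available():
    """Return True if a top-level 'flash_attn' package can be found (assumed package name)."""
    return importlib.util.find_spec("flash_attn") is not None


def generation_flags(has_flash_attn):
    """Hypothetical helper: extra command-line flags for the generation command."""
    # Disable Flash Attention explicitly when the package is not installed.
    return [] if has_flash_attn else ["--not_use_flash_attn"]


print(generation_flags(flash_attn_available()))
```

Note that an importable package does not prove the GPU supports Flash Attention, so the flag may still need to be set by hand.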

Explore Similar Projects

UniAudio2 by yangdongchao

SongGen by LiuZH-19

mustango by AMAAI-Lab

music-generation-research by AI-Guru

genmusic_demo_list by affige

ace-step-ui by fspecii

openvino-plugins-ai-audacity by intel

FunMusic by FunAudioLLM

muzic by microsoft

YuE by multimodal-art-projection

jukebox by openai

audiocraft by facebookresearch