tencent-ailab
AI framework for high-fidelity song generation
Top 41.2% on SourcePulse
LeVo SongGeneration is an LM-based framework for high-quality AI music creation, targeting researchers and developers. It generates music with vocals and accompaniment, offering competitive performance against industry-standard systems and improving upon existing open-source solutions.
How It Works
The system pairs a language model, LeLM, with a music codec. LeLM models two kinds of tokens: mixed tokens, which represent the combined vocals and accompaniment and help keep the two in harmony, and dual-track tokens, which encode vocals and accompaniment separately for finer control. The music codec then reconstructs the dual-track tokens into high-fidelity audio.
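To make the two token types concrete, here is a toy sketch of how they might be held side by side. It is not taken from the repository: the class name `SongTokens`, the integer token IDs, and the assumption that the vocal and accompaniment streams are aligned frame by frame are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SongTokens:
    """Hypothetical container for the two token types described above."""

    mixed: List[int]          # one stream for combined vocals + accompaniment
    vocal: List[int]          # vocal half of the dual-track tokens
    accompaniment: List[int]  # accompaniment half of the dual-track tokens

    def dual_track(self) -> List[Tuple[int, int]]:
        """Pair vocal and accompaniment tokens per frame.

        Assumes (for illustration only) that both streams have the same
        number of frames.
        """
        if len(self.vocal) != len(self.accompaniment):
            raise ValueError("vocal and accompaniment streams must be the same length")
        return list(zip(self.vocal, self.accompaniment))


tokens = SongTokens(mixed=[1, 2, 3], vocal=[10, 11, 12], accompaniment=[20, 21, 22])
pairs = tokens.dual_track()  # the part a codec would decode into audio
```

In this picture, only the paired dual-track stream would be handed to the codec for reconstruction; the mixed stream exists to model the song as a whole.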
Quick Start & Requirements
Install the dependencies with pip install -r requirements.txt (and, where needed, requirements_nodeps.txt), then install Flash Attention from source. A Docker image is also available (docker pull juhayna/song-generation-levo:hf0613). Prerequisites are Python >= 3.8.12 and CUDA >= 11.8; Flash Attention is recommended. The minimum GPU memory is 10GB, or 16GB when using audio prompts. Related links: a ComfyUI integration at https://github.com/smthemex/ComfyUI_SongGeneration, a Windows installer video at http://bilibili.com/video/BV1ATK8zQE8L/?vd_source=22cfc54298226c4161b1aff457d17585, and ComfyUI on CNB at https://cnb.cool/tencent/tencent-ailab/examples/SongGeneration-comfyui.
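The stated prerequisites can be checked before installing anything. The sketch below encodes only the thresholds listed above; the function names and the idea of passing versions in as strings are assumptions made for illustration, not part of the project.

```python
from typing import List, Tuple

# Thresholds taken from the requirements listed above.
MIN_PYTHON = (3, 8, 12)
MIN_CUDA = (11, 8)
MIN_GPU_GB = 10
MIN_GPU_GB_WITH_AUDIO_PROMPT = 16


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.10.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_prerequisites(
    python_version: str,
    cuda_version: str,
    gpu_memory_gb: float,
    use_audio_prompt: bool = False,
) -> List[str]:
    """Return a list of unmet requirements (empty if everything is fine)."""
    problems = []
    if parse_version(python_version) < MIN_PYTHON:
        problems.append("Python >= 3.8.12 is required")
    if parse_version(cuda_version) < MIN_CUDA:
        problems.append("CUDA >= 11.8 is required")
    needed = MIN_GPU_GB_WITH_AUDIO_PROMPT if use_audio_prompt else MIN_GPU_GB
    if gpu_memory_gb < needed:
        problems.append(f"at least {needed}GB of GPU memory is required")
    return problems


# Example: a 12GB GPU is enough for text-only prompts but not for audio prompts.
text_only = check_prerequisites("3.10.4", "12.1", 12)
with_audio = check_prerequisites("3.10.4", "12.1", 12, use_audio_prompt=True)
```

Versions are compared as integer tuples rather than strings so that, for example, 3.10 correctly ranks above 3.8.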
Maintenance & Community
The repository indicates ongoing development with several items on its TODO list. Specific details on community channels, active contributors, or a public roadmap are not present in the README.
Licensing & Compatibility
The code and weights are released under a license specified in the repository's LICENSE file. The exact license type and its implications for commercial use or closed-source integration are not detailed in the provided README text.
Limitations & Caveats
Several model versions and features, including English-enhanced models and finetuning scripts, are marked as "Coming soon" or are pending updates. Users may need to manually install Flash Attention or disable it if unsupported (--not_use_flash_attn). Conflicting prompt audio and descriptive text inputs can degrade generation quality.
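One way to handle the Flash Attention caveat is to detect whether the package is importable and add the --not_use_flash_attn flag only when it is not. This is a minimal sketch: the helper names and the shape of `base_args` are hypothetical, and only the flag itself comes from the text above.

```python
import importlib.util
from typing import List, Optional


def flash_attn_available() -> bool:
    """Report whether the flash_attn package can be imported in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def build_generation_args(
    base_args: List[str], has_flash_attn: Optional[bool] = None
) -> List[str]:
    """Append --not_use_flash_attn when Flash Attention is unavailable.

    base_args stands in for whatever arguments the generation script takes;
    has_flash_attn can be set explicitly to skip auto-detection.
    """
    if has_flash_attn is None:
        has_flash_attn = flash_attn_available()
    args = list(base_args)  # copy so the caller's list is not modified
    if not has_flash_attn and "--not_use_flash_attn" not in args:
        args.append("--not_use_flash_attn")
    return args


# Example with detection overridden, using placeholder arguments.
args_without = build_generation_args(["<script-args>"], has_flash_attn=False)
args_with = build_generation_args(["<script-args>"], has_flash_attn=True)
```

The resulting list can be passed to `subprocess.run` along with the actual generation command, which is left out here because it is not given in the text.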