moshi-finetune by kyutai-labs

Fine-tune audio models with LoRA

created 4 months ago
268 stars

Top 96.5% on sourcepulse

Project Summary

Moshi-Finetune offers a streamlined workflow for fine-tuning Moshi speech models with LoRA, letting users adapt pre-trained models to custom conversational datasets. It targets researchers and developers building personalized voice assistants or specialized audio transcription tools. The primary benefit is efficient, lightweight model adaptation without the computational cost of full fine-tuning.

How It Works

The project leverages LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning, significantly reducing the number of trainable parameters. Training data is stereo audio: the left channel carries the model's generated speech and the right channel the user's input, with each file paired with a JSON transcript containing timestamps. Training is orchestrated via YAML configuration files, which expose hyperparameters, dataset paths, and LoRA settings.
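To make the configuration-driven workflow concrete, here is a minimal sketch that builds such a config programmatically and writes it to YAML. Every key shown (data.train_data, lora.rank, optim.lr, and so on) is an illustrative assumption, not the project's documented schema; the example configs shipped with the repository are the authoritative reference.

```python
# Illustrative sketch only: these field names are assumptions for
# demonstration, not moshi-finetune's documented schema. Consult the
# example YAML configs in the repository for the real keys.
import yaml  # pip install pyyaml

config = {
    # Hypothetical dataset section: a .jsonl index pointing at stereo WAV
    # files, each paired with a JSON file of timestamped transcripts.
    "data": {
        "train_data": "data/train_index.jsonl",
        "eval_data": "data/eval_index.jsonl",
    },
    # Hypothetical LoRA section: a small rank keeps the number of
    # trainable parameters low compared with full fine-tuning.
    "lora": {
        "rank": 16,
        "scaling": 2.0,
    },
    # Hypothetical optimizer hyperparameters.
    "optim": {
        "lr": 2e-5,
        "weight_decay": 0.1,
    },
    "max_steps": 300,
    "batch_size": 8,
}

with open("my_finetune.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```

The resulting file would then be passed to the training entry point shown under Quick Start below.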

Quick Start & Requirements

  • Clone the repository: git clone git@github.com:kyutai-labs/moshi-finetune.git
  • Install dependencies with uv run pip install -e . or pip install -e . (Python 3.10+ recommended).
  • Prepare the dataset: stereo WAV files plus a .jsonl index. A sample 14GB dataset can be downloaded via snapshot_download("kyutai/DailyTalkContiguous") (see the sketch after this list).
  • Training can be initiated with torchrun --nproc-per-node <N> -m train <config_file.yaml>.
  • Inference requires installing the moshi package and running python -m moshi.server.
  • An official Colab notebook is available for interactive guidance.
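The snapshot_download call in the dataset step comes from the huggingface_hub package. A minimal sketch, assuming the sample is hosted as a dataset repo on the Hugging Face Hub (the local_dir destination is an arbitrary choice):

```python
# Fetch the sample DailyTalkContiguous dataset (~14GB) from the
# Hugging Face Hub. Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    "kyutai/DailyTalkContiguous",
    repo_type="dataset",     # assumption: hosted as a dataset repo
    local_dir="daily-talk",  # arbitrary local destination
)
print(f"Dataset downloaded to {local_path}")
```

Training is then launched with the torchrun command above, pointing at your YAML config.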

Highlighted Details

  • Supports LoRA for efficient fine-tuning with configurable rank.
  • Includes an annotate.py script for generating JSON transcripts, with SLURM support for distributed annotation.
  • Reports performance benchmarks measured on H100 GPUs, including token-throughput figures.
  • Integrates with Weights & Biases (W&B) for experiment monitoring and visualization (see the generic logging sketch after this list).
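The project wires W&B in through its training configuration rather than user code. Purely as a generic illustration of the kind of tracking this provides (not moshi-finetune's own integration), logging metrics with the wandb Python API looks like this:

```python
# Generic Weights & Biases logging sketch. This is NOT moshi-finetune
# code; the project enables W&B through its training config instead.
import wandb

run = wandb.init(
    project="moshi-finetune-demo",         # hypothetical project name
    config={"lora_rank": 16, "lr": 2e-5},  # hyperparameters to record
)

for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder metric
    wandb.log({"train/loss": loss}, step=step)

run.finish()
```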

Maintenance & Community

  • The project acknowledges that it builds on code from mistral-finetune (Apache License 2.0).
  • No specific community links (Discord/Slack) or roadmap details are provided in the README.

Licensing & Compatibility

  • The project itself does not explicitly state a license in the README.
  • It utilizes code from mistral-finetune, which is licensed under Apache License 2.0.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the license for the moshi-finetune repository itself, which could impact commercial adoption. There is also no mention of community support channels or a public roadmap.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 62 stars in the last 90 days
