Fine-tune audio models with LoRA
Moshi-Finetune offers a streamlined workflow for fine-tuning Moshi speech models with LoRA, enabling users to adapt pre-trained models to custom conversational datasets. It targets researchers and developers building personalized voice assistants or specialized audio transcription tools. The primary benefit is efficient, lightweight model adaptation without the extensive computational resources that full fine-tuning requires.
How It Works
The project leverages LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning, significantly reducing the number of trainable parameters. It processes stereo audio, using the left channel for generated audio and the right for user input, with associated JSON files containing timestamped transcripts. Training is orchestrated via YAML configuration files, allowing customization of hyperparameters, dataset paths, and LoRA settings.
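To make the YAML-driven setup concrete, here is a minimal sketch of such a config. The key names (`data.train_data`, `lora.rank`, and so on) are illustrative assumptions modeled on common LoRA fine-tuning setups, not the repository's exact schema; the example configs shipped with the repo are authoritative.

```yaml
# Hypothetical training config: key names are assumptions, not the
# repository's actual schema.
data:
  train_data: /path/to/index.jsonl  # .jsonl index of stereo audio + transcripts
lora:
  enable: true
  rank: 16        # low-rank dimension; smaller rank means fewer trainable params
lr: 2.0e-5
batch_size: 4
max_steps: 1000
save_dir: /path/to/checkpoints
```

A file like this is then passed to the training entry point, e.g. `torchrun --nproc-per-node 2 -m train finetune.yaml` for a two-GPU run (the config file name here is hypothetical).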
Quick Start & Requirements
- Clone: `git clone git@github.com:kyutai-labs/moshi-finetune.git`.
- Install: `uv run pip install -e .` or `pip install -e .` (Python 3.10+ recommended).
- Data: stereo audio paired with timestamped JSON transcripts, referenced from a `.jsonl` index. A sample 14 GB dataset can be downloaded via `snapshot_download("kyutai/DailyTalkContiguous")`; see the sketch after this list.
- Train: `torchrun --nproc-per-node <N> -m train <config_file.yaml>`.
- Inference: install the `moshi` package and run `python -m moshi.server`.
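As a sketch of the data step, the snippet below downloads the sample dataset with `huggingface_hub.snapshot_download` and peeks at a `.jsonl` index entry. The `repo_type="dataset"` argument, the glob pattern, and the transcript shape in the final comment are assumptions; the source shows only the repo id.

```python
import json
from pathlib import Path

from huggingface_hub import snapshot_download

# Download the ~14 GB sample dataset named above.
# repo_type="dataset" is an assumption; the source shows only the repo id.
data_dir = Path(snapshot_download("kyutai/DailyTalkContiguous", repo_type="dataset"))

# Peek at the first entry of a .jsonl index file, if one is present.
# The file layout and field names are not specified in this summary,
# so treat whatever prints here as the authoritative schema.
index_file = next(data_dir.glob("**/*.jsonl"), None)
if index_file is not None:
    with index_file.open() as f:
        print(json.loads(f.readline()))

# Each audio file is expected to have a companion JSON transcript with
# word-level timestamps, conceptually something like (fields hypothetical):
#   {"alignments": [["Hello", [0.00, 0.42]], ["there", [0.42, 0.77]]]}
```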
Highlighted Details
- `annotate.py` script for generating JSON transcripts, with SLURM support for distributed annotation.
Maintenance & Community
- Adapted from `mistral-finetune` (Apache License 2.0).
Licensing & Compatibility
- The codebase builds on `mistral-finetune`, which is licensed under Apache License 2.0.
Limitations & Caveats
- The README does not specify the license for the `moshi-finetune` repository itself, which could impact commercial adoption.
- No community support channels or public roadmap are mentioned.