parameter-efficient-moe by Cohere-Labs-Community

Research code for parameter-efficient Mixture of Experts (MoE) instruction tuning

created 1 year ago
269 stars

Top 96.2% on sourcepulse

Project Summary

This repository provides the official code for "Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning." It enables researchers and practitioners to implement and experiment with Mixture-of-Experts (MoE) models for instruction tuning, offering significant parameter efficiency gains.

How It Works

The codebase leverages T5X, Flaxformer, Flax, and JAX for its architecture and training loops. It implements parameter-efficient fine-tuning techniques such as IA3 and LoRA, along with the paper's Mixture-of-Experts variants: Mixture of Vectors (MoV) and Mixture of LoRA (MoLoRA). This approach allows large language models to be adapted to instructions while updating only a small fraction of their parameters.
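To make the MoV idea concrete, here is a minimal sketch, not the repository's actual implementation: each "expert" is an IA3-style scaling vector, and a small router soft-mixes them per token before rescaling a frozen layer's activations. The module name `MoVAdapter` and the specific router and initialization choices are illustrative assumptions.

```python
# A minimal, hypothetical sketch of a Mixture-of-Vectors (MoV) adapter in Flax.
# Each expert is an IA3-style scaling vector; a router mixes them per token.
import jax
import jax.numpy as jnp
import flax.linen as nn


class MoVAdapter(nn.Module):
    num_experts: int   # number of IA3-style scaling vectors
    hidden_dim: int    # dimension of the activations being rescaled

    @nn.compact
    def __call__(self, activations: jnp.ndarray) -> jnp.ndarray:
        # Router: per-token soft weights over the expert vectors.
        router_logits = nn.Dense(self.num_experts, name="router")(activations)
        router_probs = jax.nn.softmax(router_logits, axis=-1)  # [..., num_experts]

        # Expert vectors, initialized to ones so the adapter starts as identity.
        expert_vectors = self.param(
            "expert_vectors",
            nn.initializers.ones,
            (self.num_experts, self.hidden_dim),
        )

        # Soft-merge the expert vectors per token, then rescale the activations.
        mixed_scale = jnp.einsum("...e,ed->...d", router_probs, expert_vectors)
        return activations * mixed_scale


if __name__ == "__main__":
    layer = MoVAdapter(num_experts=4, hidden_dim=16)
    x = jnp.ones((2, 8, 16))  # [batch, seq, hidden]
    params = layer.init(jax.random.PRNGKey(0), x)
    print(layer.apply(params, x).shape)  # (2, 8, 16)
```

Only the router and the expert vectors would be trained; the underlying T5X model weights stay frozen, which is where the parameter efficiency comes from.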

Quick Start & Requirements

  • Installation: Clone the repository and copy it to TPUs using gcloud alpha compute tpus tpu-vm scp.
  • Prerequisites: Google Cloud TPUs, SeqIO for dataset caching (e.g., the bigscience/P3 dataset), and specific versions of T5X, Flaxformer, Flax, and JAX; a hedged SeqIO loading sketch follows this list.
  • Setup: Training and evaluation scripts are provided and configured via Gin files; example commands for both are included.
  • Documentation: The README points to bigscience/t-zero for dataset preparation.
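As a rough illustration of the SeqIO caching prerequisite, the sketch below loads a cached task. The cache path and task name are placeholder assumptions (not values from this repository), and task registration (e.g., for the bigscience/P3 prompts) is assumed to have happened already by importing the project's task definitions.

```python
# A hedged sketch of loading a SeqIO-cached dataset; paths and names are hypothetical.
import seqio

# Point SeqIO at the directory where the cached task data was written.
seqio.add_global_cache_dirs(["gs://your-bucket/seqio_cache"])  # hypothetical path

# Assumes the relevant P3 tasks have already been registered elsewhere.
task = seqio.get_mixture_or_task("p3_example_task")  # hypothetical task name
ds = task.get_dataset(
    sequence_length={"inputs": 1024, "targets": 256},
    split="train",
    use_cached=True,
    shuffle=False,
)
for example in ds.take(1):
    print({k: v.shape for k, v in example.items()})
```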

Highlighted Details

  • Implements parameter-efficient fine-tuning methods: IA3, LoRA, MoV, and MoLoRA (see the MoLoRA sketch after this list).
  • Built on the T5X, Flaxformer, Flax, and JAX ecosystem.
  • Supports instruction tuning for large language models.
  • Codebase includes Gin configurations for model architectures and training parameters.
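For contrast with the MoV sketch above, here is an equally hypothetical sketch of the MoLoRA idea: each expert is a LoRA pair (A, B), and a router soft-mixes the low-rank updates added to a frozen dense projection. The module name `MoLoRADense` and the alpha/rank scaling convention are assumptions, not the repository's code.

```python
# A minimal, hypothetical MoLoRA-style dense layer in Flax.
import jax
import jax.numpy as jnp
import flax.linen as nn


class MoLoRADense(nn.Module):
    features: int      # output dimension of the (frozen) dense layer
    num_experts: int   # number of LoRA experts
    rank: int          # LoRA rank
    alpha: float = 16.0

    @nn.compact
    def __call__(self, x: jnp.ndarray) -> jnp.ndarray:
        in_dim = x.shape[-1]

        # Frozen base projection (its params would be excluded from the optimizer).
        base_out = nn.Dense(self.features, name="frozen_dense")(x)

        # Per-token router over the LoRA experts.
        probs = jax.nn.softmax(nn.Dense(self.num_experts, name="router")(x), axis=-1)

        # LoRA experts: A maps down to `rank`, B maps back up to `features`.
        lora_a = self.param("lora_a", nn.initializers.normal(0.02),
                            (self.num_experts, in_dim, self.rank))
        lora_b = self.param("lora_b", nn.initializers.zeros,
                            (self.num_experts, self.rank, self.features))

        # Per-expert low-rank updates, soft-merged by the router weights.
        down = jnp.einsum("...d,edr->...er", x, lora_a)
        up = jnp.einsum("...er,erf->...ef", down, lora_b)
        delta = jnp.einsum("...e,...ef->...f", probs, up)
        return base_out + (self.alpha / self.rank) * delta
```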

Maintenance & Community

The project is associated with Cohere-Labs-Community and the authors of the cited paper. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The setup is heavily reliant on Google Cloud TPUs and specific infrastructure configurations, making it less accessible for users without this environment. The README does not detail specific performance benchmarks or provide direct links to community channels or roadmaps.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 13 stars in the last 90 days
