parameter-efficient-moe by Cohere-Labs-Community

Research code for parameter-efficient Mixture of Experts (MoE) instruction tuning

Created 2 years ago
270 stars

Top 95.2% on SourcePulse

View on GitHub
Project Summary

This repository provides the official code for "Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning." It enables researchers and practitioners to implement and experiment with parameter-efficient Mixture-of-Experts (MoE) methods for instruction tuning while updating only a small fraction of a model's parameters.

How It Works

The codebase is built on T5X, Flaxformer, Flax, and JAX for its model architectures and training loops. It implements parameter-efficient fine-tuning techniques such as IA3 and LoRA alongside the paper's novel Mixture-of-Experts (MoE) variants, Mixture of Vectors (MoV) and Mixture of LoRA (MoLoRA), which route between lightweight experts (scaling vectors or low-rank adapters) rather than full feed-forward blocks. This allows large language models to be adapted with minimal parameter updates.
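
For intuition, here is a minimal, self-contained sketch of the MoV idea in Flax: each expert is an IA3-style scaling vector, and a small router soft-merges the expert vectors per token before they rescale a layer's activations. Module and parameter names are illustrative assumptions, not the repository's actual API.

    # Minimal MoV-style module in Flax (illustrative, not the repo's actual code):
    # each expert is an IA3-style scaling vector, and a small router soft-merges
    # the expert vectors per token before they rescale the layer's activations.
    import jax
    import jax.numpy as jnp
    import flax.linen as nn

    class MixtureOfVectors(nn.Module):
        num_experts: int  # number of IA3-style scaling vectors
        features: int     # width of the activations being rescaled

        @nn.compact
        def __call__(self, hidden):  # hidden: [batch, seq, features]
            # Router: a tiny dense layer producing soft weights over experts.
            router_logits = nn.Dense(self.num_experts, name="router")(hidden)
            gates = jax.nn.softmax(router_logits, axis=-1)  # [batch, seq, experts]

            # Experts: one learned scaling vector each, initialised to ones so the
            # frozen model's behaviour is unchanged at the start of tuning.
            experts = self.param("expert_vectors", nn.initializers.ones,
                                 (self.num_experts, self.features))

            # Soft-merge the expert vectors per token, then rescale the activations.
            mixed_scale = jnp.einsum("bse,ef->bsf", gates, experts)
            return hidden * mixed_scale

    # Usage: rescale a dummy activation tensor with 4 expert vectors.
    x = jnp.ones((2, 8, 16))
    model = MixtureOfVectors(num_experts=4, features=16)
    params = model.init(jax.random.PRNGKey(0), x)
    y = model.apply(params, x)  # same shape as x

Only the router and the expert vectors are trained; the backbone weights stay frozen, which is what keeps the approach extremely parameter efficient.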

Quick Start & Requirements

  • Installation: Clone the repository and copy it to the TPU VM using gcloud alpha compute tpus tpu-vm scp.
  • Prerequisites: Requires Google Cloud TPUs, SeqIO for dataset caching (e.g., the bigscience/P3 dataset; see the sketch after this list), and specific versions of T5X, Flaxformer, Flax, and JAX.
  • Setup: Training and evaluation scripts are provided and are configured via Gin files; example commands for both are included.
  • Documentation: The README points to bigscience/t-zero for dataset preparation.
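
As a rough illustration of the SeqIO step, the snippet below loads one cached task once the offline caching pipeline has run. The task name and sequence lengths are placeholders, not registry entries defined by this repository.

    # Illustrative only: read one cached SeqIO task after the offline caching
    # pipeline has produced it. The task name and sequence lengths below are
    # placeholders, not names registered by this repository.
    import seqio

    task = seqio.get_mixture_or_task("p3_example_task")  # hypothetical task name
    ds = task.get_dataset(
        sequence_length={"inputs": 1024, "targets": 256},
        split="train",
        use_cached=True,  # requires the SeqIO cache to exist for this task
        shuffle=True,
    )
    for example in ds.take(1):
        print({k: v.shape for k, v in example.items()})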

Highlighted Details

  • Implements parameter-efficient fine-tuning methods: IA3, LoRA, MoV, and MoLoRA.
  • Built on the T5X, Flaxformer, Flax, and JAX ecosystem.
  • Supports instruction tuning for large language models.
  • Codebase includes Gin configurations for model architectures and training parameters.

Maintenance & Community

The project is associated with Cohere-Labs-Community and the authors of the cited paper. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The setup is heavily reliant on Google Cloud TPUs and specific infrastructure configurations, making it less accessible for users without this environment. The README does not detail specific performance benchmarks or provide direct links to community channels or roadmaps.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Junyang Lin (Core Maintainer at Alibaba Qwen), and 3 more.

Alpaca-CoT by PhoebusSi
  • 0.1% · 3k stars
  • IFT platform for instruction collection, parameter-efficient methods, and LLMs
  • Created 2 years ago · Updated 1 year ago

Starred by Casper Hansen (Author of AutoAWQ), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 5 more.

xtuner by InternLM
  • 0.5% · 5k stars
  • LLM fine-tuning toolkit for research
  • Created 2 years ago · Updated 1 day ago