parameter-efficient-moe by Cohere-Labs-Community

Research code for parameter-efficient Mixture of Experts (MoE) instruction tuning

created 1 year ago
269 stars

Top 96.2% on sourcepulse

Project Summary

This repository provides the official code for "Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning." It enables researchers and practitioners to implement and experiment with Mixture-of-Experts (MoE) models for instruction tuning, offering significant parameter efficiency gains.

How It Works

The codebase leverages T5X, Flaxformer, Flax, and JAX for its architecture and training loops. It implements parameter-efficient fine-tuning techniques such as IA3 and LoRA, along with the paper's Mixture-of-Experts variants: Mixture of Vectors (MoV) and Mixture of LoRA (MoLoRA). This approach allows large language models to be adapted to instructions while updating only a small fraction of their parameters.
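To make the MoV idea concrete, here is a minimal sketch, not the repository's actual implementation: each "expert" is an IA3-style scaling vector, and a small router soft-mixes them per token before rescaling a frozen layer's activations. The module name `MoVAdapter` and the specific router and initialization choices are illustrative assumptions.

```python
# A minimal, hypothetical sketch of a Mixture-of-Vectors (MoV) adapter in Flax.
# Each expert is an IA3-style scaling vector; a router mixes them per token.
import jax
import jax.numpy as jnp
import flax.linen as nn


class MoVAdapter(nn.Module):
    num_experts: int   # number of IA3-style scaling vectors
    hidden_dim: int    # dimension of the activations being rescaled

    @nn.compact
    def __call__(self, activations: jnp.ndarray) -> jnp.ndarray:
        # Router: per-token soft weights over the expert vectors.
        router_logits = nn.Dense(self.num_experts, name="router")(activations)
        router_probs = jax.nn.softmax(router_logits, axis=-1)  # [..., num_experts]

        # Expert vectors, initialized to ones so the adapter starts as identity.
        expert_vectors = self.param(
            "expert_vectors",
            nn.initializers.ones,
            (self.num_experts, self.hidden_dim),
        )

        # Soft-merge the expert vectors per token, then rescale the activations.
        mixed_scale = jnp.einsum("...e,ed->...d", router_probs, expert_vectors)
        return activations * mixed_scale


if __name__ == "__main__":
    layer = MoVAdapter(num_experts=4, hidden_dim=16)
    x = jnp.ones((2, 8, 16))  # [batch, seq, hidden]
    params = layer.init(jax.random.PRNGKey(0), x)
    print(layer.apply(params, x).shape)  # (2, 8, 16)
```

Only the router and the expert vectors would be trained; the underlying T5X model weights stay frozen, which is where the parameter efficiency comes from.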

Quick Start & Requirements

  • Installation: Clone the repository and copy it to TPUs using gcloud alpha compute tpus tpu-vm scp.
  • Prerequisites: Google Cloud TPUs, SeqIO for dataset caching (e.g., the bigscience/P3 dataset), and specific versions of T5X, Flaxformer, Flax, and JAX; a hedged SeqIO loading sketch follows this list.
  • Setup: Training and evaluation scripts are provided and configured via Gin files; example commands for both are included.
  • Documentation: The README points to bigscience/t-zero for dataset preparation.
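As a rough illustration of the SeqIO caching prerequisite, the sketch below loads a cached task. The cache path and task name are placeholder assumptions (not values from this repository), and task registration (e.g., for the bigscience/P3 prompts) is assumed to have happened already by importing the project's task definitions.

```python
# A hedged sketch of loading a SeqIO-cached dataset; paths and names are hypothetical.
import seqio

# Point SeqIO at the directory where the cached task data was written.
seqio.add_global_cache_dirs(["gs://your-bucket/seqio_cache"])  # hypothetical path

# Assumes the relevant P3 tasks have already been registered elsewhere.
task = seqio.get_mixture_or_task("p3_example_task")  # hypothetical task name
ds = task.get_dataset(
    sequence_length={"inputs": 1024, "targets": 256},
    split="train",
    use_cached=True,
    shuffle=False,
)
for example in ds.take(1):
    print({k: v.shape for k, v in example.items()})
```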

Highlighted Details

  • Implements parameter-efficient fine-tuning methods: IA3, LoRA, MoV, and MoLoRA (see the MoLoRA sketch after this list).
  • Built on the T5X, Flaxformer, Flax, and JAX ecosystem.
  • Supports instruction tuning for large language models.
  • Codebase includes Gin configurations for model architectures and training parameters.
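For contrast with the MoV sketch above, here is an equally hypothetical sketch of the MoLoRA idea: each expert is a LoRA pair (A, B), and a router soft-mixes the low-rank updates added to a frozen dense projection. The module name `MoLoRADense` and the alpha/rank scaling convention are assumptions, not the repository's code.

```python
# A minimal, hypothetical MoLoRA-style dense layer in Flax.
import jax
import jax.numpy as jnp
import flax.linen as nn


class MoLoRADense(nn.Module):
    features: int      # output dimension of the (frozen) dense layer
    num_experts: int   # number of LoRA experts
    rank: int          # LoRA rank
    alpha: float = 16.0

    @nn.compact
    def __call__(self, x: jnp.ndarray) -> jnp.ndarray:
        in_dim = x.shape[-1]

        # Frozen base projection (its params would be excluded from the optimizer).
        base_out = nn.Dense(self.features, name="frozen_dense")(x)

        # Per-token router over the LoRA experts.
        probs = jax.nn.softmax(nn.Dense(self.num_experts, name="router")(x), axis=-1)

        # LoRA experts: A maps down to `rank`, B maps back up to `features`.
        lora_a = self.param("lora_a", nn.initializers.normal(0.02),
                            (self.num_experts, in_dim, self.rank))
        lora_b = self.param("lora_b", nn.initializers.zeros,
                            (self.num_experts, self.rank, self.features))

        # Per-expert low-rank updates, soft-merged by the router weights.
        down = jnp.einsum("...d,edr->...er", x, lora_a)
        up = jnp.einsum("...er,erf->...ef", down, lora_b)
        delta = jnp.einsum("...e,...ef->...f", probs, up)
        return base_out + (self.alpha / self.rank) * delta
```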

Maintenance & Community

The project is associated with Cohere-Labs-Community and the authors of the cited paper. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The setup is heavily reliant on Google Cloud TPUs and specific infrastructure configurations, making it less accessible for users without this environment. The README does not detail specific performance benchmarks or provide direct links to community channels or roadmaps.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 13 stars in the last 90 days
