MixtralKit by open-compass

Toolkit for inference/evaluation of Mistral AI's 'mixtral-8x7b-32kseqlen'

Created 1 year ago
769 stars

Top 45.4% on SourcePulse

Project Summary

MixtralKit provides a toolkit for efficient inference and evaluation of the Mixtral-8x7B-32Kseqlen model. It is designed for researchers and developers working with large language models, offering a streamlined way to deploy and benchmark this specific Mixture-of-Experts (MoE) architecture.

How It Works

MixtralKit targets the Mixtral Mixture-of-Experts (MoE) architecture, in which the Feed-Forward Network (FFN) of each standard transformer block is replaced by an MoE FFN. A gating layer scores 8 experts per token and routes the token to the top-k of them (top-2 in Mixtral), so only a fraction of the FFN parameters is activated for any given token. The model uses RMSNorm, as in LLaMA, and its attention layers have QKV matrix shapes specific to this architecture.
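A minimal PyTorch sketch of this routing step is shown below. It illustrates top-k gating as described above and is not MixtralKit's actual code; the class name, the SwiGLU expert layout, and the default dimensions (chosen to match Mixtral-8x7B's published sizes) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Sparse MoE FFN: a gate scores n experts and routes each token to the top-k."""

    def __init__(self, dim=4096, hidden_dim=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, n_experts, bias=False)
        # Each expert is assumed to be a SwiGLU-style FFN, mirroring LLaMA/Mistral blocks.
        self.experts = nn.ModuleList(
            nn.ModuleDict({
                "w1": nn.Linear(dim, hidden_dim, bias=False),
                "w2": nn.Linear(hidden_dim, dim, bias=False),
                "w3": nn.Linear(dim, hidden_dim, bias=False),
            })
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (num_tokens, dim)
        scores = self.gate(x)                    # (num_tokens, n_experts)
        weights, chosen = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slots = torch.where(chosen == e)   # tokens routed to expert e
            if rows.numel() == 0:
                continue
            h = F.silu(expert["w1"](x[rows])) * expert["w3"](x[rows])
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert["w2"](h)
        return out
```

Only top_k of the n_experts expert FFNs run for each token, which is where the sparse-activation efficiency described above comes from.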

Quick Start & Requirements

  • Install: Create an environment with conda create --name mixtralkit python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y, activate it, clone the repository, then run pip install -r requirements.txt followed by pip install -e . to install MixtralKit itself (the full sequence is consolidated in the sketch after this list).
  • Prerequisites: Python 3.10, PyTorch with CUDA support. Requires downloading model checkpoints (available via Hugging Face or magnet link).
  • Setup: Link to checkpoints folder using ln -s path/to/checkpoints_folder/ ckpts.
  • Inference: Run python tools/example.py -m ./ckpts -t ckpts/tokenizer.model --num-gpus 2.
  • Evaluation: Requires cloning and installing OpenCompass, downloading datasets, and linking model weights and MixtralKit's playground scripts within the OpenCompass directory.
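Put together, the documented install and inference steps look roughly like the shell session below. The repository URL is an assumption, path/to/checkpoints_folder is a placeholder, and the checkpoint folder must already contain the downloaded weights and tokenizer.

```bash
# Create the environment and install MixtralKit (commands from the Quick Start list above)
conda create --name mixtralkit python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate mixtralkit
git clone https://github.com/open-compass/MixtralKit.git   # assumed repository URL
cd MixtralKit
pip install -r requirements.txt
pip install -e .

# Link the downloaded checkpoints (path/to/checkpoints_folder is a placeholder)
ln -s path/to/checkpoints_folder/ ckpts

# Run the bundled inference example on two GPUs
python tools/example.py -m ./ckpts -t ckpts/tokenizer.model --num-gpus 2
```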

Highlighted Details

  • Provides performance benchmarks against other leading LLMs on various datasets.
  • Supports inference using vLLM for efficient deployment (a minimal vLLM call is sketched after this list).
  • Includes fine-tuning scripts (Full-parameters or QLoRA) via XTuner.
  • Offers detailed information on MoE architecture and related research papers.
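For the vLLM bullet above, inference can be sketched with vLLM's standard offline API. This is a generic example rather than MixtralKit's own integration; the Hugging Face model id and the two-GPU tensor-parallel setting are assumptions.

```python
from vllm import LLM, SamplingParams

# Assumed model id; a locally downloaded Mixtral checkpoint path would also work.
llm = LLM(model="mistralai/Mixtral-8x7B-v0.1", tensor_parallel_size=2)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain mixture-of-experts routing in one sentence."], params)
print(outputs[0].outputs[0].text)
```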

Maintenance & Community

The project is associated with the OpenCompass initiative. Links to relevant resources like MoE blogs and papers are provided.

Licensing & Compatibility

The repository is licensed under Apache 2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

This is described as an experimental implementation. The README focuses on Mixtral-8x7B-32Kseqlen, and compatibility with other Mixtral variants or models is not explicitly stated. Evaluation setup requires a separate installation of OpenCompass.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Casper Hansen (Author of AutoAWQ), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 5 more.

xtuner by InternLM — LLM fine-tuning toolkit for research
Top 0.5% · 5k stars · Created 2 years ago · Updated 1 day ago