MixtralKit by open-compass

Toolkit for inference/evaluation of Mistral AI's 'mixtral-8x7b-32kseqlen'

created 1 year ago
767 stars

Top 46.4% on sourcepulse

1 Expert Loves This Project
Project Summary

MixtralKit provides a toolkit for efficient inference and evaluation of the Mixtral-8x7B-32Kseqlen model. It is designed for researchers and developers working with large language models, offering a streamlined way to deploy and benchmark this specific Mixture-of-Experts (MoE) architecture.

How It Works

MixtralKit leverages a Mixture-of-Experts (MoE) architecture, where the Feed-Forward Network (FFN) layer in standard transformer blocks is replaced by an MoE FFN. This MoE FFN uses a gating layer to select the top-k out of 8 experts for each token, enabling sparse activation and potentially more efficient computation. The model utilizes RMSNorm, similar to LLaMA, and features specific QKV matrix shapes for its attention layers.
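
The routing idea can be summarized in a few lines of PyTorch. The sketch below is illustrative only, not MixtralKit's actual implementation; the module name MoEFFN, the expert MLP shape, and the default top_k=2 are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFFN(nn.Module):
    """Illustrative sparse MoE feed-forward block: a gate picks top-k of 8 experts per token."""

    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Compute router logits, keep only the top-k experts per token.
        scores = self.gate(x)                                   # (tokens, num_experts)
        weights, indices = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                    # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                    # tokens routed to expert e in this slot
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)       # (n_routed_tokens, 1)
                    out[mask] += w * expert(x[mask])
        return out
```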

Quick Start & Requirements

  • Install: Create the environment with conda create --name mixtralkit python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y, activate it, clone the repository, then run pip install -r requirements.txt followed by pip install -e . from the repository root.
  • Prerequisites: Python 3.10, PyTorch with CUDA support. Requires downloading model checkpoints (available via Hugging Face or magnet link).
  • Setup: Link to checkpoints folder using ln -s path/to/checkpoints_folder/ ckpts.
  • Inference: Run python tools/example.py -m ./ckpts -t ckpts/tokenizer.model --num-gpus 2 (a scripted variant of this command is sketched after this list).
  • Evaluation: Requires cloning and installing OpenCompass, downloading datasets, and linking model weights and MixtralKit's playground scripts within the OpenCompass directory.
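
For scripted or repeated runs, the documented inference command can also be driven from Python. The snippet below only wraps the CLI invocation shown above and assumes it is executed from the repository root, where the ckpts symlink was created.

```python
import subprocess

# Invoke the documented example script on 2 GPUs; ./ckpts is the symlink
# to the downloaded checkpoints and tokenizer created during setup.
subprocess.run(
    [
        "python", "tools/example.py",
        "-m", "./ckpts",
        "-t", "ckpts/tokenizer.model",
        "--num-gpus", "2",
    ],
    check=True,
)
```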

Highlighted Details

  • Provides performance benchmarks against other leading LLMs on various datasets.
  • Supports inference through vLLM for efficient deployment (see the sketch after this list).
  • Includes fine-tuning scripts (full-parameter or QLoRA) via XTuner.
  • Offers detailed information on MoE architecture and related research papers.
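
As an illustration of the vLLM path, the sketch below serves the Hugging Face Mixtral checkpoint through vLLM's Python API. The model id mistralai/Mixtral-8x7B-v0.1, tensor_parallel_size=2, and the sampling parameters are assumptions for the example; MixtralKit's own vLLM integration may use different entry points.

```python
from vllm import LLM, SamplingParams

# Illustrative vLLM inference with a Hugging Face Mixtral checkpoint.
# tensor_parallel_size=2 splits the model across two GPUs; adjust to your hardware.
llm = LLM(model="mistralai/Mixtral-8x7B-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

outputs = llm.generate(["Explain mixture-of-experts routing in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```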

Maintenance & Community

The project is associated with the OpenCompass initiative. Links to relevant resources like MoE blogs and papers are provided.

Licensing & Compatibility

The repository is licensed under Apache 2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

This is described as an experimental implementation. The README focuses on Mixtral-8x7B-32Kseqlen, and compatibility with other Mixtral variants or models is not explicitly stated. Evaluation setup requires a separate installation of OpenCompass.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

Framework for LLM inference optimization experimentation

Top 0.4% on sourcepulse
15k stars
created 1 year ago
updated 3 days ago