Chinese-Mixtral by ymcui

Chinese-Mixtral: MoE LLMs for the Chinese language

Created 1 year ago · 604 stars · Top 55.0% on sourcepulse

Project Summary

This project provides Chinese-language versions of Mistral AI's Mixtral models, specifically a base model (Chinese-Mixtral) and an instruction-tuned variant (Chinese-Mixtral-Instruct). It targets researchers and developers who need high-performance LLMs for Chinese text processing, offering strong long-context understanding, mathematical reasoning, and code generation, along with efficient deployment options.

How It Works

The project leverages Mistral AI's sparse Mixture-of-Experts (MoE) architecture, in which a router activates 2 of 8 experts for each token. Chinese-Mixtral is an incremental pre-training of Mixtral-8x7B-v0.1 on large-scale unlabeled Chinese data; Chinese-Mixtral-Instruct is further fine-tuned on instruction datasets. The MoE design gives the model a large total parameter count while activating only about 13B parameters per token, keeping inference efficient, and the models natively support a 32K context, tested up to 128K. A conceptual sketch of top-2 expert routing follows.
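
The routing idea can be illustrated with a small, self-contained sketch. This is not the project's or Mixtral's actual implementation (Mixtral's experts are gated SwiGLU feed-forward blocks and its router includes additional details); it only shows how a top-2 gate dispatches each token to two of eight expert MLPs and mixes their outputs.

```python
# Conceptual top-2 MoE routing sketch (illustrative only, not the
# Chinese-Mixtral/Mixtral implementation). Requires PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model=4096, d_ff=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        # Simplified experts: plain 2-layer MLPs (Mixtral uses gated SwiGLU blocks).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (num_tokens, d_model)
        logits = self.gate(x)                      # (num_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # each token visits only top_k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e              # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the two selected experts run for a given token, which is why per-token compute stays close to that of a ~13B-parameter dense model even though the full parameter count is much larger.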

Quick Start & Requirements

  • Installation: Download pre-trained models (full, LoRA, or GGUF formats) from Hugging Face, ModelScope, or Baidu Netdisk. Deploy via llama.cpp, transformers, vLLM, text-generation-webui, etc. (a minimal transformers loading sketch follows this list).
  • Prerequisites: Vary by deployment method; llama.cpp needs as little as 16GB of RAM/VRAM for quantized models, and GPU acceleration is recommended for best performance.
  • Resources: Full models are ~87GB, LoRA weights ~2.4GB; quantized GGUF versions significantly reduce the memory footprint.
  • Docs: project documentation and Model Arena.
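
For deployment via Hugging Face transformers, the sketch below shows one plausible loading path. The repo id, chat-style prompt format, and dtype settings are assumptions; check the project's download tables and documentation for the exact names and recommended settings.

```python
# Hedged sketch: loading an instruction-tuned Chinese-Mixtral model with
# Hugging Face transformers. The repo id below is an assumption; use the
# id listed in the project's model download table.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/chinese-mixtral-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # or use 4-bit loading (bitsandbytes) to cut memory further
    device_map="auto",           # requires the accelerate package
)

# Assumes the Mixtral [INST] ... [/INST] chat format; the prompt asks
# (in Chinese) for a three-sentence introduction to MoE models.
prompt = "[INST] 请用三句话介绍混合专家（MoE）模型。 [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```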

Highlighted Details

  • Native 32K context support, tested up to 128K.
  • Quantized versions (e.g., Q4_0) require as little as 16GB RAM/VRAM for inference via llama.cpp (see the sketch after this list).
  • Achieves competitive scores on Chinese benchmarks like C-Eval and CMMLU, and performs well on LongBench for long-context tasks.
  • Provides training and fine-tuning scripts for users to adapt or further train the models.
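
As a concrete illustration of the low-resource path, below is a sketch using the llama-cpp-python bindings to run a Q4_0 GGUF file. The file name and generation settings are placeholders; point model_path at whichever quantized GGUF you downloaded.

```python
# Hedged sketch: CPU/GPU inference on a Q4_0 GGUF quantization via the
# llama-cpp-python bindings (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="chinese-mixtral-instruct.Q4_0.gguf",  # placeholder path to your GGUF file
    n_ctx=4096,        # context window; can be raised toward 32K if memory allows
    n_gpu_layers=-1,   # offload all layers to GPU; set to 0 for CPU-only inference
)

# Prompt asks (in Chinese) for a one-sentence summary of what an MoE model is,
# again assuming the Mixtral [INST] ... [/INST] chat format.
result = llm("[INST] 用一句话总结什么是混合专家模型。 [/INST]", max_tokens=128)
print(result["choices"][0]["text"])
```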

Maintenance & Community

  • Active development with regular updates (e.g., GGUF quantization, API deployment).
  • Community support via GitHub Issues and Discussions.
  • Related projects include Chinese-LLaMA-Alpaca series.

Licensing & Compatibility

  • Based on Mistral AI's Mixtral model (Apache 2.0); users must adhere to its license.
  • Third-party code licenses must also be followed.
  • Commercial use requires adherence to local laws and ensuring output compliance; no liability is assumed by the project.

Limitations & Caveats

The project builds on the base Mixtral model and inherits its architectural characteristics. Model output accuracy is not guaranteed and can be affected by computational precision, sampling randomness, and quantization. Users are responsible for ensuring that model outputs comply with applicable requirements in commercial applications.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
