Chinese-Mixtral: MoE LLMs for the Chinese language
Top 55.0% on sourcepulse
This project provides Chinese-language versions of Mistral AI's Mixtral models, specifically a base model (Chinese-Mixtral) and an instruction-tuned variant (Chinese-Mixtral-Instruct). It targets researchers and developers needing high-performance LLMs for Chinese text processing, offering significant improvements in long-context understanding, mathematical reasoning, and code generation, with efficient deployment options.
How It Works
The project builds on Mistral AI's sparse Mixture-of-Experts (MoE) architecture, in which 2 of 8 experts are activated per token. Chinese-Mixtral was created by incrementally pre-training Mixtral-8x7B-v0.1 on large-scale unlabeled Chinese data; Chinese-Mixtral-Instruct further fine-tunes that base on instruction datasets. The MoE design gives a large total parameter count (roughly 47B) while only about 13B parameters are active per token, keeping inference efficient, and the models natively support a 32K context that is extendable to 128K.
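To make the routing concrete, here is a simplified PyTorch sketch of top-2 expert routing. The default dimensions follow Mixtral-8x7B, but the expert MLP is a plain SiLU feed-forward rather than Mixtral's gated SwiGLU, and the per-expert loop is written for readability, not speed; it is an illustration, not the project's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Simplified sparse MoE feed-forward layer: each token is routed to its top-2 experts."""

    def __init__(self, hidden_size=4096, ffn_size=14336, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size, bias=False),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        router_logits = self.gate(x)                                   # (tokens, num_experts)
        weights, expert_ids = torch.topk(router_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                           # renormalize over the top-2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e                        # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = Top2MoELayer(hidden_size=64, ffn_size=128)   # tiny dimensions for a quick check
tokens = torch.randn(4, 64)                          # 4 token embeddings
print(layer(tokens).shape)                           # torch.Size([4, 64])
```

Only the selected experts run for a given token, which is why the active parameter count stays far below the total parameter count.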
Quick Start & Requirements
Deployment is supported via llama.cpp, transformers, vLLM, text-generation-webui, etc. llama.cpp requires minimal resources (16GB RAM/VRAM for quantized models); GPU acceleration is recommended for optimal performance.
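As an orientation, the sketch below loads the instruction-tuned model with transformers and 4-bit quantization. The Hugging Face repo id and the quantization settings are assumptions; check the project's model listings for the exact names and recommended configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "hfl/chinese-mixtral-instruct"  # assumed repo id; verify against the project page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit weights to fit smaller GPUs
    device_map="auto",                                          # spread layers across available devices
)

prompt = "请简要介绍混合专家（MoE）模型。"  # "Briefly introduce Mixture-of-Experts models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```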
Highlighted Details
Quantized versions of the models can be run locally with llama.cpp.
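For the llama.cpp path, the sketch below uses the llama-cpp-python bindings rather than the raw CLI; the GGUF file name is a placeholder, and the actual file and quantization level depend on which converted model you download.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./chinese-mixtral-instruct-q4_k_m.gguf",  # placeholder file name
    n_ctx=4096,        # context window for this session
    n_gpu_layers=-1,   # offload all layers to the GPU when one is available
)

result = llm("请写一首关于春天的短诗。", max_tokens=128)  # "Write a short poem about spring."
print(result["choices"][0]["text"])
```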
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project relies on the base Mixtral model and inherits its architectural characteristics. Output accuracy is not guaranteed due to numerical precision, sampling randomness, and quantization effects. Users remain responsible for verifying the compliance of model outputs in commercial applications.