Chinese-Mixtral-8x7B: a base model for the Chinese language
This project provides Chinese-Mixtral-8x7B, an open-source Mixture-of-Experts (MoE) large language model enhanced for Chinese language processing. It offers a base model and fine-tuned versions, targeting researchers and developers looking to leverage advanced MoE architectures for Chinese NLP tasks, with benefits including improved Chinese tokenization efficiency and strong performance on both Chinese and English benchmarks.
How It Works
The project builds upon Mistral's Mixtral-8x7B architecture by expanding its vocabulary with a custom Chinese token set trained with SentencePiece on Chinese corpora, which substantially improves the model's Chinese tokenization efficiency. Incremental pre-training is then performed on the modified model over a large corpus of Chinese and English data, including SkyPile and SlimPajama, to give it strong Chinese generation and understanding capabilities. Training uses QLoRA, combining 4-bit quantization of the base weights with low-rank adapters and other memory-saving techniques to keep hardware requirements manageable.
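A minimal sketch of the QLoRA-style setup described above (4-bit base weights plus LoRA adapters), assuming the Hugging Face `transformers`, `bitsandbytes`, and `peft` stack; the model id and LoRA hyperparameters are illustrative placeholders, not the project's exact training configuration.

```python
# Sketch: load the base model in 4-bit and attach LoRA adapters (QLoRA-style).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "HIT-SCIR/Chinese-Mixtral-8x7B"  # placeholder; point this at the checkpoint you downloaded

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # do matmuls in bf16
    bnb_4bit_use_double_quant=True,         # nested quantization saves extra memory
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # spread experts across available GPUs
)

# Only the low-rank adapter weights are trained; the quantized base stays frozen.
lora_config = LoraConfig(
    r=64,                                   # illustrative rank, not the project's setting
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```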
Quick Start & Requirements
```bash
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 transformers==4.36.2 datasets evaluate peft accelerate gradio optimum sentencepiece trl jupyterlab scikit-learn pandas matplotlib tensorboard nltk rouge bitsandbytes fire
```
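Once the dependencies are installed, a minimal inference sketch looks like the following; the checkpoint id is an assumption, so substitute the path or Hugging Face id of the model you actually downloaded.

```python
# Sketch: load the base model and generate a Chinese continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HIT-SCIR/Chinese-Mixtral-8x7B"  # assumed id; replace with your local path if needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",   # shard the MoE weights across available GPUs
)

prompt = "中国的四大发明是"  # "The Four Great Inventions of China are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this is a base (non-chat) model, so it is best prompted with text to continue rather than with instructions.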
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats