Code for a research paper on instruction-tuning a Chinese chat model
Aurora enhances the Chinese conversational capabilities of the Mixtral-8x7B sparse Mixture-of-Experts model through instruction fine-tuning. It targets researchers and developers seeking to improve LLM performance on Chinese language tasks, offering a specialized model derived from a powerful MoE architecture.
How It Works
Aurora is built by instruction fine-tuning the Mixtral-8x7B model on a curated combination of three Chinese instruction-following datasets. This approach leverages machine-generated instructions to imbue the model with zero-shot capabilities on novel Chinese conversational tasks, a pioneering application of instruction tuning to a sparse MoE model.
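To make the data side concrete, the sketch below shows an Alpaca-style instruction record and how such a record is typically flattened into a supervised fine-tuning prompt. The field names and the prompt template are illustrative assumptions, not the project's exact dataset schema or chat template.

# Illustrative Alpaca-style instruction record (assumed schema; the project's
# actual dataset format and prompt template may differ).
record = {
    "instruction": "把下面的句子翻译成英文。",  # "Translate the following sentence into English."
    "input": "今天天气非常好。",
    "output": "The weather is very nice today.",
}

# During supervised fine-tuning, each record is flattened into a single prompt
# and the model is trained to generate the "output" text that follows it.
prompt = (
    "### Instruction:\n" + record["instruction"] + "\n\n"
    "### Input:\n" + record["input"] + "\n\n"
    "### Response:\n"
)
target = record["output"]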
Quick Start & Requirements
pip install -r requirements.txt
Run python src/web_demo.py to launch a Gradio interface.
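Beyond the bundled web demo, the fine-tuned model can be loaded directly with the Hugging Face transformers API. The checkpoint path below is a placeholder, since the exact published model identifier is not stated here, and the plain prompt stands in for whatever chat template the model was trained with.

# Minimal inference sketch; "path/to/aurora-mixtral-8x7b" is a placeholder for
# the released checkpoint, and the prompt format is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/aurora-mixtral-8x7b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # half precision; an 8x7B MoE still needs 40GB+ VRAM
    device_map="auto",           # shard the experts across available GPUs
)

prompt = "请介绍一下澳门。"  # "Please introduce Macao."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))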
Highlighted Details
Maintenance & Community
The project is primarily developed at the Faculty of Applied Sciences, Macao Polytechnic University, and builds on the LLaMA-Factory fine-tuning framework.
Licensing & Compatibility
Limitations & Caveats
The project requires substantial GPU resources (40GB+ VRAM) for both training and inference, potentially limiting accessibility for users with less powerful hardware. While benchmarks are provided, comprehensive evaluation across a broader range of Chinese NLP tasks may be needed.
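For hardware below that threshold, 4-bit quantized loading via bitsandbytes is a common workaround. The sketch below assumes the checkpoint is compatible with standard transformers quantization; it is not a documented feature of this project, and quantization trades some output quality for lower memory use.

# Sketch: swap the from_pretrained call in the earlier example for a 4-bit
# quantized load (assumes the bitsandbytes package is installed).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/aurora-mixtral-8x7b",  # placeholder checkpoint path
    quantization_config=quant_config,
    device_map="auto",
)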