Aurora by WangRongsheng

Code for a research paper on instruction-tuning a Chinese chat model

Created 1 year ago · 263 stars · Top 97.6% on sourcepulse

Project Summary

Aurora enhances the Chinese conversational capabilities of the Mixtral-8x7B sparse Mixture-of-Experts model through instruction fine-tuning. It targets researchers and developers seeking to improve LLM performance on Chinese language tasks, offering a specialized model derived from a powerful MoE architecture.

How It Works

Aurora is built by instruction fine-tuning Mixtral-8x7B on a curated collection of three Chinese instruction-following datasets. This approach leverages machine-generated instructions to give the model zero-shot capability on novel Chinese conversational tasks, a pioneering application of instruction tuning to sparse MoE models.
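
The repository's exact dataset schema isn't reproduced here, but instruction-tuning corpora of this kind are commonly stored in an Alpaca-style instruction/input/output layout; the record below is a hypothetical illustration of that layout, not a sample from Aurora's training data.

    # Hypothetical Alpaca-style instruction record, illustrative only and not
    # taken from Aurora's datasets. Fine-tuning turns each such pair into a
    # supervised target for the Mixtral-8x7B base model.
    example = {
        "instruction": "用三句话介绍一下混合专家模型。",  # "Explain mixture-of-experts models in three sentences."
        "input": "",    # optional extra context; empty for a plain instruction
        "output": "混合专家模型由多个专家子网络和一个路由器组成……",  # desired response (truncated)
    }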

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, PyTorch, Hugging Face Transformers, PEFT. Requires significant GPU memory (~43 GiB for training, ~25 GiB for inference).
  • Model Weights: Download Mixtral-8x7B-Instruct-v0.1 and Aurora LoRA weights from HuggingFace or ModelScope; a loading sketch follows this list.
  • Demo: Run src/web_demo.py for a Gradio interface.
  • Docs: GitHub Repository
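
A minimal loading sketch, assuming the Hugging Face transformers and peft APIs; the repository IDs below are assumptions based on the project description, so check the README for the exact model and adapter names. The actual Gradio demo lives in src/web_demo.py.

    # Minimal sketch: load the Mixtral base model and attach the Aurora LoRA
    # weights with PEFT. Repository IDs are assumptions, not confirmed names.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    lora_id = "wangrongsheng/Aurora"  # hypothetical LoRA adapter id

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, lora_id)  # attach Aurora LoRA weights

    prompt = "请用中文介绍一下你自己。"  # "Please introduce yourself in Chinese."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))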

Highlighted Details

  • Achieves notable performance gains on medical evaluation benchmarks (e.g., CMB score of 29.87 vs. Mistral-7B's 22.26).
  • Supports 4-bit quantization (QLoRA) for a reduced inference memory footprint (see the quantized-loading sketch after this list).
  • Offers LoRA weights for easy integration with the base Mixtral-8x7B model.
  • Includes training scripts for further fine-tuning on custom datasets.
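
For the 4-bit option, a QLoRA-style loading sketch via transformers' BitsAndBytesConfig; the quantization settings shown are common defaults rather than the project's confirmed configuration, and the adapter ID is again hypothetical.

    # QLoRA-style 4-bit loading sketch to cut inference memory. Quantization
    # values are common defaults, not necessarily the project's exact config.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import PeftModel

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mixtral-8x7B-Instruct-v0.1",
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(model, "wangrongsheng/Aurora")  # hypothetical id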

Maintenance & Community

The project is primarily developed at the Faculty of Applied Sciences, Macao Polytechnic University, and builds on the LLaMA-Factory fine-tuning framework.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source applications.

Limitations & Caveats

The project requires substantial GPU resources (roughly 43 GiB of VRAM for training and 25 GiB for inference), which limits accessibility for users with less powerful hardware. While medical-benchmark results are reported, comprehensive evaluation across a broader range of Chinese NLP tasks may still be needed.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
