Aurora by WangRongsheng

Code for a research paper on instruction-tuning a Chinese chat model

Created 1 year ago · 263 stars · Top 97.6% on sourcepulse

Project Summary

Aurora enhances the Chinese conversational capabilities of the Mixtral-8x7B sparse Mixture-of-Experts model through instruction fine-tuning. It targets researchers and developers seeking to improve LLM performance on Chinese language tasks, offering a specialized model derived from a powerful MoE architecture.

How It Works

Aurora is built by instruction fine-tuning Mixtral-8x7B on a curated collection of three Chinese instruction-following datasets. This approach leverages machine-generated instructions to give the model zero-shot capability on novel Chinese conversational tasks, a pioneering application of instruction tuning to sparse MoE models.
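
The repository's exact dataset schema isn't reproduced here, but instruction-tuning corpora of this kind are commonly stored in an Alpaca-style instruction/input/output layout; the record below is a hypothetical illustration of that layout, not a sample from Aurora's training data.

    # Hypothetical Alpaca-style instruction record, illustrative only and not
    # taken from Aurora's datasets. Fine-tuning turns each such pair into a
    # supervised target for the Mixtral-8x7B base model.
    example = {
        "instruction": "用三句话介绍一下混合专家模型。",  # "Explain mixture-of-experts models in three sentences."
        "input": "",    # optional extra context; empty for a plain instruction
        "output": "混合专家模型由多个专家子网络和一个路由器组成……",  # desired response (truncated)
    }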

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, PyTorch, Hugging Face Transformers, PEFT. Requires significant GPU memory (~43 GiB for training, ~25 GiB for inference).
  • Model Weights: Download Mixtral-8x7B-Instruct-v0.1 and Aurora LoRA weights from HuggingFace or ModelScope; a loading sketch follows this list.
  • Demo: Run src/web_demo.py for a Gradio interface.
  • Docs: GitHub Repository
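
A minimal loading sketch, assuming the Hugging Face transformers and peft APIs; the repository IDs below are assumptions based on the project description, so check the README for the exact model and adapter names. The actual Gradio demo lives in src/web_demo.py.

    # Minimal sketch: load the Mixtral base model and attach the Aurora LoRA
    # weights with PEFT. Repository IDs are assumptions, not confirmed names.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    lora_id = "wangrongsheng/Aurora"  # hypothetical LoRA adapter id

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, lora_id)  # attach Aurora LoRA weights

    prompt = "请用中文介绍一下你自己。"  # "Please introduce yourself in Chinese."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))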

Highlighted Details

  • Achieves notable performance gains on medical evaluation benchmarks (e.g., CMB score of 29.87 vs. Mistral-7B's 22.26).
  • Supports 4-bit quantization (QLoRA) for a reduced inference memory footprint (see the quantized-loading sketch after this list).
  • Offers LoRA weights for easy integration with the base Mixtral-8x7B model.
  • Includes training scripts for further fine-tuning on custom datasets.
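
For the 4-bit option, a QLoRA-style loading sketch via transformers' BitsAndBytesConfig; the quantization settings shown are common defaults rather than the project's confirmed configuration, and the adapter ID is again hypothetical.

    # QLoRA-style 4-bit loading sketch to cut inference memory. Quantization
    # values are common defaults, not necessarily the project's exact config.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import PeftModel

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mixtral-8x7B-Instruct-v0.1",
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(model, "wangrongsheng/Aurora")  # hypothetical id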

Maintenance & Community

The project is primarily developed at the Faculty of Applied Sciences, Macao Polytechnic University, and builds on the LLaMA-Factory fine-tuning framework.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source applications.

Limitations & Caveats

The project requires substantial GPU resources (roughly 43 GiB of VRAM for training and 25 GiB for inference), which limits accessibility for users with less powerful hardware. While medical-benchmark results are reported, comprehensive evaluation across a broader range of Chinese NLP tasks may still be needed.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
