Mengzi3 by Langboat

LLM for multilingual generation, especially Chinese

created 1 year ago
1,378 stars


Project Summary

Mengzi3 is a series of large language models (8B and 13B parameters) based on the Llama architecture, designed to offer strong Chinese language capabilities alongside multilingual support. It targets researchers and developers seeking a performant, commercially viable LLM for various natural language processing tasks.

How It Works

The models are trained on a diverse corpus including web pages, encyclopedias, social media, news, and high-quality open-source datasets, with continued training on trillions of tokens of multilingual data. This approach emphasizes robust Chinese language understanding and generation while maintaining broader multilingual competence.

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Usage: Load the model with the Hugging Face transformers library. The repository provides example inference code.
  • Requirements: PyTorch and Hugging Face transformers. A GPU is recommended for inference.
  • Links: Hugging Face and ModelScope model pages.
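
The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's own example: the model id `Langboat/Mengzi3-13B-Base`, the helper names (`generation_settings`, `load_model`, `complete`), and the greedy decoding settings are all assumptions. It also assumes the standard `AutoTokenizer` and `AutoModelForCausalLM` classes are sufficient because the model follows the Llama architecture. Check the project's model page for the exact identifier and any extra loading arguments.

```python
from typing import Any, Dict

# Assumption: illustrative model id; confirm the exact name on the model page.
MODEL_ID = "Langboat/Mengzi3-13B-Base"


def generation_settings(max_new_tokens: int = 128) -> Dict[str, Any]:
    """Return keyword arguments for model.generate() (greedy decoding)."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {"max_new_tokens": max_new_tokens, "do_sample": False}


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model; downloads the weights on first use."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    use_gpu = torch.cuda.is_available()
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # Half precision on GPU to reduce memory use; full precision on CPU.
        torch_dtype=torch.float16 if use_gpu else torch.float32,
    )
    model.to("cuda" if use_gpu else "cpu")
    model.eval()
    return tokenizer, model


def complete(prompt: str, tokenizer, model, max_new_tokens: int = 128) -> str:
    """Generate a continuation of `prompt` and return the decoded text."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, **generation_settings(max_new_tokens))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_model()` followed by `print(complete("你好，", tokenizer, model))`. A 13B model in half precision needs roughly 26 GB of accelerator memory for the weights alone, which is why a GPU is recommended.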

Highlighted Details

  • Mengzi3-13B-Base outperforms the comparable models it is evaluated against on the MMLU (0.651) and CMMLU (0.588) benchmarks.
  • The same model posts the best OCNLI score (0.776) among the compared models.
  • Mengzi3.5-13B-Base improves on these results, reaching 0.776 on MMLU and 0.813 on CMMLU.
  • Fine-tuning scripts and data format examples are available in finetune_demo.

Maintenance & Community

  • Developed by Langboat.
  • Contact details are provided for commercial licensing and business cooperation.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Open for academic research and free for commercial use.

Limitations & Caveats

The project disclaims responsibility for any misuse, security issues, or public opinion risks arising from the model's use, emphasizing the need for responsible deployment and adherence to legal guidelines.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days
