Baichuan2 by baichuan-inc

LLM for research/commercial use (license required for some commercial use cases)

Created 2 years ago
4,124 stars

Top 12.0% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

Baichuan 2 is a series of large language models developed by Baichuan Intelligent Technology, offering 7B and 13B parameter versions in both base and chat configurations. These models are trained on 2.6 trillion tokens of high-quality data and aim to provide state-of-the-art performance across various Chinese and English benchmarks, including general knowledge, legal, medical, mathematical, coding, and translation tasks. The models are open for academic research and available for free commercial use under specific conditions, making them accessible to developers and researchers.

How It Works

Baichuan 2 models are transformer-based large language models. The project provides pre-trained weights for both base and chat-tuned versions, with the chat versions further optimized for conversational AI. Notably, the project offers 4-bit quantized versions (NF4) of the chat models, significantly reducing memory footprint while maintaining performance close to the original models. This quantization is achieved using the BitsAndBytes library, supporting both online and offline quantization methods for flexible deployment.
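As a sketch of the online quantization path described above, loading a chat checkpoint in 4-bit NF4 through the Transformers/BitsAndBytes integration might look like the following (the model ID follows the Hugging Face org linked below, and the exact parameters are illustrative assumptions; verify against the repo's README):

```python
def load_chat_model_nf4(model_id: str = "baichuan-inc/Baichuan2-7B-Chat"):
    """Sketch: load a chat model with online NF4 quantization.

    Assumes `transformers`, `bitsandbytes`, and a CUDA GPU are available.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # the NF4 data type mentioned above
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype is an assumption
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
        trust_remote_code=True,  # Baichuan ships custom model code on the Hub
    )
    return tokenizer, model
```

For offline quantization, the project instead publishes pre-quantized 4-bit checkpoints that can be loaded directly without a `BitsAndBytesConfig`.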

Quick Start & Requirements

  • Installation: pip install -r requirements.txt (for fine-tuning); model weights can be downloaded directly from Hugging Face or ModelScope.
  • Prerequisites: Python, PyTorch, Transformers library. GPU with CUDA is recommended for efficient inference and training. Specific fine-tuning scripts require peft and xFormers.
  • Resources: 4-bit quantized 7B model requires ~5.1 GB VRAM, 13B requires ~8.6 GB VRAM. Full precision models require significantly more.
  • Links: Hugging Face: https://huggingface.co/baichuan-inc, ModelScope: https://www.modelscope.cn/organization/baichuan-inc
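Once the weights are downloaded, a single chat turn is a short call. The chat checkpoints expose a `model.chat(tokenizer, messages)` entry point via their custom Hub code (loaded with `trust_remote_code=True`); treat the exact signature as something to confirm against the repo's README. A minimal single-turn wrapper:

```python
def chat_once(model, tokenizer, prompt: str) -> str:
    """Sketch: one user turn against a Baichuan 2 chat model.

    `messages` follows the role/content format used by the chat interface;
    multi-turn use would append prior assistant replies to the list.
    """
    messages = [{"role": "user", "content": prompt}]
    return model.chat(tokenizer, messages)
```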

Highlighted Details

  • Offers 7B and 13B parameter models, including 4-bit quantized versions for reduced memory usage.
  • Achieves competitive benchmark results across general, legal, medical, math, code, and translation tasks.
  • Supports fine-tuning with options for LoRA and distributed training via DeepSpeed.
  • Provides intermediate checkpoints for research into model training progression.

Maintenance & Community

The project is actively maintained by Baichuan Intelligent Technology. Community support channels are available via WeChat. The project also highlights integrations with Intel, Huawei Ascend, and MindSpore.

Licensing & Compatibility

The models are released under Apache 2.0 and a specific "Baichuan 2 Model Community License Agreement." Commercial use is permitted if daily active users are below 1 million, the entity is not a cloud/software provider, and there's no third-party sub-licensing. A formal application process is required for commercial licensing.

Limitations & Caveats

The project disclaims responsibility for any misuse or issues arising from the model's use, including data security or public opinion risks. Users are cautioned against using the model for illegal activities or internet services without proper security review. CPU inference is supported but significantly slower than GPU.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

ChatRWKV by BlinkDL

Open-source chatbot powered by the RWKV RNN language model
Top 0.0% on SourcePulse · 10k stars
Created 2 years ago · Updated 3 weeks ago
Starred by Nat Friedman (Former CEO of GitHub), Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI), and 7 more.

MOSS by OpenMOSS

Open-source tool-augmented conversational language model
Top 0.0% on SourcePulse · 12k stars
Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.