Baichuan2 by baichuan-inc

LLM for research/commercial use (license required for some commercial use cases)

Created 2 years ago
4,124 stars

Top 12.0% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

Baichuan 2 is a series of large language models developed by Baichuan Intelligent Technology, offering 7B and 13B parameter versions in both base and chat configurations. These models are trained on 2.6 trillion tokens of high-quality data and aim to provide state-of-the-art performance across various Chinese and English benchmarks, including general knowledge, legal, medical, mathematical, coding, and translation tasks. The models are open for academic research and available for free commercial use under specific conditions, making them accessible to developers and researchers.

How It Works

Baichuan 2 models are transformer-based large language models. The project provides pre-trained weights for both base and chat-tuned versions, with the chat versions further optimized for conversational AI. Notably, the project offers 4-bit quantized versions (NF4) of the chat models, significantly reducing memory footprint while maintaining performance close to the original models. This quantization is achieved using the BitsAndBytes library, supporting both online and offline quantization methods for flexible deployment.
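As a sketch of the online quantization path described above, loading a chat checkpoint in 4-bit NF4 through the Transformers/BitsAndBytes integration might look like the following (the model ID follows the Hugging Face org linked below, and the exact parameters are illustrative assumptions; verify against the repo's README):

```python
def load_chat_model_nf4(model_id: str = "baichuan-inc/Baichuan2-7B-Chat"):
    """Sketch: load a chat model with online NF4 quantization.

    Assumes `transformers`, `bitsandbytes`, and a CUDA GPU are available.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # the NF4 data type mentioned above
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype is an assumption
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
        trust_remote_code=True,  # Baichuan ships custom model code on the Hub
    )
    return tokenizer, model
```

For offline quantization, the project instead publishes pre-quantized 4-bit checkpoints that can be loaded directly without a `BitsAndBytesConfig`.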

Quick Start & Requirements

  • Installation: pip install -r requirements.txt (for fine-tuning); model weights can be downloaded directly from Hugging Face or ModelScope.
  • Prerequisites: Python, PyTorch, Transformers library. GPU with CUDA is recommended for efficient inference and training. Specific fine-tuning scripts require peft and xFormers.
  • Resources: 4-bit quantized 7B model requires ~5.1 GB VRAM, 13B requires ~8.6 GB VRAM. Full precision models require significantly more.
  • Links: Hugging Face: https://huggingface.co/baichuan-inc, ModelScope: https://www.modelscope.cn/organization/baichuan-inc
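Once the weights are downloaded, a single chat turn is a short call. The chat checkpoints expose a `model.chat(tokenizer, messages)` entry point via their custom Hub code (loaded with `trust_remote_code=True`); treat the exact signature as something to confirm against the repo's README. A minimal single-turn wrapper:

```python
def chat_once(model, tokenizer, prompt: str) -> str:
    """Sketch: one user turn against a Baichuan 2 chat model.

    `messages` follows the role/content format used by the chat interface;
    multi-turn use would append prior assistant replies to the list.
    """
    messages = [{"role": "user", "content": prompt}]
    return model.chat(tokenizer, messages)
```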

Highlighted Details

  • Offers 7B and 13B parameter models, including 4-bit quantized versions for reduced memory usage.
  • Achieves competitive benchmark results across general, legal, medical, math, code, and translation tasks.
  • Supports fine-tuning with options for LoRA and distributed training via DeepSpeed.
  • Provides intermediate checkpoints for research into model training progression.

Maintenance & Community

The project is actively maintained by Baichuan Intelligent Technology. Community support channels are available via WeChat. The project also highlights integrations with Intel, Huawei Ascend, and MindSpore.

Licensing & Compatibility

The models are released under Apache 2.0 and a specific "Baichuan 2 Model Community License Agreement." Commercial use is permitted if daily active users are below 1 million, the entity is not a cloud/software provider, and there's no third-party sub-licensing. A formal application process is required for commercial licensing.

Limitations & Caveats

The project disclaims responsibility for any misuse or issues arising from the model's use, including data security or public opinion risks. Users are cautioned against using the model for illegal activities or internet services without proper security review. CPU inference is supported but significantly slower than GPU.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

ChatRWKV by BlinkDL

Open-source chatbot powered by the RWKV RNN language model
Top 0.0% on SourcePulse · 10k stars
Created 2 years ago · Updated 3 weeks ago
Starred by Nat Friedman (Former CEO of GitHub), Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI), and 7 more.

MOSS by OpenMOSS

Open-source tool-augmented conversational language model
Top 0.0% on SourcePulse · 12k stars
Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.