Baichuan-13B  by baichuan-inc

LLM for both pretraining and chat

created 2 years ago
2,969 stars

Top 16.5% on sourcepulse

Project Summary

Baichuan-13B is a 13-billion-parameter open-source large language model developed by Baichuan Intelligent Technology, offered in both base and chat-tuned versions. Trained on 1.4 trillion tokens, it achieves strong results on Chinese and English benchmarks and supports a 4,096-token context window with ALiBi positional encoding. The model is designed for efficient inference, with int8 and int4 quantized versions that run on consumer-grade GPUs, and is available for commercial use upon application.

How It Works

Baichuan-13B uses ALiBi positional encoding, which has computational advantages over rotary embeddings and yields a claimed 31.6% faster inference than LLaMA-13B. The architecture comprises 40 layers, a hidden dimension of 5120, and 40 attention heads. Both full fine-tuning and LoRA fine-tuning are supported, with scripts and configurations provided for each.
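To make the ALiBi mechanism concrete, here is a minimal NumPy sketch of the per-head slopes and the linear attention bias. It follows the power-of-two slope rule from the original ALiBi formulation rather than Baichuan-13B's exact implementation, so treat it as an illustration, not the model's code:

```python
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ...
    (exact for power-of-two head counts)."""
    start = 2.0 ** (-8.0 / n_heads)
    return np.array([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """Static bias added to attention logits: slope * (key_pos - query_pos),
    zero on the diagonal and increasingly negative for more distant keys."""
    pos = np.arange(seq_len)
    distance = np.tril(pos[None, :] - pos[:, None])  # keep the causal part only
    return alibi_slopes(n_heads)[:, None, None] * distance[None, :, :]
```

Because the bias is a fixed function of position, it can be precomputed once and added to every attention-score matrix; no learned or rotary position embeddings are applied per step, which is the source of the claimed inference-speed advantage.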

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, PyTorch, Transformers library. GPU with CUDA is recommended for optimal performance.
  • Demo: Streamlit web demo available via streamlit run web_demo.py.
  • Resources: Quantized versions (int8, int4) reduce GPU memory footprint to 15.8GB and 9.7GB respectively. CPU inference requires ~60GB RAM.
  • Links: Hugging Face (Base & Chat), ModelScope.
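As a sanity check on those footprints, here is a back-of-the-envelope estimate of the weight memory alone, assuming roughly 13 billion parameters (an approximation; the repo's 15.8 GB and 9.7 GB figures also include activations and runtime overhead):

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Lower-bound memory (GiB) for model weights stored at a given precision."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 13e9  # approximate parameter count of Baichuan-13B

fp16 = weight_memory_gib(N_PARAMS, 16)  # ~24.2 GiB
int8 = weight_memory_gib(N_PARAMS, 8)   # ~12.1 GiB
int4 = weight_memory_gib(N_PARAMS, 4)   # ~6.1 GiB
```

The gap between these lower bounds and the reported figures is per-request overhead such as the KV cache and activations, which is why int8 inference needs 15.8 GB rather than ~12 GiB of weights alone.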

Highlighted Details

  • Achieves state-of-the-art results on Chinese and English benchmarks for its size.
  • Offers int8 and int4 quantized versions with minimal performance degradation.
  • Uses ALiBi positional encoding for efficient inference, with reported speeds exceeding LLaMA-13B.
  • Provides clear examples for Python, CLI, and web-based inference, as well as fine-tuning.

Maintenance & Community

The project is actively maintained by Baichuan Intelligent Technology. Updates include the release of Baichuan 2. Community interaction channels are available via WeChat.

Licensing & Compatibility

The source code is licensed under Apache 2.0. Model usage is governed by the "Baichuan-13B Model Community License Agreement." Commercial use is permitted upon registration and written authorization via opensource@baichuan-inc.com.

Limitations & Caveats

The developers disclaim responsibility for any issues arising from the model's use, including data security, public opinion risks, or misuse. Users are urged not to use the model for illegal activities or internet services without proper security review and filing.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
