Baichuan-13B  by baichuan-inc

LLM for both pretraining and chat

created 2 years ago
2,969 stars

Top 16.5% on sourcepulse

Project Summary

Baichuan-13B is a 13-billion-parameter open-source large language model developed by Baichuan Intelligent Technology, offered in both base and chat-tuned versions. Trained on 1.4 trillion tokens, it achieves strong results on Chinese and English benchmarks and supports a 4,096-token context window with ALiBi positional encoding. The model is designed for efficient inference, with int8 and int4 quantized versions that run on consumer-grade GPUs, and is available for commercial use upon application.

How It Works

Baichuan-13B uses ALiBi positional encoding, which has computational advantages over rotary embeddings and yields a claimed 31.6% faster inference than LLaMA-13B. The architecture comprises 40 layers, a hidden dimension of 5120, and 40 attention heads. Both full fine-tuning and LoRA fine-tuning are supported, with scripts and configurations provided for each.
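To make the ALiBi mechanism concrete, here is a minimal NumPy sketch of the per-head slopes and the linear attention bias. It follows the power-of-two slope rule from the original ALiBi formulation rather than Baichuan-13B's exact implementation, so treat it as an illustration, not the model's code:

```python
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ...
    (exact for power-of-two head counts)."""
    start = 2.0 ** (-8.0 / n_heads)
    return np.array([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """Static bias added to attention logits: slope * (key_pos - query_pos),
    zero on the diagonal and increasingly negative for more distant keys."""
    pos = np.arange(seq_len)
    distance = np.tril(pos[None, :] - pos[:, None])  # keep the causal part only
    return alibi_slopes(n_heads)[:, None, None] * distance[None, :, :]
```

Because the bias is a fixed function of position, it can be precomputed once and added to every attention-score matrix; no learned or rotary position embeddings are applied per step, which is the source of the claimed inference-speed advantage.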

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, PyTorch, Transformers library. GPU with CUDA is recommended for optimal performance.
  • Demo: Streamlit web demo available via streamlit run web_demo.py.
  • Resources: Quantized versions (int8, int4) reduce GPU memory footprint to 15.8GB and 9.7GB respectively. CPU inference requires ~60GB RAM.
  • Links: Hugging Face (Base & Chat), ModelScope.
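As a sanity check on those footprints, here is a back-of-the-envelope estimate of the weight memory alone, assuming roughly 13 billion parameters (an approximation; the repo's 15.8 GB and 9.7 GB figures also include activations and runtime overhead):

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Lower-bound memory (GiB) for model weights stored at a given precision."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 13e9  # approximate parameter count of Baichuan-13B

fp16 = weight_memory_gib(N_PARAMS, 16)  # ~24.2 GiB
int8 = weight_memory_gib(N_PARAMS, 8)   # ~12.1 GiB
int4 = weight_memory_gib(N_PARAMS, 4)   # ~6.1 GiB
```

The gap between these lower bounds and the reported figures is per-request overhead such as the KV cache and activations, which is why int8 inference needs 15.8 GB rather than ~12 GiB of weights alone.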

Highlighted Details

  • Achieves state-of-the-art results on Chinese and English benchmarks for its size.
  • Offers int8 and int4 quantized versions with minimal performance degradation.
  • Uses ALiBi positional encoding for efficient inference, with reported speeds exceeding LLaMA-13B.
  • Provides clear examples for Python, CLI, and web-based inference, as well as fine-tuning.

Maintenance & Community

The project is actively maintained by Baichuan Intelligent Technology. Updates include the release of Baichuan 2. Community interaction channels are available via WeChat.

Licensing & Compatibility

The source code is licensed under Apache 2.0. Model usage is governed by the "Baichuan-13B Model Community License Agreement." Commercial use is permitted upon registration and written authorization via opensource@baichuan-inc.com.

Limitations & Caveats

The developers disclaim responsibility for any issues arising from the model's use, including data security, public opinion risks, or misuse. Users are urged not to use the model for illegal activities or internet services without proper security review and filing.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
