Baichuan-7B by baichuan-inc

7B-parameter LLM for commercial use

created 2 years ago
5,688 stars

Top 9.1% on sourcepulse

View on GitHub
Project Summary

Baichuan-7B is a 7-billion-parameter, bilingual (Chinese/English) large language model developed by Baichuan Inc. It is aimed at researchers and developers working with LLMs, offers strong performance on Chinese and English benchmarks, and permits commercial use.

How It Works

Built on a Transformer architecture, Baichuan-7B was trained on 1.2 trillion tokens. It uses rotary positional embeddings (RoPE) for better length extrapolation, SwiGLU activations, and RMSNorm for normalization. Training employed optimized techniques, including Flash-Attention, operator splitting, mixed precision, and communication optimizations, achieving high throughput on A800 GPUs. Its tokenizer uses Byte-Pair Encoding, with optimizations for Chinese text and numerical data.
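
To make two of these components concrete, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block; dimensions and names are illustrative assumptions, not taken from the Baichuan source.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RMSNorm(nn.Module):
        """Root-mean-square norm: rescales by 1/RMS(x); no mean-centering, no bias."""
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(dim))
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    class SwiGLU(nn.Module):
        """Gated feed-forward: silu(x @ W_gate) * (x @ W_up), projected back to dim."""
        def __init__(self, dim: int, hidden: int):
            super().__init__()
            self.gate = nn.Linear(dim, hidden, bias=False)
            self.up = nn.Linear(dim, hidden, bias=False)
            self.down = nn.Linear(hidden, dim, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.down(F.silu(self.gate(x)) * self.up(x))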

Quick Start & Requirements

  • Inference: use the Hugging Face transformers API (a minimal sketch follows this list).
  • Prerequisites: Python, PyTorch, and the transformers library; a GPU is recommended for inference.
  • Training: install the dependencies in requirements.txt, prepare the data, and configure DeepSpeed.
  • Resources: model weights are available on Hugging Face and ModelScope.
  • Docs: Hugging Face, ModelScope
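
A minimal inference sketch following the pattern on the model card; trust_remote_code=True is needed because the repo ships custom model code, the prompt is just an example, and device_map="auto" assumes accelerate is installed.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the tokenizer and model from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained(
        "baichuan-inc/Baichuan-7B", trust_remote_code=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True
    )

    # Baichuan-7B is a base model, so this is plain text completion (no chat template).
    inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
    pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
    print(tokenizer.decode(pred[0], skip_special_tokens=True))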

Highlighted Details

  • Achieves top results among 7B models on the C-Eval (Chinese) and MMLU (English) benchmarks.
  • Supports a 4,096-token context window, with good extrapolation beyond 5,000 tokens.
  • The tokenizer shows better compression rates for Chinese text than LLaMA's and Falcon's (a quick check follows this list).
  • Training reached 182 TFLOPS throughput on a cluster of 1,000 A800 GPUs, at 58.3% of peak GPU utilization.
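
A quick, hedged way to eyeball the compression claim: count tokens per character on a Chinese sample (the sentence below is illustrative; a rigorous comparison would run the same text through the LLaMA and Falcon tokenizers as well).

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "baichuan-inc/Baichuan-7B", trust_remote_code=True
    )

    # Fewer tokens per character means better compression on Chinese text.
    text = "大语言模型在中文语料上的压缩率直接影响训练和推理成本。"
    ids = tokenizer(text)["input_ids"]
    print(f"{len(ids)} tokens / {len(text)} chars = {len(ids) / len(text):.2f} tokens per char")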

Maintenance & Community

  • The project has released Baichuan 2 (7B, 13B) as a successor.
  • Community channels include a WeChat group and the project's Hugging Face page.

Licensing & Compatibility

  • Source code is licensed under Apache 2.0.
  • Commercial use of the model is permitted, but requires registration and written authorization from Baichuan Inc. via opensource@baichuan-inc.com.

Limitations & Caveats

  • The README references a "Baichuan-7B Model License Agreement" governing commercial use, which may impose terms beyond the Apache 2.0 license that covers the code.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 90 days
