Baichuan-7B by baichuan-inc

7B-parameter LLM for commercial use

Created 2 years ago
5,685 stars

Top 9.0% on SourcePulse

View on GitHub
Project Summary

Baichuan-7B is a 7-billion-parameter, bilingual (Chinese/English) large language model developed by Baichuan Inc. It is designed for researchers and developers working with LLMs, offering strong performance on Chinese and English benchmarks and supporting commercial use.

How It Works

Built on a Transformer architecture, Baichuan-7B was trained on 1.2 trillion tokens. It utilizes rotary positional embeddings for better extrapolation, SwiGLU activation, and RMSNorm for normalization. The model employs optimized training techniques, including Flash-Attention, operator splitting, mixed precision, and communication optimizations, achieving high throughput on A800 GPUs. Its tokenizer uses Byte-Pair Encoding with optimizations for Chinese language and numerical data.
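As a rough illustration of two of these components, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block. This is not the project's actual code; the 4096/11008 dimensions are assumptions typical of 7B LLaMA-style models.

```python
# Hedged sketch (not the official implementation): RMSNorm and SwiGLU as
# commonly used in LLaMA-style 7B models.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias term."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class SwiGLUMLP(nn.Module):
    """Feed-forward block with SwiGLU gating: silu(x W_gate) * (x W_up), then W_down."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

x = torch.randn(2, 16, 4096)                  # (batch, sequence, hidden) - assumed sizes
y = SwiGLUMLP(4096, 11008)(RMSNorm(4096)(x))  # pre-norm followed by the MLP
print(y.shape)                                # torch.Size([2, 16, 4096])
```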

Quick Start & Requirements

  • Inference: Load the model and tokenizer with the Hugging Face transformers library (see the sketch after this list).
  • Prerequisites: Python, PyTorch, transformers library. GPU recommended for inference.
  • Training: Install dependencies from requirements.txt, prepare the training data, and configure DeepSpeed.
  • Resources: Model weights are available on Hugging Face and ModelScope.
  • Docs: Hugging Face, ModelScope
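A typical inference snippet might look like the following. This is a sketch using standard transformers APIs, not the project's verbatim sample; it assumes the Hugging Face model id baichuan-inc/Baichuan-7B and that the repository's custom modeling code is loaded via trust_remote_code.

```python
# Hedged sketch: standard Hugging Face transformers usage, assuming the model
# id "baichuan-inc/Baichuan-7B"; check the model card for exact details.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # place weights on GPU(s) if available
    trust_remote_code=True,  # the repo ships custom modeling code
)

# Baichuan-7B is a base (non-chat) model, so prompt it as plain text completion.
inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```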

Highlighted Details

  • Achieves top results among 7B models on C-Eval (Chinese) and MMLU (English) benchmarks.
  • Supports a 4096 token context window, with good extrapolation beyond 5000 tokens.
  • Tokenizer shows improved compression rates for Chinese compared to LLaMA and Falcon.
  • Training throughput reached 182 TFLOPS per GPU on a cluster of 1,000 A800 GPUs, 58.3% of peak utilization.
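The utilization figure is consistent with the throughput being per GPU: assuming the commonly cited 312 TFLOPS BF16 tensor-core peak for the A800, 182 / 312 ≈ 0.583, i.e. 58.3% of peak.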

Maintenance & Community

  • The project has released Baichuan 2 (7B, 13B) as a successor.
  • Community resources include a WeChat group and Hugging Face links.

Licensing & Compatibility

  • Source code is licensed under Apache 2.0.
  • Model usage is permitted for commercial purposes but requires registration and written authorization from Baichuan Inc. via opensource@baichuan-inc.com.

Limitations & Caveats

  • The README references a "Baichuan-7B Model License Agreement" for commercial use, which may impose terms beyond the Apache 2.0 license that covers the source code.
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

0.2%
462
MoE model for research
Created 4 months ago
Updated 4 weeks ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

10.6%
2k
Speculative decoding research paper for faster LLM inference
Created 1 year ago
Updated 1 week ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 8 more.

modded-nanogpt by KellerJordan

0.7%
3k
Language model training speedrun on 8x H100 GPUs
Created 1 year ago
Updated 2 months ago
Starred by Phil Wang (Prolific Research Paper Implementer), Lianmin Zheng (Coauthor of SGLang, vLLM), and 6 more.

Kimi-K2 by MoonshotAI

1.7%
8k
State-of-the-art MoE language model
Created 2 months ago
Updated 1 week ago