ChatGLM2-6B by zai-org

Bilingual chat LLM for research/commercial use (after registration)

created 2 years ago
15,718 stars

Top 3.1% on sourcepulse

View on GitHub
Project Summary

ChatGLM2-6B is an open-source, bilingual (Chinese-English) conversational large language model designed for efficient deployment and strong performance. It targets researchers and developers who need a capable LLM that runs on consumer hardware, offering significant benchmark gains over its predecessor along with a much longer context window.

How It Works

ChatGLM2-6B is built on the GLM architecture, with a mixed objective function and pre-training on 1.4T bilingual tokens. It incorporates FlashAttention to extend the context window to 32K tokens and Multi-Query Attention to speed up inference and reduce memory usage. The model has also undergone human preference alignment training, which contributes to its competitive results on benchmarks such as MMLU, CEval, GSM8K, and BBH.
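
To make the Multi-Query Attention claim concrete, here is a minimal, illustrative PyTorch sketch (not ChatGLM2-6B's actual implementation): all query heads share a single key/value projection, so the KV cache shrinks by a factor of n_heads, which is where the decoding speedup and memory savings come from.

    import torch
    import torch.nn.functional as F

    def multi_query_attention(x, w_q, w_kv, n_heads):
        # All query heads attend over one shared key/value head, so the
        # KV cache is 1/n_heads the size of standard multi-head attention.
        batch, seq, d_model = x.shape
        d_head = d_model // n_heads
        q = (x @ w_q).view(batch, seq, n_heads, d_head).transpose(1, 2)  # (B, H, S, Dh)
        k, v = (x @ w_kv).split(d_head, dim=-1)                          # each (B, S, Dh)
        k, v = k.unsqueeze(1), v.unsqueeze(1)                            # (B, 1, S, Dh): broadcast over heads
        scores = (q @ k.transpose(-2, -1)) / d_head ** 0.5               # (B, H, S, S)
        out = F.softmax(scores, dim=-1) @ v                              # (B, H, S, Dh)
        return out.transpose(1, 2).reshape(batch, seq, d_model)

    # Toy usage: batch 2, 16 tokens, model dim 64, 8 query heads sharing one KV head
    x = torch.randn(2, 16, 64)
    w_q, w_kv = torch.randn(64, 64), torch.randn(64, 16)
    print(multi_query_attention(x, w_q, w_kv, n_heads=8).shape)  # torch.Size([2, 16, 64])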

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, PyTorch 2.0+, Transformers 4.30.2. GPU with at least 6GB VRAM recommended for INT4 quantization; 13GB for FP16. CUDA is required for GPU acceleration.
  • Setup: Download model weights (approx. 13GB for FP16); a loading sketch follows this list.
  • Docs: Hugging Face Repo, GitHub
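
A minimal chat sketch following the Transformers workflow the upstream README documents; the Hugging Face repo id THUDM/chatglm2-6b and the chat() helper come from the model's custom code, loaded via trust_remote_code:

    from transformers import AutoTokenizer, AutoModel

    # Load the tokenizer and FP16 model (~13GB VRAM); trust_remote_code pulls
    # in ChatGLM2's custom modeling code, which provides the chat() helper.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
    model = model.eval()

    response, history = model.chat(tokenizer, "Hello", history=[])
    print(response)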

Highlighted Details

  • Context length extended to 32K using FlashAttention.
  • 42% faster inference compared to ChatGLM-6B due to Multi-Query Attention.
  • INT4 quantization allows 8K context on 6GB VRAM (see the quantized-loading sketch after this list).
  • Performance gains: +23% MMLU, +33% CEval, +571% GSM8K, +60% BBH over the first generation.
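
For the 6GB-VRAM INT4 path, the README documents load-time quantization through a quantize() helper in the model's custom code; a hedged sketch of that workflow:

    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
    # Quantize the FP16 weights to INT4 at load time; per the README this
    # fits in roughly 6GB of VRAM and still supports 8K context.
    model = AutoModel.from_pretrained("THUDM/chatglm2-6b",
                                      trust_remote_code=True).quantize(4).cuda()
    model = model.eval()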

Maintenance & Community

  • Active development with releases including 32K context and code-specific models.
  • Community support via Discord, WeChat, and Twitter.

Licensing & Compatibility

  • Code licensed under Apache-2.0.
  • Model weights are fully open for academic research and free for commercial use upon registration via a questionnaire. Restrictions apply against harmful or unvetted uses.

Limitations & Caveats

The model's outputs are not guaranteed to be accurate, and it can be easily misled. The project team has not developed any official applications on top of the model. The README warns of potential data-security and public-opinion risks from model misuse. Running on PyTorch versions below 2.0 may lead to higher memory usage.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 61 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

Top 0.3% on sourcepulse
9k stars
Tiny pretraining project for a 1.1B Llama model
created 1 year ago, updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.

ChatGLM-6B by zai-org

Top 0.1% on sourcepulse
41k stars
Bilingual dialogue language model for research
created 2 years ago, updated 1 year ago