ChatGLM2-6B by zai-org

Bilingual chat LLM for research/commercial use (after registration)

created 2 years ago
15,718 stars

Top 3.1% on sourcepulse

View on GitHub
Project Summary

ChatGLM2-6B is an open-source, bilingual (Chinese-English) conversational large language model designed for efficient deployment and strong performance. It targets researchers and developers who need a capable LLM that runs on consumer hardware, offering significant benchmark gains over its predecessor along with a much longer context window.

How It Works

ChatGLM2-6B is built on the GLM architecture, with a mixed objective function and pre-training on 1.4T bilingual tokens. It incorporates FlashAttention to extend the context window to 32K tokens and Multi-Query Attention to speed up inference and reduce memory usage. The model has also undergone human preference alignment training, which contributes to its competitive results on benchmarks such as MMLU, CEval, GSM8K, and BBH.
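
To make the Multi-Query Attention claim concrete, here is a minimal, illustrative PyTorch sketch (not ChatGLM2-6B's actual implementation): all query heads share a single key/value projection, so the KV cache shrinks by a factor of n_heads, which is where the decoding speedup and memory savings come from.

    import torch
    import torch.nn.functional as F

    def multi_query_attention(x, w_q, w_kv, n_heads):
        # All query heads attend over one shared key/value head, so the
        # KV cache is 1/n_heads the size of standard multi-head attention.
        batch, seq, d_model = x.shape
        d_head = d_model // n_heads
        q = (x @ w_q).view(batch, seq, n_heads, d_head).transpose(1, 2)  # (B, H, S, Dh)
        k, v = (x @ w_kv).split(d_head, dim=-1)                          # each (B, S, Dh)
        k, v = k.unsqueeze(1), v.unsqueeze(1)                            # (B, 1, S, Dh): broadcast over heads
        scores = (q @ k.transpose(-2, -1)) / d_head ** 0.5               # (B, H, S, S)
        out = F.softmax(scores, dim=-1) @ v                              # (B, H, S, Dh)
        return out.transpose(1, 2).reshape(batch, seq, d_model)

    # Toy usage: batch 2, 16 tokens, model dim 64, 8 query heads sharing one KV head
    x = torch.randn(2, 16, 64)
    w_q, w_kv = torch.randn(64, 64), torch.randn(64, 16)
    print(multi_query_attention(x, w_q, w_kv, n_heads=8).shape)  # torch.Size([2, 16, 64])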

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, PyTorch 2.0+, Transformers 4.30.2. GPU with at least 6GB VRAM recommended for INT4 quantization; 13GB for FP16. CUDA is required for GPU acceleration.
  • Setup: Download model weights (approx. 13GB for FP16); a loading sketch follows this list.
  • Docs: Hugging Face Repo, GitHub
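
A minimal chat sketch following the Transformers workflow the upstream README documents; the Hugging Face repo id THUDM/chatglm2-6b and the chat() helper come from the model's custom code, loaded via trust_remote_code:

    from transformers import AutoTokenizer, AutoModel

    # Load the tokenizer and FP16 model (~13GB VRAM); trust_remote_code pulls
    # in ChatGLM2's custom modeling code, which provides the chat() helper.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
    model = model.eval()

    response, history = model.chat(tokenizer, "Hello", history=[])
    print(response)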

Highlighted Details

  • Context length extended to 32K using FlashAttention.
  • 42% faster inference compared to ChatGLM-6B due to Multi-Query Attention.
  • INT4 quantization allows 8K context on 6GB VRAM (see the quantized-loading sketch after this list).
  • Performance gains: +23% MMLU, +33% CEval, +571% GSM8K, +60% BBH over the first generation.
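
For the 6GB-VRAM INT4 path, the README documents load-time quantization through a quantize() helper in the model's custom code; a hedged sketch of that workflow:

    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
    # Quantize the FP16 weights to INT4 at load time; per the README this
    # fits in roughly 6GB of VRAM and still supports 8K context.
    model = AutoModel.from_pretrained("THUDM/chatglm2-6b",
                                      trust_remote_code=True).quantize(4).cuda()
    model = model.eval()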

Maintenance & Community

  • Active development with releases including 32K context and code-specific models.
  • Community support via Discord, WeChat, and Twitter.

Licensing & Compatibility

  • Code licensed under Apache-2.0.
  • Model weights are fully open for academic research and free for commercial use upon registration via a questionnaire. Restrictions apply against harmful or unvetted uses.

Limitations & Caveats

The model's outputs are not guaranteed to be accurate, and it can be easily misled. The project team has not developed any official applications on top of the model. The README warns of potential data-security and public-opinion risks from model misuse. Running on PyTorch versions below 2.0 may lead to higher memory usage.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 61 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

Top 0.3% on sourcepulse
9k stars
Tiny pretraining project for a 1.1B Llama model
created 1 year ago, updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.

ChatGLM-6B by zai-org

Top 0.1% on sourcepulse
41k stars
Bilingual dialogue language model for research
created 2 years ago, updated 1 year ago