GLM by THUDM

General language model for NLU, generation, and blank-filling tasks

created 4 years ago
3,261 stars

Top 15.2% on sourcepulse

View on GitHub
Project Summary

GLM (General Language Model) is a pretrained language model built on autoregressive blank infilling, designed for both natural language understanding and generation. It offers a flexible framework for NLP applications ranging from text infilling to sequence-to-sequence generation, and targets researchers and developers working with large language models.

How It Works

GLM is pretrained with an autoregressive blank-filling objective: spans of text are masked out and the model learns to regenerate them, using 2D positional encodings and predicting the masked spans in arbitrary order. This unified objective gives GLM strong performance across diverse tasks, outperforming BERT, T5, and GPT at comparable parameter counts. It supports specialized mask tokens for different modes: [MASK] for short in-sentence blanks, [sMASK] for sentence-level blanks, and [gMASK] for left-to-right generation from a prefix.
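
As a rough illustration of the three masking modes (the prompt texts below are invented for illustration, not taken from the README):

```python
# Illustrative prompts only; the example sentences are assumptions, not from
# the GLM README. Each mask token selects a different infilling mode.
prompts = {
    # [MASK]: fill a short blank inside a sentence
    "short_blank": "Tsinghua University is located in [MASK], China.",
    # [sMASK]: fill in a whole missing sentence inside a passage
    "sentence_blank": "GLM is pretrained with autoregressive blank filling. "
                      "[sMASK] It is then finetuned on downstream tasks.",
    # [gMASK]: generate free-form text to the right of a prefix
    "generation": "The main advantage of a unified pretraining objective is [gMASK]",
}
```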

Quick Start & Requirements

  • Installation: via the Hugging Face transformers library (pip install "transformers>=4.23.1") or Docker; a minimal inference sketch follows this list.
  • Prerequisites: PyTorch (v1.7.0 recommended), Apex, CUDA (Docker images available for 10.2 and 11.2). GPU is required for inference and training.
  • Resources: Models range from 110M to 10B parameters. Running larger models (e.g., 10B) requires significant GPU memory; model parallelism is supported.
  • Links: Hugging Face Hub, Docker Hub, Paper, Code
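
A minimal inference sketch using the Hugging Face integration mentioned above. The checkpoint name, trust_remote_code flag, and the tokenizer helpers follow the Hub model cards for GLM checkpoints and are assumptions here; details may differ for other GLM variants.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Checkpoint name assumed from the Hugging Face Hub; smaller variants exist
# if a 10B-parameter model does not fit in GPU memory.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = model.half().cuda().eval()

# [MASK] marks the span GLM should fill in.
inputs = tokenizer(
    "GLM is a general language model pretrained with [MASK] and can be "
    "finetuned on NLU, seq2seq, and generation tasks.",
    return_tensors="pt",
)
# build_inputs_for_generation and eop_token_id come from the custom GLM
# tokenizer code loaded via trust_remote_code (per the Hub model card).
inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=64)
inputs = inputs.to("cuda")

outputs = model.generate(**inputs, max_length=64, eos_token_id=tokenizer.eop_token_id)
print(tokenizer.decode(outputs[0].tolist()))
```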

Highlighted Details

  • Supports English and Chinese languages.
  • Achieves state-of-the-art results on SuperGLUE benchmarks and Seq2Seq tasks like CNN/Daily Mail and XSum.
  • Offers zero-shot evaluation capabilities for multiple-choice tasks.
  • Includes scripts for fine-tuning, pre-training, and P-Tuning integration.

Maintenance & Community

The project is associated with THUDM (Tsinghua University). Community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

The README does not specify a license. Code is based on Megatron-LM and PET. Commercial use implications are not detailed.

Limitations & Caveats

The README does not explicitly state a license, which could affect commercial use. The project pins specific versions of PyTorch and Apex, and setting up custom fine-tuning requires implementing custom data processors and pattern-verbalizer pairs (PVPs); a hypothetical sketch of these pieces follows.
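
For readers wondering what that entails, here is a hypothetical sketch of the two PET-style pieces. Class names, method names, and label mappings are assumptions for illustration, not the repository's actual interface.

```python
# Hypothetical sketch only: GLM's fine-tuning setup follows the PET convention
# of pairing a data processor with a pattern-verbalizer pair (PVP), but these
# names and signatures are illustrative, not the project's real API.
class MyTaskProcessor:
    """Reads the raw dataset and yields labeled examples."""
    def get_train_examples(self, data_dir):
        # e.g. parse f"{data_dir}/train.tsv" into (text, label) records
        raise NotImplementedError

class MyTaskPVP:
    """Maps an example to a cloze pattern and labels to blank-filling words."""
    def get_parts(self, example):
        # Turn the input into a pattern containing a blank for the model to fill.
        return [example.text, " Overall, it was [MASK]."]

    def verbalize(self, label):
        # Map each task label to a word the model can predict in the blank.
        return {"negative": "terrible", "positive": "great"}[label]
```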

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 42 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (Author of SGLang) and Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX).

xgen by salesforce — LLM research release with 8k sequence length (720 stars; created 2 years ago, updated 6 months ago)