GLM by THUDM

General language model for NLU, generation, and blank-filling tasks

created 4 years ago
3,261 stars

Top 15.2% on sourcepulse

View on GitHub
Project Summary

GLM (General Language Model) is a pretrained language model built on autoregressive blank infilling, designed for both natural language understanding and generation. It offers a flexible framework for NLP applications ranging from text infilling to sequence-to-sequence generation, and targets researchers and developers working with large language models.

How It Works

GLM is pretrained with an autoregressive blank-filling objective: spans of text are masked out and the model learns to regenerate them, using 2D positional encodings and predicting the masked spans in arbitrary order. This unified objective gives GLM strong performance across diverse tasks, outperforming BERT, T5, and GPT at comparable parameter counts. It supports specialized mask tokens for different modes: [MASK] for short in-sentence blanks, [sMASK] for sentence-level blanks, and [gMASK] for left-to-right generation from a prefix.
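
As a rough illustration of the three masking modes (the prompt texts below are invented for illustration, not taken from the README):

```python
# Illustrative prompts only; the example sentences are assumptions, not from
# the GLM README. Each mask token selects a different infilling mode.
prompts = {
    # [MASK]: fill a short blank inside a sentence
    "short_blank": "Tsinghua University is located in [MASK], China.",
    # [sMASK]: fill in a whole missing sentence inside a passage
    "sentence_blank": "GLM is pretrained with autoregressive blank filling. "
                      "[sMASK] It is then finetuned on downstream tasks.",
    # [gMASK]: generate free-form text to the right of a prefix
    "generation": "The main advantage of a unified pretraining objective is [gMASK]",
}
```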

Quick Start & Requirements

  • Installation: via the Hugging Face transformers library (pip install "transformers>=4.23.1") or Docker; a minimal inference sketch follows this list.
  • Prerequisites: PyTorch (v1.7.0 recommended), Apex, CUDA (Docker images available for 10.2 and 11.2). GPU is required for inference and training.
  • Resources: Models range from 110M to 10B parameters. Running larger models (e.g., 10B) requires significant GPU memory; model parallelism is supported.
  • Links: Hugging Face Hub, Docker Hub, Paper, Code
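
A minimal inference sketch using the Hugging Face integration mentioned above. The checkpoint name, trust_remote_code flag, and the tokenizer helpers follow the Hub model cards for GLM checkpoints and are assumptions here; details may differ for other GLM variants.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Checkpoint name assumed from the Hugging Face Hub; smaller variants exist
# if a 10B-parameter model does not fit in GPU memory.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = model.half().cuda().eval()

# [MASK] marks the span GLM should fill in.
inputs = tokenizer(
    "GLM is a general language model pretrained with [MASK] and can be "
    "finetuned on NLU, seq2seq, and generation tasks.",
    return_tensors="pt",
)
# build_inputs_for_generation and eop_token_id come from the custom GLM
# tokenizer code loaded via trust_remote_code (per the Hub model card).
inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=64)
inputs = inputs.to("cuda")

outputs = model.generate(**inputs, max_length=64, eos_token_id=tokenizer.eop_token_id)
print(tokenizer.decode(outputs[0].tolist()))
```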

Highlighted Details

  • Supports English and Chinese languages.
  • Achieves state-of-the-art results on SuperGLUE benchmarks and Seq2Seq tasks like CNN/Daily Mail and XSum.
  • Offers zero-shot evaluation capabilities for multiple-choice tasks.
  • Includes scripts for fine-tuning, pre-training, and P-Tuning integration.

Maintenance & Community

The project is associated with THUDM (Tsinghua University). Community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

The README does not specify a license. Code is based on Megatron-LM and PET. Commercial use implications are not detailed.

Limitations & Caveats

The README does not explicitly state a license, which could affect commercial use. The project pins specific versions of PyTorch and Apex, and setting up custom fine-tuning requires implementing custom data processors and pattern-verbalizer pairs (PVPs); a hypothetical sketch of these pieces follows.
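
For readers wondering what that entails, here is a hypothetical sketch of the two PET-style pieces. Class names, method names, and label mappings are assumptions for illustration, not the repository's actual interface.

```python
# Hypothetical sketch only: GLM's fine-tuning setup follows the PET convention
# of pairing a data processor with a pattern-verbalizer pair (PVP), but these
# names and signatures are illustrative, not the project's real API.
class MyTaskProcessor:
    """Reads the raw dataset and yields labeled examples."""
    def get_train_examples(self, data_dir):
        # e.g. parse f"{data_dir}/train.tsv" into (text, label) records
        raise NotImplementedError

class MyTaskPVP:
    """Maps an example to a cloze pattern and labels to blank-filling words."""
    def get_parts(self, example):
        # Turn the input into a pattern containing a blank for the model to fill.
        return [example.text, " Overall, it was [MASK]."]

    def verbalize(self, label):
        # Map each task label to a word the model can predict in the blank.
        return {"negative": "terrible", "positive": "great"}[label]
```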

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 42 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (Author of SGLang) and Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX).

xgen by salesforce — LLM research release with 8k sequence length (720 stars; created 2 years ago, updated 6 months ago)