General language model for NLU, generation, and blank-filling tasks
Top 15.2% on sourcepulse
GLM (General Language Model) is a pretrained autoregressive language model designed for natural language understanding and generation tasks. It offers a flexible framework for various NLP applications, from text infilling to sequence-to-sequence generation, targeting researchers and developers working with large language models.
How It Works
GLM uses an autoregressive blank-filling objective, allowing it to predict masked spans of text. This approach, combined with 2D positional encodings and the ability to predict spans in arbitrary orders, enables GLM to achieve strong performance across diverse tasks, outperforming BERT, T5, and GPT at comparable parameter counts. It supports specialized mask tokens for different strategies: [MASK] for short blank filling, [sMASK] for sentence-level blank filling, and [gMASK] for left-to-right generation.
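For illustration, each mask token implies a different prompt format. A minimal sketch (the example sentences are invented; only the mask tokens themselves come from the project):

```python
# Illustrative prompt formats for GLM's three mask tokens.
# The sentences are made-up examples; only the tokens are GLM's.
short_blank = "Tsinghua University is located in [MASK], China."                 # short span infilling
sentence_blank = "The report has three parts. [sMASK] The last part concludes."  # sentence-level infilling
left_to_right = "Deep learning is [gMASK]"                                       # free-form generation from a prefix
```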
Quick Start & Requirements
Requires the transformers library (pip install transformers>=4.23.1) or Docker.
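A minimal inference sketch, assuming the THUDM/glm-10b checkpoint on Hugging Face and a CUDA device; build_inputs_for_generation and eop_token_id are supplied by GLM's custom tokenizer code loaded via trust_remote_code=True, and the input sentence is an invented example:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed checkpoint name; other GLM checkpoints follow the same pattern.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = model.half().cuda()
model.eval()

# [MASK] marks the span GLM will fill in autoregressively.
inputs = tokenizer("GLM is a general language model developed at [MASK].",
                   return_tensors="pt")
# Helper from GLM's custom tokenizer (via trust_remote_code) that appends
# the generation positions expected by the model.
inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=64)
inputs = inputs.to("cuda")
outputs = model.generate(**inputs, max_length=64,
                         eos_token_id=tokenizer.eop_token_id)
print(tokenizer.decode(outputs[0].tolist()))
```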
Highlighted Details
Maintenance & Community
The project is associated with THUDM (Tsinghua University). Community interaction channels are not explicitly mentioned in the README.
Licensing & Compatibility
The README does not specify a license. Code is based on Megatron-LM and PET. Commercial use implications are not detailed.
Limitations & Caveats
The absence of a stated license could affect commercial use. The project depends on specific versions of PyTorch and Apex, and fine-tuning on a new dataset requires implementing a custom data processor and pattern-verbalizer pair (PVP) in the style of PET; a sketch follows.
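A hypothetical sketch of the PET-style PVP that such fine-tuning calls for; the class name, method names, and pattern are illustrative, not GLM's actual API:

```python
# Hypothetical PET-style pattern-verbalizer pair (PVP) for a binary
# sentiment task. Names and the example pattern are illustrative;
# GLM's fine-tuning code defines its own base classes.
class SentimentPVP:
    VERBALIZER = {"0": ["bad"], "1": ["good"]}

    def get_parts(self, example):
        # Cloze pattern: the model fills the [MASK] slot.
        return [example.text_a, " All in all, it was [MASK]."]

    def verbalize(self, label):
        # Word(s) the model should predict at [MASK] for each label.
        return self.VERBALIZER[label]
```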