TransnormerLLM by OpenNLPLab

Faster, better LLM with linear attention

Created 2 years ago
252 stars

Top 99.6% on SourcePulse

View on GitHub
Project Summary

Summary

TransnormerLLM is an open-source LLM project focused on re-inventing LLM architecture for superior accuracy and efficiency. It provides open weights and fine-tuning code, targeting individuals, researchers, and businesses for responsible innovation. Its core benefit is outperforming traditional softmax attention models via a novel linear attention mechanism.

How It Works

This project introduces TransnormerLLM, the first LLM leveraging linear attention for enhanced accuracy and efficiency over conventional softmax attention. It builds on TransNormer with LRPE positional embedding, Lightning Attention acceleration, and new gating/normalization mechanisms. Trained on up to 1.4 trillion tokens, it demonstrates competitive performance across diverse benchmarks.
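The efficiency gain described above rests on linear attention's associativity trick: once the softmax is replaced by a kernel feature map, (QKᵀ)V can be regrouped as Q(KᵀV), cutting the cost in sequence length n from O(n²d) to O(nd²). The sketch below illustrates that idea only; the elu+1 feature map and the per-query normalizer are common linear-attention choices, not necessarily TransnormerLLM's exact NormAttention or Lightning Attention formulation.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: positive everywhere, a common linear-attention kernel
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # Linear-time form: associate (K^T V) first -> O(n * d^2)
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                    # (d, d) summary of keys/values
    Z = Qf @ Kf.sum(axis=0)          # per-query normalizer, shape (n,)
    return (Qf @ KV) / Z[:, None]

def quadratic_attention(Q, K, V):
    # Equivalent softmax-free form that builds the full n x n matrix
    Qf, Kf = feature_map(Q), feature_map(K)
    A = Qf @ Kf.T                    # (n, n) kernel scores
    return (A / A.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 8, 4))  # n=8 tokens, d=4 head dim
out = linear_attention(Q, K, V)
assert out.shape == (8, 4)
# Both groupings give the same result; only the cost differs
assert np.allclose(out, quadratic_attention(Q, K, V))
```

Because the (d, d) summary KᵀV can be accumulated token by token, the same regrouping is what makes recurrent-style inference and kernels like Lightning Attention possible.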

Quick Start & Requirements

Inference: run pip install -r requirements.txt; model weights are available on Hugging Face. For fine-tuning, clone the repo, change into the fine-tune directory, and run pip install -r requirements.txt (plus pip install peft for LoRA). Training examples use torchrun and deepspeed. If Triton errors occur, set export use_triton=False.
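The setup steps above can be collected into a shell sketch; the directory name and flag come from this summary, so verify them against the repo's README before relying on them.

```shell
# Inference dependencies (run from the repo root)
pip install -r requirements.txt

# Fine-tuning setup
cd fine-tune
pip install -r requirements.txt
pip install peft            # only needed for LoRA fine-tuning

# If the Triton kernels raise errors, fall back to the non-Triton path
export use_triton=False
```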

Highlighted Details

  • Released base models include 385M-, 1B-, and 7B-parameter versions, with a 15B model currently in training.
  • Achieves competitive performance on Chinese, English, and multi-language benchmarks (Commonsense Reasoning, MMLU, CMMLU, C-Eval), often matching or exceeding state-of-the-art open-source models.
  • Architecture features LRPE positional embedding and Lightning Attention for accelerated processing.

Maintenance & Community

Community support is available via Discord and WeChat. The project acknowledges dependencies on open-source components like Baichuan (tokenizer), metaseq (training), and lm-evaluation-harness (evaluation).

Licensing & Compatibility

Licensed under Apache 2.0 and a Community License. Commercial use is permitted only for entities with under 1M daily active users that are not software/cloud service providers and do not sublicense the model. Approval via email (opennlplab@gmail.com) is mandatory.

Limitations & Caveats

The project disclaims responsibility for misuse, security risks, or legal violations. Users are warned against using models for harmful activities or unvetted services. Despite efforts, data compliance issues may arise due to model/data complexity.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (author of SWE-Gym; MTS at xAI), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

DeepSeek-V3.2-Exp by deepseek-ai

Experimental LLM boosting long-context efficiency

Created 4 months ago, updated 3 months ago
1k stars (Top 0.6%)