TransnormerLLM by OpenNLPLab

Faster, better LLM with linear attention

Created 2 years ago
252 stars

Top 99.6% on SourcePulse

View on GitHub
Project Summary

Summary

TransnormerLLM is an open-source LLM project focused on re-inventing LLM architecture for superior accuracy and efficiency. It provides open weights and fine-tuning code, targeting individuals, researchers, and businesses for responsible innovation. Its core benefit is outperforming traditional softmax attention models via a novel linear attention mechanism.

How It Works

This project introduces TransnormerLLM, the first LLM leveraging linear attention for enhanced accuracy and efficiency over conventional softmax attention. It builds on TransNormer with LRPE positional embedding, Lightning Attention acceleration, and new gating/normalization mechanisms. Trained on up to 1.4 trillion tokens, it demonstrates competitive performance across diverse benchmarks.
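The efficiency gain described above rests on linear attention's associativity trick: once the softmax is replaced by a kernel feature map, (QKᵀ)V can be regrouped as Q(KᵀV), cutting the cost in sequence length n from O(n²d) to O(nd²). The sketch below illustrates that idea only; the elu+1 feature map and the per-query normalizer are common linear-attention choices, not necessarily TransnormerLLM's exact NormAttention or Lightning Attention formulation.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: positive everywhere, a common linear-attention kernel
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # Linear-time form: associate (K^T V) first -> O(n * d^2)
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                    # (d, d) summary of keys/values
    Z = Qf @ Kf.sum(axis=0)          # per-query normalizer, shape (n,)
    return (Qf @ KV) / Z[:, None]

def quadratic_attention(Q, K, V):
    # Equivalent softmax-free form that builds the full n x n matrix
    Qf, Kf = feature_map(Q), feature_map(K)
    A = Qf @ Kf.T                    # (n, n) kernel scores
    return (A / A.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 8, 4))  # n=8 tokens, d=4 head dim
out = linear_attention(Q, K, V)
assert out.shape == (8, 4)
# Both groupings give the same result; only the cost differs
assert np.allclose(out, quadratic_attention(Q, K, V))
```

Because the (d, d) summary KᵀV can be accumulated token by token, the same regrouping is what makes recurrent-style inference and kernels like Lightning Attention possible.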

Quick Start & Requirements

Inference: run pip install -r requirements.txt; model weights are available on Hugging Face. For fine-tuning, clone the repo, change into the fine-tune directory, and run pip install -r requirements.txt (plus pip install peft for LoRA). Training examples use torchrun and deepspeed. If Triton errors occur, set export use_triton=False.
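The setup steps above can be collected into a shell sketch; the directory name and flag come from this summary, so verify them against the repo's README before relying on them.

```shell
# Inference dependencies (run from the repo root)
pip install -r requirements.txt

# Fine-tuning setup
cd fine-tune
pip install -r requirements.txt
pip install peft            # only needed for LoRA fine-tuning

# If the Triton kernels raise errors, fall back to the non-Triton path
export use_triton=False
```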

Highlighted Details

  • Released base models include 385M-, 1B-, and 7B-parameter versions, with a 15B model currently in training.
  • Achieves competitive performance on Chinese, English, and multi-language benchmarks (Commonsense Reasoning, MMLU, CMMLU, C-Eval), often matching or exceeding state-of-the-art open-source models.
  • Architecture features LRPE positional embedding and Lightning Attention for accelerated processing.

Maintenance & Community

Community support is available via Discord and WeChat. The project acknowledges dependencies on open-source components like Baichuan (tokenizer), metaseq (training), and lm-evaluation-harness (evaluation).

Licensing & Compatibility

Licensed under Apache 2.0 and a Community License. Commercial use is permitted only for entities with under 1M daily active users that are not software/cloud service providers and do not sublicense the model. Approval via email (opennlplab@gmail.com) is mandatory.

Limitations & Caveats

The project disclaims responsibility for misuse, security risks, or legal violations. Users are warned against using models for harmful activities or unvetted services. Despite efforts, data compliance issues may arise due to model/data complexity.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (author of SWE-Gym; MTS at xAI), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

DeepSeek-V3.2-Exp by deepseek-ai

Experimental LLM boosting long-context efficiency

Created 4 months ago, updated 3 months ago
1k stars (Top 0.6%)