OpenNLPLab/TransnormerLLM: a faster, better LLM with linear attention
Summary
TransnormerLLM is an open-source LLM project that redesigns the transformer architecture for better accuracy and efficiency. It provides open weights and fine-tuning code for individuals, researchers, and businesses committed to responsible use. Its core claim is that its novel linear attention mechanism outperforms traditional softmax attention models.
How It Works
This project introduces TransnormerLLM, the first LLM leveraging linear attention for enhanced accuracy and efficiency over conventional softmax attention. It builds on TransNormer with LRPE positional embedding, Lightning Attention acceleration, and new gating/normalization mechanisms. Trained on up to 1.4 trillion tokens, it demonstrates competitive performance across diverse benchmarks.
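The efficiency gain from linear attention comes from associativity: instead of materializing the n-by-n softmax attention matrix, a positive feature map lets the keys and values be summarized into a small d-by-d state first. A minimal NumPy sketch (not the project's Lightning Attention kernel, and with normalization omitted for brevity):

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a common positive feature map used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

rng = np.random.default_rng(0)
n, d = 128, 16  # sequence length, head dimension
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

phi_q, phi_k = feature_map(Q), feature_map(K)

# Softmax-style grouping: builds an n x n matrix, O(n^2 * d)
quadratic = (phi_q @ phi_k.T) @ V

# Linear grouping: builds a d x d summary first, O(n * d^2)
linear = phi_q @ (phi_k.T @ V)

# Same result up to floating-point error, but linear in sequence length
assert np.allclose(quadratic, linear)
```

Because the two groupings agree, the linear form gives the same output while scaling linearly with sequence length, which is what makes training on trillion-token corpora tractable.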
Quick Start & Requirements
Inference: run pip install -r requirements.txt; model weights are available on Hugging Face. Fine-tuning: clone the repo, change into the fine-tune directory, and run pip install -r requirements.txt (install peft for LoRA support). Training examples use torchrun and deepspeed. If Triton raises errors, set export use_triton=False.
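The steps above can be sketched as a minimal inference script. The model id "OpenNLPLab/TransNormerLLM-7B" is an assumption (check the project's Hugging Face page for the exact names), and trust_remote_code is needed because the architecture ships custom modeling code:

```python
import os

# Workaround noted above: disable Triton kernels if they error out
os.environ["use_triton"] = "False"

def generate(prompt: str, model_id: str = "OpenNLPLab/TransNormerLLM-7B") -> str:
    # Hypothetical model id; custom architectures require trust_remote_code
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Linear attention scales"))
```

Setting the environment variable before the model code loads matters, since the custom kernels read it at import time.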
Maintenance & Community
Community support is available via Discord and WeChat. The project acknowledges dependencies on open-source components like Baichuan (tokenizer), metaseq (training), and lm-evaluation-harness (evaluation).
Licensing & Compatibility
Licensed under Apache 2.0 plus a Community License. Commercial use is permitted only for entities with fewer than 1 million daily active users that are not software or cloud service providers and that do not re-license the model to third parties. Approval via email (opennlplab@gmail.com) is mandatory.
Limitations & Caveats
The project disclaims responsibility for misuse, security risks, or legal violations. Users are warned against using models for harmful activities or unvetted services. Despite efforts, data compliance issues may arise due to model/data complexity.
Status: last updated 2 years ago; the project is inactive.