Large language & vision-language models based on linear attention
Top 15.3% on SourcePulse
This repository provides the official implementations for MiniMax-Text-01 and MiniMax-VL-01, large-scale language and vision-language models. These models are designed for researchers and developers seeking state-of-the-art performance in long-context understanding and multimodal tasks, offering advanced architectures and competitive benchmark results.
How It Works
MiniMax-Text-01 uses a hybrid architecture that interleaves Lightning Attention (a linear-attention variant) with periodic Softmax Attention layers, combined with a Mixture-of-Experts (MoE) feed-forward design, to reach a 1 million token context length during training and up to 4 million tokens during inference. It employs parallelization strategies such as LASP+ and Expert Tensor Parallelism (ETP) for efficient scaling. MiniMax-VL-01 builds on this by adding a Vision Transformer (ViT) and a dynamic resolution mechanism, allowing it to process images at resolutions up to 2016x2016 while keeping a 336x336 thumbnail for efficient multimodal understanding.
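The key efficiency idea behind linear attention is that the softmax can be replaced by a positive feature map, letting the key-value summary be computed once in O(n) instead of forming an n×n score matrix. The sketch below is illustrative only, not the repository's implementation: it uses the common elu(x)+1 feature map from the linear-attention literature, and the function names and the interleaving ratio in `hybrid_block_types` are assumptions for demonstration.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention: O(n^2) in sequence length n.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized (linear) attention: a positive feature map phi replaces
    # softmax, so phi(K)^T V is a fixed-size summary computed in O(n).
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                 # (d, d_v) summary, independent of n
    z = Qf @ Kf.sum(axis=0)       # per-query normalizer
    return (Qf @ kv) / (z[:, None] + eps)

def hybrid_block_types(n_layers, softmax_every=8):
    # Illustrative layer schedule: mostly linear-attention layers with a
    # full softmax-attention layer interleaved periodically. The 1-in-8
    # ratio here is an assumption, not taken from the repository.
    return ["softmax" if (i + 1) % softmax_every == 0 else "linear"
            for i in range(n_layers)]
```

Both attention variants map (n, d) queries to (n, d_v) outputs, so they can be swapped layer by layer; the linear layers keep memory flat as the context grows, which is what makes million-token training lengths feasible.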
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats