MiniMax-01 by MiniMax-AI

Large language & vision-language models based on linear attention

Created 8 months ago
3,146 stars

Top 15.3% on SourcePulse

Project Summary

This repository provides the official implementations for MiniMax-Text-01 and MiniMax-VL-01, large-scale language and vision-language models. These models are designed for researchers and developers seeking state-of-the-art performance in long-context understanding and multimodal tasks, offering advanced architectures and competitive benchmark results.

How It Works

MiniMax-Text-01 uses a hybrid attention mechanism combining Lightning Attention, softmax attention, and Mixture-of-Experts (MoE) to reach a context length of 1 million tokens during training and up to 4 million tokens during inference. It employs parallelism strategies such as LASP+ and ETP for efficient scaling. MiniMax-VL-01 builds on this by integrating a Vision Transformer (ViT) and a dynamic-resolution mechanism, allowing it to process images at resolutions up to 2016x2016 while keeping a 336x336 thumbnail for efficient multimodal understanding.
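
The sketch below illustrates the hybrid idea in PyTorch; it is not the official implementation. Most blocks use a linear-time attention, with a full softmax-attention block interleaved periodically. The interleave ratio, dimensions, and plain feed-forward (the real model uses MoE, omitted here for brevity) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LinearAttention(nn.Module):
    """Toy kernelized (linear) attention: cost grows linearly with sequence length."""
    def __init__(self, d):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)

    def forward(self, x):                                    # x: (batch, seq, d)
        q = torch.relu(self.q(x)) + 1e-6                     # positive feature map
        k = torch.relu(self.k(x)) + 1e-6
        v = self.v(x)
        kv = torch.einsum("bsd,bse->bde", k, v)              # K^T V summary, O(n)
        norm = q @ k.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6
        return torch.einsum("bsd,bde->bse", q, kv) / norm

class SoftmaxAttention(nn.Module):
    """Standard full softmax self-attention (quadratic in sequence length)."""
    def __init__(self, d, n_heads=8):
        super().__init__()
        self.mha = nn.MultiheadAttention(d, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.mha(x, x, x)
        return out

class Block(nn.Module):
    """Attention + feed-forward block with pre-norm residual connections."""
    def __init__(self, d, attn):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.attn = attn
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.ffn(self.norm2(x))

def build_hybrid_stack(d=512, n_layers=8, softmax_every=4):
    """Mostly linear-attention blocks, with a softmax-attention block every few layers."""
    return nn.Sequential(*[
        Block(d, SoftmaxAttention(d) if (i + 1) % softmax_every == 0 else LinearAttention(d))
        for i in range(n_layers)
    ])

x = torch.randn(2, 128, 512)            # (batch, seq, hidden)
print(build_hybrid_stack()(x).shape)    # torch.Size([2, 128, 512])
```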

Quick Start & Requirements

  • Installation: Models are available via Hugging Face Transformers (see the loading sketch after this list).
  • Hardware: Requires multiple GPUs (e.g., 8 GPUs for the provided examples).
  • Dependencies: PyTorch, Transformers, and potentially vLLM for deployment.
  • Resources: Significant GPU memory is needed due to the large parameter counts.

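A minimal loading sketch via Hugging Face Transformers. The hub id shown is an assumption; check the model card for the exact repo name and the officially recommended loading arguments. The dtype and device-map choices here are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub id; verify against the Hugging Face model card.
model_id = "MiniMaxAI/MiniMax-Text-01"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,      # custom architecture code ships with the checkpoint
    torch_dtype=torch.bfloat16,  # full precision will not fit on typical hardware
    device_map="auto",           # shard weights across all visible GPUs
)

inputs = tokenizer("Hello, MiniMax!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
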
Highlighted Details

  • MiniMax-Text-01: 456B total parameters, 45.9B activated per token, 1M-token training context, 4M-token inference context.
  • MiniMax-VL-01: Integrates a 303M-parameter ViT with MiniMax-Text-01, supports dynamic image resolutions up to 2016x2016.
  • Strong performance across academic benchmarks (MMLU, GSM8K, HumanEval) and long-context evaluations (Needle In A Haystack, LongBench).
  • Supports INT8 quantization for a reduced memory footprint (see the quantization sketch after this list).
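
The repository documents its own INT8 recipe; as a generic illustration only (not the repository's documented method), 8-bit weight loading through bitsandbytes in Transformers would look roughly like this. The hub id is an assumption, and compatibility of generic 8-bit loading with the custom architecture is not guaranteed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "MiniMaxAI/MiniMax-Text-01"            # assumed hub id; see model card

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # store weights in INT8

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=quant_config,   # roughly halves weight memory vs. bf16
    device_map="auto",
)
```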

Maintenance & Community

  • Official repository from MiniMax.
  • Contact: model@minimaxi.com for API and server inquiries.

Licensing & Compatibility

  • The README does not explicitly state a license. The models are distributed via Hugging Face; consult the model cards and any accompanying model license agreement before research or commercial use.

Limitations & Caveats

  • The provided quick-start examples assume a distributed setup across multiple GPUs, indicating significant hardware requirements for effective use.
  • The README does not detail specific installation steps beyond Hugging Face model loading, and deployment guidance points to external tools like vLLM.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 37 stars in the last 30 days
