MiniMax-01 by MiniMax-AI

Large language & vision-language models based on linear attention

created 6 months ago
3,086 stars

Top 15.8% on sourcepulse

View on GitHub
Project Summary

This repository provides the official implementations for MiniMax-Text-01 and MiniMax-VL-01, large-scale language and vision-language models. These models are designed for researchers and developers seeking state-of-the-art performance in long-context understanding and multimodal tasks, offering advanced architectures and competitive benchmark results.

How It Works

MiniMax-Text-01 combines Lightning Attention (a linear-attention mechanism) and softmax attention in a hybrid attention scheme, together with Mixture-of-Experts (MoE) layers, to achieve a 1-million-token context length during training and up to 4 million tokens during inference. It employs parallelism strategies such as LASP+ and ETP for efficient scaling. MiniMax-VL-01 builds on this by integrating a Vision Transformer (ViT) and a dynamic-resolution mechanism, allowing it to process images at resolutions up to 2016x2016 while keeping a 336x336 thumbnail for efficient multimodal understanding.
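
To make the context-length claim concrete, the following toy, non-causal sketch (not the repository's Lightning Attention kernel) contrasts softmax attention, whose score matrix grows quadratically with sequence length, with linear attention, which reorders the same computation so cost grows linearly with sequence length. The elu+1 feature map and the tensor shapes are illustrative assumptions only.

```python
# Toy comparison (illustrative only, not the repo's kernel):
# softmax attention materializes an n x n score matrix, while linear attention
# replaces softmax with a feature map so the sequence-length term stays linear.
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # (n, d) @ (d, n) -> (n, n): this n x n matrix is what limits context length
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernel trick: phi(Q) @ (phi(K)^T V) never materializes the n x n matrix,
    # so per-token cost depends on head dimension d, not on sequence length n.
    phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1                        # assumed feature map
    kv = phi_k.transpose(-1, -2) @ v                                 # (d, d) key-value summary
    z = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-1, -2)    # normalizer
    return (phi_q @ kv) / (z + eps)

q = k = v = torch.randn(1, 1024, 64)  # (batch, seq_len, head_dim); sizes arbitrary
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```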

Quick Start & Requirements

  • Installation: Models are available via Hugging Face Transformers (see the loading sketch after this list).
  • Hardware: Requires multiple GPUs (e.g., 8 GPUs for the provided examples).
  • Dependencies: PyTorch, Transformers, and potentially vLLM for deployment.
  • Resources: Significant GPU memory is needed due to the large parameter counts.
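
A minimal loading sketch via Hugging Face Transformers. The model id MiniMaxAI/MiniMax-Text-01 and the trust_remote_code flag are assumptions; consult the repository README and model card for the exact, supported recipe and hardware setup.

```python
# Hedged loading sketch; the model id and flags are assumptions, not the official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01"   # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # keep the checkpoint's native precision
    device_map="auto",        # shard the weights across all visible GPUs
    trust_remote_code=True,   # the custom hybrid-attention architecture ships with the checkpoint
)

inputs = tokenizer("MiniMax-Text-01 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```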

Highlighted Details

  • MiniMax-Text-01: 456B total parameters with 45.9B activated per token; 1M-token training context and up to 4M-token inference context.
  • MiniMax-VL-01: integrates a 303M-parameter ViT with MiniMax-Text-01 and supports dynamic image resolutions up to 2016x2016.
  • Strong performance across academic benchmarks (MMLU, GSM8K, HumanEval) and long-context evaluations (Needle In A Haystack, LongBench).
  • Supports INT8 quantization for a reduced memory footprint (generic loading sketch after this list).
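
The repository advertises INT8 support, and its own procedure is documented there. As a generic illustration only (not necessarily the project's recipe), 8-bit loading through Transformers commonly looks like the sketch below, assuming bitsandbytes is installed and compatible with the custom model code.

```python
# Generic 8-bit loading illustration; the repo may ship its own INT8 weights/config.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01",                         # assumed model id
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True,
)
```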

Maintenance & Community

  • Official repository from MiniMax.
  • Contact: model@minimaxi.com for API and server inquiries.

Licensing & Compatibility

  • The README does not explicitly state a license. The models are distributed via Hugging Face, but that alone does not establish permissive research or commercial terms; check the repository's license files and the Hugging Face model cards before use.

Limitations & Caveats

  • The provided quick-start examples assume a distributed setup across multiple GPUs, indicating significant hardware requirements for effective use.
  • The README does not detail specific installation steps beyond Hugging Face model loading, and deployment guidance points to external tools like vLLM.
Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 4

Star History

553 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 5 more.

Liger-Kernel by linkedin

0.6%
5k
Triton kernels for efficient LLM training
created 1 year ago
updated 2 days ago
Starred by Matei Zaharia (Cofounder of Databricks), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

LWM by LargeWorldModel

0.0%
7k
Multimodal autoregressive model for long-context video/text
created 1 year ago
updated 9 months ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

0.4%
15k
Framework for LLM inference optimization experimentation
created 1 year ago
updated 3 days ago