MiniMax-M1 by MiniMax-AI

Open-weight reasoning model with hybrid attention

created 2 months ago
2,806 stars

Top 17.0% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

MiniMax-M1 is an open-weight, large-scale hybrid-attention reasoning model designed for complex tasks requiring extensive context processing and reasoning. It targets researchers and developers building advanced AI agents, offering significant efficiency gains and strong performance on benchmarks involving coding, software engineering, and long-context understanding.

How It Works

MiniMax-M1 employs a hybrid Mixture-of-Experts (MoE) architecture coupled with a "lightning attention" mechanism. This combination allows test-time compute to scale efficiently: only a fraction of the model's 456 billion total parameters is activated per token. The model natively supports a 1 million token context length and requires significantly fewer FLOPs than comparable models at extended sequence lengths. It was post-trained with large-scale reinforcement learning using CISPO, a novel algorithm that clips importance-sampling weights rather than token updates.
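The CISPO objective is not spelled out in this summary, so the following is only a minimal PyTorch-style sketch of a clipped importance-sampling policy-gradient update in that spirit; the function name cispo_loss, the argument names, and the clipping thresholds are illustrative choices, not the official implementation.

    import torch

    def cispo_loss(logp_new, logp_old, advantages, eps_low, eps_high):
        # logp_new: per-token log-probs of the sampled tokens under the current policy
        # logp_old: per-token log-probs under the policy that generated the rollouts
        # advantages: per-token advantage estimates; eps_low/eps_high: clipping thresholds
        ratio = torch.exp(logp_new - logp_old)  # importance-sampling weight
        # Clip the IS weight itself and stop its gradient (rather than clipping the
        # token update as PPO does), so every sampled token still contributes a gradient.
        clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
        # REINFORCE-style surrogate weighted by the clipped IS ratio (negated for minimization)
        return -(clipped * advantages * logp_new).mean()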

Quick Start & Requirements

  • Installation: Download the model weights from HuggingFace; deployment via vLLM or Transformers is recommended (a minimal vLLM sketch follows this list).
  • Prerequisites: GPU(s) required for practical inference. Specific hardware requirements depend on the chosen deployment method (vLLM/Transformers).
  • Resources: Model weights are large (456B total parameters); inference requires substantial VRAM.
  • Links: MiniMax-M1-40k, MiniMax-M1-80k, vLLM Deployment Guide, Transformers Deployment Guide
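For orientation, here is a minimal offline-inference sketch for the vLLM path; the HuggingFace repo id MiniMaxAI/MiniMax-M1-40k, the tensor_parallel_size, and the sampling values are assumptions to adapt to your hardware and the linked deployment guide.

    from vllm import LLM, SamplingParams

    # Repo id, parallelism, and sampling settings below are illustrative.
    llm = LLM(
        model="MiniMaxAI/MiniMax-M1-40k",
        trust_remote_code=True,
        tensor_parallel_size=8,  # the 456B-parameter MoE does not fit on a single GPU
    )
    params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=1024)
    outputs = llm.generate(["Summarize the trade-offs of hybrid attention."], params)
    print(outputs[0].outputs[0].text)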

Highlighted Details

  • Supports a 1 million token context length.
  • Consumes roughly 25% of the FLOPs of DeepSeek R1 at a generation length of 100K tokens.
  • Outperforms strong open-weight models on software engineering and long-context tasks.
  • Supports function calling for external tool integration (see the sketch after this list).
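The exact tool-calling schema is defined in the project's deployment guides; the snippet below only sketches the common pattern of passing a tool definition through an OpenAI-compatible endpoint (for example, a vLLM server). The base_url, model name, and get_weather tool are hypothetical.

    from openai import OpenAI

    # Endpoint, model name, and tool schema are placeholders for illustration.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M1-40k",
        messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)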

Maintenance & Community

The project is maintained by MiniMax AI, with contact available via email at model@minimax.io. The README does not provide further community or roadmap details.

Licensing & Compatibility

The model is distributed as open weights. Specific licensing terms for commercial use or closed-source integration are not detailed in the README.

Limitations & Caveats

The README does not specify licensing restrictions for commercial use. On some benchmarks, such as HLE (no tools), the model trails certain competitors. The reported SWE-bench methodology excludes 14 test cases due to infrastructure incompatibility.

Health Check
Last commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
147 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Jiayi Pan (author of SWE-Gym; MTS at xAI).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago