MiniMax-M1 by MiniMax-AI

Open-weight reasoning model with hybrid attention

created 2 months ago
2,806 stars

Top 17.0% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

MiniMax-M1 is an open-weight, large-scale hybrid-attention reasoning model designed for complex tasks requiring extensive context processing and reasoning. It targets researchers and developers building advanced AI agents, offering significant efficiency gains and strong performance on benchmarks involving coding, software engineering, and long-context understanding.

How It Works

MiniMax-M1 employs a hybrid Mixture-of-Experts (MoE) architecture coupled with a "lightning attention" mechanism. This combination allows test-time compute to scale efficiently: only a fraction of the model's 456 billion total parameters is activated per token. The model natively supports a 1 million token context length and requires significantly fewer FLOPs than comparable models at extended sequence lengths. It was post-trained with large-scale reinforcement learning using CISPO, a novel algorithm that clips importance-sampling weights rather than token updates.
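The CISPO objective is not spelled out in this summary, so the following is only a minimal PyTorch-style sketch of a clipped importance-sampling policy-gradient update in that spirit; the function name cispo_loss, the argument names, and the clipping thresholds are illustrative choices, not the official implementation.

    import torch

    def cispo_loss(logp_new, logp_old, advantages, eps_low, eps_high):
        # logp_new: per-token log-probs of the sampled tokens under the current policy
        # logp_old: per-token log-probs under the policy that generated the rollouts
        # advantages: per-token advantage estimates; eps_low/eps_high: clipping thresholds
        ratio = torch.exp(logp_new - logp_old)  # importance-sampling weight
        # Clip the IS weight itself and stop its gradient (rather than clipping the
        # token update as PPO does), so every sampled token still contributes a gradient.
        clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
        # REINFORCE-style surrogate weighted by the clipped IS ratio (negated for minimization)
        return -(clipped * advantages * logp_new).mean()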

Quick Start & Requirements

  • Installation: Download the model weights from HuggingFace; deployment via vLLM or Transformers is recommended (a minimal vLLM sketch follows this list).
  • Prerequisites: GPU(s) required for practical inference. Specific hardware requirements depend on the chosen deployment method (vLLM/Transformers).
  • Resources: Model weights are large (456B total parameters); inference requires substantial VRAM.
  • Links: MiniMax-M1-40k, MiniMax-M1-80k, vLLM Deployment Guide, Transformers Deployment Guide
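For orientation, here is a minimal offline-inference sketch for the vLLM path; the HuggingFace repo id MiniMaxAI/MiniMax-M1-40k, the tensor_parallel_size, and the sampling values are assumptions to adapt to your hardware and the linked deployment guide.

    from vllm import LLM, SamplingParams

    # Repo id, parallelism, and sampling settings below are illustrative.
    llm = LLM(
        model="MiniMaxAI/MiniMax-M1-40k",
        trust_remote_code=True,
        tensor_parallel_size=8,  # the 456B-parameter MoE does not fit on a single GPU
    )
    params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=1024)
    outputs = llm.generate(["Summarize the trade-offs of hybrid attention."], params)
    print(outputs[0].outputs[0].text)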

Highlighted Details

  • Supports a 1 million token context length.
  • Consumes roughly 25% of the FLOPs of DeepSeek R1 at a generation length of 100K tokens.
  • Outperforms strong open-weight models on software engineering and long-context tasks.
  • Supports function calling for external tool integration (see the sketch after this list).
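The exact tool-calling schema is defined in the project's deployment guides; the snippet below only sketches the common pattern of passing a tool definition through an OpenAI-compatible endpoint (for example, a vLLM server). The base_url, model name, and get_weather tool are hypothetical.

    from openai import OpenAI

    # Endpoint, model name, and tool schema are placeholders for illustration.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="MiniMaxAI/MiniMax-M1-40k",
        messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)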

Maintenance & Community

The project is maintained by MiniMax AI, with contact available via email at model@minimax.io. The README does not provide further community or roadmap details.

Licensing & Compatibility

The model is distributed as open weights. Specific licensing terms for commercial use or closed-source integration are not detailed in the README.

Limitations & Caveats

The README does not specify licensing restrictions for commercial use. On some benchmarks, such as HLE (no tools), the model trails certain competitors. The reported SWE-bench methodology excludes 14 test cases due to infrastructure incompatibility.

Health Check
Last commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
147 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Jiayi Pan (author of SWE-Gym; MTS at xAI).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago