LongCat-Flash-Thinking-2601  by meituan-longcat

Powerful Large Reasoning Model for agentic tasks

Created 4 months ago
256 stars

Top 98.5% on SourcePulse

GitHubView on GitHub
Project Summary

LongCat-Flash-Thinking-2601 is a 560 billion parameter Large Reasoning Model (LRM) employing a Mixture-of-Experts (MoE) architecture. It significantly enhances agentic reasoning capabilities through a novel training pipeline combining environment scaling, multi-environment reinforcement learning, and robust training against environmental noise. The model offers top-tier performance on agentic benchmarks, improved generalization to out-of-distribution scenarios, and a specialized "Heavy Thinking Mode" for extremely challenging tasks.

How It Works

The model leverages a domain-parallel training recipe and an innovative MoE architecture (560B total, 27B activated parameters). Its agentic capabilities are strengthened via:

  1. Environment Scaling & Multi-Environment RL: A diverse set of high-quality training environments, each featuring over 60 tools, are used for large-scale reinforcement learning. This approach, extending the DORA infrastructure, fosters generalizable agentic skills. Tasks are constructed with controlled complexity and diversity, with specialized strategies to manage database consistency challenges in large tool sets.
  2. Robust Training against Noisy Environment: To address real-world imperfections, environmental noise is systematically analyzed, injected into training environments, and trained using a curriculum strategy. This enhances the model's resilience to uncertainty.
  3. Heavy Thinking Mode: This mode decomposes complex problems into parallel thinking (broad exploration via multiple trajectories) and summarization (iterative reasoning loops) to scale both reasoning depth and width.

Quick Start & Requirements

Basic adaptations exist for deployment with SGLang and vLLM; detailed instructions are available in a separate Deployment Guide. Example usage in the README utilizes the transformers library. Specific local setup prerequisites (e.g., GPU, CUDA versions, Python versions) are not detailed. A chat interface is available at: https://longcat.ai.

Highlighted Details

  • Achieves state-of-the-art performance across various agentic benchmarks, including tool use, search, and reasoning, with scores often competitive with or exceeding leading models like Claude-Opus and Gemini-3-Pro.
  • Demonstrates substantially improved generalization in arbitrary out-of-distribution real-world agentic scenarios, evaluated via randomly generated complex tasks.
  • Heavy Thinking Mode (indicated by ‡ in benchmarks) further enhances performance on challenging tasks by enabling intensive parallel thinking and iterative reasoning.
  • Exhibits strong resilience to environmental uncertainty, consistently achieving improved performance under imperfect conditions due to robust training methodologies.

Maintenance & Community

Direct contact is available via longcat-team@meituan.com. A WeChat Group is also mentioned for community interaction. The model weights are actively used on the Longcat AI platform.

Licensing & Compatibility

Model weights are released under the MIT License. This license explicitly does not grant rights to use Meituan trademarks or patents. Standard LLM usage considerations apply regarding accuracy, safety, fairness, and compliance with applicable laws and regulations for downstream applications.

Limitations & Caveats

The model has not been comprehensively evaluated for every possible downstream application and may exhibit performance variations across languages. Developers must carefully assess accuracy, safety, and fairness before deployment in sensitive scenarios. Maintaining database consistency can be challenging when environments contain a large number of tools, potentially leading to unverifiable tasks. Compliance with all applicable laws and regulations is the responsibility of the user.

Health Check
Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
1
Star History
8 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.