LongCat-Flash-Chat by meituan-longcat

A 560B parameter MoE language model optimized for efficiency and agentic tasks

Created 4 months ago · 1,258 stars · Top 31.4% on SourcePulse

Project Summary

LongCat-Flash-Chat is a 560 billion parameter language model designed for efficient computation and high performance, particularly in agentic tasks. It targets developers and researchers seeking a powerful, scalable LLM with advanced reasoning and tool-use capabilities, offering competitive performance against leading models.

How It Works

LongCat-Flash uses a Mixture-of-Experts (MoE) architecture with a "zero-computation experts" mechanism, dynamically activating 18.6B–31.3B parameters (average ~27B) per token depending on context. Combined with a Shortcut-connected MoE (ScMoE) design that expands the computation-communication overlap window, this enables efficient scaling and high-throughput inference (over 100 TPS). The training strategy includes hyperparameter transfer, model-growth initialization, a stability suite (router-gradient balancing, z-loss), and deterministic computation for reproducibility and error detection. Agentic capabilities are built up through a multi-stage training pipeline that combines data fusion, an extended 128k context length, and a multi-agent synthesis framework for generating complex tasks.
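
To make the "zero-computation experts" mechanism concrete, below is a minimal PyTorch sketch, not LongCat-Flash's actual implementation: a few router targets are identity passthroughs, so tokens routed to them skip FFN compute entirely, which is how per-token activated parameters can vary with context. The z-loss follows the standard ST-MoE formulation; LongCat-Flash reports using a z-loss, but its exact form, and all names and sizes here, are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ZeroComputeMoE(nn.Module):
        """Illustrative MoE layer: the last n_zero_experts "experts" are
        identity functions, so routing a token there costs no FLOPs."""
        def __init__(self, d_model=1024, n_ffn_experts=6, n_zero_experts=2, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.n_ffn = n_ffn_experts
            self.n_total = n_ffn_experts + n_zero_experts
            self.router = nn.Linear(d_model, self.n_total, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_ffn_experts)
            )

        def forward(self, x):                      # x: (tokens, d_model)
            logits = self.router(x)                # (tokens, n_total)
            weights, idx = F.softmax(logits, dim=-1).topk(self.top_k, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e in range(self.n_total):
                    mask = idx[:, slot] == e
                    if not mask.any():
                        continue
                    w = weights[mask, slot].unsqueeze(-1)
                    if e < self.n_ffn:             # real FFN expert: full compute
                        out[mask] += w * self.experts[e](x[mask])
                    else:                          # zero-computation expert:
                        out[mask] += w * x[mask]   # identity, no FFN work
            return out, logits

    def router_z_loss(logits, coef=1e-3):
        """ST-MoE-style z-loss: penalizes large router logits for stability."""
        return coef * torch.logsumexp(logits, dim=-1).pow(2).mean()

In this sketch, a batch where many tokens route to identity experts activates fewer parameters than one where most tokens hit real FFN experts, which is the spirit of the reported 18.6B–31.3B per-token range.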

Quick Start & Requirements

  • Deployment: Adaptations available for SGLang and vLLM; refer to the Deployment Guide. A hedged vLLM sketch follows this list.
  • Chat Website: https://longcat.ai
  • Prerequisites: Not explicitly documented; training at this scale implies significant compute (e.g., thousands of accelerators), and serving a 560B-parameter MoE requires a multi-GPU setup.
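
The following is a minimal sketch using vLLM's offline API. The Hugging Face model ID, the tensor_parallel_size value, and the need for trust_remote_code are assumptions; the project's Deployment Guide is authoritative, since serving requires the LongCat-Flash adaptations mentioned above.

    # Hypothetical quick start, assuming a vLLM build with the LongCat-Flash
    # adaptation and weights published as "meituan-longcat/LongCat-Flash-Chat".
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meituan-longcat/LongCat-Flash-Chat",
        tensor_parallel_size=8,    # illustrative; a 560B MoE needs many GPUs
        trust_remote_code=True,    # custom architectures typically require this
    )
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Summarize what a Mixture-of-Experts model is."], params)
    print(outputs[0].outputs[0].text)

An SGLang path is available as well; consult the Deployment Guide for the supported launch recipes.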

Highlighted Details

  • Achieves inference throughput of over 100 tokens per second (TPS) at low cost.
  • Supports an extended context length of up to 128k tokens.
  • Demonstrates strong agentic performance, outperforming several leading models on τ²-Bench (telecom) and VitaBench.
  • Posts competitive scores across general-domain, instruction-following, mathematical-reasoning, coding, and safety benchmarks.

Licensing & Compatibility

  • License: MIT License for model weights and repository contributions.
  • Restrictions: Does not grant rights to use Meituan trademarks or patents; commercial use is permitted under standard MIT terms.

Limitations & Caveats

The model has not been evaluated for every possible downstream application and may exhibit performance variations across languages. Developers must carefully assess accuracy, safety, and fairness, and comply with all applicable laws and regulations, especially in sensitive or high-risk scenarios.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 3
  • Star History: 21 stars in the last 30 days
