Kimi-K2  by MoonshotAI

State-of-the-art MoE language model

Created 2 months ago
8,205 stars

Top 6.3% on SourcePulse

GitHubView on GitHub
Project Summary

Kimi K2 is a series of large language models developed by Moonshot AI, featuring a Mixture-of-Experts (MoE) architecture. It offers both a base model for fine-tuning and an instruct-tuned version optimized for chat and agentic capabilities, targeting researchers and developers building AI applications.

How It Works

Kimi K2 utilizes a 1 trillion total parameter MoE architecture with 32 billion activated parameters, trained using the novel Muon optimizer. This approach allows for efficient scaling and improved performance across various tasks, particularly excelling in agentic intelligence, tool use, and complex reasoning. The model boasts a 128K context length and a 160K vocabulary size.

Quick Start & Requirements

Model checkpoints are available on Huggingface in block-fp8 format. Recommended inference engines include vLLM, SGLang, KTransformers, and TensorRT-LLM. Deployment examples for vLLM and SGLang are provided in the Model Deployment Guide.

Highlighted Details

  • Achieves state-of-the-art (SOTA) performance on several coding benchmarks, including LiveCodeBench v6 (Pass@1: 53.7) and SWE-bench Verified (Agentless Coding Acc: 51.8, Agentic Coding Acc: 65.8).
  • Demonstrates strong tool-calling capabilities, with examples provided for integrating custom tools.
  • Offers an OpenAI/Anthropic-compatible API for easy integration.
  • Supports a 128K context length.

Maintenance & Community

Contact for questions or concerns is support@moonshot.cn.

Licensing & Compatibility

Released under the Modified MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Some evaluation data points were omitted due to prohibitive costs. The README mentions a paper link is "coming soon."

Health Check
Last Commit

1 week ago

Responsiveness

1 day

Pull Requests (30d)
3
Issues (30d)
9
Star History
437 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.