Kimi-K2  by MoonshotAI

State-of-the-art MoE language model

created 1 month ago
7,367 stars

Top 7.2% on sourcepulse

GitHubView on GitHub
Project Summary

Kimi K2 is a series of large language models developed by Moonshot AI, featuring a Mixture-of-Experts (MoE) architecture. It offers both a base model for fine-tuning and an instruct-tuned version optimized for chat and agentic capabilities, targeting researchers and developers building AI applications.

How It Works

Kimi K2 utilizes a 1 trillion total parameter MoE architecture with 32 billion activated parameters, trained using the novel Muon optimizer. This approach allows for efficient scaling and improved performance across various tasks, particularly excelling in agentic intelligence, tool use, and complex reasoning. The model boasts a 128K context length and a 160K vocabulary size.

Quick Start & Requirements

Model checkpoints are available on Huggingface in block-fp8 format. Recommended inference engines include vLLM, SGLang, KTransformers, and TensorRT-LLM. Deployment examples for vLLM and SGLang are provided in the Model Deployment Guide.

Highlighted Details

  • Achieves state-of-the-art (SOTA) performance on several coding benchmarks, including LiveCodeBench v6 (Pass@1: 53.7) and SWE-bench Verified (Agentless Coding Acc: 51.8, Agentic Coding Acc: 65.8).
  • Demonstrates strong tool-calling capabilities, with examples provided for integrating custom tools.
  • Offers an OpenAI/Anthropic-compatible API for easy integration.
  • Supports a 128K context length.

Maintenance & Community

Contact for questions or concerns is support@moonshot.cn.

Licensing & Compatibility

Released under the Modified MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Some evaluation data points were omitted due to prohibitive costs. The README mentions a paper link is "coming soon."

Health Check
Last commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
5
Issues (30d)
55
Star History
7,478 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos Georgios Konstantopoulos(CTO, General Partner at Paradigm) and Jiayi Pan Jiayi Pan(Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

0.1%
5k
MoE language model for research/API use
created 1 year ago
updated 10 months ago
Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan Jiayi Pan(Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Feedback? Help us improve.