Kimi-k1.5 by MoonshotAI

Research paper on scaling reinforcement learning with LLMs

created 6 months ago
3,453 stars

Top 14.3% on sourcepulse

View on GitHub
Project Summary

Kimi k1.5 is a multimodal large language model designed to excel in tasks requiring long-context reasoning and complex problem-solving. It targets researchers and developers working with advanced AI, offering state-of-the-art performance on benchmarks like AIME, MATH-500, and LiveCodeBench, significantly outperforming models such as GPT-4o and Claude Sonnet 3.5.

How It Works

Kimi k1.5 uses a simple yet effective Reinforcement Learning (RL) framework for LLMs, built on two key innovations: long-context scaling and improved policy optimization. The model scales the RL context window to 128k tokens and uses partial rollouts to improve training efficiency by reusing trajectory chunks. Policy optimization relies on a variant of online mirror descent, combined with an effective sampling strategy, a length penalty, and optimized data recipes. This approach yields learned Chain-of-Thought (CoT) reasoning that exhibits planning, reflection, and correction without relying on complex methods like Monte Carlo tree search or process reward models. The model is also jointly trained on text and vision data, enabling multimodal reasoning.
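The README ships no code, but the idea of shaping a correctness reward with a length penalty over k sampled responses can be sketched as below. This is an illustrative sketch only: the function names, weighting, and normalization are assumptions for exposition, not the authors' published recipe.

```python
# Illustrative sketch only: names, weights, and normalization are assumptions,
# not the exact formulation used by Kimi k1.5.

from dataclasses import dataclass

@dataclass
class Rollout:
    response: str   # sampled chain-of-thought plus final answer
    correct: bool   # verdict from a correctness checker
    length: int     # number of generated tokens

def shaped_rewards(rollouts, length_weight=0.5):
    """Combine a correctness reward with a length penalty across k samples.

    Shorter correct responses receive a small bonus and longer ones a small
    penalty, discouraging overlong reasoning while keeping correctness dominant.
    """
    lengths = [r.length for r in rollouts]
    lo, hi = min(lengths), max(lengths)
    span = max(hi - lo, 1)  # avoid division by zero when all lengths match

    rewards = []
    for r in rollouts:
        base = 1.0 if r.correct else 0.0
        # Scale the length term by where this response falls in the length range.
        length_term = length_weight * (0.5 - (r.length - lo) / span)
        # Only reward brevity for correct answers; never reward wrong-but-short.
        bonus = length_term if r.correct else min(0.0, length_term)
        rewards.append(base + bonus)
    return rewards
```

Applying the penalty only as a bonus for correct answers keeps the incentive structure simple: brevity is rewarded, but a short incorrect response never scores higher than a longer correct one.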

Quick Start & Requirements

The README does not provide specific installation commands, quick start guides, or detailed requirements beyond the model's capabilities. It is implied that access would be through a research paper or a dedicated platform, with no direct code repository or executable provided in the README.

Highlighted Details

  • Achieves state-of-the-art short-CoT performance, outperforming GPT-4o and Claude Sonnet 3.5 on AIME, MATH-500, and LiveCodeBench by a large margin (up to +550%).
  • Matches OpenAI o1 performance on long-CoT tasks across benchmarks spanning multiple modalities (MathVista, AIME, Codeforces).
  • Demonstrates planning, reflection, and correction capabilities through scaled context and RL optimization.
  • Jointly trained on text and vision data for multimodal reasoning.

Maintenance & Community

The project is associated with the "Kimi Team" and is presented as a research output with a citation to an arXiv preprint. No community channels, social handles, or roadmap information are provided in the README.

Licensing & Compatibility

The README does not specify a license. Given its research paper format, commercial use or integration into closed-source projects would require explicit licensing terms not detailed here.

Limitations & Caveats

The README does not detail any limitations, known bugs, or deprecation status. The absence of installation instructions or released code suggests the model may be a research preview rather than something publicly available for direct use.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 160 stars in the last 90 days
