Kimi-k1.5 by MoonshotAI

Research paper on scaling reinforcement learning with LLMs

created 6 months ago
3,453 stars

Top 14.3% on sourcepulse

View on GitHub
Project Summary

Kimi k1.5 is a multimodal large language model designed to excel in tasks requiring long-context reasoning and complex problem-solving. It targets researchers and developers working with advanced AI, offering state-of-the-art performance on benchmarks like AIME, MATH-500, and LiveCodeBench, significantly outperforming models such as GPT-4o and Claude Sonnet 3.5.

How It Works

Kimi k1.5 uses a simple yet effective Reinforcement Learning (RL) framework for LLMs, built on two key innovations: long-context scaling and improved policy optimization. The model scales the RL context window to 128k tokens and uses partial rollouts to improve training efficiency by reusing trajectory chunks. Policy optimization relies on a variant of online mirror descent, combined with an effective sampling strategy, a length penalty, and optimized data recipes. This approach yields learned Chain-of-Thought (CoT) reasoning that exhibits planning, reflection, and correction without relying on complex methods like Monte Carlo tree search or process reward models. The model is also jointly trained on text and vision data, enabling multimodal reasoning.
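The README ships no code, but the idea of shaping a correctness reward with a length penalty over k sampled responses can be sketched as below. This is an illustrative sketch only: the function names, weighting, and normalization are assumptions for exposition, not the authors' published recipe.

```python
# Illustrative sketch only: names, weights, and normalization are assumptions,
# not the exact formulation used by Kimi k1.5.

from dataclasses import dataclass

@dataclass
class Rollout:
    response: str   # sampled chain-of-thought plus final answer
    correct: bool   # verdict from a correctness checker
    length: int     # number of generated tokens

def shaped_rewards(rollouts, length_weight=0.5):
    """Combine a correctness reward with a length penalty across k samples.

    Shorter correct responses receive a small bonus and longer ones a small
    penalty, discouraging overlong reasoning while keeping correctness dominant.
    """
    lengths = [r.length for r in rollouts]
    lo, hi = min(lengths), max(lengths)
    span = max(hi - lo, 1)  # avoid division by zero when all lengths match

    rewards = []
    for r in rollouts:
        base = 1.0 if r.correct else 0.0
        # Scale the length term by where this response falls in the length range.
        length_term = length_weight * (0.5 - (r.length - lo) / span)
        # Only reward brevity for correct answers; never reward wrong-but-short.
        bonus = length_term if r.correct else min(0.0, length_term)
        rewards.append(base + bonus)
    return rewards
```

Applying the penalty only as a bonus for correct answers keeps the incentive structure simple: brevity is rewarded, but a short incorrect response never scores higher than a longer correct one.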

Quick Start & Requirements

The README does not provide specific installation commands, quick start guides, or detailed requirements beyond the model's capabilities. It is implied that access would be through a research paper or a dedicated platform, with no direct code repository or executable provided in the README.

Highlighted Details

  • Achieves state-of-the-art short-CoT performance, outperforming GPT-4o and Claude Sonnet 3.5 on AIME, MATH-500, and LiveCodeBench by a large margin (up to +550%).
  • Matches OpenAI o1 performance on long-CoT tasks across benchmarks spanning multiple modalities (MathVista, AIME, Codeforces).
  • Demonstrates planning, reflection, and correction capabilities through scaled context and RL optimization.
  • Jointly trained on text and vision data for multimodal reasoning.

Maintenance & Community

The project is associated with the "Kimi Team" and is presented as a research output with a citation to an arXiv preprint. No community channels, social handles, or roadmap information are provided in the README.

Licensing & Compatibility

The README does not specify a license. Given its research paper format, commercial use or integration into closed-source projects would require explicit licensing terms not detailed here.

Limitations & Caveats

The README does not detail any limitations, known bugs, or deprecation status. The absence of installation instructions or released code suggests the model may be a research preview rather than something publicly available for direct use.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 160 stars in the last 90 days
