Slow_Thinking_with_LLMs by RUCAIBox

Collection of technical reports on slow thinking with LLMs

Created 7 months ago · 713 stars · Top 49.1% on sourcepulse

Project Summary

This repository provides a collection of technical reports and open-sourced models focused on enhancing Large Language Model (LLM) reasoning capabilities through "slow-thinking" techniques. It targets researchers and developers aiming to improve LLM performance on complex tasks like mathematical problem-solving and information retrieval, offering novel frameworks and reproducible results.

How It Works

The project explores various methods to elicit and improve "slow-thinking" or step-by-step reasoning in LLMs. Key approaches include reward-guided tree search, outcome-based reinforcement learning (RL) for search capabilities, knowledge distillation, self-distillation, and tool manipulation. These techniques aim to guide LLMs through more deliberate and structured reasoning processes, reducing hallucinations and improving accuracy on challenging benchmarks.
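To make the first of these approaches concrete, below is a minimal sketch of reward-guided tree search over partial reasoning chains: a best-first search that repeatedly expands the highest-scoring chain. The `generate_steps` and `reward_model` callables, the beam width, and the depth limit are hypothetical stand-ins for an LLM sampler and a process reward model, not components of this repository.

```python
import heapq
from itertools import count

# Minimal sketch of reward-guided tree search over partial reasoning chains.
# `generate_steps` (samples candidate next steps from an LLM) and
# `reward_model` (scores a partial chain) are hypothetical stand-ins,
# not this repository's actual components.

def reward_guided_search(question, generate_steps, reward_model,
                         beam_width=4, max_depth=8):
    tie = count()  # tiebreaker so the heap never compares chains directly
    frontier = [(0.0, next(tie), [])]  # (negated reward, tiebreak, chain)
    best_chain, best_score = [], float("-inf")
    for _ in range(max_depth):
        if not frontier:
            break
        _, _, chain = heapq.heappop(frontier)  # most promising chain so far
        # Expand: sample several candidate next reasoning steps from the LLM.
        for step in generate_steps(question, chain, n=beam_width):
            new_chain = chain + [step]
            score = reward_model(question, new_chain)
            if score > best_score:
                best_chain, best_score = new_chain, score
            heapq.heappush(frontier, (-score, next(tie), new_chain))
    return best_chain
```

Scoring partial chains with a reward model, rather than only complete answers, is what lets the search prune weak reasoning paths early.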

Quick Start & Requirements

  • Install/Run: The README provides a Python code snippet using transformers and vllm for loading and running models (a hedged sketch of such a snippet follows this list).
  • Prerequisites: Requires Python, transformers, and vllm. Specific models may have varying hardware requirements (e.g., vllm with tensor_parallel_size=8 implies a multi-GPU node).
  • Resources: The vllm example specifies gpu_memory_utilization=0.95 and max_model_len=int(1.5 * 20000), indicating that substantial GPU memory is needed.
  • Links: Project pages and Hugging Face model repositories are linked for specific components.
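The exact snippet lives in the README; the sketch below shows the same transformers + vllm pattern. The model ID, prompt, and sampling parameters are placeholders, while the parallelism and memory settings mirror the values quoted above.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Placeholder: replace with the Hugging Face ID of the specific model you
# want to run; IDs are linked from the README for each project.
model_id = "<hf-model-id>"

# Settings below mirror those quoted in the README: tensor_parallel_size=8
# assumes an 8-GPU node, and max_model_len reserves room for long reasoning.
llm = LLM(
    model=model_id,
    tensor_parallel_size=8,
    gpu_memory_utilization=0.95,
    max_model_len=int(1.5 * 20000),
)

# Build a chat-formatted prompt with the model's own template.
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the sum of the first 100 odd numbers?"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Sampling parameters here are illustrative, not the README's exact values.
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=20000)
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```

The generous max_tokens budget matters for slow-thinking models, which emit long step-by-step reasoning chains before the final answer.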

Highlighted Details

  • SimpleDeepSearcher: Framework for autonomous web search using knowledge distillation, outperforming RL approaches.
  • OlymMATH: A challenging benchmark of 200 Olympiad-level math problems in English and Chinese, highlighting LLM limitations.
  • R1-Searcher: RL-based approach for LLM search capabilities without distillation or SFT.
  • STILL-3-Tool-32B: Achieves 81.70% accuracy on AIME 2024 using Python code and tool manipulation (see the tool-loop sketch after this list).
  • Virgo: A multimodal slow-thinking model demonstrating transferability of reasoning from text to vision.
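To make the tool-manipulation idea concrete, here is a minimal sketch of a generate-execute-feed-back loop: the model writes Python inside fenced code blocks, the harness executes it, and the output is appended so the model can keep reasoning. The fenced-block convention, the `generate` callable, and the round limit are illustrative assumptions, not STILL-3-Tool-32B's actual protocol.

```python
import re
import subprocess
import sys

# Matches fenced python blocks in the model's reply (assumed convention).
CODE_RE = re.compile(r"```python\n(.*?)```", re.DOTALL)

def run_python(code: str, timeout: int = 30) -> str:
    """Execute model-written code in a subprocess and capture its output."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr

def solve_with_tools(question: str, generate, max_rounds: int = 5) -> str:
    """`generate` is a hypothetical callable: transcript in, model reply out."""
    transcript = question
    for _ in range(max_rounds):
        reply = generate(transcript)   # one model turn
        transcript += reply
        blocks = CODE_RE.findall(reply)
        if not blocks:                 # no code emitted: treat as final answer
            return transcript
        # Execute the last code block and append its output for the next turn.
        transcript += f"\n[execution output]\n{run_python(blocks[-1])}\n"
    return transcript
```

Running model-written code in a subprocess with a timeout is only a minimal safeguard; a real deployment would sandbox execution properly.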

Maintenance & Community

The project is actively updated with recent reports and model releases (as of April 2025). Links to Hugging Face and Notion pages are provided for specific projects.

Licensing & Compatibility

The README does not explicitly state a single overarching license for the repository's content. Individual models and datasets may have different licenses, with some explicitly open-sourced for research purposes. Compatibility for commercial use is not specified.

Limitations & Caveats

The project acknowledges that its exploration is preliminary, with a capacity gap compared to industry-level systems. Future work focuses on scaling training approaches and extending capabilities to more complex tasks. Some models are released as previews or for research purposes only.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 59 stars in the last 90 days
