r1-overthinker by qunash

Gradio app for extending DeepSeek R1 reasoning

Created 7 months ago
370 stars

Top 76.3% on SourcePulse

View on GitHub
Project Summary

This project extends the reasoning phase of DeepSeek R1 models, making them "overthink" and produce more thorough responses. It targets researchers and power users who want deeper insight into LLM reasoning, offering fine-grained control over the generation process and context lengths limited only by available VRAM.

How It Works

The core mechanism intercepts the model's attempts to end its reasoning early and replaces the end-of-thinking marker with a prompt that encourages further deliberation. This "budget forcing" technique, independently validated by the "s1: Simple test-time scaling" paper, extends the model's thinking in a controlled way until a user-defined token threshold is met. The app runs unsloth-optimized models for better throughput and lower VRAM use.
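
Below is a minimal sketch of that loop, assuming the model closes its reasoning with a </think> marker and that generate_chunk is a hypothetical helper that streams the next piece of text; the continuation cues and rough token counting are illustrative, not the project's exact implementation.

    import random

    THINK_END = "</think>"
    # Hypothetical continuation cues; the app lets users customize these.
    CUES = ["Wait, let me double-check that.", "Hmm, let me reconsider."]

    def force_longer_thinking(generate_chunk, prompt, min_thinking_tokens=1024):
        """Keep the model 'thinking' until a user-defined token threshold is met.

        generate_chunk(text) -> str: assumed helper returning newly generated text.
        """
        text = prompt + "<think>"
        approx_tokens = 0
        while True:
            chunk = generate_chunk(text)
            text += chunk
            approx_tokens += len(chunk.split())  # rough token count for the sketch
            if THINK_END not in chunk:
                continue
            if approx_tokens >= min_thinking_tokens:
                return text  # budget met: let the reasoning block close normally
            # Intercept the early conclusion: drop the end marker and append a
            # cue so the model resumes deliberating instead of answering.
            text = text.rsplit(THINK_END, 1)[0] + " " + random.choice(CUES)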

Quick Start & Requirements

  • Install via pip install -e .
  • Requires Python 3.10+ and PyTorch.
  • Supports DeepSeek R1 distill models from 1.5B to 70B parameters, covering both Qwen- and LLaMA-based variants (see the loading sketch after this list).
  • Models up to 14B parameters can run on a free Google Colab T4 GPU.
  • See unsloth for optimization details.
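
A minimal model-loading sketch under these assumptions: the repo id and sequence length below are illustrative, and any unsloth 4-bit DeepSeek R1 distill in the 1.5B to 70B range loads the same way.

    from unsloth import FastLanguageModel

    # The exact repo id is an assumption; pick any R1 distill that fits your GPU.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/DeepSeek-R1-Distill-Qwen-14B-bnb-4bit",
        max_seq_length=8192,   # context is bounded only by available VRAM
        load_in_4bit=True,     # 4-bit weights let a 14B model fit a free Colab T4
    )
    FastLanguageModel.for_inference(model)  # enable unsloth's faster inference path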

Highlighted Details

  • Forces models to think longer and more thoroughly.
  • Customizable reasoning extensions and thinking thresholds.
  • Fine-grained control over model parameters (temperature, top-p).
  • Visible thinking process with token count tracking (a minimal UI sketch follows this list).
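
The sketch below mirrors those controls in a Gradio Blocks layout; the component names, default values, and stubbed generate function are assumptions rather than the app's actual UI code.

    import gradio as gr

    def generate_stub(prompt, min_thinking_tokens, temperature, top_p):
        # Placeholder: the real app would run the budget-forcing loop here
        # and stream the visible thinking process plus a token count.
        return f"(stub) would think for at least {int(min_thinking_tokens)} tokens"

    with gr.Blocks() as demo:
        prompt = gr.Textbox(label="Prompt")
        min_thinking = gr.Slider(0, 8192, value=1024, step=128,
                                 label="Minimum thinking tokens (threshold)")
        temperature = gr.Slider(0.0, 2.0, value=0.6, label="Temperature")
        top_p = gr.Slider(0.0, 1.0, value=0.95, label="Top-p")
        output = gr.Textbox(label="Response (thinking visible, token count shown)")
        gr.Button("Generate").click(
            fn=generate_stub,
            inputs=[prompt, min_thinking, temperature, top_p],
            outputs=output,
        )

    # demo.launch()  # serve the UI locally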

Maintenance & Community

  • Developed by anzorq.
  • Credits original idea to vgel's gist.
  • Utilizes unsloth for optimization and Gradio for the app interface.

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The effectiveness of "overthinking" varies by model size and task; forcing longer reasoning does not guarantee better answers. The project depends on unsloth's optimized kernels, which bring their own installation requirements and are built around CUDA-capable NVIDIA GPUs.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 2 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

0.2%
462
MoE model for research
Created 4 months ago
Updated 4 weeks ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.

streaming-llm by mit-han-lab

0.1%
7k
Framework for efficient LLM streaming
Created 2 years ago
Updated 1 year ago
Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 11 more.

mistral.rs by EricLBuehler

0.3%
6k
LLM inference engine for blazing fast performance
Created 1 year ago
Updated 1 day ago
Starred by Lianmin Zheng (Coauthor of SGLang, vLLM), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

MiniCPM by OpenBMB

0.4%
8k
Ultra-efficient LLMs for end devices, achieving 5x+ speedup
Created 1 year ago
Updated 1 week ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 36 more.

unsloth by unslothai

0.6%
46k
Finetuning tool for LLMs, targeting speed and memory efficiency
Created 1 year ago
Updated 14 hours ago