Soft-Thinking by eric-ai-lab

Enhancing LLM reasoning via continuous concept spaces

Created 5 months ago
265 stars

Top 96.4% on SourcePulse

View on GitHub
Project Summary

This project provides the official implementation for "Soft Thinking," a method to enhance Large Language Model (LLM) reasoning capabilities by operating within a continuous concept space. It targets researchers and engineers seeking to unlock deeper analytical potential in LLMs, offering improved reasoning performance.

How It Works

Soft Thinking operates LLMs in a continuous concept space: instead of committing to a single discrete token at each reasoning step, the model feeds back a probability-weighted mixture of token embeddings (a "concept token"), enabling more nuanced intermediate reasoning than discrete token generation. The implementation also supports optional Dirichlet and Gumbel-Softmax noise injection during sampling, detailed in a related study, to further explore and diversify conceptual representations.
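The core idea above can be illustrated with a minimal sketch. This is not the repo's code (which runs inside a modified SGLang with PyTorch); it is an assumed NumPy toy showing how Gumbel-Softmax noise yields a soft probability vector that mixes token embeddings into a continuous concept embedding. All function names here are illustrative.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed categorical sample: softmax over Gumbel-perturbed logits."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Sample Gumbel(0, 1) noise via the inverse-CDF trick.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = y - y.max()  # numerical stability
    e = np.exp(y)
    return e / e.sum()

def soft_thinking_step(logits, embedding_table, tau=1.0):
    # Instead of picking one discrete token, mix all token embeddings by
    # their (noise-perturbed) probabilities -> a continuous "concept token".
    p = gumbel_softmax(logits, tau=tau)
    return p @ embedding_table  # convex combination of embedding rows

vocab, dim = 5, 4
emb = np.arange(vocab * dim, dtype=float).reshape(vocab, dim)
logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
concept = soft_thinking_step(logits, emb)  # shape (dim,)
```

Lower `tau` pushes the mixture toward a one-hot (discrete) choice; higher `tau` spreads mass across concepts, which is the knob the noise-injection study varies.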

Quick Start & Requirements

  • Installation: Requires Python 3.11. Setup involves creating a conda environment (st), installing core Python packages (torch, transformers, accelerate, flash_attn), and installing a tailored version of SGLang (sglang_soft_thinking_pkg). Docker installation is recommended for environment consistency.
  • Prerequisites: NVIDIA GPUs are necessary (H100s used for experiments). An OpenAI API key is required for LLM judges. flash_attn installation may take up to 20 minutes.
  • Links: Docker setup (docker.sh), environment configuration (configure.sh), baseline script (scripts/baseline/qwq32b.sh), and inference script (run_sglang_softthinking.py).
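Condensing the steps above into one place, a hedged setup sketch (the conda env name, Python version, and script names come from the README summary; exact package versions and the fork's install path are assumptions):

```shell
# Environment setup sketch -- verify against configure.sh / docker.sh.
conda create -n st python=3.11 -y
conda activate st
pip install torch transformers accelerate
pip install flash_attn            # build can take up to ~20 minutes
# Install the tailored SGLang fork (directory name assumed from the README):
pip install -e ./sglang_soft_thinking_pkg
export OPENAI_API_KEY=...         # required for the LLM-judge evaluation
python run_sglang_softthinking.py # inference entry point
```

Docker (`docker.sh`) remains the recommended path, since reproducibility depends on hardware and precision; see Limitations below.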

Highlighted Details

  • Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space."
  • Supports Dirichlet and Gumbel-Softmax noise for enhanced sampling strategies.
  • Employs LLM judges (e.g., gpt-4.1-2025-04-14) for result validation.
  • Demonstrates functionality with models like Qwen/QwQ-32B on benchmarks such as aime2024.
  • Provides specific hyperparameter ranges for optimization, including max_topk, min_p, and early stopping thresholds.

Maintenance & Community

The provided README does not detail community channels (e.g., Discord, Slack), roadmap, or notable contributors.

Licensing & Compatibility

The project features a dual licensing structure: original code is under the permissive MIT License, while the modified SGLang package (sglang_soft_thinking_pkg) is licensed under Apache License 2.0. Both licenses generally permit commercial use, with Apache 2.0 requiring standard attribution and notice.

Limitations & Caveats

Reproducibility across different hardware is challenging due to potential precision differences; Docker is strongly recommended. A multiprocessing bug affects coding benchmarks, necessitating a specific execution order using the --reeval flag. flash_attn installation can be lengthy.

Health Check
Last Commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
28 stars in the last 30 days
