Soft-Thinking by eric-ai-lab

Enhancing LLM reasoning via continuous concept spaces

Created 5 months ago
265 stars

Top 96.4% on SourcePulse

View on GitHub
Project Summary

This project provides the official implementation for "Soft Thinking," a method to enhance Large Language Model (LLM) reasoning capabilities by operating within a continuous concept space. It targets researchers and engineers seeking to unlock deeper analytical potential in LLMs, offering improved reasoning performance.

How It Works

Soft Thinking operates LLMs in a continuous concept space: instead of committing to a single discrete token at each reasoning step, the model feeds back a probability-weighted mixture of token embeddings (a "concept token"), enabling more nuanced intermediate reasoning than discrete token generation. The implementation also supports optional Dirichlet and Gumbel-Softmax noise injection during sampling, detailed in a related study, to further explore and diversify conceptual representations.
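The core idea above can be illustrated with a minimal sketch. This is not the repo's code (which runs inside a modified SGLang with PyTorch); it is an assumed NumPy toy showing how Gumbel-Softmax noise yields a soft probability vector that mixes token embeddings into a continuous concept embedding. All function names here are illustrative.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed categorical sample: softmax over Gumbel-perturbed logits."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Sample Gumbel(0, 1) noise via the inverse-CDF trick.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = y - y.max()  # numerical stability
    e = np.exp(y)
    return e / e.sum()

def soft_thinking_step(logits, embedding_table, tau=1.0):
    # Instead of picking one discrete token, mix all token embeddings by
    # their (noise-perturbed) probabilities -> a continuous "concept token".
    p = gumbel_softmax(logits, tau=tau)
    return p @ embedding_table  # convex combination of embedding rows

vocab, dim = 5, 4
emb = np.arange(vocab * dim, dtype=float).reshape(vocab, dim)
logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
concept = soft_thinking_step(logits, emb)  # shape (dim,)
```

Lower `tau` pushes the mixture toward a one-hot (discrete) choice; higher `tau` spreads mass across concepts, which is the knob the noise-injection study varies.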

Quick Start & Requirements

  • Installation: Requires Python 3.11. Setup involves creating a conda environment (st), installing core Python packages (torch, transformers, accelerate, flash_attn), and installing a tailored version of SGLang (sglang_soft_thinking_pkg). Docker installation is recommended for environment consistency.
  • Prerequisites: NVIDIA GPUs are necessary (H100s used for experiments). An OpenAI API key is required for LLM judges. flash_attn installation may take up to 20 minutes.
  • Links: Docker setup (docker.sh), environment configuration (configure.sh), baseline script (scripts/baseline/qwq32b.sh), and inference script (run_sglang_softthinking.py).
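Condensing the steps above into one place, a hedged setup sketch (the conda env name, Python version, and script names come from the README summary; exact package versions and the fork's install path are assumptions):

```shell
# Environment setup sketch -- verify against configure.sh / docker.sh.
conda create -n st python=3.11 -y
conda activate st
pip install torch transformers accelerate
pip install flash_attn            # build can take up to ~20 minutes
# Install the tailored SGLang fork (directory name assumed from the README):
pip install -e ./sglang_soft_thinking_pkg
export OPENAI_API_KEY=...         # required for the LLM-judge evaluation
python run_sglang_softthinking.py # inference entry point
```

Docker (`docker.sh`) remains the recommended path, since reproducibility depends on hardware and precision; see Limitations below.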

Highlighted Details

  • Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space."
  • Supports Dirichlet and Gumbel-Softmax noise for enhanced sampling strategies.
  • Employs LLM judges (e.g., gpt-4.1-2025-04-14) for result validation.
  • Demonstrates functionality with models like Qwen/QwQ-32B on benchmarks such as aime2024.
  • Provides specific hyperparameter ranges for optimization, including max_topk, min_p, and early stopping thresholds.

Maintenance & Community

The provided README does not detail community channels (e.g., Discord, Slack), roadmap, or notable contributors.

Licensing & Compatibility

The project features a dual licensing structure: original code is under the permissive MIT License, while the modified SGLang package (sglang_soft_thinking_pkg) is licensed under Apache License 2.0. Both licenses generally permit commercial use, with Apache 2.0 requiring standard attribution and notice.

Limitations & Caveats

Reproducibility across different hardware is challenging due to potential precision differences; Docker is strongly recommended. A multiprocessing bug affects coding benchmarks, necessitating a specific execution order using the --reeval flag. flash_attn installation can be lengthy.

Health Check
Last Commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
28 stars in the last 30 days
