reasoning-with-sampling by aakaran

PyTorch implementation for enhancing LLM reasoning capabilities

Created 1 month ago
282 stars

Top 92.4% on SourcePulse

View on GitHub
Project Summary

Summary

This repository provides the official PyTorch implementation of "Reasoning with Sampling," a technique designed to enhance the reasoning capabilities of large language models (LLMs). It targets AI researchers and practitioners seeking to improve LLM performance on complex tasks through a novel sampling approach. The primary benefit is unlocking a base LLM's latent reasoning potential.

How It Works

The project implements "power sampling" as its core methodology. Sampling scripts are run on several established benchmarks (MATH500, HumanEval, GPQA Diamond, and AlpacaEval 2.0), and the generated outputs are used to evaluate the LLM's single-shot reasoning and its Pass@k performance. The approach aims to reveal and exploit the reasoning capacity already latent in existing LLM architectures.
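For intuition, power sampling targets a sharpened distribution proportional to p(x)^alpha over whole sequences, rather than rescaling token-level logits the way temperature does. The repository's actual algorithm follows the paper; the sketch below is only a minimal illustration of the target distribution, approximating p^alpha by importance-resampling i.i.d. draws from the base model. The function name and the resampling shortcut are assumptions for illustration, not the repo's implementation.

```python
import math
import random

def power_resample(samples, logps, alpha, rng=random):
    """Return one sample approximately distributed as p(x)**alpha,
    given i.i.d. samples from p and their sequence log-probabilities.
    Illustrative sketch only, not the repository's algorithm."""
    # Self-normalized importance weights: w_i proportional to p(x_i)**(alpha - 1),
    # so resampling targets p(x) * p(x)**(alpha - 1) = p(x)**alpha.
    logw = [(alpha - 1.0) * lp for lp in logps]
    m = max(logw)  # subtract the max log-weight for numerical stability
    weights = [math.exp(lw - m) for lw in logw]
    return rng.choices(samples, weights=weights, k=1)[0]

# Toy usage: three candidate completions and their total log-probs under p.
candidates = ["answer A", "answer B", "answer C"]
logps = [-5.0, -2.0, -9.0]
print(power_resample(candidates, logps, alpha=4.0))  # almost always "answer B"
```

With alpha > 1 the resampler concentrates on completions that are likely under the full sequence distribution, which is the sense in which power sampling sharpens a base model without any additional training.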

Quick Start & Requirements

  • Installation: Clone the repository (git clone https://github.com/aakaran/reasoning-with-sampling.git), navigate into the directory (cd reasoning-with-sampling), create and activate a Conda environment (conda env create -f environment.yml, conda activate psamp).
  • Prerequisites: Conda (for package and environment management). Slurm scripts are provided, suggesting the experiments are intended for cluster environments. PyTorch is the underlying framework.
  • Running: Use bash scripts for setup and sbatch for executing power sampling jobs (e.g., sbatch llm_experiments/scripts/power_samp_math.sh). Evaluation is performed using Python scripts like python llm_experiments/eval_math.py.
  • Links: Project Page (mentioned, not linked), Official AlpacaEval repo (for its evaluation instructions).

Highlighted Details

  • Supports evaluation on MATH500, HumanEval, GPQA Diamond, and AlpacaEval 2.0 benchmarks.
  • Provides mechanisms for evaluating single-shot reasoning and Pass@k performance metrics (a minimal pass@k sketch follows this list).
  • Generates detailed .csv files containing responses, correct answers, and original prompts for in-depth analysis.
  • Includes functionality to plot Pass@k performance across different configurations.
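Pass@k is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021): with n sampled completions per problem, c of them correct, pass@k = 1 − C(n−c, k)/C(n, k). Below is a minimal sketch that aggregates a per-sample results CSV; the file name and column names ("results.csv", "prompt", "is_correct") are hypothetical, not the repo's actual output schema.

```python
import csv
from collections import defaultdict
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k completions
    drawn without replacement from n (c of them correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Aggregate per-prompt sample counts; CSV layout here is an assumption.
per_prompt = defaultdict(lambda: [0, 0])  # prompt -> [n_samples, n_correct]
with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):
        stats = per_prompt[row["prompt"]]
        stats[0] += 1
        stats[1] += int(row["is_correct"])

k = 8
score = sum(pass_at_k(n, c, k) for n, c in per_prompt.values()) / len(per_prompt)
print(f"pass@{k}: {score:.3f}")
```

Averaging the estimator over problems, as above, gives the benchmark-level pass@k figure that plots like the repository's are typically built from.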

Maintenance & Community

No information regarding maintainers, community channels (like Discord/Slack), or project roadmaps is available in the provided README.

Licensing & Compatibility

The README does not specify the project's license or provide any details on compatibility for commercial use or integration with closed-source projects.

Limitations & Caveats

The README does not explicitly detail project limitations, alpha status, or known bugs. However, the inclusion of Slurm scripts suggests that a distributed computing environment may be necessary, or at least highly beneficial, for running the full suite of experiments, potentially posing an adoption barrier for users without such infrastructure.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 4
  • Star History: 287 stars in the last 30 days

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Lewis Tunstall (Research Engineer at Hugging Face), and 15 more.
