reasoning-with-sampling by aakaran

PyTorch implementation for enhancing LLM reasoning capabilities

Created 1 month ago
282 stars

Top 92.4% on SourcePulse

View on GitHub
Project Summary

Summary

This repository provides the official PyTorch implementation of "Reasoning with Sampling," a technique designed to enhance the reasoning capabilities of large language models (LLMs). It targets AI researchers and practitioners seeking to improve LLM performance on complex tasks through a novel sampling approach. The primary benefit is unlocking a base LLM's latent reasoning potential.

How It Works

The project implements "power sampling" as its core methodology. Sampling scripts are run on several established benchmarks (MATH500, HumanEval, GPQA Diamond, and AlpacaEval 2.0), and the generated outputs are used to evaluate the LLM's single-shot reasoning and its Pass@k performance. The approach aims to reveal and exploit the reasoning capacity already latent in existing LLM architectures.
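For intuition, power sampling targets a sharpened distribution proportional to p(x)^alpha over whole sequences, rather than rescaling token-level logits the way temperature does. The repository's actual algorithm follows the paper; the sketch below is only a minimal illustration of the target distribution, approximating p^alpha by importance-resampling i.i.d. draws from the base model. The function name and the resampling shortcut are assumptions for illustration, not the repo's implementation.

```python
import math
import random

def power_resample(samples, logps, alpha, rng=random):
    """Return one sample approximately distributed as p(x)**alpha,
    given i.i.d. samples from p and their sequence log-probabilities.
    Illustrative sketch only, not the repository's algorithm."""
    # Self-normalized importance weights: w_i proportional to p(x_i)**(alpha - 1),
    # so resampling targets p(x) * p(x)**(alpha - 1) = p(x)**alpha.
    logw = [(alpha - 1.0) * lp for lp in logps]
    m = max(logw)  # subtract the max log-weight for numerical stability
    weights = [math.exp(lw - m) for lw in logw]
    return rng.choices(samples, weights=weights, k=1)[0]

# Toy usage: three candidate completions and their total log-probs under p.
candidates = ["answer A", "answer B", "answer C"]
logps = [-5.0, -2.0, -9.0]
print(power_resample(candidates, logps, alpha=4.0))  # almost always "answer B"
```

With alpha > 1 the resampler concentrates on completions that are likely under the full sequence distribution, which is the sense in which power sampling sharpens a base model without any additional training.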

Quick Start & Requirements

  • Installation: Clone the repository (git clone https://github.com/aakaran/reasoning-with-sampling.git), navigate into the directory (cd reasoning-with-sampling), create and activate a Conda environment (conda env create -f environment.yml, conda activate psamp).
  • Prerequisites: Conda (for package and environment management). Slurm scripts are provided, suggesting the experiments are intended for cluster environments. PyTorch is the underlying framework.
  • Running: Use bash scripts for setup and sbatch for executing power sampling jobs (e.g., sbatch llm_experiments/scripts/power_samp_math.sh). Evaluation is performed using Python scripts like python llm_experiments/eval_math.py.
  • Links: Project Page (mentioned, not linked), Official AlpacaEval repo (for its evaluation instructions).

Highlighted Details

  • Supports evaluation on MATH500, HumanEval, GPQA Diamond, and AlpacaEval 2.0 benchmarks.
  • Provides mechanisms for evaluating single-shot reasoning and Pass@k performance metrics (a minimal pass@k sketch follows this list).
  • Generates detailed .csv files containing responses, correct answers, and original prompts for in-depth analysis.
  • Includes functionality to plot Pass@k performance across different configurations.
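Pass@k is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021): with n sampled completions per problem, c of them correct, pass@k = 1 − C(n−c, k)/C(n, k). Below is a minimal sketch that aggregates a per-sample results CSV; the file name and column names ("results.csv", "prompt", "is_correct") are hypothetical, not the repo's actual output schema.

```python
import csv
from collections import defaultdict
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k completions
    drawn without replacement from n (c of them correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Aggregate per-prompt sample counts; CSV layout here is an assumption.
per_prompt = defaultdict(lambda: [0, 0])  # prompt -> [n_samples, n_correct]
with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):
        stats = per_prompt[row["prompt"]]
        stats[0] += 1
        stats[1] += int(row["is_correct"])

k = 8
score = sum(pass_at_k(n, c, k) for n, c in per_prompt.values()) / len(per_prompt)
print(f"pass@{k}: {score:.3f}")
```

Averaging the estimator over problems, as above, gives the benchmark-level pass@k figure that plots like the repository's are typically built from.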

Maintenance & Community

No information regarding maintainers, community channels (like Discord/Slack), or project roadmaps is available in the provided README.

Licensing & Compatibility

The README does not specify the project's license or provide any details on compatibility for commercial use or integration with closed-source projects.

Limitations & Caveats

The README does not explicitly detail project limitations, alpha status, or known bugs. However, the inclusion of Slurm scripts suggests that a distributed computing environment may be necessary, or at least highly beneficial, for running the full suite of experiments, potentially posing an adoption barrier for users without such infrastructure.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 4
  • Star History: 287 stars in the last 30 days

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Lewis Tunstall (Research Engineer at Hugging Face), and 15 more.
