LLM-SR by deep-symbolic-mathematics

LLM-powered scientific equation discovery and symbolic regression

Created 2 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

LLM-SR performs scientific equation discovery and symbolic regression by combining Large Language Models (LLMs) with evolutionary search. It is aimed at researchers and engineers who need to recover interpretable mathematical relationships from data. The authors report better accuracy and out-of-domain generalization than existing symbolic regression methods.

How It Works

This approach leverages LLMs' scientific knowledge and code generation capabilities to propose equation hypotheses represented as program skeletons. These hypotheses are then refined through an evolutionary search process, guided by domain-specific priors. This hybrid methodology enables flexible exploration of the hypothesis space and enhances discovery accuracy.
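
The propose-score-keep loop described above can be illustrated with a toy sketch. This is not the LLM-SR implementation: a random term-toggling mutation stands in for the LLM proposer, random hill climbing stands in for the parameter optimizer, and every name here (TERMS, propose, fit, search) is hypothetical.

```python
import random

# Hypothetical term library. In LLM-SR the LLM writes the skeleton as code instead.
TERMS = {
    "1": lambda x: 1.0,
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "x^3": lambda x: x ** 3,
}

def predict(terms, params, x):
    """Evaluate a skeleton: a weighted sum of the chosen terms."""
    return sum(p * TERMS[t](x) for t, p in zip(terms, params))

def mse(terms, params, xs, ys):
    """Mean squared error of a skeleton with given parameters."""
    return sum((predict(terms, params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(terms, xs, ys, rng, steps=300):
    """Fit parameters by random hill climbing (a crude stand-in for BFGS or Adam)."""
    params = [0.0] * len(terms)
    best = mse(terms, params, xs, ys)
    for _ in range(steps):
        trial = [p + rng.gauss(0.0, 0.3) for p in params]
        loss = mse(terms, trial, xs, ys)
        if loss < best:
            params, best = trial, loss
    return params, best

def propose(parent, rng):
    """Stand-in for the LLM: mutate a parent skeleton by toggling one term."""
    term = rng.choice(sorted(TERMS))
    child = [t for t in parent if t != term] if term in parent else parent + [term]
    return child or parent  # never return an empty skeleton

def search(xs, ys, iterations=30, keep=3, seed=0):
    """Keep the best few skeletons and repeatedly propose variations of them."""
    rng = random.Random(seed)
    start = ["1"]
    population = [(fit(start, xs, ys, rng)[1], start)]
    for _ in range(iterations):
        _, parent = rng.choice(population)  # sample a parent from the kept set
        child = propose(parent, rng)
        loss = fit(child, xs, ys, rng)[1]
        population = sorted(population + [(loss, child)], key=lambda item: item[0])[:keep]
    return population[0]

# Synthetic data generated from y = 2x - 0.5x^3.
xs = [i / 10 for i in range(-10, 11)]
ys = [2.0 * x - 0.5 * x ** 3 for x in xs]
best_loss, best_terms = search(xs, ys)
print(best_terms, best_loss)
```

The sketch keeps only the structure of the idea: hypotheses are scored against data, the best ones are retained, and new hypotheses are derived from them. The domain-specific priors mentioned above have no counterpart in this toy version.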

Quick Start & Requirements

The recommended setup is a conda environment with Python 3.11.7 (conda create -n llmsr python=3.11.7, then conda activate llmsr), followed by pip install -r requirements.txt. Alternatively, conda env create -f environment.yml creates the environment from the provided file. Python ≥ 3.9 is supported. To run locally, first start an LLM server (e.g., Mixtral-8x7B) with run_server.sh or engine.py, which requires configuring the GPU and port. To use OpenAI's API instead, set API_KEY and pass --use_api True together with an api_model.

Highlighted Details

  • Official implementation for the ICLR 2025 Oral paper "LLM-SR: Scientific Equation Discovery via Programming with Large Language Models".
  • Introduces LLM-SRBench, a new benchmark for scientific equation discovery (ICML 2025 Oral).
  • Demonstrates superior performance and out-of-domain generalization against state-of-the-art symbolic regression methods on physics, biology, and materials science problems.
  • Supports equation discovery using both local open-source LLMs and OpenAI's API.
  • Offers flexibility in optimization, supporting NumPy+BFGS and Torch+Adam.
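
As a rough illustration of the NumPy+BFGS option in the last bullet, the sketch below fits the free coefficients of a fixed equation skeleton with SciPy's BFGS optimizer. The skeleton, the synthetic data, and the function names (equation, loss) are assumptions made for this example and are not taken from the repository.

```python
import numpy as np
from scipy.optimize import minimize

def equation(x, params):
    """Hypothetical skeleton: the functional form is fixed, the coefficients are free."""
    return params[0] * x + params[1] * x ** 3

def loss(params, x, y):
    """Mean squared error between the skeleton's prediction and the data."""
    return np.mean((equation(x, params) - y) ** 2)

# Synthetic, noise-free data generated from y = 2x - 0.5x^3.
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x - 0.5 * x ** 3

# BFGS estimates gradients numerically here, since no analytic gradient is passed.
result = minimize(loss, x0=np.ones(2), args=(x, y), method="BFGS")
fitted = result.x
print(fitted)
```

A Torch+Adam variant would express the same skeleton with tensors and take gradient steps instead of calling a quasi-Newton routine.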

Maintenance & Community

Questions and bug reports are welcome via GitHub issues. The authors can also be reached by email at parshinshojaee@vt.edu and mmeidani@andrew.cmu.edu. No community channels such as Discord or Slack are listed.

Licensing & Compatibility

The repository is licensed under the MIT License, which generally permits broad use, including commercial applications. The project builds upon other open-source works like FunSearch and PySR.

Limitations & Caveats

The authors observed slightly better performance using NumPy+BFGS optimizers compared to Torch+Adam, attributing this to current LLM backbones' stronger proficiency in generating NumPy code. The README also emphasizes the limitations of existing benchmarks, suggesting the need for more robust evaluation datasets.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 7 stars in the last 30 days
