LLM-SR by deep-symbolic-mathematics

LLM-powered scientific equation discovery and symbolic regression

Created 2 years ago

262 stars

Top 96.9% on SourcePulse

Summary

LLM-SR tackles scientific equation discovery and symbolic regression by combining Large Language Models (LLMs) with evolutionary search. It is aimed at researchers and engineers who need to recover interpretable mathematical relationships from data. The authors report better accuracy and out-of-domain generalization than existing symbolic regression methods.

How It Works

LLM-SR uses an LLM's scientific knowledge and code-generation ability to propose equation hypotheses as program skeletons: functions whose form is fixed but whose numeric parameters are left as placeholders to be fitted to the data. These hypotheses are then refined through an evolutionary search guided by domain-specific priors. The hybrid design allows flexible exploration of the hypothesis space and improves discovery accuracy.
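
The propose, fit, and keep-the-best loop described above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not the repository's code: `SKELETON_POOL`, `propose`, `fit_scale`, and `discover` are hypothetical names, the LLM call is replaced by a stub that draws from a fixed pool of one-parameter skeletons, and the data come from a made-up law y = 3x².

```python
import math
import random

# Synthetic observations from a hidden law y = 3.0 * x**2 (illustration only).
xs = [0.5 * i for i in range(1, 11)]
ys = [3.0 * x ** 2 for x in xs]

# Hypothetical pool of skeletons y = c * f(x). It stands in for the equation
# programs an LLM would write; c is the placeholder parameter.
SKELETON_POOL = {
    "c * x": lambda x: x,
    "c * x**2": lambda x: x ** 2,
    "c * sqrt(x)": math.sqrt,
    "c * log(1 + x)": math.log1p,
    "c * exp(-x)": lambda x: math.exp(-x),
}


def fit_scale(feature, xs, ys):
    """Closed-form least squares for the single placeholder c in y = c * f(x)."""
    fs = [feature(x) for x in xs]
    c = sum(f * y for f, y in zip(fs, ys)) / sum(f * f for f in fs)
    mse = sum((c * f - y) ** 2 for f, y in zip(fs, ys)) / len(xs)
    return c, mse


def propose(pool, tried, buffer, rng):
    """Stub for the LLM call (assumption).

    A real run would prompt the model with the buffered hypotheses; this stub
    ignores the buffer and only avoids repeating a skeleton.
    """
    untried = [name for name in pool if name not in tried]
    return rng.choice(untried) if untried else None


def discover(pool, xs, ys, iterations=20, buffer_size=3, seed=0):
    """Run the toy search and return the buffer of (mse, name, c), best first."""
    rng = random.Random(seed)
    tried, buffer = set(), []
    for _ in range(iterations):
        name = propose(pool, tried, buffer, rng)
        if name is None:  # every skeleton in the pool has been scored
            break
        tried.add(name)
        c, mse = fit_scale(pool[name], xs, ys)
        # Keep only the best-scoring hypotheses for the next round.
        buffer = sorted(buffer + [(mse, name, c)])[:buffer_size]
    return buffer


best_mse, best_name, best_c = discover(SKELETON_POOL, xs, ys)[0]
print(f"best skeleton: {best_name}, c = {best_c:.3f}, mse = {best_mse:.2e}")
```

The buffer is what makes the search evolutionary in spirit: only the top-scoring hypotheses survive each round, and in a real run they would be fed back into the next prompt.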

Quick Start & Requirements

Create and activate a conda environment (conda create -n llmsr python=3.11.7, then conda activate llmsr), and install dependencies with either pip install -r requirements.txt or conda env create -f environment.yml. The example pins Python 3.11.7, but any Python ≥ 3.9 is supported.

To run locally, first start an LLM server (e.g., Mixtral-8x7B) with run_server.sh or engine.py; this requires configuring the GPU and port. Alternatively, run against OpenAI's API by setting API_KEY and passing --use_api True together with an api_model.

Highlighted Details

Official implementation for the ICLR 2025 Oral paper "LLM-SR: Scientific Equation Discovery via Programming with Large Language Models".
Introduces LLM-SRBench, a new benchmark for scientific equation discovery (ICML 2025 Oral).
Reports stronger performance and out-of-domain generalization than state-of-the-art symbolic regression methods on physics, biology, and materials science problems.
Supports equation discovery using both local open-source LLMs and OpenAI's API.
Supports two options for parameter optimization: NumPy+BFGS and Torch+Adam.
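
To make the parameter-optimization step in the last highlight concrete, here is a minimal sketch of fitting the placeholder parameters of a single skeleton. It is not the project's implementation: the damped-oscillator `skeleton`, the synthetic data, and the helper names are assumptions, and a hand-written Adam update with finite-difference gradients stands in for the Torch+Adam path so that the example needs only the standard library.

```python
import random


def skeleton(x, v, params):
    """Hypothetical skeleton: acceleration of a damped oscillator."""
    k, c = params  # placeholder parameters to be fitted
    return -k * x - c * v


def mse(params, data):
    """Mean squared error of the skeleton on (x, v, acceleration) rows."""
    return sum((skeleton(x, v, params) - a) ** 2 for x, v, a in data) / len(data)


def numeric_grad(f, params, h=1e-6):
    """Central-difference gradient, so any skeleton can be plugged in."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def adam_fit(f, params, lr=0.05, steps=2000, b1=0.9, b2=0.999, eps=1e-8):
    """Plain-Python Adam: bias-corrected first and second moment estimates."""
    params = list(params)
    m = [0.0] * len(params)
    s = [0.0] * len(params)
    for t in range(1, steps + 1):
        g = numeric_grad(f, params)
        for i, gi in enumerate(g):
            m[i] = b1 * m[i] + (1 - b1) * gi
            s[i] = b2 * s[i] + (1 - b2) * gi * gi
            m_hat = m[i] / (1 - b1 ** t)
            s_hat = s[i] / (1 - b2 ** t)
            params[i] -= lr * m_hat / (s_hat ** 0.5 + eps)
    return params


# Synthetic data from assumed ground-truth values k = 2.0, c = 0.5.
rng = random.Random(0)
data = []
for _ in range(50):
    x, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append((x, v, -2.0 * x - 0.5 * v))

fitted = adam_fit(lambda p: mse(p, data), [1.0, 1.0])
print("fitted params:", fitted, "mse:", mse(fitted, data))
```

The resulting error is the score a search loop would use to rank this skeleton against others. Swapping `adam_fit` for a quasi-Newton routine such as BFGS changes only the optimizer, not the skeleton or the loss.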

Maintenance & Community

Questions and bug reports are welcome via GitHub issues. The authors can also be reached by email at parshinshojaee@vt.edu and mmeidani@andrew.cmu.edu. No community channels such as Discord or Slack are listed.

Licensing & Compatibility

The repository is licensed under the MIT License, which generally permits broad use, including commercial applications. The project builds upon other open-source works like FunSearch and PySR.

Limitations & Caveats

The authors observed slightly better performance with NumPy+BFGS than with Torch+Adam, and attribute this to current LLM backbones being more proficient at generating NumPy code. The README also stresses the limitations of existing benchmarks and the need for more robust evaluation datasets.
