deep-symbolic-mathematics
LLM-powered scientific equation discovery and symbolic regression
Top 99.1% on SourcePulse
Summary
LLM-SR addresses scientific equation discovery and symbolic regression by integrating Large Language Models (LLMs) with evolutionary search. It targets researchers and engineers needing to uncover interpretable mathematical relationships from data, offering superior out-of-domain generalization and accuracy compared to existing methods.
How It Works
This approach leverages LLMs' scientific knowledge and code generation capabilities to propose equation hypotheses represented as program skeletons. These hypotheses are then refined through an evolutionary search process, guided by domain-specific priors. This hybrid methodology enables flexible exploration of the hypothesis space and enhances discovery accuracy.
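The proposal-and-selection idea described above can be illustrated with a toy sketch. It is not the project's code: the three skeleton functions are hand-written stand-ins for LLM proposals, the data is synthetic, and the crude hill-climbing fit stands in for a real numerical optimizer. Only one selection round is shown, whereas an actual search would feed the survivors back into the next prompt.

```python
import random

# Synthetic data from a damped linear spring, a = -2.0*x - 0.5*v (illustrative only).
random.seed(0)
DATA = [(x, v, -2.0 * x - 0.5 * v)
        for x, v in ((random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50))]

# Hypothetical stand-ins for LLM-proposed skeletons: the structure is fixed,
# the entries of p are free parameters to be fitted.
def skeleton_linear_x(x, v, p):
    return p[0] * x

def skeleton_spring_damper(x, v, p):
    return p[0] * x + p[1] * v

def skeleton_cubic(x, v, p):
    return p[0] * x ** 3 + p[1] * v

def mse(skeleton, p):
    """Mean squared error of a skeleton with parameters p on DATA."""
    return sum((skeleton(x, v, p) - y) ** 2 for x, v, y in DATA) / len(DATA)

def fit(skeleton, n_params=2, steps=2000, seed=0):
    """Crude random hill climb over the parameters (assumed stand-in optimizer)."""
    rng = random.Random(seed)
    best = [1.0] * n_params
    best_err = mse(skeleton, best)
    step = 0.5
    for _ in range(steps):
        cand = [b + rng.gauss(0, step) for b in best]
        err = mse(skeleton, cand)
        if err < best_err:
            best, best_err = cand, err
        else:
            step = max(step * 0.995, 1e-4)  # shrink the step after a failed move
    return best, best_err

def search(proposals, keep=2):
    """Score every proposed skeleton and keep the best few as the next parents."""
    scored = sorted((fit(s)[1], s.__name__) for s in proposals)
    return scored[:keep]

ranking = search([skeleton_linear_x, skeleton_spring_damper, skeleton_cubic])
print(ranking)
```

The skeleton whose structure matches the data-generating law ends up with the lowest fitted error, which is the signal the search uses to decide which hypotheses survive.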
Quick Start & Requirements
Create a conda environment with Python 3.11.7 (conda create -n llmsr python=3.11.7, then conda activate llmsr) and install dependencies with pip install -r requirements.txt. Alternatively, build the environment directly with conda env create -f environment.yml. Python 3.9 or later is supported. To run locally, start an LLM server (e.g., Mixtral-8x7B) using run_server.sh or engine.py; this requires configuring the GPU and port. To use OpenAI's API instead, set API_KEY and pass --use_api True along with an api_model.
Highlighted Details
Maintenance & Community
The project welcomes questions and issues via GitHub issues. Direct contact is available via email at parshinshojaee@vt.edu and mmeidani@andrew.cmu.edu. No specific community channels like Discord or Slack are listed.
Licensing & Compatibility
The repository is licensed under the MIT License, which generally permits broad use, including commercial applications. The project builds upon other open-source works like FunSearch and PySR.
Limitations & Caveats
The authors observed slightly better performance using NumPy+BFGS optimizers compared to Torch+Adam, attributing this to current LLM backbones' stronger proficiency in generating NumPy code. The README also emphasizes the limitations of existing benchmarks, suggesting the need for more robust evaluation datasets.
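For readers unfamiliar with the NumPy+BFGS setup, the following is a minimal sketch of fitting the free parameters of one equation skeleton with SciPy's BFGS optimizer. The skeleton, the synthetic data, and the names equation and loss are assumptions made for illustration; they are not taken from the repository.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic observations from a = -2.0*x - 0.5*v (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
v = rng.uniform(-1.0, 1.0, size=200)
y = -2.0 * x - 0.5 * v

def equation(x, v, params):
    """Hypothetical skeleton: fixed structure, free coefficients in params."""
    return params[0] * x + params[1] * v

def loss(params):
    """Mean squared error between the skeleton's prediction and the data."""
    return np.mean((equation(x, v, params) - y) ** 2)

# BFGS with gradients estimated numerically by SciPy (no jac supplied).
result = minimize(loss, x0=np.ones(2), method="BFGS")
fitted = result.x
print(fitted, result.fun)
```

Because the skeleton is plain NumPy, the same function can be scored and re-optimized repeatedly without any tensor conversion, which is one plausible reason this style is convenient inside a search loop.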
10 months ago
Inactive