MLEvolve  by InternScience

Autonomous system for end-to-end ML algorithm design and optimization

Created 1 month ago
257 stars

Top 98.2% on SourcePulse

GitHubView on GitHub
Project Summary

Summary

MLEvolve is an open-source autonomous system for end-to-end machine learning algorithm design and optimization, targeting Kaggle-style competitions. It employs a multi-agent approach with Progressive Monte Carlo Graph Search (MCGS) and an experience-driven memory. This system automates complex ML engineering tasks, achieving state-of-the-art performance on benchmarks like MLE-bench within competitive resource constraints, benefiting researchers and engineers seeking advanced AutoML capabilities.

How It Works

MLEvolve utilizes a sophisticated MCGS framework enhanced by multi-agent collaboration and an experience-driven memory. Its core innovation, "Progressive MCGS with Cross-Branch Fusion," extends UCT search with adaptive exploration, stagnation detection, and parallel solution branch evolution, merging insights for novel candidates. The "Experience-Driven Memory" layer, using BM25 and FAISS, enables learning from past search history to avoid pitfalls. It supports flexible planning and code generation strategies, adaptively chosen based on search state.

Quick Start & Requirements

Setup involves preparing mle-bench, installing dependencies (pip install --no-deps -r requirements_*.txt), and configuring LLM API access (e.g., Gemini, GPT, Qwen) in config/config.yaml. Prerequisites include OpenAI-compatible LLM APIs with function calling support. Optional memory embedding model configuration exists. Execution is via bash run_single_task.sh [SERVER_ID] [DATASET_DIR] [TASK_ID]. Project details: https://internscience.github.io/MLEvolve/.

Highlighted Details

  • Ranked #1 on MLE-bench (12-hour budget) using Gemini-3-Pro-Preview.
  • Reported MLE-bench performance: 80.30% (Low), 57.89% (Medium), 42.22% (High) "Any Medal" rates.
  • Powers the coding/optimization module within InternAgent for autonomous scientific discovery.
  • Supports OpenAI-compatible APIs (GPT, Qwen, DeepSeek), recommending function-calling models.

Maintenance & Community

The codebase was open-sourced on February 14, 2026. It acknowledges contributions from AIDE, ML-Master, and InternAgent 1.5. Specific community channels or detailed roadmaps are not provided in the README.

Licensing & Compatibility

The README omits explicit license information. This requires clarification regarding terms of use, distribution, and compatibility, especially for commercial applications or integration into closed-source projects.

Limitations & Caveats

As a recently open-sourced project (Feb 2026), MLEvolve may be under active development. Performance is highly dependent on integrated LLMs and requires careful API/dataset configuration. The absence of explicit licensing is a significant adoption caveat.

Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
1
Star History
97 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems") and Vincent Weisser Vincent Weisser(Cofounder of Prime Intellect).

GITM by OpenGVLab

0%
638
LLM agent for Minecraft open-world environments
Created 2 years ago
Updated 2 years ago
Feedback? Help us improve.