EvoSkill by sentient-agi

AI agent skill discovery and self-improvement framework

Created 3 weeks ago · 295 stars · Top 89.9% on SourcePulse

View on GitHub
Project Summary

EvoSkill is an open-source framework designed to automatically discover and synthesize reusable agent skills, significantly enhancing AI agent performance on long-horizon tasks, particularly in coding. It targets researchers and engineers seeking to move beyond manual prompt engineering by providing a self-improving system that iterates on agent configurations.

How It Works

EvoSkill employs an evolutionary loop to refine agent performance. It begins with a base agent attempting benchmark questions, then analyzes failures to propose targeted skill or prompt improvements. A generator creates these changes, and an evaluator scores the new program variants on a validation set. The framework maintains a "frontier" of the top-N performing programs, tracked via git branches for full reproducibility, ensuring that only the best configurations survive to the next iteration.
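The loop described above can be sketched in plain Python. This is an illustrative toy, not EvoSkill's actual API: here a "program" is just a set of skill names, and the generator, evaluator, and frontier names are assumptions for the sketch.

```python
import random

random.seed(0)

# Toy validation set: a program "solves" a task if it has the matching skill.
VALIDATION = ["parse_csv", "call_api", "write_tests", "refactor"]

def score(skills, tasks):
    """Evaluator: fraction of validation tasks the skill set can solve."""
    return sum(t in skills for t in tasks) / len(tasks)

def propose_variant(skills, failed_task):
    """Generator: synthesize a targeted skill for one observed failure."""
    return skills | {failed_task}

def evolve(base, tasks, iterations=4, frontier_size=2):
    """Evolutionary loop keeping a top-N frontier of program variants."""
    frontier = [base]
    for _ in range(iterations):
        candidates = list(frontier)
        for skills in frontier:
            # Analyze failures, then propose a targeted improvement.
            failures = [t for t in tasks if t not in skills]
            if failures:
                candidates.append(propose_variant(skills, random.choice(failures)))
        # Only the best frontier_size configurations survive the iteration.
        candidates.sort(key=lambda s: score(s, tasks), reverse=True)
        frontier = candidates[:frontier_size]
    return frontier

best = evolve(frozenset(), VALIDATION)
```

In the real framework each surviving configuration (system prompt plus skills) is checkpointed as a git branch rather than kept in memory, which is what gives the reproducibility the README highlights.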

Quick Start & Requirements

  • Installation: Use uv sync or pip install -e . from the repository root.
  • Prerequisites: Python 3.12+, uv (recommended) or pip, Docker (for LiveCodeBench evaluation), and an API key for the agent SDK (e.g., export ANTHROPIC_API_KEY=your-key-here).
  • SDK and Model Selection: Configure via the --sdk flag (e.g., claude, opencode) and the --model flag (e.g., deepseek-ai/DeepSeek-V3, google/gemini-2.0-flash-exp).
  • Dataset Preparation: Datasets must be placed in the .dataset/ directory (e.g., .dataset/dabstep_data.csv, .dataset/seal-0.csv).
  • Links: A paper (preprint) is mentioned in the README, but no URL is provided. Python API documentation is available within the codebase (src.api).
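Putting the requirements above together, a minimal setup might look like the following. The install commands and dataset paths come from the README; the final run command is hypothetical, since the actual CLI entry point is not shown on this page.

```shell
# Credentials for the agent SDK (value is a placeholder):
export ANTHROPIC_API_KEY=your-key-here

# Install dependencies from inside a checkout of the repository:
#   uv sync            # recommended
#   pip install -e .   # alternative

# Benchmark datasets must be placed under .dataset/ before running, e.g.:
#   .dataset/dabstep_data.csv
#   .dataset/seal-0.csv
mkdir -p .dataset

# Launch the self-improvement loop (hypothetical entry point; check the
# README for the real command):
#   evoskill --sdk claude --model ...
```

Docker must also be running if LiveCodeBench evaluation is used, since generated code executes in containers.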

Highlighted Details

  • Validated on benchmarks like DABStep, SEAL-QA, and OfficeQA, demonstrating performance matching or exceeding hand-tuned agent configurations.
  • Full reproducibility is achieved by tracking agent configurations (system prompt + skills) as git branches.
  • Offers a high-level Python API for programmatic control of the self-improvement loop and evaluations.
  • Designed for extensibility, allowing the addition of new tasks, agents, and custom scorers.
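The extensibility point about custom scorers can be illustrated with a small example. EvoSkill's real scorer interface is not documented on this page, so the `Result` type and function signature below are assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Result:
    """Minimal stand-in for an agent's answer to one benchmark question."""
    expected: str
    actual: str

def exact_match_scorer(results):
    """Illustrative custom scorer: fraction of exact-match answers.

    A scorer like this would rank program variants during evaluation;
    the interface here is hypothetical, not EvoSkill's actual API.
    """
    if not results:
        return 0.0
    return sum(r.expected.strip() == r.actual.strip() for r in results) / len(results)

score = exact_match_scorer([Result("42", "42"), Result("foo", "bar")])
# score == 0.5
```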

Maintenance & Community

The project is associated with research efforts, indicated by a 2025 Zenodo publication for a related framework. Community channels (such as Discord or Slack) and contributor information are not listed in the README.

Licensing & Compatibility

The project is licensed under the Apache 2.0 License. This license is permissive and generally compatible with commercial use and linking within closed-source projects.

Limitations & Caveats

Running the framework requires API keys for the chosen agent SDK and Docker for sandboxed code execution during evaluations. Benchmark datasets must be manually prepared in the expected .dataset/ directory structure beforehand.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 295 stars in the last 22 days

Starred by Bryan Helmig (Cofounder of Zapier) and Jared Palmer (SVP at GitHub; Founder of Turborepo; Author of Formik, TSDX).

Explore Similar Projects

dspyground by karthikscale3
0.3% · 306 stars
Optimize AI agent prompts with DSPy GEPA
Created 6 months ago · Updated 1 month ago
Starred by Zhiqiang Xie (Coauthor of SGLang), Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), and 3 more.

Trace by microsoft
0.3% · 711 stars
AutoDiff-like tool for end-to-end AI agent training with general feedback
Created 1 year ago · Updated 3 months ago