EvoSkill by sentient-agi

AI agent skill discovery and self-improvement framework

Created 4 months ago

1,009 stars

Top 36.2% on SourcePulse

Project Summary

EvoSkill is an open-source framework designed to automatically discover and synthesize reusable agent skills, significantly enhancing AI agent performance on long-horizon tasks, particularly in coding. It targets researchers and engineers seeking to move beyond manual prompt engineering by providing a self-improving system that iterates on agent configurations.

How It Works

EvoSkill employs an evolutionary loop to refine agent performance. It begins with a base agent attempting benchmark questions, then analyzes failures to propose targeted skill or prompt improvements. A generator creates these changes, and an evaluator scores the new program variants on a validation set. The framework maintains a "frontier" of the top-N performing programs, tracked via git branches for full reproducibility, ensuring that only the best configurations survive to the next iteration.

Quick Start & Requirements

Installation: Use uv sync or pip install -e ..
Prerequisites: Python 3.12+, uv (recommended) or pip, Docker (for LiveCodeBench evaluation), and an API key for the agent SDK (e.g., export ANTHROPIC_API_KEY=your-key-here).
SDK and Model Selection: Configure via --sdk and --model flags (e.g., claude, opencode with deepseek-ai/DeepSeek-V3, google/gemini-2.0-flash-exp).
Dataset Preparation: Datasets must be placed in the .dataset/ directory (e.g., .dataset/dabstep_data.csv, .dataset/seal-0.csv).
Links: Paper (Preprint) is mentioned but no URL provided. Python API documentation is available within the codebase (src.api).

Highlighted Details

Validated on benchmarks like DABStep, SEAL-QA, and OfficeQA, demonstrating performance matching or exceeding hand-tuned agent configurations.
Full reproducibility is achieved by tracking agent configurations (system prompt + skills) as git branches.
Offers a high-level Python API for programmatic control of the self-improvement loop and evaluations.
Designed for extensibility, allowing the addition of new tasks, agents, and custom scorers.

Maintenance & Community

The project is associated with research efforts, indicated by a 2025 Zenodo publication for a related framework. Specific community channels (like Discord or Slack) or detailed contributor information are not detailed in the README.

Licensing & Compatibility

The project is licensed under the Apache 2.0 License. This license is permissive and generally compatible with commercial use and linking within closed-source projects.

Limitations & Caveats

Operation requires specific API keys for agent SDKs and Docker for secure code execution during evaluations. Manual preparation of benchmark datasets into the specified directory structure is necessary before running the framework.

EvoSkill by sentient-agi

Explore Similar Projects

autoharness by kayba-ai

Awesome-Adaptation-of-Agentic-AI by pat-jj

auto-harness by neosigmaai

rosetta by griddynamics

agents-md by FerroxLabs

langchain-skills by langchain-ai

loopkit by Archive228

a-evolve by A-EVO-Lab

awesome-harness-engineering by ai-boost

harness-engineering by deusyu

hermes-agent-orange-book by alchaincyf

agent-lightning by microsoft