Ctx2Skill by S1s-Z

Autonomous skill discovery for LLM context learning

Created 3 months ago

332 stars

Top 82.2% on SourcePulse

Project Summary

Ctx2Skill is a self-evolving framework designed to enhance language models' ability to learn from complex, out-of-distribution contexts. It autonomously discovers, refines, and selects context-specific, natural-language skills without requiring human annotation or external feedback. This framework addresses the prohibitive cost of manual skill annotation for dense technical documents and the lack of feedback in automated skill construction, enabling LLMs to improve their context learning capabilities at inference time.

How It Works

The core of Ctx2Skill is a multi-agent self-play loop involving five distinct, frozen LM agent roles: Challenger, Reasoner, Judge, Proposer, and Generator. This adversarial loop allows the Challenger to generate probing tasks and rubrics, while the Reasoner attempts to solve them using evolving skill sets. The Judge provides verdicts, and the Proposer/Generator agents synthesize and materialize skill updates based on success and failure patterns. To prevent adversarial collapse and ensure generalization, a Cross-Time Replay mechanism collects representative tasks and re-evaluates historical skill sets, selecting the one that maximizes performance across both hard and easy probes.

Quick Start & Requirements

Installation: Clone the repository (git clone https://github.com/S1s-Z/Ctx2Skill.git) and navigate into the directory.
Prerequisites: Python 3.8+ and OpenAI-compatible API access are required.
Data Preparation: Download and place the CL-Bench dataset files (CL-bench-context-dedup.jsonl, CL-bench-with-task-delimiter.jsonl) in the project root. Evaluation logs and responses are also available.
Running Self-Play: Configure API keys (OPENAI_BASE_URL, OPENAI_API_KEY) and run python selfplay_loop.py with specified model configurations and parameters.
Inference: Use python infer.py with discovered skills for augmentation.
Evaluation: Run python eval_ignore_none.py to assess performance.
Links: Project repository: https://github.com/S1s-Z/Ctx2Skill.git, Paper: https://arxiv.org/abs/2604.27660.

Highlighted Details

Consistently improves solve rates across backbone models (GPT-4.1, GPT-5.1, GPT-5.2) on CL-bench tasks, with improvements up to +5.4%.
The framework autonomously discovers and refines skills, requiring no human annotation or external feedback.
Generated natural-language skills are pluggable into any language model at inference time.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps are provided in the README.

Licensing & Compatibility

This project is released under the MIT License, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The README suggests that GPT-5.2 yields the most consistent results during early experiments, implying potential variability in performance with other models or configurations. The framework's reliance on OpenAI-compatible APIs introduces an external dependency and potential cost factor.

Ctx2Skill by S1s-Z

Explore Similar Projects

awesome-in-context-rl by dunnolab

MEM1 by MIT-MI

MemSkill by ViktorAxelsen

Fat-Cat by answeryt

OpenCE by sci-m-wang

AutoSkill by ECNU-ICALK

SkillRL by aiming-lab

ace by ace-agent

agentic-context-engine by kayba-ai

MetaClaw by aiming-lab

Voyager by MineDojo

SkillOpt by microsoft