Ctx2Skill  by S1s-Z

Autonomous skill discovery for LLM context learning

Created 1 month ago
285 stars

Top 91.8% on SourcePulse

GitHubView on GitHub
Project Summary

Ctx2Skill is a self-evolving framework designed to enhance language models' ability to learn from complex, out-of-distribution contexts. It autonomously discovers, refines, and selects context-specific, natural-language skills without requiring human annotation or external feedback. This framework addresses the prohibitive cost of manual skill annotation for dense technical documents and the lack of feedback in automated skill construction, enabling LLMs to improve their context learning capabilities at inference time.

How It Works

The core of Ctx2Skill is a multi-agent self-play loop involving five distinct, frozen LM agent roles: Challenger, Reasoner, Judge, Proposer, and Generator. This adversarial loop allows the Challenger to generate probing tasks and rubrics, while the Reasoner attempts to solve them using evolving skill sets. The Judge provides verdicts, and the Proposer/Generator agents synthesize and materialize skill updates based on success and failure patterns. To prevent adversarial collapse and ensure generalization, a Cross-Time Replay mechanism collects representative tasks and re-evaluates historical skill sets, selecting the one that maximizes performance across both hard and easy probes.

Quick Start & Requirements

  • Installation: Clone the repository (git clone https://github.com/S1s-Z/Ctx2Skill.git) and navigate into the directory.
  • Prerequisites: Python 3.8+ and OpenAI-compatible API access are required.
  • Data Preparation: Download and place the CL-Bench dataset files (CL-bench-context-dedup.jsonl, CL-bench-with-task-delimiter.jsonl) in the project root. Evaluation logs and responses are also available.
  • Running Self-Play: Configure API keys (OPENAI_BASE_URL, OPENAI_API_KEY) and run python selfplay_loop.py with specified model configurations and parameters.
  • Inference: Use python infer.py with discovered skills for augmentation.
  • Evaluation: Run python eval_ignore_none.py to assess performance.
  • Links: Project repository: https://github.com/S1s-Z/Ctx2Skill.git, Paper: https://arxiv.org/abs/2604.27660.

Highlighted Details

  • Consistently improves solve rates across backbone models (GPT-4.1, GPT-5.1, GPT-5.2) on CL-bench tasks, with improvements up to +5.4%.
  • The framework autonomously discovers and refines skills, requiring no human annotation or external feedback.
  • Generated natural-language skills are pluggable into any language model at inference time.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps are provided in the README.

Licensing & Compatibility

This project is released under the MIT License, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The README suggests that GPT-5.2 yields the most consistent results during early experiments, implying potential variability in performance with other models or configurations. The framework's reliance on OpenAI-compatible APIs introduces an external dependency and potential cost factor.

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
1
Star History
156 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.