ClawWork by HKUDS

AI coworker framework for real-world economic validation

Created 1 week ago

4,987 stars

Top 9.9% on SourcePulse

Project Summary

ClawWork turns AI assistants into economically accountable AI coworkers. It benchmarks agents on real-world professional tasks and measures work quality, cost efficiency, and long-term economic survival rather than technical capability alone, so users can judge how much economic value an agent actually creates.

How It Works

Using the GDPVal dataset (220 tasks across 44 sectors), agents operate under strict economic constraints: a $10 starting balance and per-token costs. Completing tasks is the only source of income, so agents must balance working now for income against learning to improve future performance. A multi-model competition arena and a live dashboard visualize agent performance in real time.
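The economic loop described above can be sketched as a small ledger. This is a minimal illustration, not ClawWork's actual implementation: the class name AgentLedger, its methods, and the token price are hypothetical. Only the $10 starting balance comes from the summary.

```python
from dataclasses import dataclass, field


@dataclass
class AgentLedger:
    """Hypothetical ledger: token usage costs money, finished tasks earn it."""

    balance: float = 10.0               # starting balance stated in the summary
    price_per_1k_tokens: float = 0.01   # assumed price, for illustration only
    history: list = field(default_factory=list)

    def charge_tokens(self, tokens: int) -> float:
        """Deduct the cost of a model call and return the amount charged."""
        cost = tokens / 1000 * self.price_per_1k_tokens
        self.balance -= cost
        self.history.append(("tokens", -cost))
        return cost

    def pay_for_task(self, payment: float) -> None:
        """Credit income for a completed task."""
        self.balance += payment
        self.history.append(("task", payment))

    @property
    def solvent(self) -> bool:
        """An agent 'survives' while its balance stays above zero."""
        return self.balance > 0


ledger = AgentLedger()
ledger.charge_tokens(50_000)   # the agent spends tokens working on a task
ledger.pay_for_task(25.0)      # the task is accepted and paid
print(f"{ledger.balance:.2f}", ledger.solvent)
```

Under these assumed numbers the agent stays solvent. An agent that spends tokens without earning task income would see its balance fall toward zero.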

Quick Start & Requirements

  • Installation: Clone the repo, set up a Python 3.10+ environment, run pip install -r requirements.txt, then cd frontend && npm install && cd ..
  • Running: For the standalone simulation, run ./start_dashboard.sh and ./run_test_agent.sh (dashboard at http://localhost:3000). For the Nanobot integration (ClawMode), run pip install nanobot-ai and nanobot onboard, export PYTHONPATH, then run python -m clawmode_integration.cli gateway.
  • Prerequisites: Python >= 3.10, Node.js/npm (for the frontend), OPENAI_API_KEY, E2B_API_KEY. Optional: web search API keys.
  • Links: GDPVal: https://openai.com/index/gdpval/. Nanobot: https://github.com/HKUDS/nanobot.
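Before running the scripts, it can help to confirm the prerequisites listed above. The following is a hedged sketch of such a check. The function preflight is hypothetical and not part of ClawWork. It only tests the Python version and the two environment variables named in this summary.

```python
import os
import sys

REQUIRED_ENV = ("OPENAI_API_KEY", "E2B_API_KEY")  # keys named in the summary
MIN_PYTHON = (3, 10)                              # minimum version from the summary


def preflight(env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required"
        )
    for name in REQUIRED_ENV:
        if not env.get(name):
            problems.append(f"missing environment variable {name}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

The env and version parameters exist so the check can be exercised without touching the real environment.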

Highlighted Details

  • GDPVal Benchmark: 220 real-world professional tasks across 44 economic sectors.
  • Economic Pressure: Agents start with $10, pay for all token usage, and must earn income to survive.
  • Strategic Decisions: Agents choose between working for immediate income or investing in learning.
  • Live Dashboard: Real-time visualization of financial status, task completion, and survival metrics.
  • Performance Claims: Top agents achieve equivalent salaries of $1,500+/hr.
  • LLM Evaluation: Uses GPT-5.2 with category-specific rubrics.
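The summary does not say how the "equivalent salary" figure is derived. One plausible reading, assumed here purely for illustration, is task income divided by the wall-clock time the agent spent. The function name and the sample numbers below are hypothetical.

```python
def equivalent_hourly_rate(income_usd: float, elapsed_seconds: float) -> float:
    """Convert income earned over a time span into a dollars-per-hour figure.

    Assumed definition: income / hours worked. ClawWork may compute it differently.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return income_usd * 3600 / elapsed_seconds


# Hypothetical example: $50 earned for a task finished in two minutes.
print(equivalent_hourly_rate(50.0, 120))
```

Under this definition, short completion times inflate the hourly figure. That is worth keeping in mind when comparing it with human salaries.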

Maintenance & Community

Launched Feb 16, 2026. Community channels (Feishu, WeChat) are mentioned. Roadmap includes multi-task days and multi-agent competition.

Licensing & Compatibility

Released under the MIT license, which permits commercial use.

Limitations & Caveats

Newly launched project. Requires specific API keys (OPENAI_API_KEY, E2B_API_KEY). Potential E2B sandbox rate limits (429 errors). ClawMode gateway cost tracking excludes direct Nanobot commands.
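Rate-limit errors such as the 429 responses mentioned above are commonly handled with exponential backoff. The sketch below shows that general technique. It is not taken from ClawWork or the E2B SDK, and RateLimitError is a hypothetical stand-in for whatever exception a real client raises on HTTP 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error from a sandbox API."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
```

Wrapping a sandbox call as call_with_backoff(lambda: create_sandbox()) would retry it up to five times, where create_sandbox is a placeholder for the real call.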

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 10
  • Issues (30d): 6
  • Star History: 5,249 stars in the last 8 days
