ClawWork by HKUDS

AI coworker framework for real-world economic validation

Created 4 months ago

8,227 stars

Top 6.3% on SourcePulse

1 Expert Loves This Project

Travis Fischer

Founder of Agentic

Project Summary

ClawWork turns AI assistants into economically accountable AI coworkers. It targets production-grade validation by benchmarking agents on real-world professional tasks and measuring work quality, cost efficiency, and long-term economic survival rather than technical capability alone. The result is a measure of how much economic value an agent actually creates.

How It Works

Agents work through tasks from the GDPVal dataset (220 tasks across 44 sectors) under strict economic constraints: a $10 starting balance and per-token costs. Income comes only from completing tasks, so each agent must balance immediate work for income against learning that may improve future performance. A multi-model competition arena and live dashboard visualize agent performance in real time.
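
The accounting described above can be illustrated with a small sketch. This is not ClawWork's code: the Ledger class, the token price, and the task payment are all hypothetical, and only the $10 starting balance comes from the summary.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Toy balance tracker; the class and its prices are illustrative assumptions."""
    balance: float = 10.0  # starting balance stated in the summary

    def charge_tokens(self, tokens: int, usd_per_1k_tokens: float) -> None:
        # Deduct the cost of the tokens the agent consumed.
        self.balance -= tokens / 1000 * usd_per_1k_tokens

    def pay(self, amount: float) -> None:
        # Credit income earned by completing a task.
        self.balance += amount

    @property
    def solvent(self) -> bool:
        # In this sketch the agent "survives" while its balance stays above zero.
        return self.balance > 0


ledger = Ledger()
ledger.charge_tokens(tokens=50_000, usd_per_1k_tokens=0.01)  # hypothetical price
ledger.pay(2.00)  # hypothetical task payment
print(f"balance=${ledger.balance:.2f} solvent={ledger.solvent}")
```

Every model call lowers the balance and only a completed task raises it, which is what makes token efficiency part of the score.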

Quick Start & Requirements

Installation: Clone the repo, set up a Python 3.10+ environment, run pip install -r requirements.txt, then install the dashboard dependencies with cd frontend && npm install && cd ..
Running: Standalone simulation uses ./start_dashboard.sh and ./run_test_agent.sh (dashboard at http://localhost:3000). Nanobot integration (ClawMode) requires pip install nanobot-ai, nanobot onboard, export PYTHONPATH, and python -m clawmode_integration.cli gateway.
Prerequisites: Python >= 3.10, Node.js/npm (for the dashboard frontend), OPENAI_API_KEY, E2B_API_KEY. Optional: web search API keys.
Links: GDPVal: https://openai.com/index/gdpval/. Nanobot: https://github.com/HKUDS/nanobot.
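
Before running the scripts, it may help to confirm the prerequisites are in place. The following is a minimal preflight sketch, not part of ClawWork: the preflight function is hypothetical, and it checks only the Python version, the two environment variables, and the Node.js tools listed above.

```python
import os
import shutil
import sys

REQUIRED_ENV = ("OPENAI_API_KEY", "E2B_API_KEY")  # keys named in the prerequisites
REQUIRED_TOOLS = ("node", "npm")  # needed for the dashboard frontend


def preflight(env=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if tuple(version) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    problems += [f"missing environment variable {k}" for k in REQUIRED_ENV if not env.get(k)]
    problems += [f"{t} not found on PATH" for t in REQUIRED_TOOLS if which(t) is None]
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARN:", problem)
```

The env, which, and version parameters exist only so the check can be exercised without touching the real environment.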

Highlighted Details

GDPVal Benchmark: 220 real-world professional tasks across 44 economic sectors.
Economic Pressure: Agents start with $10, pay for all token usage, and must earn income to survive.
Strategic Decisions: Agents choose between working for immediate income or investing in learning.
Live Dashboard: Real-time visualization of financial status, task completion, and survival metrics.
Performance Claims: Top agents achieve equivalent salaries of $1,500+/hr.
LLM Evaluation: Uses GPT-5.2 with category-specific rubrics.
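
The work-versus-learn trade-off can be framed as a break-even calculation. The sketch below is an illustration only: the function and its numbers are hypothetical, and the summary does not say how ClawWork agents actually make this decision.

```python
import math


def tasks_to_break_even(learning_cost: float, payout_before: float, payout_after: float) -> int:
    """Number of future tasks needed before a learning investment pays for itself."""
    gain_per_task = payout_after - payout_before
    if gain_per_task <= 0:
        raise ValueError("learning never pays off if it does not raise per-task income")
    return math.ceil(learning_cost / gain_per_task)


# Hypothetical numbers: spend $1.50 on learning to lift net income per task
# from $2.00 to $2.50.
print(tasks_to_break_even(1.50, 2.00, 2.50))
```

Under these assumptions, learning is worth it only if the agent expects to stay solvent long enough to complete that many further tasks.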

Maintenance & Community

Launched Feb 16, 2026. Community channels (Feishu, WeChat) are mentioned. Roadmap includes multi-task days and multi-agent competition.

Licensing & Compatibility

MIT license, generally permissive for commercial use.

Limitations & Caveats

Newly launched project. Requires specific API keys (OPENAI_API_KEY, E2B_API_KEY). Potential E2B sandbox rate limits (429 errors). ClawMode gateway cost tracking excludes direct Nanobot commands.
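
A common way to cope with 429 rate-limit responses is to retry with exponential backoff. The sketch below shows the general pattern, not ClawWork's or E2B's own handling: RateLimited is a hypothetical stand-in for whatever exception the sandbox client raises, and with_backoff is an illustrative helper.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the error a sandbox client raises on HTTP 429."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimited, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Jitter spreads out retries from agents that were throttled together.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

To use it, wrap the rate-limited operation in a zero-argument callable and translate the client's real 429 error into RateLimited (or catch that error type directly).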

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

58 stars in the last 30 days