hud-python by hud-evals

AI agent development and evaluation toolkit

Created 1 year ago

290 stars

Top 91.0% on SourcePulse


2 Experts Love This Project

Will Brown

Research Lead at Prime Intellect

Eric Zhu

Coauthor of AutoGen; Research Scientist at Microsoft Research

Project Summary

This project offers an open-source Reinforcement Learning (RL) environment and evaluation toolkit designed to simplify the creation, benchmarking, and training of AI agents. It addresses the complexity of integrating diverse software into RL-ready environments, providing a unified platform for developers and researchers building sophisticated agents. The primary benefit is accelerated agent development through standardized interfaces and streamlined RL pipelines.

How It Works

The core of hud-python is the Model Context Protocol (MCP), which standardizes agent-environment interaction. Any software can be wrapped as an RL environment behind this interface, and real-time telemetry exposes agent actions and rewards for detailed inspection. The system supports browser automation via integrations like AnchorBrowser and Steel, and offers a hot-reloading development loop (hud dev) for rapid iteration on environments. For training, the hud rl command provides one-click RL agent development, supporting both language-only and vision-language models.
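To make the agent-environment loop described above concrete, here is a toy Python sketch that wraps a trivial piece of software (a counter) behind named tools and records each tool call, observation, and reward. Every name in it (CounterEnv, call_tool, run_episode) is hypothetical and chosen for illustration; it does not use the hud-python SDK or the actual MCP wire format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CounterEnv:
    """Toy 'software' wrapped as an environment: reach a target count."""

    target: int = 3
    value: int = 0
    # One record per tool call, standing in for live telemetry.
    telemetry: list = field(default_factory=list)

    def call_tool(self, name: str) -> tuple[int, float]:
        """Apply a named tool and return (observation, reward)."""
        if name == "increment":
            self.value += 1
        elif name == "reset":
            self.value = 0
        else:
            raise ValueError(f"unknown tool: {name}")
        # Sparse reward: 1.0 only when the counter equals the target.
        reward = 1.0 if self.value == self.target else 0.0
        self.telemetry.append(
            {"tool": name, "observation": self.value, "reward": reward}
        )
        return self.value, reward


def run_episode(
    env: CounterEnv, policy: Callable[[int], str], max_steps: int = 10
) -> float:
    """Let the policy pick tools until a reward arrives or steps run out."""
    total = 0.0
    obs = env.value
    for _ in range(max_steps):
        obs, reward = env.call_tool(policy(obs))
        total += reward
        if reward > 0:
            break
    return total
```

Calling run_episode(CounterEnv(target=3), lambda obs: "increment") drives the counter to its target and leaves one telemetry record per tool call, which is the kind of trace a telemetry view would display.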

Quick Start & Requirements

Install the SDK with pip install hud-python and the CLI with uv tool install hud-python@latest --python 3.12. You will need a HUD_API_KEY from hud.ai, and some agents also require an ANTHROPIC_API_KEY. Docker is required for environment development. Official documentation is available at docs.hud.ai.
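A small preflight check can confirm the keys are present before running anything. The sketch below is plain Python using only the standard library; the variable names HUD_API_KEY and ANTHROPIC_API_KEY come from the prerequisites above, while the helper missing_keys is hypothetical and not part of the hud-python SDK. Treating ANTHROPIC_API_KEY as optional is an assumption based on it being needed only for specific agents.

```python
import os
from typing import Mapping, Optional, Sequence

# Key names come from the prerequisites; the required/optional split is an assumption.
REQUIRED_KEYS = ("HUD_API_KEY",)
OPTIONAL_KEYS = ("ANTHROPIC_API_KEY",)


def missing_keys(
    names: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_keys(REQUIRED_KEYS):
        print(f"missing required key: {name}")
    for name in missing_keys(OPTIONAL_KEYS):
        print(f"optional key not set: {name}")
```

Passing an explicit mapping instead of relying on os.environ keeps the helper easy to test without touching the real environment.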

Highlighted Details

MCP Environment Skeleton: Enables seamless integration of any agent with any environment.
Live Telemetry: Provides real-time insights into tool calls, observations, and rewards via hud.ai.
Public Benchmarks: Includes established benchmarks like OSWorld-Verified and SheetBench-50 for standardized evaluation.
Browser Automation: Integrates with cloud browsers (AnchorBrowser, Steel, BrowserBase) for web-based agent tasks.
Rapid Development: Features a hot-reload dev loop (hud dev) for iterative environment creation.
One-Click RL: Simplifies training complex RL agents, including multi-turn interactions with vision-language models (VLMs).

Maintenance & Community

The project actively welcomes contributors and feature requests. The roadmap indicates ongoing development, including expanding environment examples, agent framework integrations, and enhancing RL pipelines. While specific community channels like Discord/Slack are not detailed, direct engagement via issues or calls is encouraged.

Licensing & Compatibility

The project is released under the MIT License, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Local RL training necessitates a multi-GPU setup (typically 2+). Cloud-based features and hosted training incur costs, detailed in the pricing documentation. Obtaining necessary API keys is a prerequisite for utilizing cloud services and specific agent models.

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Star History

51 stars in the last 30 days