cuga-agent by cuga-project

Enterprise agent for complex web and API task execution

Created 4 months ago
633 stars

Top 52.4% on SourcePulse

Summary

CUGA is an open-source generalist agent for enterprises that simplifies the execution of complex web and API tasks. It targets engineers and power users, offering a configurable, composable architecture that reduces the time and complexity of deploying domain-specific agents.

How It Works

CUGA combines the planner-executor and code-act patterns with structured planning and variable management to improve reliability. It offers configurable reasoning modes (from fast to accurate) and flexible tool integration (OpenAPI, MCP, LangChain). Its composable design lets CUGA act as a tool inside multi-agent systems. Policy-aware execution and save/reuse capabilities are available but experimental.
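The planner-executor pattern with variable management can be illustrated with a minimal sketch. This is not CUGA's implementation: `Step`, `Executor`, `plan_task`, and the two toy tools are hypothetical names invented for illustration, and the planner is a fixed stand-in for what would normally be an LLM call. The idea shown is that each step stores its result under a name, and later steps refer to that name instead of repeating the data.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    tool: str             # name of the tool to call
    args: dict[str, Any]  # literal values, or "$name" references to stored variables
    save_as: str          # variable name under which the result is stored


@dataclass
class Executor:
    tools: dict[str, Callable[..., Any]]
    variables: dict[str, Any] = field(default_factory=dict)

    def resolve(self, value: Any) -> Any:
        # Replace "$name" with the value saved by an earlier step.
        if isinstance(value, str) and value.startswith("$"):
            return self.variables[value[1:]]
        return value

    def run(self, plan: list[Step]) -> dict[str, Any]:
        for step in plan:
            kwargs = {key: self.resolve(val) for key, val in step.args.items()}
            self.variables[step.save_as] = self.tools[step.tool](**kwargs)
        return self.variables


def plan_task(goal: str) -> list[Step]:
    # Hypothetical planner: a real one would derive the plan from the goal.
    return [
        Step("list_orders", {"customer": "acme"}, "orders"),
        Step("total", {"amounts": "$orders"}, "order_total"),
    ]


# Toy tools standing in for real API calls.
tools = {
    "list_orders": lambda customer: [120.0, 80.0],
    "total": lambda amounts: sum(amounts),
}

result = Executor(tools).run(plan_task("Sum the orders for acme"))
print(result["order_total"])  # prints 200.0
```

Keeping intermediate results in a named variable store means the plan stays small and each step's inputs can be inspected after the run.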

Quick Start & Requirements

  • Primary Install: Clone the repository, create a Python 3.12 environment with uv, run uv sync, and configure .env with your API keys.
  • Run Command: cuga start demo.
  • Prerequisites: Python 3.12+, uv. Optional: Docker/Podman (sandbox), Playwright (hybrid).
  • Links: Docs: https://docs.cuga.dev, Discord: https://discord.gg/aH6rAEEW, Hugging Face: https://huggingface.co/spaces/ibm-research/cuga-agent.
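Before running the install steps, it can help to confirm that the listed prerequisites are present. The following is a minimal sketch using only the Python standard library; `check_prerequisites` and `find_sandbox_runtime` are hypothetical helper names and are not part of CUGA.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 12)):
    """Return a list of missing required prerequisites (empty if none)."""
    missing = []
    if sys.version_info[:2] < min_python:
        missing.append(f"Python {min_python[0]}.{min_python[1]}+")
    if shutil.which("uv") is None:
        missing.append("uv")
    return missing


def find_sandbox_runtime():
    """Return the first optional container runtime found on PATH, or None."""
    for name in ("docker", "podman"):
        if shutil.which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    missing = check_prerequisites()
    print("Missing required tools:", ", ".join(missing) if missing else "none")
    print("Sandbox runtime:", find_sandbox_runtime() or "not found (optional)")
```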

Highlighted Details

  • Benchmark Performance: #1 on AppWorld (750 tasks) and top-tier on WebArena.
  • Key Features: High-performance generalist, configurable reasoning modes, seamless OpenAPI/MCP/LangChain integration.
  • Composable Architecture: CUGA can be a tool for other agents, enabling nested reasoning.
  • Task Modes: API-only, Web-only (extension), and Hybrid modes for versatile workflows.
  • Security: Optional secure code execution sandbox via Docker/Podman.
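The composable "agent as a tool" idea from the list above can be sketched in a few lines. This is an illustration under stated assumptions, not CUGA's API: `SimpleAgent`, `as_tool`, and the keyword-based routing are hypothetical, with the routing standing in for model-driven tool selection.

```python
from typing import Callable

Tool = Callable[[str], str]


class SimpleAgent:
    def __init__(self, name: str, tools: dict[str, Tool]) -> None:
        self.name = name
        self.tools = tools

    def run(self, task: str) -> str:
        # Toy routing: call the first tool whose keyword appears in the task.
        for keyword, tool in self.tools.items():
            if keyword in task.lower():
                return tool(task)
        return f"{self.name}: no tool matched"

    def as_tool(self) -> Tool:
        # An agent exposes the same call shape as any other tool.
        return self.run


# The inner agent owns a domain-specific tool.
billing_agent = SimpleAgent("billing", {"invoice": lambda task: "invoice status: paid"})

# The outer agent treats the inner agent as one of its tools.
supervisor = SimpleAgent(
    "supervisor",
    {"invoice": billing_agent.as_tool(), "weather": lambda task: "sunny"},
)

print(supervisor.run("Check the invoice for order 42"))  # prints "invoice status: paid"
```

Because the inner agent has the same signature as a plain tool, the outer agent needs no special handling to nest it.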

Maintenance & Community

The project encourages community contributions (use cases, features, bugs) via GitHub Issues. A Discord server is available for engagement. The roadmap includes policy support and performance enhancements.

Licensing & Compatibility

The README content provided does not state the specific open-source license, and compatibility with commercial use is not detailed.

Limitations & Caveats

Features like policy-aware instructions, human-in-the-loop, and save-and-reuse are experimental. Default local Python execution is faster but less secure than the optional Docker/Podman sandbox.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 404 stars in the last 30 days
