nova-act by aws

AI agents for scalable UI automation

Created 10 months ago
883 stars

Top 41.0% on SourcePulse

Project Summary

Amazon Nova Act provides a Python SDK for building and deploying highly reliable AI agents that automate UI-based workflows at scale. It enables users to define complex browser automation tasks using natural language combined with Python code, with built-in escalation to human supervisors when necessary.

How It Works

The SDK takes natural-language prompts and translates them into browser actions. Agents automate UI interactions, call external tools through APIs or custom Python functions, and can be configured for human-in-the-loop (HITL) scenarios in which a person approves a step or takes direct control. Workflows are orchestrated in Python code, which makes multi-step automations possible.
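The orchestration pattern described above can be sketched without the SDK installed. In this minimal sketch, `Agent`, `RecordingAgent`, and `run_workflow` are hypothetical names introduced for illustration, and the `act(prompt)` signature is an assumption rather than a statement of the SDK's actual API. The point is only that ordinary Python supplies the control flow while each step is a short natural-language instruction.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Agent(Protocol):
    """Anything that can carry out one natural-language browser instruction."""

    def act(self, prompt: str) -> str: ...


@dataclass
class RecordingAgent:
    """Hypothetical stand-in for a real browser agent; it only logs prompts."""

    log: list[str] = field(default_factory=list)

    def act(self, prompt: str) -> str:
        self.log.append(prompt)
        return f"done: {prompt}"


def run_workflow(agent: Agent, items: list[str]) -> list[str]:
    """Plain Python drives the control flow; each step is a short prompt."""
    results = [agent.act("open the search page")]
    for item in items:  # ordinary Python loop around natural-language steps
        results.append(agent.act(f"search for {item} and open the first result"))
    return results


agent = RecordingAgent()
outcome = run_workflow(agent, ["coffee maker", "kettle"])
print(outcome)
```

Keeping each prompt small and letting Python handle loops and branching is one plausible way to structure a multi-step automation. A real agent object would replace `RecordingAgent` here.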

Quick Start & Requirements

  • Installation: pip install nova-act
  • Prerequisites:
    • OS: macOS Sierra+, Ubuntu 22.04+, WSL2, or Windows 10+.
    • Python: 3.10+.
    • Browser: Google Chrome (managed by Playwright).
  • Setup: Initial Playwright module installation may take 1-2 minutes on first run.
  • Links: Web playground: https://nova.amazon.com/act
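Before installing, the basic prerequisites listed above can be checked with a few lines of standard-library Python. This is a rough sketch: `check_prerequisites` is a hypothetical helper, not part of the SDK, and it only checks the Python version and the operating system family. It does not verify OS versions (Sierra, 22.04, Windows 10) or the presence of Chrome.

```python
import platform
import sys

MIN_PYTHON = (3, 10)  # from the prerequisites list above
# platform.system() values for macOS, Linux (including WSL2), and Windows
SUPPORTED_SYSTEMS = {"Darwin", "Linux", "Windows"}


def check_prerequisites(python_version: tuple[int, int], system: str) -> list[str]:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems = []
    if python_version < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, 3.10+ required"
        )
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported operating system: {system}")
    return problems


# Check the interpreter and OS this script is running on.
issues = check_prerequisites(sys.version_info[:2], platform.system())
print("OK" if not issues else "\n".join(issues))
```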

Highlighted Details

  • IDE Integration: Offers an extension for accelerated development, including chat-to-script generation and debugging.
  • Human-in-the-Loop (HITL): Supports human approval workflows and real-time UI takeover for complex decision points or CAPTCHA resolution.
  • Tool Integration: Extensible via custom Python functions (@tool decorator) and external MCP servers.
  • Deployment CLI: Includes a CLI for deploying Python workflows to AWS AgentCore Runtime, automating containerization and deployment.
  • Data Extraction: Extracts structured data into Pydantic models via act_get().
  • Session Management: Supports persistent browser sessions, video recording, and S3 storage for session artifacts.
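The human approval workflow mentioned in the list above can be illustrated with a small gate around sensitive steps. This is a sketch under stated assumptions: `Step`, `run_with_approval`, and the `execute` and `approve` callbacks are hypothetical names, not the SDK's HITL interface. The callbacks stand in for the agent and for whatever channel reaches the human supervisor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    prompt: str
    needs_approval: bool = False


def run_with_approval(
    steps: list[Step],
    execute: Callable[[str], None],
    approve: Callable[[str], bool],
) -> list[str]:
    """Run steps in order; stop at the first sensitive step a human rejects."""
    completed = []
    for step in steps:
        if step.needs_approval and not approve(step.prompt):
            break  # human declined: leave the remaining steps unexecuted
        execute(step.prompt)
        completed.append(step.prompt)
    return completed


steps = [
    Step("add the item to the cart"),
    Step("place the order", needs_approval=True),
    Step("download the receipt"),
]
executed: list[str] = []
# The approver here always declines, so only the first step runs.
done = run_with_approval(steps, executed.append, approve=lambda prompt: False)
print(done)  # ['add the item to the cart']
```

Stopping at the first rejected step, rather than skipping it and continuing, is a deliberate choice in this sketch: later steps usually depend on the rejected one.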

Maintenance & Community

No explicit community channels or contributor information is detailed in the provided README. Bug reports and feedback are directed to nova-act@amazon.com.

Licensing & Compatibility

Usage is governed by the nova.amazon.com Terms of Use when authenticating with an API key, and by the AWS Service Terms and Customer Agreement when using IAM authentication or deploying to AWS services. The README as summarized here does not name a standard open-source license, so these service terms appear to be the governing conditions.

Limitations & Caveats

The system cannot interact with non-browser applications or browser modals. It is optimized for screen resolutions between 864x1536 and 1536x2304 and may perform worse outside this range. It is susceptible to prompt injection attacks from untrusted content. Multi-processing workflows are not currently supported, and keyboard commands may not translate correctly across operating systems when integrating with AWS AgentCore Browser.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 2
  • Star History: 12 stars in the last 30 days
