vision-agent by landing-ai

Visual AI agent for generating runnable vision code from image/video prompts

Created 1 year ago

5,197 stars

Top 9.5% on SourcePulse

4 Experts Love This Project

Andrew Ng

Founder of DeepLearning.AI; Cofounder of Coursera; Professor at Stanford

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Project Summary

Vision Agent builds visual AI applications from an image or video plus a text prompt. It automatically selects appropriate vision models, generates runnable code, and includes test cases, which makes it suited to rapid development of AI-powered visual applications. The target audience is developers and researchers who want to prototype and deploy computer vision solutions quickly.

How It Works

The agent takes a user's prompt and the associated media (image or video) and generates a plan for the task. From that plan it produces code and a test case, then iterates until the test case passes. Large language models (LLMs) from providers such as Anthropic and Google interpret the prompt and generate the code, so the returned code has already been run against its test case.
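
The plan, generate, test, and retry cycle described above can be sketched generically as follows. This is a minimal illustration, not Vision Agent's implementation: the function names (`generate_until_passing`, `make_plan`, `write_code`, `run_test`), the `max_attempts` limit, and the stub functions standing in for LLM calls are all hypothetical.

```python
from typing import Callable, Optional


def generate_until_passing(
    prompt: str,
    make_plan: Callable[[str], str],
    write_code: Callable[[str, Optional[str]], str],
    run_test: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first code candidate whose test passes, or None."""
    plan = make_plan(prompt)
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # Each attempt sees the failure message from the previous one.
        code = write_code(plan, feedback)
        feedback = run_test(code)  # None means the test passed
        if feedback is None:
            return code
    return None


# Hypothetical stand-ins for the LLM calls and the test runner.
def fake_plan(prompt: str) -> str:
    return f"plan for: {prompt}"


def fake_write(plan: str, feedback: Optional[str]) -> str:
    # The first draft is wrong; the draft written after feedback is right.
    return "count = 2" if feedback else "count = 0"


def fake_test(code: str) -> Optional[str]:
    return None if code == "count = 2" else "expected count == 2"


result = generate_until_passing("count the people", fake_plan, fake_write, fake_test)
print(result)  # count = 2
```

Capping the number of attempts is a design choice in this sketch: it keeps a failing task from consuming API calls indefinitely.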

Quick Start & Requirements

Installation: pip install vision-agent or uv add vision-agent
Prerequisites: Python 3.9+, an Anthropic API key, and a Google API key. A Vision Agent API key is also required for the web app.
Setup: Set the API keys as environment variables (VISION_AGENT_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY).
Resources: VisionAgent Library Docs, Web App Docs, Video Tutorials.
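
Before running anything, it can help to confirm that the three environment variables named above are set. The check below is a small sketch that uses only the Python standard library. The helper name `missing_api_keys` is hypothetical and is not part of the vision-agent package.

```python
import os

# Variable names taken from the setup notes above.
REQUIRED_KEYS = ("VISION_AGENT_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def missing_api_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```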

Highlighted Details

Automated code generation with integrated testing for visual AI tasks.
Supports direct use of underlying vision tools (e.g., object detection, video tracking) in custom scripts.
Configurable LLM providers (defaulting to Anthropic Claude 3.7 Sonnet and Gemini Flash 2.0).
Includes a web application for a no-code/low-code experience.

Maintenance & Community

Active development indicated by CI status and PyPI versioning.
Community support available via Discord.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Requires API keys from third-party providers (Anthropic, Google), which may incur costs and are subject to their respective rate limits.
The license is not specified, which could impact commercial adoption.

Health Check

Last Commit: 1 month ago
Responsiveness: 1 week

Star History

37 stars in the last 30 days