vision-agent by landing-ai

Visual AI agent for generating runnable vision code from image/video prompts

created 1 year ago
4,979 stars

Top 10.2% on sourcepulse

Project Summary

Vision Agent is a tool that enables users to build visual AI applications by providing an image or video and a prompt. It automatically selects appropriate vision models, generates runnable code, and includes test cases, allowing for rapid development of AI-powered visual applications. The target audience includes developers and researchers looking to quickly prototype and deploy computer vision solutions.

How It Works

The agent takes a user's prompt and the associated media (image or video) and generates a plan for the task. It then produces code and a test case from that plan, iterating until the test case passes. Large language models (LLMs) from providers such as Anthropic and Google interpret the prompt and generate the code, so the code is run against its generated test before it is returned.
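The plan, generate, test, and retry cycle described above can be sketched roughly as follows. This is a minimal illustration, not Vision Agent's actual implementation: `generate_until_passing`, `Attempt`, and the three stub functions are hypothetical names, and the stubs stand in for the LLM calls and the sandboxed test run.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str
    test: str
    passed: bool


def generate_until_passing(
    prompt: str,
    media: list[str],
    plan_fn: Callable[[str, list[str]], str],
    code_fn: Callable[[str, Optional[str]], tuple[str, str]],
    run_test_fn: Callable[[str, str], tuple[bool, str]],
    max_iters: int = 3,
) -> Attempt:
    """Plan once, then regenerate code until its test passes or attempts run out."""
    plan = plan_fn(prompt, media)
    feedback: Optional[str] = None
    attempt = Attempt(code="", test="", passed=False)
    for _ in range(max_iters):
        code, test = code_fn(plan, feedback)
        passed, feedback = run_test_fn(code, test)
        attempt = Attempt(code, test, passed)
        if passed:
            break
    return attempt


# Hypothetical stand-ins for the LLM-backed steps.
def fake_plan(prompt: str, media: list[str]) -> str:
    return f"count objects in {media[0]}"


def fake_code(plan: str, feedback: Optional[str]) -> tuple[str, str]:
    # The first draft has an off-by-one bug; it is "fixed" once feedback arrives.
    body = "return len(boxes) + 1" if feedback is None else "return len(boxes)"
    code = f"def count(boxes):\n    {body}\n"
    test = "assert count([1, 2, 3]) == 3"
    return code, test


def run_test(code: str, test: str) -> tuple[bool, str]:
    # Execute the generated code and its test in a throwaway namespace.
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(test, namespace)
        return True, ""
    except Exception as exc:
        return False, repr(exc)


result = generate_until_passing(
    "Count the people", ["people.png"], fake_plan, fake_code, run_test
)
```

In this sketch the failure message from one round is passed to the next code-generation call, which is one plausible way to drive the iteration; the real agent may carry richer context between rounds.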

Quick Start & Requirements

  • Installation: pip install vision-agent or uv add vision-agent
  • Prerequisites: Python 3.9+, an Anthropic API key, and a Google API key. A Vision Agent API key is also required for the web app.
  • Setup: Set the API keys as environment variables (VISION_AGENT_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY).
  • Resources: VisionAgent Library Docs, Web App Docs, Video Tutorials.
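Before running anything, it can help to confirm that the environment variables named in the Setup bullet are present. The helper below is a small sketch using only the standard library; `missing_api_keys` and `REQUIRED_KEYS` are hypothetical names and not part of the vision-agent package.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the Setup bullet above.
REQUIRED_KEYS = ("VISION_AGENT_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.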

Highlighted Details

  • Automated code generation with integrated testing for visual AI tasks.
  • Supports direct use of underlying vision tools (e.g., object detection, video tracking) in custom scripts.
  • Configurable LLM providers (defaulting to Anthropic Claude 3.7 Sonnet and Gemini Flash 2.0).
  • Includes a web application for a no-code/low-code experience.

Maintenance & Community

  • Active development indicated by CI status and PyPI versioning.
  • Community support available via Discord.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Requires API keys from third-party providers (Anthropic, Google), which may incur costs and are subject to their respective rate limits.
  • The license is not specified, which could impact commercial adoption.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 2
  • Issues (30d): 4
  • Star History: 463 stars in the last 90 days
