This repository is a curated hub of experiments built on the OpenAI Vision API, aimed at developers and researchers interested in visual AI applications. It showcases use cases ranging from basic image classification to zero-shot learning, with the goal of fostering collaboration and exploration of the API's capabilities.
How It Works
The project demonstrates a range of applications of the OpenAI Vision API, from image classification and zero-shot learning to integrations with other foundation models such as GroundingDINO and Segment Anything (SAM) for object detection and segmentation. These pairings work around the Vision API's native limitations and yield more robust visual understanding.
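For a sense of how such a pairing works, the following is a minimal sketch (not the repository's actual code) in which GPT-4V proposes object labels and GroundingDINO grounds them as boxes. It assumes the official `groundingdino` package with its published SwinT config and weights, the `openai` v1 Python SDK with an `OPENAI_API_KEY` in the environment, and an illustrative `gpt-4-vision-preview` model name and prompt.

```python
# Sketch: GPT-4V names what is in the image; GroundingDINO localizes it.
import base64

from openai import OpenAI
from groundingdino.util.inference import load_model, load_image, predict


def gpt4v_labels(image_path: str) -> str:
    """Ask GPT-4V for a comma-separated list of objects in the image."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List the objects in this image as a comma-separated list."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Paths point at GroundingDINO's published config and checkpoint.
model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "weights/groundingdino_swint_ogc.pth",
)
image_source, image = load_image("example.jpg")
caption = gpt4v_labels("example.jpg")  # e.g. "dog, frisbee, tree"
boxes, logits, phrases = predict(
    model, image, caption=caption, box_threshold=0.35, text_threshold=0.25
)
```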
Quick Start & Requirements
- Requires an OpenAI API key; a minimal usage sketch follows this list.
- Some experiments involve additional foundation models (e.g., GroundingDINO, SAM) that require separate setup.
- Refer to individual experiment directories for specific dependencies and setup instructions.
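As a baseline, a single Vision API request looks roughly like the sketch below, assuming the `openai` v1 Python SDK, an `OPENAI_API_KEY` in the environment, and a placeholder image URL and model name:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```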
Highlighted Details
- Showcases zero-shot object detection by combining GPT-4V with GroundingDINO.
- Includes experiments comparing GPT-4V with CLIP on classification tasks (a CLIP sketch follows this list).
- Features a "screenshot-to-code" experiment.
- Provides links to relevant research papers and blog posts detailing methodologies and findings.
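To illustrate the GPT-4V vs. CLIP comparison, the CLIP side of such an experiment can be sketched with the Hugging Face `transformers` library; the checkpoint, label set, and image below are placeholders rather than what the experiments necessarily use:

```python
# Zero-shot classification with CLIP: score an image against text labels.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = Image.open("example.jpg")

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # one probability per label
print(dict(zip(labels, probs[0].tolist())))
```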
Maintenance & Community
- Contributions are welcomed via issues and pull requests, with a contribution guide available.
- Key contributors include @SkalskiP, @capjamesg, and members of the Roboflow team.
Licensing & Compatibility
- The repository itself appears to be under a permissive license, but the use of the OpenAI Vision API is subject to OpenAI's terms of service and API usage policies.
Limitations & Caveats
- The OpenAI Vision API enforces a daily request limit per API key; a retry sketch follows this list.
- The API's native object detection and image segmentation capabilities are limited, which is why the experiments pair it with models like GroundingDINO and SAM.
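One common way to cope with those limits is to wrap calls in exponential backoff. A minimal sketch, assuming the `openai` v1 SDK and illustrative retry parameters:

```python
# Retry a Vision API call with exponential backoff on rate-limit errors.
import time

import openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_with_backoff(messages, retries: int = 5):
    delay = 1.0
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="gpt-4-vision-preview", messages=messages
            )
        except openai.RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(delay)
            delay *= 2  # double the wait between attempts
```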