OpenAI API-compatible vision server for multimodal image Q&A
Top 98.4% on sourcepulse
This project provides an OpenAI API-compatible server for multimodal chat, allowing users to interact with and ask questions about images using various open-source vision-language models. It targets developers and researchers looking for a self-hosted alternative to proprietary vision APIs.
How It Works
The server acts as a wrapper around Hugging Face Transformers, enabling users to load and serve a wide array of vision-language models. It translates standard OpenAI API requests into model-specific inference calls, abstracting away the complexities of different model architectures and their unique requirements. This approach allows for flexible model selection and easy integration into existing workflows.
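Concretely, a client sends a standard OpenAI chat-completions request whose message content mixes text with an inline base64-encoded image, and the server maps that onto the loaded model's own preprocessing and generation calls. A minimal sketch of building such a request body (the payload shape follows the OpenAI vision format; the helper name is illustrative):

```python
import base64
import json

def build_vision_request(question: str, image_bytes: bytes, model: str = "auto") -> dict:
    """Build an OpenAI-compatible chat payload mixing text and one image."""
    # Images are passed inline as base64 data URLs, per the OpenAI vision format.
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,  # the server resolves this to whichever model it has loaded
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        "max_tokens": 256,
    }

payload = build_vision_request("What is in this image?", b"\xff\xd8\xff\xe0...")
print(json.dumps(payload)[:60])
```

A client would POST this body to the server's `/v1/chat/completions` endpoint, exactly as it would against the proprietary API.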
Quick Start & Requirements
```shell
# For standard server
docker compose up
# For alternate server
docker compose -f docker-compose.alt.yml up
```

Edit `vision.env` or `vision-alt.env` to specify models and settings.
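The env file is where the served model is selected. A hypothetical sketch of what such a file might contain (the variable names below are illustrative assumptions, not taken from the project; consult the repository's `vision.sample.env` for the real options):

```shell
# vision.env -- illustrative sketch only
CLI_COMMAND="python vision.py -m vikhyatk/moondream2"  # model to load (assumed variable name)
HF_HOME=hf_home                                        # Hugging Face cache directory (assumption)
```

Docker Compose reads this file for the service's environment, so switching models is a one-line edit followed by restarting the container.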
Maintenance & Community
The project is actively maintained with frequent updates adding new model support and fixing regressions. Community contributions are welcomed, with specific mentions of users who have helped improve compatibility and add features.
Licensing & Compatibility
The project itself appears to be under a permissive license, but the underlying models used will have their own licenses, which may include restrictions on commercial use or redistribution. Users must verify the licenses of the specific models they choose to deploy.
Limitations & Caveats
Refer to `vision.sample.env` for the available configuration options.