openedai-vision  by matatonic

OpenAI API-compatible vision server for multimodal image Q&A

created 1 year ago
259 stars

Top 98.4% on sourcepulse

GitHubView on GitHub
Project Summary

This project provides an OpenAI API-compatible server for multimodal chat, allowing users to interact with and ask questions about images using various open-source vision-language models. It targets developers and researchers looking for a self-hosted alternative to proprietary vision APIs.

How It Works

The server acts as a wrapper around Hugging Face Transformers, enabling users to load and serve a wide array of vision-language models. It translates standard OpenAI API requests into model-specific inference calls, abstracting away the complexities of different model architectures and their unique requirements. This approach allows for flexible model selection and easy integration into existing workflows.

Quick Start & Requirements

  • Installation: Docker is the recommended installation method.
    # For standard server
    docker compose up
    # For alternate server
    docker compose -f docker-compose.alt.yml up
    
  • Prerequisites: Docker with NVIDIA CUDA container support. For manual installation, Python 3.10/3.11 is recommended; Python 3.12 requires special handling for AWQ/GPTQ models.
  • Configuration: Edit vision.env or vision-alt.env to specify models and settings.
  • Documentation: API Documentation and Usage Examples.

Highlighted Details

  • Supports a vast and growing list of vision-language models from Hugging Face (e.g., LLaVA, InternVL, Qwen-VL, CogVLM, Phi-3-vision).
  • Offers OpenAI API compatibility for seamless integration.
  • Includes options for 4-bit/8-bit quantization and Flash Attention 2 for performance optimization.
  • Provides experimental support for multi-image inputs and streaming responses.

Maintenance & Community

The project is actively maintained with frequent updates adding new model support and fixing regressions. Community contributions are welcomed, with specific mentions of users who have helped improve compatibility and add features.

Licensing & Compatibility

The project itself appears to be under a permissive license, but the underlying models used will have their own licenses, which may include restrictions on commercial use or redistribution. Users must verify the licenses of the specific models they choose to deploy.

Limitations & Caveats

  • Some models may not support all features like streaming or quantization, and specific configurations might be required (e.g., vision.sample.env).
  • Python 3.12 has known issues with AWQ and GPTQ models, requiring manual compilation.
  • Recent versions have introduced regressions in memory usage for certain models (e.g., Qwen2/Qwen2.5) and potential issues with GPTQ-Int4/8 quantization.
  • Certain older models are deprecated and require using older Docker images for support.
Health Check
Last commit

5 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
11 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes Didier Lopes(Founder of OpenBB), and
10 more.

JARVIS by microsoft

0.1%
24k
System for LLM-orchestrated AI task automation
created 2 years ago
updated 4 days ago
Starred by Tobi Lutke Tobi Lutke(Cofounder of Shopify), Andrej Karpathy Andrej Karpathy(Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and
13 more.

open-webui by open-webui

0.9%
105k
Self-hosted AI platform for local LLM deployment
created 1 year ago
updated 1 day ago
Feedback? Help us improve.