visual-openllm  by visual-openllm

Open-source multimodal LLM framework

created 2 years ago
1,201 stars

Top 33.3% on sourcepulse

View on GitHub
Project Summary

Visual OpenLLM is an open-source tool that connects various visual models through an interactive interface, aiming to replicate the functionality of systems like Visual ChatGPT and Baidu's Wenxin Yiyan. It is designed for researchers and developers interested in multimodal AI applications, offering a flexible framework for integrating different visual models with large language models.

How It Works

The project leverages a modular architecture, connecting a large language model (LLM) like ChatGLM with visual models such as Stable Diffusion. It processes user queries, determines the appropriate visual model for the task, and orchestrates the interaction between the LLM and the visual tools to generate responses. This approach allows for a unified conversational interface to diverse visual AI capabilities.
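The orchestration pattern described above can be sketched in a few lines. Everything here (the Tool dataclass, the registry entries, the dispatch heuristic) is a hypothetical illustration of the general idea, not Visual OpenLLM's actual API; in the real project the LLM itself decides which visual tool to invoke.

```python
# Minimal sketch of LLM-to-visual-tool dispatch, assuming a tool registry.
# All names are illustrative, not Visual OpenLLM's real interfaces.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str           # shown to the LLM so it can choose a tool
    run: Callable[[str], str]  # wraps a visual model (e.g. Stable Diffusion)

# Hypothetical registry: each visual model is wrapped as a named tool.
TOOLS = {
    "text2image": Tool("text2image", "Generate an image from a prompt",
                       lambda prompt: f"image for: {prompt}"),
    "vqa": Tool("vqa", "Answer a question about an image",
                lambda question: f"answer to: {question}"),
}

def dispatch(query: str) -> str:
    """Route a user query to a tool. A trivial keyword check stands in
    for the LLM's tool-selection step, then the tool runs and its
    result is returned to the conversational interface."""
    name = "text2image" if "draw" in query.lower() else "vqa"
    return TOOLS[name].run(query)

print(dispatch("Draw a cat on a skateboard"))
```

In the actual framework the keyword heuristic is replaced by the LLM (ChatGLM) reading the tool descriptions and emitting a tool choice, which is what makes a single conversational interface cover several visual models.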

Quick Start & Requirements

  • Primary install / run command: python run.py --load_llm Chatglm3 (for ChatGLM3-6B) or python run.py --load_llm Chatglm (for ChatGLM).
  • Prerequisites: Python, ChatGLM or ChatGLM3-6B, Stable Diffusion.
  • Links: Demo

Highlighted Details

  • Supports ChatGLM3, VQA, and pix2pix.
  • Built upon ChatGLM, Visual ChatGPT, and Stable Diffusion.
  • Aims to be an open-source alternative to commercial multimodal systems.

Maintenance & Community

The project has seen recent contributions, including support for ChatGLM3 and new visual functionalities. Further development is planned to support multi-turn chat, additional visual tools, and other LLMs.

Licensing & Compatibility

The licensing details are not explicitly stated in the provided README snippet. Compatibility for commercial use or closed-source linking would require further investigation into the specific licenses of its dependencies.

Limitations & Caveats

The project is still under active development, with planned features including multi-turn chat support and integration of more visual tools and LLMs. The current version may have limitations regarding conversational depth and the breadth of supported visual models.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems).

X-LLM by phellonchen

Top 0.3% on sourcepulse
312 stars
Multimodal LLM research paper
created 2 years ago, updated 2 years ago