visual-chatgpt-zh by wxj630

Visual ChatGPT in Chinese

Created 2 years ago
285 stars

Top 91.9% on SourcePulse

Project Summary

This repository provides a Chinese-language version of Visual ChatGPT, a system that connects a large language model to visual foundation models. Beyond text-based Q&A, it supports image question answering, image generation, and image editing, which makes it useful for both creative and analytical visual tasks.

How It Works

The system leverages a modular architecture, allowing users to load various visual foundation models (e.g., Image Captioning, Text-to-Image, Image Editing) alongside a language model. Users can specify which models to load and on which devices (GPU or CPU), offering flexibility for different hardware configurations. This approach allows for a unified interface to interact with multiple specialized AI models for diverse visual tasks.
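To illustrate per-model device selection, the sketch below parses a load specification into a model-to-device map. It assumes a comma-separated "Model_device" format (for example ImageCaptioning_cuda:0), modelled on the upstream Visual ChatGPT convention. The exact syntax that this repository's --load flag accepts should be checked in its README. parse_load_spec is a hypothetical helper and is not part of the project.

```python
import re
from typing import Dict

# Accepts "cpu", "cuda", or "cuda:<index>".
_DEVICE = re.compile(r"cpu|cuda(:\d+)?")


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Turn "ModelA_cuda:0,ModelB_cpu" into {"ModelA": "cuda:0", "ModelB": "cpu"}.

    Hypothetical helper: the "Model_device" format is an assumption
    modelled on upstream Visual ChatGPT, not confirmed for this repository.
    """
    placement: Dict[str, str] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        # Split on the last underscore so model names may contain underscores.
        name, sep, device = item.rpartition("_")
        if not sep or not name:
            raise ValueError(f"expected Model_device, got {item!r}")
        if not _DEVICE.fullmatch(device):
            raise ValueError(f"unknown device {device!r} for model {name!r}")
        placement[name] = device
    return placement
```

For example, parse_load_spec("ImageCaptioning_cuda:0,Text2Image_cpu") returns {"ImageCaptioning": "cuda:0", "Text2Image": "cpu"}. The helper splits on the last underscore so that model names containing underscores still parse.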

Quick Start & Requirements

  • Install: Clone the repository, create a Conda environment (conda create -n visgpt python=3.8), activate it (conda activate visgpt), and install dependencies (pip install -r requirement.txt).
  • Prerequisites: Requires an OpenAI API key. Models need to be downloaded separately using download_hf_models.sh.
  • Usage: Run the system with python visual_chatgpt_zh.py --load <models> --pretrained_model_dir <path>. Each model can be placed on the CPU or on a specific CUDA device to manage memory.
  • Links: Official Paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models.
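The following is a minimal sketch of a pre-flight check for the prerequisites listed above. preflight is a hypothetical helper and is not part of the repository. It assumes that the API key is supplied through the OPENAI_API_KEY environment variable, which is the OpenAI client's conventional variable. It only checks that the model directory exists and is non-empty, not that every model downloaded correctly.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def preflight(model_dir: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of setup problems; an empty list means ready to run.

    Hypothetical helper. Reading the key from OPENAI_API_KEY is an
    assumption about how the key is supplied.
    """
    env = os.environ if env is None else env
    problems: List[str] = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    path = Path(model_dir)
    if not path.is_dir():
        problems.append(f"model directory not found: {model_dir}")
    elif not any(path.iterdir()):
        # An empty directory usually means download_hf_models.sh has not run.
        problems.append(f"model directory is empty: {model_dir}")
    return problems
```

Calling preflight(path) before launching visual_chatgpt_zh.py returns every problem at once. The caller can then report them all rather than failing on the first one.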

Highlighted Details

  • Supports Chinese for all functionality.
  • Offers flexible model loading across CPU and GPU devices to manage memory constraints.
  • Integrates with various visual foundation models like ControlNet and Stable Diffusion.
  • Provides detailed technical explanations and setup guides in the README.
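The sketch below makes the idea of one interface over many visual models concrete with a tool registry. Each loaded model is registered under a name with a short description. The descriptions are what a language model would read when deciding which tool to call, and run dispatches the call by name. ToolRegistry and the lambda stand-ins are hypothetical, and this is not how the project itself is implemented. Upstream Visual ChatGPT, for instance, registers its models as LangChain tools.

```python
from typing import Callable, Dict

# A tool takes a text input (an image path or a prompt) and returns text.
ToolFn = Callable[[str], str]


class ToolRegistry:
    """Hypothetical unified interface over several visual models."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}
        self._descriptions: Dict[str, str] = {}

    def register(self, name: str, description: str, fn: ToolFn) -> None:
        self._tools[name] = fn
        self._descriptions[name] = description

    def describe(self) -> str:
        # One "name: description" line per tool, in registration order;
        # this is the text a language model would choose from.
        return "\n".join(f"{n}: {d}" for n, d in self._descriptions.items())

    def run(self, name: str, tool_input: str) -> str:
        if name not in self._tools:
            raise KeyError(f"tool not loaded: {name}")
        return self._tools[name](tool_input)


# Stand-ins for real model wrappers, which would load weights onto a device.
registry = ToolRegistry()
registry.register("ImageCaptioning", "describe an image; input is an image path",
                  lambda path: f"caption for {path}")
registry.register("Text2Image", "generate an image; input is a text prompt",
                  lambda prompt: "image/generated.png")
```

Loading fewer models only shrinks the registry. The language model can still call whatever was loaded through the same interface.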

Maintenance & Community

The project acknowledges contributions from HuggingFace, ControlNet, and Stable Diffusion. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. It acknowledges other projects, which may have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires significant disk space for the downloaded models. It can also be resource-intensive, because running several models at once may need substantial GPU memory, although CPU offloading is supported. The setup instructions target Python 3.8.
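When GPU memory is tight, one way to decide which models go on which device is a greedy placement, sketched below under stated assumptions. The per-model memory estimates and the free GPU memory are inputs you supply, and no figures from the project are implied. place_models is a hypothetical helper. It places the largest models first on the GPU with the most free memory, and anything that fits nowhere falls back to the CPU.

```python
from typing import Dict, List, Tuple


def place_models(
    models: List[Tuple[str, float]],
    gpu_free_gb: Dict[str, float],
) -> Dict[str, str]:
    """Greedily map each model name to a device, falling back to "cpu".

    `models` pairs a name with an estimated footprint in GB (user-supplied);
    `gpu_free_gb` maps device names such as "cuda:0" to free memory in GB.
    """
    free = dict(gpu_free_gb)  # copy so the caller's dict is not mutated
    placement: Dict[str, str] = {}
    # Largest models first: they are the hardest to fit.
    for name, need in sorted(models, key=lambda m: m[1], reverse=True):
        candidates = [d for d, gb in free.items() if gb >= need]
        if candidates:
            # Pick the GPU with the most remaining memory.
            device = max(candidates, key=lambda d: free[d])
            free[device] -= need
            placement[name] = device
        else:
            placement[name] = "cpu"  # offload: slower, but avoids running out of memory
    return placement
```

For example, with estimates of 6, 4 and 3 GB and two GPUs with 8 and 5 GB free, the two largest models land on separate GPUs and the third falls back to the CPU. Greedy placement is simple rather than optimal, and any model placed on the CPU will run noticeably slower.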

Health Check
Last Commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Elvis Saravia (Founder of DAIR.AI), and 1 more.

InternGPT by OpenGVLab

0%
3k stars
Interactive demo platform for showcasing AI models
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).

DeepSeek-VL2 by deepseek-ai

0.1%
5k stars
MoE vision-language model for multimodal understanding
Created 1 year ago
Updated 10 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Taranjeet Singh (Cofounder of Mem0), and 16 more.

TaskMatrix by chenfei-wu

0.0%
34k stars
Visual ChatGPT connects LLMs to visual foundation models
Created 2 years ago
Updated 2 years ago