dolphin by kaleido-lab

Video interaction platform based on LLMs

created 2 years ago
251 stars

Top 99.8% on sourcepulse

Project Summary

Dolphin is a general video interaction platform powered by Large Language Models (LLMs), designed for video understanding, processing, and generation. It targets researchers and developers working with video content and LLMs, offering a unified interface for diverse video-related tasks.

How It Works

Dolphin integrates various "Video Foundation Models" (VFMs) as backend modules, allowing users to interact with video through an LLM interface. This modular architecture enables flexible configuration and expansion, supporting tasks like video Q&A, editing (trimming, subtitle addition), feature extraction (pose, depth, Canny edges), and generation (text-to-video, video-to-video). The platform's extensibility is a key design principle, facilitating the addition of new VFMs and LLMs.
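The modular design described above can be pictured as a registry of backend callables that a controller dispatches to by name. The following is a minimal sketch under stated assumptions, not Dolphin's actual code: `BACKENDS`, `register`, `dispatch`, and the two placeholder tools are hypothetical names invented for illustration, and the placeholders return strings instead of running real video models.

```python
from typing import Callable, Dict

# Hypothetical registry (not Dolphin's API): tool name -> backend callable.
BACKENDS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a backend function to the registry under `name`."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        BACKENDS[name] = fn
        return fn
    return decorator


@register("trim")
def trim_video(arg: str) -> str:
    # Placeholder backend: a real module would call a video-editing library.
    path, start, end = (part.strip() for part in arg.split(","))
    return f"trimmed {path} from {start}s to {end}s"


@register("caption")
def caption_video(arg: str) -> str:
    # Placeholder backend: a real module would run a captioning model.
    return f"caption for {arg.strip()}"


def dispatch(tool: str, arg: str) -> str:
    """Route a tool request (as an LLM might emit it) to the matching backend."""
    if tool not in BACKENDS:
        raise KeyError(f"no backend registered for tool {tool!r}")
    return BACKENDS[tool](arg)
```

In this sketch, adding a new capability only requires writing one more decorated function; the dispatcher itself does not change.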

Quick Start & Requirements

  • Install (run in order): conda create -n dolphin python=3.8; conda activate dolphin; git clone https://github.com/BUAA-PrismGroup/dolphin.git; cd dolphin; pip install -r requirements.txt
  • Prerequisites: Python 3.8, Conda environment recommended. GPU memory requirements vary significantly by VFM (e.g., VideoCaptioning ~13GB, MoviepyInterface 0MB).
  • Run: python video_chatgpt.py (default, uses configs/backends.yaml for VFM loading) or specify models and devices (e.g., python video_chatgpt.py --load VideoCaptioning_cpu,ImageCaptioning_cpu,ModelscopeT2V_cpu).
  • Docs: Official Demo
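The `--load` value in the Run step pairs each model name with a device, separated by an underscore. A small parser makes that convention explicit. This is a sketch that assumes the `Name_device` format seen in the example command; `parse_load_spec` is a hypothetical helper, not a function from the repository.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Parse 'ModelA_cpu,ModelB_cpu' into {'ModelA': 'cpu', 'ModelB': 'cpu'}.

    Assumes each entry is '<model name>_<device>'; the split happens at the
    LAST underscore so that a model name may itself contain underscores.
    """
    mapping: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        name, sep, device = entry.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected '<Model>_<device>', got {entry!r}")
        mapping[name] = device
    return mapping


if __name__ == "__main__":
    print(parse_load_spec("VideoCaptioning_cpu,ImageCaptioning_cpu,ModelscopeT2V_cpu"))
```

Validating the string up front gives a clearer error than failing later during model loading.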

Highlighted Details

  • Supports video understanding (Q&A), processing (trimming, audio extraction, feature extraction), and generation (text-to-video, video-to-video).
  • Highly extensible framework for adding new video foundation models and LLMs.
  • Detailed GPU memory usage breakdown for various foundation models is provided.
  • Offers example commands for both CPU and GPU configurations.
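Because memory needs differ so much between models, one way to use the per-model figures is to decide which models fit on a GPU and build the `--load` string from that. The sketch below is illustrative only: `plan_devices` is a hypothetical helper, the 13000 MB value is a rough stand-in for the "~13GB" quoted above, and the `cuda:0` device suffix is an assumption, since this summary only shows `_cpu` examples.

```python
from typing import Dict


def plan_devices(memory_mb: Dict[str, int], budget_mb: int, gpu: str = "cuda:0") -> str:
    """Greedily assign models to one GPU until the memory budget is used up.

    Models that need no GPU memory, or that no longer fit, are placed on CPU.
    Returns a comma-separated '<Model>_<device>' string.
    """
    remaining = budget_mb
    parts = []
    for name, need in memory_mb.items():
        if 0 < need <= remaining:
            parts.append(f"{name}_{gpu}")
            remaining -= need
        else:
            parts.append(f"{name}_cpu")
    return ",".join(parts)


if __name__ == "__main__":
    # Approximate figures from the summary above: ~13 GB and 0 MB.
    needs = {"VideoCaptioning": 13000, "MoviepyInterface": 0}
    print(plan_devices(needs, budget_mb=16000))
```

A greedy pass like this is order-dependent; listing the most important models first keeps them on the GPU when memory is tight.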

Maintenance & Community

The README describes the project as under active development, with a stated goal of continued updates and community contributions, although the last commit was 2 years ago. Contact information for issues and general inquiries is provided, along with a Twitter handle for updates.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The README describes the project as "still under construction." A demo and code release are available, but a unified video model, benchmarking, and service deployment (Gradio, Web, API, Docker) are listed as ongoing tasks.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days
