Video interaction platform based on LLMs
Dolphin is a general video interaction platform powered by Large Language Models (LLMs), designed for video understanding, processing, and generation. It targets researchers and developers working with video content and LLMs, offering a unified interface for diverse video-related tasks.
How It Works
Dolphin integrates various "Video Foundation Models" (VFMs) as backend modules, allowing users to interact with video through an LLM interface. This modular architecture enables flexible configuration and expansion, supporting tasks like video Q&A, editing (trimming, subtitle addition), feature extraction (pose, depth, Canny edges), and generation (text-to-video, video-to-video). The platform's extensibility is a key design principle, facilitating the addition of new VFMs and LLMs.
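The summary does not show Dolphin's internal API, but the registry-and-dispatch pattern it describes can be sketched as follows. Everything here is an illustrative assumption, not Dolphin's actual code: the `VFM` base class, the `CannyEdgeExtractor` stand-in, and the `REGISTRY`/`dispatch` names are all hypothetical.

```python
# Illustrative sketch only; Dolphin's real module interface is not shown in
# this summary, so every class, registry, and function name here is assumed.
from typing import Dict

class VFM:
    """Minimal interface a Video Foundation Model backend might expose."""
    def run(self, video_path: str, prompt: str) -> str:
        raise NotImplementedError

class CannyEdgeExtractor(VFM):
    """Example backend: stand-in for a Canny edge extraction module."""
    def run(self, video_path: str, prompt: str) -> str:
        return f"edge maps extracted from {video_path}"  # placeholder output

# A name-keyed registry lets the LLM controller dispatch tool calls to
# backend modules, so new VFMs can be added without touching the controller.
REGISTRY: Dict[str, VFM] = {"CannyEdgeExtractor": CannyEdgeExtractor()}

def dispatch(tool: str, video_path: str, prompt: str) -> str:
    return REGISTRY[tool].run(video_path, prompt)

print(dispatch("CannyEdgeExtractor", "demo.mp4", "Show me the edges."))
```

The design choice this illustrates is why extensibility is cheap: adding a new VFM means registering one more entry, with no changes to the LLM-facing interface.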
Quick Start & Requirements
Set up the environment and install the dependencies:

```bash
conda create -n dolphin python=3.8
conda activate dolphin
git clone https://github.com/BUAA-PrismGroup/dolphin.git
cd dolphin
pip install -r requirements.txt
```

Then run `python video_chatgpt.py` to start with the default configuration (VFMs are loaded from `configs/backends.yaml`), or specify models and devices explicitly, e.g. `python video_chatgpt.py --load VideoCaptioning_cpu,ImageCaptioning_cpu,ModelscopeT2V_cpu`.
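The `--load` examples above follow a `ModelName_device` convention. A minimal sketch of how such a spec could be parsed, assuming that convention holds; the `parse_load_spec` function is hypothetical, not part of Dolphin:

```python
# Hypothetical parser for the "Name_device,Name_device,..." spec used by the
# --load flag above; illustrative only, not Dolphin's own code.
from typing import Dict

def parse_load_spec(spec: str) -> Dict[str, str]:
    """Map each requested VFM name to its target device."""
    modules = {}
    for item in spec.split(","):
        # Split on the last underscore so names like ModelscopeT2V survive.
        name, _, device = item.rpartition("_")
        modules[name] = device
    return modules

print(parse_load_spec("VideoCaptioning_cpu,ImageCaptioning_cpu,ModelscopeT2V_cpu"))
# -> {'VideoCaptioning': 'cpu', 'ImageCaptioning': 'cpu', 'ModelscopeT2V': 'cpu'}
```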
Highlighted Details
Maintenance & Community
The project is under active development with a stated goal of continued updates and community contributions. Contact information for issues and general inquiries is provided, along with a Twitter handle for updates.
Licensing & Compatibility
The repository does not explicitly state a license in the README.
Limitations & Caveats
The project is explicitly stated to be "still under construction." While a demo and code release are available, features like a unified video model, benchmarking, and service deployment (Gradio, Web, API, Docker) are listed as ongoing tasks.