Video interaction platform based on LLMs
Dolphin is a general video interaction platform powered by Large Language Models (LLMs), designed for video understanding, processing, and generation. It targets researchers and developers working with video content and LLMs, offering a unified interface for diverse video-related tasks.
How It Works
Dolphin integrates various "Video Foundation Models" (VFMs) as backend modules, allowing users to interact with video through an LLM interface. This modular architecture enables flexible configuration and expansion, supporting tasks like video Q&A, editing (trimming, subtitle addition), feature extraction (pose, depth, Canny edges), and generation (text-to-video, video-to-video). The platform's extensibility is a key design principle, facilitating the addition of new VFMs and LLMs.
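The summary does not show Dolphin's internal API, but the registry-and-dispatch pattern it describes can be sketched as follows. Everything here is an illustrative assumption, not Dolphin's actual code: the `VFM` base class, the `CannyEdgeExtractor` stand-in, and the `REGISTRY`/`dispatch` names are all hypothetical.

```python
# Illustrative sketch only; Dolphin's real module interface is not shown in
# this summary, so every class, registry, and function name here is assumed.
from typing import Dict

class VFM:
    """Minimal interface a Video Foundation Model backend might expose."""
    def run(self, video_path: str, prompt: str) -> str:
        raise NotImplementedError

class CannyEdgeExtractor(VFM):
    """Example backend: stand-in for a Canny edge extraction module."""
    def run(self, video_path: str, prompt: str) -> str:
        return f"edge maps extracted from {video_path}"  # placeholder output

# A name-keyed registry lets the LLM controller dispatch tool calls to
# backend modules, so new VFMs can be added without touching the controller.
REGISTRY: Dict[str, VFM] = {"CannyEdgeExtractor": CannyEdgeExtractor()}

def dispatch(tool: str, video_path: str, prompt: str) -> str:
    return REGISTRY[tool].run(video_path, prompt)

print(dispatch("CannyEdgeExtractor", "demo.mp4", "Show me the edges."))
```

The design choice this illustrates is why extensibility is cheap: adding a new VFM means registering one more entry, with no changes to the LLM-facing interface.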
Quick Start & Requirements
Set up the environment and install the dependencies:

```bash
conda create -n dolphin python=3.8
conda activate dolphin
git clone https://github.com/BUAA-PrismGroup/dolphin.git
cd dolphin
pip install -r requirements.txt
```

Then run `python video_chatgpt.py` to start with the default configuration (VFMs are loaded from `configs/backends.yaml`), or specify models and devices explicitly, e.g. `python video_chatgpt.py --load VideoCaptioning_cpu,ImageCaptioning_cpu,ModelscopeT2V_cpu`.
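The `--load` examples above follow a `ModelName_device` convention. A minimal sketch of how such a spec could be parsed, assuming that convention holds; the `parse_load_spec` function is hypothetical, not part of Dolphin:

```python
# Hypothetical parser for the "Name_device,Name_device,..." spec used by the
# --load flag above; illustrative only, not Dolphin's own code.
from typing import Dict

def parse_load_spec(spec: str) -> Dict[str, str]:
    """Map each requested VFM name to its target device."""
    modules = {}
    for item in spec.split(","):
        # Split on the last underscore so names like ModelscopeT2V survive.
        name, _, device = item.rpartition("_")
        modules[name] = device
    return modules

print(parse_load_spec("VideoCaptioning_cpu,ImageCaptioning_cpu,ModelscopeT2V_cpu"))
# -> {'VideoCaptioning': 'cpu', 'ImageCaptioning': 'cpu', 'ModelscopeT2V': 'cpu'}
```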
Highlighted Details
Maintenance & Community
The project is under active development with a stated goal of continued updates and community contributions. Contact information for issues and general inquiries is provided, along with a Twitter handle for updates.
Licensing & Compatibility
The repository does not explicitly state a license in the README.
Limitations & Caveats
The project is explicitly stated to be "still under construction." While a demo and code release are available, features like a unified video model, benchmarking, and service deployment (Gradio, Web, API, Docker) are listed as ongoing tasks.