FunClip by modelscope

Automated video clipping tool powered by speech recognition and LLMs

created 2 years ago
4,805 stars

Top 10.6% on sourcepulse

Project Summary

FunClip is an open-source, locally deployable tool for automated video clipping, targeting users who need to extract specific segments from video content. It leverages advanced speech recognition and LLM-based analysis to simplify the process of identifying and isolating desired portions of videos, offering both accuracy and ease of use.

How It Works

FunClip utilizes Alibaba's FunASR Paraformer-Large model for accurate speech-to-text conversion, including integrated timestamp prediction. It adds CAM++ for speaker diarization, allowing clips to be cut per speaker. A key feature is its integration with LLMs (such as Qwen or GPT) for "smart clipping": users prompt the LLM to identify and extract segments based on semantic content or themes, enabling more sophisticated, context-aware clipping than simple text matching.
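The smart-clipping flow described above can be sketched as follows. This is a hypothetical illustration, not FunClip's actual implementation: the prompt layout, reply format, and function names are all assumptions. The idea is that ASR segments (with timestamps) are serialized into a prompt, and the LLM's reply is parsed back into clip time ranges.

```python
import re

def build_prompt(segments, instruction):
    """segments: list of (start_sec, end_sec, text) from ASR."""
    lines = [f"[{i}] {start:.2f}-{end:.2f}s: {text}"
             for i, (start, end, text) in enumerate(segments)]
    return (f"{instruction}\n\nTranscript:\n" + "\n".join(lines) +
            "\n\nReply with the segment indices to keep, e.g. 'keep: 0, 2'.")

def parse_reply(reply, segments):
    """Map an LLM reply like 'keep: 1, 3' back to (start, end) clip ranges."""
    match = re.search(r"keep:\s*([\d,\s]+)", reply)
    if not match:
        return []
    indices = [int(i) for i in match.group(1).replace(",", " ").split()]
    return [(segments[i][0], segments[i][1]) for i in indices if i < len(segments)]

segments = [(0.0, 3.5, "Welcome to the show."),
            (3.5, 9.0, "Today we discuss speech recognition."),
            (9.0, 12.0, "Thanks for watching.")]
prompt = build_prompt(segments, "Select segments about speech recognition.")
clips = parse_reply("keep: 1", segments)   # -> [(3.5, 9.0)]
```

Because the LLM only ever sees indexed transcript text, the returned indices map deterministically back to the ASR timestamps, which is what makes frame-accurate cutting possible after a free-form LLM selection.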

Quick Start & Requirements

  • Install: Clone the repository and install requirements via pip install -r ./requirements.txt.
  • Prerequisites: a Python environment. ffmpeg and ImageMagick are required for clipping with embedded subtitles, and the font file STHeitiMedium.ttc must be downloaded separately.
  • Usage: Can be run as a local Gradio service (python funclip/launch.py) or via command line for recognition and clipping stages.
  • Links: Modelscope Space, HuggingFace Space
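Since clipping with embedded subtitles relies on ffmpeg (see the prerequisites above), the cut-and-burn step ultimately reduces to assembling an ffmpeg command per segment. The sketch below shows the general shape of such a command; the exact flags and filter options FunClip uses may differ, and the helper name is illustrative.

```python
import shlex

def ffmpeg_clip_cmd(video, srt, start, end, out):
    """Cut [start, end] seconds from `video` and burn `srt` subtitles into `out`."""
    args = [
        "ffmpeg", "-y",
        "-ss", f"{start:.2f}",      # seek to clip start
        "-to", f"{end:.2f}",        # stop at clip end
        "-i", video,
        "-vf", f"subtitles={srt}",  # burn in the subtitle track
        out,
    ]
    return shlex.join(args)

cmd = ffmpeg_clip_cmd("input.mp4", "clip.srt", 3.5, 9.0, "clip.mp4")
```

Building the command as an argument list and joining with shlex keeps file paths with spaces safe when the command is eventually passed to a shell.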

Highlighted Details

  • Supports LLM-based smart clipping with customizable prompts.
  • Integrates speaker diarization for clipping specific speakers.
  • Offers hotword customization for enhanced ASR accuracy.
  • Supports both English and Chinese audio recognition.
  • Provides multi-segment clipping and generates SRT subtitles.
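Since the tool generates SRT subtitles for the recognized speech, the output boils down to the standard SRT entry layout. A minimal sketch of that format, with illustrative helper names:

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """segments: list of (start_sec, end_sec, text) -> SRT document string."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):  # SRT indices are 1-based
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

srt = to_srt([(0.0, 3.5, "Welcome to the show.")])
# First block: "1\n00:00:00,000 --> 00:00:03,500\nWelcome to the show.\n"
```

Multi-segment clipping then just means emitting one renumbered block per kept segment, with timestamps shifted to the new clip's timeline.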

Maintenance & Community

The project is developed by the FunASR team. Community communication is facilitated via DingTalk and WeChat groups.

Licensing & Compatibility

The repository is open-source, but the README does not state a specific license; it is presented as an open-source tool for research and application. Commercial use or closed-source linking would require checking the exact license in the repository.

Limitations & Caveats

The README mentions that Whisper model support for English audio with timestamp prediction is "coming soon" and requires significant GPU memory. Some installation steps for ImageMagick require manual path adjustments depending on the user's system configuration.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 2
  • Issues (30d): 2
  • Star history: 325 stars in the last 90 days
