autocut by mli

CLI tool for subtitle-based video editing

created 2 years ago
7,333 stars

Top 7.2% on sourcepulse

Project Summary

AutoCut is a command-line tool designed for video editing using text-based transcriptions. It automatically generates subtitles for videos, allowing users to select desired sentences from a Markdown file to extract and save corresponding video segments. This approach eliminates the need for traditional video editing software, making the process accessible to users who prefer text-based workflows.

How It Works

AutoCut leverages the Whisper speech-to-text model for transcription, with support for various Whisper models (tiny, base, small, medium, large, large-v3-turbo) and the OpenAI API. After transcription, it generates a Markdown file where users can edit and select segments. The tool then uses FFmpeg to cut and stitch the video clips based on the edited Markdown or SRT files. It also offers a daemon mode to monitor directories for new videos and process them automatically.
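The select-then-cut idea can be sketched in a few lines of Python. This is an illustration of the approach only, not AutoCut's actual code: the checkbox-per-sentence Markdown format and the helper names here are assumptions.

```python
import re

def selected_indices(markdown_text):
    """Return 1-based line numbers the user marked with [x] (assumed format)."""
    chosen = []
    for i, line in enumerate(markdown_text.splitlines(), start=1):
        if re.match(r"-\s*\[x\]", line.strip(), flags=re.IGNORECASE):
            chosen.append(i)
    return chosen

def ffmpeg_cut_cmd(src, start, end, dst):
    """Build an FFmpeg invocation that extracts one segment of the video."""
    return ["ffmpeg", "-y", "-ss", start, "-to", end, "-i", src, dst]

md = """- [x] Welcome to the video.
- [ ] Um, let me check my notes.
- [x] Today we cover subtitle-based editing."""

print(selected_indices(md))  # → [1, 3]
```

Each kept sentence maps back to a timestamped subtitle entry; a command like the one built by `ffmpeg_cut_cmd` would extract that span, and the resulting clips would then be stitched back together with FFmpeg.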

Quick Start & Requirements

  • Install via pip: pip install autocut-sub
  • Dependencies: Python, FFmpeg. GPU acceleration requires an NVIDIA GPU with CUDA and compatible PyTorch installation.
  • For faster-whisper support: pip install 'autocut-sub[faster]'
  • For OpenAI API support: pip install 'autocut-sub[openai]' and set OPENAI_API_KEY.
  • Docker images are available for both CPU and GPU acceleration.
  • Official documentation: https://github.com/mli/autocut

Highlighted Details

  • Supports multiple Whisper model sizes for transcription quality vs. speed trade-offs.
  • Offers both direct command-line usage and a daemon mode for automated processing.
  • Allows editing via Markdown or SRT files for segment selection.
  • Includes options for output encoding and compact SRT generation for easier editing.

Maintenance & Community

The project has received updates adding support for new Whisper models and installation via pip. Contribution guidelines and an overview of the code structure are provided.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Video export on Apple M1 chips may be slower than in professional video software, because FFmpeg makes only limited use of the GPU on that platform. Large Whisper models require significant GPU VRAM; users with insufficient VRAM can opt for smaller models or force CPU usage. The text encoding used when the subtitle and Markdown files are generated must match the encoding used when editing them; a mismatch can garble characters or cause cutting errors.
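The encoding caveat can be avoided by pinning one encoding for every read and write of the generated files. A minimal sketch (not AutoCut's code; the helper names and the choice of UTF-8 are assumptions):

```python
import tempfile
from pathlib import Path

ENCODING = "utf-8"  # pick one encoding and use it for every read and write

def write_transcript(path, text):
    Path(path).write_text(text, encoding=ENCODING)

def read_transcript(path):
    return Path(path).read_text(encoding=ENCODING)

with tempfile.TemporaryDirectory() as tmp:
    md = Path(tmp) / "talk.md"
    write_transcript(md, "- [x] 你好，世界")      # non-ASCII subtitle line
    print(read_transcript(md) == "- [x] 你好，世界")  # → True
```

If the file is instead opened in an editor that saves with a different encoding, the round trip above is the first thing that breaks.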

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 174 stars in the last 90 days
