transcriptionstream by transcriptionstream

Self-hosted service for offline transcription, diarization, and LLM summarization

created 1 year ago
875 stars

Top 41.9% on sourcepulse

Project Summary

This project provides a self-hosted, offline transcription and diarization service with LLM-based summarization. It targets users who need to process audio files locally, offering a web interface and SSH drop zones for easy integration into existing workflows. The service uses whisper-diarization (built on Whisper) for transcription and speaker identification, Ollama with Mistral for summarization, and Meilisearch for full-text search, aiming to be a turnkey solution.

How It Works

The system utilizes Docker for deployment, bundling transcription, diarization, summarization, and search functionalities. Audio files can be uploaded via a web UI or dropped via SSH. Whisper-diarization handles speaker identification and transcription, while Ollama integrates with Mistral to generate summaries based on a customizable prompt. Meilisearch provides fast indexing and retrieval of transcribed text.
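
A minimal sketch of that flow from the command line, assuming the stack is running locally with default ports; the SSH port, user, drop-zone directory, Meilisearch key, and index name below are illustrative assumptions rather than values confirmed by this summary:

    # Drop an audio file into the (assumed) diarization drop zone over SSH.
    scp -P 22222 meeting.mp3 transcriptionstream@localhost:diarize/

    # After processing, search the indexed transcript text in Meilisearch
    # (index name and API key are placeholders).
    curl -s -X POST 'http://localhost:7700/indexes/transcriptions/search' \
      -H 'Authorization: Bearer MEILI_MASTER_KEY' \
      -H 'Content-Type: application/json' \
      -d '{"q": "action items"}'

The same files can instead be uploaded and managed through the ts-web interface.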

Quick Start & Requirements

  • Install/Run: ./start-nobuild.sh (for prebuilt Docker images) or ./install.sh followed by ./run.sh (for a local build); see the sketch after this list.
  • Prerequisites: NVIDIA GPU (required for the ts-gpu image, which is ~26GB).
  • Resources: 12GB VRAM may be insufficient for both Whisper-diarization and Ollama Mistral simultaneously.
  • Docs: install and ts-web walkthrough videos (linked in README).
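
Put together, the quick start looks roughly like this; the clone URL is assumed from the project and author name, while the scripts are the ones listed above:

    # Assumed clone URL based on the project/author name.
    git clone https://github.com/transcriptionstream/transcriptionstream.git
    cd transcriptionstream

    # Option A: start from prebuilt Docker images.
    ./start-nobuild.sh

    # Option B: build locally, then run.
    ./install.sh
    ./run.sh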

Highlighted Details

  • Turnkey, self-hosted, and offline operation.
  • SSH and Web UI for file upload and management.
  • LLM summarization via Ollama and Mistral with customizable prompts (see the sketch after this list).
  • Full-text search powered by Meilisearch.
  • HTML5 web player with time-synced scrubbing and highlighting.
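
As a sketch of what the customizable summarization prompt amounts to, the request below sends a transcript excerpt to a locally running Ollama instance with the Mistral model over Ollama's standard HTTP API; the prompt text is a placeholder, and this does not claim to mirror how the project invokes Ollama internally:

    # Ask Mistral (via Ollama on its default port) to summarize a transcript.
    curl -s http://localhost:11434/api/generate \
      -H 'Content-Type: application/json' \
      -d '{
            "model": "mistral",
            "prompt": "Summarize the following meeting transcript, listing speakers and action items:\n\nSPEAKER_00: ...",
            "stream": false
          }'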

Maintenance & Community

  • Developed by transcriptionstream, with contributions acknowledged from MahmoudAshraf97 and jmorganca.
  • The README notes it is "example code for example purposes and should not be used in production environments without additional security measures."
  • To-do list includes fixing UI errors and adding Meilisearch controls.

Licensing & Compatibility

  • The README does not explicitly state a license; the project is presented as a community edition.

Limitations & Caveats

  • The project is explicitly stated as example code not suitable for production without security hardening.
  • Potential CUDA memory issues exist when running diarization and Ollama Mistral concurrently on the same host due to VRAM limitations.
  • The web interface has known console errors when summary files are missing.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 35 stars in the last 90 days
