intrascribe by weynechen

Local-first, privacy-focused speech-to-text and summarization platform for internal networks

Created 1 month ago
527 stars

Top 60.0% on SourcePulse

Project Summary

IntraScribe is a self-hosted, privacy-focused speech-to-text and summarization platform designed for internal network deployment in enterprises, schools, and government organizations. It offers real-time transcription, speaker diarization, batch processing, AI-powered summarization, and title generation, with a fully decoupled architecture allowing for flexible integration of various audio capture and transmission methods. The platform prioritizes data privacy and compliance by keeping all data within the local network.

How It Works

IntraScribe uses a modular architecture. For real-time transcription, the browser streams audio to the backend over WebRTC, and results come back through Server-Sent Events (SSE). For higher-quality, structured output, the audio is cached, uploaded to Supabase Storage, then run through speaker diarization with pyannote.audio and re-transcribed. AI summarization and title generation go through LiteLLM, which allows configurable models and fallback strategies. Supabase handles data persistence and real-time updates: Postgres for data, Auth for authentication, Storage for files, and Realtime for event subscriptions.
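
To make the SSE return path concrete, here is a minimal sketch of how a client could split an SSE text stream into events. It follows the generic SSE wire format only; the event name `transcript` and the JSON payload fields (`text`, `final`) are assumptions for illustration and are not taken from the IntraScribe codebase.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from already-decoded text lines."""
    event_type = "message"  # default event type in the SSE format
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip events with no data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


# Hypothetical stream: event name and payload shape are illustrative only.
sample = [
    ": keep-alive\n",
    "event: transcript\n",
    'data: {"text": "hello", "final": false}\n',
    "\n",
    'data: {"text": "hello world", "final": true}\n',
    "\n",
]
events = list(iter_sse_events(sample))
for ev in events:
    print(ev["event"], json.loads(ev["data"])["text"])
```

The parser takes any iterable of lines, so it could be fed from an HTTP client's line iterator; which client and which endpoint to use depends on the actual backend and is left open here.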

Quick Start & Requirements

  • Installation: Clone the repository, set up Supabase locally, configure environment variables (.env.local for web, .env for backend), install backend dependencies with uv, and start the backend and frontend.
  • Prerequisites:
    • NVIDIA GPU with CUDA (CPU fallback available but untested).
    • Node.js 18+, Python 3.10+, uv.
    • Ollama with a model like qwen3:8b (configurable).
    • FFmpeg.
    • Supabase CLI.
    • Hugging Face token for pyannote.audio.
  • Setup: Initial Supabase setup and model downloads can be time-consuming. Local HTTPS setup with mkcert is recommended for intra-network use.
  • Links: Supabase Local Development, uv Installation, mkcert.
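
A quick way to confirm the prerequisites above before starting is a small preflight script. The sketch below is not part of the project; it assumes the tools are exposed under their usual executable names (`node`, `uv`, `ffmpeg`, `supabase`, `ollama`, plus the optional `nvidia-smi` and `mkcert`) and only checks that they are on PATH. It does not verify the Node.js version, the Ollama model, or the Hugging Face token.

```python
import shutil
import sys

# Executable names are assumptions based on each tool's usual CLI name.
REQUIRED_TOOLS = ["node", "uv", "ffmpeg", "supabase", "ollama"]
OPTIONAL_TOOLS = ["nvidia-smi", "mkcert"]  # GPU driver utility, local HTTPS
MIN_PYTHON = (3, 10)


def check_prerequisites(which=shutil.which, version=sys.version_info):
    """Report which tools are missing from PATH and whether Python is new enough.

    `which` and `version` are injectable so the check can be exercised
    without touching the real environment.
    """
    return {
        "python_ok": tuple(version[:2]) >= MIN_PYTHON,
        "missing_required": [t for t in REQUIRED_TOOLS if which(t) is None],
        "missing_optional": [t for t in OPTIONAL_TOOLS if which(t) is None],
    }


if __name__ == "__main__":
    report = check_prerequisites()
    print(report)
    # Exit non-zero so the script can gate a setup step.
    if not report["python_ok"] or report["missing_required"]:
        sys.exit(1)
```

A missing `nvidia-smi` is reported as optional because the listing mentions a CPU fallback, although that path is described as untested.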

Highlighted Details

  • Supports local, offline, and privacy-sensitive deployments.
  • Features team collaboration with account systems and template sharing.
  • Decoupled frontend allows integration with various hardware and transmission protocols.
  • Editable transcriptions with preserved timestamps and speaker information.

Maintenance & Community

  • MIT License.
  • The project's TODO section lists planned hardware integration and AI dialogue features.

Licensing & Compatibility

  • MIT License, generally permissive for commercial use and closed-source linking.

Limitations & Caveats

  • The project has primarily been tested on Ubuntu 22.04.
  • Speaker diarization may fail if the Hugging Face token is not configured or if the models require authorization; in that case the output falls back to a single speaker.
  • Audio processing failures may occur if FFmpeg is not installed or not in the system's PATH.
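
The single-speaker fallback described above can be pictured with a short sketch. This is an illustration under stated assumptions, not the project's implementation: `diarize` is a hypothetical callable standing in for the real diarization step, the `(start, end, speaker)` turn format is invented for the example, and the label `Speaker 1` is an arbitrary default.

```python
from typing import Callable, List, Optional, Tuple

Turn = Tuple[float, float, str]  # (start_s, end_s, speaker_label), hypothetical format


def assign_speakers(
    segments: List[dict],
    diarize: Optional[Callable[[], List[Turn]]],
) -> List[dict]:
    """Attach a speaker label to each transcript segment.

    If diarization is unavailable or raises (for example, a missing token),
    every segment is labeled with a single default speaker.
    """
    try:
        turns = diarize() if diarize is not None else []
    except Exception:
        # Broad catch is deliberate in this sketch: any failure means fallback.
        turns = []

    labeled = []
    for seg in segments:
        best, best_overlap = "Speaker 1", 0.0
        for start, end, speaker in turns:
            # Length of the time overlap between the segment and this turn.
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        labeled.append({**seg, "speaker": best})
    return labeled


def failing_diarizer() -> List[Turn]:
    raise RuntimeError("model requires authorization")


segments = [
    {"start": 0.0, "end": 2.0, "text": "Good morning."},
    {"start": 2.0, "end": 4.0, "text": "Morning, shall we start?"},
]
fallback = assign_speakers(segments, failing_diarizer)
diarized = assign_speakers(segments, lambda: [(0.0, 2.5, "A"), (2.5, 4.0, "B")])
```

Each segment takes the speaker whose turn overlaps it the most, so timestamps and text are preserved whether or not diarization succeeds.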

Health Check

  • Last commit: 2 days ago.
  • Responsiveness: Inactive.
  • Pull requests (30d): 3.
  • Issues (30d): 9.
  • Star history: 526 stars in the last 30 days.