speakr by murtaza-nasir

Self-hosted web app for audio transcription, summarization, and chat

Created 9 months ago

2,816 stars

Top 16.6% on SourcePulse

Project Summary

Speakr is a self-hosted web application for transcribing audio recordings, generating summaries and titles, and interacting with the content via chat. It targets individuals and teams seeking to securely manage and analyze their audio data, offering a private alternative to cloud-based transcription services.

How It Works

Speakr leverages OpenAI-compatible APIs for both Speech-to-Text (STT) and Large Language Models (LLMs). Users upload audio files, which are processed in the background. STT APIs convert audio to text, and LLMs then generate summaries, titles, and provide conversational interaction based on the transcript. The architecture supports configurable transcription and output languages, user-specific prompts, and integration of user professional context for more relevant AI responses.

Quick Start & Requirements

Installation: Docker is the only currently functional installation method.
- Clone the repository: git clone https://github.com/murtaza-nasir/speakr.git
- Configure docker-compose.yml with API keys (OpenAI-compatible for STT and LLM) and desired models.
- Start with docker compose up -d.
Prerequisites: Python 3.8+, pip, venv, Docker, and API keys for STT (e.g., Whisper) and LLM (e.g., OpenRouter, OpenAI).
Setup: Requires configuring API endpoints and keys.
Links: GitHub Repository

Highlighted Details

Supports multilingual transcription and AI output.
Offers interactive chat with transcript content.
Includes user authentication, account management, and an admin dashboard.
Metadata editing for recordings (titles, participants, dates, notes).
Customizable summarization prompts and user professional context for AI.

Maintenance & Community

The project is maintained by Murtaza Nasir. Feedback, bug reports, and feature suggestions are welcomed via GitHub Issues. A Contributor License Agreement (CLA) will be required for future code contributions.

Licensing & Compatibility

Dual-licensed under GNU Affero General Public License v3.0 (AGPLv3) and a separate commercial license. AGPLv3 requires sharing source code of modified versions if accessed over a network. Commercial licensing is available for proprietary integration.

Limitations & Caveats

Local development and Linux systemd deployment methods are explicitly stated as not currently working. Users must rely on the Docker installation. The AGPLv3 license has significant implications for commercial use, requiring source code disclosure of network-accessible modifications.

speakr by murtaza-nasir

Explore Similar Projects

wechat-ai-summarize-bot by small-tou

smol-podcaster by FanaHOVA

gpt-voice-conversation-chatbot by Adri6336

muvtuber by cdfmlr

Scriberr by rishikanthc

All-Model-Chat by yeahhe365

superpower-chatgpt by saeedezzati

pdf-to-podcast by knowsuchagency

enchanted by gluonfield

chatgpt-web by Niek

chat-with-gpt by cogentapps

podcastfy by souzatharsis