openreader by richardr1126

Real-time TTS document reader and audiobook generator

Created 1 year ago

278 stars

Top 93.6% on SourcePulse

Project Summary

OpenReader WebUI is an open-source, Next.js-based web application for text-to-speech (TTS) document reading. It supports EPUB, PDF, DOCX, MD, and TXT files and offers either a real-time read-along experience with high-quality narration or export of the narration as an audiobook. The project targets users who want accessible document consumption and audiobook creation, particularly those who prefer or require self-hosted TTS. Its main strengths are its choice of TTS providers and its playback features, such as highlighting synchronized with narration.

How It Works

The application uses a Next.js frontend and supports multiple TTS providers, including cloud services such as OpenAI and Deepinfra and self-hosted, OpenAI-compatible endpoints such as Kokoro-FastAPI and Orpheus-FastAPI. Its main features are real-time text highlighting synchronized with narration, optional word-by-word highlighting driven by timestamps that whisper.cpp generates on the server, and sentence-aware narration that merges sentences split across page breaks for smoother listening. It has a local-first architecture that uses Dexie.js for in-browser storage, and it exports audiobooks in m4b or mp3 format with chapter support.
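The sentence-aware narration described above can be illustrated with a minimal TypeScript sketch. It is not taken from the OpenReader source: the function names mergeAcrossPages and lastSentenceEnd are hypothetical, and the sentence detection is a naive punctuation rule that would, for example, treat an abbreviation such as "Dr." as a sentence end. The idea is only that a fragment left unfinished at the end of one page is carried over and narrated together with the start of the next page.

```typescript
// Hypothetical helpers for illustration; not OpenReader's actual implementation.

// Returns the index just past the last sentence terminator in `text`,
// or 0 if the text contains no complete sentence.
function lastSentenceEnd(text: string): number {
  const re = /[.!?]["')\]]*(?=\s|$)/g;
  let end = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    end = m.index + m[0].length;
  }
  return end;
}

// Turns per-page text into per-page narration blocks. Each block ends on a
// sentence boundary; an unfinished trailing fragment moves to the next block.
function mergeAcrossPages(pages: string[]): string[] {
  const blocks: string[] = [];
  let carry = ""; // unfinished sentence fragment from earlier pages

  for (const page of pages) {
    const text = `${carry} ${page}`.replace(/\s+/g, " ").trim();
    const end = lastSentenceEnd(text);
    blocks.push(text.slice(0, end).trim());
    carry = text.slice(end).trim();
  }

  // Text after the final terminator has nowhere to go, so it is appended
  // to the last block instead of being dropped.
  if (carry !== "" && blocks.length > 0) {
    const last = blocks.length - 1;
    blocks[last] = `${blocks[last]} ${carry}`.trim();
  }
  return blocks;
}
```

With the pages "It was a dark night. The rain fell" and "on the roof. Morning came.", the first block ends after "night." and the second block begins with the completed sentence "The rain fell on the roof."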

Quick Start & Requirements

Docker: Run docker run --name openreader-webui --restart unless-stopped -p 3003:3003 -v openreader_docstore:/app/docstore ghcr.io/richardr1126/openreader-webui:latest. Requires Docker and a reachable TTS API server. Environment variables such as API_KEY and API_BASE can be set when the container starts (for example with Docker's -e flag). Access the UI at http://localhost:3003.
Local Kokoro-FastAPI (CPU/GPU): Requires Docker. CPU version: docker run -d --name kokoro-tts --restart unless-stopped -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4. GPU version: use the ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.4 image and add --gpus all. Set OpenReader's API_BASE to http://host.docker.internal:8880/v1, or to whatever address the Kokoro server is reachable at from the OpenReader container.
Local Development: Requires Node.js and pnpm (or npm). Optional dependencies include FFmpeg (for m4b), LibreOffice (for DOCX), and whisper.cpp (for word-by-word highlighting). Clone the repo, install dependencies (pnpm i), configure .env, and run (pnpm dev).
Links: Docker Image: ghcr.io/richardr1126/openreader-webui:latest. Kokoro-FastAPI: ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4 / ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.4. Whisper.cpp: https://github.com/ggml-org/whisper.cpp.git.
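To make the role of API_BASE and API_KEY concrete, here is a minimal TypeScript sketch of a request to an OpenAI-compatible speech endpoint. It follows the shape of OpenAI's /audio/speech API (model, input, voice, response_format); the helper name buildSpeechRequest and the default model and voice values are assumptions for illustration and are not confirmed by the README. Check the TTS server's own documentation for the names it accepts.

```typescript
// Illustrative sketch; the helper name and the default model/voice values
// are assumptions, not values documented by OpenReader.
interface SpeechRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

function buildSpeechRequest(
  apiBase: string,
  apiKey: string,
  input: string,
  model: string = "kokoro", // placeholder model name
  voice: string = "af_sky"  // placeholder voice name
): SpeechRequest {
  // Strip trailing slashes so that ".../v1/" and ".../v1" behave the same.
  const base = apiBase.replace(/\/+$/, "");
  return {
    url: `${base}/audio/speech`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, input, voice, response_format: "mp3" }),
    },
  };
}

// Usage against a running server (not executed here):
//   const { url, init } = buildSpeechRequest(
//     "http://host.docker.internal:8880/v1", "not-needed", "Hello there.");
//   const audio = await (await fetch(url, init)).arrayBuffer();
```

Building the request separately from sending it keeps the sketch testable without a live TTS server.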

Highlighted Details

Supports multiple TTS providers including OpenAI, Deepinfra, Kokoro-FastAPI, and Orpheus-FastAPI.
Features real-time text and word-by-word highlighting during playback.
Provides sentence-aware narration for improved flow.
Offers reliable audiobook export (m4b/mp3) with chapter support.
Employs a local-first architecture with in-browser storage via Dexie.js.
Includes an optimized Next.js TTS proxy with audio caching.
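Word-by-word highlighting comes down to mapping the current playback time to a word. The TypeScript sketch below shows one way to do that with a binary search over word timestamps. The WordTiming shape and the function name activeWordIndex are hypothetical; they do not reflect whisper.cpp's output format or OpenReader's internals.

```typescript
// Hypothetical data shape: one entry per word, times in seconds,
// sorted by ascending start time.
interface WordTiming {
  word: string;
  start: number;
  end: number;
}

// Returns the index of the last word whose start time is <= t,
// or -1 if playback has not reached the first word yet.
function activeWordIndex(timings: WordTiming[], t: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timings[mid].start <= t) {
      found = mid; // candidate; look for a later word that has also started
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

Choosing the last word that has started, rather than requiring t to fall inside a word's start and end, keeps a word highlighted during the short silences between words. A player would call this on each time update of the audio element and move the highlight when the index changes.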

Maintenance & Community

Feature requests should be submitted via the Discussions tab, and issues should be reported on GitHub using the provided template. Contributions are welcomed via pull requests. Specific details on maintainers, community channels (like Discord/Slack), or roadmaps are not provided in the README.

Licensing & Compatibility

The project is licensed under the MIT License, which permits commercial use and integration into closed-source projects.

Limitations & Caveats

Word-by-word highlighting requires a separate installation and configuration of whisper.cpp. DOCX support requires LibreOffice, and m4b audiobook creation requires FFmpeg. OpenReader acts as a client to the TTS service, so users must ensure that a compatible TTS API server is running and reachable. The README explicitly disclaims responsibility for issues with external TTS API servers such as Kokoro-FastAPI.
