richardr1126: Real-time TTS document reader and audiobook generator
OpenReader WebUI is an open-source, Next.js-based web application designed for text-to-speech (TTS) document reading. It supports EPUB, PDF, DOCX, MD, and TXT files, offering users a real-time read-along experience with high-quality narration or the ability to extract audiobooks. The project targets users seeking accessible document consumption and audiobook creation, particularly those who prefer or require self-hosted TTS solutions. Its primary benefit lies in its flexibility with TTS providers and advanced playback features.
How It Works
The application leverages a Next.js frontend and supports multiple TTS providers, including cloud services like OpenAI and Deepinfra, as well as self-hosted, OpenAI-compatible endpoints such as Kokoro-FastAPI and Orpheus-FastAPI. Key innovations include real-time text highlighting synchronized with narration, optional word-by-word highlighting powered by server-side generated timestamps from whisper.cpp, and sentence-aware narration that merges sentences across page breaks for a smoother listening experience. It features a local-first architecture using Dexie.js for in-browser storage and offers robust audiobook export capabilities in m4b/mp3 formats with chapter support.
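The sentence-aware narration described above can be illustrated with a minimal sketch. This is not OpenReader's own code: the function name `mergeAcrossPages` and the simple punctuation-based sentence test are assumptions made for illustration, and abbreviations such as "e.g." would be split incorrectly.

```typescript
// Hypothetical helper, not taken from the OpenReader codebase.
// Turns per-page text into narration chunks. A sentence cut off by a page
// break is carried over and prepended to the next page's text.
function mergeAcrossPages(pages: string[]): string[] {
  // Assumption: a chunk is "complete" if it ends in . ! or ?,
  // optionally followed by a closing quote or bracket.
  const endsSentence = /[.!?]["')\]]?$/;
  // Greedy match: group 1 is everything up to the last sentence boundary,
  // group 2 is the unfinished remainder.
  const splitAtLastBoundary = /^(.*[.!?]["')\]]?)\s+(\S.*)$/;

  const chunks: string[] = [];
  let carry = "";

  for (const page of pages) {
    // Join leftover text with this page and normalize whitespace.
    const text = (carry + " " + page).replace(/\s+/g, " ").trim();
    carry = "";
    if (text === "") continue;

    if (endsSentence.test(text)) {
      chunks.push(text);
      continue;
    }

    const match = text.match(splitAtLastBoundary);
    if (match) {
      chunks.push(match[1]); // complete sentences on this page
      carry = match[2];      // unfinished sentence moves to the next page
    } else {
      carry = text;          // no boundary at all: carry the whole page
    }
  }

  // Flush any unfinished text left after the final page.
  if (carry !== "") chunks.push(carry);
  return chunks;
}
```

Each returned chunk can then be sent to the TTS provider as one request, so the narration does not pause in the middle of a sentence at a page boundary.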
Quick Start & Requirements
Run OpenReader with Docker:
docker run --name openreader-webui --restart unless-stopped -p 3003:3003 -v openreader_docstore:/app/docstore ghcr.io/richardr1126/openreader-webui:latest
Requires Docker and an accessible TTS API server. Environment variables such as API_KEY and API_BASE can be set at runtime. Access the UI at http://localhost:3003.
Run Kokoro-FastAPI as a self-hosted TTS server:
docker run -d --name kokoro-tts --restart unless-stopped -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4
The GPU version requires --gpus all. Set OpenReader's API_BASE to http://host.docker.internal:8880/v1 or similar.
Local development: install dependencies (pnpm i), configure .env, and run (pnpm dev).
Links:
OpenReader WebUI image: ghcr.io/richardr1126/openreader-webui:latest
Kokoro-FastAPI images: ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4 / ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.4
Whisper.cpp: https://github.com/ggml-org/whisper.cpp.git
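To show what an API_BASE such as http://host.docker.internal:8880/v1 is used for, here is a rough sketch of how a client might build a request to an OpenAI-compatible speech endpoint. It follows the shape of the OpenAI `POST /audio/speech` API (`model`, `input`, `voice`, `response_format`). The helper `buildSpeechRequest` is hypothetical, and it is an assumption that OpenReader builds its requests this way.

```typescript
// Hypothetical helper, not OpenReader's own client code.
// Describes a POST to an OpenAI-compatible /audio/speech endpoint.
interface SpeechRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildSpeechRequest(
  apiBase: string, // e.g. the API_BASE value, including the /v1 suffix
  apiKey: string,  // e.g. the API_KEY value; self-hosted servers may ignore it
  input: string,   // text to narrate
  model: string,   // provider-specific model name (placeholder in examples)
  voice: string    // provider-specific voice name (placeholder in examples)
): SpeechRequest {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = apiBase.replace(/\/+$/, "");
  return {
    url: `${base}/audio/speech`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, input, voice, response_format: "mp3" }),
    },
  };
}

// Usage (not executed here; needs a running TTS server):
//   const { url, init } = buildSpeechRequest(base, key, "Hello", model, voice);
//   const audio = await (await fetch(url, init)).arrayBuffer();
```

Keeping request construction separate from the `fetch` call makes it easy to point the same code at a cloud provider or a self-hosted server by changing only the base URL and key.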
Maintenance & Community
Feature requests should be submitted via the Discussions tab, and issues should be reported on GitHub using the provided template. Contributions are welcomed via pull requests. Specific details on maintainers, community channels (like Discord/Slack), or roadmaps are not provided in the README.
Licensing & Compatibility
The project is licensed under the MIT License, which permits commercial use and integration into closed-source projects.
Limitations & Caveats
Word-by-word highlighting requires the separate installation and configuration of whisper.cpp. DOCX file support necessitates LibreOffice, and m4b audiobook creation requires FFmpeg. Users must ensure a compatible TTS API server is running and accessible, as OpenReader acts as a client to these services. The README explicitly states no responsibility for issues related to external TTS API servers like Kokoro-FastAPI.
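The dependencies listed above can be summarized as a small lookup. The sketch below is illustrative only: the feature keys and the `missingTools` helper are hypothetical names and do not correspond to any OpenReader configuration option.

```typescript
// Hypothetical feature keys for the optional capabilities named above.
type Feature = "wordHighlighting" | "docxImport" | "m4bExport";

// External tool each optional feature depends on, per the caveats above.
const REQUIRED_TOOL: Record<Feature, string> = {
  wordHighlighting: "whisper.cpp",
  docxImport: "LibreOffice",
  m4bExport: "FFmpeg",
};

// Returns the tools still needed for the wanted features, without duplicates.
// Tool names are compared case-insensitively.
function missingTools(wanted: Feature[], installed: string[]): string[] {
  const have = installed.map((tool) => tool.toLowerCase());
  return wanted
    .map((feature) => REQUIRED_TOOL[feature])
    .filter(
      (tool, index, all) =>
        have.indexOf(tool.toLowerCase()) === -1 && // not installed
        all.indexOf(tool) === index                // first occurrence only
    );
}
```

For example, asking for DOCX import and m4b export on a machine that only has FFmpeg reports LibreOffice as the one missing tool.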