devonmochi
Markdown-driven AI voice input and image text extraction
Top 99.6% on SourcePulse
ByeType is a Markdown-driven AI speech input tool with customizable transcription and text extraction from images. It targets users who need precise voice input for specialized industries or personal dictation habits, and users who need reliable text extraction from screenshots. In both cases the goal is to reduce manual post-processing.
How It Works
ByeType uses multimodal large language models to process raw audio directly, applying user-defined rules and optimizations within a single transcription step. This avoids the error-prone "ASR + LLM post-processing" pipeline, in which a language model can only correct text that a separate speech recognizer has already produced. Customization happens through editable Markdown files, where users define proprietary vocabulary, transcription logic, and text formatting strategies. The tool also offers AI-powered image text extraction that takes visual context into account, repairing broken lines and restoring clean code blocks from screenshots, which the project presents as going beyond traditional OCR.
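The single-step idea described above can be illustrated with a minimal Python sketch. It is not ByeType's actual code: the function names, the rule file names, and the `model` callable are all hypothetical stand-ins, and the real tool may structure its prompts differently. The sketch only shows the shape of the approach, where Markdown rules and raw audio reach the model in one request instead of passing through a separate ASR stage first.

```python
from pathlib import Path
from typing import Callable, Iterable


def load_rules(paths: Iterable[Path]) -> str:
    """Concatenate user-editable Markdown rule files into one instruction block.

    Empty or whitespace-only files are skipped. The file layout is an
    assumption for illustration, not ByeType's documented format.
    """
    sections = []
    for path in paths:
        text = path.read_text(encoding="utf-8").strip()
        if text:
            # Label each section with its file name so rules stay traceable.
            sections.append(f"<!-- {path.name} -->\n{text}")
    return "\n\n".join(sections)


def transcribe(audio: bytes, rules: str, model: Callable[[str, bytes], str]) -> str:
    """Single pass: the rules and the raw audio go to the model together.

    `model` is a hypothetical callable wrapping whichever multimodal API
    the user has configured; it takes (prompt, audio) and returns text.
    """
    prompt = "Transcribe the audio, following these rules:\n\n" + rules
    return model(prompt, audio)
```

A caller would pass something like `load_rules([Path("vocabulary.md"), Path("formatting.md")])` (hypothetical file names) along with a thin wrapper around their chosen provider's client. Because the vocabulary and formatting rules are present while the model listens to the audio, there is no intermediate transcript to repair afterwards.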
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project encourages community contributions via Issues and Pull Requests. It acknowledges the Linux.do community for its contributions. No specific community channels (like Discord/Slack) or active maintainer details are provided in the README.
Licensing & Compatibility
Limitations & Caveats
Users must supply their own API keys for AI model access. Certain models, like Gemini, may require network proxy configuration for users in specific regions. Auto-pasting of transcribed text relies on accessibility permissions, with manual pasting as a fallback. Transcription speed can be optimized by disabling "Thinking Mode" or selecting lighter models.