Convert websites to LLM-optimized Markdown and context files
Mdream addresses the limitations of traditional HTML-to-Markdown converters, which are often slow, produce bloated output, and are poorly suited to Large Language Models (LLMs). It offers a fast, token-efficient way to convert any website into clean Markdown and into specialized llms.txt artifacts. This improves a site's discoverability by AI tools and produces useful LLM context for projects, benefiting both developers and content creators.
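The llms.txt artifacts mentioned above follow the community llms.txt proposal (llmstxt.org): a Markdown file with an H1 site name, a blockquote summary, and H2 sections that list links as "- [title](url): notes" lines. The TypeScript below is a minimal sketch of producing both files from pages that have already been converted to Markdown. The Page type, the function names, the "Pages" section title, and the llms-full.txt layout (each page's full Markdown placed under its own H2 heading) are assumptions for illustration, not Mdream's API or its exact output format.

```typescript
// Sketch: build llms.txt and llms-full.txt from already-converted pages.
// Layout follows the llmstxt.org proposal. The Page type, function names,
// the "Pages" section title and the llms-full.txt layout are assumptions,
// not Mdream's API or output format.

interface Page {
  title: string;
  url: string;
  description?: string; // optional one-line note shown after the link
  markdown: string; // page body, already converted to Markdown
}

// llms.txt: H1 name, blockquote summary, then an H2 section of links.
function buildLlmsTxt(siteName: string, summary: string, pages: Page[]): string {
  const links = pages.map(
    (p) => `- [${p.title}](${p.url})${p.description ? `: ${p.description}` : ""}`,
  );
  return [`# ${siteName}`, "", `> ${summary}`, "", "## Pages", "", ...links, ""].join("\n");
}

// llms-full.txt (assumed layout): every page's full Markdown under its own H2.
function buildLlmsFullTxt(siteName: string, pages: Page[]): string {
  const sections = pages.map(
    (p) => `## ${p.title}\n\nSource: ${p.url}\n\n${p.markdown.trim()}\n`,
  );
  return [`# ${siteName}`, "", ...sections].join("\n");
}

// Usage with two hypothetical pages.
const pages: Page[] = [
  { title: "Guide", url: "https://example.com/guide", description: "Getting started", markdown: "Install it." },
  { title: "API", url: "https://example.com/api", markdown: "Call it." },
];
console.log(buildLlmsTxt("Example", "An example site.", pages));
```

A production generator would also demote headings inside each page so they nest under the H2; this sketch leaves page content untouched.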
How It Works
Mdream's core is a custom-built HTML-to-Markdown converter optimized for LLM consumption; the project reports output with roughly 50% fewer tokens. It emits minimal GitHub Flavored Markdown with support for frontmatter and embedded HTML markup, and it converts in a streaming fashion, with a reported 1.4MB of HTML converted to Markdown in about 50ms. The core is small (about 5kB gzipped) and has zero dependencies. A plugin system lets users customize and extend the conversion pipeline.
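To make the converter-plus-plugins idea concrete, here is a toy, single-pass HTML-to-Markdown converter in TypeScript with a filtering hook. It is a minimal sketch assuming well-formed input, a small tag subset, and no comments or entity decoding; SketchPlugin and keepElement are hypothetical names and do not reflect Mdream's actual plugin interface. It shows the general technique of walking tag and text tokens in order while a plugin decides which elements (such as navigation) to drop, which is one way a converter can avoid spending tokens on page chrome.

```typescript
// Toy single-pass HTML-to-Markdown converter with a plugin hook.
// Assumes well-formed HTML, a small tag subset, no comments, no entity decoding.
// SketchPlugin is hypothetical and is NOT Mdream's plugin API.

interface SketchPlugin {
  // Return false to drop an element and everything inside it.
  keepElement?: (tag: string) => boolean;
}

// Block tags: [prefix emitted on open, suffix emitted on close].
const BLOCKS = new Map<string, [string, string]>([
  ["h1", ["# ", "\n\n"]],
  ["h2", ["## ", "\n\n"]],
  ["h3", ["### ", "\n\n"]],
  ["p", ["", "\n\n"]],
  ["li", ["- ", "\n"]],
]);
// Inline tags: the same marker is emitted on open and on close.
const INLINE = new Map<string, string>([
  ["strong", "**"], ["b", "**"], ["em", "_"], ["i", "_"],
]);
const VOID_TAGS = new Set(["br", "hr", "img", "input", "link", "meta"]);

function htmlToMarkdownSketch(html: string, plugins: SketchPlugin[] = []): string {
  const out: string[] = [];
  const hrefs: string[] = []; // hrefs of currently open <a> elements
  let skipDepth = 0; // > 0 while inside a dropped element
  // Either a start/end tag (groups 1-3) or a run of text (group 4).
  const token = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>|([^<]+)/g;
  let m: RegExpExecArray | null;

  while ((m = token.exec(html)) !== null) {
    const [, slash, rawTag, attrs, text] = m;
    if (text !== undefined) {
      // Collapse whitespace runs; skip text inside dropped elements.
      if (skipDepth === 0) out.push(text.replace(/\s+/g, " "));
      continue;
    }
    const tag = (rawTag ?? "").toLowerCase();
    const closing = slash === "/";
    if (VOID_TAGS.has(tag)) continue; // the toy ignores void elements
    if (skipDepth > 0) {
      // Track nesting so output resumes after the dropped element closes.
      skipDepth += closing ? -1 : 1;
      continue;
    }
    if (!closing && !plugins.every((p) => p.keepElement?.(tag) ?? true)) {
      skipDepth = 1; // a plugin vetoed this element
      continue;
    }
    const block = BLOCKS.get(tag);
    const mark = INLINE.get(tag);
    if (block) {
      out.push(closing ? block[1] : block[0]);
    } else if (mark !== undefined) {
      out.push(mark);
    } else if (tag === "a") {
      if (closing) {
        out.push(`](${hrefs.pop() ?? ""})`);
      } else {
        hrefs.push(/href="([^"]*)"/.exec(attrs ?? "")?.[1] ?? "");
        out.push("[");
      }
    } // other tags (div, ul, span, ...) just pass their text through
  }
  return out
    .join("")
    .replace(/[ \t]+\n/g, "\n") // trailing spaces
    .replace(/\n[ \t]+/g, "\n") // leading spaces left by inter-tag whitespace
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Usage: drop <nav> so site chrome does not reach the LLM.
const dropNav: SketchPlugin = { keepElement: (tag) => tag !== "nav" };
console.log(htmlToMarkdownSketch(
  '<h1>Docs</h1><nav><a href="/">Home</a></nav><p>Read <strong>this</strong> <a href="/guide">guide</a>.</p>',
  [dropNav],
));
// prints:
// # Docs
//
// Read **this** [guide](/guide).
```

Because the loop consumes tokens strictly left to right and keeps only a small stack of open links, the same approach could in principle run over HTML chunks as they arrive (with buffering for tags split across chunk boundaries), which is the property streaming converters rely on; the sketch takes the whole string for simplicity.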
Quick Start & Requirements
Installation and usage are primarily via npx for CLI operations (e.g., npx mdream for single-file conversion, npx @mdream/crawl for site crawling) or through Docker images (harlanzw/mdream:latest). For JavaScript-heavy sites, Playwright can be used via the --driver playwright flag or dedicated Docker images. Node.js is required for the npx commands. Official documentation and usage examples are available for various integrations, including GitHub Actions, Vite, and Nuxt.
Highlighted Details
llms.txt/llms-full.txt artifacts for LLM context.
Maintenance & Community
The project is supported by a Sponsor Program, and a Discord server is available for community help and discussion. The primary developer is active on Twitter (@harlan_zw).
Licensing & Compatibility
Mdream is licensed under the permissive MIT license, allowing for broad compatibility with commercial and closed-source projects.
Limitations & Caveats
The readabilityPlugin is noted as experimental. While the core is zero-dependency, driver-based operations (like Playwright) introduce external dependencies.