Mdream by harlan-zw: Convert websites to LLM-optimized Markdown and context files
Mdream addresses the limitations of traditional HTML-to-Markdown converters, which are often slow, bloated, and poorly suited to Large Language Models (LLMs). It provides a fast, token-efficient way to convert any website into clean Markdown and specialized llms.txt artifacts. This can make a site easier for AI tools to discover and gives projects ready-made LLM context, which benefits developers and content creators alike.
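The llms.txt artifacts mentioned above follow the public llms.txt proposal: a Markdown file with an H1 site name, a blockquote summary, and H2 sections listing links. llms-full.txt is a widely used companion convention that inlines the full Markdown of every page. The sketch below builds both from already-converted pages. PageEntry, buildLlmsTxt, buildLlmsFullTxt, and the "Pages" section name are hypothetical and illustrative. This is not mdream's API, and its real output layout may differ.

```typescript
// Minimal llms.txt / llms-full.txt builders.
// Hypothetical sketch following the public llms.txt proposal;
// PageEntry and both function names are illustrative, not mdream's API.

interface PageEntry {
  title: string;
  url: string;
  description?: string; // optional one-line note shown after the link
  markdown: string; // full page content, already converted to Markdown
}

// llms.txt: H1 name, blockquote summary, then an H2 section of Markdown links.
// The section name "Pages" is an arbitrary choice for this sketch.
function buildLlmsTxt(siteName: string, summary: string, pages: PageEntry[]): string {
  const lines = [`# ${siteName}`, "", `> ${summary}`, "", "## Pages", ""];
  for (const page of pages) {
    const note = page.description ? `: ${page.description}` : "";
    lines.push(`- [${page.title}](${page.url})${note}`);
  }
  return lines.join("\n") + "\n";
}

// llms-full.txt (companion convention): every page's full Markdown in one file.
function buildLlmsFullTxt(siteName: string, pages: PageEntry[]): string {
  const sections = pages.map(
    (page) => `## ${page.title}\n\nSource: ${page.url}\n\n${page.markdown.trim()}`,
  );
  return [`# ${siteName}`, ...sections].join("\n\n") + "\n";
}

const pages: PageEntry[] = [
  {
    title: "Getting Started",
    url: "https://example.com/docs/start",
    description: "Installation and first run",
    markdown: "Install the CLI, then run it against a URL.",
  },
  { title: "API", url: "https://example.com/docs/api", markdown: "Reference for every option." },
];

console.log(buildLlmsTxt("Example", "Documentation for the Example project.", pages));
```

The index file stays small because it only links to pages. The full file trades size for completeness, so an LLM can ingest a whole site in one request.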
How It Works
Mdream's core is a custom-built HTML-to-Markdown converter primitive optimized for LLM consumption. The project reports that its output uses roughly 50% fewer tokens. It emits minimal GitHub Flavored Markdown, supports frontmatter and HTML markup, and streams quickly (1.4MB of HTML converted to Markdown in ~50ms). The core is tiny (5kB gzipped) and has zero dependencies, and a plugin system lets users customize the conversion pipeline.
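To illustrate how a plugin hook can shape a conversion pipeline, here is a toy converter. It tokenizes a small, well-formed subset of HTML and emits Markdown. It also lets hypothetical plugins drop whole elements such as navigation or scripts, which is one way a converter reduces token count. Everything here (toMarkdown, ConverterPlugin, skipElement) is an assumption for illustration. It is not mdream's implementation or plugin API, and it does not stream, decode entities, or handle malformed markup.

```typescript
// Toy HTML-to-Markdown converter with a plugin hook.
// Hypothetical sketch: NOT mdream's implementation or API. Handles only simple,
// well-formed HTML; entities are not decoded; href is read only when double-quoted.

type Token =
  | { kind: "open"; tag: string; attrs: string }
  | { kind: "close"; tag: string }
  | { kind: "text"; text: string };

// Hypothetical plugin shape: return true to drop an element and all its children.
interface ConverterPlugin {
  name: string;
  skipElement?: (tag: string, attrs: string) => boolean;
}

const VOID_TAGS = new Set(["br", "hr", "img", "input", "link", "meta"]);

function tokenize(html: string): Token[] {
  const tokens: Token[] = [];
  // Group 1: "/" for closing tags, 2: tag name, 3: attributes, 4: text run.
  const re = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>|([^<]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    if (m[4] !== undefined) tokens.push({ kind: "text", text: m[4] });
    else if (m[1] === "/") tokens.push({ kind: "close", tag: m[2].toLowerCase() });
    else tokens.push({ kind: "open", tag: m[2].toLowerCase(), attrs: m[3] });
  }
  return tokens;
}

function toMarkdown(html: string, plugins: ConverterPlugin[] = []): string {
  let out = "";
  let skipDepth = 0; // > 0 while inside an element a plugin dropped
  const hrefs: string[] = []; // link targets waiting for their closing </a>
  for (const t of tokenize(html)) {
    if (t.kind === "text") {
      if (skipDepth === 0) out += t.text.replace(/\s+/g, " ");
      continue;
    }
    if (VOID_TAGS.has(t.tag)) {
      if (t.kind === "open" && t.tag === "br" && skipDepth === 0) out += "\n";
      continue;
    }
    if (t.kind === "open") {
      if (skipDepth > 0 || plugins.some((p) => p.skipElement?.(t.tag, t.attrs))) {
        skipDepth++;
        continue;
      }
      const heading = /^h([1-6])$/.exec(t.tag);
      if (heading) out += "\n\n" + "#".repeat(Number(heading[1])) + " ";
      else if (t.tag === "p") out += "\n\n";
      else if (t.tag === "li") out += "\n- ";
      else if (t.tag === "strong" || t.tag === "b") out += "**";
      else if (t.tag === "em" || t.tag === "i") out += "_";
      else if (t.tag === "code") out += "`";
      else if (t.tag === "a") {
        hrefs.push(/href\s*=\s*"([^"]*)"/.exec(t.attrs)?.[1] ?? "");
        out += "[";
      }
    } else {
      if (skipDepth > 0) {
        skipDepth--;
        continue;
      }
      if (/^h[1-6]$/.test(t.tag) || t.tag === "p") out += "\n\n";
      else if (t.tag === "ul" || t.tag === "ol") out += "\n";
      else if (t.tag === "strong" || t.tag === "b") out += "**";
      else if (t.tag === "em" || t.tag === "i") out += "_";
      else if (t.tag === "code") out += "`";
      else if (t.tag === "a") out += `](${hrefs.pop() ?? ""})`;
    }
  }
  // Trim each line, collapse runs of blank lines, then trim the whole result.
  return out.split("\n").map((l) => l.trim()).join("\n").replace(/\n{3,}/g, "\n\n").trim();
}

// Example plugin: drop page chrome before conversion.
const dropChrome: ConverterPlugin = {
  name: "drop-chrome",
  skipElement: (tag) => ["nav", "script", "style", "footer"].includes(tag),
};

const page =
  '<nav><a href="/">Home</a></nav><h1>Title</h1>' +
  '<p>Hello <strong>world</strong>, see <a href="https://example.com">docs</a>.</p>';

console.log(toMarkdown(page, [dropChrome]));
// # Title
//
// Hello **world**, see [docs](https://example.com).
```

Without the plugin, the navigation link survives as "[Home](/)" at the top of the output. Filtering at the element level like this is cheaper than post-processing the Markdown, because skipped subtrees never produce output at all.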
Quick Start & Requirements
Installation and usage are primarily via npx for CLI operations (e.g., npx mdream for single-file conversion, npx @mdream/crawl for site crawling) or through Docker images (harlanzw/mdream:latest). For JavaScript-heavy sites, Playwright can be used via the --driver playwright flag or dedicated Docker images. Node.js is required for the npx commands. Official documentation and usage examples cover various integrations, including GitHub Actions, Vite, and Nuxt.
Highlighted Details
llms.txt/llms-full.txt artifacts for LLM context.
Maintenance & Community
The project is supported by a Sponsor Program, and a Discord server is available for community help and discussion. The primary developer is active on Twitter (@harlan_zw).
Licensing & Compatibility
Mdream is licensed under the permissive MIT license, allowing for broad compatibility with commercial and closed-source projects.
Limitations & Caveats
The readabilityPlugin is noted as experimental. While the core is zero-dependency, driver-based operations (like Playwright) introduce external dependencies.