mdream by harlan-zw

Convert websites to LLM-optimized Markdown and context files

Created 5 months ago
313 stars

Top 86.1% on SourcePulse

Project Summary

Mdream targets the shortcomings of traditional HTML-to-Markdown converters, which are often slow, heavy, and poorly suited to Large Language Models (LLMs). It converts any website into clean, token-efficient Markdown and specialized llms.txt artifacts. This makes a site easier for AI tools to discover and gives projects ready-made LLM context, which benefits developers and content creators alike.

How It Works

Mdream's core is a custom-built HTML-to-Markdown converter optimized for LLM consumption, yielding roughly 50% fewer tokens. It generates minimal GitHub Flavored Markdown, supports frontmatter and HTML markup, and streams its output: 1.4MB of HTML converts to Markdown in about 50ms (roughly 28MB/s). The core is small (5kB gzipped) and has zero dependencies. A plugin system lets users customize the conversion pipeline.
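To make the idea of an LLM-oriented converter concrete, here is a minimal sketch of turning HTML into compact Markdown. It is a toy written for this summary, not Mdream's implementation or API: the function name toyHtmlToMarkdown is hypothetical, it handles only a handful of tags with regular expressions, and it does no streaming. It only illustrates why dropping page chrome and tag markup shortens the text an LLM has to read.

```typescript
// Toy illustration only: NOT Mdream's implementation or API.
// Handles a tiny subset of HTML (h1-h3, p, a, strong/b, em/i) with regexes.
function toyHtmlToMarkdown(html: string): string {
  return html
    // Drop elements that carry no article content.
    .replace(/<(script|style|nav)\b[\s\S]*?<\/\1>/gi, "")
    // <h2>Title</h2> becomes "## Title".
    .replace(
      /<h([1-3])(?:\s[^>]*)?>([\s\S]*?)<\/h\1>/gi,
      (_match: string, level: string, text: string) =>
        `${"#".repeat(Number(level))} ${text.trim()}\n\n`
    )
    // <a href="url">label</a> becomes [label](url).
    .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // Inline emphasis.
    .replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, "**$2**")
    .replace(/<(em|i)>([\s\S]*?)<\/\1>/gi, "_$2_")
    // Paragraphs become text followed by a blank line.
    .replace(/<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/gi, "$1\n\n")
    // Strip any remaining tags and collapse extra blank lines.
    .replace(/<[^>]+>/g, "")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

const sampleHtml: string =
  '<nav><a href="/">Home</a></nav>' +
  '<h1 class="title">Hello</h1>' +
  '<p>See <a href="https://example.com">the <strong>docs</strong></a>.</p>';

const sampleMarkdown: string = toyHtmlToMarkdown(sampleHtml);
console.log(sampleMarkdown);
console.log(`${sampleHtml.length} chars of HTML -> ${sampleMarkdown.length} chars of Markdown`);
```

Character count is used here only as a rough stand-in for token count. A regex approach like this breaks on nested or malformed markup, which is one reason a purpose-built parser is the more realistic design.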

Quick Start & Requirements

Mdream is installed and run primarily through npx for CLI operations (npx mdream for single-file conversion, npx @mdream/crawl for site crawling) or through Docker images (harlanzw/mdream:latest). For JavaScript-heavy sites, Playwright is available via the --driver playwright flag or specific Docker images. The npx commands require Node.js. Official documentation and usage examples cover several integrations, including GitHub Actions, Vite, and Nuxt.

Highlighted Details

  • LLM Optimization: Generates Markdown with ~50% fewer tokens and includes llms.txt/llms-full.txt artifacts for LLM context.
  • Performance: Achieves ultra-fast conversion, processing 1.4MB HTML to Markdown in ~50ms.
  • Extensibility: Features a plugin system for custom content filtering, data extraction (CSS selectors), frontmatter generation, and more.
  • Ecosystem Integration: Offers packages for CLI, Docker, GitHub Actions, Vite, and Nuxt, enabling seamless CI/CD and web development workflows.
  • Playwright Support: Enables crawling and conversion of dynamic, JavaScript-rendered websites.
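The plugin idea in the list above (content filtering and data extraction during conversion) can be sketched as a small pipeline. Everything here is an assumption made for illustration: SketchNode, SketchPlugin, and runPipeline are hypothetical names and do not reflect Mdream's actual plugin API, whose hooks and signatures should be taken from the official documentation.

```typescript
// Hypothetical types for illustration; these are NOT Mdream's plugin API.
interface SketchNode {
  tag: string;
  text: string;
}

interface SketchPlugin {
  name: string;
  // Return false to drop the node from the output.
  filter?: (node: SketchNode) => boolean;
  // Inspect a node, for example to extract data for frontmatter.
  visit?: (node: SketchNode) => void;
}

// Runs each node through the plugins in order. A node dropped by one plugin
// is not seen by the plugins that come after it.
function runPipeline(nodes: SketchNode[], plugins: SketchPlugin[]): SketchNode[] {
  return nodes.filter((node) => {
    for (const plugin of plugins) {
      if (plugin.visit) plugin.visit(node);
      if (plugin.filter && !plugin.filter(node)) return false;
    }
    return true;
  });
}

// Example: drop page chrome and collect top-level headings.
const collectedTitles: string[] = [];
const sketchPlugins: SketchPlugin[] = [
  {
    name: "drop-chrome",
    filter: (node) => ["nav", "footer", "aside"].indexOf(node.tag) === -1,
  },
  {
    name: "collect-headings",
    visit: (node) => {
      if (node.tag === "h1") collectedTitles.push(node.text);
    },
  },
];

const keptNodes: SketchNode[] = runPipeline(
  [
    { tag: "nav", text: "Home" },
    { tag: "h1", text: "Guide" },
    { tag: "p", text: "Body text" },
    { tag: "footer", text: "Contact" },
  ],
  sketchPlugins
);
console.log(keptNodes.map((node) => node.tag), collectedTitles);
```

Keeping each plugin to a single optional hook per concern means filtering and extraction can be combined freely, at the cost of making plugin order significant.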

Maintenance & Community

The project is supported by a Sponsor Program, and a Discord server is available for community help and discussion. The primary developer is active on Twitter (@harlan_zw).

Licensing & Compatibility

Mdream is licensed under the permissive MIT license, allowing for broad compatibility with commercial and closed-source projects.

Limitations & Caveats

The readabilityPlugin is noted as experimental. While the core is zero-dependency, driver-based operations (like Playwright) introduce external dependencies.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

169 stars in the last 30 days
