huashu-md-html  by alchaincyf

Document conversion and typesetting pipeline for AI workflows

Created 2 weeks ago

685 stars

Top 49.2% on SourcePulse

Project Summary

huashu-md-html addresses a common problem in AI-assisted writing: keeping the Markdown source separate from polished output formats such as HTML and DOCX. It offers a unified pipeline that converts diverse inputs (PDF, URLs, images) to clean Markdown, renders Markdown into styled HTML with one of four themes, and produces publisher-grade DOCX. It is aimed at users who want professional, "anti-AI-slop" output and a streamlined content workflow.

How It Works

This command-line suite provides four core capabilities. First, it converts a range of inputs (PDF, DOCX, images, URLs) to Markdown using markitdown. Second, it transforms Markdown into styled HTML via pandoc and one of four custom themes (article, report, reading, interactive), which are designed to avoid the generic look of AI-generated content. Third, it converts HTML (a local file or a URL) back to Markdown using html-to-markdown and trafilatura, extracting the main content of the page. Finally, it generates publisher-ready DOCX files from Markdown using python-docx, with professional typesetting presets for books and submissions.
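The Markdown-to-HTML step described above can be sketched as a thin wrapper around the pandoc command line. This is a minimal illustration and not the project's actual script: the function names, the themes/ directory, and the one-CSS-file-per-theme layout are assumptions made for the example. Only the four theme names and the use of pandoc come from the description.

```python
import shutil
import subprocess
from pathlib import Path

# Theme names as listed in the project description.
THEMES = ("article", "report", "reading", "interactive")


def build_pandoc_command(md_path, html_path, theme="article", theme_dir="themes"):
    """Return the pandoc argument list for one Markdown-to-HTML render.

    Assumption: each theme is a single CSS file at <theme_dir>/<theme>.css.
    The real project may organise its themes differently.
    """
    if theme not in THEMES:
        raise ValueError(f"unknown theme: {theme!r}")
    css = Path(theme_dir) / f"{theme}.css"
    return [
        "pandoc", str(md_path),
        "--from", "markdown",
        "--to", "html5",
        "--standalone",            # emit a complete HTML document
        "--css", css.as_posix(),   # link the chosen theme stylesheet
        "--output", str(html_path),
    ]


def render(md_path, html_path, theme="article"):
    """Run pandoc, failing early if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(build_pandoc_command(md_path, html_path, theme), check=True)
```

Keeping command construction separate from execution makes the theme selection easy to inspect without pandoc installed; calling render("post.md", "post.html", theme="report") would then perform the actual conversion.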

Quick Start & Requirements

Install via npx skills add alchaincyf/huashu-md-html. Requires Python 3 and pandoc. The tool self-checks for dependencies like python-docx and Pillow, providing installation commands. Users on macOS should use python3 -m pip install ... to avoid version conflicts. Detailed cookbooks are in the references/ directory.
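A dependency self-check of the kind described above might look like the following sketch. It is not the tool's own implementation: the function name and the hint wording are hypothetical. The import names docx and PIL are the modules that the python-docx and Pillow packages install.

```python
import importlib.util
import shutil

# Import name -> pip package name. python-docx installs the "docx" module
# and Pillow installs the "PIL" module.
REQUIRED_MODULES = {"docx": "python-docx", "PIL": "Pillow"}
REQUIRED_TOOLS = ("pandoc",)


def missing_dependencies(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS):
    """Return one hint per missing dependency (empty list if all are present)."""
    hints = []
    for import_name, pip_name in modules.items():
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(import_name) is None:
            # "python3 -m pip" installs into the interpreter that will run the tool.
            hints.append(f"python3 -m pip install {pip_name}")
    for tool in tools:
        if shutil.which(tool) is None:
            hints.append(f"install '{tool}' and make sure it is on PATH")
    return hints


if __name__ == "__main__":
    for hint in missing_dependencies():
        print("missing:", hint)
```

Invoking pip through python3 -m pip ties the installation to the same interpreter that later imports the package, which is the reason for the macOS advice above.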

Highlighted Details

  • Universal Input: Converts PDF, DOCX, images, audio, YouTube, URLs to Markdown.
  • Four "Anti-AI Slop" HTML Themes: article, report, reading, interactive themes are self-contained, professional, and avoid common AI-generated content pitfalls.
  • Publisher-Grade DOCX: Generates DOCX with automatic covers, TOC, headers/footers, and professional typesetting for editorial review or submission.
  • Intelligent URL Processing: Differentiates between structured data pages and prose pages for optimal Markdown extraction.

Maintenance & Community

Developed by independent creator "花叔" (Huasheng). Contact and community links are provided for X/Twitter (@AlchainHust), WeChat, Bilibili, YouTube, and the author's personal websites.

Licensing & Compatibility

MIT License permits free personal and commercial use without authorization.

Limitations & Caveats

The tool is command-line only. URL conversion quality varies from page to page, so users may need to compare results. macOS users may run into Python environment conflicts unless they install packages with python3 -m pip.

Health Check
Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 2
Issues (30d): 4
Star History
689 stars in the last 18 days
