MarkEverythingDown by RoffyS

Markdown conversion tool for LLMs

Created 10 months ago

319 stars

Top 85.0% on SourcePulse

Project Summary

MarkEverythingDown is a versatile tool designed to convert a wide array of document formats—including PDFs, Office files, images, code, and notebooks—into structured Markdown. It targets users who need to prepare documents for LLM applications like RAG or dataset creation, offering a user-friendly interface and powerful AI capabilities for enhanced content understanding and preservation.

How It Works

The tool leverages the Qwen2.5 VL multimodal LLM via OpenAI-compatible APIs, supporting both local inference engines (e.g., LMStudio) and cloud providers. This integration allows for sophisticated processing of visual content, including scanned documents, preserving emojis and image descriptions. It offers dual processing modes (local/cloud), intelligent batching for large PDFs, and configurable parameters like temperature and token limits for fine-tuned output.

Quick Start & Requirements

Installation: Clone the repository, set up a virtual environment, and install dependencies using pip install -r requirements.txt.
Prerequisites: Python 3.x, an OpenAI-compatible API endpoint (local or cloud) for the Qwen2.5 VL model.
Usage: Launch the web UI with python main.py --ui or use the command line for direct conversion (e.g., python main.py sample_pdf.pdf).
Documentation: Project Repository

Highlighted Details

Supports conversion of PDF, DOCX, PPTX, XLSX, PNG, JPG, BMP, Python, R, Jupyter Notebooks, and TXT files.
Features AI-powered processing with Qwen2.5 VL for enhanced visual and document understanding.
Offers both a user-friendly Gradio web UI and a command-line interface.
Includes dynamic batching and token management for efficient handling of large documents and API calls.

Maintenance & Community

The project is actively developed by RoffyS, with contributions welcomed via pull requests and issues. The project was inspired by the need for LLM-optimized documentation formats.

Licensing & Compatibility

The project is released under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

While the tool supports a broad range of formats, the quality of conversion for complex layouts or highly specialized document types may vary. The effectiveness of vision-based processing is dependent on the underlying Qwen2.5 VL model's capabilities.

MarkEverythingDown by RoffyS

Explore Similar Projects

llmdocparser by lazyFrogLOL

noted.md by tejas-raskar

attachments by MaximeRivest

llm.nvim by Kurama622

ollama-ai-provider by sgomez

SmartResume by alibaba

langchain-swift by buhe

LLM-Kit by wpydcr

Qwen2API by Rfym21

Lumos by andrewnguonly

OCRFlux by chatdoc-com

inference by xorbitsai