markpdfdown  by MarkPDFdown

CLI tool for converting PDFs to Markdown using multimodal AI

Created 6 months ago
1,582 stars

Top 26.5% on SourcePulse

GitHubView on GitHub
Project Summary

MarkPDFdown is a Python tool that converts PDF and image files into well-formatted Markdown using multimodal large language models. It aims to simplify document conversion for users needing to extract and edit content from PDFs, preserving complex formatting like tables, formulas, and diagrams.

How It Works

The tool leverages multimodal AI models to "visually recognize" and understand the structure and content of PDF documents and images. This approach allows it to go beyond simple OCR, preserving formatting elements such as headings, lists, and tables, and handling complex layouts. Users can also configure the underlying AI model for customized results.

Quick Start & Requirements

  • Install: Use Conda to create an environment (conda create -n markpdfdown python=3.9), activate it (conda activate markpdfdown), clone the repository, and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.9+, OpenAI API key, and access to a specified multimodal AI model.
  • Usage: Run via python main.py < input.pdf > output.md or python main.py < input_image.png > output.md.
  • Docker: docker run -i -e OPENAI_API_KEY=<your-api-key> jorbenzhu/markpdfdown < input.pdf > output.md
  • Docs: MarkPDFdown English

Highlighted Details

  • Converts both PDF and image files to Markdown.
  • Preserves complex formatting including tables, formulas, and diagrams.
  • Utilizes multimodal AI for visual recognition and understanding.
  • Customizable model configuration.

Maintenance & Community

Contributions are welcome via pull requests.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The tool requires an OpenAI API key and relies on external AI models, which may incur costs and introduce dependencies on third-party services. Specific model requirements are not detailed beyond the need for multimodal capabilities.

Health Check
Last Commit

2 weeks ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
5
Star History
30 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki Pawel Garbacki(Cofounder of Fireworks AI), and
1 more.

MinerU by opendatalab

1.2%
44k
PDF extraction tool for converting PDFs to Markdown and JSON
Created 1 year ago
Updated 1 day ago
Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Elvis Saravia Elvis Saravia(Founder of DAIR.AI), and
20 more.

markitdown by microsoft

6.7%
77k
Python tool for converting files to Markdown for LLM text analysis
Created 10 months ago
Updated 1 week ago
Feedback? Help us improve.