markpdfdown  by MarkPDFdown

CLI tool for converting PDFs to Markdown using multimodal AI

created 4 months ago
1,505 stars

Top 28.0% on sourcepulse

GitHubView on GitHub
Project Summary

MarkPDFdown is a Python tool that converts PDF and image files into well-formatted Markdown using multimodal large language models. It aims to simplify document conversion for users needing to extract and edit content from PDFs, preserving complex formatting like tables, formulas, and diagrams.

How It Works

The tool leverages multimodal AI models to "visually recognize" and understand the structure and content of PDF documents and images. This approach allows it to go beyond simple OCR, preserving formatting elements such as headings, lists, and tables, and handling complex layouts. Users can also configure the underlying AI model for customized results.

Quick Start & Requirements

  • Install: Use Conda to create an environment (conda create -n markpdfdown python=3.9), activate it (conda activate markpdfdown), clone the repository, and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.9+, OpenAI API key, and access to a specified multimodal AI model.
  • Usage: Run via python main.py < input.pdf > output.md or python main.py < input_image.png > output.md.
  • Docker: docker run -i -e OPENAI_API_KEY=<your-api-key> jorbenzhu/markpdfdown < input.pdf > output.md
  • Docs: MarkPDFdown English

Highlighted Details

  • Converts both PDF and image files to Markdown.
  • Preserves complex formatting including tables, formulas, and diagrams.
  • Utilizes multimodal AI for visual recognition and understanding.
  • Customizable model configuration.

Maintenance & Community

Contributions are welcome via pull requests.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The tool requires an OpenAI API key and relies on external AI models, which may incur costs and introduce dependencies on third-party services. Specific model requirements are not detailed beyond the need for multimodal capabilities.

Health Check
Last commit

6 days ago

Responsiveness

1 day

Pull Requests (30d)
1
Issues (30d)
1
Star History
742 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero Omar Sanseviero(DevRel at Google DeepMind), and
4 more.

open_flamingo by mlfoundations

0.1%
4k
Open-source framework for training large multimodal models
created 2 years ago
updated 11 months ago
Feedback? Help us improve.