im2latex-tensorflow by ritheshkumar95

TensorFlow implementation of an im2latex system

created 8 years ago
293 stars


Project Summary

This repository provides a TensorFlow implementation of a deep learning model that decompiles images of rendered LaTeX formulas into their corresponding LaTeX source code. It targets researchers and developers interested in image-to-markup conversion and addresses the im2latex problem: given an image of a mathematical expression, generate the LaTeX markup that renders it.

How It Works

The system employs an encoder-decoder architecture with an attention mechanism, mirroring the approach in the HarvardNLP paper "What You Get Is What You See: A Visual Markup Decompiler." The encoder processes the input image to extract visual features, while the decoder generates the LaTeX markup token by token. The attention mechanism allows the decoder to focus on relevant image regions during generation, improving accuracy for complex formulas.
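The attention step described above can be sketched in plain Python. This is a minimal illustration, not the repository's code. It assumes dot-product scoring between the decoder state and each image feature vector, whereas the paper and attention.py may use a different, learned scoring function. The names `softmax`, `attend`, `features`, and `state` are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores to weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, state):
    """One attention step over a flattened grid of image features.

    features: list of feature vectors, one per image region (encoder output)
    state:    the decoder's current hidden state, same length as each vector
    Returns (weights, context), where context is the weighted average
    of the feature vectors.
    """
    # Assumption: dot-product scoring; a learned scoring function is also common.
    scores = [sum(f_i * s_i for f_i, s_i in zip(f, state)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context

# Three image regions with 2-dimensional features (toy values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(features, state)
```

In this toy input the first and third regions align with the state and receive equal weights that are larger than the second region's weight. In an encoder-decoder model of this kind, the decoder would typically combine `context` with the previously emitted token to predict the next one.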

Quick Start & Requirements

  • Installation: Clone the repository.
  • Prerequisites: TensorFlow, Python, Pillow, NumPy, Node.js, KaTeX, pdflatex, ImageMagick (convert), Webkit2png.
  • Data: Download the training data from Zenodo, then run the preprocessing scripts (preprocess_images.py, preprocess_formulas.py, preprocess_filter.py, generate_latex_vocab.py).
  • Training: Run attention.py.
  • Evaluation: Use the predict() function in attention.py or the Predict.ipynb notebook.
  • Resources: Training requires a GPU; the README mentions a 24GB Nvidia M40.
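As an illustration of the vocabulary step in the list above, the sketch below builds a token-to-id mapping from tokenized formulas. It is not the code in generate_latex_vocab.py: the whitespace tokenization, the special tokens (`<pad>`, `<start>`, `<end>`, `<unk>`), and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter

SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]  # hypothetical names

def build_vocab(formulas, min_count=1):
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for formula in formulas for tok in formula.split())
    # Sort for a deterministic id assignment across runs.
    kept = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + kept)}

def encode(formula, vocab):
    """Turn one formula into ids, wrapped in start/end markers."""
    unk = vocab["<unk>"]
    body = [vocab.get(tok, unk) for tok in formula.split()]
    return [vocab["<start>"]] + body + [vocab["<end>"]]

formulas = [r"\frac { a } { b }", r"a ^ { 2 } + b ^ { 2 }"]
vocab = build_vocab(formulas)
ids = encode(r"\frac { c } { b }", vocab)  # "c" is unseen and maps to <unk>
```

Raising `min_count` sends rare tokens to `<unk>`, which keeps the decoder's output layer small at the cost of never being able to emit those tokens.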

Highlighted Details

  • Implements the HarvardNLP "What You Get Is What You See" paper.
  • Addresses OpenAI's im2latex Request For Research.
  • Reports a negative log-likelihood (NLL) of 0.08 after 18 epochs of training on a 24GB Nvidia M40 GPU.
  • Includes scripts for data preprocessing, vocabulary generation, and evaluation.
  • Provides visualization of the attention mechanism.
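To put the NLL figure in context, the sketch below computes an average negative log-likelihood from the probabilities a model assigns to the correct tokens. It assumes the reported value is a per-token average using the natural logarithm, which this summary does not state. `mean_nll` is a hypothetical helper and the probabilities are toy values.

```python
import math

def mean_nll(correct_token_probs):
    """Average negative log-likelihood over the reference tokens."""
    total = sum(math.log(p) for p in correct_token_probs)
    return -total / len(correct_token_probs)

# Toy example: probabilities assigned to each correct token of one formula.
probs = [0.95, 0.90, 0.92, 0.93]
nll = mean_nll(probs)

# Under the per-token assumption, exp(NLL) is the perplexity and
# exp(-NLL) is the geometric-mean probability of the correct token.
perplexity = math.exp(0.08)
avg_correct_prob = math.exp(-0.08)
```

Under that assumption, an NLL of 0.08 corresponds to a perplexity of about 1.08, meaning the model assigns roughly 92% probability to the correct token in the geometric-mean sense.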

Maintenance & Community

The project is a personal implementation by ritheshkumar95, based on the HarvardNLP work. No specific community channels or active maintenance signals are evident in the README.

Licensing & Compatibility

The README does not explicitly state a license. The original HarvardNLP implementation is available under a permissive license, but that does not extend to this repository. Without a stated license, the terms for commercial or closed-source use are unclear.

Limitations & Caveats

Preprocessing is extensive and must be executed carefully. The project depends on several external tools (Node.js, KaTeX, pdflatex, ImageMagick, Webkit2png), which complicates setup. The README does not report performance on hardware other than the Nvidia M40.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days
