im2latex-tensorflow by ritheshkumar95

TensorFlow implementation of an im2latex system

Created 9 years ago

292 stars

Top 90.4% on SourcePulse

1 Expert Loves This Project

Sasha Rush

Research Scientist at Cursor; Professor at Cornell Tech

Project Summary

This repository provides a TensorFlow implementation of a deep learning model that decompiles images of rendered LaTeX formulas into their corresponding LaTeX source code. It targets researchers and developers interested in image-to-markup conversion and addresses the im2latex problem: recovering the markup that produced a rendered mathematical expression.

How It Works

The system employs an encoder-decoder architecture with an attention mechanism, mirroring the approach in the HarvardNLP paper "What You Get Is What You See: A Visual Markup Decompiler." The encoder processes the input image to extract visual features, while the decoder generates the LaTeX markup token by token. The attention mechanism allows the decoder to focus on relevant image regions during generation, improving accuracy for complex formulas.
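As a rough illustration of how the decoder can focus on image regions, here is a minimal NumPy sketch of one additive-attention step. It is not taken from the repository's attention.py: the function name, the parameter names (W_f, W_h, v), and all shapes are assumptions chosen for the example, and the random values stand in for learned weights.

```python
import numpy as np

def attention_step(features, decoder_state, W_f, W_h, v):
    """One additive-attention step over a grid of image features.

    features:      (num_regions, feat_dim) encoder outputs, one row per image region
    decoder_state: (hidden_dim,) current decoder hidden state
    W_f, W_h, v:   hypothetical learned parameters (random here)
    """
    # Score every region against the current decoder state.
    scores = np.tanh(features @ W_f + decoder_state @ W_h) @ v  # (num_regions,)
    # Softmax, subtracting the max for numerical stability.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: attention-weighted average of the region features.
    context = weights @ features  # (feat_dim,)
    return context, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(12, 8))   # e.g. a 3x4 feature grid flattened to 12 regions
state = rng.normal(size=16)
W_f = rng.normal(size=(8, 10))
W_h = rng.normal(size=(16, 10))
v = rng.normal(size=10)

context, weights = attention_step(features, state, W_f, W_h, v)
```

In a full model the context vector would be fed to the decoder together with the previous token to predict the next one, and the weights reshaped to the feature grid are what an attention visualization would display.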

Quick Start & Requirements

Installation: Clone the repository.
Prerequisites: TensorFlow, Python, Pillow, NumPy, Node.js, KaTeX, pdflatex, ImageMagick (convert), Webkit2png.
Data: Download training data from Zenodo and follow preprocessing scripts (preprocess_images.py, preprocess_formulas.py, preprocess_filter.py, generate_latex_vocab.py).
Training: Run attention.py.
Evaluation: Use predict() function in attention.py or Predict.ipynb.
Resources: Training requires a GPU; the README reports results on an Nvidia M40.
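The vocabulary step named in the Data item can be pictured with a small sketch like the one below. It is not the repository's generate_latex_vocab.py: the function names, the special tokens, and the assumption that formulas are already tokenized with spaces between tokens are all hypothetical.

```python
from collections import Counter

def build_vocab(formulas, min_count=1,
                specials=("<pad>", "<start>", "<end>", "<unk>")):
    """Map each token to an integer id; special tokens get the lowest ids."""
    counts = Counter(tok for formula in formulas for tok in formula.split())
    # Keep tokens seen at least min_count times, in a deterministic order.
    tokens = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: i for i, tok in enumerate(list(specials) + tokens)}

def encode(formula, vocab):
    """Turn a space-tokenized formula into ids, wrapped in start/end markers."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in formula.split()]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]

formulas = [r"\frac { a } { b }", r"a ^ { 2 } + b ^ { 2 }"]
vocab = build_vocab(formulas)
ids = encode(r"\frac { a } { c }", vocab)  # "c" is unseen, so it maps to <unk>
```

Raising min_count is one common way to push rare tokens onto the unknown id and keep the decoder's output layer small.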

Highlighted Details

Implements the HarvardNLP "What You Get Is What You See" paper.
Addresses OpenAI's im2latex Request For Research.
Reports a negative log-likelihood (NLL) of 0.08 after 18 epochs of training on a 24GB Nvidia M40 GPU.
Includes scripts for data preprocessing, vocabulary generation, and evaluation.
Provides visualization of the attention mechanism.
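To put the reported NLL in context, it can be converted to perplexity. The sketch below assumes the 0.08 figure is an average per-token NLL measured in nats; the summary does not say this explicitly, nor whether the figure is a training or validation value.

```python
import math

def perplexity(nll_per_token):
    """Perplexity is the exponential of the average per-token NLL (in nats)."""
    return math.exp(nll_per_token)

ppl = perplexity(0.08)  # roughly 1.08 under the assumption above
```

Under that assumption, a perplexity close to 1 would mean the model is, on average, nearly certain of each next token.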

Maintenance & Community

The project is a personal implementation by ritheshkumar95, based on the HarvardNLP work. No specific community channels or active maintenance signals are evident in the README.

Licensing & Compatibility

The README does not explicitly state a license. The original HarvardNLP implementation is available under a permissive license. Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The preprocessing steps are extensive and must be run carefully. The project relies on several external tools (Node.js, KaTeX, pdflatex, ImageMagick, Webkit2png), which add complexity to the setup. The README does not report performance on hardware other than the Nvidia M40.

Explore Similar Projects

TokenFlow by ByteVisionLab

X-Omni by X-Omni-Team

T2I-CompBench by Karine-Huang

UltraPixel by catcathh

long_stable_diffusion by sharonzhou

BLIP3o by JiuhaiChen

Qwen-Image by QwenLM

image-gpt by openai

StableCascade by Stability-AI

guided-diffusion by openai

DeepSeek-OCR by deepseek-ai

stable-diffusion by CompVis