Image2Katex by xiaofengShi

OCR for converting formula images to LaTeX expressions

Created 6 years ago
293 stars

Top 90.1% on SourcePulse

Project Summary

This repository provides an upgraded OCR pipeline that converts images of mathematical formulas into LaTeX expressions. It is aimed at researchers and developers who digitize mathematical content and need LaTeX generated from visual input.

How It Works

The core approach utilizes a CNN for feature extraction, followed by a bidirectional RNN on the height dimension to capture contextual dependencies in the image. A GRU-based decoder with an attention mechanism then generates the LaTeX output. Positional encoding, inspired by Transformers, is incorporated into the CNN's final layer. The system handles data preprocessing, including tokenization, padding, and bucketing, to optimize training.
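The preprocessing steps named above (tokenization, padding, bucketing) can be sketched as follows. This is a minimal illustration, not the repository's code: the regex tokenizer, the `<end>` and `<pad>` markers, the function names, and the choice to bucket by token count are all assumptions made for the example. The actual project may bucket by image size instead.

```python
import re
from collections import defaultdict

# Assumed special tokens; the real vocabulary may use different markers.
PAD = "<pad>"
END = "<end>"

# Hypothetical tokenizer: a LaTeX command (\frac), an escaped symbol (\{),
# or any single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize(formula):
    """Split a LaTeX string into a list of tokens."""
    return TOKEN_RE.findall(formula)


def bucket_by_length(samples, boundaries):
    """Group token sequences under the smallest boundary that fits them.

    One slot is reserved for the END token. Sequences longer than every
    boundary are dropped.
    """
    buckets = defaultdict(list)
    for tokens in samples:
        needed = len(tokens) + 1
        for bound in sorted(boundaries):
            if needed <= bound:
                buckets[bound].append(tokens)
                break
    return dict(buckets)


def pad_batch(batch):
    """Append END to each sequence, then pad all of them to the same width."""
    width = max(len(tokens) for tokens in batch) + 1
    return [tokens + [END] + [PAD] * (width - len(tokens) - 1) for tokens in batch]


samples = [tokenize(r"\frac{a}{b}"), tokenize("x^2 + y")]
buckets = bucket_by_length(samples, boundaries=[6, 10])
padded = pad_batch(samples)
```

Grouping sequences of similar length keeps the amount of padding per batch small, which is the usual motivation for bucketing during training.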

Quick Start & Requirements

Highlighted Details

  • Implements three models: im2katex (image to LaTeX), errorchecker (syntax correction, currently deprecated), and dismodel (discriminator for improving LaTeX renderability).
  • Supports training on handwritten, printed, or merged datasets.
  • Offers both greedy and beam search decoding for LaTeX generation.
  • Pre-trained weights are available via BaiduDisk.
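
To illustrate the difference between the two decoding strategies listed above, here is a small self-contained sketch. It does not use the repository's model: `toy_step` is a hypothetical stand-in for the attention decoder that returns fixed next-token probabilities, and the function names and `<end>` marker are assumptions for the example.

```python
import math

END = "<end>"  # assumed end-of-sequence marker


def toy_step(prefix):
    """Hypothetical decoder stand-in: next-token probabilities for a prefix."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    # Any prefix not in the table ends the sequence with certainty.
    return table.get(tuple(prefix), {END: 1.0})


def greedy_decode(step, max_len=10):
    """Pick the single most likely token at every step."""
    seq, score = [], 0.0
    for _ in range(max_len):
        token, prob = max(step(seq).items(), key=lambda kv: kv[1])
        score += math.log(prob)
        if token == END:
            break
        seq.append(token)
    return seq, score


def beam_decode(step, beam_width=2, max_len=10):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [([], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, prob in step(seq).items():
                new_score = score + math.log(prob)
                if token == END:
                    finished.append((seq, new_score))
                else:
                    candidates.append((seq + [token], new_score))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    # Fall back to unfinished beams if nothing reached END within max_len.
    return max(finished or beams, key=lambda c: c[1])


greedy_seq, greedy_score = greedy_decode(toy_step)
beam_seq, beam_score = beam_decode(toy_step, beam_width=2)
```

In this toy setup greedy decoding commits to "a" first because it is locally most likely, while beam search keeps "b" alive and ends with a sequence of higher overall probability. Beam search costs roughly `beam_width` times more decoder calls per step.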

Maintenance & Community

  • The project references OpenAI's research and provides links to related GitHub repositories and explanations.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The presence of links to OpenAI's "Requests For Research" and other repositories suggests potential non-commercial or research-focused usage, but explicit licensing terms are absent.

Limitations & Caveats

The errorchecker model, intended for correcting LaTeX syntax errors, is noted as deprecated due to poor performance. The dismodel is an ongoing optimization effort. The lack of explicit licensing information may pose a barrier to commercial adoption.

Health Check

  • Last Commit: 5 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
