deep-text-recognition-benchmark  by roatienza

PyTorch code for scene text recognition research paper

Created 4 years ago
306 stars

Top 87.5% on SourcePulse

Project Summary

This repository provides PyTorch code for ViTSTR, a Vision Transformer-based model for fast and efficient scene text recognition. It offers accuracy comparable to state-of-the-art models with significantly fewer parameters and FLOPs, making it suitable for researchers and developers working on OCR and scene text recognition tasks.

How It Works

ViTSTR leverages a pre-trained Vision Transformer (ViT) architecture for scene text recognition. This approach capitalizes on the parallel processing capabilities inherent in ViTs, leading to faster inference times compared to traditional recurrent or convolutional models. The model is designed as a single-stage architecture, simplifying its implementation and training.
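
As a rough illustration of why a ViT-style recognizer can decode without a recurrent loop, the sketch below counts the patch tokens an encoder would see and maps per-position class ids to text in a single pass. It is a minimal sketch, not the repository's code: the function names, the 224x224 input with 16x16 patches, and the convention that ids 0 and 1 stand for the start ([GO]) and end tokens are all assumptions made for the example.

```python
# Illustrative sketch only; names, sizes, and token ids are assumptions,
# not the repository's API.

def count_patches(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens a ViT encoder would see."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def decode_parallel(token_ids, charset, go_id=0, end_id=1):
    """Map per-position class ids to text.

    Every position is assumed to be predicted in one forward pass, so
    decoding is a simple lookup: skip the start token, stop at the end token.
    """
    chars = []
    for tid in token_ids:
        if tid == go_id:
            continue
        if tid == end_id:
            break
        # ids 0 and 1 are reserved for the special tokens (assumed layout)
        chars.append(charset[tid - 2])
    return "".join(chars)


if __name__ == "__main__":
    print(count_patches(224, 224, 16))  # 196
    charset = "abcdefghijklmnopqrstuvwxyz"
    print(decode_parallel([0, 9, 10, 21, 1, 5, 5], charset))  # hit
```

Anything after the end token is ignored, which is why the trailing ids in the usage example do not appear in the output.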

Highlighted Details

  • Achieves competitive accuracy on various benchmarks (IIIT, SVT, IC03, IC15, etc.) with fewer parameters and FLOPs.
  • Demonstrates fast inference times: ~2.57 ms on Quadro RTX 6000, ~28 ms on CPU.
  • Offers quantized models optimized for x86 and Raspberry Pi 4.
  • Supports training from scratch with detailed configurations for data augmentation and multi-GPU setups.
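
Per-image latency figures like those above depend heavily on how they are measured. The following is a generic timing harness, offered as a sketch under stated assumptions rather than the benchmark procedure the authors used: `measure_latency_ms` and the stand-in workload are hypothetical names for this example. When timing a PyTorch model on a GPU, the timed callable would also need to call `torch.cuda.synchronize()`, because CUDA kernels run asynchronously.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=5, runs=50):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace with a single-image
    # forward pass of the model under test.
    def dummy():
        return sum(i * i for i in range(10_000))

    print(f"median latency: {measure_latency_ms(dummy):.3f} ms")
```

The median is used instead of the mean so that occasional slow runs (for example, from background processes) do not skew the reported figure.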

Maintenance & Community

  • The project is based on a fork of CLOVA AI Deep Text Recognition Benchmark.
  • The primary contributor is Rowel Atienza.
  • The paper associated with this work is "Vision Transformer for Fast and Efficient Scene Text Recognition" (ICDAR 2021).

Licensing & Compatibility

  • The README does not state a license. Because the repository is a fork of the CLOVA AI Deep Text Recognition Benchmark, the upstream project's license may apply; users should verify the terms before commercial use.

Limitations & Caveats

  • The README does not specify the exact license, which may pose a risk for commercial adoption.
  • Training requires significant computational resources, especially for larger models and extensive data augmentation.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
