PyTorch code for scene text recognition research paper
This repository provides PyTorch code for ViTSTR, a Vision Transformer-based model for fast and efficient scene text recognition. It offers accuracy comparable to state-of-the-art models with significantly fewer parameters and FLOPs, making it suitable for researchers and developers working on OCR and text detection tasks.
How It Works
ViTSTR uses a pre-trained Vision Transformer (ViT) architecture for scene text recognition. This approach takes advantage of the parallel processing inherent in ViTs, giving faster inference than traditional recurrent or convolutional models. The model is a single-stage architecture, which simplifies implementation and training.
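To make the single-stage idea more concrete, here is a minimal sketch in plain Python rather than the repository's actual code. It assumes the encoder splits the image into fixed-size patches and emits one score vector per character slot, so every slot can be decoded independently in a single pass. The names patch_count, greedy_decode, one_hot, CHARSET and STOP are hypothetical, and the 224x224 image size and 16-pixel patch size are illustrative values only.

```python
from typing import List, Sequence

# Hypothetical charset for illustration: index 0 is an assumed end-of-text marker.
STOP = "[s]"
CHARSET: List[str] = [STOP] + list("abcdefghijklmnopqrstuvwxyz0123456789")


def patch_count(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens a ViT-style encoder would see."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by patch size")
    return (height // patch) * (width // patch)


def greedy_decode(scores: Sequence[Sequence[float]],
                  charset: Sequence[str] = CHARSET) -> str:
    """Pick the best character for each slot independently, then cut at STOP.

    No slot depends on the previous prediction, which is what allows all
    slots to be computed in parallel (unlike a recurrent decoder).
    """
    chars = []
    for slot in scores:
        best = max(range(len(slot)), key=slot.__getitem__)
        if charset[best] == STOP:
            break
        chars.append(charset[best])
    return "".join(chars)


def one_hot(ch: str, charset: Sequence[str] = CHARSET) -> List[float]:
    """Stand-in for real model scores: full confidence in one character."""
    return [1.0 if c == ch else 0.0 for c in charset]


if __name__ == "__main__":
    # Assumed input geometry, for illustration only.
    print(patch_count(224, 224, 16))
    # Fake per-slot scores spelling "hi", then STOP, then an ignored slot.
    fake_scores = [one_hot(c) for c in ["h", "i", STOP, "x"]]
    print(greedy_decode(fake_scores))
```

In a real model the score vectors would come from the transformer's output tokens passed through a classification head; the decoding step itself stays this simple because there is no separate rectification, sequence modeling, or attention-decoder stage.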
Quick Start & Requirements
pip3 install -r requirements.txt
python3 infer.py --image <path_to_image> --model <model_url_or_path>
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
1 year ago
Inactive