PyTorch wrapper for multi-scale vision feature extraction
This repository provides S²-Wrapper, a PyTorch mechanism for extracting multi-scale features from any vision model. Instead of relying solely on larger models, it improves performance by scaling the input image resolution. It targets researchers and developers working with vision models, particularly in multimodal settings.
How It Works
S²-Wrapper works by wrapping a vision model's forward pass. It resizes the input image to each specified scale, splits larger-scale images into sub-images that the base model can process at its native input size, runs the model on each, and concatenates the resulting features from all scales. This lets the model capture both fine detail and broader context across resolutions without any architectural changes to the base model.
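The resize, split, run, merge, and concatenate steps can be illustrated with a small NumPy sketch on a single (C, H, W) image. This is not the S²-Wrapper API: every function here (`resize_nearest`, `split_tiles`, `merge_tiles`, `avg_pool`, `multiscale_features`, `toy_model`) is hypothetical, scales are assumed to be positive integers, and we assume larger-scale feature maps are average-pooled back to the base-scale feature-map size before channel-wise concatenation. The real wrapper operates on PyTorch tensors and its details may differ.

```python
import numpy as np

# Hypothetical helpers illustrating the multi-scale idea; NOT the s2wrapper API.

def resize_nearest(img, size):
    """Nearest-neighbor resize of a (C, H, W) array to (C, size, size)."""
    _, h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[:, rows][:, :, cols]

def split_tiles(img, tile):
    """Split (C, H, W) into non-overlapping (C, tile, tile) tiles in row-major order."""
    _, h, w = img.shape
    n_h, n_w = h // tile, w // tile
    tiles = [img[:, i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
             for i in range(n_h) for j in range(n_w)]
    return tiles, n_h, n_w

def merge_tiles(feats, n_h, n_w):
    """Stitch a row-major list of (D, h, w) feature maps into one (D, n_h*h, n_w*w) map."""
    rows = [np.concatenate(feats[i * n_w:(i + 1) * n_w], axis=2) for i in range(n_h)]
    return np.concatenate(rows, axis=1)

def avg_pool(feat, out):
    """Average-pool (D, H, W) to (D, out, out); assumes H and W are divisible by out."""
    d, h, w = feat.shape
    return feat.reshape(d, out, h // out, out, w // out).mean(axis=(2, 4))

def multiscale_features(model, img, scales=(1, 2), base_size=4):
    """Run `model` at every scale and concatenate the pooled features along channels."""
    outs = []
    for s in scales:
        x = resize_nearest(img, base_size * s)         # upscale the image
        tiles, n_h, n_w = split_tiles(x, base_size)    # sub-images at native input size
        feats = [model(t) for t in tiles]              # same backbone on every sub-image
        fmap = merge_tiles(feats, n_h, n_w)            # reassemble the large feature map
        outs.append(avg_pool(fmap, feats[0].shape[1]))  # pool back to base-scale size
    return np.concatenate(outs, axis=0)

def toy_model(x):
    """Hypothetical stand-in backbone: (3, 4, 4) image -> (2, 2, 2) feature map."""
    pooled = avg_pool(x, 2)
    proj = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]])  # fixed channel projection
    return np.einsum("dc,chw->dhw", proj, pooled)

rng = np.random.default_rng(0)
image = rng.random((3, 4, 4))
features = multiscale_features(toy_model, image, scales=(1, 2), base_size=4)
print(features.shape)  # (4, 2, 2): 2 channels per scale x 2 scales
```

Because the toy backbone is linear and the upsampling is nearest-neighbor, both scales yield identical features here, which is a quick sanity check that the split and merge steps line up. A real backbone would extract different detail at each scale, which is where the extra channels become useful.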
Quick Start & Requirements
pip install git+https://github.com/bfshi/scaling_on_scales.git
dev_any_shape
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats