scaling_on_scales by bfshi

PyTorch wrapper for multi-scale vision feature extraction

Created 1 year ago

412 stars

Top 70.8% on SourcePulse

Project Summary

This repository provides S²-Wrapper, a PyTorch mechanism for extracting multi-scale features from any vision model. Its premise is that scaling the input image resolution, rather than relying solely on larger models, can improve performance. It targets researchers and developers working with vision models, particularly in multimodal settings.

How It Works

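S²-Wrapper wraps a given vision model's forward pass. For each requested scale, it interpolates the input image to that resolution. Images larger than the model's native input size are split into sub-images of the native size, each sub-image is passed through the model, and the resulting feature maps are merged back together. The feature maps from all scales are then average-pooled to the size of the base-scale map and concatenated along the channel dimension. This lets an unchanged base model capture both fine detail (from the larger scales) and broader context (from the base scale) without architectural changes.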
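The steps above can be sketched in plain PyTorch. This is a simplified re-implementation for illustration, not the library's code: the helper names (`split_into_tiles`, `merge_tiles`, `multiscale_forward_sketch`) are hypothetical, and the sketch assumes a square input at the model's native resolution, integer scale factors, and a model that returns a (B, C, h, w) feature map. ViT backbones usually return token sequences, which would first need reshaping into a map.

```python
import torch
import torch.nn.functional as F


def split_into_tiles(x: torch.Tensor, n: int) -> torch.Tensor:
    """Split (B, C, H, W) into n x n tiles, stacked along the batch dim in row-major order."""
    b, c, h, w = x.shape
    assert h % n == 0 and w % n == 0, "image size must be divisible by the split count"
    th, tw = h // n, w // n
    tiles = [x[:, :, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(n) for j in range(n)]
    return torch.cat(tiles, dim=0)  # (n*n*B, C, th, tw)


def merge_tiles(x: torch.Tensor, n: int) -> torch.Tensor:
    """Inverse of split_into_tiles: stitch n x n tiles back into one map per image."""
    b = x.shape[0] // (n * n)
    tiles = x.split(b, dim=0)  # n*n chunks of size B, row-major
    rows = [torch.cat(tiles[i * n:(i + 1) * n], dim=-1) for i in range(n)]
    return torch.cat(rows, dim=-2)


def multiscale_forward_sketch(model: torch.nn.Module, x: torch.Tensor,
                              scales=(1, 2)) -> torch.Tensor:
    base = x.shape[-1]  # assumed to be the model's native input size
    feats_per_scale = []
    for s in scales:
        size = base * s
        xs = F.interpolate(x, size=(size, size), mode="bicubic", align_corners=False)
        tiles = split_into_tiles(xs, s)    # s*s sub-images, each at native size
        feats = model(tiles)               # (s*s*B, C, h, w)
        feats_per_scale.append(merge_tiles(feats, s))  # (B, C, s*h, s*w)
    h, w = feats_per_scale[0].shape[-2:]
    # Average-pool every scale down to the base-scale map size, then concat channels.
    pooled = [F.adaptive_avg_pool2d(f, (h, w)) for f in feats_per_scale]
    return torch.cat(pooled, dim=1)
```

With a toy patch-embedding convolution standing in for the backbone (for example `torch.nn.Conv2d(3, 8, kernel_size=8, stride=8)` on 32×32 inputs), scales (1, 2) return a (B, 16, 4, 4) tensor: the spatial size of the base scale with twice the channels.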

Quick Start & Requirements

Install via pip: pip install git+https://github.com/bfshi/scaling_on_scales.git
Requires PyTorch.
Supports non-square images (experimental branch dev_any_shape).
Official integration and checkpoints available for LLaVA and NVIDIA VILA.
Paper: https://arxiv.org/abs/2403.13043

Highlighted Details

Enables multi-scale feature extraction with a single line of code.
Integrated into LLaVA and NVIDIA VILA, with performance benchmarks provided.
Supports dynamic aspect ratio processing via Dynamic-S² in NVILA.
Offers options for splitting large images to manage memory usage.
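The bookkeeping behind these points can be checked with a few lines of arithmetic. The sketch below is illustrative only: `s2_plan` is a hypothetical helper, and it assumes a ViT-style backbone with square patches, integer scale factors, and sub-images of the native input size. Because every scale is pooled back to the base-scale token grid, the token count seen by downstream modules stays fixed while the channel dimension grows with the number of scales, and backbone compute grows with the sum of squared scale factors.

```python
from typing import Dict, List, Sequence, Tuple


def s2_plan(base_size: int, scales: Sequence[int], patch_size: int,
            hidden_dim: int) -> Tuple[List[Dict[str, int]], Tuple[int, int]]:
    """Per-scale bookkeeping for an S²-style pass over a ViT-like backbone (illustrative)."""
    grid = base_size // patch_size      # patches per side of one sub-image
    tokens_per_tile = grid * grid       # tokens the backbone emits per sub-image
    plan = []
    for s in scales:
        tiles = s * s                   # image at scale s is split into s x s sub-images
        plan.append({
            "scale": s,
            "image_size": base_size * s,
            "sub_images": tiles,
            "tokens_processed": tiles * tokens_per_tile,
        })
    # Every scale is pooled back to the base grid, then concatenated channel-wise.
    output_shape = (tokens_per_tile, hidden_dim * len(scales))
    return plan, output_shape


if __name__ == "__main__":
    # Example assumption: a ViT-L/14-style backbone at 224 px (16 x 16 = 256 tokens, width 1024).
    plan, out = s2_plan(base_size=224, scales=[1, 2, 3], patch_size=14, hidden_dim=1024)
    for row in plan:
        print(row)
    total = sum(r["tokens_processed"] for r in plan)
    print("output (tokens, channels):", out)  # (256, 3072)
    print("backbone cost vs. single scale:", total / plan[0]["tokens_processed"])  # 14.0
```

Splitting keeps each forward pass at the native input size, so per-pass memory stays bounded even though total compute still grows with the square of the scale factor.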

Maintenance & Community

Accepted to ECCV 2024.
The README lists open to-dos for new checkpoints and features.
Integrations with major projects like LLaVA and NVIDIA VILA indicate community adoption.

Licensing & Compatibility

The repository itself does not explicitly state a license in the README. Code published on GitHub without a license is not permissively licensed by default, so check the repository for a LICENSE file before reuse.
Compatible with standard PyTorch vision models.

Limitations & Caveats

Support for non-square images is noted as experimental.
Training requires specific configuration changes to existing frameworks like LLaVA.

Health Check

Last Commit

9 months ago

Responsiveness

1 day

Pull Requests (30d)

0

Issues (30d)

0

Star History

0 stars in the last 30 days

Explore Similar Projects

imagenie by zhongweili

Desktop app for AI-powered image transformations

Created 1 year ago

Updated 8 months ago

PixelOE by KohakuBlueleaf

Python library for detail-oriented pixel art generation from images

Created 1 year ago

Updated 3 months ago

Starred by

Omar Sanseviero (DevRel at Google DeepMind).

segment-anything-with-clip by Curt-Park

Segmentation pipeline combining Segment Anything Model (SAM) with CLIP

Created 2 years ago

Updated 1 year ago

CLIP-ImageSearch-NCNN by EdVince

Image search demo using natural language queries

Created 3 years ago

Updated 2 years ago

Awesome-CV-MasterHub by cuixing158

CV paper list for recent computer vision research

Created 9 months ago

Updated 2 days ago

Starred by

Jesse Clark (Cofounder of Marqo) and Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA).

RADIO by NVlabs

Vision foundation model for distilling large models

Created 2 years ago

Updated 4 days ago

Palette-Image-to-Image-Diffusion-Models by Janspiry

PyTorch image-to-image diffusion model implementation

Created 3 years ago

Updated 2 years ago

Starred by

Elie Bursztein (Cybersecurity Lead at Google DeepMind) and Robin Huang (Cofounder of Comfy Org).

ComfyUI-Impact-Pack by ltdrdata

ComfyUI custom nodes for image enhancement via detectors, detailers, upscalers

Created 2 years ago

Updated 1 week ago

EfficientDet by xuannianz

Keras/TensorFlow implementation for EfficientDet object detection

Created 6 years ago

Updated 2 years ago

Starred by

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Kevin Hou (Head of Product Engineering at Windsurf).

ImageAI by OlafenwaMoses

Python library for computer vision tasks

Created 7 years ago

Updated 1 year ago

Starred by

Anastasios Angelopoulos (Cofounder of LMArena), Chenlin Meng (Cofounder of Pika), and 1 more.

Pytorch-UNet by milesial

PyTorch implementation for image semantic segmentation

Created 8 years ago

Updated 1 year ago

Starred by

Clement Delangue (Cofounder of Hugging Face), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 26 more.

pytorch-image-models by huggingface

PyTorch image model collection with training, eval, and inference scripts

Created 6 years ago

Updated 3 days ago
