VisionLLaMA by Meituan-AutoML

Vision transformer research paper

created 1 year ago
386 stars

Top 74.1% on SourcePulse

Project Summary

VisionLLaMA presents a unified LLaMA-like transformer backbone for diverse vision tasks, including perception and generation. It aims to provide a strong, generic baseline for vision research by adapting the successful transformer architecture from LLMs to image processing.
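Language-model transformers consume a 1D sequence of tokens, so a vision backbone must first turn an image into a sequence. The sketch below shows the common ViT-style approach: cut the image into non-overlapping square patches and flatten each patch into one token vector. It is an illustrative assumption, not code from the VisionLLaMA repository. The function name `patchify` and the nested-list image layout (`image[y][x]` is a list of channel values) are hypothetical, and a real model would follow this step with a learned linear projection to the model width.

```python
from typing import List

# Hypothetical image layout for this sketch: image[y][x] is a list of channel values.
Image = List[List[List[float]]]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles.

    Tiles are visited in row-major order, and each tile is flattened
    (pixel rows top to bottom, channels last) into one token vector of
    length patch * patch * C.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens: List[List[float]] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token: List[float] = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    token.extend(image[y][x])
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    # 4 x 4 single-channel toy image whose pixel value is its raster index.
    toy = [[[float(4 * y + x)] for x in range(4)] for y in range(4)]
    print(patchify(toy, 2))
    # → [[0.0, 1.0, 4.0, 5.0], [2.0, 3.0, 6.0, 7.0], [8.0, 9.0, 12.0, 13.0], [10.0, 11.0, 14.0, 15.0]]

    # A 224 x 224 RGB image with 16 x 16 patches yields 196 tokens of length 768.
    rgb = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
    tokens = patchify(rgb, 16)
    print(len(tokens), len(tokens[0]))  # → 196 768
```

After this step the image is just a sequence of 196 vectors, which is the input shape an LLM-style transformer stack expects.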

How It Works

VisionLLaMA adapts the transformer architecture behind large language models such as LLaMA to 2D image processing. It comes in two forms, a plain transformer and a pyramid transformer, both built from the same LLaMA-like design and tailored to visual data. This unified design lets one model family handle a wide range of vision tasks, and the authors claim substantial gains over existing vision transformers.
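LLaMA encodes token positions with rotary position embeddings (RoPE), which rotate pairs of query/key channels by an angle proportional to a 1D position. An image token has a row and a column instead. One natural 2D extension, assumed here because this summary does not say which position encoding VisionLLaMA uses, rotates half of the channels by the row index and the other half by the column index. The function names and the `base=10000.0` frequency constant (the value used by the original RoPE) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def rope_1d(vec: Sequence[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotate each channel pair (x[2k], x[2k+1]) by the angle pos * base**(-2k/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("rope_1d needs an even number of channels")
    out: List[float] = []
    for i in range(0, d, 2):  # i = 2k
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        out.extend((x0 * c - x1 * s, x0 * s + x1 * c))
    return out


def rope_2d(vec: Sequence[float], row: float, col: float, base: float = 10000.0) -> List[float]:
    """Hypothetical 2D RoPE: the first half of the channels encodes the row, the second half the column."""
    if len(vec) % 4:
        raise ValueError("rope_2d needs a channel count divisible by 4")
    half = len(vec) // 2
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5, -0.7, 0.1, 1.1, -0.4]
    k = [1.0, 0.2, -0.6, 0.9, 0.4, -1.3, 0.2, 0.7]
    # Both query/key pairs are offset by (2 rows, 3 columns), so the attention
    # logits agree up to floating-point error: the score depends only on the
    # relative 2D position, not on where the pair sits in the image.
    print(dot(rope_2d(q, 3, 5), rope_2d(k, 1, 2)))
    print(dot(rope_2d(q, 2, 3), rope_2d(k, 0, 0)))
```

Because each rotation preserves vector length and composes by angle subtraction, the query-key product only sees the row and column offsets, which is the 2D analogue of the relative-position property RoPE gives LLaMA in 1D.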

Quick Start & Requirements

  • Pre-training instructions are available in PRETRAIN.md.
  • Separate files give instructions for ImageNet-1K supervised training, ADE20K segmentation, and COCO detection.
  • Details for DiTLLaMA and SiTLLaMA, the image-generation variants, are in their respective markdown files.

Highlighted Details

  • Unified LLaMA-like backbone for vision tasks.
  • Plain and pyramid variants available.
  • Evaluated on image perception and generation tasks.
  • Claims substantial gains over prior state-of-the-art vision transformers.
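The plain and pyramid variants differ mainly in how the token grid evolves through the network. A plain (ViT-style) backbone keeps one resolution from the first block to the last, while a pyramid backbone starts from smaller patches and downsamples between stages, producing the multi-scale features that detection and segmentation heads commonly use. The sketch below only counts token-grid sizes. The 224-pixel input, patch sizes of 16 and 4, four stages, and 2x downsampling are typical settings assumed for illustration, not values taken from the VisionLLaMA README, and `token_grids` is a hypothetical helper.

```python
from typing import List, Tuple


def token_grids(image_size: int, patch: int, stages: int, downsample: int) -> List[Tuple[int, int]]:
    """Return the (rows, cols) token grid at the start of each stage.

    downsample=1 models a plain backbone (stages are just groups of blocks
    and the resolution never changes); downsample=2 models a pyramid backbone
    that halves each side of the grid between stages.
    """
    if image_size % patch:
        raise ValueError("image_size must be divisible by patch")
    if downsample < 1:
        raise ValueError("downsample must be >= 1")
    side = image_size // patch
    grids: List[Tuple[int, int]] = []
    for _ in range(stages):
        grids.append((side, side))
        side //= downsample
    return grids


if __name__ == "__main__":
    for name, grids in (
        ("plain  ", token_grids(224, patch=16, stages=4, downsample=1)),
        ("pyramid", token_grids(224, patch=4, stages=4, downsample=2)),
    ):
        print(name, [rows * cols for rows, cols in grids])
    # → plain   [196, 196, 196, 196]
    # → pyramid [3136, 784, 196, 49]
```

With these assumed settings, the pyramid's first stage holds 16 times as many tokens as the plain backbone's grid, and its last stage a quarter as many.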

Maintenance & Community

The project is associated with ECCV 2024. Further community or maintenance details are not specified in the provided README.

Licensing & Compatibility

The license type and compatibility for commercial or closed-source use are not specified in the provided README.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or the project's maturity level (e.g., alpha or beta status).

Health Check
Last commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 90 days
