VisionLLaMA by Meituan-AutoML

Vision transformer research paper

Created 1 year ago
389 stars

Project Summary

VisionLLaMA presents a unified LLaMA-like transformer backbone for diverse vision tasks, including perception and generation. It aims to provide a strong, generic baseline for vision research by adapting the successful transformer architecture from LLMs to image processing.

How It Works

VisionLLaMA adapts the transformer architecture used in large language models such as LLaMA to 2D image processing. It introduces both a plain and a pyramid form of this LLaMA-like vision transformer, tailored to visual data. Because one backbone design serves a wide range of vision tasks, it can act as a common baseline for both perception and generation; the authors claim substantial gains over existing vision transformers.
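LLaMA encodes token order with rotary position embeddings (RoPE) over a 1D sequence. This summary does not say exactly how VisionLLaMA handles positions on a 2D patch grid, so the NumPy sketch below assumes one common extension, axial 2D RoPE: half of each query/key channel vector is rotated by the patch's row index and the other half by its column index. The helper names (`rope_angles`, `apply_rotary`, `rope_2d`) are hypothetical and not part of the project's code.

```python
import numpy as np

# Hypothetical axial 2D RoPE sketch (an assumption, not VisionLLaMA's implementation).


def rope_angles(positions: np.ndarray, dim: int, base: float = 10000.0) -> np.ndarray:
    """Rotation angles for `dim` channels (dim/2 channel pairs) at each integer position."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # shape (dim/2,)
    return np.outer(positions, inv_freq)                      # shape (N, dim/2)


def apply_rotary(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate each channel pair (x[:, 2i], x[:, 2i+1]) by angles[:, i]."""
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def rope_2d(x: np.ndarray, height: int, width: int, base: float = 10000.0) -> np.ndarray:
    """Apply axial 2D RoPE to row-major patch features of shape (height*width, dim).

    The first half of the channels encodes the row index, the second half the column index.
    """
    n, dim = x.shape
    if n != height * width or dim % 4 != 0:
        raise ValueError("expected (height*width, dim) input with dim divisible by 4")
    rows, cols = np.divmod(np.arange(n), width)  # row-major (row, col) of each patch
    half = dim // 2
    out = np.empty_like(x)
    out[:, :half] = apply_rotary(x[:, :half], rope_angles(rows, half, base))
    out[:, half:] = apply_rotary(x[:, half:], rope_angles(cols, half, base))
    return out
```

In attention, the same rotation would be applied to queries and keys (not values), so each query-key dot product depends on the relative row and column offset between two patches rather than on their absolute positions.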

Quick Start & Requirements

  • Pre-training instructions are available in PRETRAIN.md.
  • Instructions for ImageNet-1K supervised training, ADE20K semantic segmentation, and COCO object detection are provided in separate files.
  • Details for DiTLLaMA and SiTLLaMA, the image-generation variants built on the DiT and SiT frameworks, are in their respective markdown files.

Highlighted Details

  • Unified LLaMA-like backbone for vision tasks.
  • Plain and pyramid variants available.
  • Evaluated on image perception and generation tasks.
  • Claims substantial gains over prior state-of-the-art vision transformers.
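The plain and pyramid variants differ mainly in how the token grid evolves through the network. The sketch below works through that arithmetic for a 224×224 input under assumed, typical settings: a plain (ViT-style) backbone with 16×16 patches and 12 blocks, and a four-stage pyramid backbone with 4×4 initial patches and 2× downsampling per stage. These numbers are illustrative assumptions, not VisionLLaMA's published configuration, and the function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical token-grid arithmetic for a plain vs. a pyramid backbone.
# Patch sizes, depth, stage count and the 2x downsampling factor are assumptions.


def plain_grid(image_size: int, patch: int, depth: int) -> list[tuple[int, int]]:
    """Plain backbone: one patch embedding, the same token grid in every block."""
    side = image_size // patch
    return [(side, side)] * depth


def pyramid_grids(image_size: int, first_patch: int, num_stages: int,
                  downsample: int = 2) -> list[tuple[int, int]]:
    """Pyramid backbone: the token grid shrinks by `downsample` after each stage."""
    side = image_size // first_patch
    grids = []
    for _ in range(num_stages):
        grids.append((side, side))
        side //= downsample
    return grids


if __name__ == "__main__":
    h, w = plain_grid(224, patch=16, depth=12)[0]
    print(f"plain: {h}x{w} = {h * w} tokens in every block")  # 14x14 = 196
    # Pyramid stages: 56x56, 28x28, 14x14, 7x7 (3136, 784, 196, 49 tokens)
    for stage, (h, w) in enumerate(pyramid_grids(224, first_patch=4, num_stages=4), 1):
        print(f"pyramid stage {stage}: {h}x{w} = {h * w} tokens")
```

A pyramid backbone yields feature maps at several scales, which is what dense-prediction heads for tasks such as ADE20K segmentation and COCO detection typically consume; a plain backbone keeps a single scale throughout.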

Maintenance & Community

The project is associated with ECCV 2024. Further community or maintenance details are not specified in the provided README.

Licensing & Compatibility

The license type and compatibility for commercial or closed-source use are not specified in the provided README.

Limitations & Caveats

The README does not describe specific limitations, known bugs, or the project's maturity level (e.g., alpha or beta status).

Health Check
  • Last commit: 1 year ago
  • Responsiveness: inactive
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 0
  • Star history: 2 new stars in the last 30 days

Explore Similar Projects

lens by ContextualAI
Vision-language research paper using LLMs
353 stars; created 2 years ago, updated 2 months ago
Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Douwe Kiela (Cofounder of Contextual AI), and 1 more.

Awesome-Visual-Transformer by dk-liang
Vision transformer paper collection
4k stars; created 4 years ago, updated 9 months ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

taming-transformers by CompVis
Image synthesis research paper using transformers
6k stars; created 4 years ago, updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 15 more.