LLM-in-Vision by DirtyHarryLYL

Curated list of LLM-based CV and multimodal research papers

Created 2 years ago

875 stars

Top 41.1% on SourcePulse

Project Summary

This repository is a curated collection of recent research papers and projects at the intersection of Large Language Models (LLMs) and Computer Vision (CV). It aims to give researchers and practitioners a comprehensive overview of advances in multimodal AI.

How It Works

The project functions as a continuously updated bibliography. It collects papers and organizes them by publication date and topic, focusing on how LLMs are integrated with or applied to visual tasks. The collection highlights novel approaches in areas such as visual reasoning, image and video generation, robotic control, and multimodal understanding.

Highlighted Details

Extensive coverage of LLM applications in CV, spanning from foundational research to specific task-oriented solutions.
Categorization by publication date, allowing users to track the latest trends and developments.
Links to papers and project pages for direct access to research artifacts and code.
Focus on emerging areas such as embodied AI, robotic manipulation, and multimodal instruction tuning.

Maintenance & Community

The repository is actively maintained by DirtyHarryLYL, with an open invitation for contributions and comments from the community.

Licensing & Compatibility

The repository itself is a collection of links and information; licensing terms are those of the individual linked projects.

Limitations & Caveats

This is a curated list of research papers and does not provide a unified codebase or framework for direct use. Users must refer to individual linked projects for their specific requirements, dependencies, and licensing.

Explore Similar Projects

Awesome-Multimodality by Yutong-Zhou-cv

Awesome_Multimodel_LLM by Atomic-man007

Awesome-Multimodal-Papers by friedrichor

RoboBrain by FlagOpen

Awesome-Unified-Multimodal-Models by AIDC-AI

Multimodal-AND-Large-Language-Models by Yangyi-Chen

Awesome-VLA4AD by JohnsonJiang1996

Awesome_Matching_Pretraining_Transfering by Paranioar

Awesome-LLM-Eval by onejune2018

Awesome-RL-based-Reasoning-MLLMs by Sun-Haoyuan23

VisionLLM by OpenGVLab

Awesome-Multimodal-Research by Eurus-Holmes