LLM-in-Vision by DirtyHarryLYL

Curated list of LLM-based CV and multimodal research papers

Created 2 years ago · 869 stars · Top 42.2% on sourcepulse

View on GitHub
Project Summary

This repository serves as a curated collection of recent research papers and projects at the intersection of Large Language Models (LLMs) and Computer Vision (CV). It aims to provide a comprehensive overview of advancements in multimodal AI, particularly for researchers and practitioners in the field.

How It Works

The project functions as a continuously updated bibliography. It aggregates and categorizes papers by publication date and topic, focusing on how LLMs are integrated with or applied to visual tasks. The collection highlights novel approaches in areas such as visual reasoning, image and video generation, robotic control, and multimodal understanding.
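
Because the list is maintained as a single markdown README, its entries can also be consumed programmatically. The snippet below is a hypothetical sketch, not part of the repository: it fetches the raw README and extracts the markdown links (paper titles and URLs). The raw-file URL and the link pattern are assumptions about the file's layout.

    import re
    import urllib.request

    # Raw README of the list (URL assumed; adjust the branch or path
    # if the repository layout differs).
    README_URL = "https://raw.githubusercontent.com/DirtyHarryLYL/LLM-in-Vision/main/README.md"

    with urllib.request.urlopen(README_URL) as resp:
        text = resp.read().decode("utf-8")

    # Markdown links have the form [title](url); capture both parts.
    LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

    # Print the first few extracted entries.
    for title, url in LINK_RE.findall(text)[:10]:
        print(f"{title} -> {url}")

Filtering the extracted entries by the year and month headings in the README would recover the same by-date organization described above.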

Highlighted Details

  • Extensive coverage of LLM applications in CV, ranging from foundational research to task-specific solutions.
  • Categorization by publication date, allowing users to track the latest trends and developments.
  • Links to papers and project pages for direct access to research artifacts and code.
  • Focus on emerging areas such as embodied AI, robotic manipulation, and multimodal instruction tuning.

Maintenance & Community

The repository is actively maintained by DirtyHarryLYL, with an open invitation for contributions and comments from the community.

Licensing & Compatibility

The repository itself is a collection of links and descriptions; licensing details pertain to the individual projects linked within.

Limitations & Caveats

This is a curated list of research papers and does not provide a unified codebase or framework for direct use. Users must refer to individual linked projects for their specific requirements, dependencies, and licensing.

Health Check

Last commit: 4 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 11 stars in the last 90 days
