Awesome-Video-LMM-Post-Training by yunlong10

Curated research on advanced video reasoning with large multimodal models

Created 8 months ago
255 stars

Top 98.7% on SourcePulse

View on GitHub
Project Summary

This repository serves as a comprehensive, curated "Awesome List" for researchers and developers focused on advancing the reasoning capabilities of Video Large Multimodal Models (Video-LMMs) through post-training techniques. It systematically tracks the latest papers, code, and datasets, offering a structured overview of cutting-edge research to accelerate development in this domain.

How It Works

The project categorizes Video-LMM post-training research into three primary paradigms:

  • Reinforced Video-LMMs — reinforcement learning techniques (e.g., RLHF, DPO, GRPO) and reward models for alignment.
  • SFT for Reasoning — supervised fine-tuning on reasoning-centric datasets with structured formats such as Chain-of-Thought (CoT).
  • Test-Time Scaling — inference-time strategies such as agentic frameworks, tool use, retrieval-augmented generation (RAG), and long CoT.

This taxonomy provides a clear framework for comparing diverse approaches to enhancing video understanding and reasoning.
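To make the first paradigm concrete, here is a minimal sketch of the Direct Preference Optimization (DPO) loss for a single preference pair, one of the alignment objectives the list covers. The function name and scalar inputs are illustrative assumptions, not code from any listed paper; real implementations operate on batched per-token log-probabilities.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are total log-probabilities of the chosen and rejected
    responses under the trainable policy and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(margin): small when the policy cleanly ranks
    # chosen above rejected; log(2) when it is indifferent.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal log-probabilities everywhere the margin is zero and the loss is log 2 ≈ 0.693; as the policy learns to favor the chosen response, the loss decreases toward zero.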

Quick Start & Requirements

This repository is a curated list of research resources and does not provide direct installation or execution commands. Users are directed to individual papers for implementation details, dependencies, and setup instructions.

Highlighted Details

  • Systematic curation of three key post-training paradigms: Reinforcement Learning, Supervised Fine-Tuning, and Test-Time Scaling.
  • Inclusion of the latest and most challenging benchmarks specifically designed to evaluate complex Video-LMM reasoning abilities.
  • Regular updates with recent papers, code repositories, and datasets in the rapidly evolving Video-LMM field.

Maintenance & Community

The repository was initially released in June 2025 and features a survey paper published in October 2025. It actively encourages community involvement, welcoming contributions via Pull Requests.

Licensing & Compatibility

The provided README content does not specify an open-source license. Without one, reuse of the repository's content in commercial or proprietary contexts is legally unclear and would require clarification from the maintainers.

Limitations & Caveats

As a curated list, this repository does not offer runnable code or direct implementations. Users must consult individual research papers for specific technical requirements, dependencies, and performance metrics. The focus is exclusively on post-training methodologies; foundational model development and pre-training are out of scope.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 30 days
