Awesome-Multimodal-Next-Token-Prediction by LMM101

Survey of next token prediction for multimodal intelligence

Created 1 year ago

476 stars

Top 63.4% on SourcePulse

Project Summary

This repository serves as a comprehensive survey of Next Token Prediction (NTP) techniques applied to multimodal intelligence, covering advancements in vision and audio processing. It's a valuable resource for researchers and practitioners exploring the integration of language models with other modalities for enhanced understanding and generation tasks.

How It Works

The survey categorizes and details various approaches to multimodal NTP, focusing on tokenization strategies for vision and audio, state-of-the-art multimodal models, and prompt engineering techniques like In-Context Learning (ICL) and Chain-of-Thought (CoT). It highlights how NTP has become a versatile objective for tasks ranging from image captioning to speech synthesis.
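
To make the NTP objective concrete, here is a minimal sketch in plain Python. It is not taken from the survey or from any linked repository. It assumes a single shared vocabulary in which some ids are text tokens and others stand in for quantized image codes; the names `next_token_loss`, `uniform_model`, and `VOCAB_SIZE` and the id ranges are hypothetical. The loss is the average cross-entropy of predicting each token from the tokens before it, whatever modality the token came from.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(tokens, predict_logits):
    """Average cross-entropy of predicting tokens[t] from tokens[:t]."""
    losses = []
    for t in range(1, len(tokens)):
        probs = softmax(predict_logits(tokens[:t]))
        losses.append(-math.log(probs[tokens[t]]))
    return sum(losses) / len(losses)


# Hypothetical shared vocabulary: ids 0-3 are text tokens,
# ids 4-7 stand in for quantized image-patch codes.
VOCAB_SIZE = 8


def uniform_model(prefix):
    """Placeholder model that assigns equal scores to every token."""
    return [0.0] * VOCAB_SIZE


# Text tokens, then image tokens, then a text token, in one sequence.
sequence = [0, 2, 4, 6, 5, 1]
loss = next_token_loss(sequence, uniform_model)
print(f"average next-token loss: {loss:.4f}")
```

With the uniform placeholder model the loss equals the natural log of the vocabulary size; a trained model would replace `uniform_model` with a network that conditions on the prefix.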

Quick Start & Requirements

This repository is a curated collection of papers and associated code repositories. There are no direct installation or execution commands for the repository itself. Users are directed to individual linked repositories for specific setup instructions and dependencies, which will vary by project.

Highlighted Details

Comprehensive survey of multimodal Next Token Prediction (NTP).
Covers tokenization, models, and prompt engineering for vision and audio.
Includes links to numerous research papers and their corresponding GitHub repositories.
Features recent advancements up to late 2024.

Maintenance & Community

The survey was released on arXiv and GitHub on December 30, 2024. The authors encourage pull requests for seasonal updates to include the latest research.

Licensing & Compatibility

The repository itself does not specify a license. Individual linked repositories will have their own licenses, which may include restrictions on commercial use or linking with closed-source software.

Limitations & Caveats

As a survey, this repository does not provide executable code or models directly. Users must refer to the linked external repositories for implementation details and potential compatibility issues. Because research in this area moves quickly, the survey may not cover work published after its release until the list is updated.