Framework for multi-modal large language model (MLLM) training and evaluation
LAMM provides a framework and dataset for training and evaluating MLLMs, enabling the development of AI agents that bridge human ideas and machine execution. It targets researchers and developers who want to build and test multi-modal AI systems.
How It Works
LAMM focuses on language-assisted multi-modal instruction tuning, allowing models to understand and respond to complex instructions involving both text and visual inputs. The framework supports 2D and 3D tasks, facilitating the creation of agents capable of diverse applications, from image quality assessment to embodied AI in simulated environments like Minecraft.
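Concretely, instruction-tuning data of this kind pairs each visual input with dialogue-style instructions and target responses. The sketch below is a minimal, hypothetical illustration of that structure; the field names, file path, and helper function are assumptions for illustration, not LAMM's actual schema or API.

```python
# Hypothetical sketch (not LAMM's actual API): the general shape of a
# language-assisted multi-modal instruction-tuning sample, pairing a visual
# input with a natural-language instruction and a target response.
sample = {
    "id": "example_0001",
    "image": "images/000001.jpg",  # 2D input; a 3D task would reference a point cloud instead
    "conversations": [
        {"from": "human", "value": "Describe the scene and assess the image quality."},
        {"from": "assistant", "value": "A street at dusk; the photo is sharp with mild sensor noise."},
    ],
}


def to_prompt(sample: dict) -> str:
    """Flatten the image reference and dialogue turns into one training prompt string."""
    turns = [f"{turn['from']}: {turn['value']}" for turn in sample["conversations"]]
    return f"<image: {sample['image']}>\n" + "\n".join(turns)


if __name__ == "__main__":
    print(to_prompt(sample))
```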
Quick Start & Requirements
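The commands below are a minimal sketch assuming a standard git-clone plus conda/pip workflow; the repository URL, environment name, Python version, and requirements file are assumptions, so consult the project's own documentation for the actual procedure.

```bash
# Hypothetical quick start; URLs and filenames are assumptions, not verified
# against the repository's actual install instructions.
git clone https://github.com/OpenLAMM/LAMM.git   # assumed repository URL
cd LAMM
conda create -n lamm python=3.10 -y              # assumed Python version
conda activate lamm
pip install -r requirements.txt                  # assumed requirements file
```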
Highlighted Details
Maintenance & Community
The project is actively updated with new research preprints and framework releases, including Ch3Ef and DepictQA. Model checkpoints and leaderboards are maintained on Hugging Face.
Licensing & Compatibility
The project is licensed under CC BY-NC 4.0, which restricts use to non-commercial purposes. Models trained on the dataset are likewise restricted to research use only.
Limitations & Caveats
The CC BY-NC 4.0 license and the research-only restriction on trained models significantly limit commercial adoption and integration into proprietary systems.