Framework for multi-modal large language model (MLLM) training and evaluation
LAMM provides a framework and dataset for training and evaluating MLLMs, enabling the development of AI agents that bridge human ideas and machine execution. It targets researchers and developers who want to build and test multi-modal AI systems.
How It Works
LAMM focuses on language-assisted multi-modal instruction tuning, allowing models to understand and respond to complex instructions involving both text and visual inputs. The framework supports 2D and 3D tasks, facilitating the creation of agents capable of diverse applications, from image quality assessment to embodied AI in simulated environments like Minecraft.
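Concretely, instruction-tuning data of this kind pairs each visual input with dialogue-style instructions and target responses. The sketch below is a minimal, hypothetical illustration of that structure; the field names, file path, and helper function are assumptions for illustration, not LAMM's actual schema or API.

```python
# Hypothetical sketch (not LAMM's actual API): the general shape of a
# language-assisted multi-modal instruction-tuning sample, pairing a visual
# input with a natural-language instruction and a target response.
sample = {
    "id": "example_0001",
    "image": "images/000001.jpg",  # 2D input; a 3D task would reference a point cloud instead
    "conversations": [
        {"from": "human", "value": "Describe the scene and assess the image quality."},
        {"from": "assistant", "value": "A street at dusk; the photo is sharp with mild sensor noise."},
    ],
}


def to_prompt(sample: dict) -> str:
    """Flatten the image reference and dialogue turns into one training prompt string."""
    turns = [f"{turn['from']}: {turn['value']}" for turn in sample["conversations"]]
    return f"<image: {sample['image']}>\n" + "\n".join(turns)


if __name__ == "__main__":
    print(to_prompt(sample))
```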
Quick Start & Requirements
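The commands below are a minimal sketch assuming a standard git-clone plus conda/pip workflow; the repository URL, environment name, Python version, and requirements file are assumptions, so consult the project's own documentation for the actual procedure.

```bash
# Hypothetical quick start; URLs and filenames are assumptions, not verified
# against the repository's actual install instructions.
git clone https://github.com/OpenLAMM/LAMM.git   # assumed repository URL
cd LAMM
conda create -n lamm python=3.10 -y              # assumed Python version
conda activate lamm
pip install -r requirements.txt                  # assumed requirements file
```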
Highlighted Details
Maintenance & Community
The project is actively updated with new research preprints and framework releases, including Ch3Ef and DepictQA. Model checkpoints and leaderboards are maintained on Hugging Face.
Licensing & Compatibility
The project is licensed under CC BY-NC 4.0, which restricts use to non-commercial purposes. Models trained on the dataset are likewise restricted to research use only.
Limitations & Caveats
The CC BY-NC 4.0 license and the research-only restriction on trained models significantly limit commercial adoption and integration into proprietary systems.