MotionLLM: Research paper for multimodal human behavior understanding
MotionLLM addresses the challenge of understanding human behavior by jointly modeling video and motion sequences, leveraging Large Language Models (LLMs). It targets researchers and developers working on multi-modal human behavior analysis, offering a unified framework for tasks like motion captioning and spatio-temporal reasoning. The primary benefit is enhanced understanding of nuanced human dynamics by combining coarse video-text data with fine-grained motion-text data.
How It Works
MotionLLM employs a unified video-motion training strategy. It integrates LLMs with motion data (e.g., SMPL sequences) and video data, utilizing a LoRA adapter and a projection layer. This approach allows the model to capture complementary information from both modalities, leading to richer spatio-temporal insights and improved performance in understanding and reasoning about human actions.
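As a rough illustration of this setup, the sketch below (written in PyTorch, not taken from the MotionLLM codebase; all module names, feature dimensions, and LoRA hyperparameters are assumptions) shows how per-modality features could be projected into the LLM embedding space while the LLM itself receives only low-rank LoRA updates.

```python
# A minimal sketch (illustrative only, not the official MotionLLM code).
# Assumptions: video/motion encoders output fixed-size features, the LLM
# hidden size is 4096, and LoRA rank/alpha are typical default values.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep pretrained weights frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # LoRA starts as a zero update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class MultimodalProjector(nn.Module):
    """Maps video and motion features into the LLM token-embedding space."""

    def __init__(self, video_dim=1024, motion_dim=263, llm_dim=4096):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, llm_dim)
        self.motion_proj = nn.Linear(motion_dim, llm_dim)

    def forward(self, video_feats, motion_feats):
        # Both streams become "soft tokens" that would be concatenated with
        # the text embeddings before being fed to the LoRA-adapted LLM.
        return torch.cat(
            [self.video_proj(video_feats), self.motion_proj(motion_feats)], dim=1
        )


# Toy usage: 8 video-frame features and 16 motion-frame features -> 24 tokens.
projector = MultimodalProjector()
video_feats = torch.randn(1, 8, 1024)     # placeholder video-encoder output
motion_feats = torch.randn(1, 16, 263)    # placeholder motion features
tokens = projector(video_feats, motion_feats)
print(tokens.shape)                       # torch.Size([1, 24, 4096])
```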
Quick Start & Requirements
Install dependencies:
pip install -r requirements.txt
Model checkpoints (e.g., lit_model.pth) are linked to an issue in the repository. Run the demo through either the web app (app.py) or the CLI (cli.py).
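Before launching the demo, one optional sanity check (not part of the repository; the exact checkpoint layout may differ from what is assumed here) is to confirm that the downloaded checkpoint loads and to peek at its parameter names:

```python
# Hypothetical sanity check: verify the downloaded checkpoint loads on CPU.
# The real file may be wrapped differently; adjust the key access if so.
import torch

state_dict = torch.load("lit_model.pth", map_location="cpu")
print(f"{len(state_dict)} entries in checkpoint")
for name, value in list(state_dict.items())[:5]:
    print(name, tuple(getattr(value, "shape", ())))
```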
Highlighted Details
Maintenance & Community
The project is actively developed, with recent news regarding dataset and demo releases. It builds upon several existing projects including Video-LLaVA, HumanTOMATO, MotionGPT, lit-gpt, and HumanML3D. Contact information for authors is provided for inquiries.
Licensing & Compatibility
Distributed under an "IDEA LICENSE". Users must also adhere to the licenses of its dependencies. Commercial use implications are not explicitly detailed beyond the custom license.
Limitations & Caveats
The project is in active development, with some planned features like motion demo release and detailed tuning instructions still pending. The "IDEA LICENSE" may have specific restrictions not detailed in the README.