MotionLLM by IDEA-Research

MotionLLM: Research paper for multimodal human behavior understanding

created 1 year ago
341 stars

Top 82.0% on sourcepulse

View on GitHub
Project Summary

MotionLLM addresses the challenge of understanding human behavior by jointly modeling video and motion sequences, leveraging Large Language Models (LLMs). It targets researchers and developers working on multimodal human behavior analysis, offering a unified framework for tasks like motion captioning and spatio-temporal reasoning. The primary benefit is enhanced understanding of nuanced human dynamics by combining coarse video-text data with fine-grained motion-text data.

How It Works

MotionLLM employs a unified video-motion training strategy. It integrates LLMs with motion data (e.g., SMPL sequences) and video data, utilizing a LoRA adapter and a projection layer. This approach allows the model to capture complementary information from both modalities, leading to richer spatial-temporal insights and improved performance in understanding and reasoning about human actions.
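To make the architecture concrete, here is a minimal conceptual sketch of the two pieces named above: a projection layer that maps motion/video features into the LLM's embedding space, and a LoRA adapter over a frozen LLM linear layer. All class names, paths, and dimensions are illustrative assumptions, not MotionLLM's actual code.

```python
# Conceptual sketch only (not the official implementation). Dimensions and
# module names are assumptions for illustration.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze pretrained weight
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)          # start as a zero (identity) update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

class ModalityProjector(nn.Module):
    """Projects per-frame motion or video features into the LLM embedding space."""
    def __init__(self, feat_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, feats):                        # (batch, frames, feat_dim)
        return self.proj(feats)                      # (batch, frames, llm_dim)

# Toy usage: 16 "frames" of 512-d motion features become 16 pseudo-tokens that
# could be concatenated with text token embeddings before the LLM forward pass.
llm_dim = 4096                                       # e.g. vicuna-7b hidden size
projector = ModalityProjector(feat_dim=512, llm_dim=llm_dim)
motion_tokens = projector(torch.randn(1, 16, 512))   # (1, 16, 4096)
adapted = LoRALinear(nn.Linear(llm_dim, llm_dim))    # LoRA-wrapped LLM layer
print(motion_tokens.shape, adapted(motion_tokens).shape)
```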

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: A pre-trained LLM (e.g., vicuna-7b-v1.5) prepared with Lit-GPT, plus the MotionLLM checkpoints (LoRA weights and projection layer). Instructions for obtaining lit_model.pth are linked from a GitHub issue.
  • Demo: Local deployment via Gradio (app.py) or CLI (cli.py); see the setup sketch after this list.
  • Resources: Pre-trained model weights must be downloaded. Hardware requirements (e.g., GPU type) are not stated explicitly, but a capable GPU is implied by the 7B LLM dependency.
  • Links: Lit-GPT, MoVid dataset, Online Demo
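
As referenced in the Demo item above, the following is a hedged setup-check sketch: it verifies that the assets described in the prerequisites exist before launching the Gradio demo. All checkpoint paths are hypothetical placeholders; consult the repository README for the actual layout.

```python
# Illustrative pre-flight check; paths below are assumptions, not the repo's layout.
import subprocess
import sys
from pathlib import Path

EXPECTED = [
    Path("checkpoints/vicuna-7b-v1.5/lit_model.pth"),   # base LLM prepared with Lit-GPT (assumed path)
    Path("checkpoints/motionllm/lora.pth"),             # MotionLLM LoRA weights (assumed path)
    Path("checkpoints/motionllm/projection.pth"),       # projection-layer weights (assumed path)
]

missing = [p for p in EXPECTED if not p.exists()]
if missing:
    print("Missing assets:\n" + "\n".join(f"  - {p}" for p in missing))
    sys.exit(1)

# Launch the Gradio demo (app.py); swap in cli.py for the command-line demo.
subprocess.run([sys.executable, "app.py"], check=True)
```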

Highlighted Details

  • Unified video-motion training strategy for comprehensive human behavior understanding.
  • Introduces the MoVid dataset and MoVid-Bench for evaluation.
  • Demonstrates superiority in captioning, spatial-temporal comprehension, and reasoning.
  • Supports both Gradio and CLI demos for local testing.

Maintenance & Community

The project is actively developed, with recent news regarding dataset and demo releases. It builds upon several existing projects including Video-LLaVA, HumanTOMATO, MotionGPT, lit-gpt, and HumanML3D. Contact information for authors is provided for inquiries.

Licensing & Compatibility

Distributed under an "IDEA LICENSE". Users must also adhere to the licenses of its dependencies. Commercial use implications are not explicitly detailed beyond the custom license.

Limitations & Caveats

The project is in active development, with some planned features like motion demo release and detailed tuning instructions still pending. The "IDEA LICENSE" may have specific restrictions not detailed in the README.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 29 stars in the last 90 days
