EgoX by DAVIAN-Robotics

Egocentric video generation from exocentric input

Created 3 months ago
644 stars

Top 51.7% on SourcePulse

Project Summary

EgoX is a novel framework for generating egocentric (first-person) videos from a single exocentric (third-person) video input. It tackles realistic viewpoint transformation while preserving temporal consistency and scene structure. Aimed at researchers in egocentric video synthesis, EgoX provides a tool for creating immersive first-person perspectives by combining external observations with egocentric priors.

How It Works

The framework builds on large-scale video diffusion models trained on the Ego-Exo4D dataset. EgoX employs a unified conditioning strategy that integrates spatial and channel information within the latent representations to achieve realistic viewpoint transformation. A key advantage is its lightweight adaptation mechanism: LoRA-based fine-tuning significantly reduces the computational cost of customizing the base model.
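To make the adaptation mechanism concrete, here is a minimal sketch of LoRA applied to a single linear layer. This is an illustrative toy, not the EgoX implementation: the class name, rank, and scaling follow common LoRA conventions, and only the low-rank factors are trainable while the pretrained weight stays frozen.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (LoRA sketch)."""

    def __init__(self, in_features, out_features, rank=4, alpha=16.0):
        super().__init__()
        # Pretrained projection: frozen, so fine-tuning never touches it.
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # Low-rank factors A (rank x in) and B (out x rank); B starts at zero
        # so the adapted layer initially behaves exactly like the base layer.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the scaled low-rank correction x @ A^T @ B^T.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(64, 64, rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 4*64 + 64*4 = 512 trainable params vs 64*64 + 64 frozen
```

The appeal for a 14B-parameter video diffusion backbone is the same as in this toy: the trainable parameter count scales with the rank, not with the full weight matrices, which is what keeps customization lightweight.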

Quick Start & Requirements

  • Installation: Requires Python 3.10, CUDA 12.1+, and a compatible PyTorch build. Setup involves creating a conda environment, installing PyTorch, and pip-installing the remaining dependencies.
  • Hardware: Substantial GPU VRAM is required: ≥ 80 GB for inference, ≥ 140 GB for training.
  • Model Weights: Pretrained Wan2.1-I2V-14B and EgoX LoRA weights must be downloaded from Hugging Face/Google Drive.
  • Inference: Quick testing uses example data via shell scripts (scripts/infer_itw.sh, scripts/infer_ego4d.sh). Inference on custom data requires a specific directory structure and metadata preparation.
  • Links: Teaser Video: `https://github.com/user
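The setup steps above can be sketched as a shell session. The environment name, PyTorch index URL, and requirements-file path are assumptions for illustration; only the Python/CUDA versions and the two inference script names come from the summary above.

```shell
# Create an isolated environment (Python 3.10, per the stated requirement).
conda create -n egox python=3.10 -y
conda activate egox

# Install PyTorch built against CUDA 12.1 (wheel index URL is an assumption).
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install the remaining dependencies (requirements.txt path is an assumption).
pip install -r requirements.txt

# Run the example-data inference scripts named above
# (assumes model weights have already been downloaded).
bash scripts/infer_itw.sh      # in-the-wild example
bash scripts/infer_ego4d.sh    # Ego-Exo4D example
```

Note that the inference scripts presuppose a GPU with at least 80 GB of VRAM, per the hardware requirement listed above.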
Health Check
Last Commit

5 days ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
4
Star History
48 stars in the last 30 days
