Research code and dataset for scalable 3D captioning using pretrained models
This repository provides a scalable solution for 3D object captioning, leveraging pretrained models to generate detailed descriptions of 3D objects. It is designed for researchers and developers working with 3D data and natural language processing, offering a large dataset of 3D-object-caption pairs and associated code.
How It Works
The pipeline consolidates multi-view information from each 3D object: a pretrained captioning model describes the object from multiple rendered viewpoints, an image-text alignment model filters the candidate captions, and a large language model (LLM) merges the survivors into one description. Combining visual features across perspectives yields more comprehensive and accurate captions than any single view provides.
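The stages above can be sketched as follows. This is a minimal, hypothetical illustration of the control flow only: the function names are invented, and the captioning, alignment, and consolidation models (BLIP2-, CLIP-, and LLM-style components in the real project) are replaced with trivial stubs so the sketch runs without GPU models.

```python
def caption_view(view):
    """Stub for a per-view captioning model (BLIP2-style in the project)."""
    return f"a render of {view['object']} seen from {view['angle']} degrees"

def alignment_score(view, caption):
    """Stub for an image-text alignment scorer (CLIP-style in the project)."""
    return 1.0 if view["object"] in caption else 0.0

def consolidate(captions):
    """Stub for the LLM step that merges per-view captions into one description."""
    return "; ".join(sorted(set(captions)))

def caption_3d_object(views, top_k=2):
    # 1. Caption every rendered view independently.
    candidates = [(v, caption_view(v)) for v in views]
    # 2. Keep the top-k captions ranked by image-text alignment.
    ranked = sorted(candidates, key=lambda vc: alignment_score(*vc), reverse=True)
    best = [caption for _, caption in ranked[:top_k]]
    # 3. Consolidate the selected captions into a single description.
    return consolidate(best)

views = [{"object": "wooden chair", "angle": a} for a in (0, 90, 180, 270)]
print(caption_3d_object(views))
```

The key design point is that filtering happens before consolidation, so the LLM only sees captions that the alignment model judged faithful to the renders.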
Quick Start & Requirements
Highlighted Details
Maintenance & Community
This project was published at NeurIPS 2023, with contributions from researchers at institutions such as LG AI Research and support acknowledged from the NSF.
Licensing & Compatibility
The README does not explicitly state a license. However, the project builds on and acknowledges several open-source components (PyTorch, Blender, PyTorch3D, BLIP2, CLIP), suggesting general compatibility with open-source ecosystems.
Limitations & Caveats
The README does not detail any limitations or caveats regarding the code's functionality, performance, or known issues.