Cap3D by crockwell

Code and data for the paper "Scalable 3D Captioning with Pretrained Models"

created 2 years ago
269 stars

Top 96.2% on sourcepulse

Project Summary

This repository provides a scalable pipeline for 3D object captioning, leveraging pretrained models to generate detailed descriptions of 3D assets. It is aimed at researchers and developers working with 3D data and natural language processing, and ships a large dataset of 3D-object-caption pairs along with the associated captioning and evaluation code.

How It Works

The pipeline renders each 3D object from multiple viewpoints, generates candidate captions for every view with a pretrained image-captioning model (BLIP2), uses CLIP to keep the caption that best matches each rendered view, and finally asks a large language model (GPT-4 in the paper) to consolidate the per-view captions into a single comprehensive description. Combining information across views in this way yields more complete and accurate captions than any single rendering could provide.
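
A minimal sketch of that flow is shown below, using Hugging Face transformers for BLIP2 and CLIP. The model checkpoints, sampling settings, and the string-join "consolidation" are illustrative assumptions, not the repository's exact configuration.

```python
# Minimal sketch of the multi-view captioning flow described above.
# Checkpoints, sampling settings, and the string-join "consolidation" are
# illustrative stand-ins; Cap3D's actual rendering scripts and LLM prompts
# live in the repository.
import torch
from PIL import Image
from transformers import (
    Blip2Processor, Blip2ForConditionalGeneration,
    CLIPProcessor, CLIPModel,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

blip_proc = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
blip = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)


def caption_view(image: Image.Image, num_candidates: int = 5) -> list[str]:
    """Generate several candidate captions for one rendered view with BLIP2."""
    inputs = blip_proc(images=image, return_tensors="pt").to(device, dtype)
    out = blip.generate(**inputs, do_sample=True,
                        num_return_sequences=num_candidates, max_new_tokens=30)
    return [blip_proc.decode(ids, skip_special_tokens=True).strip() for ids in out]


def best_caption(image: Image.Image, candidates: list[str]) -> str:
    """Keep the candidate whose CLIP image-text similarity to the view is highest."""
    inputs = clip_proc(text=candidates, images=image, return_tensors="pt",
                       padding=True, truncation=True).to(device)
    with torch.no_grad():
        sims = clip(**inputs).logits_per_image  # shape (1, num_candidates)
    return candidates[sims.argmax().item()]


def describe_object(view_paths: list[str]) -> str:
    """Caption each rendered view, then consolidate across views."""
    per_view = []
    for path in view_paths:
        img = Image.open(path).convert("RGB")
        per_view.append(best_caption(img, caption_view(img)))
    # Cap3D prompts an LLM (GPT-4 in the paper) to merge these into one caption;
    # a plain join is used here only to keep the sketch self-contained.
    return " / ".join(per_view)
```

In practice the LLM step is what reconciles views that disagree (parts visible only from certain angles), which is why the simple concatenation above is only a placeholder.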

Quick Start & Requirements

  • Data and code are available on Hugging Face (a download sketch follows this list).
  • Key dependencies include PyTorch, Blender, PyTorch3D, BLIP2, and CLIP.
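
A minimal download sketch, assuming the dataset is hosted at tiange/Cap3D on Hugging Face and using a hypothetical caption file name; check the dataset card for the actual file layout:

```python
# Sketch of fetching Cap3D artifacts from Hugging Face. The repo id and
# file name below are assumptions based on the project's dataset page;
# consult the dataset card for the current layout.
from huggingface_hub import hf_hub_download
import pandas as pd

csv_path = hf_hub_download(
    repo_id="tiange/Cap3D",                          # assumed dataset repo id
    repo_type="dataset",
    filename="Cap3D_automated_Objaverse_full.csv",   # hypothetical file name
)

# Assumed layout: two columns, Objaverse UID and its generated caption.
captions = pd.read_csv(csv_path, header=None, names=["uid", "caption"])
print(f"{len(captions)} captions loaded")
print(captions.head())
```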

Highlighted Details

  • Provides 1,002,422 descriptive captions for 3D objects in Objaverse and Objaverse-XL.
  • Includes corresponding point clouds and rendered images with camera, depth, and MatAlpha information (a point-cloud loading sketch follows this list).
  • Offers code for 3D captioning pipelines and text-to-3D evaluation/fine-tuning.
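
As an illustration of how the released point clouds might be consumed, the sketch below loads a single point cloud with PyTorch3D (a listed dependency); the local path and file format are assumptions, not the repository's documented loaders.

```python
# Sketch of loading one released point cloud with PyTorch3D. The path and
# .ply format are placeholders; Cap3D keys its point clouds by Objaverse UID,
# so point this at a file you have downloaded.
from pytorch3d.io import IO

ply_path = "point_clouds/<objaverse_uid>.ply"   # hypothetical local path
pcl = IO().load_pointcloud(ply_path)

xyz = pcl.points_padded()      # (1, N, 3) point coordinates
rgb = pcl.features_padded()    # (1, N, 3) per-point colors, if stored
print(xyz.shape, None if rgb is None else rgb.shape)
```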

Maintenance & Community

The accompanying paper was published at NeurIPS 2023, and the authors acknowledge support from LG AI Research and the NSF.

Licensing & Compatibility

The README does not explicitly state a license. The project builds on several open-source components (PyTorch, Blender, PyTorch3D, BLIP2, CLIP); consult the repository and these upstream projects for licensing terms before reuse or redistribution.

Limitations & Caveats

The README does not detail specific limitations or caveats regarding the code's functionality, performance, or potential issues.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 7 stars in the last 90 days
