CVPR2023-3D-Occupancy-Prediction by CVPR2023-3D-Occupancy-Prediction

3D occupancy prediction benchmark for autonomous driving scene perception

Created 2 years ago

861 stars

Top 41.7% on SourcePulse

Project Summary

This repository hosts the CVPR 2023 3D Occupancy Prediction Challenge, providing a benchmark for autonomous driving scene perception. It addresses the limitations of traditional 3D bounding box detection by enabling dense, voxel-wise prediction of scene occupancy and semantics from surround-view images. The target audience includes researchers and engineers in autonomous driving and computer vision.

How It Works

The challenge asks models to predict, for every voxel in a 3D scene, an occupancy state (free or occupied) and a semantic class, using only camera images as input. Compared with bounding boxes, this dense representation describes the environment in more detail, capturing irregular object shapes and background elements. The benchmark's voxel labels are derived from the nuScenes dataset.
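
As a rough illustration of what a voxelized scene looks like in practice, the sketch below maps a 3D point to the index of the voxel that contains it and looks up that voxel's label. The grid origin, voxel size, grid shape, and label ids are all assumptions chosen for the example, not values taken from this summary; the challenge documentation defines the real grid.

```python
import math

# All constants below are assumptions for illustration only.
ORIGIN = (-40.0, -40.0, -1.0)   # metres, minimum corner of the grid (assumed)
VOXEL_SIZE = 0.4                # metres per voxel edge (assumed)
GRID_SHAPE = (200, 200, 16)     # number of voxels along x, y, z (assumed)
FREE = 17                       # hypothetical label id meaning "free space"


def point_to_voxel(point):
    """Return (i, j, k) of the voxel containing `point`, or None if outside the grid."""
    index = tuple(math.floor((p - o) / VOXEL_SIZE) for p, o in zip(point, ORIGIN))
    inside = all(0 <= i < n for i, n in zip(index, GRID_SHAPE))
    return index if inside else None


def label_at(sparse_labels, point):
    """Look up the label for a point; voxels missing from the dict count as free."""
    voxel = point_to_voxel(point)
    if voxel is None:
        return None
    return sparse_labels.get(voxel, FREE)


# One occupied voxel carrying a hypothetical semantic class id of 4.
sparse_labels = {(100, 0, 15): 4}
print(label_at(sparse_labels, (0.2, -39.9, 5.3)))  # point inside the occupied voxel
print(label_at(sparse_labels, (0.2, 0.2, 0.2)))    # point in an unlisted voxel
```

A sparse dictionary keeps the example small. A real pipeline would more likely store the labels as a dense integer array with one entry per voxel.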

Quick Start & Requirements

Baseline: A baseline model based on BEVFormer is provided. Refer to getting_started for details.
Data: The dataset is based on nuScenes, with mini (440 MB), trainval (32 GB), and test (6 GB) splits available for download.
Submission: Results are submitted to an evaluation server, which requires a specific .npz format for each frame.

Highlighted Details

Benchmark: The first large-scale 3D occupancy benchmark for autonomous driving.
Data: Voxelized representation with occupancy state and semantics, derived from nuScenes.
Evaluation: Primarily ranked by mean Intersection over Union (mIoU).
Input: Camera images only; no future frames allowed during inference.
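
To make the ranking metric concrete, here is a minimal sketch of mean Intersection over Union computed over flattened per-voxel labels. The function name, the flat list layout, and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for this example; the official evaluation code may aggregate differently.

```python
def mean_iou(pred, gt, num_classes):
    """Average per-class IoU over classes present in the prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        # Voxels where both agree on class c.
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        # Voxels where either one says class c.
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both (an assumption of this sketch)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Six voxels, three hypothetical classes.
pred = [0, 0, 1, 1, 2, 2]
gt = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, gt, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is about 0.5.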

Maintenance & Community

The challenge server remains active. For leaderboard updates, contact contact@opendrivelab.com.
Challenge website: https://opendrivelab.com/AD23Challenge.html

Licensing & Compatibility

Dataset: Subject to nuScenes dataset terms of use.
Code: MIT License.

Limitations & Caveats

The nuScenes dataset has known issues with z-axis translation, which can affect precise 6D localization and point cloud accumulation. Some data exhibits ground stratification. The evaluation uses a camera visibility mask (mask_camera) to exclude voxels that are not visible to the cameras.
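
The effect of the visibility mask can be sketched as a filtering step applied before any metric is computed. Only the name mask_camera comes from the text above; the helper name, the flat list layout, and the 0/1 encoding (1 meaning visible) are assumptions for illustration.

```python
def keep_visible(pred, gt, mask_camera):
    """Drop voxel pairs whose mask entry is falsy so they cannot affect a metric."""
    kept = [(p, g) for p, g, m in zip(pred, gt, mask_camera) if m]
    return [p for p, _ in kept], [g for _, g in kept]


# Four voxels; the last two are assumed to be hidden from every camera.
pred = [0, 1, 2, 2]
gt = [0, 1, 1, 0]
mask_camera = [1, 1, 0, 0]

vis_pred, vis_gt = keep_visible(pred, gt, mask_camera)
print(vis_pred, vis_gt)
```

Both errors in this toy example fall on hidden voxels, so a score computed on the filtered lists would not penalize them.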
