[CVPR 2024] Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
The official code and data for the benchmark with baselines for our paper: Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
This work has been accepted by CVPR 2024 :tada:
Junyi Ma#, Xieyuanli Chen#, Jiawei Huang, Jingyi Xu, Zhen Luo, Jintao Xu, Weihao Gu, Rui Ai, Hesheng Wang*
If you use Cam4DOcc in academic work, please cite our paper:
@inproceedings{ma2024cvpr,
author = {Junyi Ma and Xieyuanli Chen and Jiawei Huang and Jingyi Xu and Zhen Luo and Jintao Xu and Weihao Gu and Rui Ai and Hesheng Wang},
title = {{Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications}},
booktitle = {Proc.~of the IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},
year = 2024
}
conda create -n cam4docc python=3.7 -y
conda activate cam4docc
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda install -c omgarcia gcc-6
pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
pip install timm
pip install open3d-python
pip install PyMCubes
pip install spconv-cu113
pip install fvcore
pip install setuptools==59.5.0
pip install lyft_dataset_sdk # for lyft dataset
git clone git@github.com:haomo-ai/Cam4DOcc.git
cd Cam4DOcc
export PYTHONPATH="."
python setup.py develop
Note that the folders under `cam4docc` will be generated automatically the first time you run our training or evaluation scripts.
Cam4DOcc
├── data/
│ ├── nuscenes/
│ │ ├── maps/
│ │ ├── samples/
│ │ ├── sweeps/
│ │ ├── lidarseg/
│ │ ├── v1.0-test/
│ │ ├── v1.0-trainval/
│ │ ├── nuscenes_occ_infos_train.pkl
│ │ ├── nuscenes_occ_infos_val.pkl
│ ├── nuScenes-Occupancy/
│ ├── lyft/
│ │ ├── maps/
│ │ ├── train_data/
│ │ ├── images/ # from train images, containing xxx.jpeg
│ ├── cam4docc
│ │ ├── GMO/
│ │ │ ├── segmentation/
│ │ │ ├── instance/
│ │ │ ├── flow/
│ │ ├── MMO/
│ │ │ ├── segmentation/
│ │ │ ├── instance/
│ │ │ ├── flow/
│ │ ├── GMO_lyft/
│ │ │ ├── ...
│ │ ├── MMO_lyft/
│ │ │ ├── ...
Alternatively, instead of using the default data structure, you can manually modify the path parameters in the config files. The defaults are listed here:
occ_path = "./data/nuScenes-Occupancy"
depth_gt_path = './data/depth_gt'
train_ann_file = "./data/nuscenes/nuscenes_occ_infos_train.pkl"
val_ann_file = "./data/nuscenes/nuscenes_occ_infos_val.pkl"
cam4docc_dataset_path = "./data/cam4docc/"
nusc_root = './data/nuscenes/'
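Before launching a long run, it can help to confirm that the configured paths actually exist. The snippet below is a minimal sketch and not part of the repository: `find_missing_paths` is a hypothetical helper, and the dictionary simply mirrors the default values listed above.

```python
from pathlib import Path

# Default path parameters copied from the config listing above.
DEFAULT_PATHS = {
    "occ_path": "./data/nuScenes-Occupancy",
    "depth_gt_path": "./data/depth_gt",
    "train_ann_file": "./data/nuscenes/nuscenes_occ_infos_train.pkl",
    "val_ann_file": "./data/nuscenes/nuscenes_occ_infos_val.pkl",
    "cam4docc_dataset_path": "./data/cam4docc/",
    "nusc_root": "./data/nuscenes/",
}


def find_missing_paths(paths, root="."):
    """Return the sorted names of path parameters that do not exist under root.

    Hypothetical helper for illustration; it is not part of Cam4DOcc.
    """
    root = Path(root)
    return sorted(name for name, rel in paths.items() if not (root / rel).exists())


# Run from the repository root to list any parameters that need attention.
print(find_missing_paths(DEFAULT_PATHS))
```

Since the folders under `cam4docc` are generated automatically, a missing `cam4docc_dataset_path` is expected before the first training or evaluation run.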
We integrate the Cam4DOcc dataset generation pipeline directly into the dataloader, so you can simply run the training or evaluation scripts and wait :smirk:
Optionally, you can set `only_generate_dataset=True` in the config files to only generate the Cam4DOcc data without model training and inference.
OCFNetV1.1 can forecast inflated GMO and others. In this case, vehicle and human are considered as one unified category.
For the nuScenes dataset, please run
bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py 8
For the Lyft dataset, please run
bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1_lyft.py 8
OCFNetV1.2 can forecast inflated GMO including bicycle, bus, car, construction, motorcycle, trailer, truck, pedestrian, and others. In this case, vehicle and human are divided into multiple categories for a clearer evaluation of forecasting performance.
For the nuScenes dataset, please run
bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py 8
For the Lyft dataset, please run
bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2_lyft.py 8
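As a quick sanity check on the two label settings described above, the category lists can be written out and counted. This is an illustrative sketch only: the list names and `merge_to_v1_1` are hypothetical, and the ordering is not meant to reflect the label indices used in the code. The counts agree with the `num_cls` values reported in the parameter table (2 for V1.1, 9 for V1.2).

```python
# Categories as described in the text. The ordering is an assumption and
# does not necessarily match the label indices used by the codebase.
V1_1_CLASSES = ["GMO", "others"]
V1_2_CLASSES = [
    "bicycle", "bus", "car", "construction", "motorcycle",
    "trailer", "truck", "pedestrian", "others",
]


def merge_to_v1_1(name):
    """Map a V1.2 category name to its V1.1 counterpart (hypothetical helper).

    V1.1 treats every vehicle and human category as one unified GMO class.
    """
    return "others" if name == "others" else "GMO"


print(len(V1_1_CLASSES), len(V1_2_CLASSES))
print(sorted({merge_to_v1_1(c) for c in V1_2_CLASSES}))
```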
If you only want to test the performance of occupancy prediction for the present frame (current observation), please set `test_present=True` in the config files. Otherwise, forecasting performance on the future interval is evaluated.
bash run_eval.sh $PATH_TO_CFG $PATH_TO_CKPT $GPU_NUM
# e.g. bash run_eval.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py ./work_dirs/OCFNet_in_Cam4DOcc_V1.1/epoch_20.pth 8
Please set `save_pred` and `save_path` in the config files if you need to save prediction results.
`VPQ` evaluation of 3D instance prediction will be refined in the future.
Please install the dependencies as follows:
sudo apt-get install Xvfb
pip install xvfbwrapper
pip install mayavi
where `Xvfb` may be needed for visualization on your server.
Visualize ground-truth occupancy labels. Set `show_time_change = True` if you want to show the changing state of occupancy in time intervals.
cd viz
python viz_gt.py
Visualize occupancy forecasting results. Set `show_time_change = True` if you want to show the changing state of occupancy in time intervals.
cd viz
python viz_pred.py
There is still room for improvement. Camera-only 4D occupancy forecasting remains challenging, especially for predicting over longer time intervals with many moving objects. We envision this benchmark as a valuable evaluation tool, and our OCFNet can serve as a foundational codebase for future research on 4D occupancy forecasting.
Basic information and key parameters for the current version are listed below.
Type | Info | Parameter |
---|---|---|
train | 23,930 sequences | train_capacity |
val | 5,119 frames | test_capacity |
voxel size | 0.2m | voxel_x/y/z |
range | [-51.2m, -51.2m, -5m, 51.2m, 51.2m, 3m] | point_cloud_range |
volume size | [512, 512, 40] | occ_size |
classes | 2 for V1.1 / 9 for V1.2 | num_cls |
observation frames | 3 | time_receptive_field |
future frames | 4 | n_future_frames |
extension frames | 6 | n_future_frames_plus |
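The volume size in the table follows from the range and the voxel size: each axis extent is divided by 0.2 m. A minimal sketch of that arithmetic, where `grid_shape` is a hypothetical helper and not a function from the codebase:

```python
# Values taken from the table above.
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]  # [x_min, y_min, z_min, x_max, y_max, z_max]
voxel_size = 0.2  # metres, same for x, y and z


def grid_shape(pc_range, voxel):
    """Number of voxels along x, y, z for a given range and voxel edge length."""
    mins, maxs = pc_range[:3], pc_range[3:]
    # round() absorbs floating-point error in the division.
    return [round((hi - lo) / voxel) for lo, hi in zip(mins, maxs)]


print(grid_shape(point_cloud_range, voxel_size))  # [512, 512, 40]
```

If you change `point_cloud_range` or the voxel size in a config, `occ_size` presumably needs to be updated to stay consistent in the same way.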
Our proposed OCFNet can still perform well when trained with partial data. Try decreasing `train_capacity` if you want to explore training with sparser supervision signals.
In addition, please make sure that `n_future_frames_plus <= time_receptive_field + n_future_frames`, because `n_future_frames_plus` is the actual number of predicted frames. We estimate more frames, including past ones, rather than only `n_future_frames`.
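The constraint on `n_future_frames_plus` can be checked before training starts. The following is a small sketch under the assumption that the three values are read from your config; `check_frame_settings` is a hypothetical helper, not part of the repository.

```python
# Default values from the parameter table.
time_receptive_field = 3
n_future_frames = 4
n_future_frames_plus = 6


def check_frame_settings(receptive_field, n_future, n_future_plus):
    """Validate n_future_plus <= receptive_field + n_future.

    Returns the number of frames left before the limit is reached and
    raises ValueError if the constraint is violated.
    """
    limit = receptive_field + n_future
    if n_future_plus > limit:
        raise ValueError(
            f"n_future_frames_plus={n_future_plus} exceeds "
            f"time_receptive_field + n_future_frames = {limit}"
        )
    return limit - n_future_plus


print(check_frame_settings(time_receptive_field, n_future_frames, n_future_frames_plus))  # 1
```

With the defaults, 6 <= 3 + 4 holds, so the check passes with one frame to spare.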
We will provide our pretrained models of the erratum version. Your patience is appreciated.
Deprecated:
Please download our pretrained models (for epoch=20) to resume training or reproduce results.
Version | Google Drive | Baidu Cloud | Config |
---|---|---|---|
V1.1 | link | link | OCFNet_in_Cam4DOcc_V1.1.py |
V1.2 | link | link | OCFNet_in_Cam4DOcc_V1.2.py |
We also provide an evaluation of the forecasting performance of other baselines in Cam4DOcc.
The tutorial is being updated ...
We will release our pretrained models as soon as possible. OCFNetV1.3 and OCFNetV2 are on their way ...
We thank the authors of the fantastic works OpenOccupancy, PowerBEV, and FIERY for their pioneering code releases, which provide the codebase for this benchmark.