SphereFormer

The official implementation for "Spherical Transformer for LiDAR-based 3D Recognition" (CVPR 2023).

Spherical Transformer for LiDAR-based 3D Recognition (CVPR 2023)

This is the official PyTorch implementation of SphereFormer (CVPR 2023).

Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia

Highlight

SphereFormer is a plug-and-play transformer module. We develop radial window attention, which significantly boosts the segmentation performance of distant points, e.g., from 13.3% to 30.4% mIoU on the nuScenes lidarseg val set.
It achieves superior performance on various outdoor semantic segmentation benchmarks, e.g., nuScenes, SemanticKITTI, and Waymo, and also shows competitive results on the nuScenes detection dataset.
This repository employs SparseTransformer, a fast and memory-efficient library for sparse transformers with varying token numbers.
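To give a rough intuition for radial windows, here is a minimal sketch. It assumes windows are formed by partitioning points on their azimuth and polar angles in spherical coordinates, so that each window extends along the whole radial direction. The function name `radial_window_index` and the 2-degree window sizes are hypothetical and are not taken from this repository's code.

```python
import math

def radial_window_index(x, y, z, theta_size_deg=2.0, phi_size_deg=2.0):
    """Map a LiDAR point to an (azimuth, polar) window index.

    The window sizes are illustrative assumptions, not the repository's settings.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the sensor origin has no direction")
    # Azimuth in [0, 360) and polar angle in [0, 180], both in degrees.
    theta = math.degrees(math.atan2(y, x)) % 360.0
    phi = math.degrees(math.acos(max(-1.0, min(1.0, z / r))))
    n_theta = math.ceil(360.0 / theta_size_deg)
    n_phi = math.ceil(180.0 / phi_size_deg)
    # Clamp so that boundary values stay inside the last window.
    i = min(int(theta // theta_size_deg), n_theta - 1)
    j = min(int(phi // phi_size_deg), n_phi - 1)
    return i, j

# Two points along the same viewing direction, about 2 m and 60 m away.
near = radial_window_index(2.0, 0.1, -0.05)
far = radial_window_index(60.0, 3.0, -1.5)
print(near, far, near == far)
```

Because the radius is ignored when assigning windows, a sparse distant point shares a window with the dense nearby points in the same direction, which is the intuition behind the improvement for distant regions.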

Get Started

For object detection, please go to the detection/ directory.

The guide below is for semantic segmentation.

Environment

Install dependencies (tested with python==3.7.9, pytorch==1.8.0, cuda==11.1, gcc==7.5.0)

git clone https://github.com/dvlab-research/SphereFormer.git --recursive
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch_scatter==2.0.9
pip install torch_geometric==1.7.2
pip install spconv-cu114==2.1.21
pip install torch_sparse==0.6.12 cumm-cu114==0.2.8 torch_cluster==1.5.9
pip install tensorboard timm termcolor tensorboardX

Install sptr

cd third_party/SparseTransformer && python setup.py install

Note: Make sure gcc and CUDA are installed and that nvcc works. If you installed CUDA through conda, nvcc is not provided, so you should install CUDA manually.
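A quick way to check the note above before building sptr is to look for the compilers on your PATH. This is a small sketch using only the Python standard library; the helper name `report_build_tools` is hypothetical.

```python
import shutil

def report_build_tools(tools=("gcc", "nvcc")):
    """Print and return where each build tool is found on PATH (None if missing)."""
    status = {}
    for tool in tools:
        path = shutil.which(tool)
        status[tool] = path
        print(f"{tool}: {path or 'NOT FOUND'}")
    return status

status = report_build_tools()
```

If nvcc is reported as not found, the sptr build step is likely to fail until a full CUDA toolkit is installed.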

Datasets Preparation

nuScenes

Download the nuScenes dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.

nuscenes/
|--- v1.0-trainval/
|--- samples/
|------- LIDAR_TOP/
|--- lidarseg/
|------- v1.0-trainval/

Next, fill in data_path and save_dir in data/nuscenes_preprocess_infos.py, and generate the infos with

pip install nuscenes-devkit pyquaternion
cd data && python nuscenes_preprocess_infos.py
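Before running the preprocessing script, it may help to confirm that the nuScenes folder matches the layout shown above. The following is a minimal sketch; the helper name `missing_dirs` and the example path are hypothetical, and only the directory names come from the layout in this section.

```python
from pathlib import Path

# Sub-directories listed in the nuScenes layout above.
NUSCENES_REQUIRED = (
    "v1.0-trainval",
    "samples/LIDAR_TOP",
    "lidarseg/v1.0-trainval",
)

def missing_dirs(data_root, required=NUSCENES_REQUIRED):
    """Return the required sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in required if not (root / rel).is_dir()]

# Replace the placeholder with your own data_root.
missing = missing_dirs("/path/to/nuscenes")
print("missing:", missing if missing else "none")
```

The same helper can be reused for the other datasets by passing a different `required` tuple, for example `("sequences",)` for SemanticKITTI.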

SemanticKITTI

Download the SemanticKITTI dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.

dataset/
|--- sequences/
|------- 00/
|------- 01/
|------- 02/
|------- 03/
|------- .../

Waymo Open Dataset

Download the Waymo Open Dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.

waymo/
|--- training/
|--- validation/
|--- testing/

Then, convert the raw files to the SemanticKITTI format as follows. (Note: do not use a GPU here; the CPU is sufficient.)

cd data/waymo_to_semanticKITTI
CUDA_VISIBLE_DEVICES="" python convert.py --load_dir [YOUR_DATA_ROOT] --save_dir [YOUR_SAVE_ROOT]

Training

nuScenes

python train.py --config config/nuscenes/nuscenes_unet32_spherical_transformer.yaml

SemanticKITTI

python train.py --config config/semantic_kitti/semantic_kitti_unet32_spherical_transformer.yaml

Waymo Open Dataset

python train.py --config config/waymo/waymo_unet32_spherical_transformer.yaml

Validation

For validation, you need to modify the .yaml config file: (1) set weight to the path of the model weights (.pth file); (2) set val to True; (3) for test-time augmentation (TTA), set use_tta to True and set vote_num accordingly. After that, run the following command.

python train.py --config [YOUR_CONFIG_PATH]
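The three config changes described above can also be applied programmatically. The sketch below rewrites `key: value` lines in the YAML text using only the standard library. It assumes each of the four keys sits on its own line; the `TRAIN` section in the example text, the weight path, and the vote_num value of 4 are hypothetical placeholders rather than the repository's actual config layout.

```python
import re

def set_yaml_entries(text, updates):
    """Rewrite `key: value` lines in YAML text, keeping indentation.

    Assumes each key is on its own line; a trailing comment on a
    rewritten line is dropped.
    """
    missing = set(updates)
    out = []
    for line in text.splitlines():
        match = re.match(r"^(\s*)([A-Za-z_]\w*)\s*:", line)
        if match and match.group(2) in updates:
            key = match.group(2)
            line = f"{match.group(1)}{key}: {updates[key]}"
            missing.discard(key)
        out.append(line)
    if missing:
        raise KeyError(f"keys not found in config: {sorted(missing)}")
    return "\n".join(out) + "\n"

# Hypothetical config fragment, for illustration only.
example = """TRAIN:
  weight:
  val: False
  use_tta: False
  vote_num: 1
"""

updated = set_yaml_entries(
    example,
    {"weight": "/path/to/model.pth", "val": True, "use_tta": True, "vote_num": 4},
)
print(updated)
```

To use it on a real file, read the config into a string, pass it through `set_yaml_entries`, and write the result to a new .yaml file so the training config stays unchanged.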

Pre-trained Models

Dataset             Val mIoU (TTA)  Val mIoU  mIoU_close  mIoU_medium  mIoU_distant  Download
nuScenes            79.5            78.4      80.8        60.8         30.4          Model Weight
SemanticKITTI       69.0            67.8      68.6        60.4         17.8          Model Weight
Waymo Open Dataset  70.8            69.9      70.3        68.6         61.9          N/A

Note: Pre-trained weights on Waymo Open Dataset are not released due to the regulations.
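For readers unfamiliar with the distance-wise columns in the table, the following sketch shows one way mIoU could be computed per distance range. The 20 m and 50 m thresholds, the choice to skip classes absent from a range, and all function names are assumptions made for illustration; they are not read from this repository's evaluation code.

```python
def mean_iou(preds, labels, num_classes):
    """Mean IoU over the classes that occur in the predictions or labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, l in zip(preds, labels) if p == c and l == c)
        union = sum(1 for p, l in zip(preds, labels) if p == c or l == c)
        if union > 0:  # skip classes absent from this subset
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")

def miou_by_range(preds, labels, distances, num_classes,
                  close_max=20.0, medium_max=50.0):
    """Split points by distance to the sensor and compute mIoU per range."""
    buckets = {"close": ([], []), "medium": ([], []), "distant": ([], [])}
    for p, l, d in zip(preds, labels, distances):
        if d <= close_max:
            name = "close"
        elif d <= medium_max:
            name = "medium"
        else:
            name = "distant"
        buckets[name][0].append(p)
        buckets[name][1].append(l)
    return {name: mean_iou(ps, ls, num_classes)
            for name, (ps, ls) in buckets.items()}

# Toy example: four points with two classes.
result = miou_by_range(
    preds=[0, 1, 1, 0],
    labels=[0, 1, 0, 0],
    distances=[5.0, 10.0, 30.0, 60.0],
    num_classes=2,
)
print(result)
```

In a real evaluation the intersections and unions would be accumulated over the whole validation set before dividing, rather than computed from a single small batch.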

SpTr Library

The SpTr library is highly recommended for sparse transformers, particularly for 3D point cloud attention. It is fast, memory-efficient, and easy to use. The GitHub repository is https://github.com/dvlab-research/SparseTransformer.git.

Citation

If you find this project useful, please consider citing:

@inproceedings{lai2023spherical,
  title={Spherical Transformer for LiDAR-based 3D Recognition},
  author={Lai, Xin and Chen, Yukang and Lu, Fanbin and Liu, Jianhui and Jia, Jiaya},
  booktitle={CVPR},
  year={2023}
}

Our Works on 3D Point Cloud

Spherical Transformer for LiDAR-based 3D Recognition (CVPR 2023) [Paper] [Code]: A plug-and-play transformer module that boosts performance in distant regions (for 3D LiDAR point clouds).
Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022) [Paper] [Code]: A point-based window transformer for 3D point cloud segmentation.
SparseTransformer (SpTr) Library [Code]: A fast, memory-efficient, and easy-to-use library for sparse transformers with varying token numbers.

License

Apache-2.0