[CVPR2021 Oral] FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation.
This is the official source code for the CVPR2021 Oral work, FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation. (Arxiv, Video_Bilibili, Video_YouTube)
FFB6D is a general framework for representation learning from a single RGBD image. We apply it to the 6D pose estimation task by cascading downstream prediction heads for instance semantic segmentation and 3D keypoint voting, following PVN3D (Arxiv, Code, Video). At the representation learning stage, FFB6D builds bidirectional fusion modules into the full flow of the two networks, applying fusion at every encoding and decoding layer so that each network can leverage local and global complementary information from the other to obtain better representations. At the output representation stage, we design a simple but effective 3D keypoint selection algorithm that considers both the texture and geometry information of objects, which simplifies keypoint localization for precise pose estimation.
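To give a feel for the bidirectional fusion idea, below is a minimal, illustrative PyTorch block that exchanges features between the two branches using precomputed pixel-to-point and point-to-pixel correspondences. It is a sketch of the concept only, not the repository's implementation (see the modules under ffb6d/models/ for that).

```python
# Minimal illustration of one bidirectional fusion step, assuming precomputed
# pixel->point (pix2pt) and point->pixel (pt2pix) correspondence indices.
# Conceptual sketch only, not the repository's fusion module.
import torch
import torch.nn as nn

class BidirectionalFusionBlock(nn.Module):
    def __init__(self, rgb_channels, pcd_channels):
        super().__init__()
        # 1x1 convolutions map the gathered features into the other branch's width
        self.pcd_to_rgb = nn.Conv2d(pcd_channels, rgb_channels, kernel_size=1)
        self.rgb_to_pcd = nn.Conv1d(rgb_channels, pcd_channels, kernel_size=1)

    def forward(self, rgb_feat, pcd_feat, pix2pt, pt2pix):
        # rgb_feat: (B, C_rgb, H, W) appearance features from the CNN branch
        # pcd_feat: (B, C_pcd, N)    geometry features from the point-cloud branch
        # pix2pt:   (B, H*W) long    index of a corresponding point for each pixel
        # pt2pix:   (B, N)   long    index of a corresponding pixel for each point
        B, C_rgb, H, W = rgb_feat.shape
        C_pcd, N = pcd_feat.shape[1], pcd_feat.shape[2]

        # point -> pixel fusion: gather each pixel's point feature and add it in
        pts_for_pix = torch.gather(
            pcd_feat, 2, pix2pt.unsqueeze(1).expand(B, C_pcd, H * W)
        ).view(B, C_pcd, H, W)
        rgb_out = rgb_feat + self.pcd_to_rgb(pts_for_pix)

        # pixel -> point fusion: gather each point's pixel feature and add it in
        flat_rgb = rgb_feat.reshape(B, C_rgb, H * W)
        pix_for_pts = torch.gather(
            flat_rgb, 2, pt2pix.unsqueeze(1).expand(B, C_rgb, N)
        )
        pcd_out = pcd_feat + self.rgb_to_pcd(pix_for_pts)
        return rgb_out, pcd_out

# toy usage with random tensors
B, H, W, N = 2, 30, 40, 1024
block = BidirectionalFusionBlock(rgb_channels=64, pcd_channels=32)
rgb_out, pcd_out = block(
    torch.randn(B, 64, H, W), torch.randn(B, 32, N),
    torch.randint(0, N, (B, H * W)), torch.randint(0, H * W, (B, N)),
)
```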
Please cite FFB6D & PVN3D if you use this repository in your publications:
@InProceedings{He_2021_CVPR,
author = {He, Yisheng and Huang, Haibin and Fan, Haoqiang and Chen, Qifeng and Sun, Jian},
title = {FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}
}
@InProceedings{He_2020_CVPR,
author = {He, Yisheng and Sun, Wei and Huang, Haibin and Liu, Jianran and Fan, Haoqiang and Sun, Jian},
title = {PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
See our demo video on YouTube or bilibili.
Install CUDA 10.1 / 10.2
Set up python3 environment from requirement.txt:
pip3 install -r requirement.txt
Install apex:
git clone https://github.com/NVIDIA/apex
cd apex
export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.5" # set the target architecture manually, suggested in issue https://github.com/NVIDIA/apex/issues/605#issuecomment-554453001
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
Install normalSpeed, a fast and light-weight normal map estimator:
git clone https://github.com/hfutcgncas/normalSpeed.git
cd normalSpeed/normalSpeed
python3 setup.py install --user
cd ..
Install tkinter through sudo apt install python3-tk
Compile RandLA-Net operators:
cd ffb6d/models/RandLA/
sh compile_op.sh
LineMOD: Download the preprocessed LineMOD dataset from the onedrive link or google drive link (provided by DenseFusion). Unzip it and link the unzipped Linemod_preprocessed/ to ffb6d/datasets/linemod/Linemod_preprocessed:
ln -s path_to_unzipped_Linemod_preprocessed ffb6d/datasets/linemod/
Generate rendered and fused data following raster_triangle.
YCB-Video: Download the YCB-Video Dataset from PoseCNN. Unzip it and link the unzipped YCB_Video_Dataset to ffb6d/datasets/ycb/YCB_Video_Dataset:
ln -s path_to_unzipped_YCB_Video_Dataset ffb6d/datasets/ycb/
Train the model for the target object. Take object ape for example:
cd ffb6d
# commands in train_lm.sh
n_gpu=8
cls='ape'
python3 -m torch.distributed.launch --nproc_per_node=$n_gpu train_lm.py --gpus=$n_gpu --cls=$cls
The trained checkpoints are stored in train_log/linemod/checkpoints/{cls}/ (train_log/linemod/checkpoints/ape/ in this example).
A tip for saving GPU memory: you can enable mixed-precision mode by passing opt_level=O1 to train_lm.py. The documentation for apex mixed-precision training can be found here. If you use fewer than 8 GPUs and the batch size is less than 3x8=24, it is recommended to enable mixed-precision training and increase mini_batch_size in common.py as much as possible.
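For reference, the following is a minimal, self-contained sketch of what opt_level="O1" turns on via apex.amp; the exact wiring inside train_lm.py may differ.

```python
# Minimal sketch of apex O1 mixed precision; requires a CUDA device.
import torch
from apex import amp

model = torch.nn.Linear(128, 64).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# O1 patches most ops to run in FP16 while keeping FP32 master weights
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(torch.randn(8, 128).cuda()).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # loss is scaled to avoid FP16 gradient underflow
optimizer.step()
```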
# commands in test_lm.sh
cls='ape'
tst_mdl="./linemod_pretrained/FFB6D_${cls}_best.pth.tar"
python3 -m torch.distributed.launch --nproc_per_node=1 train_lm.py --gpu '0' --cls $cls -eval_net -checkpoint $tst_mdl -test -test_pose # -debug
You can evaluate different checkpoints by setting tst_mdl to the path of your target model. For example, move the pre-trained FFB6D_ape_best.pth.tar to train_log/linemod/checkpoints/ape/, then set tst_mdl=train_log/linemod/checkpoints/ape/FFB6D_ape_best.pth.tar for testing.
# commands in demo_lm.sh
cls='ape'
tst_mdl=train_log/linemod/checkpoints/${cls}/FFB6D_${cls}_best.pth.tar
python3 -m demo -dataset linemod -checkpoint $tst_mdl -cls $cls -show
The visualization results will be stored in train_log/linemod/eval_results/{cls}/pose_vis
Start training on the YCB-Video Dataset by:
# commands in train_ycb.sh
n_gpu=8 # number of gpu to use
python3 -m torch.distributed.launch --nproc_per_node=$n_gpu train_ycb.py --gpus=$n_gpu
The trained model checkpoints are stored in train_log/ycb/checkpoints/
A tip for saving GPU memory: you can enable mixed-precision mode by passing opt_level=O1 to train_ycb.py. The documentation for apex mixed-precision training can be found here. If you use fewer than 8 GPUs and the batch size is less than 3x8=24, it is recommended to enable mixed-precision training and increase mini_batch_size in common.py as much as possible.
# commands in test_ycb.sh
tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar # checkpoint to test.
python3 -m torch.distributed.launch --nproc_per_node=1 train_ycb.py --gpu '0' -eval_net -checkpoint $tst_mdl -test -test_pose # -debug
You can evaluate different checkpoints by setting tst_mdl to the path of your target model. For example, move a pre-trained checkpoint to train_log/ycb/checkpoints/ and modify tst_mdl accordingly for testing.
# commands in demo_ycb.sh
tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar
python3 -m demo -checkpoint $tst_mdl -dataset ycb
The visualization results will be stored in train_log/ycb/eval_results/pose_vis.
Evaluation result without any post refinement on the YCB-Video dataset (metrics averaged over all objects):
| Method | ADDS | ADD(S) |
|:---|:---:|:---:|
| PoseCNN | 75.8 | 59.9 |
| PointFusion | 83.9 | - |
| DenseFusion | 91.2 | 82.9 |
| PVN3D | 95.5 | 91.8 |
| Our FFB6D | 96.6 | 92.7 |
Evaluation result on the LineMOD dataset:
| Input | Method | MEAN |
|:---:|:---|:---:|
| RGB | PVNet | 86.3 |
| RGB | CDPN | 89.9 |
| RGB | DPOD | 95.2 |
| RGB-D | PointFusion | 73.7 |
| RGB-D | DenseFusion (iterative) | 94.3 |
| RGB-D | G2L-Net | 98.7 |
| RGB-D | PVN3D | 99.4 |
| RGB-D | FFB6D | 99.7 |
Robustness upon occlusion:
Model parameters and time efficiency:

| Method | Parameters | Network Forward | Pose Estimation | All time |
|:---|:---:|:---:|:---:|:---:|
| PVN3D | 39.2M | 170ms | 20ms | 190ms |
| FFB6D | 33.8M | 57ms | 18ms | 75ms |
Install and generate required mesh info following DSTOOL_README.
Modify the info of your new dataset in FFB6D/ffb6d/common.py.
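For reference, the per-dataset information configured there typically includes class counts, camera intrinsics, and dataset paths. The class and attribute names below are illustrative examples only, not the exact fields in common.py.

```python
# Illustrative sketch of per-dataset configuration; all names/values are hypothetical.
import numpy as np

class MyDatasetConfig:
    def __init__(self):
        self.dataset_name = "my_dataset"
        self.root = "/path/to/my_dataset"        # dataset root directory
        self.n_objects = 1                       # number of target objects
        self.n_classes = self.n_objects + 1      # plus background
        self.n_sample_points = 12288             # points sampled from each depth map
        self.intrinsic_matrix = np.array([       # your camera intrinsics
            [600.0,   0.0, 320.0],
            [  0.0, 600.0, 240.0],
            [  0.0,   0.0,   1.0],
        ])
```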
Write your dataset preprocessing script following FFB6D/ffb6d/datasets/ycb/ycb_dataset.py. Note that you should properly modify or call the functions that get your model info, such as 3D keypoints, center points, and radius; see the sketch below.
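As a rough, hypothetical sketch (not the repository's functions), the center point, bounding radius, and farthest-point-sampled 3D keypoints of an object can be computed from its mesh vertices like this:

```python
# Hypothetical sketch of per-object model info: center, radius, FPS keypoints.
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Greedy FPS over an (N, 3) vertex array; returns (n_samples, 3) keypoints."""
    selected = [points[0]]
    dists = np.linalg.norm(points - selected[0], axis=1)
    for _ in range(n_samples - 1):
        idx = int(np.argmax(dists))          # farthest vertex from the chosen set
        selected.append(points[idx])
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return np.stack(selected)

def get_model_info(mesh_vertices, n_keypoints=8):
    center = mesh_vertices.mean(axis=0)                            # object center
    radius = np.linalg.norm(mesh_vertices - center, axis=1).max()  # bounding radius
    keypoints = farthest_point_sampling(mesh_vertices, n_keypoints)
    return center, radius, keypoints

# usage with dummy vertices standing in for a real object mesh
center, radius, kps = get_model_info(np.random.rand(2000, 3) * 0.1)
```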
(Very important!) Visualize and check that you have processed the data properly, e.g., the projected keypoints and center point, the semantic label of each point, etc. For example, you can visualize the projected center point (red point) and selected keypoints (orange points) as follows by running python3 -m datasets.ycb.ycb_dataset.
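A generic sanity check of this kind could look like the sketch below; the intrinsics K, pose RT, fake keypoints, and output file name are placeholders, not values from the repository.

```python
# Project the 3D center and keypoints into the image and draw them for inspection.
import cv2
import numpy as np

def project_points(pts3d, K, RT):
    """pts3d: (N, 3) in object coords; K: (3, 3); RT: (3, 4) object-to-camera pose."""
    cam = RT[:, :3] @ pts3d.T + RT[:, 3:4]   # (3, N) points in the camera frame
    uv = (K @ cam) / cam[2:3, :]             # perspective division
    return uv[:2].T                          # (N, 2) pixel coordinates

img = np.zeros((480, 640, 3), dtype=np.uint8)                     # stand-in RGB frame
K = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])  # fake intrinsics
RT = np.hstack([np.eye(3), [[0.], [0.], [0.6]]])                  # object 0.6 m ahead
kps = np.random.uniform(-0.05, 0.05, (8, 3))                      # fake 3D keypoints

for u, v in project_points(kps, K, RT):
    cv2.circle(img, (int(u), int(v)), 3, (0, 165, 255), -1)       # orange keypoints
cu, cv_ = project_points(kps.mean(0, keepdims=True), K, RT)[0]
cv2.circle(img, (int(cu), int(cv_)), 4, (0, 0, 255), -1)          # red center point
cv2.imwrite("keypoint_check.png", img)
```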
For inference, make sure that you properly load the 3D keypoints, center point, and radius of your objects in the object coordinate system in FFB6D/ffb6d/utils/pvn3d_eval_utils.py.
Check that all settings are modified properly by using the ground-truth information for evaluation. The results should be close to 100 if everything is correct. For example, testing with ground truth on the YCB-Video dataset by passing the -test_gt parameter to train_ycb.py will give results higher than 99.99:
tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar
python3 -m torch.distributed.launch --nproc_per_node=1 train_ycb.py --gpu '0' -eval_net -checkpoint $tst_mdl -test -test_pose -test_gt
Licensed under the MIT License.