VINS Fusion GPU BA

A CUDA reimplementation of Bundle Adjustment for VINS-Fusion

VINS-Fusion-GPU-BA

This repository is a version of VINS-Fusion with a CUDA reimplementation of Bundle Adjustment.

The modifications are as follows:

The code for the Bundle Adjustment reimplementation is in vins_estimator/src/cuda_bundle_adjustment.
Estimator::optimization() in estimator.cpp is rewritten as follows:

void Estimator::optimization() {
    if(frame_count == WINDOW_SIZE) {
        optimization_with_cuda();  // solve and marginalize with cuda when the sliding window is full
    } else {
        optimization_with_ceres(); // solve with ceres when the sliding window is not yet full
    }
}

Dependencies

The base software environment is the same as for VINS-Fusion. This repository was tested on Ubuntu 18.04 with ROS Melodic.

Although the Bundle Adjustment in estimator.cpp is reimplemented with CUDA and Eigen, this repository still requires Ceres Solver for non-linear optimization in:

Bundle Adjustment in estimator.cpp before frame_count reaches WINDOW_SIZE
GlobalSFM::construct() in initial_sfm.cpp
PoseGraph::optimize4DoF() in pose_graph.cpp.

In addition, the CUDA reimplementation of Bundle Adjustment in estimator.cpp requires:

C++14
CUDA (>= 11.0)
cuBLAS
cuSOLVER
Eigen (>= 3.3.9)

How To Build

Before building this repo, some CMake variables in vins_estimator/src/cuda_bundle_adjustment/CMakeLists.txt need to be modified to fit your environment:

set(CMAKE_CUDA_COMPILER  /usr/local/cuda/bin/nvcc)        # set it to your path to nvcc
set(CUDA_TOOLKIT_ROOT_DIR  /usr/local/cuda)               # set it to your CUDA toolkit root directory
set(CMAKE_CUDA_ARCHITECTURES  52)    # for example, if your device's compute capability is 6.2, then set it to 62

If your device's compute capability is >= 6.0, atomicAdd() supports FP64 natively, so you can simplify MyAtomicAdd() in vins_estimator/src/cuda_bundle_adjustment/cuda_kernel_funcs/device_utils.cu to the following:

template<typename T>
__device__ T MyAtomicAdd(T* address, T val) { return atomicAdd(address, val); }

Alternatively, replace MyAtomicAdd() with atomicAdd() wherever MyAtomicAdd() is called.
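
On devices with compute capability below 6.0 there is no native FP64 atomicAdd(), so a wrapper such as MyAtomicAdd() has to emulate it. The repository's own fallback is not reproduced here. The sketch below is only a host-side C++ analogue of the usual compare-and-swap retry loop (on the GPU the same pattern is typically written with atomicCAS()), using std::atomic so that it runs on a CPU. The names CasAtomicAdd and LoadDouble are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical host-side analogue of a CAS-based FP64 atomic add.
// The double is stored as its 64-bit pattern so it can be swapped atomically.
inline double CasAtomicAdd(std::atomic<std::uint64_t>& cell, double val) {
    std::uint64_t old_bits = cell.load();
    for (;;) {
        double old_val;
        std::memcpy(&old_val, &old_bits, sizeof(double));
        const double new_val = old_val + val;
        std::uint64_t new_bits;
        std::memcpy(&new_bits, &new_val, sizeof(double));
        // On failure, old_bits is refreshed with the current contents and we retry.
        if (cell.compare_exchange_weak(old_bits, new_bits)) {
            return old_val;  // like atomicAdd(), return the previous value
        }
    }
}

// Reads the stored bit pattern back as a double.
inline double LoadDouble(const std::atomic<std::uint64_t>& cell) {
    const std::uint64_t bits = cell.load();
    double value;
    std::memcpy(&value, &bits, sizeof(double));
    return value;
}
```

The retry loop is what makes the emulated version slower than a native FP64 atomicAdd() when many threads add into the same address.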

The basic steps to compile and run this repo are the same as for VINS-Fusion.

Speed-up

CPU: Intel I7-6700K. GPU: Nvidia 980TI. All times are in milliseconds.

Sequence	WINDOW_SIZE	Max feature count	RVIZ	CPU iterations	CPU solve (ms)	CPU marginalization (ms)	GPU solve (at least 10 iterations) & marginalization (ms)
MH_01_easy	10	150	no	5.73	61.17	12.25	29.38
MH_01_easy	10	150	yes	5.73	74.74	21.21	37.39
MH_05_difficult	10	150	no	6.46	63.17	9.73	27.23
MH_05_difficult	10	150	yes	6.46	74.36	17.71	35.49
2011_10_03_drive_0027_sync	10	200	no	4.71	18.79	6.07	19.98
2011_10_03_drive_0027_sync	10	200	yes	4.70	19.81	7.18	22.27
MH_01_easy	20	300	no	7.13	126.09	20.47	51.56
MH_01_easy	20	300	yes	7.13	163.89	28.88	65.35
MH_05_difficult	20	300	no	6.61	103.50	15.66	45.66
MH_05_difficult	20	300	yes	6.61	137.44	23.88	58.60
2011_10_03_drive_0027_sync	20	400	no	4.94	44.14	9.22	34.18
2011_10_03_drive_0027_sync	20	400	yes	4.93	46.91	11.99	38.40

The theoretical FP64 performance of the Nvidia 980TI GPU (compute capability 5.2) is only 189.4 GFLOPS, and atomicAdd() for FP64 is not available on devices with compute capability lower than 6.0. A larger speed-up can therefore plausibly be expected on more recent hardware.

P.S. Running RVIZ appears to slow this reimplementation down considerably: in the table above, the GPU timings rise by roughly 2 to 14 ms when RVIZ is running.

Precision

MH_01_easy (WINDOW_SIZE == 10; graphs generated with evo):

[Figures: six evo plots for MH_01_easy]

KITTI 2011_10_03_drive_0027_sync:

WINDOW_SIZE is 10
green path is generated by VO (estimator.cpp)
blue path is generated by fusing VO with GPS (globalOpt.cpp)

[Figure: VO path (green) and VO + GPS path (blue)]

Some Implementation Details

Levenberg-Marquardt is used to solve for the state increment (delta), with at least 10 iterations.
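
As a reminder of what one Levenberg-Marquardt loop looks like, here is a minimal one-parameter sketch. It fits the exponent a in y = exp(a * x), solves the damped normal equation (H + lambda) * delta = -g in each iteration, and always runs at least min_iters iterations. The toy problem, the function names, and the damping schedule (multiply or divide lambda by 10) are assumptions made for illustration; the repository's damping strategy may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of squared residuals r_i = exp(a * x_i) - y_i.
inline double Cost(double a, const std::vector<double>& x, const std::vector<double>& y) {
    double cost = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double r = std::exp(a * x[i]) - y[i];
        cost += r * r;
    }
    return cost;
}

// Toy Levenberg-Marquardt for one parameter; runs at least min_iters iterations.
inline double FitExponent(const std::vector<double>& x, const std::vector<double>& y,
                          double a, int min_iters = 10, int max_iters = 50) {
    double lambda = 1e-3;  // damping factor
    for (int iter = 0; iter < max_iters; ++iter) {
        double H = 0.0, g = 0.0;  // H = J^T J, g = J^T r
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double e = std::exp(a * x[i]);
            const double r = e - y[i];
            const double J = x[i] * e;  // d r_i / d a
            H += J * J;
            g += J * r;
        }
        const double delta = -g / (H + lambda);  // damped normal equation
        if (Cost(a + delta, x, y) < Cost(a, x, y)) {
            a += delta;      // accept the step and reduce the damping
            lambda *= 0.1;
        } else {
            lambda *= 10.0;  // reject the step and damp more strongly
        }
        if (iter + 1 >= min_iters && std::fabs(delta) < 1e-12) break;
    }
    return a;
}
```

In the full problem H is a matrix and the division becomes a linear solve, but the accept/reject logic around the damping factor has the same shape.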

Since the bottom-right block of the big Hessian matrix (the block that corresponds to the inverse depths) is diagonal, a Schur complement trick is used to solve for the system states before solving for the inverse depths.

All Jacobians, residuals, robust info matrices, Hessian blocks, and rhs blocks are computed on the GPU.

No explicit big Jacobian matrix or big residual vector is formed.

The big Hessian matrix and rhs vector are formed as follows:

Multiply tiny jacobian blocks and residual blocks to form tiny hessian blocks and rhs blocks inside kernel functions.
Add tiny hessian blocks and rhs blocks to the big hessian and rhs by calling atomicAdd() inside kernel functions.

In the original implementation with Ceres, an eigendecomposition is performed during marginalization to form a MarginalizationFactor for the next frame. This is because Ceres 1.14 has to build Hessians from Jacobians and cannot take already-formed Hessians through its API. Ceres 1.14 also evaluates the error of the MarginalizationFactor at new linearization points as part of its iteration strategy. This reimplementation does not use Ceres and ignores the error of the MarginalizationFactor (new linearization points are still used to update the Hessian prior and the rhs prior), so the eigendecomposition of the Hessian prior, which usually takes about 5~6 ms on my Nvidia 980TI GPU, is not needed during marginalization.
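
The difference between the two approaches can be written out as follows. The symbols are assumptions introduced here for illustration: H_p and b_p are the Hessian prior and rhs prior left after marginalization (with the sign convention b_p = J_p^T r_p), V and Lambda are the eigenvectors and eigenvalues of H_p (assumed positive), and delta x is the change of the linearization point. How the repository applies the update is not spelled out in this README; the last line is only the standard first-order form.

```latex
% Jacobian form, needed when the solver only accepts Jacobians and residuals:
H_p = V \Lambda V^\top, \qquad
J_p = \Lambda^{1/2} V^\top, \qquad
r_p = \Lambda^{-1/2} V^\top b_p
% which reproduces the prior exactly:
J_p^\top J_p = H_p, \qquad J_p^\top r_p = b_p
% Hessian form, no eigendecomposition: keep H_p and shift the rhs to first order
b_p(\delta x) = J_p^\top \left( r_p + J_p\,\delta x \right) = b_p + H_p\,\delta x
```

Since the shifted rhs depends only on H_p and b_p, a solver that works directly with Hessian blocks never needs J_p or r_p, and hence never needs V or Lambda.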

Limitations

Currently, only scenarios that satisfy all of the following conditions are supported:

STEREO == true
ESTIMATE_TD == false
ESTIMATE_EXTRINSIC == false

The data type must be FP64. An FP32 version is also implemented; however, its estimate drifts away.
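
The three configuration conditions above can be expressed as a simple guard. This is a hypothetical helper, not code from the repository: the struct BaOptions and the function CudaBaSupported are made-up names that mirror the STEREO, ESTIMATE_TD, and ESTIMATE_EXTRINSIC options.

```cpp
// Hypothetical mirror of the three VINS-Fusion options named above.
struct BaOptions {
    bool stereo;              // STEREO
    bool estimate_td;         // ESTIMATE_TD
    bool estimate_extrinsic;  // ESTIMATE_EXTRINSIC
};

// True only when every listed condition holds.
inline bool CudaBaSupported(const BaOptions& opt) {
    return opt.stereo && !opt.estimate_td && !opt.estimate_extrinsic;
}
```

A check like this could be run once at start-up to fail early on an unsupported configuration.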

Acknowledgements

This repository is based upon VINS-Fusion. Also, many thanks to VINS-Course for its step-by-step demonstrations of how Bundle Adjustment works.

Open Source Agenda is not affiliated with "VINS Fusion GPU BA" Project. README Source: qdLMF/VINS-Fusion-GPU-BA

Repository

qdLMF/VINS-Fusion-GPU-BA

License

GPL-3.0