ICRA 2018 "Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image" (Torch Implementation)
This repo implements the training and testing of deep regression neural networks for "Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image" by Fangchang Ma and Sertac Karaman at MIT. A video demonstration is available on YouTube. This is the original Torch implementation of the paper; a PyTorch version is also available.
This repo can be used for training and testing of:

- RGB (or grayscale image) based depth prediction
- sparse depth based depth prediction
- RGBd (i.e., both RGB and sparse depth) based depth prediction
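In the sparse-depth modalities (d, rgbd, gd in the options table below), the sparse input is produced by sampling a small number of pixels from the dense ground-truth depth map. A minimal sketch of this idea, assuming uniformly random sampling; the helper createSparseDepth is illustrative, not a function from this repo:

```lua
-- Sketch: keep nSample uniformly random pixels of a dense depth map and zero
-- out the rest; an RGBd input is RGB concatenated with this sparse channel.
require 'torch'

local function createSparseDepth(depth, nSample)  -- depth: 1 x H x W
  local h, w = depth:size(2), depth:size(3)
  local mask = torch.zeros(h * w):typeAs(depth)
  local perm = torch.randperm(h * w)
  for i = 1, nSample do
    mask[perm[i]] = 1                             -- retain nSample pixels
  end
  return torch.cmul(depth, mask:view(1, h, w))
end

-- Usage: local rgbd = torch.cat(rgb, createSparseDepth(depth, 100), 1)
```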
See the installation instructions for a step-by-step guide.
Install the Torch packages required by the training scripts:

```bash
luarocks install nn
luarocks install cunn
luarocks install cudnn
luarocks install optnet
```
Install the HDF5 libraries and the Torch HDF5 bindings:

```bash
sudo apt-get update
sudo apt-get install -y libhdf5-serial-dev hdf5-tools
git clone https://github.com/davek44/torch-hdf5.git
cd torch-hdf5
luarocks make
cd ..
```
Download the preprocessed NYU-Depth-v2 and KITTI Odometry datasets in HDF5 format and place them under the data folder. The downloading process might take an hour or so. The NYU dataset requires 32G of storage space, and KITTI requires 81G.
```bash
mkdir data; cd data
wget http://datasets.lids.mit.edu/sparse-to-dense/data/kitti.tar.gz
tar -xvf kitti.tar.gz && rm -f kitti.tar.gz
wget http://datasets.lids.mit.edu/sparse-to-dense/data/nyudepthv2.tar.gz
tar -xvf nyudepthv2.tar.gz && rm -f nyudepthv2.tar.gz
cd ..
```
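Once the download finishes, you can sanity-check the files from the Torch REPL. A sketch, assuming each HDF5 file stores an rgb and a depth field (both the field names and the file path below are assumptions about the preprocessed format):

```lua
-- Sketch: verify that torch-hdf5 can read one of the preprocessed files.
-- The path and the 'rgb'/'depth' field names are assumed, not documented here.
require 'hdf5'

local f = hdf5.open('data/nyudepthv2/train/some_scene/00001.h5', 'r')
local rgb = f:read('rgb'):all()      -- expected: 3 x H x W color image
local depth = f:read('depth'):all()  -- expected: H x W dense depth map
f:close()
print(rgb:size(), depth:size())
```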
Download the ResNet models pretrained on ImageNet and place them under the pretrained folder.
```bash
mkdir pretrained; cd pretrained
wget https://d2j0dndfm35trm.cloudfront.net/resnet-50.t7
wget https://d2j0dndfm35trm.cloudfront.net/resnet-18.t7
cd ..
```
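To check that the weights load correctly, a minimal sketch (loading cunn and cudnn first, on the assumption that the checkpoints were serialized with CUDA modules):

```lua
-- Sketch: load the pretrained ResNet and print its layer structure.
require 'nn'
require 'cunn'
require 'cudnn'  -- assumed: the .t7 checkpoints contain cudnn layers

local model = torch.load('pretrained/resnet-50.t7')
print(model)
```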
The training scripts come with several options, which can be listed with the --help flag:

```bash
th main.lua --help
```
To run the training, simply run main.lua. By default, the script runs the RGB-based prediction network on NYU-Depth-v2 with 1 GPU and 2 data-loader threads, without using pretrained weights.

```bash
th main.lua
```
To train networks with different datasets, input modalities, loss functions, and network components, see the example below:

```bash
th main.lua -dataset kitti -inputType rgbd -nSample 100 -criterion l1 -encoderType conv -decoderType upproj -pretrain true
```
Training results will be saved under the results folder.
Parameter | Options | Remarks |
---|---|---|
dataset | nyudepthv2, kitti | |
inputType | rgb, rgbd, d, g, gd | d: sparse depth only; g: grayscale |
nSample | non-negative integer (0 for rgb and g) | |
criterion | l1, l2, berhu | |
pretrain | false, true | |
rep | linear, log, inverse | representation of input depth |
encoderType | conv, depthsep, channeldrop | depthsep: depthwise separable convolution |
decoderType | upproj, upconv, deconv2, deconv3 | deconv_n: transposed convolution with kernel size n-by-n |
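For reference, the berhu criterion is the reverse Huber (berHu) loss from Laina et al: L1 for small residuals and a scaled L2 for large ones. A minimal sketch, assuming the common threshold choice c = 0.2 × (maximum absolute residual); this is not necessarily the repo's exact implementation:

```lua
-- Sketch of the berHu loss: |e| for |e| <= c, (e^2 + c^2) / (2c) otherwise.
require 'torch'

local function berhu(pred, target)
  local diff = torch.abs(pred - target)
  local c = 0.2 * diff:max()            -- assumed threshold choice
  local l1 = diff[torch.le(diff, c)]    -- residuals in the L1 region
  local l2 = diff[torch.gt(diff, c)]    -- residuals in the quadratic region
  local loss = l1:sum()
  if l2:nElement() > 0 then
    loss = loss + (torch.pow(l2, 2) + c * c):sum() / (2 * c)
  end
  return loss / diff:nElement()
end
```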
To test the performance of a trained model, simply run main.lua with the -testOnly true option, along with other model options. For instance:

```bash
th main.lua -testOnly true -dataset kitti -inputType rgbd -nSample 100 -criterion l1 -encoderType conv -decoderType upproj -pretrain true
```
Download our trained models at http://datasets.lids.mit.edu/sparse-to-dense/results/ to the results folder. For instance:

```bash
cd results
wget -r -np -nH --cut-dirs=2 --reject "index.html*" http://datasets.lids.mit.edu/sparse-to-dense/results/nyudepthv2.input=rgbd.nsample=200.rep=linear.encoder=conv.decoder=upproj.criterion=l1.lr=0.01.bs=16.pretrained=true/
cd ..
```
More trained models will be released.
Error metrics on NYU Depth v2:
RGB | rms | rel | delta1 | delta2 | delta3 |
---|---|---|---|---|---|
Roy & Todorovic (CVPR 2016) | 0.744 | 0.187 | - | - | - |
Eigen & Fergus (ICCV 2015) | 0.641 | 0.158 | 76.9 | 95.0 | 98.8 |
Laina et al (3DV 2016) | 0.573 | 0.127 | 81.1 | 95.3 | 98.8 |
Ours-RGB | 0.514 | 0.143 | 81.0 | 95.9 | 98.9 |
RGBd-#samples | rms | rel | delta1 | delta2 | delta3 |
---|---|---|---|---|---|
Liao et al (ICRA 2017)-225 | 0.442 | 0.104 | 87.8 | 96.4 | 98.9 |
Ours-20 | 0.351 | 0.078 | 92.8 | 98.4 | 99.6 |
Ours-50 | 0.281 | 0.059 | 95.5 | 99.0 | 99.7 |
Ours-200 | 0.230 | 0.044 | 97.1 | 99.4 | 99.8 |
Error metrics on KITTI dataset:
RGB | rms | rel | delta1 | delta2 | delta3 |
---|---|---|---|---|---|
Make3D | 8.734 | 0.280 | 60.1 | 82.0 | 92.6 |
Mancini et al (IROS 2016) | 7.508 | - | 31.8 | 61.7 | 81.3 |
Eigen et al (NIPS 2014) | 7.156 | 0.190 | 69.2 | 89.9 | 96.7 |
Ours-RGB | 6.266 | 0.208 | 59.1 | 90.0 | 96.2 |
RGBd-#samples | rms | rel | delta1 | delta2 | delta3 |
---|---|---|---|---|---|
Cadena et al (RSS 2016)-650 | 7.14 | 0.179 | 70.9 | 88.8 | 95.6 |
Ours-50 | 4.884 | 0.109 | 87.1 | 95.2 | 97.9 |
Liao et al (ICRA 2017)-225 | 4.50 | 0.113 | 87.4 | 96.0 | 98.4 |
Ours-100 | 4.303 | 0.095 | 90.0 | 96.3 | 98.3 |
Ours-200 | 3.851 | 0.083 | 91.9 | 97.0 | 98.6 |
Ours-500 | 3.378 | 0.073 | 93.5 | 97.6 | 98.9 |
Note: our networks are trained on the KITTI odometry dataset, using only sparse labels from laser measurements.
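The metrics in the tables above follow standard depth-estimation conventions: rms is the root-mean-square error (in meters), rel is the mean absolute relative error, and delta_i is the percentage of pixels whose predicted depth is within a factor of 1.25^i of the ground truth. A sketch of these definitions, evaluated only on pixels with valid ground truth:

```lua
-- Sketch: standard depth-estimation error metrics.
require 'torch'

local function evaluate(pred, target)
  local valid = torch.gt(target, 0)               -- pixels with ground truth
  local p, t = pred[valid], target[valid]
  local rms = math.sqrt(torch.pow(p - t, 2):mean())
  local rel = torch.cdiv(torch.abs(p - t), t):mean()
  local ratio = torch.cmax(torch.cdiv(p, t), torch.cdiv(t, p))
  local n = ratio:nElement()
  local delta1 = 100 * torch.le(ratio, 1.25):sum() / n
  local delta2 = 100 * torch.le(ratio, 1.25 ^ 2):sum() / n
  local delta3 = 100 * torch.le(ratio, 1.25 ^ 3):sum() / n
  return rms, rel, delta1, delta2, delta3
end
```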
If you use our code or method in your work, please consider citing the following:
```
@inproceedings{Ma2017SparseToDense,
  title={Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image},
  author={Ma, Fangchang and Karaman, Sertac},
  booktitle={ICRA},
  year={2018}
}
@article{ma2018self,
  title={Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera},
  author={Ma, Fangchang and Cavalheiro, Guilherme Venturelli and Karaman, Sertac},
  journal={arXiv preprint arXiv:1807.00275},
  year={2018}
}
```
Please direct any questions to Fangchang Ma at [email protected].