Monocular depth estimation using Feature Pyramid Network implemented in PyTorch 1.1.0
model-test

Download nyuv2.rar and extract it to the current directory. Download fyn_model.pt and save it to the current directory. Then run test.ipynb.

Note: the model I provide was trained only on a dataset containing 1k images of scenes in basements. The purpose of this model was only to test whether the network architecture works for depth estimation. The test dataset provided in the model-test folder also contains only images of basements.
As the dataset is very large, I don't provide it directly in this repository. However, you can download the depth dataset yourself and load it with your own code. First modify the data-loading code in fyn_main.py to make sure you can load the dataset correctly. In my original code, I store the RGB and depth images in two folders and use a pickle file to relate them, for both the train and test datasets.
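The two-folder-plus-pickle layout described above might look like the following sketch. The class name `DepthPairDataset` and the assumed pickle format (a list of `(rgb_filename, depth_filename)` tuples) are illustrative assumptions, not the repository's actual code; adapt them to your own files.

```python
import os
import pickle

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class DepthPairDataset(Dataset):
    """Loads (RGB, depth) pairs from two folders related by a pickle index.

    Assumed pickle format: a list of (rgb_filename, depth_filename) tuples.
    """

    def __init__(self, rgb_dir, depth_dir, index_pickle):
        self.rgb_dir = rgb_dir
        self.depth_dir = depth_dir
        with open(index_pickle, "rb") as f:
            self.pairs = pickle.load(f)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        rgb_name, depth_name = self.pairs[idx]
        rgb = Image.open(os.path.join(self.rgb_dir, rgb_name)).convert("RGB")
        depth = Image.open(os.path.join(self.depth_dir, depth_name)).convert("L")
        # the network predicts at 1/4 resolution, so shrink the target accordingly
        depth = depth.resize((rgb.width // 4, rgb.height // 4))
        rgb_t = torch.from_numpy(np.array(rgb)).permute(2, 0, 1).float() / 255.0
        depth_t = torch.from_numpy(np.array(depth)).unsqueeze(0).float() / 255.0
        return rgb_t, depth_t
```

A `DataLoader` can then batch these pairs as usual; only the pickle index and folder paths need to match your own preprocessing.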
You can use process_dataset.py to process the data you download. Again, after processing the data, check the data-loading code to make sure the dataset loads correctly. The network takes a Width x Height x 3 RGB image as input and outputs a 1/4 Width x 1/4 Height x 1 grayscale image. Once you have made sure the dataset loads correctly, run fyn_main.py to start training.
We use the NYU Depth V2 Dataset for training and testing. The RGB image and the depth image in the dataset are both of size 640x480. During training, the RGB image is loaded at 640x480 and the depth image is loaded and then resized to 160x120. The input of the network is an RGB image of size 640x480, and the output is a grayscale image of size 160x120, which is the depth map we need.
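The 4x spatial reduction of the depth target can be reproduced with a plain interpolation call; this is a minimal sketch of the size relationship, not the repository's exact preprocessing:

```python
import torch
import torch.nn.functional as F

rgb = torch.rand(1, 3, 480, 640)         # one 640x480 RGB image (NCHW)
depth_full = torch.rand(1, 1, 480, 640)  # full-resolution depth map

# shrink each spatial dimension by 4: 640x480 -> 160x120
depth_target = F.interpolate(
    depth_full, scale_factor=0.25, mode="bilinear", align_corners=False
)
print(depth_target.shape)  # torch.Size([1, 1, 120, 160])
```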
The NYU Depth V2 dataset contains only images of indoor scenes, which limits depth estimation on other kinds of scenes. You can look into other datasets to cover more scene types in your model.
We employed a self-defined loss function in our model -- the Gradient Loss:
The gradients of the depth maps are obtained with a Sobel filter; the gradient loss is the L1 norm of the difference between the predicted and ground-truth gradients.
Here are some results on the test dataset, which contains scenes of basements: