Real time semantic slam in ROS with a hand held RGB-D camera
Author: Xuan Zhang, Supervisor: David Filliat
Research internship @ENSTA ParisTech
Semantic SLAM can generate a 3D voxel based semantic map using only a hand held RGB-D camera (e.g. Asus xtion) in real time. We use ORB_SLAM2 as SLAM backend, a CNN (PSPNet) to produce semantic prediction and fuse semantic information into a octomap. Note that our system can also be configured to generate rgb octomap without semantic information.
Semantic octomap | RGB octomap |
---|---|
This work cannot be done without many open source projets. Special thanks to
This project is released under a GPLv3 license.
sudo apt-get install ros-kinetic-openni2-launch
We use ORB_SLAM2 as SLAM backend. Please refer to the official repo for installation dependencies.
PyTorch 0.4.0
For other dependencies, please see semantic_slam/package.xml
After installing dependencies for ORB_SLAM. You should first build the library.
cd ORB_SLAM2
./build.sh
rosdep install semantic_slam
cd <your_catkin_work_space>
catkin_make
roslaunch semantic_slam camera.launch
roslaunch semantic_slam slam.launch
When the slam system has finished initialization, try to move the camera and check if the camera trajectory in the viewer is reasonable, reset SLAM if not.
You can now run the semantic_cloud node and the octomap_generator node. You will have to provide trained models, see links below.
roslaunch semantic_slam semantic_mapping.launch
This will also launch rviz for visualization.
You can then move around the camera and construct semantic map. Make sure SLAM is not losing itself.
If you are constructing a semantic map, you can toggle the display color between semantic color and rgb color by running
rosservice call toggle_use_semantic_color
If you want to test semantic mapping without a camera, you can also run a rosbag.
roslaunch semantic_slam semantic mapping.launch
rosbag play demo.bag
Evaluation is done on a computer with 6 Xeon 1.7 GHz CPU and one GeForce GTX Titan Z GPU. Input image size is 640×480 recorded by a camera Asus Xtion Pro.
When our system works together, ORB-SLAM works at about 15 Hz (the setting is 30 Hz). Point cloud generation alone can run at 30 Hz. Semantic segmentation runs at about 2 to 3 Hz. Octomap insertion and visualization works at about 1 Hz. Please refer to section 3.6.2 of the project report for more analysis of run times.
To cite this project in your research:
@misc{semantic_slam,
author = {Xuan, Zhang and David, Filliat},
title = {Real-time voxel based 3D semantic mapping with a hand held RGB-D camera},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/floatlazer/semantic_slam}},
}
You can change parameters for launch. Parameters are in ./semantic_slam/params
folder.
Note that you can set octomap/tree_type and semantic_cloud/point_type to 0 to generate a map with rgb color without doing semantic segmantation.
namespace octomap
namespace camera
namespace semantic_pcl