People detection and tracking in stationary RGB cameras
This is a research project developed during my Master's studies at the University of Hamburg in 2017.
The goal is to detect and track any moving person in RGB videos captured by a single stationary camera.
The approach splits into two main phases: detection and tracking.
I implemented two approaches for detecting people. The first uses background subtraction supported by a neural network trained to classify humans: I retrained an Inception v3 model via TensorFlow to classify humans vs. non-humans (the model is included in the project).
The second detection method is YOLO, which gave very good detection results.
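To illustrate the first phase, here is a minimal sketch of adaptive background subtraction on tiny grayscale "frames" represented as lists of pixel rows. This is a hypothetical toy example, not the project's actual pipeline; the function names, the blending factor `alpha`, and the threshold are assumptions for illustration.

```python
# Toy running-average background subtraction (illustrative only).

def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (the adaptive variant)."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]

# A static 3x3 scene, then a frame where a bright "person" enters.
bg = [[10.0] * 3 for _ in range(3)]
frame = [[10, 200, 10], [10, 200, 10], [10, 10, 10]]
mask = foreground_mask(bg, frame)
print(mask)  # → [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
bg = update_background(bg, frame)
```

The adaptive update slowly absorbs persistent changes into the background, which is exactly why sudden illumination changes still leak through as false foreground, as discussed below.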
I used an Unscented Kalman Filter to keep track of the motion dynamics of each detected person, and the Hungarian algorithm to solve the detection-to-track assignment problem.
I used the tracking submodule from Smorodov's Multitarget-tracker, but I modified the state-transition function and the initialization of the state.
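The assignment step above can be sketched as follows. For clarity this toy version brute-forces all permutations of a small cost matrix instead of using the Hungarian algorithm, which solves the same minimization in O(n³); function and variable names are made up for illustration.

```python
# Minimal detection-to-track assignment by Euclidean distance.
# Brute force over permutations; real trackers use the Hungarian algorithm.
from itertools import permutations
from math import dist  # Python 3.8+

def best_assignment(tracks, detections):
    """Return, per track, the index of the detection that minimizes total distance."""
    n = len(tracks)
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(detections)), n):
        cost = sum(dist(tracks[i], detections[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best

tracks = [(0, 0), (10, 10)]          # predicted track positions
detections = [(9, 9), (1, 1)]        # detections in the current frame
print(best_assignment(tracks, detections))  # → (1, 0)
```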
The background subtraction approach (without the classifier) is fast and suitable for real-time applications, but it is very sensitive to any change in illumination and consequently produces many false positives and wrong detections, which makes it non-robust for most real-world applications. Even with an adaptive variant of the algorithm, the proposed detections are still not very accurate.
YOLO, on the other hand, gives very good detections for humans (on the two datasets I tested) and is robust to changes in lighting conditions. However, it is very slow on a CPU (6 to 20 seconds per image, depending on the CPU!). On an expensive GPU an image can be processed in less than a second, but that might not always be feasible.
A linear Kalman filter performs poorly at tracking human motion, since people's motion is highly nonlinear; the Unscented Kalman Filter is better suited for this case.
The Kalman filter is good at keeping track of occluded persons, since it keeps predicting their current position (based on their previous dynamics) even when their detections disappear for some frames.
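The occlusion behavior can be seen in the prediction step alone. The sketch below uses a plain constant-velocity model to coast a track through frames with no detection; it is illustrative only, the project itself uses a full Unscented Kalman Filter with a nonlinear state-transition function.

```python
# Constant-velocity prediction step: how a Kalman-style tracker keeps
# estimating an occluded person's position between detections.

def predict(state, dt=1.0):
    """state = (x, y, vx, vy); advance position along the current velocity."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)

state = (100.0, 50.0, 5.0, -2.0)   # last update before the occlusion
for _ in range(3):                 # three frames with no detection
    state = predict(state)
print(state[:2])  # → (115.0, 44.0)
```

When the person reappears, the predicted position is close enough to the new detection for the assignment step to re-associate the track.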
Most importantly, solving the assignment problem using only the Euclidean distance (via the Hungarian algorithm) is not reliable and usually leads to mixed-up tracking IDs when occlusions happen, since the distances to the two colliding objects are nearly identical.
The main bottleneck in the tracking problem is solving the assignment problem, as a poor solution usually leads to mixing of the tracks. Therefore, incorporating color information is important and logical to reduce this problem. A good approach, in my opinion, should combine both color features and Euclidean distance information.
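One way to realize that combination is a weighted cost mixing spatial distance with an appearance (color-histogram) distance. The weights, the histogram form, and the distance scale below are assumptions for illustration, not the project's implementation.

```python
# Hypothetical combined assignment cost: spatial distance + color distance.
from math import dist

def hist_distance(h1, h2):
    """L1 distance between normalized color histograms, in [0, 1]."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2.0

def combined_cost(track, det, w_dist=0.5, w_color=0.5, dist_scale=100.0):
    """Each of track/det is (position, histogram); lower cost = better match."""
    (tp, th), (dp, dh) = track, det
    return w_dist * dist(tp, dp) / dist_scale + w_color * hist_distance(th, dh)

# Two detections equally close in space: color disambiguates the match.
track = ((50, 50), [0.8, 0.1, 0.1])     # track of a person in red clothing
red_det = ((52, 50), [0.8, 0.1, 0.1])
blue_det = ((50, 52), [0.1, 0.1, 0.8])
print(combined_cost(track, red_det) < combined_cost(track, blue_det))  # → True
```

Here Euclidean distance alone would tie the two candidates, which is exactly the occlusion failure mode described above; the color term breaks the tie.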
Project Usage:
- Put the YOLO model files in the data/yolo/data/ subdirectory of the PedestrianTracking project.
- Edit the config.cfg file in the project root directory with your own paths and desired parameters.
- The retrained Inception classifier model is included in the data/inception/model/ subdirectory. Otherwise you can set the useClassifier entry to false in config.cfg and not use it.
- To import pre-computed detections, set the importDetectionsFromFiles entry to true in config.cfg and set the detectionsCoordsImportDir entry to the path of the desired dataset/method results under the subdirectory data/detections/.
Dependencies:
Third-party software
GNU GPLv3: http://www.gnu.org/licenses/gpl-3.0.txt