Creating software for automatic monitoring in online proctoring
Project to create an automated proctoring system where the user can be monitored automatically through the webcam and microphone. The project is divided into two parts: vision-based and audio-based functionalities. An explanation of some functionalities of the project can be found in my Medium article.
To run the programs in this repo, do the following:

```shell
python -m venv venv
./venv/Scripts/activate         # windows users
source ./venv/bin/activate      # mac and linux users
pip install --upgrade pip       # to upgrade pip
pip install -r requirements.txt
```
Once the requirements have been installed, the programs will run successfully, except for the `person_and_phone.py` script, which requires a model to be downloaded. More on that later.
For vision:
- Tensorflow>2
- OpenCV
- sklearn==0.19.1 (for face spoofing; the model used was trained with this version and does not support recent ones)

For audio:
- pyaudio
- speech_recognition
- nltk
It has six vision-based functionalities right now:
Earlier, Dlib's frontal face HOG detector was used to find faces, but it did not give very good results. Different face detection models are compared in `face_detection`, and OpenCV's DNN module gives the best results; the comparison is presented in this article. It is implemented in `face_detector.py` and is used for tracking eyes, mouth opening detection, head pose estimation, and face spoofing.
An additional quantized model has also been added for the face detector, as described in Issue 14. It can be used by setting the parameter `quantized` to `True` when calling `get_face_detector()`. In quick testing of the face detector on my laptop, the normal version gave ~17.5 FPS while the quantized version gave ~19.5 FPS. Being uint8-quantized, it is especially useful when deploying on edge devices.
Earlier, Dlib's facial landmarks model was used, but it did not give good results when the face was at an angle. Now a model provided in this repository is used. A comparison between the two, and the reason for choosing the new Tensorflow-based model, is shown in this article. It is implemented in `face_landmarks.py` and is used for tracking eyes, mouth opening detection, and head pose estimation. If you want to use the dlib models, check out the old-master branch.
`eye_tracker.py` tracks eyes. A detailed explanation is provided in this article; however, it was written using dlib.
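As a rough illustration of the idea (not the repo's actual implementation), horizontal gaze can be classified from where the pupil sits between the two eye corners. The function name and thresholds below are illustrative:

```python
def gaze_direction(left_corner_x, right_corner_x, pupil_x):
    """Classify horizontal gaze from the pupil's relative position
    between the eye corners; the 0.35/0.65 thresholds are illustrative."""
    ratio = (pupil_x - left_corner_x) / (right_corner_x - left_corner_x)
    if ratio < 0.35:
        return "left"    # pupil close to the left corner
    if ratio > 0.65:
        return "right"   # pupil close to the right corner
    return "center"

print(gaze_direction(100, 160, 130))  # pupil midway between corners -> center
```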
`mouth_opening_detector.py` checks whether the candidate opens his/her mouth during the exam, after recording a baseline initially. Its explanation can be found in the main article; however, it uses dlib, which can easily be changed to the new models.
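The record-then-compare logic can be sketched as follows. This is a minimal sketch, not the script's actual code: the point layout, the 30% margin, and all names are illustrative.

```python
import numpy as np

def lip_gap(lip_points):
    """Distance between the mean upper-lip and mean lower-lip points.
    Assumes the first three points belong to the upper lip and the
    last three to the lower lip (layout is illustrative)."""
    top = np.mean(lip_points[:3], axis=0)
    bottom = np.mean(lip_points[3:], axis=0)
    return float(np.linalg.norm(top - bottom))

def mouth_open(lip_points, baseline_gap, ratio=1.3):
    """Flag the mouth as open when the current gap exceeds the
    initially recorded baseline by an illustrative 30% margin."""
    return lip_gap(lip_points) > ratio * baseline_gap

closed = np.array([[0, 0], [1, 0], [2, 0], [0, 4], [1, 4], [2, 4]])
opened = np.array([[0, 0], [1, 0], [2, 0], [0, 9], [1, 9], [2, 9]])
baseline = lip_gap(closed)           # gap recorded at the start of the exam
print(mouth_open(opened, baseline))  # 9 > 1.3 * 4 -> True
```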
`person_and_phone.py` counts persons and detects mobile phones. YOLOv3 is used in Tensorflow 2; see this article for more details.
`head_pose_estimation.py` finds where the head is facing. An explanation is provided in this article.
`face_spoofing.py` determines whether the face is real or a photograph/recaptured image. An explanation is provided in this article. The model and approach are taken from this GitHub repo.
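As a rough illustration of how such a classifier can work (this is not the linked model), simple color statistics of the face crop can be fed to an sklearn classifier, since recaptured photos and screens tend to have different color distributions than live faces. Everything below, including the training data, is illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def hist_feature(img, bins=16):
    """Normalized per-channel intensity histograms, concatenated into
    one feature vector for the classifier."""
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(img.shape[-1])]
    v = np.concatenate(feats).astype(np.float64)
    return v / v.sum()

# Stand-in data: random "live" crops vs. lower-contrast "spoof" crops.
rng = np.random.default_rng(0)
live = [rng.integers(0, 256, (32, 32, 3)) for _ in range(10)]
spoof = [rng.integers(64, 192, (32, 32, 3)) for _ in range(10)]
X = np.stack([hist_feature(i) for i in live + spoof])
y = np.array([1] * 10 + [0] * 10)  # 1 = live, 0 = spoof

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:1]))
```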
| Functionality | On Intel i5 (FPS) |
| --- | --- |
| Eye Tracking | 7.1 |
| Mouth Detection | 7.2 |
| Person and Phone Detection | 1.3 |
| Head Pose Estimation | 8.5 |
| Face Spoofing | 6.9 |
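For contributors who want to add numbers, FPS can be measured with a simple timing loop. This is a minimal sketch (the original measurement code is not shown here); `process_frame` stands in for any of the per-frame functions above:

```python
import time

def measure_fps(process_frame, frames, warmup=5):
    """Average FPS of `process_frame` over `frames`, after a short
    warmup so one-time setup costs do not skew the result."""
    for frame in frames[:warmup]:
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Example with a dummy per-frame workload instead of a real detector.
fps = measure_fps(lambda frame: sum(range(10_000)), list(range(30)))
print(round(fps, 1))
```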
If you test on a different processor or a GPU, consider making a pull request to add the FPS obtained on that hardware.
The audio functionality is divided into two parts:
- Speech-to-text conversion, which might not work well for all dialects. The code for this part is available in `audio_part.py`.
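One natural use of the resulting transcript is to flag spoken words that also appear in the exam text; presumably this is the kind of text processing nltk is needed for. A stdlib-only sketch, where the function name, stopword list, and inputs are all illustrative:

```python
import re

def flag_common_words(transcript, question_paper, stopwords=frozenset({
        "the", "a", "an", "is", "of", "to", "and", "in"})):
    """Return content words the candidate spoke that also occur in the
    question paper (a crude proxy for reading questions aloud)."""
    def tokenize(text):
        return set(re.findall(r"[a-z']+", text.lower()))
    spoken = tokenize(transcript) - stopwords
    paper = tokenize(question_paper) - stopwords
    return sorted(spoken & paper)

print(flag_common_words("what is the capital of France",
                        "Q1. Name the capital of France."))
# -> ['capital', 'france']
```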
If you have any other ideas, or if you implement any item on the to-do list, consider making a pull request. Please update the README as well in the pull request.
This project is licensed under the MIT License - see the LICENSE.md file for details. However, the facial landmarks detection model is trained on non-commercial-use datasets, so I am not sure whether it may be used for commercial purposes.