Sign Language Gesture Recognition From Video Sequences Using RNN And CNN
The paper on this work is published here.
Please do cite it if you find this project useful. :)
UPDATE: the code has been updated to account for changes in the names of the operations in the Inception model.

Install the requirements:

pip install tensorflow
pip install tflearn
Create two folders, say train_videos and test_videos, in the project root directory. Each should contain one folder per category, with each folder containing the corresponding videos.
For example:
train_videos
├── Accept
│ ├── 050_003_001.mp4
│ ├── 050_003_002.mp4
│ ├── 050_003_003.mp4
│ └── 050_003_004.mp4
├── Appear
│ ├── 053_003_001.mp4
│ ├── 053_003_002.mp4
│ ├── 053_003_003.mp4
│ └── 053_003_004.mp4
├── Argentina
│ ├── 024_003_001.mp4
│ ├── 024_003_002.mp4
│ ├── 024_003_003.mp4
│ └── 024_003_004.mp4
└── Away
├── 013_003_001.mp4
├── 013_003_002.mp4
├── 013_003_003.mp4
└── 013_003_004.mp4
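A quick way to sanity-check that the layout matches what the scripts expect (one sub-folder per gesture, each holding its videos) is a small script like the one below. This is only an illustration, not part of the repository, and train_videos is whatever folder name you chose above.

import os

root = "train_videos"  # or whatever name you gave the folder
for gesture in sorted(os.listdir(root)):
    folder = os.path.join(root, gesture)
    if os.path.isdir(folder):
        videos = [v for v in os.listdir(folder) if v.endswith(".mp4")]
        print(f"{gesture}: {len(videos)} videos")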
usage: video-to-frame.py [-h] gesture_folder target_folder

Extract frames from gesture videos.

positional arguments:
  gesture_folder   Path to folder containing folders of videos of different gestures.
  target_folder    Path to folder where extracted frames should be kept.

optional arguments:
  -h, --help       show this help message and exit
The script also performs some hand segmentation on each frame (tailored to the data set we used); a rough sketch of this kind of segmentation is shown after the commands below. You can remove that code if you are working on some other data set.
python3 "video-to-frame.py" train_videos train_frames
Extract frames from gestures in train_videos
to train_frames
.
python3 "video-to-frame.py" test_videos test_frames
Extract frames from gestures in test_videos
to test_frames
.
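For reference, the per-frame hand segmentation is roughly of the following kind. This is only a sketch with illustrative HSV thresholds, not the exact code in video-to-frame.py, so check the script itself for the actual colour space and values.

import cv2
import numpy as np

def segment_hand(frame):
    # Keep only roughly skin-coloured pixels via an HSV threshold (illustrative values).
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)
    upper = np.array([20, 150, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    return cv2.bitwise_and(frame, frame, mask=mask)

# Try it on the first frame of one training video.
cap = cv2.VideoCapture("train_videos/Accept/050_003_001.mp4")
ok, frame = cap.read()
if ok:
    cv2.imwrite("example_frame.jpeg", segment_hand(frame))
cap.release()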
Download retrain.py.
curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py
Note: This link may change in the future. Please refer to the TensorFlow retrain tutorial.
Run the following command to retrain the inception model.
python3 retrain.py --bottleneck_dir=bottlenecks --summaries_dir=training_summaries/long --output_graph=retrained_graph.pb --output_labels=retrained_labels.txt --image_dir=train_frames
This will create two files, retrained_labels.txt and retrained_graph.pb.
For more information about the above command, refer here.
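If you want to verify the retrained model before moving on, you can load the frozen graph and the labels as sketched below. This assumes a TensorFlow 1.x-style frozen GraphDef (use tf.compat.v1 on TensorFlow 2.x, as shown).

import tensorflow as tf

# Load the frozen graph produced by retrain.py.
with tf.io.gfile.GFile("retrained_graph.pb", "rb") as f:
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.compat.v1.import_graph_def(graph_def, name="")

# Load the class labels, one per line.
labels = [line.strip() for line in open("retrained_labels.txt")]
print(len(labels), "classes:", labels)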
usage: predict_spatial.py [-h] [--input_layer INPUT_LAYER]
                          [--output_layer OUTPUT_LAYER] [--test]
                          [--batch_size BATCH_SIZE]
                          graph frames_folder

positional arguments:
  graph                 graph/model to be executed
  frames_folder         Path to folder containing folders of frames of different gestures.

optional arguments:
  -h, --help            show this help message and exit
  --input_layer INPUT_LAYER
                        name of input layer
  --output_layer OUTPUT_LAYER
                        name of output layer
  --test                passed if frames_folder belongs to the test data
  --batch_size BATCH_SIZE
                        batch size
Each video is represented by a sequence of n-dimensional vectors (the probability distribution output by softmax), one for each frame. Here n is the number of classes.
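For intuition, with n = 4 classes a 150-frame video ends up as 150 softmax vectors, i.e. an array of shape (150, 4) whose rows each sum to 1. The snippet below only illustrates that shape with dummy numbers; it is not output of the pipeline.

import numpy as np

# Dummy stand-in for the per-frame softmax outputs of a 150-frame video with 4 classes.
frame_probs = np.random.dirichlet(np.ones(4), size=150)
print(frame_probs.shape)     # (150, 4)
print(frame_probs[0].sum())  # ~1.0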
On Training Data
python3 predict_spatial.py retrained_graph.pb train_frames --batch=100
This will create a file predicted-frames-final_result-train.pkl that will be used by the RNN.
On Test Data
python3 predict_spatial.py retrained_graph.pb test_frames --batch=100 --test
This will create a file predicted-frames-final_result-test.pkl that will be used by the RNN.
Each video is represented by a sequence of 2048-dimensional vectors (the output of the last pooling layer), one for each frame.
On Training Data
python3 predict_spatial.py retrained_graph.pb train_frames \
--output_layer="module_apply_default/InceptionV3/Logits/GlobalPool" \
--batch=100
This will create a file predicted-frames-GlobalPool-train.pkl that will be used by the RNN.
On Test Data
python3 predict_spatial.py retrained_graph.pb test_frames \
--output_layer="module_apply_default/InceptionV3/Logits/GlobalPool" \
--batch=100 \
--test
This will create a file predicted-frames-GlobalPool-test.pkl that will be used by the RNN.
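A quick way to check the dumps before training the RNN is to load them with pickle. The exact layout of each entry is defined by predict_spatial.py, so treat the inspection below as a sketch rather than a specification.

import pickle

for name in ("predicted-frames-final_result-train.pkl",
             "predicted-frames-GlobalPool-train.pkl"):
    with open(name, "rb") as f:
        data = pickle.load(f)
    # Print the container type and number of entries (videos), if it has a length.
    size = len(data) if hasattr(data, "__len__") else "?"
    print(name, type(data).__name__, size)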
usage: rnn_train.py [-h] [--label_file LABEL_FILE] [--batch_size BATCH_SIZE]
                    input_file_dump model_file

positional arguments:
  input_file_dump   file containing the intermediate representation of gestures from the Inception model
  model_file        Name of the model file to be dumped. The model file is created inside a checkpoints folder.

optional arguments:
  -h, --help        show this help message and exit
  --label_file LABEL_FILE
                    path to the label file generated by Inception, default='retrained_labels.txt'
  --batch_size BATCH_SIZE
                    batch size, default=32
python3 rnn_train.py predicted-frames-final_result-train.pkl non_pool.model
This will train the RNN model on the softmax-based representation of the gestures for 10 epochs and save the model as non_pool.model in a folder named checkpoints.
python3 rnn_train.py predicted-frames-GlobalPool-train.pkl pool.model
This will train the RNN model on the pool-layer-based representation of the gestures for 10 epochs and save the model as pool.model in a folder named checkpoints.
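For readers who want a feel for the model, a TFLearn LSTM over per-frame feature vectors can be set up roughly as below. This is a simplified sketch, not the exact architecture in rnn_train.py; the sequence length, hidden size, and number of classes are assumptions.

import tflearn

num_frames = 200    # assumed fixed sequence length after padding/truncating
feature_dim = 2048  # 2048 for pool-layer features, n_classes for softmax features
n_classes = 4       # assumed number of gesture classes

net = tflearn.input_data(shape=[None, num_frames, feature_dim])
net = tflearn.lstm(net, 256, dropout=0.8)
net = tflearn.fully_connected(net, n_classes, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')

model = tflearn.DNN(net, checkpoint_path='checkpoints/')
# model.fit(X, Y, n_epoch=10, batch_size=32)  # X: [videos, frames, features], Y: one-hot labels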
usage: rnn_eval.py [-h] [--label_file LABEL_FILE] [--batch_size BATCH_SIZE]
                   input_file_dump model_file

positional arguments:
  input_file_dump   file containing the intermediate representation of gestures from the Inception model
  model_file        Name of the model file to be used for prediction.

optional arguments:
  -h, --help        show this help message and exit
  --label_file LABEL_FILE
                    path to the label file generated by Inception, default='retrained_labels.txt'
  --batch_size BATCH_SIZE
                    batch size, default=32
python3 rnn_eval.py predicted-frames-final_result-test.pkl non_pool.model
This will use non_pool.model to predict the labels of the softmax-based representation of the test videos. Predictions and the corresponding gold labels for each test video will be dumped into results.txt.
python3 rnn_eval.py predicted-frames-GlobalPool-test.pkl pool.model
This will use pool.model to predict the labels of the pool-layer-based representation of the test videos. Predictions and the corresponding gold labels for each test video will be dumped into results.txt.
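If you want a single accuracy number out of results.txt, something like the following works, assuming each line pairs a predicted label with its gold label (check rnn_eval.py for the exact format before relying on this).

# Rough accuracy computation from results.txt (format assumed: "predicted gold" per line).
with open("results.txt") as f:
    pairs = [line.split() for line in f if line.strip()]

correct = sum(1 for p in pairs if len(p) >= 2 and p[0] == p[1])
print(f"Accuracy: {correct / len(pairs):.2%} on {len(pairs)} test videos")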
Happy Coding :)