LipReading Save

Speech Recognition without audio input

Project README


Main repository for LipReading with Deep Neural Networks


The goal is to implement LipReading: Similar to how end-to-end Speech Recognition systems work, mapping high-fidelity speech audio to sensible characters and word level outputs, we will do the same for "speech visuals". In particular, we will take video frame input, extract the relevant mouth/chin signals as input to map to characters and words.



A high level overview of some TODO items. For more project details please see the Github project

  • Download Data (926 videos)
  • Build Vision Pipeline (1 week) in review
  • Build NLP Pipeline (1 week) wip
  • Build Loss Fn and Training Pipeline (2 weeks) wip
  • Train :train: and Ship :ship: wip


There are two primary interconnected pipelines: a "vision" pipeline for extracting the face and lip features from video frames, along with a "nlp-inspired" pipeline for temporally correlating the sequential lip features into the final output.

Here's a quick dive into tensor dimensionalities

Vision Pipeline

Video -> Frames       -> Face Bounding Box Detection      -> Face Landmarking    
Repr. -> (n, y, x, c) -> (n, (box=1, y_i, x_i, w_i, h_i)) -> (n, (idx=68, y, x))   

NLP Pipeline

 -> Letters  ->  Words    -> Language Model 
 -> (chars,) ->  (words,) -> (sentences,)


  • all: 926 videos (projected, not generated yet)
  • large: 464 videos (failed at 35/464)
  • medium: 104 videos (currently at 37/104)
  • small: 23 videos
  • micro: 6 videos
  • nano: 1 video


  1. Clone this repository and install the requirements. We will be using python3.

Please make sure you run python scripts, setup your PYTHONPATH at ./, as well as a workspace env variable.

git clone [email protected]:joseph-zhong/LipReading.git 
# (optional, setup venv) cd LipReading; python3  -m venv .
  1. Once the repository is cloned, the last step for setup is to setup the repository's PYTHONPATH and workspace environment variable to take advantage of standardized directory utilities in ./src/utils/

Copy the following into your ~/.bashrc

export PYTHONPATH="$PYTHONPATH:/path/to/LipReading/" 
export LIP_READING_WS_PATH="/path/to/LipReading/"
  1. Install the simple requirements.txt with PyTorch with CTCLoss, SpaCy, and others.

On MacOS for CPU capabilities only.

pip3 install -r requirements.macos.txt

On Ubuntu, for GPU support

pip3 install -r requirements.ubuntu.txt

SpaCy Setup

We need to install a pre-built English model for some capabilities

python3 -m spacy download en

Data Directories Structure

This allows us to have a simple standardized directory structure for all our datasets, raw data, model weights, logs, etc.

  --/datasets (numpy dataset files for dataloaders to load)
  --/raw      (raw caption/video files extracted from online sources)
  --/weights  (model weights, both for training/checkpointing/running)
  --/tb       (Tensorboard logging)

See ./src/utils/ for more.

Getting Started

Now that the dependencies are all setup, we can finally do stuff!


Each of our "standard" scripts in ./src/scripts (i.e. not ./src/scripts/misc) take the standard argsparse-style arguments. For each of the "standard" scripts, you will be able to pass --help to see the expected arguments. To maintain reproducibility, cmdline arguments can be written in a raw text file with one argument per line.

e.g. for ./config/gen_dataview/nano


Represent the arguments to pass to ./src/scripts/, automatically passable via

./src/scripts/ $(cat ./config/gen_dataview/nano)

The arguments will be used from left-to-right order, so if arguments are repeated, they will be overwritten by the latter settings. This allows for modularity in configuring hyperparameters.

(For demonstration purposes, not a working example)

./src/scripts/ \
    $(cat ./config/dataset/large) \
    $(cat ./config/train/model/small-model) \
    $(cat ./config/train/model/rnn/lstm) \

Train Model

  1. Train Model


Training on Micro

./src/scripts/ $(cat ./config/train/micro)

Tensorboard Visualization


Other Resources

This is a collection of external links, papers, projects, and otherwise potentially helpful starting points for the project.

Other Projects

Other Academic Papers

Academic Datasets

Open Source Agenda is not affiliated with "LipReading" Project. README Source: joseph-zhong/LipReading
Open Issues
Last Commit
3 years ago

Open Source Agenda Badge

Open Source Agenda Rating