Project 4 for Metis bootcamp. Objective was generation of character-level RNN trained on Donald Trump's statements using Keras. Also generated Markov chains, and quick pyTorch RNN as baseline. Attempted semi-supervised GAN, but was unable to test in time.
This respository trains and executes a character-level recurrent neural network using GRUs on President Donald Trump's tweets and official recorded statements and speeches. Equivalent Markov chains are also trained, to limited effect.
The primary entry points for this repository are main.py
and markov.py
. They may be accessed
by the command python [filename]
.
Running main.py
as-is will load and transform both tweets and statements, training 6 and dumping
RNN models each for a total of 12 models. This is extraordinarily computationally expensive.
Running markov.py
will generate frequency tables and sentences given a seed. Frequency table
generation is an expensive task, and may take hours.
Pre-trained models are stored in /data/models/
in h5py
format and may be used directly.
Core Functionality
keras
sklearn
pandas
numpy
unidecode
EDA Functionality (topic modeling, visualizations, etc.)
gensim
yellowbrick
bin
- Source code files
pytorch
- pyTorch implementation of GRU, unoptimized for CUDA or GPUdata
clean
- Cleaned data, stored in serialized formatmodels
- Trained char-RNNs, stored in h5py
formatmarkov
- Markov chain frequency tables and sample outputdocs
- Slides and supporting documentation for projectFeel free to contact me with feedback or questions.
Email
- lzhou95
at gmail
.com
LinkedIn
- zhouleon
Medium
- @confusionmatrix