Classify movie posters by genre
A simple demo / tutorial / experiment / portfolio project for me to better understand the concepts of Machine Learning.
It uses a Convolutional Neural Network (CNN) to classify movie posters by genre. This is a multi-label classification problem (a movie can belong to multiple genres): each instance (movie poster) has an independent probability of belonging to each label (genre).
The implementation is based on Keras and TensorFlow.
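Because each genre gets its own independent probability, the usual setup is one sigmoid output per genre trained with binary cross-entropy, rather than a softmax over all genres. In Keras this typically means a final Dense layer with activation="sigmoid" compiled with loss="binary_crossentropy". That is an assumption about this project's model, not taken from its code. The dependency-free sketch below shows the idea: genres encoded as a multi-hot target vector (the genre list is hypothetical), per-genre sigmoid probabilities that do not need to sum to 1, and the loss.

```python
import math

# Hypothetical genre list, for illustration only (not the dataset's full list).
GENRES = ["Action", "Adventure", "Animation", "Comedy", "Drama", "Horror", "Romance"]


def multi_hot(movie_genres, genres=GENRES):
    """Encode a movie's genres as a 0/1 target vector, one slot per genre."""
    return [1.0 if g in movie_genres else 0.0 for g in genres]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def genre_probabilities(logits):
    """Independent sigmoid per output unit: the values need not sum to 1."""
    return [sigmoid(z) for z in logits]


def binary_crossentropy(targets, probs, eps=1e-7):
    """Mean per-label binary cross-entropy, a standard multi-label loss."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


target = multi_hot({"Action", "Adventure"})
probs = genre_probabilities([2.2, 1.5, -3.0, -1.0, 0.3, -2.0, -2.5])
loss = binary_crossentropy(target, probs)
print([round(p, 2) for p in probs], round(loss, 3))
```

With a softmax, raising one genre's probability would lower all the others; with independent sigmoids, a poster can score high on both Comedy and Romance at once, which matches the results below.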
With 14,265 training samples and 2,826 validation samples (movies from 1977 to 2017), 106x161 images and 50 epochs of training, the results look like this ([!] indicates that the predicted genre is not listed for that movie in the original dataset):
The Matrix (1999) ['Action: 91%', 'Drama[!]: 25%', 'Adventure[!]: 13%']
The Others (2001) ['Drama[!]: 76%', 'Horror: 65%', 'Action[!]: 41%']
Alien: Resurrection (1997) ['Horror: 67%', 'Action: 64%', 'Drama[!]: 43%']
The Martian (2015) ['Drama: 95%', 'Adventure: 81%', 'Comedy[!]: 23%']
The Truman Show (1998) ['Comedy: 98%', 'Drama: 76%', 'Romance[!]: 7%']
Pretty Woman (1990) ['Romance: 99%', 'Comedy: 99%', 'Drama[!]: 22%']
Whatever Works (2009) ['Drama[!]: 86%', 'Comedy: 78%', 'Romance: 76%']
Bienvenue chez les C.. (2008) ['Comedy: 98%', 'Romance: 98%', 'Drama[!]: 7%']
Paprika (2006) ['Animation: 66%', 'Comedy[!]: 58%', 'Adventure: 31%']
Spirited Away (2001) ['Animation: 83%', 'Drama[!]: 57%', 'Adventure: 42%']
Castle in the Sky (1986) ['Animation: 88%', 'Adventure: 78%', 'Comedy[!]: 30%']
Zootopia (2016) ['Animation: 62%', 'Adventure: 59%', 'Comedy: 49%']
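Each line above lists the three most probable genres and flags those missing from the dataset's labels. A minimal sketch of that formatting, assuming the predictions are available as a genre-to-probability dict; the function name and the ground-truth genres in the example are hypothetical, while the probabilities are the ones shown for The Matrix.

```python
def format_prediction(title, probs, true_genres, top_k=3):
    """Format the top-k predicted genres in the style of the results list.

    probs: dict mapping genre name -> predicted probability in [0, 1].
    true_genres: genres listed for the movie in the dataset; any predicted
    genre not in this collection gets the "[!]" marker.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    labels = []
    for genre, p in ranked:
        marker = "" if genre in true_genres else "[!]"
        labels.append(f"{genre}{marker}: {round(p * 100)}%")
    return f"{title} {labels}"


line = format_prediction(
    "The Matrix (1999)",
    {"Action": 0.91, "Drama": 0.25, "Adventure": 0.13, "Comedy": 0.04},
    true_genres={"Action", "Sci-Fi"},  # hypothetical ground-truth genres
)
print(line)  # The Matrix (1999) ['Action: 91%', 'Drama[!]: 25%', 'Adventure[!]: 13%']
```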
Overall accuracy is 45% (I'm actually not sure it's the most suitable metric for this).
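It is not stated here which definition of accuracy produced the 45%. For multi-label outputs, standard definitions can give very different numbers, so reporting more than one may help. The sketch below compares three of them on sets of genres; it assumes predicted sets have already been obtained, for example by thresholding the probabilities (the threshold is your choice), and is not the metric the project uses.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted genre set matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_accuracy(y_true, y_pred, n_labels):
    """Fraction of individual (sample, genre) yes/no decisions that are correct."""
    errors = sum(len(t ^ p) for t, p in zip(y_true, y_pred))  # symmetric difference
    return 1.0 - errors / (len(y_true) * n_labels)


def jaccard_score(y_true, y_pred):
    """Mean intersection-over-union between true and predicted genre sets."""
    scores = []
    for t, p in zip(y_true, y_pred):
        union = t | p
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


y_true = [{"Action"}, {"Comedy", "Romance"}]
y_pred = [{"Action", "Drama"}, {"Comedy", "Romance"}]
print(subset_accuracy(y_true, y_pred))      # 0.5
print(hamming_accuracy(y_true, y_pred, 4))  # 0.875
print(jaccard_score(y_true, y_pred))        # 0.75
```

Exact-match accuracy is the strictest; Hamming accuracy is dominated by the many genres that are correctly absent, so it tends to look optimistic.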
The dataset was found on Kaggle and contains about 27,000 posters.
The code is split as follows:
Module movies_dataset.py provides functions to access the dataset easily (parse MovieGenre.csv, list movies, get movie genres, get poster, etc.).
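A minimal sketch of what parsing MovieGenre.csv could look like. It assumes the Kaggle file has the columns imdbId, Title, Genre and Poster, that genres are pipe-separated (e.g. Action|Sci-Fi) and that the year appears in parentheses at the end of the title; check these against your copy of the file. The function names are hypothetical, not the actual API of movies_dataset.py.

```python
import csv
import io
import re

# Assumption: titles end with the release year, e.g. "Toy Story (1995)".
YEAR_RE = re.compile(r"\((\d{4})\)\s*$")


def parse_movies(csv_file):
    """Yield one dict per row of a MovieGenre.csv-style file object."""
    for row in csv.DictReader(csv_file):
        match = YEAR_RE.search(row["Title"])
        yield {
            "imdb_id": row["imdbId"],
            "title": row["Title"],
            "year": int(match.group(1)) if match else None,
            # Assumption: genres are pipe-separated, e.g. "Action|Sci-Fi".
            "genres": [g for g in row["Genre"].split("|") if g],
            "poster_url": row["Poster"],
        }


def list_movies(csv_file, min_year=None):
    """Return all movies, optionally only those released in or after min_year."""
    movies = list(parse_movies(csv_file))
    if min_year is None:
        return movies
    return [m for m in movies if m["year"] is not None and m["year"] >= min_year]


# Made-up rows in the assumed format (not taken from the real dataset).
SAMPLE = (
    "imdbId,Title,Genre,Poster\n"
    "1,Movie A (1975),Drama,https://example.com/a.jpg\n"
    "2,Movie B (1999),Action|Sci-Fi,https://example.com/b.jpg\n"
)
movies = list_movies(io.StringIO(SAMPLE), min_year=1980)
print(movies)
```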
Script get_data.py downloads and resizes the posters. Parameters:
-download: download the posters from Amazon (based on the URLs provided in MovieGenre.csv)
-resize: create smaller posters (30%, 40%, etc.)
-min_year=1980: filter out the oldest movies
python3 get_data.py -download -resize
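The single-dash flags above can be handled with Python's argparse, which accepts single-dash multi-character options as well as the -name=value form. This is a hypothetical reconstruction of the command-line interface, not the script's actual code; RESIZE_RATIOS and scaled_size are illustrative names, since the exact ratios are only described as "30%, 40%, etc.".

```python
import argparse

# Hypothetical ratios: the README only says "30%, 40%, etc.".
RESIZE_RATIOS = (30, 40, 50, 60)


def build_parser():
    parser = argparse.ArgumentParser(description="Download and resize movie posters (sketch)")
    parser.add_argument("-download", action="store_true",
                        help="download the posters from the URLs in MovieGenre.csv")
    parser.add_argument("-resize", action="store_true",
                        help="create smaller copies of the posters")
    parser.add_argument("-min_year", type=int, default=None,
                        help="filter out movies released before this year")
    return parser


def scaled_size(width, height, ratio_percent):
    """Size of a poster resized to ratio_percent of its original dimensions."""
    return round(width * ratio_percent / 100), round(height * ratio_percent / 100)


args = build_parser().parse_args(["-download", "-resize", "-min_year=1980"])
print(args.download, args.resize, args.min_year)  # True True 1980
```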
Script __main__.py builds and trains models. Models are saved to 'saved_models'. One or multiple models (with different parameters) can be produced.
python3 __main__.py
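One way to produce several models with different parameters is to iterate over a grid of settings and derive a file name in 'saved_models' from each combination. The parameter names, values and the .h5 extension below are hypothetical; they only illustrate the pattern, and the training step itself is left as a comment.

```python
import itertools
from pathlib import Path

SAVE_DIR = Path("saved_models")

# Hypothetical grid: the real script's parameters are not listed in this README.
PARAM_GRID = {
    "epochs": [25, 50],
    "ratio": [30, 40],
}


def iter_configs(grid):
    """Yield one dict per combination of parameter values (Cartesian product)."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def model_path(config, save_dir=SAVE_DIR, ext=".h5"):
    """Build a deterministic file name such as model_epochs50_ratio40.h5."""
    name = "model_" + "_".join(f"{key}{config[key]}" for key in sorted(config))
    return save_dir / (name + ext)


for config in iter_configs(PARAM_GRID):
    # A real run would build and train a Keras model here, then save it.
    print(config, "->", model_path(config))
```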
Script tests.py iterates through all the saved models in 'saved_models' and evaluates them on the test data.
python3 tests.py
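The evaluation loop can be sketched as a scan of the 'saved_models' folder that calls a scoring function for each model file. The evaluate callback and the *.h5 pattern are assumptions; with Keras, the callback would typically load the model and score it on the test set. The demo uses empty placeholder files and a dummy scorer.

```python
import tempfile
from pathlib import Path


def evaluate_all(models_dir, evaluate, pattern="*.h5"):
    """Call evaluate(path) for every matching model file, in name order.

    Returns a dict mapping file name -> whatever evaluate returns.
    """
    results = {}
    for path in sorted(Path(models_dir).glob(pattern)):
        results[path.name] = evaluate(path)
    return results


# Demo: placeholder files and a dummy scorer (file size in bytes).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("model_b.h5", "model_a.h5", "notes.txt"):
        (Path(tmp) / name).touch()
    demo = evaluate_all(tmp, lambda path: path.stat().st_size)

print(demo)  # {'model_a.h5': 0, 'model_b.h5': 0}
```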
Use Deep Convolutional Generative Adversarial Networks (DCGAN) to generate movie posters:
1. Clone the forked DCGAN-tensorflow repository:
git clone https://github.com/benckx/DCGAN-tensorflow.git
2. Prepare the dataset with the parameters you want (clone this project and download the posters first if you haven't already):
python3 prepare_dcgan_dataset.py -min_year=1980 -exclude_genres=Animation,Comedy,Family -ratio=60
This will create a folder 'dcgan_movies_posters' containing all the posters selected according to the parameter values.
3. Move the folder 'dcgan_movies_posters' to DCGAN-tensorflow/data/dcgan_movies_posters.
4. In DCGAN-tensorflow, run the command with the parameters you need (the parameters I added or removed are documented here):
python3 main.py --dataset dcgan_movies_posters --grid_height=6 --grid_width=10 --sample_rate=2 --train
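The poster selection done by prepare_dcgan_dataset.py can be sketched as a simple filter. This assumes -exclude_genres drops any movie that has at least one of the listed genres and -min_year keeps movies released in or after that year; the -ratio parameter is not modeled, and the function names and sample movies are hypothetical.

```python
def parse_genre_list(value):
    """Split a comma-separated flag value such as 'Animation,Comedy,Family'."""
    return [g.strip() for g in value.split(",") if g.strip()]


def select_posters(movies, min_year=None, exclude_genres=()):
    """Keep movies from min_year onward that have none of the excluded genres."""
    excluded = set(exclude_genres)
    selected = []
    for movie in movies:
        year = movie["year"]
        if min_year is not None and (year is None or year < min_year):
            continue  # too old, or year unknown
        if excluded & set(movie["genres"]):
            continue  # has at least one excluded genre
        selected.append(movie)
    return selected


# Made-up movies for illustration.
movies = [
    {"title": "Movie A", "year": 1975, "genres": ["Drama"]},
    {"title": "Movie B", "year": 1999, "genres": ["Action", "Sci-Fi"]},
    {"title": "Movie C", "year": 2005, "genres": ["Animation", "Adventure"]},
    {"title": "Movie D", "year": 2010, "genres": ["Horror"]},
]
chosen = select_posters(movies, min_year=1980,
                        exclude_genres=parse_genre_list("Animation,Comedy,Family"))
print([m["title"] for m in chosen])  # ['Movie B', 'Movie D']
```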
On AWS EC2, run
source activate tensorflow_p36
to activate the correct Anaconda environment.
A few things I'm currently working on or thinking about: