MovieLens recommendation system using reinforcement learning (GYM + PPO)
The purpose of this project was to experiment with the application of deep reinforcement learning to recommendation systems. More specifically, this project applies Stable-Baselines algorithms to the MovieLens 100k data set.
To that end, the goal of the agent is to predict what rating a user will give to a given movie.
The simulator is set up as a partially observable Markov decision process (POMDP), using OpenAI's Gym framework as the base class.
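To make the setup more concrete, here is a minimal sketch of a single-step, Gym-style environment. It only mimics the `reset()`/`step()` interface (it does not import `gym`), uses a few made-up rating triples, and assumes a reward of negative absolute error; the project's actual observation features and reward scheme differ. The class name `ToyRatingEnv` is hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Made-up (user_id, movie_id, rating) triples standing in for MovieLens rows.
RATINGS: List[Tuple[int, int, int]] = [(1, 10, 4), (1, 20, 2), (2, 10, 5), (2, 30, 3)]


class ToyRatingEnv:
    """Hypothetical single-step environment mimicking the gym.Env reset()/step() API.

    Each episode is one (user, movie) pair: the agent observes some features,
    predicts a rating, and the episode ends immediately.
    """

    def __init__(self, ratings: List[Tuple[int, int, int]], seed: int = 0):
        self.ratings = ratings
        self.rng = random.Random(seed)
        self.current: Tuple[int, int, int] = ratings[0]
        self.obs: List[float] = []

    def reset(self) -> List[float]:
        # Sample a new (user, movie) pair; the true rating stays hidden from the agent.
        self.current = self.rng.choice(self.ratings)
        user_id, movie_id, _ = self.current
        # Placeholder observation; the real environment uses derived features.
        self.obs = [float(user_id), float(movie_id)]
        return self.obs

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict[str, int]]:
        # `action` is the predicted rating in {1, ..., 5}.
        rating = self.current[2]
        reward = -float(abs(action - rating))  # assumed reward: negative absolute error
        done = True  # one prediction per episode
        return self.obs, reward, done, {"rating": rating}


env = ToyRatingEnv(RATINGS)
obs = env.reset()
obs, reward, done, info = env.step(4)
print(reward, done, info)
```

Because every episode ends after a single prediction, there is no sequence of states for the agent to plan over, which is the property the choices of algorithm and discount factor rely on.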
The reward scheme is based on prediction accuracy:
The observation is based on derived features from the MovieLens data set:
The Proximal Policy Optimization (PPO) algorithm is chosen as the agent because the recommendation problem is stateless (or single-state), which makes a policy-based approach more appropriate than a value-based one.
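For reference, PPO updates the policy by maximizing a clipped surrogate objective, $L^{CLIP} = \mathbb{E}[\min(\rho A, \operatorname{clip}(\rho, 1-\epsilon, 1+\epsilon) A)]$, where $\rho$ is the ratio of new to old action probabilities and $A$ is the advantage. The plain-Python sketch below evaluates that objective on a batch; it illustrates the algorithm and is not the Stable-Baselines code. The value `clip_eps=0.2` is an assumed, commonly used default, and the function name is illustrative.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_log_probs: Sequence[float],
                       old_log_probs: Sequence[float],
                       advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective L^CLIP (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum gives a pessimistic bound, so large policy jumps earn no extra credit.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A ratio of 1.5 with positive advantage is capped at 1 + clip_eps = 1.2.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

With $\gamma = 0$, the advantage of each action reduces to the immediate reward minus a baseline, so this objective simply raises the probability of rating predictions that scored better than expected.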
Because a given state does not affect future states in this problem, the discount factor $\gamma$ (used to discount future rewards) is set to $0.0$.
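The effect of $\gamma = 0$ follows from the discounted return $G_t = \sum_{k \ge 0} \gamma^k r_{t+k}$: with $\gamma = 0$ every term after $k = 0$ vanishes, so $G_t = r_t$. A small sketch (the function name is illustrative):

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G_0 = sum_k gamma**k * r_k, accumulated backwards from the last reward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 5.0, 3.0], 0.0))  # 1.0: only the immediate reward counts
print(discounted_return([1.0, 5.0, 3.0], 0.5))  # 4.25 = 1 + 0.5 * 5 + 0.25 * 3
```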
A multi-layer perceptron (MLP) is chosen as the agent's function approximator. Given that the feature set is relatively small (~51 features), an MLP with two hidden layers of 64 neurons each is sufficient and does not appear to lead to overfitting.
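To put "relatively small" in numbers, the sketch below counts the weights and biases of a fully connected network with 51 inputs and two hidden layers of 64 units. The output size of 5 (one logit per rating value) is an assumption, and the count ignores the value-function output that PPO also trains.

```python
from typing import Sequence


def mlp_param_count(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


# 51 input features -> 64 -> 64 -> 5 outputs (assumed: one logit per rating 1..5).
print(mlp_param_count([51, 64, 64, 5]))  # 7813
```

Roughly 7.8k parameters against 100k ratings is a modest capacity, which is consistent with the observation that overfitting does not appear to be an issue.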
Two features provide the agent with an unfair advantage when making predictions:
When these features are collected, the averages are taken over the entire data set, which prevents a ValueError in Python when a given movie_id or user_id has not been seen before. This conflict could be avoided with exception-handling rules, but doing so is out of scope for this experiment.
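Because averages over the entire data set include the very ratings the agent is asked to predict, one alternative (not what this project does) is to compute them from a training split only and fall back to a global mean for unseen IDs. The helper names `fit_user_means` and `user_mean_feature` below are hypothetical, and only the per-user average is shown; a per-movie average would work the same way.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[int, int, float]  # (user_id, movie_id, rating)


def fit_user_means(train: List[Rating]) -> Tuple[Dict[int, float], float]:
    """Per-user mean rating and global mean, computed from the training split only."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for user_id, _, rating in train:
        sums[user_id] += rating
        counts[user_id] += 1
    means = {u: sums[u] / counts[u] for u in sums}
    global_mean = sum(r for _, _, r in train) / len(train)
    return means, global_mean


def user_mean_feature(user_id: int, means: Dict[int, float], global_mean: float) -> float:
    # Unseen users fall back to the global mean instead of raising an error.
    return means.get(user_id, global_mean)


train = [(1, 10, 4.0), (1, 20, 2.0), (2, 10, 5.0)]
means, global_mean = fit_user_means(train)
print(user_mean_feature(1, means, global_mean))  # 3.0
print(user_mean_feature(3, means, global_mean))  # unseen user -> global mean
```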
gym_recommendation/
data/ ...MovieLens 100k data set
envs/ ...MDP-style environment extending Gym
tests/ ...test cases for the utilities and the Gym environment
utils.py ...helper functions for downloading data and evaluating the environment
ppo_experiment.py ...entry point for running experiments
requirements.txt ...project dependencies
setup.py
git clone https://github.com/sadighian/recommendation-gym.git
cd recommendation-gym # change to project directory
virtualenv -p python3 venv # create the virtual environment
source venv/bin/activate # start using the venv
pip3 install -e . # execute command inside install directory
python3 ppo_experiment.py --training_steps=100000 --evaluation-steps=10000
Refer to ppo_experiment.py for the full list of flags.
@misc{Recommendation-Gym,
author = {Jonathan Sadighian},
title = {Recommendation Gym for MovieLens},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/sadighian/recommendation-gym}},
}