GAN Q Learning

Unofficial Implementation of GAN Q Learning https://arxiv.org/abs/1805.04874


This code implements the "GAN Q-Learning" algorithm found in https://arxiv.org/abs/1805.04874.

Modifications From Paper

The published algorithm contains a typo in the discriminator loss.
On the CartPole environment, the discriminator eventually learns to discriminate perfectly against the generator, even before the generator has learned the actual distribution. I've experimented with different hyperparameters, but the problem persists. For example, even when I update the generator 10 times per discriminator update, the training graph still looks as follows:

[Figure: training graph]
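
The 10-to-1 update ratio mentioned above can be sketched as a simple schedule. This is only an illustration, not the repository's actual training loop: the helper name `update_schedule` and its parameter `gen_per_disc` are hypothetical, and the real network updates are left out.

```python
from typing import Iterator


def update_schedule(total_steps: int, gen_per_disc: int = 10) -> Iterator[str]:
    """Yield "G" or "D" for each training step.

    Hypothetical helper: the generator is updated `gen_per_disc` times,
    then the discriminator is updated once, and the cycle repeats.
    """
    if gen_per_disc < 1:
        raise ValueError("gen_per_disc must be at least 1")
    cycle = gen_per_disc + 1
    for step in range(total_steps):
        # The last step of every cycle goes to the discriminator.
        yield "D" if step % cycle == gen_per_disc else "G"


# Two full cycles: 20 generator updates and 2 discriminator updates.
schedule = list(update_schedule(22, gen_per_disc=10))
```

In a real training loop, each "G" would trigger one generator optimizer step and each "D" one discriminator optimizer step.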

Final Results

In the end, I was unable to reproduce the results given in the paper because my computer couldn't sweep enough hyperparameters. After verifying that the algorithm was correct, I found that the classic problems of training GANs arose. In particular, the discriminator easily overfit the reward distribution, meaning that the generator got stuck and the reward function could not be learned. Even with significant architecture modifications, these problems persisted.
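
One way to notice this kind of discriminator saturation early is to track its accuracy over a rolling window. The sketch below is an assumption-heavy illustration and not part of the original code: it assumes the discriminator outputs scores in [0, 1] with 0.5 as the decision boundary, and the names `discriminator_accuracy` and `SaturationMonitor` are hypothetical.

```python
from collections import deque


def discriminator_accuracy(real_scores, fake_scores, threshold=0.5):
    """Fraction of samples classified correctly.

    Assumption: scores above `threshold` mean "real", others mean "generated".
    """
    correct = sum(s > threshold for s in real_scores)
    correct += sum(s <= threshold for s in fake_scores)
    return correct / (len(real_scores) + len(fake_scores))


class SaturationMonitor:
    """Flags when the mean accuracy over a full window reaches `limit`."""

    def __init__(self, window=100, limit=0.99):
        self.history = deque(maxlen=window)
        self.limit = limit

    def update(self, accuracy):
        # Record the latest accuracy; report saturation only once the
        # window is full, so a few early lucky batches do not trigger it.
        self.history.append(accuracy)
        window_full = len(self.history) == self.history.maxlen
        mean = sum(self.history) / len(self.history)
        return window_full and mean >= self.limit
```

When `update` returns `True`, the generator is likely receiving little useful signal, which matches the stuck behavior described above.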

Open Source Agenda is not affiliated with "GAN Q Learning" Project. README Source: louaaron/GAN-Q-Learning
