Train a DQN Agent to play CarRacing 2d using TensorFlow and Keras.
Training an agent to play CarRacing 2d from OpenAI Gym by implementing Deep Q-Learning/Deep Q-Network (DQN) with TensorFlow and Keras as the backend.
We can see that the scores (time frames elapsed) and the rewards stop rising after around 500 episodes. Thus let's terminate the training and evaluate the model using the last three saved weight files: `trial_400.h5`, `trial_500.h5`, and `trial_600.h5`.
After training for 400 episodes, the model knows it should follow the track to acquire rewards, and it has even learned to take shortcuts. However, making a sharp right turn still seems difficult for it, which often leaves the car stuck off the track.
After training for 500 episodes, the model drives faster and more smoothly, and seldom makes mistakes.
In greedily chasing more rewards, the model has learned to drive recklessly, which sends it off the track at sharp turns.
To play the game with your keyboard, execute the following command.

```
python play_car_racing_with_keyboard.py
```
Steer with the `left` and `right` keys; the `space` and `shift` keys handle acceleration and braking.

To train the model, execute the following command.

```
python train_model.py [-m save/trial_XXX.h5] [-s 1] [-e 1000] [-p 1.0]
```
- `-m` The path to the trained model if you wish to continue training from it.
- `-s` The starting training episode, default 1.
- `-e` The ending training episode, default 1000.
- `-p` The starting epsilon of the agent, default 1.0.

After the DQN model is trained, let's see how well it has learned to play CarRacing.
```
python play_car_racing_by_the_model.py -m save/trial_XXX.h5 [-e 1]
```
- `-m` The path to the trained model.
- `-e` The number of episodes the model should play.

File structure:

- `train_model.py` The training program.
- `common_functions.py` Functions that are used by multiple programs.
- `CarRacingDQNAgent.py` The core DQN class. Anything related to the model is placed here.
- `play_car_racing_by_the_model.py` The program that plays CarRacing using the trained model.
- `play_car_racing_with_keyboard.py` The program for playing CarRacing with the keyboard.
- `save/` The default folder for saved model weights.

Deep Q-Learning/Deep Q-Network (DQN) is a variation of Q-Learning: a neural network takes the place of the Q table, avoiding the need to build an unrealistically huge table containing a Q value for every state and action.
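To see why a tabular approach is hopeless here, consider the raw observation size. The numbers below are an illustrative back-of-the-envelope estimate (assuming a single 96x96 8-bit grayscale frame), not values taken from the project code.

```python
import math

# One 96x96 frame with 256 intensity levels per pixel gives 256**(96*96)
# possible states. That integer is far too large to enumerate as Q-table
# rows, so we only count its decimal digits via logarithms.
num_pixels = 96 * 96            # assumed grayscale observation
states_per_pixel = 256          # 8-bit intensity values

digits = int(num_pixels * math.log10(states_per_pixel)) + 1
print(digits)  # 22195 decimal digits in the state count
```

A table with that many rows cannot exist, which is exactly why DQN approximates Q values with a network instead.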
The Q value is the expected reward for taking a specific action in a specific state. More mathematically, the Q value can be written as:

Q(s,a) = r(s,a) + γ · maxQ(s',A)
- `s` is the current state
- `s'` is the next state
- `a` is the particular action
- `A` is the action space
- `Q(s,a)` is the Q value obtained by taking action `a` in state `s`
- `r(s,a)` is the reward obtained by taking action `a` in state `s`
- `maxQ(s',A)` is the maximum Q value obtainable by taking any action in the action space `A` in state `s'`
- `γ` is the discount rate, which discounts the future Q value because future rewards are less important

In other words, the Q value for state `s` and action `a` is the reward for taking action `a` in state `s`, plus the maximum Q value over all actions in the next state `s'`, multiplied by the discount rate.
Therefore, we should always choose the action with the highest Q value to maximize our rewards.
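The update and the greedy action choice can be sketched numerically. The Q values below are made-up numbers for illustration; in the real agent they would come from the network's predictions.

```python
import numpy as np

gamma = 0.95                       # discount rate γ
reward = 1.0                       # r(s, a) for the chosen action

# Hypothetical Q values predicted for the next state s',
# one entry per action in the action space A.
q_next = np.array([0.2, 1.5, 0.7])

# Q(s,a) = r(s,a) + γ · maxQ(s',A)
q_target = reward + gamma * np.max(q_next)
print(round(float(q_target), 3))   # 2.425

# Acting greedily means picking the action with the highest Q value.
q_current = np.array([0.3, 0.9, 0.1])
best_action = int(np.argmax(q_current))
print(best_action)                 # 1
```

During training the agent instead follows an epsilon-greedy policy, picking a random action with probability epsilon so it keeps exploring.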
The Deep Q-Network (DQN) takes 3 consecutive top views of the current state of the 2d car racing game as input and outputs a Q value for each action. The `?` dimension shown in the model summary is the batch dimension.
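A network of that shape can be sketched in Keras as follows. The layer sizes and the 5-action discretized action space are assumptions for illustration; the actual architecture lives in `CarRacingDQNAgent.py` and may differ. The 3 stacked frames enter as the channel axis of a 96x96 input.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_actions = 5  # hypothetical discretized action space

model = keras.Sequential([
    # 3 consecutive grayscale top views stacked along the channel axis
    layers.Input(shape=(96, 96, 3)),
    layers.Conv2D(16, kernel_size=8, strides=4, activation="relu"),
    layers.Conv2D(32, kernel_size=4, strides=2, activation="relu"),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    # one Q value per action, no activation (Q values are unbounded)
    layers.Dense(num_actions, activation="linear"),
])

# The leading batch dimension is the "?" (None) seen in model summaries.
q_values = model.predict(np.zeros((1, 96, 96, 3)), verbose=0)
print(q_values.shape)  # (1, 5)
```

Choosing an action then reduces to an argmax over the output vector, as in the Q-value example above.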