Algorithm 2
The DQN algorithm and experience replay.
Initialize replay memory M to capacity C
Initialize the action-value function Q with random weights
For episode = 1, N do (for each of the N episodes)
  Initialize state S1 = (X1) (the starting computer-screen pixels) at the beginning of each episode
  // Pre-process the screen and feed it to our DQN,
  // which regresses the Q-values of all possible actions in that state.
  For t = 1, T do
    Choose an action At using the epsilon-greedy policy:
    // with probability epsilon, choose a random action, and with probability 1 − epsilon,
    // choose the action with the maximum Q-value, i.e., At = argmax_a Q(St, a)
    Execute action At in the emulator and observe reward Rt and image Xt+1
    Set St+1 = St, At, Xt+1 and pre-process it
    Store the transition (St, At, Rt, St+1) in M
    Sample a random mini-batch of transitions from M
    Set the goal (target) Q-value: the reward alone if the episode terminates at the next state, otherwise the reward plus the discounted maximum Q-value of the next state
    Compute the loss function,
    // which is just the squared difference between the goal Q and the predicted Q,
    and perform gradient descent with respect to the actual network parameters to reduce this loss
    After every fixed number of iterations, copy the actual network weights to the goal network weights
  End for
End for
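Below is a minimal sketch of how the pseudocode of Algorithm 2 could be realized in Python with PyTorch. The state dimension, network sizes, hyperparameters (epsilon, gamma, replay capacity, synchronization interval), and the random dummy transitions standing in for the emulator are illustrative assumptions rather than values from the article; the structure simply mirrors the listed steps: epsilon-greedy action selection, storing transitions in a replay memory, sampling a random mini-batch, computing the squared difference between the goal Q and the predicted Q, a gradient-descent step, and a periodic copy of the actual network weights to the goal (target) network.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 2                   # assumed problem dimensions
GAMMA, EPSILON, SYNC_EVERY = 0.99, 0.1, 100   # assumed hyperparameters


def make_q_net():
    # Regresses one Q-value per possible action for the given state.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))


q_net, goal_net = make_q_net(), make_q_net()
goal_net.load_state_dict(q_net.state_dict())  # start with identical weights
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)                 # replay memory M with capacity C


def select_action(state):
    # Epsilon-greedy: random action with probability epsilon,
    # otherwise the action with the maximum predicted Q-value.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q_values.argmax())


def train_step(batch_size=32):
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)  # random mini-batch from M
    states, actions, rewards, next_states, dones = map(
        lambda column: torch.as_tensor(column, dtype=torch.float32), zip(*batch))
    # Predicted Q-values of the actions that were actually taken.
    q_pred = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    # Goal (target) Q: the reward, plus the discounted maximum Q-value
    # of the next state when the episode did not terminate there.
    with torch.no_grad():
        q_next = goal_net(next_states).max(dim=1).values
        q_goal = rewards + GAMMA * q_next * (1.0 - dones)
    loss = nn.functional.mse_loss(q_pred, q_goal)  # squared difference
    optimizer.zero_grad()
    loss.backward()                                # gradient descent on the loss
    optimizer.step()


# Illustrative loop: random dummy transitions stand in for the emulator/screen.
for step in range(1, 501):
    state = torch.rand(STATE_DIM).tolist()
    next_state = torch.rand(STATE_DIM).tolist()
    action = select_action(state)
    reward, done = random.random(), float(step % 50 == 0)
    memory.append((state, action, reward, next_state, done))  # store transition in M
    train_step()
    if step % SYNC_EVERY == 0:                 # copy actual weights to goal network
        goal_net.load_state_dict(q_net.state_dict())
```

In this sketch the goal network is only refreshed every SYNC_EVERY steps, which keeps the regression target fixed between copies and is the usual reason the algorithm maintains two sets of weights.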