LSTM used to approximate the value function of a reinforcement learning (RL) algorithm
The network's outputs correspond to the values of the various actions;
these values are learned with the Advantage Learning RL algorithm.
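A minimal sketch of the Advantage Learning error for a single transition, assuming the common formulation in which V(s) = max_a A(s, a) and a scaling constant kappa (0 < kappa <= 1) amplifies the differences between action values; with kappa = 1 the target reduces to the ordinary Q-learning target r + gamma * max_a A(s', a). The helper name `advantage_learning_error`, the defaults gamma=0.98 and kappa=0.1, and the plain-list representation of the network outputs are illustrative assumptions, not details taken from the original setup.

```python
def advantage_learning_error(adv_s, adv_next, action, reward,
                             gamma=0.98, kappa=0.1, terminal=False):
    """Error for output unit `action` of the value network (hypothetical helper).

    adv_s:    network outputs A(s, a) for every action a in the current state s
    adv_next: network outputs A(s', a) for every action a in the next state s'
    gamma, kappa: illustrative defaults, not values from the original work
    """
    v_s = max(adv_s)                              # V(s) = max_a A(s, a)
    v_next = 0.0 if terminal else max(adv_next)   # no bootstrapping past a terminal state
    td = reward + gamma * v_next - v_s            # one-step TD error on the state value
    target = v_s + td / kappa                     # kappa < 1 widens the gaps between actions
    return target - adv_s[action]                 # only the taken action's output gets this error


# Usage: two actions, reward 1.0 after taking action 1.
err = advantage_learning_error([1.0, 0.5], [2.0, 0.0], action=1, reward=1.0,
                               gamma=0.5, kappa=0.5)
print(err)  # 2.5
```

Only the output unit of the action actually taken receives an error; the outputs for the other actions are left untouched on that step.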
In contrast with supervised learning tasks, the LSTM now determines
its own subsequent inputs by means of its outputs: the actions it selects change what it observes next!
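To make this closed loop concrete, here is a minimal sketch of the interaction cycle. The environment `ToyCorridor` and the recurrent stand-in `StandInRecurrentNet` are hypothetical placeholders: the stand-in only mimics the interface of an LSTM (an internal state carried across time steps, one output per action) and is not itself an LSTM. What the sketch shows is the loop structure: the action chosen from the network's outputs changes the environment, which in turn produces the network's next input.

```python
import random


class ToyCorridor:
    """Hypothetical environment: positions 0..length-1, starting in the middle."""

    def __init__(self, length=5):
        self.length = length
        self.pos = length // 2

    def observe(self):
        # One-hot encoding of the current position.
        return [1.0 if i == self.pos else 0.0 for i in range(self.length)]

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        reward = 1.0 if self.pos == self.length - 1 else 0.0
        return self.observe(), reward


class StandInRecurrentNet:
    """Placeholder for the LSTM: keeps a state across steps and returns one value per action."""

    def __init__(self, weights=(0.1, 0.2)):
        self.weights = weights   # arbitrary illustrative output weights, one per action
        self.state = 0.0

    def reset(self):
        self.state = 0.0

    def forward(self, obs):
        # Decaying memory of past inputs (a crude stand-in for LSTM cell state).
        self.state = 0.5 * self.state + sum(i * x for i, x in enumerate(obs))
        return [self.state * w for w in self.weights]


def run_episode(env, net, steps, epsilon=0.0, rng=None):
    """Closed loop: outputs -> action -> environment -> next input."""
    rng = rng or random.Random(0)
    net.reset()
    obs = env.observe()
    trajectory = []
    for _ in range(steps):
        values = net.forward(obs)                    # one output per action
        if rng.random() < epsilon:
            action = rng.randrange(len(values))      # occasional exploratory action
        else:
            action = max(range(len(values)), key=values.__getitem__)  # greedy action
        obs, reward = env.step(action)               # the chosen action decides the next input
        trajectory.append((action, reward))
    return trajectory


env = ToyCorridor(5)
traj = run_episode(env, StandInRecurrentNet(), steps=3)
print(traj)  # [(1, 0.0), (1, 1.0), (1, 1.0)]
```

In a supervised setting the input sequence would be fixed in advance; here the sequence of observations exists only because of the actions the network itself chose.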