Experiments: first some LSTM limitations
LSTM was tested on classical time series prediction tasks that feedforward nets learn well when carefully tuned (Mackey-Glass ...).
LSTM: 1 input unit, sees 1 value at a time (must learn what to store: memory overhead). FNN: 6 input units see a time window at once (no need to learn what to store).
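A minimal sketch (not the original experiment's code) contrasting the two input regimes. The Mackey-Glass generator parameters (tau = 17, a = 0.2, b = 0.1) are the standard ones; the window size 6 matches the FNN's 6 input units mentioned above, and all names are illustrative:

```python
import numpy as np

def mackey_glass(n_steps, tau=17, a=0.2, b=0.1, x0=1.2):
    """Euler discretization of dx/dt = a*x(t-tau)/(1 + x(t-tau)**10) - b*x(t)."""
    x = np.full(n_steps + tau, x0)
    for t in range(tau, n_steps + tau - 1):
        x[t + 1] = x[t] + a * x[t - tau] / (1.0 + x[t - tau] ** 10) - b * x[t]
    return x[tau:]

series = mackey_glass(1000)

# LSTM regime: one input unit, one value per time step; the net itself
# must learn which past values to keep (the "memory overhead").
lstm_inputs = series[:-1].reshape(-1, 1)      # shape (T-1, 1)
lstm_targets = series[1:]

# FNN regime: 6 input units see a sliding window of the last 6 values at
# once, so nothing has to be stored inside the network.
W = 6
fnn_inputs = np.stack([series[i:i + W] for i in range(len(series) - W)])
fnn_targets = series[W:]                      # next value after each window
```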
LSTM extracts the basic oscillation, but the best FNN is still better!
Parity: plain random weight search outperforms all the learning algorithms!
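A toy illustration of that claim (not the original setup): draw random weights for a tiny MLP until one setting classifies every parity pattern correctly. The network size, weight range, and 3-bit problem are assumptions chosen for the sketch:

```python
import itertools
import numpy as np

def parity_dataset(n_bits=3):
    """All 2**n_bits binary patterns; target is 1 iff the number of ones is odd."""
    X = np.array(list(itertools.product([0.0, 1.0], repeat=n_bits)))
    y = X.sum(axis=1) % 2
    return X, y

def random_net(rng, n_in, n_hidden=3, scale=10.0):
    """Draw one random weight setting for a 1-hidden-layer tanh MLP."""
    W1 = rng.uniform(-scale, scale, (n_in, n_hidden))
    b1 = rng.uniform(-scale, scale, n_hidden)
    W2 = rng.uniform(-scale, scale, n_hidden)
    b2 = rng.uniform(-scale, scale)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)                  # hidden layer
    return (np.tanh(h @ W2 + b2) > 0).astype(float)

rng = np.random.default_rng(0)
X, y = parity_dataset()
for trial in range(1, 1_000_000):
    params = random_net(rng, X.shape[1])      # guess, don't learn
    if np.array_equal(predict(params, X), y):
        print(f"solved parity after {trial} random guesses")
        break
```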
So: use LSTM only when simpler approaches fail! Do not shoot sparrows with cannons.
Experience: LSTM likes sparse input coding.
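One plausible reading of "sparse coding" here is the local (one-hot) input encoding used for symbolic sequence tasks such as the Reber grammar: each symbol activates exactly one input unit. A minimal sketch, with the alphabet and example string chosen for illustration:

```python
import numpy as np

symbols = ["B", "T", "P", "S", "X", "V", "E"]   # Reber-grammar alphabet
index = {s: i for i, s in enumerate(symbols)}

def one_hot(seq):
    """Encode a symbol sequence as a (len(seq), n_symbols) sparse 0/1 matrix."""
    X = np.zeros((len(seq), len(symbols)))
    for t, s in enumerate(seq):
        X[t, index[s]] = 1.0
    return X

print(one_hot("BTXSE"))
```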