Next: Error path integral Up: Exponential error decay Previous: Exponential error decay

### Gradients of the error function

The results we are going to prove hold regardless of the particular kind of cost function used (as long as its continuous in the output) and regardless of the particular algorithm which is employed to compute the gradient. Here we shortly explain how gradients are computed by the standard BPTT algorithm (e.g., [27], see also crossreference Chapter 14 for more details) because its analytical form is better suited to the forthcoming analyses. The error at time is denoted by . Considering only the error at time , output unit 's error signal is

and some non-output unit 's backpropagated error signal at time is

where

is unit 's current net input,

is the activation of a non-input unit with differentiable transfer function , and is the weight on the connection from unit to . The corresponding contribution to 's total weight update is , where is the learning rate, and stands for an arbitrary unit connected to unit .

Next: Error path integral Up: Exponential error decay Previous: Exponential error decay
Juergen Schmidhuber 2003-02-19