The history compression technique formulated above defines expectation-mismatches in a yes-or-no fashion: Each input unit whose activation is not predictable at a certain time gives rise to an unexpected event. Each unexpected event provokes an update of the internal state of a higher-level predictor. The updates always take place according to the conventional activation spreading rules for recurrent neural nets. There is no concept of a partial mismatch or of a `near-miss'. There is no possibility of updating the higher-level net `just a little bit' in response to a `nearly expected input'. In practical applications, some `epsilon' has to be used to define an acceptable mismatch.
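For concreteness, the following minimal sketch (in Python) shows what such an epsilon-based yes-or-no mismatch test might look like; the function name, the threshold value, and the example vectors are illustrative choices, not part of the original formulation.

\begin{verbatim}
import numpy as np

def unexpected(prediction: np.ndarray, observed: np.ndarray,
               epsilon: float = 0.1) -> bool:
    """Return True if any input component deviates from its
    prediction by more than epsilon, i.e. the current input
    counts as an unexpected event."""
    return bool(np.any(np.abs(prediction - observed) > epsilon))

prediction = np.array([0.05, 0.92, 0.03])
observed = np.array([0.0, 1.0, 0.0])
if unexpected(prediction, observed):
    pass  # only then is the higher-level predictor's state updated
\end{verbatim}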
In reply to the above criticism, continuous history compression is based on the following ideas:
We use a local input representation: at any given time, exactly one input unit is active.
The components of the predictor's output vector $y(t)$ are forced to sum up to 1 and are interpreted as a prediction of the probability distribution of the possible next inputs $x(t+1)$: the $i$-th component $y_i(t)$ is interpreted as the prediction of the probability that $x_i(t+1)$ is 1.
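A minimal sketch of this interpretation is given below, assuming the predictor's output layer is normalized by a softmax so that its components sum up to 1; all names are illustrative.

\begin{verbatim}
import numpy as np

def softmax(a: np.ndarray) -> np.ndarray:
    e = np.exp(a - a.max())
    return e / e.sum()

n_symbols = 4                              # local (one-hot) input representation
logits = np.array([0.2, 2.0, -1.0, 0.5])   # predictor's pre-output activations
y = softmax(logits)                        # y[i]: predicted probability that x_i(t+1) is 1
assert np.isclose(y.sum(), 1.0)            # components sum up to 1

x_next = np.zeros(n_symbols)               # actual next input: exactly one unit is on
x_next[1] = 1.0
\end{verbatim}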
The output entropy $-\sum_i y_i(t) \log y_i(t)$ measures the predictor's current uncertainty about the next input. How much information is conveyed by $x(t+1)$ (relative to the current predictor), once it is observed? According to [23], it is $-\log y_j(t)$, where $j$ is the index of the input unit that is actually on at time $t+1$.
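The following sketch computes both quantities, the output entropy and the information conveyed by the observed input; using the latter as a graded (continuous) mismatch signal for the higher level is the intended reading of the discussion above, although the precise update rule is not given in this section.

\begin{verbatim}
import numpy as np

def output_entropy(y: np.ndarray) -> float:
    """-sum_i y_i log y_i: the predictor's uncertainty about the next input."""
    return float(-np.sum(y * np.log(y + 1e-12)))

def information_conveyed(y: np.ndarray, j: int) -> float:
    """-log y_j: information gained by observing that input unit j is on."""
    return float(-np.log(y[j] + 1e-12))

y = np.array([0.05, 0.90, 0.03, 0.02])  # predicted distribution over the next input
j = 1                                   # index of the unit actually on at time t+1
print(output_entropy(y))                # ~0.43 nats: a fairly confident predictor
print(information_conveyed(y, j))       # ~0.11 nats: a nearly expected input
\end{verbatim}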