All our methods are instances of a strategy known as ``predictive
coding'' or ``model-based coding''. We use a neural predictor network
$P$ which is trained to
approximate the conditional probability distribution
of possible characters,
given previous characters.
$P$'s outputs are fed into
coding algorithms that
generate short codes for characters with low information
content (characters with high predicted probability)
and long codes for characters conveying a lot of
information (highly unpredictable characters).
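To make the relation between predicted probability and code length
concrete, the following sketch (plain Python, for illustration only;
the probabilities are invented) computes the ideal code length of
$-\log_2 p$ bits that an entropy coder such as arithmetic coding
would assign to a character predicted with probability $p$:
\begin{verbatim}
import math

def ideal_code_length(p):
    # Ideal code length in bits for a character predicted with probability p.
    return -math.log2(p)

# A well-predicted character gets a short code, a surprising one a long code.
print(ideal_code_length(0.9))    # about 0.15 bits
print(ideal_code_length(0.01))   # about 6.64 bits
\end{verbatim}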
Why not use a look-up table instead of a network?
Because look-up tables tend to be inefficient.
A look-up table requires
$k^{n+1}$ entries for all the conditional
probabilities
of
$k$ possible characters,
given
$n$ previous characters.
In addition, a special procedure is required for dealing with
previously unseen character combinations.
In contrast, the size of a neural net typically grows in proportion
to $n^2$
(assuming the number of hidden units grows in proportion to the
number of inputs), and its inherent ``generalization capability''
takes care of
previously unseen character combinations
(hopefully by coming up with good predicted probabilities).
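To illustrate the difference in growth rates with hypothetical numbers:
for, say, $k = 80$ possible characters and $n = 5$ previous characters,
a full table would need $k^{n+1} = 80^{6} \approx 2.6 \cdot 10^{11}$
entries, whereas a network whose input and hidden layers both grow in
proportion to $n$ needs only on the order of $n^{2}$ weights (times a
constant factor depending on $k$ and the character encoding).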
We will make the distinction between on-line and off-line
variants of our approach.
With off-line methods,
$P$ is trained on a separate set
of training files.
After training, the weights are frozen and copies
of $P$ are installed at all machines functioning as
message receivers or senders.
From then on,
$P$ is used to encode and decode unknown files
without being changed any more.
The weights become part of the code of the compression algorithm.
The storage occupied by the network
weights does not have to be taken into account
when measuring performance
on unknown files - just as
the code of a conventional data compression algorithm
is not taken into account.
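To make the off-line scheme concrete, the following simplified sketch
(illustrative Python; a fixed probability table with an invented
alphabet replaces the trained, frozen network, and the coder is
reduced to exact interval narrowing) shows how sender and receiver,
both equipped with the same frozen predictor, can encode and decode
without exchanging any model parameters:
\begin{verbatim}
from fractions import Fraction as F

# Frozen stand-in for the trained, weight-frozen predictor
# (alphabet and probabilities are invented for this sketch).
ALPHABET = "ab "

def predict(context):
    # A real predictor would condition on the context; this frozen
    # stand-in always returns the same distribution.
    return {"a": F(1, 2), "b": F(1, 4), " ": F(1, 4)}

def encode(text):
    low, width = F(0), F(1)
    for i, ch in enumerate(text):
        p, cum = predict(text[:i]), F(0)
        for c in ALPHABET:
            if c == ch:
                low, width = low + width * cum, width * p[c]
                break
            cum += p[c]
    return low + width / 2    # any number inside the final interval will do

def decode(code, length):
    out, low, width = "", F(0), F(1)
    for _ in range(length):
        p, cum = predict(out), F(0)
        for c in ALPHABET:
            if low + width * cum <= code < low + width * (cum + p[c]):
                out += c
                low, width = low + width * cum, width * p[c]
                break
            cum += p[c]
    return out

message = "ab ab"
assert decode(encode(message), len(message)) == message
\end{verbatim}
In a full off-line system, predict would evaluate the frozen network
on the context, and the coder would emit a finite-precision bit stream
rather than an exact fraction.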
The on-line variants are based on the insight that even if the predictor learns during compression, the modified weights need not be sent from the sender to the receiver across the communication channel - as long as the predictor employed for decoding uses exactly the same initial conditions and learning algorithm as the predictor used for encoding (this observation goes back to Shannon). Since on-line methods can adapt to the statistical properties of specific files, they promise significantly better performance than off-line methods. But there is a price to pay: on-line methods tend to be computationally more expensive.
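The following sketch (again illustrative Python, with a simple
character-count model standing in for the learning network) shows why
this works: both sides start from identical initial conditions and
apply the identical update rule after each character, so their
predictions remain synchronized at every step and only the coded
characters themselves need to be transmitted:
\begin{verbatim}
class AdaptiveModel:
    """Count-based stand-in for an on-line learning predictor."""
    def __init__(self, alphabet):
        self.counts = {c: 1 for c in alphabet}     # identical initial conditions

    def predict(self):
        total = sum(self.counts.values())
        return {c: n / total for c, n in self.counts.items()}

    def update(self, observed):
        self.counts[observed] += 1                 # identical learning rule

sender, receiver = AdaptiveModel("abc"), AdaptiveModel("abc")
for ch in "abacabcb":
    assert sender.predict() == receiver.predict()  # predictions stay in lock-step
    sender.update(ch)      # the sender adapts while encoding ch ...
    receiver.update(ch)    # ... the receiver adapts after decoding the same ch
\end{verbatim}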
Section IV will show that even off-line methods can sometimes achieve excellent results. We will briefly come back to on-line methods in the final section of this paper.