PPMD.WCX
========

Author : Andreas Muegge
eMail  : andreas.muegge@gmx.de
Date   : 2001-10-23

Version: 1.0

This is a plugin for the very popular Total Commander (www.ghisler.com) that
uses PPM as its compression method. Archives are usually smaller than those
produced by BZip, GZip, ACE or others, especially if the alphabet size is small
(as is the case in normal text files), but compression and decompression take
longer. With modern CPUs this drawback matters less and less, so PPM is
practical for daily use. Try it and decide for yourself!

Similar to BZip or GZip, every archive contains only one file.

The original code is based on PPMd variant H by Dmitry Shkarin (available at
ftp://ftp.nsk.su/.3/windows/compress/ppmde.zip).


Installation
============

1. Unzip the WCX to the Total Commander directory (usually c:\wincmd)
2. In Total Commander, choose Configuration - Options
3. Open the 'Packer' page
4. Click 'Configure packer extension WCXs'
5. Type  ppm  as the extension
6. Click 'New type', and select the ppmd.wcx
7. Click OK


Using the Plugin
================

Clicking "Configure" lets you set the following options:

a) Order (2-16)

Roughly speaking, the order is the number of preceding characters that are
remembered to predict the upcoming character (see "Theoretical Background" at
the end of this file for more). A higher value increases the chance that the
character can be "guessed", which means it can be compressed well. On the other
hand, every additional order adds overhead, so higher orders may or may not
lead to better compression. Also, more memory is required for (de)compression.

A value of 5 is a good starting point; just play around with the order to get
a feeling for it. For files with many repeated characters or files with a very
rigid structure (XML, HTML) it pays off to use very high orders!
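To get an intuition for the overhead, the following small Python sketch
(illustrative only, not part of the plugin) counts how many distinct order-k
contexts a sample text contains. Every distinct context needs its own
statistics, so the model's memory use grows with the order:

```python
def distinct_contexts(text, k):
    """Number of distinct order-k contexts that are followed by
    at least one character in the text."""
    return len({text[i:i + k] for i in range(len(text) - k)})

# The higher the order, the more contexts the model must remember.
for k in (2, 3, 5):
    print(k, distinct_contexts("abracadabra abracadabra", k))
```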


b) Memory (1-256 MByte)

This is the upper limit of memory used for compression. Keep in mind that the
same amount is necessary for decompression, so make sure that enough virtual
and physical memory is available. 16 MByte is enough for files smaller than
3 MByte.

If you turn on "Show Statistic", you will see how much memory was actually
needed for a file.


c) Show Statistic

Displays a message box with some statistics after compression.


For Developers
==============

The source code is available by eMail. I have written a C++ wrapper class
around the original code and modified the I/O so that the new iostream classes
are used. Everything is packaged in an easy-to-use LIB which you can use for
your own projects (free of charge, of course).


Theoretical Background
======================

(from "Unbounded Length Contexts for PPM" by John G. Cleary, W. J. Teahan
and Ian H. Witten)


Prediction by partial matching, or PPM, is a finite-context statistical 
modeling technique that can be viewed as blending together several fixed-order
context models to predict the next character in the input sequence. Prediction
probabilities for each context in the model are calculated from frequency 
counts which are updated adaptively; and the symbol that actually occurs is 
encoded relative to its predicted distribution using arithmetic coding. The 
maximum context length is a fixed constant, and it has been found that 
increasing it beyond about six or so does not generally improve compression. 

The basic idea of PPM is to use the last few characters in the input stream 
to predict the upcoming one. Models that condition their predictions on a few 
immediately preceding symbols are called finite-context models of order k, 
where k is the number of preceding symbols used. PPM employs a suite of 
fixed-order context models with different values of k, from 0 up to some 
predetermined maximum, to predict upcoming characters. 

For each model, a note is kept of all characters that have followed every 
length-k subsequence observed so far in the input, and the number of times 
that each has occurred. Prediction probabilities are calculated from these 
counts. The probabilities associated with each character that has followed the
last k characters in the past are used to predict the upcoming character. 
Thus from each model, a separate predicted probability distribution is 
obtained. 
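The counting described above can be sketched in a few lines of Python (a
simplified illustration of the idea, not the actual PPMd data structures):

```python
from collections import Counter, defaultdict

def build_model(text, k):
    """For every order-k context, count which characters followed it."""
    counts = defaultdict(Counter)
    for i in range(len(text) - k):
        counts[text[i:i + k]][text[i + k]] += 1
    return counts

def predict(counts, context):
    """Predicted probability distribution for the next character."""
    followers = counts[context]
    total = sum(followers.values())
    return {ch: n / total for ch, n in followers.items()}
```

For example, after training on "abracadabra" with k = 2, the context "br"
predicts 'a' with probability 1.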

These distributions are effectively combined into a single one, and arithmetic 
coding is used to encode the character that actually occurs, relative to that 
distribution. The combination is achieved through the use of escape 
probabilities. Recall that each model has a different value of k. The model 
with the largest k is, by default, the one used for coding. However, if a novel
character is encountered in this context, which means that the context cannot 
be used for encoding it, an escape symbol is transmitted to signal the decoder
to switch to the model with the next smaller value of k. The process continues 
until a model is reached in which the character is not novel, at which point it 
is encoded with respect to the distribution predicted by that model. To ensure 
that the process terminates, a model is assumed to be present below the lowest 
level, containing all characters in the coding alphabet. This mechanism 
effectively blends the different order models together in a proportion that 
depends on the values actually used for escape probabilities. 
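The fallback through the orders can be sketched as follows (a simplified
Python illustration of the escape logic only; the real coder additionally
assigns probabilities to the escape symbols and excludes characters already
ruled out at higher orders):

```python
from collections import Counter, defaultdict

def build_models(text, max_k):
    """One frequency table per order k = 0 .. max_k."""
    models = []
    for k in range(max_k + 1):
        m = defaultdict(Counter)
        for i in range(k, len(text)):
            m[text[i - k:i]][text[i]] += 1
        models.append(dict(m))
    return models

def code_symbol(models, history, symbol):
    """Return (number of escapes emitted, order used) when coding
    `symbol` after `history` (assumed at least max_k characters long),
    falling back order by order."""
    max_k = len(models) - 1
    escapes = 0
    for k in range(max_k, -1, -1):
        ctx = history[len(history) - k:] if k else ""
        followers = models[k].get(ctx, {})
        if symbol in followers:
            return escapes, k          # this model can code the symbol
        if followers:                  # context known, symbol novel: escape
            escapes += 1
    return escapes, -1                 # fall back to the uniform model
```

After training on "abracadabra" with maximum order 2, an 'r' following "ab" is
coded directly by the order-2 model, a 'b' following "ca" needs one escape and
is coded by the order-1 model, and a character never seen at all escapes all
the way down to the uniform order -1 model.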

