Network Working Group                  Richard Price, Siemens/Roke Manor 
INTERNET-DRAFT                        Robert Hancock, Siemens/Roke Manor 
Expires: May 2001                     Stephen McCann, Siemens/Roke Manor 
                                         Mark A West, Siemens/Roke Manor 
                                     Abigail Surtees, Siemens/Roke Manor 
                                              Christian Schmidt, Siemens 
                                                                         
                                                       November 17, 2000 
    
           Efficient Protocol Independent Compression (EPIC) 
                      <draft-price-rohc-epic-00.txt> 
    
Status of this Memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of [RFC-2026]. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that other 
   groups may also distribute working documents as Internet-Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
   This document is a submission to the IETF ROHC WG. Comments should be 
   directed to the mailing list of ROHC, rohc@cdt.luth.se. 
    
    
Abstract 
    
   The RObust Header Compression [ROHC5] scheme is designed to compress 
   packet headers over error prone channels. It is structured into two 
   main parts. Firstly there is an extensible core framework covering 
   the basic ROHC protocol and packet formats, which is instantiated for 
   a particular protocol stack by means of a ROHC profile. Secondly 
   [ROHC5] also includes a number of specific profiles for particular 
   protocol stacks (RTP/UDP/IP etc.). 
    
   The Efficient Protocol Independent Compression (EPIC) scheme is a 
   general mechanism for providing additional profiles for use within 
   the ROHC framework. It can produce a set of compressed header formats 
   for an arbitrary protocol stack and a chosen level of robustness. As 
   with [ROHC5], the decompressed headers are semantically identical to 
   the original uncompressed headers. Moreover the headers generated by 
   EPIC have the special property that, in a well defined sense, they 
   are optimally efficient. 

Price et al.                                                  [PAGE 1] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
Table of contents  
    
   1.  Introduction.................................................3 
   2.  Terminology..................................................4 
   3.  Existing Solutions...........................................4 
   3.1.  ROHC framework.............................................4 
   3.2.  Location of EPIC in ROHC...................................4 
   3.3. ROHC profiles...............................................5 
   4.  Header compression using EPIC................................5 
   4.1.  Compression and decompression procedures...................5 
   4.2.  Huffman compression and EPIC inputsets.....................7 
   5.  EPIC Inputset................................................9 
   5.1.  General parameters.........................................9 
   5.1.1.  Lookup tables............................................9 
   5.1.2.  Control of header alignment..............................9 
   5.1.3.  Available memory.........................................10 
   5.2.  Field-specific parameters..................................11 
   6.  EPIC Encoding Toolbox........................................12 
   6.1.  Toolbox members............................................13 
   6.1.1.  STATIC...................................................13 
   6.1.2.  IRREGULAR(k).............................................13 
   6.1.3.  VALUE(v).................................................13 
   6.1.4.  READ(LTABLE,i)...........................................13 
   6.1.5.  WRITE(k,LTABLE,i)........................................14 
   6.1.6.  LSB(k,p).................................................14 
   6.1.7.  UNCOMPRESSED.............................................14 
   6.1.8.  INFERRED.................................................14 
   6.2.  Hints on applying the toolbox..............................15 
   6.2.1.  Compressing variable-length fields.......................15 
   6.2.2.  Compressing extension headers............................15 
   7.  Control Fields...............................................17 
   7.1.  CRC........................................................17 
   7.2.  Master Sequence Number.....................................18 
   7.3.  Miscellaneous Information..................................18 
   8.  Extensibility................................................19 
   8.1.  Other protocol stacks......................................19 
   8.2.  New toolbox methods........................................19 
   8.3.  New control fields.........................................20 
   8.4.  Learning version of EPIC...................................20 
   9.  EPIC Processing..............................................21 
   9.1.  Structure of the Huffman tree..............................21 
   9.2.  Compressor operation.......................................22 
   9.2.1.  Step 1: Compression front end............................23 
   9.2.2.  Step 2: Compressing the uncompressed header..............23 
   9.2.3.  Step 3: Adding the UNCOMPRESSED fields...................23 
   9.3.  Decompressor operation.....................................23 
   9.3.1.  Step 1: Decompressing the compressed header..............23 
   9.3.2.  Step 2: Reconstructing the INFERRED fields...............24 
   10.  Security considerations.....................................24 
   11.  Acknowledgements............................................24 
   12.  Intellectual Property Considerations........................24 
   13.  References..................................................25 
   14.  Authors' Addresses..........................................25 
   Appendix A.  EPIC Description in Pseudocode......................26 

Price et al.                                                  [PAGE 2] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   A.1.  Structure of the Hierarchical Huffman Tree.................26 
   A.2.  Compressor.................................................27 
   A.2.1.  Step 1: Compression front end............................27 
   A.2.2.  Step 2: Compressing the uncompressed header..............28 
   A.2.3.  Step 3: Adding the UNCOMPRESSED fields...................29 
   A.3.  Decompressor...............................................29 
   A.3.1.  Step 1: Decompressing the compressed header..............29 
   A.3.2.  Step 2: Decompression front end..........................30 
   A.3.3.  Step 3: Reconstructing the INFERRED fields...............31 
   A.4.  Hierarchical Huffman tree construction.....................32 
   A.4.1.  Step 1: Process the EPIC inputset........................33 
   A.4.2.  Step 2: Resolve complex encoding methods.................33 
   A.4.3.  Step 3: Tree pruning.....................................35 
   A.4.4.  Step 4: Creating the HH Tree.............................35 
   A.5.  EPIC implementation issues.................................38 
   A.5.1.  Arithmetic...............................................38 
   Appendix B.  Example Profile for RTP/UDP/IPv4....................40 
   B.1.  General parameters.........................................40 
   B.2.  Field-specific parameters..................................40 
   B.3.  Changes required for a random IP ID........................49 
   Appendix C: Proof of Optimal Efficiency..........................50 
    
1.  Introduction 
    
   This document describes a method for generating new compressed header 
   formats for [ROHC5]. The Efficient Protocol Independent Compression 
   (EPIC) scheme takes as its input a list of fields in the protocol 
   stack to be compressed, and for each field a choice of compression 
   techniques to be made available for the field. Using this input EPIC 
   derives a set of compressed header formats stored in the form of a 
   tree. These compressed header formats can then be used to quickly 
   compress and decompress headers. 
    
   An important property of EPIC is that for any chosen protocol it 
   generates a set of compressed headers for which the average header 
   size is provably minimal. In particular, for the same level of 
   robustness the compressed headers produced by EPIC are 5 - 10% more 
   efficient than those currently used in [ROHC5]. A subset of EPIC has 
   been implemented in accordance with this draft to demonstrate the 
   operation of the algorithms, and this is available from [IMPL]. 
    
   Chapter 3 gives an overview of [ROHC5] and explains how EPIC fits 
   into the ROHC framework. 
   Chapter 4 describes the steps involved in the EPIC scheme. 
   Chapter 5 lists the inputset needed to create a new set of compressed 
   header formats. 
   Chapter 6 describes the eight basic compression techniques available 
   in the EPIC toolbox. 
   Chapter 7 discusses the EPIC control fields. 
   Chapter 8 considers the extensibility of EPIC. 
   Chapter 9 contains a step-by-step worked example of EPIC in action. 
   Appendix A gives a normative description of EPIC in pseudocode. 
   Appendix B lists an example EPIC inputset for RTP/UDP/IPv4. 
   Appendix C states and proves the optimal efficiency claim for EPIC. 

Price et al.                                                  [PAGE 3] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
2.  Terminology 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in [RFC-2119]. 
    
    
3.  Existing Solutions 
    
   This chapter describes the overall ROHC framework and the location of 
   EPIC within [ROHC5]. 
    
3.1.  ROHC framework 
    
   [ROHC5] defines a general compressed header format that is used for 
   any compressed packet as well as feedback and IR (Initialization and 
   Refresh) packets. The general header format includes a CID (Context 
   Identifier) to indicate the packet stream to which the header 
   belongs. It also includes a packet type indicator to specify whether 
   the packet is a feedback, initialization or general compressed 
   header, whether it is segmented, and whether it contains padding. 
    
   The following packet type indicators are reserved in the overall 
   [ROHC5] framework: 
    
   1110:     Padding or Add-CID octet 
   11110:    Feedback 
   111110:   IR-DYNAMIC packet 
   1111110:  IR packet 
   1111111:  Segment 
    
   The remaining packet types depend on the protocol stack to be 
   compressed. A ROHC profile is the specification of how to compress a 
   certain kind of header stream over a certain kind of link. Each ROHC 
   profile has its own set of compressed packet formats to complement 
   the general packet types common to all ROHC profiles. 
    
   In general the compressed header formats for a profile are octet-
   aligned and do not start with the bit pattern 111XXXXX (since this is 
   reserved by the overall ROHC framework). 
    
   Note that as well as defining a new set of compressed header formats, 
   each ROHC profile also describes how the compressor and decompressor 
   should be initialized and when to change state or mode. 
    
3.2  Location of EPIC in ROHC 
    
   The EPIC scheme generates new compressed header formats for use in 
   ROHC profiles. 
    
   The packet formats generated by EPIC are fully [ROHC5]-compatible. 
   The location of the EPIC compressed header within the general ROHC 
   packet is shown below: 
    

Price et al.                                                  [PAGE 4] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
        0                           7 
       --- --- --- --- --- --- --- --- 
      |         Add-CID octet         | if for CID 1-15 and small CIDs 
      +---+--- --- --- ---+--- --- ---+ 
      |    EPIC compressed header     | 1 octet 
      +---+--- ---+---+---+--- --- ---+ 
      |                               | 
      /    0, 1, or 2 octets of CID   / 1 or 2 octets if large CIDs 
      |                               | 
      +---+---+---+---+---+---+---+---+ 
      /    EPIC compressed header     / variable 
      +---+---+---+---+---+---+---+---+ 
    
        Figure 1 : Location of EPIC within the general ROHC packet 
    
    
   Note that to ensure ROHC compatibility, the first octet of an EPIC 
   compressed header never uses the bit pattern 111XXXXX. 
    
3.3.  ROHC profiles 
    
   [ROHC5] currently contains four profiles: 
    
   Profile 0 for sending uncompressed IP packets 
   Profile 1 for RTP/UDP/IP compression 
   Profile 2 for UDP/IP compression 
   Profile 3 for ESP/IP compression 
    
   The basic function of EPIC is to generate new profiles for [ROHC5]. 
   Each profile includes a set of compressed header formats that can 
   efficiently compress one protocol stack. Within each profile, the 
   size of the CRC checksum, length of the sequence number and memory 
   requirements for the profile can also be customized. 
 
    
4.  Header Compression using EPIC 
    
   This chapter outlines the EPIC scheme. 
    
4.1.  Compression and decompression procedures 
    
   Any ROHC-compatible header compression scheme can be considered to 
   operate as follows: 
   1. For an input uncompressed header, extract the non-redundant 
      information from the header fields using available context data as 
      appropriate. 
   2. Format this information into a compressed header (as in Figure 1) 
      for transfer to the decompressor. 
   3. Use this information, along with context data at the decompressor, 
      to reconstruct the original header. 
   This section describes in overview how each stage is carried out 
   using the EPIC scheme. 
    


Price et al.                                                  [PAGE 5] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   Figure 2a illustrates the per-packet processing which is done for 
   each header to be compressed and decompressed. 
    
    (1)   |                                                  ^ 
          | {Uncompressed                      {Uncompressed | 
          |    header}                            header}    | 
          v                                                  | 
        +-----------------+                                  | 
        |    Classify     |                                  | 
        |     packet      |                                  | 
        +-----------------+                                  | 
          |                                                  | 
          | {Header &                                        | 
          |  format}                         +-----------------+ 
          v                                  | Apply decoding  | 
        +-----------------+                  |   methods and   | 
        | Apply encoding  |                  |   reconstruct   | 
        |     methods     |                  |     header      | 
        | (Section A.2.1) |                  | (Section A.3.2) | 
        +-----------------+                  +-----------------+ 
          |                                                  ^ 
          | {Format &                              {Format & | 
          |  M value}                               M value} | 
          v                                                  | 
    (2) +-----------------+                  +-----------------+ 
        |      Tree       |                  |      Tree       | 
        |    traversal    |                  |    traversal    | 
        | (Section A.2.2) |                  | (Section A.3.1) | 
        +-----------------+                  +-----------------+ 
          |                                                  ^ 
          | {Codewords}                          {Codewords} | 
          v                  {Compressed                     | 
        +-----------------+    header}   (3) +-----------------+ 
        |     Header      |                  |     Header      | 
        |  formatting &   | ---------------> |   reception &   | 
        |  transmission   |                  |    parsing      | 
        +-----------------+                  +-----------------+ 
                                      
                Figure 2a : EPIC compression/decompression 
    
   For (1) EPIC carries out two principal steps. The first is to choose 
   a "compressed header format" for the uncompressed header. This format 
   defines how the non-redundant information will be extracted from each 
   field in the uncompressed header. Note that the set of available 
   formats depends on the chosen profile, and optionally on other 
   factors such as the compressor/decompressor mode. 
    
   For the second step of (1) the information is extracted from each 
   field in turn and the results combined together into a single bit 
   pattern, which can be thought of as a single large integer known as 
   the "M value". The process used to do this extraction for any given 
   field is called the "encoding method" for that field. The set of 
   fundamental encoding methods is called the "EPIC toolbox", and is 
   essentially the same as those used for the profiles defined in 

Price et al.                                                  [PAGE 6] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   [ROHC5]. The capabilities of EPIC are extensible simply by adding to 
   this toolbox (all other parts of the process are unaffected); 
   extensibility issues are considered further in Chapter 8. 
    
   The next step (2) involves taking the format and M value and 
   constructing the compressed header itself. The header formats are 
   stored using a tree, with each leaf node corresponding to exactly one 
   compressed header format. EPIC follows the tree from leaf to root, 
   using the M value to fill gaps in the compressed header where field 
   information is to be stored. 
    
   At the receiver, step (3) essentially carries out equivalent 
   operations in reverse. The decompressor has an identical tree, and 
   can use the EPIC compressed header to navigate a path through the 
   tree starting from the root. This results in a known compressed 
   header format and reconstructed M value. Since the set of encoding 
   methods used is now known (from the format) the M value can be 
   resolved into the non-redundant information from every field, 
   reconstructing each field in the uncompressed header. 
    
4.2.  Huffman compression and EPIC inputsets 
    
   Huffman compression is a well known technique used in many popular 
   compression schemes. EPIC uses a variant of Huffman encoding to 
   produce a new set of compressed header formats for the compressor and 
   decompressor. EPIC employs an optimization of the basic Huffman 
   technique which reduces memory and processing requirements within the 
   compressor/decompressor. 
    
   Basic Huffman encoding is designed to compress an arbitrary stream of 
   characters from a character set such as ASCII. The idea is to create 
   a new set of compressed characters, where each normal character maps 
   onto a compressed character and vice versa. Common characters are 
   given shorter compressed equivalents than rarely used characters, 
   reducing the average size of the data stream. The compression ratio 
   can be improved by increasing the size of one character, but at the 
   expense of higher memory requirements. In fact the memory required in 
   a Huffman compressor grows exponentially with the character size so 
   16-bit characters need 256 times as much memory as 8-bit characters. 
    
   The fundamental idea of EPIC is to apply Huffman encoding to a stream 
   of packet headers, where each header is treated as a single 
   character. The advantage of this choice is that Huffman encoding is 
   optimally efficient for a single character, so applying Huffman to 
   the whole header at once means that the average compressed header 
   size is provably the minimum possible. The disadvantage of basic 
   Huffman is that the character set is extremely large (for a 40 octet 
   protocol stack the character set contains up to 2^320 headers). 
   Clearly the basic Huffman technique cannot be used in this way. 
    
   The solution lies in a modified version of Huffman encoding known as 
   Hierarchical Huffman (HH). The HH algorithm compresses a data stream 
   with exactly the same level of efficiency as normal Huffman, but for 
   a fraction of the processing and memory cost. 

Price et al.                                                  [PAGE 7] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   The input required to build a new set of compressed header formats is 
   known as an "EPIC inputset". Each inputset is essentially a 
   structured description of the behavior of the protocol stack to be 
   compressed (along with some control information). EPIC converts an 
   inputset into a suitable form for the HH algorithm, which then builds 
   a unique HH tree containing the header formats. Inputsets have a 
   recursive structure, which makes it easy to build new inputsets out 
   of existing ones (e.g. when adding new layers to a protocol stack). 
   In advanced applications of EPIC new inputsets can even be 
   transmitted in a simple "profiling" message, which allows dynamic 
   updating of the header compression capabilities of a device. 
    
   Figure 2b describes the process of building the HH tree from the EPIC 
   inputset, which is done once only and the results stored at the 
   compressor and decompressor. References are given to pseudocode in 
   Appendix A which describes the various stages explicitly. 
                                      
                          +-----------------------+ 
                          |    Input Stage 1:     | 
                          |  Enter the inputset   | 
                          |    which describes    | 
                          |  the protocol stack   | 
                          |    (Section A.4.1)    | 
                          +-----------------------+ 
                                      | 
                                      v 
                          +-----------------------+ 
                          |    Input Stage 2:     | 
                          |    Resolve complex    | 
                          | encoding methods into | 
                          |  fundamental methods  | 
                          |    (Section A.4.2)    | 
                          +-----------------------+ 
                                      | 
                                      v 
                          +-----------------------+ 
                          |    Input Stage 3:     | 
                          | Prune low probability | 
                          |    header formats     | 
                          |    (Section A.4.3)    | 
                          +-----------------------+ 
                                      | 
                                      v 
                          +-----------------------+ 
                          |    Input Stage 4:     | 
                          |  Run HH algorithm to  | 
                          |     generate tree     | 
                          |    (Section A.4.4)    | 
                          +-----------------------+ 
    
                     Figure 2b : HH Tree Construction 
    
   The following chapters describe the mechanisms of EPIC in greater 
   detail. 

Price et al.                                                  [PAGE 8] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
5.  EPIC Inputset 
    
   This chapter describes the inputset that EPIC requires to create a 
   new set of compressed header formats. Section 5.1 describes the 
   general inputset parameters common to the entire protocol stack, 
   whilst Section 5.2 lists the inputset parameters specific to each 
   field. 
    
5.1.  General inputset parameters 
    
   The general parameters are used to configure the lookup tables, the 
   alignment of the compressed headers and the amount of memory reserved 
   for EPIC at the compressor and decompressor. 
    
5.1.1.  Lookup tables 
    
   One of the compression techniques offered by EPIC is table-based item 
   compression as described in Section 5.8.1 of [ROHC5]. This 
   compression method allows an item of data to be transmitted in full 
   together with an index. The item is stored in a lookup table at the 
   position specified by index, and once the compressor gains enough 
   confidence that the decompressor has successfully received the 
   mapping it can send the item by transmitting its index only. 
    
   EPIC allows multiple lookup tables to be set up, where each lookup 
   table is capable of storing a fixed number of items. Note that 
   smaller lookup tables achieve better compression efficiency since 
   fewer bits are required to index an item in the table. 
    
   Each lookup table is given a unique name by which it can be 
   referenced. Lookup tables are initialized as shown in the example 
   below: 
    
   LTABLE{"0000","0101","110111"} 
    
   The name of the lookup table is given, followed by a list of bit 
   patterns that represent the initial entries in the lookup table. Note 
   that the one lookup table may sometimes store bit patterns with 
   several different lengths (for example when the lookup table is used 
   to store values from a variable-length field). 
    
   Lookup tables are accessed and updated using a pair of encoding 
   techniques as described in Chapter 6. 
    
5.1.2.  Control of header alignment 
    
   The alignment of the compressed headers is controlled using the ALIGN 
   and NPATTERNS parameters. Each of the parameters is an integer. 
    
   All of the compressed headers produced by EPIC are guaranteed to be  
   an integer multiple of ALIGN bits long. Moreover, the first word of 
   each compressed header (where one word is ALIGN bits) uses only 
   NPATTERNS of the 2^ALIGN different bit patterns available. 
    

Price et al.                                                  [PAGE 9] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   For example, suppose that ALIGN = 8 (the octet-aligned case) and 
   NPATTERNS = 224. In the first octet of every compressed header only 
   224 of the available 256 bit patterns are used by EPIC, namely the 
   bit patterns 00000000, 00000001 through to 11011111 inclusive. The 
   bit patterns which are not used are 11100000 through to 11111111. 
    
   In fact, it is important for EPIC not to use the bit patterns 
   111XXXXX in the first octet of each compressed header because they 
   are reserved by the ROHC framework. The correct parameters for EPIC 
   to produce [ROHC5]-compatible compressed header formats are ALIGN = 8 
   and NPATTERNS = 224. 
    
   Some useful values for ALIGN and NPATTERNS are listed below: 
    
   Compressed Headers           ALIGN   NPATTERNS 
    
   Bit-aligned                  1       2 
   Octet-aligned                8       256 
   Octet-aligned with           8       224 
     111XXXXX reserved 
    
5.1.3.  Available memory 
    
   The MAXFORMATS and PTHRESH parameters control the number of 
   compressed header formats to be stored at the compressor and 
   decompressor. 
    
   The MAXFORMATS parameter determines the maximum number of compressed 
   header formats stored by EPIC. If more compressed header formats are 
   generated than can be stored then EPIC discards all but the 
   MAXFORMATS most likely formats to be used. Any packet requiring a 
   compressed header format that has been discarded is communicated 
   using a DYNAMIC packet instead. 
    
   Similarly, the PTHRESH parameter specifies a probability threshold 
   for the compressed header formats. If any format is used with less 
   that PTHRESH probability then it is discarded to conserve memory at 
   the compressor and decompressor. A DYNAMIC packet can be used to 
   communicate the information instead. 
    
   The MAXFORMATS and PTHRESH parameters affect the EPIC compressed 
   header formats, and so for interoperability they MUST be specified as 
   part of the EPIC inputset. 
    
   Note that one compressed header format requires approximately 2 
   octets of storage space at the compressor and decompressor. Since the 
   header formats are only calculated once, it is possible to store them 
   in read only memory. 
    
   Note also that EPIC requires additional working memory to compress 
   and decompress the headers, to store the lookup tables and so on. 
   However, EPIC places no extra working memory requirements on the 
   compressor and decompressor compared to [ROHC5]. 
    

Price et al.                                                 [PAGE 10] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
5.2.  Field-specific parameters 
    
   EPIC contains a toolbox of eight basic compression techniques (LSB 
   compression, lookup table compression etc.). The field-specific 
   parameters indicate which encoding methods from the toolbox should be 
   applied to which fields in the protocol stack. 
    
   The basic plan is to build up more complex encoding methods from the 
   set of eight techniques in the encoding toolbox (Chapter 6). This 
   allows us to construct an encoding method sufficiently powerful to 
   handle the entire protocol stack. 
    
   A new encoding method is described by means of a table as shown below 
   (N.B. the normative version of IPv4 encoding is found in Appendix B): 
    
   IPv4-ENCODING: TTL{"",""} 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |        Version         |           STATIC           |     100%    | 
   |     Header Length      |                            |             | 
   |     Reserved Flag      |                            |             | 
   |   May Fragment Flag    |                            |             | 
   |   Last Fragment Flag   |                            |             | 
   |    Fragment Offset     |                            |             | 
   |     Source Address     |                            |             | 
   |  Destination Address   |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |     Packet Length      |          INFERRED          |     100%    | 
   |    Header Checksum     |                            |             | 
   |        Protocol        |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |    Type of Service     |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(8)        |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |     Identification     |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(16)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |      Time to Live      |           STATIC           |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |         READ(TTL,2)        |     0.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |       WRITE(8,TTL,2)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
    
   IPv4-ENCODING is the name of the new encoding method. TTL{"",""} is a 
   lookup table containing two entries, which is used to store the two 
   most recent values of the Time to Live field. This lookup table is 
   useful because the TTL field sometimes alternates between two 
   different values (this is possible, for example, in the case of load 
   sharing over paths of unequal hop counts). 

Price et al.                                                 [PAGE 11] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   The Field Name(s) column contains a list of field names from the 
   protocol stack to be compressed (in this case an IPv4 header). 
    
   Notice that there is no length indication for any of the fields. This 
   is because the length of the field can be inferred from the field 
   name. EPIC can also cope with variable-length fields (as described in 
   Section 6.2.1). 
    
   For each field (or set of fields) the table includes a list of 
   possible encoding methods. The encoding methods can be chosen from 
   the set of eight techniques in the encoding toolbox of Chapter 6 (in 
   the example the STATIC, IRREGULAR, LSB, READ, WRITE and INFERRED 
   encoding methods are all from the toolbox). Alternatively, the 
   encoding methods can be defined by their own tables as shown below: 
    
   RTP/UDP/IPv4-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |          CRC           |        IRREGULAR(3)        |     100%    | 
   +------------------------+----------------------------+-------------+ 
   | Master Sequence Number |         LSB(4,-1)          |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |      IPv4 Header       |       IPv4-ENCODING        |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |       UDP Header       |        UDP-ENCODING        |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |       RTP Header       |        RTP-ENCODING        |     100%    | 
   +------------------------+----------------------------+-------------+ 
    
   The RTP/UDP/IPv4 encoding method uses the IPv4 encoding method 
   defined previously. In this manner it is possible to build some 
   powerful and flexible encoding methods. 
    
   The final column lists the probability that each encoding method will 
   be used to encode the field in question. This is a very important 
   column since EPIC ensures that common encoding methods require fewer 
   bits to carry the compressed data than rarely used encoding methods. 
    
   Note that the order in which fields are given and the order of the 
   encoding types for a particular field is significant in the following 
   sense: 
   - changing the order does not impact on the compression/decompression 
   performance of the profile; however 
   - changing the order does modify the format and content of the 
   compressed headers themselves. 
    
   Therefore, for interoperability, both the compressor and decompressor 
   MUST have a common set of input tables including all ordering 
   considerations. 
    
    
    

Price et al.                                                 [PAGE 12] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
6.  EPIC Encoding Toolbox 
    
   ROHC can encode header fields using a number of different encoding 
   techniques (LSB encoding, scaled timestamp encoding, list-based 
   compression etc.). The EPIC encoding toolbox supports all of the 
   techniques currently required by [ROHC5]. Different encoding methods 
   can be assigned to different fields using a simple table of inputs as 
   described in Section 5.2. 
    
6.1.  Toolbox members 
    
   The EPIC encoding toolbox contains the following techniques: 
    
6.1.1.  STATIC 
    
   The STATIC encoding method can be used when the header field does not 
   change relative to the context. If a field is STATIC then no 
   information concerning the field need be transmitted in the 
   compressed header. 
    
   The STATIC command has no parameters. 
    
6.1.2.  IRREGULAR(k) 
    
   The IRREGULAR(k) encoding method is used when the field cannot be 
   compressed relative to the context, and hence must be transmitted in 
   full. 
    
   The parameter k is the (uncompressed) field length in bits. 
    
   Note that it is possible to use the IRREGULAR encoding method to 
   compress variable-length fields. This is accomplished by specifying a 
   separate IRREGULAR(k) encoding method for every common length of the 
   variable-length field. Further information can be found in Section 
   6.2.1. 
    
6.1.3.  VALUE(v) 
    
   The VALUE(v) encoding method has a single parameter v, which is a 
   field value represented as a bit pattern. The VALUE(v) encoding can 
   be used to transmit any field which takes value v. 
    
   Note that VALUE(v) is just a simplified form of lookup table 
   compression. It is useful for compressing fields which take a well-
   known value with high probability. 
    
6.1.4.  READ(LTABLE,i) 
    
   The READ(LTABLE,i) encoding method retrieves the field value from the 
   lookup table LTABLE. Recall that EPIC maintains a number of distinct 
   lookup tables for storing and retrieving common field values. If the 
   value of a field is stored in the lookup table LTABLE then it can be 
   efficiently compressed by sending its position in the lookup table 
   only. 

Price et al.                                                 [PAGE 13] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   The parameter LTABLE is the name of the lookup table. The parameter i 
   is the number of items which the lookup table can store. 
    
   Common field values can be stored in the lookup table in two ways. 
   Firstly they might be stored in the table at initialization of the 
   compressor/decompressor. Secondly they can be updated by using the 
   WRITE encoding method. 
    
   If the value of the field is not present in any lookup table then the 
   READ encoding method cannot be used to compress the field. 
    
6.1.5.  WRITE(k,LTABLE,i) 
    
   The WRITE encoding method sends the field in full and also stores it 
   in the lookup table LTABLE. Subsequently, whenever the field value 
   turns up it can be retrieved from the lookup table using the 
   READ(LTABLE,i) encoding method (provided of course that the WRITE 
   method has been used sufficient times to ensure robustness). 
    
   The WRITE command has the following 3 parameters: 
    
   k:      Uncompressed field length in bits 
   LTABLE: Name of lookup table 
   i:      Number of values which the lookup table can store 
    
6.1.6.  LSB(k,p) 
    
   The LSB(k,p) method compresses the field using LSB encoding. 
    
   LSB encoding is described in Section 4.5.1 of [ROHC5]. The meaning of 
   the two parameters k and p is described in [ROHC5], but basically the 
   parameters have the following functions: 
    
   k:   Number of LSBs to transmit in the compressed header 
   p:   Offset of interpretation interval 
    
6.1.7.  UNCOMPRESSED 
    
   The UNCOMPRESSED encoding method is used to send the field value in 
   full at the end of the compressed header. 
    
   Note that all UNCOMPRESSED fields are sent concatenated together at 
   the end of the compressed header. This means that the decompressor 
   needs to know the length of each UNCOMPRESSED field in order to 
   reconstruct the uncompressed header. The way in which the field 
   length is inferred is specific to each UNCOMPRESSED field, and is 
   described separately from the table of the EPIC inputset. 
    
6.1.8.  INFERRED 
    
   The INFERRED method instructs EPIC to infer the value of the field 
   from other field values and/or the context. 
    


Price et al.                                                 [PAGE 14] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   The INFERRED method has no parameters. The precise method by which 
   INFERRED fields are reconstructed is specific to each field, and is 
   described separately from the table of the EPIC inputset. 
    
6.2.  Hints on applying the toolbox 
    
   This section gives some hints on how the EPIC toolbox can be applied 
   to compress different types of field. 
    
6.2.1.  Compressing variable-length fields 
    
   EPIC can compress variable-length fields efficiently by allocating a 
   separate encoding method for each possible length of the field. This 
   takes into account the probability distribution of the field length, 
   so more common field sizes can be compressed with better efficiency. 
    
   This technique can be used to compress the GRE Key field as follows: 
    
   GRE-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |          ...           |            ...             |     ...     | 
   +------------------------+----------------------------+-------------+ 
   |    GRE Key Present     |          INFERRED          |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |        GRE Key         |           STATIC           |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |     0.5%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(32)       |     0.5%    | 
   +------------------------+----------------------------+-------------+ 
   |          ...           |            ...             |     ...     | 
   +------------------------+----------------------------+-------------+ 
    
   The GRE Key field has two possible lengths (0 octets or 4 octets). If 
   the key is not static then it can be sent in full with either of 
   these two lengths. The GRE Key Present field is inferred from the 
   size of the GRE Key. 
    
   Note that if the field has many different lengths, for simplicity 
   only the most common field lengths need be sent using the 
   IRREGULAR(k) encoding. Fields with less common lengths can be sent 
   UNCOMPRESSED. 
    
6.2.2.  Compressing extension headers 
    
   Protocols such as IP have a number of optional extensions that 
   follow the basic header. In [ROHC5] these extensions are handled by 
   defining a new list-based compression type. In EPIC however, the 
   extension headers can be handled using the basic toolbox encoding 
   methods with no special enhancements. 
    

Price et al.                                                 [PAGE 15] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   An example encoding method for the AH header is given below: 
    
   RTP/UDP/IPv4-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |          ...           |            ...             |     ...     | 
   +------------------------+----------------------------+-------------+ 
   | Authentication Header  |        AH-ENCODING         |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
   |          ...           |            ...             |     ...     | 
   +------------------------+----------------------------+-------------+ 
    
   N.B. The term "Authentication Header" in the Field Name(s) column is 
   just shorthand for the set of fields in the authentication header. 
    
   AH-ENCODING: COMMONLENGTHS{"00","03","04","05","06"} 
                AH-SPI{"","","","",""} 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |     AH Next Header     |          INFERRED          |     100%    | 
   |      AH Reserved       |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |       AH Length        |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |   READ(COMMONLENGTHS,5)    |    0.09%    | 
   |                        +----------------------------+-------------+ 
   |                        |  WRITE(8,COMMONLENGTHS,5)  |    0.01%    | 
   +------------------------+----------------------------+-------------+ 
   |         AH SPI         |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |       READ(AH-SPI,5)       |   0.099%    | 
   |                        +----------------------------+-------------+ 
   |                        |     WRITE(32,AH-SPI,5)     |   0.001%    | 
   +------------------------+----------------------------+-------------+ 
   |   AH Sequence Number   |          INFERRED          |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |         LSB(8,-1)          |     0.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(32)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   | AH Authentication Data |        UNCOMPRESSED        |     100%    | 
   +------------------------+----------------------------+-------------+ 
    
   The AH header is encoded in the same manner as any other header in 
   the protocol stack. The only difference is that the entire AH header 
   may not be present in the stack, so an additional IRREGULAR(0) 
   encoding is required for the whole AH header. This encoding is used 


Price et al.                                                 [PAGE 16] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   to delete every field in the AH header when it is no longer present 
   in the uncompressed packet. 
    
   Note that IP extension headers do not necessarily occur in the same 
   order for every packet stream. This is not a problem for EPIC 
   because each field in the protocol stack is transmitted separately 
   (and not necessarily in order). The individual fields are 
   reconstructed at the decompressor and then placed in the 
   uncompressed header at their correct locations. 
    
   If the order of any fields changes then EPIC transmits the fields 
   uncompressed. When this information has been sent sufficient times 
   to ensure robustness, the decompressor knows the new ordering of the 
   fields and can subsequently reconstruct the uncompressed headers 
   using the new field ordering as appropriate. 
    
    
7.  Control fields 
    
   In general the fields specified in the EPIC input tables are taken 
   directly from the protocol stack to be compressed. However there are 
   a small number of special fields that can be transmitted in the 
   compressed headers, but which are not present in the uncompressed 
   packets. These fields are known as control fields because they 
   transmit control information from the compressor to the decompressor. 
    
   The most obvious example of a control field is the header compression 
   CRC. This CRC is calculated over the uncompressed header, and is used 
   to detect incorrect decompression at the decompressor. 
    
   EPIC treats the control fields in the same manner as any normal 
   field. The only difference is that at the compressor the control 
   field values are calculated rather than obtained directly from the 
   uncompressed packet. Similarly the control field values are used by 
   the decompressor itself (e.g. to check the header for errors) and are 
   not forwarded in the decompressed packet. 
    
   EPIC supports the following control fields: 
    
7.1.  CRC 
    
   The CRC field holds a CRC checksum calculated across the entire 
   uncompressed header, as described in Section 5.9.2 of [ROHC5]. At the 
   decompressor the CRC field is used to validate that correct 
   decompression has occurred. 
    
   Note that it is possible for different header formats to have 
   different amounts of CRC protection, so extra CRC bits can be 
   allocated to protect important context-updating information. This is 
   accomplished by creating a new encoding method for the fields to be 
   given extra CRC protection as shown in the example below: 
    
    
    

Price et al.                                                 [PAGE 17] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   EXAMPLE-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |    Important Field     |         NORMAL-CRC         |      50%    | 
   |          CRC           +----------------------------+-------------+ 
   |                        |        IRREGULAR(8)        |      50%    | 
   +------------------------+----------------------------+-------------+ 
    
   NORMAL-CRC: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |    Important Field     |           STATIC           |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |          CRC           |        IRREGULAR(3)        |     100%    | 
   +------------------------+----------------------------+-------------+ 
    
   Observe that the compressed header has 3 bits of CRC protection if 
   the important field is STATIC. However, if the field is IRREGULAR(8) 
   then the compressed header has 8 bits of CRC protection. This is 
   useful since the IRREGULAR behavior is context-updating. 
    
7.2.  Master Sequence Number 
    
   The RTP/UDP/IP protocol stack contains many different sequence 
   numbers (the RTP sequence number, the ESP sequence number, the AH 
   sequence number and so on). One sequence number must be transmitted 
   to protect against bursts of consecutive lost packets, but it is then 
   possible to infer all other sequence numbers by linear extrapolation 
   (i.e. by adding or subtracting a constant integer). 
    
   Unfortunately, no sequence number field is guaranteed to be present 
   in every IP protocol stack. The RTP sequence number is not visible if 
   the IP payload is encrypted, whilst the ESP and AH sequence numbers 
   are only included if the optional ESP and AH headers are present. To 
   overcome this problem EPIC defines a control field called the Master 
   Sequence Number field. This field is present in every compressed 
   header and can be used to infer all other sequence numbers. 
    
   The size of the Master Sequence Number field is chosen for the best 
   trade-off between robustness and efficiency. A Master Sequence Number 
   of k bits prevent context failure for up to 2^k - 1 consecutive lost 
   packets. 
    
7.3.  Miscellaneous Information 
    
   The Miscellaneous Information fields transmit any extra information 
   that EPIC needs to reconstruct the UNCOMPRESSED and INFERRED fields, 
   but which is not already present in the compressed header. 
    


Price et al.                                                 [PAGE 18] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   Some examples of Miscellaneous Information fields used in RTP/UDP/IP 
   compression are given below: 
    
   TS: The Least Significant Bits (LSBs) of the RTP Timestamp, scaled 
   if Tsc = 1. 
    
   TS STRIDE: The predicted increment/decrement of the RTP Timestamp 
   field when it changes. The use for this field is explained in 
   Section 4.5.4 of [ROHC5]. 
    
   TIME STRIDE: The predicted time interval (in milliseconds) between 
   changes in the RTP Timestamp. The use for this field is explained in 
   Section 4.5.4 of [ROHC5]. 
    
   Tsc: Indicates whether the timestamp is scaled or not.  
    
   All of these fields are needed to infer the RTP timestamp. Note that 
   Miscellaneous Information fields are control fields because they 
   discarded once they have been used to infer the relevant 
   UNCOMPRESSED or INFERRED fields. 
    
    
8.  Extensibility 
    
   This chapter considers the extensibility of the EPIC scheme, 
   including possible future enhancements to EPIC. 
    
8.1.  Other protocol stacks 
    
   Especially for multimedia over IP, additional protocols will become 
   important in future, e.g. tunneling protocols for virtual private 
   networks and unreliable STCP for multimedia. They will generate 
   further protocol overhead for transmission and new requirements 
   towards efficient header compression. 
    
   EPIC requires minimal effort to generate a new ROHC profile. The 
   input to EPIC is the behavior of the new protocol stack (i.e. which 
   fields require which encoding techniques). EPIC then produces an 
   optimally efficient set of compressed header formats for use at the 
   compressor and decompressor. 
    
8.2.  New toolbox methods 
    
   The eight encoding techniques currently offered by the EPIC toolbox 
   are sufficient to compress the entire RTP/UDP/IP protocol stack 
   including the CSRC list and IP extension headers. However, if EPIC 
   is used to compress other protocol stacks then it may become 
   necessary to add new encoding methods to the toolbox. 
    
   In general, a new toolbox encoding method is just a mapping between 
   a set of uncompressed field values and their compressed equivalents. 
   The only requirement is that the mapping is 1-1 (in other words that 
   no two field values map onto the same compressed value). It is 
   acceptable for the mapping to change depending on the context, 

Price et al.                                                 [PAGE 19] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   although care must be taken in this case to ensure that it is 
   robust. For example, a DELTA encoding method that stores the 
   increase in the field value relative to the context is very 
   efficient for compressing incrementing fields. However it is not 
   robust to lost packets (since it fails if the decompressor context 
   is incorrect). 
    
8.3.  New control fields 
    
   It may also be necessary to extend EPIC by adding extra control 
   fields. Recall that control fields communicate information between 
   the compressor and decompressor that is not forwarded in the 
   uncompressed header. Since control fields are treated in exactly the 
   same manner as normal header fields (except that they are not 
   present in the uncompressed header), the only change required is a 
   description of how the control field is calculated at the compressor 
   and how it is used at the decompressor. 
    
8.4.  Learning version of EPIC 
    
   As stated in the introduction, the EPIC compressed header formats 
   are optimal for the chosen protocol stack. This means that it is 
   provably impossible to compress any protocol stack for which EPIC is 
   programmed in a more efficient manner than EPIC (for the chosen 
   header alignment and level of robustness). 
    
   An interesting question, however, is the effectiveness of EPIC when 
   compressing a protocol stack for which it is not programmed. If the 
   correct encoding methods have been assigned to each field but the 
   probabilities that they will be used are slightly inaccurate then 
   the efficiency lost is negligible. 
    
   But suppose that the protocol stack behaves in a significantly 
   different manner than expected by ROHC: for example if an RTP stream 
   uses padding or even RTP extensions. Static compressors with fixed 
   compressed header formats suffer a permanent drop in performance if 
   the protocol stack behaves in an unusual manner. However, since EPIC 
   can dynamically generate new packet formats as they are needed, it 
   is possible for an EPIC-generated ROHC profile to adapt its own 
   header formats to the incoming packet stream. 
    
   A basic version of "Learning" EPIC is straightforward to implement. 
   The encoding methods assigned to each field are not changed; instead 
   the probabilities that each encoding method will be used are refined 
   by counting the number of times they have been used by recent 
   packets. In this manner, if a particular compressed header format is 
   used more often than expected then the Huffman tree will be refined 
   to encode the format using fewer bits. 
    
   Fast algorithms exist for updating a Huffman tree when only a small 
   number of inputs have changed. Care must be taken to ensure that the 
   Huffman trees at the compressor and decompressor are updated in 
   step; this can be accomplished by updating the tree periodically at 


Price et al.                                                 [PAGE 20] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   the compressor and sending the refined version to the decompressor 
   within a special "profiling" message. 
    
    
9.  EPIC Processing 
    
   This chapter presents an outline description of the EPIC processing 
   without any algorithmic or computational notation. The detailed 
   pseudocode description of EPIC is presented in Appendix A. 
    
   There are four main aspects to the pseudocode: 
    
   Structure of the Huffman tree 
   Compressor operation 
   Decompressor operation 
   Tree construction 
    
9.1.  Structure of the Huffman Tree 
    
   Recall that EPIC uses a Hierarchical Huffman (HH) tree to succinctly 
   store a set of compressed header formats. A diagram of an HH tree can 
   be found in Figure 3. 
    
   The HH tree consists of a set of nodes joined by branches. 
   Conventionally the nodes are arranged vertically, with leaf nodes at 
   the top and the root node at the bottom. 
    
   There is a single root node, which is the starting point for 
   decompression operations. There is a set of leaf nodes, which are 
   used as the starting points for compression operations. Each 
   compressed header format is associated with a specific leaf node and 
   vice versa. All other nodes are interior nodes. 
    
   The information needed for compression and decompression is contained 
   in the tree topology (i.e. the connectivity of nodes and branches) 
   and the labels on each branch. Information is also associated with 
   each node whilst the tree is being constructed, but this can be 
   discarded once the tree has been built. 
    
   Note that this "tree" is multiply connected, i.e. there are several 
   alternative paths between any given leaf node and the root. The 
   correct route to follow is determined by the compression and 
   decompression algorithms. 
    
   An example of a bit-aligned HH tree is given in Figure 3. The example 
   considers a very simple protocol stack with two fields labeled A and 
   B. Four compressed header formats are available to compress this 
   protocol stack. Each header format is placed in a leaf node of the 
   Huffman tree, together with the probability "P" that the format will 
   be used to compress a header and the number "N" of distinct headers 
   that can be compressed using the format. 
    
   The procedure for generating the tree itself is described in Appendix 
   A.4. 

Price et al.                                                 [PAGE 21] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
    
        Leaf 1            Leaf 2            Leaf 3             Leaf 4 
      A=STATIC,          A=STATIC,      A=IRREGULAR(1)        A=STATIC, 
    B=IRREGULAR(2)   B=READ(LTABLE,3)      B=STATIC           B=STATIC 
    
         ---               ---               ---                 --- 
        | 3%|             | 8%|             | 5%|               |54%| 
        | 4 |             | 3 |             | 2 |               | 1 | 
         ---               ---               ---                 --- 
          | X             / D \               | X               / 0 
          |              /     \              |                / 
          |             /       \             |               / 
         ---         ---         ---         ---             / 
        | 6%|       | 8%|       | 8%|       |10%|           / 
        | 2 |       | 2 |       | 1 |       | 1 |          / 
         ---         ---         ---         ---          / 
          | X         | X         1 \       / 0          / 
          |           |              \     /            / 
          |           |               \   /            / 
         ---         ---               ---            / 
        |12%|       |16%|             |18%|          / 
        | 1 |       | 1 |             | 1 |         / 
         ---         ---               ---         / 
          1 \       / 0               / 0         / 
             \     /                 /           / 
              \   /                 /           / 
               ---                 /           / 
              |28%|               /           / 
              | 1 |              /           / 
               ---              /           / 
                1 \            /           / 
                   \          /           / 
                    \        /           / 
                     \      /           / 
                      \    /           / 
                        ---           / 
                       |46%|         / 
                       | 1 |        / 
                        ---        / 
                         1 \      / 
                            \    / 
                             \  / 
                             ---- 
                            |100%| 
                            | 1  | 
                             ---- 
    
           Figure 3 : An example of a Hierarchical Huffman tree 
    
    
9.2.  Compressor operation 
    
   This section describes the steps of operation within the EPIC header 
   compressor. This will be done in 3 steps: the selection of the 

Price et al.                                                 [PAGE 22] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   relevant HH tree and leaf node, the compression itself, and the 
   addition of the uncompressed fields. 
 
9.2.1.  Step 1: Compression front end 
    
   The input to the EPIC compressor is a header from the protocol stack 
   to be compressed. The compressor maintains a number of contexts as 
   described in Section 5.1.3 of [ROHC5], and the correct context to use 
   should be selected depending on the uncompressed header. 
    
   The uncompressed header should then be divided up into fields. As per 
   [ROHC5] Section 5.7, the term hdr(X) will refer to the value of Field 
   X in the uncompressed header. 
    
   Next the compressor chooses an encoding method from the EPIC toolbox 
   for each field in the protocol stack (for example the IP-ID might be 
   encoded using the "IRREGULAR" function etc.) from the set of 
   possibilities defined by the inputset. The choice of encoding methods 
   can then be mapped onto a compressed header format. There is one 
   compressed header format for every possible way the fields can be 
   encoded (unless the number of formats are limited due to memory 
   constraints, in which case a DYNAMIC packet can be sent instead). 
    
   In general the only requirement is that the format must be capable of 
   transmitting enough information to reconstruct the header at the 
   decompressor. If more than one suitable compressed header format is 
   available then the compressor may choose any of these. 
    
9.2.2.  Step 2: Compressing the uncompressed header 
    
   The compression process works down the Huffman tree, making decisions 
   at each node based on the choice of encoding method for each field. 
    
   The process finishes when there are no more branches to follow. The 
   result is a string of integers (known as codewords) that each contain 
   ALIGN bits of the compressed header. 
    
9.2.3.  Step 3: Adding the UNCOMPRESSED fields 
    
   The third step is to add the uncompressed fields to the end of the 
   compressed header. The final task is to encapsulate the EPIC 
   compressed header within a standard ROHC packet as illustrated in 
   Figure 1. The packet with compressed header is then ready to be 
   transmitted. 
    
9.3.  Decompressor operation 
    
   This section describes the steps of operation within the EPIC header 
   decompressor. 
    
9.3.1.  Step 1: Decompressing the compressed header 
    
   Each codeword is extracted from the received compressed header. 
   Starting at the root of the Huffman tree, branches are followed up 

Price et al.                                                 [PAGE 23] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   the tree and decisions are taken at each node based on the current 
   codeword. When the top of the tree is reached, the leaf node 
   indicates which compressed header format has been used. This in turn 
   gives the encoding method that has been used for every field in the 
   header. 
    
   These values are then used to find the format of the original field, 
   with which reconstruction of the original field value can occur. This 
   process continues until all codewords have been dealt with. 
    
9.3.2.  Step 2: Reconstructing the INFERRED fields 
    
   The final step at the decompressor is to reconstruct the UNCOMPRESSED 
   and INFERRED fields from the transmitted fields and/or the context. 
   In general the reconstruction process is specific to the individual 
   fields. 
    
   The normative pseudocode description of EPIC is given in Appendix A. 
    
    
10. Security Considerations 
    
   EPIC generates compressed header formats for direct use in ROHC 
   profiles. Consequently the security considerations for EPIC match 
   those of [ROHC5]. 
    
    
11. Acknowledgements 
    
   Header compression schemes from [ROHC5] have been important sources 
   of ideas and knowledge. Basic Huffman encoding [HUFF] was enhanced 
   for the specific tasks of robust, efficient header compression. 
    
      Thanks to 
    
         Carsten Bormann (cabo@tzi.org) 
         Max Riegel (maximilian.riegel@icn.siemens.de) 
    
      for valuable input and review. 
    
    
12. Intellectual Property Considerations 
    
   This proposal in is full conformity with [RFC-2026]. 
    
   Siemens may have patent rights on technology described in this 
   document which employees of Siemens contribute for use in IETF 
   standards discussions. In relation to any IETF standard incorporating 
   any such technology, Siemens hereby agrees to license on fair, 
   reasonable and non-discriminatory terms,  based on reciprocity, any 
   patent claims it owns covering such technology, to the extent such 
   technology is essential to comply with such standard. 
    
    

Price et al.                                                 [PAGE 24] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
13. References 
    
   [ROHC5]       "RObust Header Compression (ROHC)", Carsten Bormann et 
                 al, <draft-ietf-rohc-rtp-05.txt>, Internet Engineering 
                 Task Force, October 23, 2000 
    
   [HUFF]        "The Data Compression Book", Mark Nelson and Jean-Loup 
                 Gailly, M&T Books, 1995 
    
   [IMPL]        http://www.roke.co.uk/networks/epic/epic.html 
    
   [RFC-2026]    "The Internet Standards Process - Revision 3", Scott 
                 Bradner, Internet Engineering Task Force, October 1996 
    
   [RFC-2119]    "Key words for use in RFCs to Indicate Requirement 
                 Levels", Scott Bradner, Internet Engineering Task 
                 Force, March 1997 
    
    
14.  Authors' Addresses 
    
   Richard Price        Tel: +44 1794 833681 
   Email:               richard.price@roke.co.uk 
    
   Robert Hancock       Tel: +44 1794 833601 
   Email:               robert.hancock@roke.co.uk 
    
   Stephen McCann       Tel: +44 1794 833341 
   Email:               stephen.mccann@roke.co.uk 
    
   Mark A West          Tel: +44 1794 833311 
   Email:               mark.a.west@roke.co.uk 
    
   Abigail Surtees      Tel: +44 1794 833131 
   Email:               abigail.surtees@roke.co.uk 
    
   Roke Manor Research Ltd 
   Romsey, Hants, SO51 0ZN 
   United Kingdom 
    
    
   Christian Schmidt    Tel: +49 89 722 27822 
   Email:               christian.schmidt@icn.siemens.de 
    
   Siemens ICM N MR MA1  
   Munich 
   Germany 








Price et al.                                                 [PAGE 25] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
Appendix A.  EPIC Description in Pseudocode 
    
   Chapter 9 presented an outline description of the operation of the 
   EPIC scheme. This appendix goes on to describe the complete EPIC 
   scheme in pseudocode. 
    
   As stated in Section 9, there are four main aspects to EPIC: 
    
   Description of the structure of the Hierarchical Huffman (HH) tree 
   Compressor operation 
   Decompressor operation 
   Tree construction 
    
   The final section of the appendix describes certain implementation 
   issues. 
    
A.1.  Structure of the Hierarchical Huffman tree 
    
   As mentioned in Section 9, the HH tree consists of a set of nodes 
   joined by branches. Conventionally, the nodes are arranged vertically 
   with leaf nodes at the top and the root node at the bottom. 
    
   There is a single root node, which is the starting point for 
   decompression operations. There is a set of leaf nodes, which are 
   used as the starting points for compression operations. Each possible 
   compressed header format is associated with a specific leaf node. All 
   other nodes are interior nodes. 
    
   The format indicator for a leaf node is in fact just a list of 
   encoding methods, one for each field in the header. In what follows, 
   we notate the encoding for a single field as "F=E", for a field 
   called "F" using encoding method "E". The complete format indicator 
   will then look like {F1=E1, F2=E2, ...}. Although we start off in the 
   inputset with encoding methods expressed recursively, by the time the 
   tree has been constructed all the Ei are actually toolbox encoding 
   methods. 
    
   No special information is associated with any of the nodes. 
   (Information is associated with each node while the tree is being 
   constructed, but this can all be discarded once the construction is 
   complete.) All the information is contained in the tree topology (the 
   connectivity of nodes and branches) and labels on each branch. 
    
   There are three possible cases for branches between nodes at 
   different levels: 
    
   i)   A single vertical branch between a pair of nodes. Such a branch 
        is labeled {X}. 
   ii)  A pair of branches joining one upper node to two lower ones. 
        These branches are labeled {D t} for some integer t (different 
        for each branch). 
   iii) Several branches each joining one upper node to a common lower 
        node. Each branch is labeled with {t} for some integer t 
        (different for each branch). 

Price et al.                                                 [PAGE 26] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   Note that this "tree" can actually be multiply connected, i.e. there 
   may be several alternative paths between any given leaf node and the 
   root. The correct route to follow is determined by the compression 
   and decompression algorithms. 
    
   As well as the tree itself, the following input parameters are needed 
   to begin the compression and decompression process: 
    
   ALIGN:       Number of bits for alignment (i.e. size of one 
                codeword in the compressed header). 
   NPATTERNS:   Number of bit patterns available for use by EPIC in the 
                first codeword of the compressed header. 
    
A.2.  Compressor 
    
   This section describes the EPIC header compressor. 
    
A.2.1.  Step 1: Compression front end 
    
   The input to the EPIC compressor is a header from the protocol stack 
   to be compressed. The compressor maintains a number of contexts as 
   described in Section 5.1.3 of [ROHC5], and the correct context to use 
   should be selected depending on the uncompressed header. 
    
   The uncompressed header should then be divided up into fields. As per 
   [ROHC5] Section 5.7, the term hdr(X) will refer to the value of Field 
   X in the uncompressed header. 
    
   The compressor chooses a compressed header format from the possible 
   set (where each compressed header format corresponds to a different 
   leaf node in the HH tree). In general the format must be capable of 
   transmitting enough information to reconstruct the header at the 
   decompressor. If more than one suitable format is available then the 
   compressor may choose any of these. 
    
   The compressor then calculates a non-negative integer M that contains 
   all of the non-redundant information from the uncompressed header. M 
   is calculated using the format indicator for the chosen leaf node. 
   The following steps are taken: 
    
   1    Set M=0 
    
   2    From the left, search the format indicator for the next instance 
        of "READ" or "WRITE" encoding methods. 
    
        while (A new instance of "READ" or "WRITE" can be found) 
        { 
             i)      Let F denote the corresponding field and let i 
                     denote the corresponding parameter (so the format 
                     indicator contains "F=READ(LTABLE,i)" or 
                     "F=WRITE(k,LTABLE,i)" for some k,LTABLE). 
             ii)     Let M=M*i 
             iii)    Let M=M+index where index is an integer from 0 to 
                     i-1. It denotes the position of hdr(F) in the 

Price et al.                                                 [PAGE 27] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
                     lookup table and is fixed for READ but chosen by 
                     the compressor for WRITE. 
        } 
    
   3    From the left, search the format indicator for the next instance 
        of "LSB", "IRREGULAR" or "WRITE". 
    
        while (A new instance of "LSB", "IRREGULAR" or "WRITE" encoding 
        methods can be found) 
        { 
             i)      Let F denote the corresponding field and let k  
                     denote the corresponding parameter (so the format  
                     indicator contains "F=LSB(k,p)" or "F=IRREGULAR(k)  
                     or "F=WRITE(k,LTABLE,i)"). 
             ii)     Let M=M*2^k 
             iii)    Let M=M+value, where value is from 0 to 2^k - 1. 
                     For IRREGULAR and WRITE encoding it is hdr(F) in 
                     integer form. For LSB encoding it is the LSB of 
                     hdr(F) as described in Section 4.5.1 of [ROHC5]. 
        } 
    
    
   The end result is the M value for the header, along with the 
   identified starting leaf node. 
    
A.2.2.  Step 2: Compressing the uncompressed header 
    
   Working from the M value and start node, the encoding process creates 
   a sequence {codeword(i)} of codewords as follows: 
    
   1    Begin at the start node in the tree corresponding to the header 
   format. Set i=1. 
    
   2    Follow the branches of the tree downwards towards the root, 
        taking the following steps at each node: 
    
        a)      If the next branch label is {X} then: 
    
                i)      codeword(i) = M modulo 2^ALIGN, i=i+1 
                ii)     set M = integer part (M / 2^ALIGN) 
    
        b)      If there are 2 branches with labels {D N1}, {D N2} then: 
    
                i)      d = absolute difference between N1 and N2 
                ii)     if M < d then follow branch to left 
                iii)    else set M = M - d and follow right branch 
    
        c)      If the next branch label is a number {L} then: 
    
                i)      codeword(i) = L + M, i=i+1 
                ii)     set M = 0 
    



Price et al.                                                 [PAGE 28] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   3    Finish when there are no more branches to follow, set 
        NCODEWORDS=i (i.e. NCODEWORDS is the total number of codewords 
        in the compressed header). 
    
   4    Reverse the list of codewords, so codeword(1) is the last 
        codeword produced, and codeword(NCODEWORDS) is the first. 
    
   This produces a string of codeword numbers, where each codeword 
   represents ALIGN bits in the EPIC compressed header. For the most 
   likely permutations this string could contain a single codeword. 
    
   It is a property of EPIC that the following hold: 
   For i=1:                  0 <= codeword(i) < NPATTERNS 
   For i=2, ..., NCODEWORDS: 0 <= codeword(i) < 2^ALIGN 
    
   This means that the compressed header can be very simply transmitted 
   with implicit codeword alignment. The values to be placed in the 
   first codeword are limited to be numerically less than NPATTERNS, 
   which means that part of this codeword can be used for other 
   purposes. 
    
A.2.3.  Step 3: Adding the UNCOMPRESSED fields 
    
   The last step is to add the UNCOMPRESSED fields to the end of the 
   compressed header. This is accomplished as follows: 
    
   1    From the left, search the format indicator for the next instance 
        of "UNCOMPRESSED". 
    
        while (A new instance of "UNCOMPRESSED" can be found) 
        { 
             i)      Let F denote the corresponding field. 
             ii)     Add hdr(F) to the compressed header. 
        } 
    
   If the compressed header is no longer a multiple of ALIGN bits then 
   it should be padded by adding "0" bits onto the end. 
    
   The final task is to encapsulate the EPIC compressed header within a 
   standard ROHC packet as described in Section 3.2. The packet with 
   compressed header is then ready to be transmitted. 
    
A.3.  Decompressor 
    
   This section describes the EPIC header decompressor. 
    
A.3.1.  Step 1: Decompressing the compressed header 
    
   The list C=(codeword(1), ... codeword(NCODEWORDS)) can be read 
   directly from the compressed header. Note that the integer NCODEWORDS 
   (i.e. the length of the compressed header excluding UNCOMPRESSED 
   fields) is not yet known, but instead will be determined when the 
   compressed header is parsed. 
    

Price et al.                                                 [PAGE 29] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   The list C and the Huffman tree are used to determine the compressed 
   header format and the M value as follows: 
    
   1)   Set M = 0 and i = 1 
    
   2)   Begin at root of tree (single node) and follow the branches up 
        the tree taking the following steps at each node: 
    
        a)      If next branch label is {X} then: 
    
                i)      set M = M * 2^ALIGN 
                ii)     set M = M + codeword(i) 
                iii)    i=i+1 
                 
        b.)     If next branch label is {D t} then: 
    
                i)      if t = 0 just continue up the tree 
                ii)     else set M = M + t and continue up tree 
    
        c.)     If there is > 1 branch, each with a label {t} then: 
    
                i)     set c = codeword(i), i=i+1 
                ii)    find neighboring pair of branches with labels 
                       {L} and {U} such that L <= c < U (if c >= number 
                       on every branch then take L = number on branch 
                       with greatest number) 
                iii)    set M = c - L 
                iv)     continue up tree on branch labeled {L} 
     
   3)   End up at a leaf node of the tree corresponding to a particular 
        compressed header format and value M 
    
   The value M and the header format can then be provided to the 
   decompression front end of Section A.3.2. The value NCODEWORDS does 
   not need to be transmitted explicitly, since the process terminates 
   of its own accord when a leaf node of the tree is reached. 
    
A.3.2.  Step 2: Decompression front end 
    
   The decompressor begins with the non-negative integer M which is then 
   translated back into a value hdr(X) for each field X using the format 
   indicator. The following steps are taken: 
    
   1    Begin with M 
    
   2    From the right, search the format indicator for the next 
        instance of "LSB", "IRREGULAR" or "WRITE" encoding methods. 
    
        while (A new instance of "LSB", "IRREGULAR" or "WRITE" can be 
        found) 
        { 
             i)      Let F denote the corresponding field and let k 
                     denote the corresponding parameter (so the format 


Price et al.                                                 [PAGE 30] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
                     indicator contains "F=LSB(k,p)" or "F=IRREGULAR(k) 
                     or "F=WRITE(k,LTABLE,i)"). 
             ii)     Let value=M mod 2^k (so value is an integer from 0 
                     to 2^k - 1). For IRREGULAR and WRITE encoding let 
                     hdr(F) be the k bits corresponding to value. For 
                     LSB encoding let hdr(F) be context(F) with the last 
                     k LSBs at offset p replaced by value as described 
                     in Section 4.5.1 of [ROHC]. 
             iii)    M=(M-value)/(2^k). 
        } 
    
   3    From the right, search the format indicator for the next 
        instance of "READ" or "WRITE" encoding methods. 
    
        while (A new instance of "READ" or "WRITE" can be found) 
        { 
             i)      Let F denote the corresponding field and let i 
                     denote the corresponding parameter (so the format 
                     indicator reads "F=READ(LTABLE,i)" or 
                     "F=WRITE(k,LTABLE,i)" for some k,LTABLE). 
             ii)     Let index=M mod i (so index is an integer from 0 to 
                     i - 1). For READ encoding let hdr(F) be the value 
                     stored in LTABLE at the position specified by 
                     index. For WRITE encoding store the value hdr(F) in 
                     LTABLE at the position specified by index. 
             iii)    M=(M-index)/i. 
        } 
         
   4    Reading backwards from the right, search the format indicator 
        for the next field encoded as "STATIC" or "VALUE". 
    
        while (A new instance of "STATIC" or "VALUE" can be found) 
        { 
             i)      Let F denote the corresponding field (so the format  
                     indicator reads "F=STATIC" or "F=VALUE(v)" for some  
                     bit pattern v). 
             ii)     For STATIC encoding let hdr(F)=context(F). For  
                     VALUE encoding let hdr(F)=v. 
        } 
    
   The end result is that hdr(F) has been determined for all fields 
   other than UNCOMPRESSED and INFERRED fields. 
    
A.3.3.  Step 3: Reconstructing the INFERRED fields 
    
   The final step at the decompressor is to reconstruct the UNCOMPRESSED 
   and INFERRED fields from the transmitted fields and/or the context. 
    
   In general the reconstruction process is specific to the individual 
   fields (and is implemented using a separate subroutine for each). For 
   the UNCOMPRESSED fields it is necessary to know the lengths of each 
   field, so that the information can be retrieved from the end of the 
   compressed header. For the INFERRED fields the procedure for 
   calculating the actual value of the field must be known. 

Price et al.                                                 [PAGE 31] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
A.4.  Hierarchical Huffman tree construction 
    
   This section describes how EPIC converts an inputset into a new set 
   of compressed header formats. We first introduce basic concepts and 
   notation that are used in the construction of the HH tree. 
    
   An FNP table is a 3-column table which represents information about 
   the behavior of a particular data item (e.g. a single field, group of 
   fields, or a complete protocol stack). Each row of the table is 
   called a format and contains the following: 
    
   A format indicator (F): This ties the row to a particular format of 
   the data item; 
   A number of codepoints (N): This is the number of codepoints that is 
   needed to represent the compressed form of the data item when it has 
   this format (N measures the amount of non-redundant information in 
   the data item); 
   A scaled probability (P): This represents the probability of the data 
   item having this format, divided by the N. 
    
   An FNP table has associated with it a total number of codepoints, 
   CODEPOINTS(table), which is just the sum of the N values for all the 
   rows in the table. The set of data formats given in the table MUST be 
   "exhaustive", which means that the sum of N*P over the table MUST be 
   unity. This ensures that the table contains every possible format for 
   the data item. The order of the rows in the table is significant. 
    
   There are two types of FNP tables: fundamental tables and constructed 
   tables. 
    
   Fundamental FNP tables correspond to fields described by a single 
   item in the EPIC toolbox. The FNP tables for all of these types are 
   listed in the following table. Each FNP table has a single row, so 
   N*P=1 directly (and hence the FNP table is exhaustive). 
    
   +---------------------------+---------+-----------+ 
   |  F                        |    N    |     P     | 
   +---------------------------+---------+-----------+ 
   | "Field1=STATIC"           |    1    |     1     | 
   +---------------------------+---------+-----------+ 
   | "Field1=IRREGULAR(k)"     |   2^k   |   1/2^k   | 
   +---------------------------+---------+-----------+ 
   | "Field1=VALUE(v)"         |    1    |     1     | 
   +---------------------------+---------+-----------+ 
   | "Field1=READ(LTABLE,i)"   |    i    |    1/i    | 
   +---------------------------+---------+-----------+ 
   | "Field1=WRITE(k,LTABLE,i)"|  i*2^k  | 1/(i*2^k) | 
   +---------------------------+---------+-----------+ 
   | "Field1=LSB(k,p)"         |   2^k   |   1/2^k   | 
   +---------------------------+---------+-----------+ 
   | "Field1=UNCOMPRESSED"     |    1    |     1     | 
   +---------------------------+---------+-----------+ 
   | "Field1=INFERRED"         |    1    |     1     | 
   +---------------------------+---------+-----------+ 

Price et al.                                                 [PAGE 32] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   Note that the name "Field1" should be replaced by the correct name 
   for the field (e.g. "RTP Timestamp", "ESP Sequence Number" etc.). 
    
A.4.1.  Step 1: Process the EPIC inputset 
    
   The input to the packet format generator is a list of general 
   parameters and field-specific parameters, referred to as the inputset 
   (and described in Chapter 5). 
    
   The parameters used in the packet format generator stage are as 
   follows. For the general parameters: 
    
   ALIGN:       Number of bits for alignment (i.e. size of one 
                codeword in the compressed header) 
   NPATTERNS:   Number of bit patterns available for use by EPIC in the 
                first codeword of the compressed header 
   MAXFORMATS:  Maximum number of compressed header formats (optional) 
   PTHRESH:     Minimum probability threshold (optional) 
    
   In addition, we need to derive the following quantities: 
    
   AFORMATS:    Actual number of header formats used 
   PMISSING:    The total probability of all formats discarded during 
                tree FNP-table generation. (PMISSING is not actually 
                used during the tree construction, but can be useful 
                in validation checks during the process.) 
    
   The field-specific parameters are the encoding methods for the 
   headers which the profile is intended to compress. The information in 
   these encoding methods is used to make successively more complex FNP-
   tables following either of the SUM or PRODUCT styles described in the 
   next section. The end result is the top-level FNP-table which 
   describes the complete header stack for the profile. This is suitable 
   input for the HH algorithm. 
    
A.4.2.  Step 2: Resolve complex encoding methods 
    
   Constructed FNP tables are used to represent more complex data 
   behavior. Complexity is introduced either by allowing more than one 
   behavior for a single item, or by considering the behavior of 
   multiple data items. These two methods correspond to two styles for 
   constructing FNP tables, which we refer to as SUM-style or PRODUCT-
   style respectively. In each case the construction process implicitly 
   defines the order of the rows in the constructed table, based on the 
   information that describes the construction and the order of the rows 
   in each constituent table. 
    
   SUM-style construction is typically used to describe a variety of 
   possible behaviors of a single data item (e.g. 80% of the time 
   static, remaining 20% alternating between a set of values). SUM-
   style construction takes as input a sequence {TABLE(i), P(i)}, where 
   each TABLE is a known FNP table for the data item, and each P(i) is 
   the unscaled probability that the behavior of the data item being 
   described by TABLE(i) occurs (so the sum over the sequence of P(i) 

Price et al.                                                 [PAGE 33] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   is unity). The new table is constructed using the following 
   mechanism: 
    
   Initialize the constructed table to be empty 
   for each element i of the sequence {TABLE(i), P(i)} 
   { 
      for each row in TABLE(i) 
      { 
         Add a new row to the constructed table with 
               the following entries: 
             F = Format from row of TABLE(i) 
             N = corresponding entry for N from row of TABLE(i) 
             P = P(i)*corresponding entry for P from row of TABLE(i) 
      } 
   } 
    
   The number of codepoints in the constructed table is the sum of the 
   numbers of codepoints from all the constituent tables, and similarly 
   for the number of rows in the table. 
    
   The second type of construction is PRODUCT-style. This type of 
   construction is used to describe a data item composed of more than 
   one field. It takes as an input a simple sequence of {TABLE(i)}, 
   where each TABLE(i) is a known FNP table, and there are NT tables in 
   all. The new table is constructed using the following mechanism: 
    
   Initialize the constructed table to be empty 
   for each row i1 of TABLE(1) 
   { 
      for each row i2 of TABLE(2) 
      { 
         ... 
            for each row iNT of TABLE(NT) 
            { 
               Add a new row to the constructed table with the  
                     following entries: 
                   F = {Format i1 of TABLE(1), Format i2 of 
                        TABLE(2), ... , Format iNT of TABLE(NT)} 
                   N = [N(i1) of TABLE(i1)]* ... *[N(iNT) of TABLE(iNT)] 
                   P = [P(i1) of TABLE(i1)]* ... *[P(iNT) of TABLE(iNT)] 
            } 
         ... 
      } 
   } 
    
   The number of codepoints in the constructed table is the product of 
   the numbers of codepoints from all the constituent tables, and 
   similarly for the number of rows in the table. 
    
   The two types of construction steps can be repeated indefinitely to 
   build up progressively more complex data formats. The basis is always 
   the set of toolbox formats. The end result is known as the top-level 
   FNP table, which describes the whole protocol stack to be compressed. 
    

Price et al.                                                 [PAGE 34] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
A.4.3.  Step 3: Tree pruning 
    
   Rows are now pruned from the top-level FNP table and alignment is 
   enforced using the following procedure. 
    
   1. Sort the rows of the table in order of decreasing N*P. Where there 
      are ties in N*P, preserve the original ordering of the table. 
    
   2. If PTHRESH is given, discard all rows from the table for which N*P 
      < PTHRESH, and add the values of N*P to the variable PMISSING. 
 
   3. If MAXFORMATS is given, discard all rows from the table other than 
      the first MAXFORMATS-1, and add the values of N*P to PMISSING. 
      Then re-order the complete set of formats using the original 
      ordering of encodings (ignoring probability values). 
    
   4. Set AFORMATS = (number of rows remaining in the table) + 1, and 
      NPOINTS = sum of all N values for all rows in the table. 
    
   5. Create a new terminal row number AFORMAT, with  
      P = 0 
      N = (NPATTERNS - NPOINTS) mod (2^ALIGN-1) 
      The format indicator for this row is not used. 
    
   Note that in practice, it is not necessary to exhaustively generate 
   the completed table before pruning starts (i.e. some short cuts are 
   possible). However, for interoperability, care must be taken to 
   preserve the order that the standard construction method would 
   generate. 
    
   We complete this stage with a complete set of starting leaf nodes for 
   the tree, with low probability formats discarded. The nodes have 
   assigned N and P values and associated header format indicators. 
 
A.4.4.  Step 4: Creating the HH Tree 
    
   Creating a Hierarchical Huffman tree begins with a set of nodes, each 
   one having a number of codepoints (N) and a probability per codepoint 
   (P). During the creation of the tree new nodes may be created and 
   given a number of codepoints and a probability.  These will then be 
   incorporated into the tree (i.e. the new nodes are incorporated into 
   the active set out of which the tree is growing, and old nodes which 
   are already part of the tree may be removed from this active set) The 
   tree building finishes when the active set consists of a single node 
   which contains all the codepoints and has probability 1. 
    
   The format indicator of each leaf node is not used for tree 
   construction (but instead must be stored for use in compression and 
   decompression). 
    
   We denote a node as NODE(k) for j=1, 2, 3 ...  
   NODE(k) has number of codepoints N(NODE(k))=n[k] 
   NODE(k) has probability per codepoint of P(NODE(k))=p[k] 
    

Price et al.                                                 [PAGE 35] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   The initial nodes in the active set correspond to the formats from 
   the top-level FNP table: taking on their order, number of codepoints 
   and probabilities. Each codepoint corresponds to one possible header 
   within the given header format. Conventionally we place these leaf 
   nodes at the top of the tree and add new nodes working downwards. 
    
   Note that the total probability of NODE(k) is n[k]*p[k], and we start 
   with the total probability of all nodes in the active set being  
   1 - PMISSING. As new nodes are added to the set and old ones removed 
   or replaced, this property is conserved at each stage. 
    
   As the tree is built up, nodes are joined by branches. There are 
   three possible cases for branches between nodes at different levels: 
    
   i)      A single vertical branch between a pair of nodes. Such a 
           branch is labeled {X}. 
   ii)     A pair of branches joining one upper node to two lower ones. 
           These branches are labeled {D t} for some integer t 
           (different for each branch). 
   iii)    Several branches each joining one upper node to a common 
           lower node. Each branch is labeled with {t} for some integer 
           t (different for each branch). 
    
   The tree generation algorithm is as follows: 
    
   1. Set NNODES = AFORMAT. 
    
   2. Find node j such that n[j] > 0 and p[j] is minimal (if there are 
     equalities then use smallest j). If there is no such j, STOP. 
    
   3. if (n[j] is divisible by 2^a) 
      { 
         /* Case A - Figure 4 */ 
        i)  replace NODE(j) in the active set with a new node j with 
                new p[j] = 2^ALIGN * p[j] 
                new n[j] = n[j] / 2^ALIGN 
        ii) join the new NODE(j) to original NODE(j) 
        iii)label branch with {X} 
      }  
      else /* n[j] not divisible by 2^a */ 
      { 
        if (n[j] > 2^a) 
        { 
           /* Case B - Figure 5 */ 
          i) create two new nodes, a replacement NODE(j) and NODE(r) 
              with r=NNODES+1, and set: 
                 p[r] = p[j] 
                 n[r] = n[j] modulo 2^ALIGN 
                 new p[j] = p[j] 
                 new n[j] = n[j] - n[r] 
          ii) join these nodes to original NODE(j) (left branch going 
              to new NODE(r) and right branch going to NODE(j)) 
          iii)label branch to left with {D 0} and branch to right with 
              {D n[r]} 

Price et al.                                                 [PAGE 36] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
          iv) set NNODES=NNODES+1 
        } 
        else /* so n[j] < 2^a */ 
        { 
          /* Case C - Figure 6 */ 
          i)   find nodes k[x], x=1, 2, 3...,b with the following 
                properties: 
                all k[x] different and k[x] != j; 
                p[k[x]] are minimal among the available nodes and 
                increasing with x; 
                if there are ties among p[k[x]], order the k[x] affected 
                by value of k[x]; 
                b is minimal such that n[j]+n[k[1]]+...+n[k[b]]>2^ALIGN. 
          ii)  define S[i] = n[j] + sum over (1<=x<=i-1) of n[k[x]] 
          iii) create two new nodes, a replacement NODE(k[b]) and 
                NODE(r) with r=NNODES+1, and set: 
                   p[r] = p[k[b]] 
                   n[r] = 2^ALIGN - S[b] 
                   new p[k[b]] = p[k[b]] 
                   new n[k[b]] = n[k[b]] - n[r] 
          iv)  join NODE(k[b]) and NODE(r) to original NODE(k[b]) (left 
                branch going to NODE(r) and right branch going to 
                NODE(k[b])) 
          v)   label branch to left with {D 0} and branch to right with 
                {D n[r]} 
          vi)  create a replacement for NODE(j) and set  
                   new n[j] = 1 
                   new p[j] = n[j]*p[j] + n[r]*p[r] +  
                              sum(1<= x<= b-1) of n[k[x]]*p[k[x]] 
          vii) join this node to the original NODE(j), to NODE(k[1]) to 
                NODE(k[b - 1]) and NODE(r) 
          viii)label branch to original NODE(j) with {0} 
          ix)  for 1<=x<=b-1, label branch to NODE(k[x]) with {S[x]}  
          x)   label branch to node r with {S[b]} 
          xi)  remove NODE(k[1]) to NODE(k[b-1]) and NODE(r) from the  
               active set 
          xii) Set NNODES=NNODES+1 
           
        } 
      } 
       
 
   4.  Return to 2 
    
                             Former NODE(j) 
                                {p, n} 
                                   |  
                                   | {X} 
                                   | 
                                   | 
                               New NODE(j) 
                   {p[j]=p*2^ALIGN, n[j]=n/2^ALIGN} 
    
                  Figure 4 : Tree Building Case A 

Price et al.                                                 [PAGE 37] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
    
                             Former NODE(j) 
                                {p, n} 
                                  /\  
                                 /  \ 
                                /    \  
                               /      \ 
                              /        \ 
                        {D 0}/          \{D n[r]} 
                            /            \ 
                           /              \ 
                          /                \ 
                        NODE(r)          New NODE(j) 
      {p[r]=p, n[r]=n mod 2^ALIGN}     {n[j]=n-n[r], p[j]=p} 
    
                  Figure 5 : Tree Building Case B 
    
   Former        NODE(k[1])...NODE(k[b-1])       Former 
   NODE(j)              |     |                NODE(K[b]) 
    \                   |     |                   /\ 
     \                  |     |                  /  \ 
      \                 |     |           {D 0} /    \ {D n[r]} 
       \                |     |                /      \ 
        \               |     |               /        \ 
         \        {S[1]}|     |{S[b-1]}      /          \ 
          \             |     |          NODE(r)        New 
           \            |     |          {p, n}      NODE(k[b]) 
            \           |     |           /             
         {0} \          |     |          / 
              \         |     |         / 
               \        |     |        / 
                \       |     |       / 
                 \      |     |      /{S[b]} 
                  \     |     |     / 
                   \    |     |    / 
                    \   |     |   / 
                     \  |     |  / 
                      New NODE(j) 
    
                  Figure 6 : Tree Building Case C 
           (See text for N and P values of created nodes) 
    
   The HH tree is now ready for use in the compressor and decompressor. 
    
A.5.  EPIC implementation issues 
    
   This section considers some issues that arise when implementing 
   EPIC. 
 
A.5.1.  Arithmetic 
 
   The implementation of all stages of the EPIC algorithm can be 
   performed using simple arithmetic operations on integer and floating 


Price et al.                                                 [PAGE 38] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   point numbers. However, some aspects of the algorithm require care 
   when being implemented on a real microprocessor. 
    
   The compression/decompression front end works in terms of M values 
   which are described in the pseudocode as large integers. However, the 
   only operations required are very simple bit pattern manipulations: 
   - left and right shift by ALIGN bits; 
   - bitwise AND with (2^ALIGN-1); 
   - addition to, subtraction from, and comparison with "ordinary" 
   integers. 
   Given that ALIGN will typically be 8, this represents very limited 
   complexity and does not require specialized support in the data path. 
    
   The calculation of the Huffman tree is more involved, since the 
   numbers of codepoints can be large integers and the probability 
   values are conceptually floating point. The additional operations 
   required are the multiplication and addition of these numbers with 
   themselves and each other, and comparison of probability values. 
   Support for large integer multiplication is not particularly 
   problematic since it only has to be performed at tree generation time 
   (which could even be an off-line activity); however, if rounding 
   error leads to inconsistent floating point orderings then 
   interoperability may be compromised. 
    
   There are two solutions to this problem. Firstly, for a given (fixed) 
   profile it is possible to construct the tree and also propagate upper 
   bounds on the rounding error for the value at each node. It is then 
   normally possible to verify that rounding cannot have modified the 
   node selections made. This is particularly useful if tree pruning 
   measures are taken to eliminate header formats with very small 
   probabilities. Note that once the tree has been constructed the 
   probability (and indeed codepoint numbers) for the tree nodes are no 
   longer necessary, so this process can be carried out off-line if 
   necessary. 
    
   The second approach is to note that the only operations involving 
   floating point values are multiplication and addition. Therefore, if 
   the basic input probabilities (the scaled values in the fundamental 
   FNP tables of the toolbox, and P values from profiles) are expressed 
   explicitly as multiples of 1/2^m (for any m), the calculations can 
   be carried out exactly using just the large integer operations which 
   are required for the tree construction anyway. Note that scaled 
   probabilities such as 1/v for arbitrary v cannot be exactly 
   expressed in this way, which means that the summed N*P values for 
   that and derived tables will no longer be precisely unity. However, 
   this does not prevent the construction, only modifying the 
   validation checks that are appropriate at each stage. Noting that 
   this only occurs in the case of lookup-table encoding where the 
   value of v is small, profiles which adopt this approach should use 
   the convention that the probability will be taken as (exactly) 
   n/2^32, where n is the result of dividing 2^32 by v in integer 
   arithmetic. 
    
    

Price et al.                                                 [PAGE 39] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
Appendix B.  Example Profile for RTP/UDP/IPv4 
    
   This appendix describes an example profile for the compression of 
   RTP/UDP/IPv4. 
    
   Note that the probabilities listed for each encoding method are 
   initial estimates only. These need to be refined with more accurate 
   values from genuine RTP/UDP/IPv4 streams. 
    
B.1.  General parameters 
    
   ALIGN      :  8 
   NPATTERNS  :  224 
   MAXHEADERS :  1000 
    
   For simplicity, the lookup tables are initialized next to the 
   encoding methods in which they are used. 
    
B.2.  Field-specific parameters 
    
   The field-specific parameters are given below for UO-mode with a 
   sequential IP ID. 
    
   The modifications required to the inputset for a random IP ID are 
   described in Section B.3. 
    
   Each encoding method is listed in the form of a table as per Section 
   5.2. 
    
    
   RTP/UDP/IPv4-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |          CRC           |        IRREGULAR(3)        |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(7)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
   | Master Sequence Number |         LSB(4,-1)          |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |       RTP Header       |        RTP-ENCODING        |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |       UDP Header       |        UDP-ENCODING        |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |   Inner IPv4 Header    |       IPv4-ENCODING        |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |  Optional Extensions   |     EXTENSION-ENCODING     |     100%    | 
   +------------------------+----------------------------+-------------+ 
    
   Notes: 
    
   The term "RTP Header" is shorthand for all of the fields in the RTP 
   header. A similar note applies for the "UDP Header" name etc. 

Price et al.                                                 [PAGE 40] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   EXTENSION-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |   Inner Auth Header    |        AH-ENCODING         |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
   |    Inner ESP Header    |        ESP-ENCODING        |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
   |    Inner GRE Header    |        GRE-ENCODING        |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
   |   Outer IPv4 Header    |    OUTER-IPv4-ENCODING     |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
   |   Outer Auth Header    |     OUTER-AH-ENCODING      |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
   |    Outer ESP Header    |     OUTER-ESP-ENCODING     |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
   |    Outer GRE Header    |     OUTER-GRE-ENCODING     |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |       1%    | 
   +------------------------+----------------------------+-------------+ 
    
   Notes: 
    
   The optional headers in the protocol stack are treated in exactly the 
   same manner as mandatory headers, except that an additional encoding 
   method is needed to delete the optional header when it is no longer 
   needed. 
    
   Further information on this topic can be found in Section 6.2.2. 
    
   The minimal encapsulation header is not considered because it is not 
   covered by [ROHC5]. 
    
    
    
    
    
    
    
    
    

Price et al.                                                 [PAGE 41] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   IPv4-ENCODING: TTL{"",""} 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |        Version         |           STATIC           |     100%    | 
   |     Header Length      |                            |             | 
   |     Reserved Flag      |                            |             | 
   |   Last Fragment Flag   |                            |             | 
   |    Fragment Offset     |                            |             | 
   |     Source Address     |                            |             | 
   |  Destination Address   |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |     Packet Length      |          INFERRED          |     100%    | 
   |    Header Checksum     |                            |             | 
   |        Protocol        |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |   May Fragment Flag    |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(1)        |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |    Type of Service     |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(8)        |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |         IP-ID          |          INFERRED          |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |          LSB(5,0)          |     0.7%    | 
   |                        +----------------------------+-------------+ 
   |                        |          LSB(8,0)          |     0.2%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(16)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |      Time to Live      |           STATIC           |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |         READ(TTL,2)        |     0.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |       WRITE(8,TTL,2)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |   Network Byte Order   |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(1)        |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
    
   Notes: 
    
   The Header Length field is not static when IP options are used. The 
   inputset could be extended to handle this possibility if needed. 
    
   The inputset could also be extended to handle packet fragmentation if 
   required. 
    



Price et al.                                                 [PAGE 42] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   The Network Byte Order (NBO) field is a Miscellaneous Information 
   field (see Section 7.3). Its use is described in Section 4.5.5 of 
   [ROHC5]. 
    
   The Packet Length field is inferred from information provided by the 
   link layer. 
   The Header Checksum field is inferred by recalculating the IP 
   checksum at the decompressor. 
   The Protocol field is inferred from the next header in the chain. 
   The IP Identification field is inferred from the Master Sequence 
   Number. 
    
    
   UDP-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |       Source Port      |           STATIC           |     100%    | 
   |    Destination Port    |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |       UDP Length       |          INFERRED          |     100%    | 
   |      UDP Checksum      |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |   UDP Checksum Value   |        UNCOMPRESSED        |     100%    | 
   +------------------------+----------------------------+-------------+ 
    
   Notes: 
    
   Even thought the UDP Checksum is always classed as INFERRED, it may 
   still be transmitted in full using the "UDP Checksum Value" field. 
   The UDP Checksum is then copied from this value. 
    
   The UDP Checksum Value field is a Miscellaneous Information field. If 
   context(UDP checksum) is zero at the compressor then the UDP Checksum 
   Value field is empty, otherwise it carries hdr(UDP Checksum) in full. 
    
   The length of the UDP Checksum Value field is inferred at the 
   decompressor from context(UDP Checksum). If this is zero then the UDP 
   Checksum Value field is empty, otherwise it is 16 bits long. 
    
   The UDP Length field is inferred from information provided by the 
   link layer. 
   The UDP Checksum field is inferred from the UDP Checksum Value field. 
   If this is empty then the UDP Checksum is zero, otherwise it is 
   copied in full from the UDP Checksum Value field. 
    
    
    
    
    
    
    
    

Price et al.                                                 [PAGE 43] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   RTP-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |        Version         |           STATIC           |     100%    | 
   |        Padding         |                            |             | 
   |       Extension        |                            |             | 
   |         SSRC           |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |      CSRC Counter      |          INFERRED          |     100%    | 
   |    Sequence Number     |                            |             | 
   |     RTP Timestamp      |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |      Payload Type      |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(7)        |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |       RTP Marker       |         VALUE("0")         |      90%    | 
   |           TS           +----------------------------+-------------+ 
   |                        |         TALKSPURT          |      10%    | 
   +------------------------+----------------------------+-------------+ 
   |          Tsc           |         VALUE("0")         |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |         VALUE("1")         |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |       TS Stride        |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(8)        |    0.04%    | 
   |                        +----------------------------+-------------+ 
   |                        |       IRREGULAR(16)        |    0.03%    | 
   |                        +----------------------------+-------------+ 
   |                        |       IRREGULAR(24)        |    0.02%    | 
   |                        +----------------------------+-------------+ 
   |                        |       IRREGULAR(32)        |    0.01%    | 
   +------------------------+----------------------------+-------------+ 
   |      Time Stride       |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(8)        |    0.04%    | 
   |                        +----------------------------+-------------+ 
   |                        |       IRREGULAR(16)        |    0.03%    | 
   |                        +----------------------------+-------------+ 
   |                        |       IRREGULAR(24)        |    0.02%    | 
   |                        +----------------------------+-------------+ 
   |                        |       IRREGULAR(32)        |    0.01%    | 
   +------------------------+----------------------------+-------------+ 
   |      CSRC Item 1       |      CSRC-ENCODING(1)      |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |      CSRC Item 2       |      CSRC-ENCODING(2)      |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |          ...           |            ...             |     ...     | 
   +------------------------+----------------------------+-------------+ 
   |      CSRC Item 16      |      CSRC-ENCODING(16)     |     100%    | 
   +------------------------+----------------------------+-------------+ 

Price et al.                                                 [PAGE 44] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   Notes: 
    
   Self-describing variable-length values are not needed by EPIC. 
    
   The TS, Tsc, TS Stride and Time Stride fields are all Miscellaneous 
   Information fields. They are used to reconstruct the RTP Timestamp at 
   the decompressor. Their use is explained in Section 4.5.3 and Section 
   4.5.4 of [ROHC5]. 
    
   The CSRC Counter field is inferred by counting the number of non-
   empty CSRC Item fields. 
   The Sequence Number field is inferred from the Master Sequence Number 
   by linear extrapolation. 
   The RTP Timestamp is inferred from the TS, Tsc, TS Stride and Time 
   Stride fields as in Section 4.5.3 and Section 4.5.4 of [ROHC5]. 
    
   TALKSPURT: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |       RTP Marker       |         VALUE("0")         |      80%    | 
   |                        +----------------------------+-------------+ 
   |                        |         VALUE("1")         |      20%    | 
   +------------------------+----------------------------+-------------+ 
   |           TS           |        IRREGULAR(5)        |      90%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(8)        |     9.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(32)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
    
   Notes: 
    
   The RTP Marker and TS fields are given their own encoding method 
   because they behave in a dependent manner. 
   A talkspurt requires some LSBs of timestamp to be communicated so 
   that the correct timestamp value can be inferred from the wall clock. 
   The RTP marker is "1" at the beginning of each talkspurt, but is "0" 
   when the timestamp LSBs are resent for robustness. 
    
   CSRC-ENCODING(i): CSRCVALUEi{""} 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |      CSRC Item i       |           STATIC           | 100-a-b-c%  | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |      a%     | 
   |                        +----------------------------+-------------+ 
   |                        |     READ(CSRCVALUEi,1)     |      b%     | 
   |                        +----------------------------+-------------+ 
   |                        |   WRITE(32,CSRCVALUEi,1)   |      c%     | 
   +------------------------+----------------------------+-------------+ 

Price et al.                                                 [PAGE 45] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   Note:        a = b = c = (17 - i) / 100 
    
    
   AH-ENCODING: COMMONLENGTHS{"00","03","04","05","06"} 
                AH-SPI{"","","","",""} 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |     AH Next Header     |          INFERRED          |     100%    | 
   |      AH Reserved       |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |       AH Length        |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |   READ(COMMONLENGTHS,5)    |    0.09%    | 
   |                        +----------------------------+-------------+ 
   |                        |  WRITE(8,COMMONLENGTHS,5)  |    0.01%    | 
   +------------------------+----------------------------+-------------+ 
   |         AH SPI         |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |       READ(AH-SPI,5)       |   0.099%    | 
   |                        +----------------------------+-------------+ 
   |                        |     WRITE(32,AH-SPI,5)     |   0.001%    | 
   +------------------------+----------------------------+-------------+ 
   |   AH Sequence Number   |          INFERRED          |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |         LSB(8,-1)          |     0.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(32)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   | AH Authentication Data |        UNCOMPRESSED        |     100%    | 
   +------------------------+----------------------------+-------------+ 
    
   Notes: 
    
   The initial values for the COMMONLENGHS lookup table are given in 
   hexadecimal. They represent the lengths of the AH Authentication Data 
   field for common hashing functions such as MD5, SHA-1 and so on. 
    
   If the AH SPI field is null (i.e. 0 bits long) then the AH header is 
   not present and hence all of the UNCOMPRESSED and INFERRED fields are 
   null. Otherwise: 
    
   The length of the AH Authentication Data field is known from the AH 
   Length field. 
    
   The AH Next Header field is inferred from the next header in the 
   extension chain. 
   The AH Reserved field is "0000000000000000". 
   The AH Sequence Number field is inferred from the Master Sequence 
   Number by linear extrapolation. 
    
    
    

Price et al.                                                 [PAGE 46] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   ESP-ENCODING: PADDINGLENGTHS{0,0,0,0,0} 
                 ESP-SPI{0,0,0,0,0} 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |        ESP SPI         |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |      READ(ESP-SPI,5)       |   0.099%    | 
   |                        +----------------------------+-------------+ 
   |                        |    WRITE(32,ESP-SPI,5)     |   0.001%    | 
   +------------------------+----------------------------+-------------+ 
   |  ESP Sequence Number   |          INFERRED          |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |         LSB(8,-1)          |     0.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(32)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |      ESP Padding       |          INFERRED          |   99.99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        UNCOMPRESSED        |    0.01%    | 
   +------------------------+----------------------------+-------------+ 
   |   ESP Padding Length   |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |   READ(PADDINGLENGTHS,5)   |    0.09%    | 
   |                        +----------------------------+-------------+ 
   |                        |  WRITE(PADDINGLENGTHS,5)   |    0.01%    | 
   +------------------------+----------------------------+-------------+ 
   |    ESP Next Header     |          INFERRED          |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |     ESP Auth Data      |       IRREGULAR(96)        |     100%    | 
   +------------------------+----------------------------+-------------+ 
    
   Notes: 
    
   If the ESP SPI field is null then the ESP header is not present and 
   hence all of the UNCOMPRESSED and INFERRED fields are null. 
   Otherwise: 
    
   The length of the ESP Padding field is known from the ESP Padding 
   Length field. 
    
   The ESP Sequence Number field is inferred from the Master Sequence 
   Number by linear extrapolation. 
   The ESP Padding field has octets that are 1, 2, 3, ..., k where k is 
   the ESP Padding Length. 
   The ESP Next Header field is inferred from the next header in the 
   extension chain. 
    
    
    
    
    
    

Price et al.                                                 [PAGE 47] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   GRE-ENCODING: 
    
   +------------------------+----------------------------+-------------+ 
   |      Field Name(s)     |      Encoding Method       | Probability | 
   +------------------------+----------------------------+-------------+ 
   |    GRE Key Present     |          INFERRED          |     100%    | 
   |  GRE Seq Num Present   |                            |             | 
   |       GRE Flags        |                            |             | 
   |      GRE Version       |                            |             | 
   |      GRE Protocol      |                            |             | 
   |      GRE Checksum      |                            |             | 
   +------------------------+----------------------------+-------------+ 
   |  GRE Checksum Present  |           STATIC           |    99.9%    | 
   |  GRE Routing Present   +----------------------------+-------------+ 
   | GRE Strict Source Rte  |        IRREGULAR(1)        |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   | GRE Recursion Control  |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(3)        |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |   GRE Checksum Value   |        UNCOMPRESSED        |     100%    | 
   +------------------------+----------------------------+-------------+ 
   |       GRE Offset       |           STATIC           |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |     0.5%    | 
   |                        +----------------------------+-------------+ 
   |                        |       IRREGULAR(16)        |     0.5%    | 
   +------------------------+----------------------------+-------------+ 
   |        GRE Key         |           STATIC           |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(0)        |     0.5%    | 
   |                        +----------------------------+-------------+ 
   |                        |       IRREGULAR(32)        |     0.5%    | 
   +------------------------+----------------------------+-------------+ 
   |  GRE Sequence Number   |          INFERRED          |      99%    | 
   |                        +----------------------------+-------------+ 
   |                        |         LSB(8,-1)          |     0.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        IRREGULAR(32)       |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
   |      GRE Routing       |           STATIC           |    99.9%    | 
   |                        +----------------------------+-------------+ 
   |                        |        UNCOMPRESSED        |     0.1%    | 
   +------------------------+----------------------------+-------------+ 
    
   Notes: 
    
   The GRE Checksum Value field is a Miscellaneous Information field. If 
   context(GRE Checksum) is zero at the compressor then the GRE Checksum 
   Value field is empty, otherwise it carries hdr(GRE Checksum) in full. 
    
   If the GRE Recursion Control field is null then the GRE header is not 
   present and hence all of the INFERRED fields are null. Otherwise: 
    

Price et al.                                                 [PAGE 48] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
   The length of the GRE Checksum Value field is inferred at the 
   decompressor from context(GRE Checksum). If this is zero then the GRE 
   Checksum Value field is empty, otherwise it is 16 bits long. 
   The GRE Routing field contains its own built-in length indicator. 
    
   The GRE Flags field is "00000" and the GRE Version field is "000". 
   The GRE Sequence Number field is inferred from the Master Sequence 
   Number by linear extrapolation. 
   The GRE Protocol field is inferred from the next header in the 
   extension chain. 
   The GRE Key Present and GRE Sequence Number present fields are "0" if 
   their respective fields are null, otherwise they are "1". 
   If the Checksum Present and Routing Present fields are both "0" then 
   the GRE Checksum is null, else it is inferred from the GRE Checksum 
   Value field. If this is empty then the GRE Checksum is zero, 
   otherwise it is copied in full from the GRE Checksum value field. 
    
   OUTER-IPv4-ENCODING: 
    
   Identical to IPv4-ENCODING but for fields in the outer IP header. 
   Note that the NBO field is replaced by the NBO2 field. 
    
   OUTER-AH-ENCODING: 
    
   Identical to AH-ENCODING but for fields in the outer Authentication 
   Header. 
    
   OUTER-ESP-ENCODING: 
    
   Identical to ESP-ENCODING but for fields in the outer ESP header. 
    
   OUTER-GRE-ENCODING: 
    
   Identical to GRE-ENCODING but for fields in the outer GRE header. 
    
B.3.  Changes required for a random IP ID 
    
   If the IP ID changes randomly then a new inputset can be used with 
   following encoding: 
    
   +------------------------+----------------------------+-------------+ 
   |         IP-ID          |       IRREGULAR(16)        |     100%    | 
   +------------------------+----------------------------+-------------+ 
    
   All other general and field-specific parameters are as above. Note 
   that the two inputsets define two different sets of compressed header 
   formats (one for sequential IP IDs, one for random IP IDs). 
    
   Since the change between a sequential and a random IP ID occurs only 
   rarely, there is no need to reserve space in the compressed headers 
   to indicate when to change between the two sets of compressed header 
   formats. Instead the change should be communicated using a STATIC or 
   DYNAMIC packet. 
    

Price et al.                                                 [PAGE 49] 

INTERNET-DRAFT                   EPIC                November 17, 2000 
 
Appendix C.  Proof of Optimal Efficiency 
    
   A very important property of the compressed headers built by EPIC is 
   that they are optimally efficient. This means that the average 
   compressed header size of EPIC is provably the minimum possible for 
   the chosen protocol stack. The precise statement of optimal 
   efficiency is given below: 
    
   "If every compressed header format specified by EPIC is used with 
   the claimed probability, and if the distribution of headers within 
   each format is flat, then the EPIC compressed headers are optimally 
   efficient for the chosen level of robustness and word alignment". 
    
   Note that the two caveats in the statement just ensure that EPIC is 
   programmed correctly for the protocol stack it is compressing. If 
   the input to EPIC states that Field 1 will need LSB(4,0) encoding 
   for 20% of the time then optimal efficiency is only achieved if this 
   really is the case. Similarly each of the headers which make use of 
   a compressed header format must turn up equally often, otherwise the 
   extra non-randomness could be used to compress the headers even 
   further. 
    
   The proof of optimal efficiency is straightforward in the bit-
   aligned case. Ordinary Huffman encoding works on a stream of 
   characters and provides optimal compression for each character in 
   turn as explained in [HUFF]. Since Hierarchical Huffman gives 
   semantically identical results the proof of optimality applies to it 
   as well. But EPIC treats an entire header as a single character in 
   the Hierarchical Huffman tree, so EPIC attains optimal compression 
   of every header in the chosen protocol stack. 
    
   In the general word-aligned case the compressed headers produced by 
   EPIC are optimally efficient for the chosen alignment (for example 
   in the octet-aligned case the average compressed header size is 
   minimal given that all headers must be a whole number of octets). 
   The proof can be easily adapted from the bit-aligned case. 



















Price et al.                                                 [PAGE 50]