Comparative Study on Data Compression
Techniques in Cache to Promote Performance
G. D. Kesavan
Asst. Prof., Dept. of CSE,
Vel Tech Rangarajan Dr. Sagunthala R & D Institute of Science & Technology, Avadi
[email protected]

Dr. A. Sivanesh Kumar
Asst. Prof., Dept. of CSE,
Vel Tech Rangarajan Dr. Sagunthala R & D Institute of Science & Technology, Avadi
[email protected]

Abstract—Cache memory is a small, fast memory used to store recently accessed data or neighboring data items, so that the data need not be fetched from the next level of the memory hierarchy. The use of cache memory reduces the overall memory access time. Because cache memory exists to speed up memory access, the caching process itself should consume as little time as possible. Many cache optimizations are available to speed up caching, such as reducing the miss rate, reducing the miss penalty, and reducing the time to hit in the cache. Recent advances have paved the way for compressing data in the cache. Through data compression, more data resides in the cache, resulting in a higher hit ratio. Many cache-related compression and optimization techniques are available. This paper surveys some recent compression techniques.

Keywords—compression; cache based optimization; memory utilization

978-1-5386-9543-2/19/$31.00 ©2019 IEEE

I. INTRODUCTION

Memory compression is the process of finding similarities among stored data and representing them more compactly, gaining memory capacity while keeping access simple. When data is stored in compressed form, it must be decompressed to read its actual contents. Compression can be done at the main-memory level as well as at the cache level. Compression in main memory helps reduce the size or number of paging requests, while compression in the cache helps reduce the number of line requests. The advantages of memory compression are that it a) grows the capacity of memory, b) utilizes memory effectively, c) helps reduce the page fault rate, since many spatially local data items can be stored in a single data space, and d) reduces energy and bandwidth demands. The added complexity in accessing data is the additional encoding and decoding delay.

The paper is organized by describing the need for compression, some compression methodologies, pre-fetching of cache blocks, changes in tag and data storage, compression-related metadata storage, methods of handling write-backs, and adding error correcting codes (ECC).

II. COMPRESSION METHODOLOGIES

A. Base Delta Immediate compression

Base Delta Immediate (BDI) compression is used where some bits of a data item are common with the data at adjacent memory locations. The base holds the value of the initial location in full, and all other adjacent locations are represented by their differences (deltas) from the base. Metadata stored along with the compressed data indicates how many deltas are available for decompression. BDI has many variations based on the number of base bits. [1] [2] [3] [4]

Fig. 1 BDI compressed cache line. The base item 00000750 is close in value to the adjacent data items, so only their differences 1, 2, 3 are represented as deltas. During reconstruction, adding each delta to the corresponding base yields the actual data.

B. Frequent pattern compression method

Frequent pattern compression (FPC) finds redundant data among a set of data items, such as data that is all zeros in many bytes, all ones, or similar repeated patterns, and represents it with short encoded bits such as "single zero" or "single one". FPC typically uses a table with three-bit codes for commonly occurring patterns. For such redundant data, the table is consulted and the corresponding values are substituted during decompression. [5]

Fig. 2 FPC method, where repeated runs of 0's or 1's are noted and their code words are represented in bits instead of the actual data. Here three bits are used for encoding. [1]

C. Data bus inversion scheme

In certain cases, storing the complement of the data is simpler than storing the data itself. The complement replaces zeros with ones and vice versa. For cases where inverting the data is easy and consumes less energy to store, the inversion is recorded in an inversion bit. [1] For example, to represent 00000000 followed by 11111111, using DBI we represent 11111111 0 followed by 11111111 1.

Fig. 3 DBI scheme, where transitions in adjacent data items from 0 to 1 or 1 to 0 are avoided by storing only one of the two values and noting the change in an inversion bit, so that the data can be interpreted in its original form. [1]
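The inversion decision above can be sketched in a few lines. The following is a minimal illustration, not code from any of the surveyed papers: it implements one common DBI variant that transmits either a word or its complement, whichever toggles fewer bus lines relative to the previously transmitted word, and records the choice in an inversion bit. The function names and the 8-bit width are invented for the example.

```python
def dbi_encode(words, width=8):
    """Data bus inversion: for each word, transmit either the word or its
    complement, whichever toggles fewer bus lines relative to the previous
    transmitted word. An extra inversion bit records the choice."""
    mask = (1 << width) - 1
    encoded = []          # list of (transmitted_word, inversion_bit)
    prev = 0              # bus assumed to start at all zeros
    for w in words:
        toggles = bin((w ^ prev) & mask).count("1")
        if toggles > width // 2:
            t, inv = (~w) & mask, 1   # sending the complement toggles fewer lines
        else:
            t, inv = w & mask, 0
        encoded.append((t, inv))
        prev = t
    return encoded

def dbi_decode(encoded, width=8):
    """Undo the inversion wherever the inversion bit is set."""
    mask = (1 << width) - 1
    return [((~t) & mask) if inv else t for t, inv in encoded]

data = [0b00000000, 0b11111111, 0b11111110]
enc = dbi_encode(data)        # [(0, 0), (0, 1), (1, 1)] — bus barely toggles
assert dbi_decode(enc) == data
```

Note how the all-ones word is sent as all zeros with the inversion bit set, so the bus lines do not toggle at all, which is where the energy saving comes from.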
D. Most significant bit compression

Most significant bit (MSB) compression checks for redundant most significant bits within a block. The matched bits are represented only once; they act like the base in BDI. The remaining bits are represented like the deltas in BDI. During decompression the bits are concatenated instead of added as in BDI; the advantage of concatenation over addition is that concatenation is a less complex operation. The method can also be used to represent floating point values, where the sign bit is ignored while comparing two numbers. [5]

Fig. 4 MSB-compressed cache line, where adjacent data items share the most significant bits 0000, which are represented only once.

E. Run length encoding

Run length encoding groups repeated zeros or ones in a word. It stores metadata at the start of the block denoting a) whether zero or one repeats, b) the number of bytes over which it repeats, and c) the 5-bit starting address. In this scheme only runs of two to three repeated bytes are noted. Run length encoding compares favorably with frequent pattern compression: FPC may need at least one bit for each repeated byte, while run length encoding does not. Figure 5 shows compression by run length encoding, where the compressed data 22 is obtained by the following computation: the most significant bit (MSB) is 0 because the run data is 0; the next bit is 1 to indicate a 3-byte run (0 for a 2-byte run); the next set of bits indicates the offset. [5]

Fig. 5 Run length encoding method. [5]

F. Text compression

The ASCII encoding scheme is a 7-bit encoding with 128 possible characters; the eighth bit is meant for parity. In a 16-bit representation of 8-bit text data, the most significant bits are filled with zeros. These most significant bits can be omitted simply by indicating that the data is text. Through this indicator the method avoids half the allocated memory space, and the data is then read byte by byte. [5]

Fig. 6 Text compression, where the MSB does not contain any information and can be omitted. [5]

G. Stream based compression

Stream based compression compresses a cache line and appends it to a buffered stream such that it is expected to be the requested line in the near future. For this there must be appropriate buffer space. [6]

H. SC2

SC2 uses Huffman encoding to encode frequent values. The number of bits is log(n) for n distinct elements. Integers are compressed easily by SC2 in HyComp. For floating point values there is value locality in the mantissa similar to the exponent; mantissa and exponent are compressed separately and combined when decompressed. SC2 is a variable-length encoding scheme, unlike FPC, which is a fixed-length encoding scheme.
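Since SC2 builds on Huffman coding, a small sketch helps make the variable-length property concrete. The following is an illustrative Huffman-code construction over the values of a cache line, not SC2's actual hardware encoder; the example line and the hex values in it are invented for the illustration.

```python
import heapq
from collections import Counter

def huffman_code(values):
    """Build a Huffman code over a list of values: frequent values get
    short codewords, rare ones get longer codewords (variable length,
    unlike a fixed-length scheme such as FPC)."""
    freq = Counter(values)
    if len(freq) == 1:                      # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, tie_breaker, {value: codeword_suffix})
    heap = [(f, i, {v: ""}) for i, (v, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in c1.items()}
        merged.update({v: "1" + c for v, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

line = [0x0, 0x0, 0x0, 0x0, 0x750, 0x751, 0x0, 0x752]
code = huffman_code(line)
compressed = "".join(code[v] for v in line)
# the most frequent value (0x0) receives the shortest codeword
assert len(code[0x0]) <= min(len(c) for c in code.values())
```

With this input the frequent zero value compresses to a single bit while the three rare values take two or three bits each, which is exactly the value-locality effect SC2 exploits.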
Fig. 7 Compression in SC2, using variable-length encoding.

I. Zero content augmented cache compression

Data storage in memory is often embedded with null blocks. A null block can be replaced with a single bit. Zero Content Augmented cache compression represents null blocks more compactly than SC2 (Huffman encoding), since Huffman coding is based on the frequency of data occurrences or the repetition of characters, and the number of bits is log(n) for n distinct characters in Huffman encoding. [3]

Fig. 8 Zero content augmented cache compression, where repetitions of 0 are represented in a simpler manner. [5]

J. Floating point: HyComp

Floating point-HyComp (FP-H) is used for floating point compression. Usually the exponent field alone is compressed, but here the floating point number is converted into smaller parts that are treated as if they were integers. Compression of a floating point value is thus done by breaking it into four components, a) sign, b) exponent, c) mantissa high and d) mantissa low, and then compressing each individually using Huffman encoding. Figure 9 shows how a floating point data item is split, Huffman-encoded individually, and then combined. [3]

Fig. 9 Floating point number compression by FP-H. [3]

K. Compaction

Across blocks, data with similar higher-order bits are combined into a single block that represents only the changed bit positions. The mapping must be updated appropriately so that the available block is not treated as a miss. Re-compaction includes concatenating or adding the whole bit positions as per their positions during requests. Figure 10 shows the process of compaction. The difference between compression and compaction is that compression works within a cache block while compaction works across blocks. [4]

Fig. 10 The process of compaction across blocks. [4]

L. Dictionary based compression

Repeated data within a cache line is placed inside a dictionary, a small referential memory space. If the number of distinct dictionary elements is n, then the number of bits representing each data item is log(n). There are many variations of dictionary based compression. CPACK uses a dictionary look-up for small and repeated values. The Lempel–Ziv–Markov chain algorithm (LZMA) is a software-implemented dictionary based compression. FP-H-Dictionary uses a lightweight dictionary based compression that is faster than Huffman encoding. [3]

Fig. 11 Compression by dictionary encoding. The dictionary entries vary in size from 256b down to a 32-bit hash. If a dictionary value matches the 256b wholly, the encoded bits are represented in its place. Otherwise, leaving out the MSBs, the data is checked for a match, and so on down to the 32-bit hashes. Whenever the data matches, the coded bits are represented.

M. Decoupled Compressed Cache

The decoupled compressed cache uses super block tags, where one tag tracks 4 sub-blocks. The method uses decoupled tag-data mapping to allow blocks to be stored anywhere in the set. Re-compaction is handled by additional pointers and some area to maintain the mapping. [4]

N. CPACK+Z
CPACK+Z compression maintains a dictionary of repeated patterns and a pointer that indicates which dictionary item a data entry belongs to. The term Z denotes that a single bit is represented when the data is completely zero. Base delta immediate compression is used when the upper half of a data chunk repeats but not the whole chunk. Dictionary Sharing (DISH) uses the CPACK+Z dictionary for compression, and dictionaries are shared across blocks for compaction. [7] [8]

There are two schemes of representation in DISH. Scheme I is used for cases where the number of similar data contents across multiple contiguous cache blocks is small. Scheme II is used for cases where the number of repeated upper data bits across multiple contiguous cache blocks is large. [8]

O. Large Block Encoding

The method is similar to dictionary based compression using variable-sized hashes, but blocks need to be mapped using a Line Map Table.

Fig. 8: Compression by large block encoding.

III. PRE-FETCHING CACHE BLOCKS

Pre-fetching is the process of bringing spatially or temporally local blocks into memory before they are requested. Blocks are pre-fetched if they exhibit spatial or temporal locality. Spatially local data are data adjacent to the currently accessed item. Temporally local data are data that, having been referenced now, are likely to be accessed again in the near future. Ninety percent of data accesses are to already-accessed data. Pre-fetching avoids the additional latency of waiting for data from lower memory. There are three types of pre-fetching: a) the stream pre-fetcher, b) the stride pre-fetcher and c) spatial memory streaming.

The stream pre-fetcher pre-fetches repeatedly accessed streams. The stride pre-fetcher learns the common stride distance between previously accessed data items and pre-fetches the next access using the last used address. Spatial memory streaming identifies spatially local data based on the history of access patterns recorded per program counter. Pre-fetched data can be decompressed if they are compressed. [4]

IV. TAG DETAILS AND DATA LOCATION

For some compression methodologies, the indexing used in memory access changes after compression. The changes are in the tag and in the data location. [3] [4] [5] [6] [7] [8]

V. META DATA STORAGE

A. Meta data cache

Compression metadata is stored in uncompressed format. The metadata tells exactly how many bursts are needed for the line, which helps in effective bandwidth utilization. Metadata is stored in physical memory and is cached by a metadata cache at the memory controller. Every dynamic random access memory row contains 127 data lines and 1 metadata line in the overflow row. The granularity of data transfer is set according to the compression factor to achieve energy efficiency. The spare space gained by compression is used to store ECC and encoding information. [1]

B. ECC Meta data

Whether a data item is compressed or not is stored as metadata in COP. For uncompressed data, the ECC can be stored in memory; to protect uncompressed blocks, a small memory region called COP-ER is allocated to hold the ECC. The metadata about ECC is represented by pointers to an ECC block. The ECC block contains a valid bit, displaced data and ECC check bits. The ECC region blocks can be cached for performance improvement. The validity information denotes whether the data is dirty or not. COP-ER can eliminate the incompressible-alias problem. Figure 13 shows the ECC for an uncompressed line, where a separate memory is allocated containing a valid bit, data and a pointer to the ECC region. [1] [5]

Fig. 13 ECC for uncompressed data. [5]

C. Log Meta data

The compressed tag and data are stored in the block. Log information about when a block was most recently accessed is used for eviction and for identifying frequent accesses. This information is maintained in log metadata. [6]

D. Re-reference interval prediction data

Re-reference interval prediction in CAMP uses an M-bit saturating counter for each cache block. The re-reference prediction value predicts the block's re-reference distance; a smaller distance indicates re-reference in the nearer future. The value is incremented when a look-up for a block misses, and on a cache hit it is set to zero. Blocks are inserted with value 2^M − 2. [2]

E. The minimal value eviction data

Minimal value eviction in CAMP is based on a value Vi computed from the likelihood of future reference, pi, and the compressed block size, si: Vi = pi / si, where i is any compressed block. Higher values are assigned to blocks that are more likely to be referenced and smaller in size; a more compressible block is considered more valuable than a less compressible one. [2]
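The eviction rule Vi = pi / si can be illustrated directly. The sketch below is hypothetical: the p and size fields are invented stand-ins for CAMP's predicted reuse likelihood and compressed block size. It computes Vi for each block in a set and evicts the block with the minimal value.

```python
def select_victim(blocks):
    """Minimal-value eviction (a sketch of CAMP's rule): each block i has
    an estimated future-reference likelihood p_i and a compressed size s_i.
    Its value is V_i = p_i / s_i; the block with the smallest value is the
    victim, so rarely reused, poorly compressible blocks are evicted first."""
    def value(block):
        return block["p"] / block["size"]
    return min(blocks, key=value)["tag"]

blocks = [
    {"tag": "A", "p": 0.9, "size": 16},  # likely reused, compresses well
    {"tag": "B", "p": 0.2, "size": 64},  # unlikely reused, incompressible
    {"tag": "C", "p": 0.5, "size": 32},
]
assert select_victim(blocks) == "B"   # V_B = 0.2/64 is the smallest value
```

Note that between two blocks with equal reuse likelihood, the rule evicts the larger (less compressible) one, matching the statement above that more compressible blocks are considered more valuable.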
F. Data type related data

The metadata in HyComp stores information about the data type in the tag store. Count information is used to determine whether the chunks to be compressed have been processed. The first step is data type classification: the appropriate count values are incremented whenever the corresponding data types are encountered, and decremented when the chunks are compressed. [3]

Fig. 14 Data type classification process based on inspection points (IP). Inspection points are certain bits that can be checked to categorize data types. The IP for floating point values is 7 bits from the exponent. The IP for null blocks is a chunk. The IP for integers and pointers is the 4 most significant bits. [3]

G. Value frequency table data

The value frequency table holds the value frequency statistics associated with the exponent and mantissa in FP-H. The frequency of a particular data value is the number of times that value occurs. This information is useful when adjacent data items have similar exponent fields. [3]

H. Compression factor Meta data

Encoding information about the base size and word size is stored in the block itself in the pre-fetched blocks cache. This information is used for compacting similar compressed blocks together to avoid wasting memory space due to fragmentation. [4] [7]

I. Super block tag Meta data

Super block tags in YACC contain metadata indicating which blocks of the super block containing the data are stored in the corresponding data entry. Each super block tag contains the super block tag address, coherence states, validity bits and the compression factor. The index bits are used to find which blocks are compressed, thus enabling non-adjacent blocks belonging to similar super blocks to be compacted together. Super block tags are used to reduce tag area overhead. YACC associates one tag entry per data entry, so the tag-data mapping is direct. The super-block tag entry depends on the compression factor (CF). When the CF is 1, only one data item resides in the entry, so it contains only its block id, coherence state and super block. CFs of 1, 2 and 4 are encoded as 00, 01 and 1x respectively, where x is a don't-care bit. [7] [8]

J. Valid and coherence bits

A valid bit and a coherence bit are associated with each block of compacted data in MORC, YACC and DISH to indicate changes to the data during write-back operations. Practical implementations need registers to indicate dictionary elements and flags for conditional execution. [6] [7] [8]

K. Scheme bit

An encoding bit, 0 for Scheme I and 1 for Scheme II, placed in the don't-care position of YACC, is used by DISH to differentiate Scheme I from Scheme II. [8]

VI. IN-BLOCK EXPANSION

A write-back to a compressed block that exceeds the previously allocated space after compression creates a relocation problem. If no additional space or change in alignment and address is needed, the space left over after compression and ECC compaction is used for in-block expansion. If the space is not enough, a separate block in physical memory is used for this purpose, invalidating the current block. [2]

VII. ERROR CORRECTING CODES

A. Embedded error correcting code (ECC)

Embedded error correcting code avoids an extra chip to store ECC. The ECC is stored along with the data, with a separation, so the memory storage space required is reduced. The starting location of every block is the same as that of the uncompressed block, and ECC is placed at increments of eight bytes (at the end of the row). This avoids look-up complexity. [1] [2]

TABLE I PERFORMANCE IMPROVEMENT AND ENERGY CONSUMPTION REDUCTION BY VARIOUS COMPRESSION SCHEMES

Compression Scheme | Performance Improvement | Energy Consumption Reduction
Memzip             | 47%                     | 57%
COP                | -                       | -
MORC               | 37%                     | 17%
CAMP               | 4.7%                    | 7.2%
Hycomp             | -                       | 39%
PBC                | 11.1%                   | 5.8%
YACC               | 8%                      | 6%
DISH               | 7.2%                    | 6%

TABLE II BENCHMARK SUITES USED BY COMPRESSION SCHEMES

Compression Scheme | Benchmarks Used
Memzip             | SPEC 2K6, PARSEC, NAS Parallel Benchmarks, CloudSuite
COP                | SPEC 2K6, PARSEC
MORC               | SPEC 2K6
CAMP               | SPEC 2K6
Hycomp             | SPEC 2K6, NAS
PBC                | SPEC 2K, SPEC 2K6
YACC               | SPEC OMP, PARSEC, SPEC CPU 2006
DISH               | SPEC 2K6, CRONO

TABLE III ADVANTAGES OF VARIOUS COMPRESSION SCHEMES

Compression Scheme | Advantages
Memzip             | Rank sub-setting and placing compressed blocks in a less complex way.
COP                | 93% error rate reduction.
MORC               | Log based cache organization to compress and store closely accessed data.
CAMP               | Size based cache organization that stores similar sized data closely; 8.7% off-chip bandwidth consumption reduction.
Hycomp             | Data type based compression scheme; 80% best in compression method selection.
PBC                | Pre-fetches temporal data and compacts the whole cache line rather than compressing blocks.
YACC               | Variable sized compacting of blocks using super block tags.
DISH               | Dictionary based caching; avoids memory wastage due to internal fragmentation.
VIII. CONCLUSION

From the above analysis and comparison, compression and compaction promote system performance and reduce energy and bandwidth consumption.

REFERENCES

[1] A. Shafiee, M. Taassori, R. Balasubramonian, and A. Davis. Memzip: Exploring unconventional benefits from memory compression. In 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), pages 638–649, Feb 2014.

[2] G. Pekhimenko, T. Huberty, R. Cai, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry. Exploiting compressed block size as an indicator of future reuse. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pages 51–63, Feb 2015.

[3] Angelos Arelakis, Fredrik Dahlgren, and Per Stenstrom. Hycomp: A hybrid cache compression method for selection of data-type-specific compression methods. In Proceedings of the 48th Annual International Symposium on Microarchitecture, MICRO-48, pages 38–49, New York, NY, USA, 2015. ACM.

[4] K. Raghavendra, B. Panda, and M. Mutyam. PBC: Prefetched blocks compaction. IEEE Transactions on Computers, 65(8):2534–2547, Aug 2016.

[5] David J. Palframan, Nam Sung Kim, and Mikko H. Lipasti. COP: To compress and protect main memory. In Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA '15, pages 682–693, New York, NY, USA, 2015. ACM.

[6] Tri M. Nguyen and David Wentzlaff. MORC: A manycore-oriented compressed cache. In Proceedings of the 48th International Symposium on Microarchitecture, pages 76–88, Dec 2015.

[7] Somayeh Sardashti, Andre Seznec, and David A. Wood. Yet another compressed cache: A low-cost yet effective compressed cache. ACM Trans. Archit. Code Optim., 13(3):27:1–27:25, September 2016.

[8] B. Panda and A. Seznec. Dictionary sharing: An efficient cache compression scheme for compressed caches. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1–12, Oct 2016.