Low utilization of on-chip cache capacity limits performance and wastes energy because of the long latency, limited bandwidth, and energy consumption associated with off-chip memory accesses. Value replication, where the same value appears in multiple memory locations, is an important source of low capacity utilization. While past cache compression techniques manage to encode frequent values densely, they sacrifice compression ratio for low decompression latency and thus miss opportunities to utilize on-chip cache capacity more effectively.
This talk presents, for the first time, a detailed design-space exploration of statistical cache compression. We show that more aggressive, statistical compression approaches, such as Huffman coding, which have been dismissed in the past because of their compression and decompression overhead, are prime candidates for cache and memory compression. We first find that the overhead of acquiring the value statistics needed to generate new codewords is low: value locality varies little over time and across applications, so new encodings need to be generated only rarely, which makes it possible to off-load codeword generation to software routines. We then show that the high compression ratio obtained by Huffman-based cache compression makes it possible to enjoy the performance benefits of a 4X larger last-level cache at a power consumption that is about 50% lower than that of a cache four times as large.
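To make the codeword-generation step concrete, the sketch below builds Huffman codewords from sampled value frequencies, the kind of statistics acquisition that the talk argues can be off-loaded to software because it runs so rarely. This is only a minimal illustration of the principle, not the design presented in the talk; the sampled value counts and the function name `build_huffman_codewords` are hypothetical.

```python
import heapq
from collections import Counter

def build_huffman_codewords(value_counts):
    """Build Huffman codewords from value frequency statistics.

    value_counts: mapping from a sampled cache value to its observed frequency.
    Returns a dict mapping each value to its codeword as a bit string.
    """
    # Heap entries are (frequency, tie_breaker, node); a node is either a leaf
    # value or a (left, right) pair of subtrees.
    heap = [(freq, i, value) for i, (value, freq) in enumerate(value_counts.items())]
    heapq.heapify(heap)

    if len(heap) == 1:  # degenerate case: only one distinct value was sampled
        ((_, _, value),) = heap
        return {value: "0"}

    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # merge the two least frequent nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1

    codewords = {}
    def assign(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse into subtrees
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                # leaf: record this value's codeword
            codewords[node] = prefix
    assign(heap[0][2], "")
    return codewords

# Example: frequent values (e.g. zero) receive short codewords, rare values long ones.
sampled = Counter({0x00000000: 6000, 0xFFFFFFFF: 1500, 0x00000001: 900, 0xDEADBEEF: 40})
print(build_huffman_codewords(sampled))
```

Because value locality is stable over time and across applications, a routine like this would only need to run occasionally, which is why its cost can be hidden in software while the hardware focuses on fast decompression.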