Journal of Parallel and Distributed Computing, 2018
Highlights
• A new content selection mechanism for Shared Last-Level Caches (SLLC) in chip multiprocessor systems is proposed.
• The mechanism leverages the reuse locality embedded in the SLLC request stream.
• By adding a Reuse Detector (ReD) between each L2 cache and the SLLC, the mechanism discovers useless blocks evicted from L2 and bypasses them.
• The ReD mechanism is designed to overcome, as far as possible, problems affecting previous state-of-the-art proposals, such as low accuracy, a reduced visibility window, and detector thrashing.
ACM Transactions on Architecture and Code Optimization, 2013
Optimization of the replacement policy used for Shared Last-Level Cache (SLLC) management in a Chip-MultiProcessor (CMP) is critical for avoiding off-chip accesses. Temporal locality, while being exploited by first levels of private cache memories, is only slightly exhibited by the stream of references arriving at the SLLC. Thus, traditional replacement algorithms based on recency are bad choices for governing SLLC replacement. Recent proposals involve SLLC replacement policies that attempt to exploit reuse either by segmenting the replacement list or improving the rereference interval prediction. On the other hand, inclusive SLLCs are commonplace in the CMP market, but the interaction between replacement policy and the enforcement of inclusion has barely been discussed. After analyzing that interaction, this article introduces two simple replacement policies exploiting reuse locality and targeting inclusive SLLCs: Least Recently Reused (LRR) and Not Recently Reused (NRR). NRR has t...
Characterization of Apache web server with Specweb2005
Proceedings of the 2007 workshop on MEmory performance DEaling with Applications, systems and architecture - MEDEA '07, 2007
Ana Bosque (DAC, Universitat Politècnica de Catalunya), Pablo Ibañez (DIIS, Universidad de Zaragoza), Víctor Viñals (DIIS, Universidad de Zaragoza) ...
SPEC CPU is one of the most common benchmark suites used in computer architecture research. CPU2017 has recently been released to replace CPU2006. In this paper we present a detailed evaluation of the memory hierarchy performance for both the CPU2006 and single-threaded CPU2017 benchmarks. The experiments were executed on an Intel Xeon Skylake-SP, which is the first Intel processor to implement a mostly non-inclusive last-level cache (LLC). We present a classification of the benchmarks according to their memory pressure and analyze the performance impact of different LLC sizes. We also test all the hardware prefetchers, showing that they improve performance in most of the benchmarks. After comprehensive experimentation, we can highlight the following conclusions: i) almost half of the SPEC CPU benchmarks have very low miss ratios in the second- and third-level caches, even with small LLC sizes and without hardware prefetching; ii) overall, the SPEC CPU2017 benchmarks demand even fewer memory hierarchy resources than the SPEC CPU2006 ones; iii) hardware prefetching is very effective in reducing LLC misses for most benchmarks, even with the smallest LLC size; and iv) from the memory hierarchy standpoint, the methodologies commonly used to select benchmarks or simulation points do not guarantee representative workloads.
Papers by P. Ibáñez