Computer Architecture
Lecture 14: Simulation
(with a Focus on Memory)

Prof. Onur Mutlu
ETH Zürich
Fall 2020
12 November 2020

Simulating (Memory) Systems

Evaluating New Ideas for New (Memory) Architectures

Potential Evaluation Methods
• How do we assess how an idea will affect a target metric X?
• A variety of evaluation methods are available:
  - Theoretical proof
  - Analytical modeling/estimation
  - Simulation (at varying degrees of abstraction and accuracy)
  - Prototyping with a real system (e.g., FPGAs)
  - Real implementation
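
To make the difference between analytical modeling and simulation concrete, the sketch below estimates the mean time a request spends in a single queue (think of a greatly simplified memory-controller request queue) in two ways: with the closed-form M/M/1 result T = 1/(mu - lam), and with a small simulation driven by the Lindley recursion. The M/M/1 assumptions (Poisson arrivals, exponential service times, one server) and the parameter values are illustrative choices, not something taken from the lecture.

```python
import random


def mm1_time_in_system_analytic(lam: float, mu: float) -> float:
    """Closed-form M/M/1 mean time in system, T = 1 / (mu - lam); needs lam < mu."""
    if lam >= mu:
        raise ValueError("queue is unstable when arrival rate >= service rate")
    return 1.0 / (mu - lam)


def mm1_time_in_system_simulated(lam: float, mu: float,
                                 n_requests: int = 200_000, seed: int = 1) -> float:
    """Estimate the same quantity by simulating requests one after another."""
    rng = random.Random(seed)
    wait = 0.0   # queueing delay of the current request
    total = 0.0  # sum of (queueing delay + service time) over all requests
    for _ in range(n_requests):
        service = rng.expovariate(mu)
        total += wait + service
        gap = rng.expovariate(lam)  # time until the next request arrives
        # Lindley recursion: the next request waits for any leftover backlog
        wait = max(0.0, wait + service - gap)
    return total / n_requests


if __name__ == "__main__":
    lam, mu = 0.5, 1.0  # hypothetical: 50% utilization
    print("analytical:", mm1_time_in_system_analytic(lam, mu))
    print("simulated: ", mm1_time_in_system_simulated(lam, mu))
```

The analytical model answers instantly but only under its assumptions; the simulation should land close to the same value here, and unlike the formula it can be extended (for example, with bank conflicts or priorities) at the cost of run time.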

The Difficulty in Architectural Evaluation
• The answer is usually workload dependent
  - E.g., think caching
  - E.g., think pipelining
  - E.g., think any idea we talked about (RAIDR, memory scheduling, …)
• Workloads change
• System has many design choices and parameters
  - Architect needs to decide many ideas and many parameters for a design
  - Not easy to evaluate all possible combinations!
• System parameters may change
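
The combinatorial problem in the bullets above is easy to quantify. The sketch below counts a full-factorial design space over a few of the parameter types named in this lecture; the specific values, the scheduler placeholders `policy_A`/`policy_B`, the workload count, and the per-run simulation time are all hypothetical.

```python
import itertools
import math

# Hypothetical parameter values, chosen only to illustrate the growth
DESIGN_SPACE = {
    "cache_size_kb": [256, 512, 1024, 2048, 4096],
    "associativity": [4, 8, 16],
    "block_size_bytes": [32, 64, 128],
    "mem_scheduler": ["FCFS", "FR-FCFS", "policy_A", "policy_B"],
    "rs_entries": [16, 32, 64, 96],
    "lsq_entries": [32, 64],
}


def count_configs(space: dict) -> int:
    """Number of configurations in a full-factorial sweep."""
    return math.prod(len(values) for values in space.values())


def all_configs(space: dict):
    """Yield every configuration as a {parameter: value} dict."""
    keys = list(space)
    for values in itertools.product(*space.values()):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    n_configs = count_configs(DESIGN_SPACE)
    n_workloads = 100    # hypothetical
    hours_per_run = 2.0  # hypothetical detailed-simulation time per run
    total_hours = n_configs * n_workloads * hours_per_run
    print(f"{n_configs} configs x {n_workloads} workloads = "
          f"{n_configs * n_workloads} runs, ~{total_hours / (24 * 365):.1f} machine-years")
```

With these placeholder numbers the sweep comes to 1,440 configurations, 144,000 runs, and roughly 33 machine-years of detailed simulation, before adding a single new idea to the list.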

Simulation: The Field of Dreams

Dreaming and Reality
• An architect is in part a dreamer, a creator
• Simulation is a key tool of the architect
  - Allows the evaluation & understanding of non-existent systems
• Simulation enables
  - The exploration of many dreams
  - A reality check of the dreams
  - Deciding which dream is better
• Simulation also enables
  - The ability to fool yourself with false dreams

Why High-Level Simulation?
• Problem: RTL simulation is intractable for design space exploration → too time-consuming to design and evaluate
  - Especially over a large number of workloads
  - Especially if you want to predict the performance of a good chunk of a workload on a particular design
  - Especially if you want to consider many design choices
    - Cache size, associativity, block size, algorithms
    - Memory control and scheduling algorithms
    - In-order vs. out-of-order execution
    - Reservation station sizes, ld/st queue size, register file size, …
    - …
• Goal: Explore design choices quickly to see their impact on the workloads we are designing the platform for
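
As a minimal sketch of what "high-level" can mean, the model below tracks only hits and misses of a set-associative LRU cache (no timing, no coherence, no pipeline) and sweeps cache size over two synthetic workloads. The trace generators `looping` and `streaming`, the sizes, and the associativity are hypothetical; the point is that the same sweep gives very different answers per workload.

```python
from collections import OrderedDict


class LRUCache:
    """Set-associative cache with LRU replacement; counts hits only (no timing)."""

    def __init__(self, size_bytes: int, assoc: int, block_bytes: int):
        self.assoc = assoc
        self.block_bytes = block_bytes
        self.num_sets = size_bytes // (assoc * block_bytes)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.accesses = 0

    def access(self, addr: int) -> None:
        block = addr // self.block_bytes
        cache_set = self.sets[block % self.num_sets]
        tag = block // self.num_sets
        self.accesses += 1
        if tag in cache_set:
            cache_set.move_to_end(tag)        # now most recently used
            self.hits += 1
        else:
            if len(cache_set) >= self.assoc:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = True

    def hit_rate(self) -> float:
        return self.hits / self.accesses if self.accesses else 0.0


def looping(working_set_bytes: int, passes: int, stride: int = 64):
    """Hypothetical workload: sweeps the same array several times."""
    for _ in range(passes):
        yield from range(0, working_set_bytes, stride)


def streaming(total_bytes: int, stride: int = 64):
    """Hypothetical workload: touches every block once, with no reuse."""
    yield from range(0, total_bytes, stride)


WORKLOADS = {
    "loop_256KB": lambda: looping(256 * 1024, passes=4),
    "stream_4MB": lambda: streaming(4 * 1024 * 1024),
}


def sweep(workloads: dict, sizes_kb, assoc: int = 8, block_bytes: int = 64) -> dict:
    """Hit rate for every (workload, cache size) pair."""
    results = {}
    for name, make_trace in workloads.items():
        for kb in sizes_kb:
            cache = LRUCache(kb * 1024, assoc, block_bytes)
            for addr in make_trace():
                cache.access(addr)
            results[(name, kb)] = cache.hit_rate()
    return results


if __name__ == "__main__":
    for (name, kb), rate in sweep(WORKLOADS, [128, 256, 512]).items():
        print(f"{name:>11} @ {kb:4d} KB: hit rate {rate:.2f}")
```

With these synthetic traces, the 256KB loop gets no hits in a 128KB cache (LRU thrashes) and a 0.75 hit rate once the cache holds the whole array (only the first of four passes misses), while the streaming trace gets no hits at any size: a bigger cache is worth a lot to one workload and nothing to the other.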

Different Goals in Simulation
• Explore the design space quickly and see what you want to
  - potentially implement in a next-generation platform
  - propose as the next big idea to advance the state of the art
  - the goal is mainly to see relative effects of design decisions
• Match the behavior of an existing system so that you can
  - debug and verify it at cycle-level accuracy
  - propose small tweaks to the design that can make a difference in performance or energy
  - the goal is very high accuracy
• Other goals in-between:
  - Refine the explored design space without going into a full detailed, cycle-accurate design
  - Gain confidence in your design decisions made by higher-level design space exploration

Tradeoffs in Simulation
• Three metrics to evaluate a simulator
  - Speed
  - Flexibility
  - Accuracy
• Speed: How fast does the simulator run (xIPS, xCPS, slowdown)?
• Flexibility: How quickly can one modify the simulator to evaluate different algorithms and design choices?
• Accuracy: How accurate are the performance (energy) numbers the simulator generates vs. a real design (simulation error)?
• The relative importance of these metrics varies depending on where you are in the design process
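
The speed and accuracy metrics above reduce to simple arithmetic. The sketch below computes slowdown, wall-clock time for a fixed instruction budget, and relative simulation error; the target machine, both simulator rates, and the IPC values are hypothetical placeholders.

```python
def slowdown(target_ips: float, simulator_ips: float) -> float:
    """How many times slower the simulator runs than the machine it models."""
    return target_ips / simulator_ips


def simulation_hours(instructions: float, simulator_ips: float) -> float:
    """Wall-clock hours needed to simulate a given number of instructions."""
    return instructions / simulator_ips / 3600.0


def simulation_error(simulated: float, measured: float) -> float:
    """Relative error of a simulated metric (e.g., IPC) vs. real hardware."""
    return abs(simulated - measured) / measured


if __name__ == "__main__":
    # Hypothetical numbers for illustration only
    target_ips = 3e9 * 1.5  # 3 GHz core sustaining IPC 1.5
    detailed_ips = 300e3    # detailed simulator at 300 KIPS
    fast_ips = 30e6         # abstract simulator at 30 MIPS
    for name, ips in [("detailed", detailed_ips), ("fast", fast_ips)]:
        print(f"{name}: {slowdown(target_ips, ips):,.0f}x slowdown, "
              f"{simulation_hours(10e9, ips):.2f} h for 10B instructions")
    print(f"error: {simulation_error(1.38, 1.50):.1%}")
```

With these placeholder rates, the detailed simulator needs about 9.3 hours per 10 billion instructions and the abstract one about 5.6 minutes; whether that 100x gain in exploration throughput is worth the accuracy it may cost depends on where you are in the design process.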

Trading Off Speed, Flexibility, Accuracy
• Speed & flexibility affect:
  - How quickly you can make design tradeoffs
• Accuracy affects:
  - How good your design tradeoffs may end up being
  - How fast you can build your simulator (simulator design time)
• Flexibility also affects:
  - How much human effort you need to spend modifying the simulator
• You can trade off between the three to achieve design exploration and decision goals

High-Level Simulation
• Key Idea: Raise the abstraction level of modeling to give up some accuracy to enable speed & flexibility (and quick simulator design)
• Advantage
  + Can still make the right tradeoffs, and can do it quickly
  + All you need is to model the key high-level factors; you can omit corner-case conditions
  + All you need is to get the "relative trends" accurately, not exact performance numbers
• Disadvantage
  -- Opens up the possibility of potentially wrong decisions
  -- How do you ensure you get the "relative trends" accurately?
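
One way to probe the "relative trends" question is to run a small set of designs on both the fast model and a more detailed reference, then count how many design pairs the two models order the same way (a Kendall-tau style pairwise agreement). The design names and speedup numbers below are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(fast: dict, reference: dict) -> float:
    """Fraction of design pairs that both models rank in the same order.

    A pair counts as agreeing only if both models show a strict difference
    in the same direction; ties in either model count as disagreement.
    """
    designs = sorted(fast)
    agree = total = 0
    for a, b in combinations(designs, 2):
        total += 1
        if (fast[a] - fast[b]) * (reference[a] - reference[b]) > 0:
            agree += 1
    return agree / total


def best_design(results: dict) -> str:
    return max(results, key=results.get)


# Hypothetical speedups over a common baseline
FAST = {"A": 1.10, "B": 1.25, "C": 1.05, "D": 1.30}
REFERENCE = {"A": 1.06, "B": 1.18, "C": 1.08, "D": 1.21}

if __name__ == "__main__":
    max_abs_err = max(abs(FAST[d] - REFERENCE[d]) for d in FAST)
    print(f"max absolute error: {max_abs_err:.2f}")
    print(f"pairwise agreement: {pairwise_agreement(FAST, REFERENCE):.2f}")
    print("same best design:", best_design(FAST) == best_design(REFERENCE))
```

Here the fast model overestimates speedups by up to 0.09, yet it picks the same best design and orders 5 of 6 pairs correctly; the one flipped pair (A vs. C) is exactly the kind of small-difference decision a high-level model can get wrong.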

Simulation as Progressive Refinement
• High-level models (Abstract, C)
• …
• Medium-level models (Less abstract)
• …
• Low-level models (RTL with everything modeled)
• …
• Real design
• As you refine (go down the above list)
  - Abstraction level reduces
  - Accuracy (hopefully) increases (not necessarily, if not careful)
  - Flexibility reduces; speed likely reduces, except for the real design
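
The same refinement idea applies to a memory model. Below, a fixed-latency model (very abstract) and a per-bank open-row model (somewhat less abstract) estimate total DRAM time for two traces. The row size, bank count, address mapping, and timing values are hypothetical, and both models ignore overlap between requests, so neither is cycle-accurate.

```python
ROW_BYTES = 8192  # hypothetical row size
NUM_BANKS = 8     # hypothetical bank count


def fixed_latency_ns(trace, latency_ns: float = 50.0) -> float:
    """Most abstract: every request costs the same, regardless of address."""
    return latency_ns * len(trace)


def open_row_ns(trace, t_row_hit_ns: float = 15.0, t_row_miss_ns: float = 45.0) -> float:
    """Less abstract: each bank keeps one open row; row hits are cheaper."""
    open_row = [None] * NUM_BANKS
    total = 0.0
    for addr in trace:
        row = addr // ROW_BYTES
        bank = row % NUM_BANKS       # hypothetical row-interleaved mapping
        if open_row[bank] == row:
            total += t_row_hit_ns
        else:
            total += t_row_miss_ns   # close the old row, open the new one
            open_row[bank] = row
    return total


SEQUENTIAL = list(range(0, 64 * 1024, 64))          # 1024 cache-block reads
ROW_STRIDED = [i * ROW_BYTES for i in range(1024)]  # every read opens a new row

if __name__ == "__main__":
    for name, trace in [("sequential", SEQUENTIAL), ("row-strided", ROW_STRIDED)]:
        print(f"{name}: fixed {fixed_latency_ns(trace):,.0f} ns, "
              f"open-row {open_row_ns(trace):,.0f} ns")
```

The fixed-latency model charges both traces the same 51,200 ns, while the open-row model separates them (15,600 ns for the sequential trace vs. 46,080 ns for the row-strided one). Any design idea that changes locality would look identical under the first model, which is why the abstraction level has to match the question being asked.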

Making The Best of Architecture
• A good architect is comfortable at all levels of refinement
  - Including the extremes
• A good architect knows when to use what type of simulation
  - And, more generally, what type of evaluation method
• Recall: A variety of evaluation methods are available:
  - Theoretical proof
  - Analytical modeling
  - Simulation (at varying degrees of abstraction and accuracy)
  - Prototyping with a real system (e.g., FPGAs)

An Example Simulator

Ramulator: A Fast and Extensible DRAM Simulator
[IEEE Comp Arch Letters'15]

Ramulator Motivation
• DRAM and memory controller landscape is changing
  - Many new and upcoming standards
  - Many new controller designs
• A fast and easy-to-extend simulator is very much needed

Ramulator
• Provides out-of-the-box support for many DRAM standards:
  - DDR3/4, LPDDR3/4, GDDR5, WIO1/2, HBM, plus new proposals (SALP, AL-DRAM, TLDRAM, RowClone, and SARP)
• ~2.5x faster than the fastest open-source simulator
• Modular and extensible to different standards
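
A common way to make a simulator extensible to new standards is to keep standard-specific parameters in data (or in a per-standard class) and keep the surrounding logic generic. The sketch below shows that separation in miniature; the `DramStandard` record, `register`, and `describe` are hypothetical and are not Ramulator's actual interface (Ramulator itself is written in C++). The bandwidth numbers follow from data rate times bus width for a 64-bit channel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DramStandard:
    """Hypothetical per-standard parameters; a real model needs many more timings."""
    name: str
    data_rate_mtps: int  # mega-transfers per second on the data bus
    bus_width_bits: int  # channel data-bus width
    banks_per_rank: int
    bank_groups: int     # 1 means no bank grouping

    def peak_bandwidth_gbs(self) -> float:
        # bytes/s = transfers/s * bytes per transfer, reported in GB/s
        return self.data_rate_mtps * 1e6 * (self.bus_width_bits // 8) / 1e9


STANDARDS = {}


def register(std: DramStandard) -> None:
    STANDARDS[std.name] = std


def describe(name: str) -> str:
    """Generic code: works for any registered standard without modification."""
    std = STANDARDS[name]
    return (f"{std.name}: {std.banks_per_rank} banks in {std.bank_groups} group(s), "
            f"{std.peak_bandwidth_gbs():.1f} GB/s peak")


register(DramStandard("DDR3-1600", 1600, 64, 8, 1))
register(DramStandard("DDR4-2400", 2400, 64, 16, 4))

if __name__ == "__main__":
    for name in STANDARDS:
        print(describe(name))
```

Supporting another standard then means registering one more record (plus, in a real simulator, its timing constraints and command logic) without touching `describe` or any other generic code.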

Case Study: Comparison of DRAM Standards
[Figure: comparison of DRAM standards across 22 workloads, using a simple CPU model]

Ramulator Paper and Source Code
• Yoongu Kim, Weikun Yang, and Onur Mutlu,
  "Ramulator: A Fast and Extensible DRAM Simulator"
  IEEE Computer Architecture Letters (CAL), March 2015.
  [Source Code]
• Source code is released under the liberal MIT License
  - https://github.com/CMU-SAFARI/ramulator

Bonus Assignment as Part of HW #4
• Review the Ramulator paper
  - Same points as any other BONUS review in HW #4

An Example Study using Ramulator

An Example Study with Ramulator (I)
• Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu,
  "Demystifying Workload–DRAM Interactions: An Experimental Study"
  Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Phoenix, AZ, USA, June 2019.
  [Preliminary arXiv Version]
  [Abstract]
  [Slides (pptx) (pdf)]
  [MemBen Benchmark Suite]
  [Source Code for GPGPUSim-Ramulator]

Why Study Workload–DRAM Interactions?
• Manufacturers are developing many new types of DRAM
  - DRAM limits performance and energy improvements: new types may overcome some limitations
  - Memory systems now serve a very diverse set of applications: we can no longer take a one-size-fits-all approach
• So which DRAM type works best with which application?
  - Difficult to understand intuitively due to the complexity of the interaction
  - Can't be tested methodically on real systems: a new DRAM type needs a new CPU
• We perform a wide-ranging experimental study to uncover the combined behavior of workloads and DRAM types
Modern DRAM Types: Comparison to DDR3

DRAM Type                    Banks per Rank  Bank Groups  3D-Stacked  Low-Power
DDR3                         8
DDR4                         16              yes
GDDR5                        16              yes
HBM (High-Bandwidth Memory)  16                           yes
HMC (Hybrid Memory Cube)     256                          yes
Wide I/O                     4                            yes         yes
Wide I/O 2                   8                            yes         yes
LPDDR3                       8                                        yes

• Bank groups: increased latency; increased area/power
• 3D-stacked DRAM: high bandwidth with Through-Silicon Vias (TSVs); narrower rows, higher latency
[Figure: bank-group organization (bank groups sharing a memory channel) and 3D-stacked DRAM (memory layers stacked on a dedicated logic layer)]

4. Need for Lower Access Latency: Performance
• New DRAM types often increase access latency in order to provide more banks and higher throughput
• Many applications can't make up for the increased latency
  - Especially true of common OS routines (e.g., file I/O, process forking)
[Figure: speedup of DDR4, GDDR5, HBM, and HMC (y-axis from 0.8 to 1.2) for forkbench, shell, bootup, Netperf (TCP_STREAM, TCP_RR, UDP_STREAM, UDP_RR), and IOZone with a 64MB file (Tests 0 to 12)]
• Several applications don't benefit from more parallelism
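
A back-of-the-envelope model shows why more banks do not rescue every application. Assume execution time is compute time plus memory stall time, and that misses overlap up to the smaller of the application's memory-level parallelism (MLP) and the number of banks. This model and all numbers in it are hypothetical simplifications, not the methodology of the study.

```python
def exec_time_ns(compute_ns, misses, latency_ns, app_mlp, banks):
    """Compute time plus stall time, with up to min(MLP, banks) misses overlapped."""
    overlap = min(app_mlp, banks)
    return compute_ns + misses * latency_ns / overlap


def speedup(app, old_dram, new_dram):
    t_old = exec_time_ns(app["compute_ns"], app["misses"],
                         old_dram["latency_ns"], app["mlp"], old_dram["banks"])
    t_new = exec_time_ns(app["compute_ns"], app["misses"],
                         new_dram["latency_ns"], app["mlp"], new_dram["banks"])
    return t_old / t_new


# Hypothetical memories: the newer one trades latency for more banks
OLD_DRAM = {"latency_ns": 50.0, "banks": 8}
NEW_DRAM = {"latency_ns": 60.0, "banks": 16}

# Hypothetical applications with the same miss count but different MLP
LOW_MLP_APP = {"compute_ns": 1e6, "misses": 10_000, "mlp": 1}    # e.g., pointer chasing
HIGH_MLP_APP = {"compute_ns": 1e6, "misses": 10_000, "mlp": 16}  # e.g., independent streams

if __name__ == "__main__":
    for name, app in [("low MLP", LOW_MLP_APP), ("high MLP", HIGH_MLP_APP)]:
        print(f"{name}: speedup {speedup(app, OLD_DRAM, NEW_DRAM):.3f}")
```

With these numbers the low-MLP application slows down to about 0.94x, because it pays the extra latency on every miss and cannot use the extra banks, while the high-MLP application speeds up by about 2.4%.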

Key Takeaways
1. DRAM latency remains a critical bottleneck for many applications
2. Bank parallelism is not fully utilized by a wide variety of our applications
3. Spatial locality continues to provide significant performance benefits if it is exploited by the memory subsystem
4. For some classes of applications, low-…
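
Takeaways 2 and 3 refer to bank-level parallelism and spatial locality. As a rough sketch of how a memory trace can be characterized along both axes, the code below computes the row-buffer hit rate under an open-row policy and, as a simple proxy for bank parallelism, the average number of distinct banks touched per window of consecutive requests. The address mapping, row size, bank count, and window size are hypothetical, and the study's own metrics may be defined differently.

```python
ROW_BYTES = 8192  # hypothetical DRAM row size
NUM_BANKS = 16    # hypothetical number of banks


def decode(addr: int):
    """Hypothetical row-interleaved mapping: consecutive rows go to consecutive banks."""
    row = addr // ROW_BYTES
    return row % NUM_BANKS, row  # (bank, global row id)


def row_hit_rate(trace) -> float:
    """Fraction of requests that find their row already open (open-row policy)."""
    open_rows = {}
    hits = 0
    for addr in trace:
        bank, row = decode(addr)
        if open_rows.get(bank) == row:
            hits += 1
        open_rows[bank] = row
    return hits / len(trace)


def mean_banks_per_window(trace, window: int = 8) -> float:
    """Proxy for bank-level parallelism: distinct banks per group of `window` requests."""
    counts = [len({decode(a)[0] for a in trace[i:i + window]})
              for i in range(0, len(trace) - window + 1, window)]
    return sum(counts) / len(counts)


SEQUENTIAL = list(range(0, 64 * 1024, 64))
ROW_STRIDED = [i * ROW_BYTES for i in range(1024)]

if __name__ == "__main__":
    for name, trace in [("sequential", SEQUENTIAL), ("row-strided", ROW_STRIDED)]:
        print(f"{name}: row-hit rate {row_hit_rate(trace):.3f}, "
              f"banks/window {mean_banks_per_window(trace):.1f}")
```

The sequential trace is almost all row hits (1016 of 1024) but touches one bank per window, while the row-strided trace gets no row hits but spreads every window across 8 banks; real workloads fall somewhere in between, which is why both properties matter.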

Conclusion
• Manufacturers are developing many new types of DRAM
  - DRAM limits performance and energy improvements: new types may overcome some limitations
  - Memory systems now serve a very diverse set of applications: we can no longer take a one-size-fits-all approach
  - Difficult to intuitively determine which DRAM–workload pair works best
• We perform a wide-ranging experimental study to uncover the combined behavior of workloads and DRAM types
  - 115 prevalent/emerging applications and multiprogrammed workloads
• Open-source tools: https://github.com/CMU-SAFARI/ramulator
• Full paper: https://arxiv.org/pdf/1902.07609

For More Information…
• Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu,
  "Demystifying Workload–DRAM Interactions: An Experimental Study"
  Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Phoenix, AZ, USA, June 2019.
  [Preliminary arXiv Version]
  [Abstract]
  [Slides (pptx) (pdf)]
  [MemBen Benchmark Suite]
  [Source Code for GPGPUSim-Ramulator]

Ramulator for Processing in Memory

Simulation Infrastructures for PIM
• Ramulator extended for PIM
  - Flexible and extensible DRAM simulator
  - Can model many different memory standards and proposals
  - Kim+, "Ramulator: A Fast and Extensible DRAM Simulator", IEEE CAL 2015.
  - https://github.com/CMU-SAFARI/ramulator-pim
  - https://github.com/CMU-SAFARI/ramulator
  - [Source Code for Ramulator-PIM]

Ramulator for PIM
• Gagandeep Singh, Juan Gomez-Luna, Giovanni Mariani, Geraldo F. Oliveira, Stefano Corda, Sander Stuijk, Onur Mutlu, and Henk Corporaal,
  "NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning"
  Proceedings of the 56th Design Automation Conference (DAC), Las Vegas, NV, USA, June 2019.
  [Slides (pptx) (pdf)]
  [Poster (pptx) (pdf)]
  [Source Code for Ramulator-PIM]

What We Discussed Is Applicable to Other Types of Simulation

Case Study: COVID-19 Spread Modeling and Prediction

COVID-19 Measures: Evaluation Methods
• How do we assess how an idea will affect a target metric X?
• A variety of evaluation methods are available:
  - Theoretical proof
  - Analytical modeling/estimation
  - Simulation (at varying degrees of abstraction and accuracy)
  - Prototyping with a real system (e.g., FPGAs)
  - Real implementation

Simulating COVID-19 Spread
• An architect is in part a dreamer, a creator
• Simulation is a key tool of the architect
  - Allows the evaluation & understanding of non-existent systems
• Simulation enables
  - The exploration of many dreams
  - A reality check of the dreams
  - Deciding which dream is better
• Simulation also enables
  - The ability to fool yourself with false dreams
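
The lecture does not prescribe a particular epidemic model. As one example of a deliberately high-level simulator, the sketch below implements a textbook discrete-time SIR (susceptible, infected, recovered) model and compares two hypothetical scenarios that differ only in the transmission rate `beta`, roughly the way a measure that reduces contacts might. All parameters are illustrative, and like any high-level model it is better suited to relative trends than to exact predictions.

```python
def simulate_sir(population: float, initial_infected: float,
                 beta: float, gamma: float, days: int):
    """Discrete-time SIR model with one update per day.

    beta:  new infections per infected person per day in a fully susceptible population
    gamma: fraction of infected people who recover each day
    """
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def peak_infected(history):
    """Return (day, number infected) at the peak of the epidemic curve."""
    day, (_, i, _) = max(enumerate(history), key=lambda item: item[1][1])
    return day, i


if __name__ == "__main__":
    N, I0, GAMMA, DAYS = 1_000_000, 10, 0.1, 730  # hypothetical
    for label, beta in [("baseline", 0.30), ("with measure", 0.15)]:
        day, peak = peak_infected(simulate_sir(N, I0, beta, GAMMA, DAYS))
        print(f"{label}: R0 = {beta / GAMMA:.1f}, peak of {peak:,.0f} infected on day {day}")
```

Halving `beta` halves the basic reproduction number R0 = beta/gamma in this model, which both lowers and delays the peak. Whether a real measure actually halves `beta` is an input assumption the model cannot check by itself, the epidemic analogue of fooling yourself with false dreams.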

Goals in Simulating COVID-19 Spread
• Explore the design space quickly and see what you want to
  - potentially implement in a next-generation platform
  - propose as the next big idea to advance the state of the art
  - the goal is mainly to see relative effects of design decisions
• Match the behavior of an existing system so that you can
  - debug and verify it at cycle-level accuracy
  - propose small tweaks to the design that can make a difference in performance or energy
  - the goal is very high accuracy
• Other goals in-between:
  - Refine the explored design space without going into a full detailed, cycle-accurate design
  - Gain confidence in your design decisions made by higher-level design space exploration

Tradeoffs in Simulation
• Three metrics to evaluate a simulator
  - Speed
  - Flexibility
  - Accuracy
• Speed: How fast does the simulator run (xIPS, xCPS, slowdown)?
• Flexibility: How quickly can one modify the simulator to evaluate different algorithms and design choices?
• Accuracy: How accurate are the performance (energy) numbers the simulator generates vs. a real design (simulation error)?
• The relative importance of these metrics varies depending on where you are in the design process

Trading Off Speed, Flexibility, Accuracy
• Speed & flexibility affect:
  - How quickly you can make design tradeoffs
• Accuracy affects:
  - How good your design tradeoffs may end up being
  - How fast you can build your simulator (simulator design time)
• Flexibility also affects:
  - How much human effort you need to spend modifying the simulator
• You can trade off between the three to achieve design exploration and decision goals

High-Level Simulation
• Key Idea: Raise the abstraction level of modeling to give up some accuracy to enable speed & flexibility (and quick simulator design)
• Advantage
  + Can still make the right tradeoffs, and can do it quickly
  + All you need is to model the key high-level factors; you can omit corner-case conditions
  + All you need is to get the "relative trends" accurately, not exact performance numbers
• Disadvantage
  -- Opens up the possibility of potentially wrong decisions
  -- How do you ensure you get the "relative trends" accurately?

Simulation as Progressive Refinement
• High-level models (Abstract, C)
• …
• Medium-level models (Less abstract)
• …
• Low-level models (RTL with everything modeled)
• …
• Real design
• As you refine (go down the above list)
  - Abstraction level reduces
  - Accuracy (hopefully) increases (not necessarily, if not careful)
  - Flexibility reduces; speed likely reduces, except for the real design

Making The Best of Architecture
• A good architect is comfortable at all levels of refinement
  - Including the extremes
• A good architect knows when to use what type of simulation
  - And, more generally, what type of evaluation method
• Recall: A variety of evaluation methods are available:
  - Theoretical proof
  - Analytical modeling
  - Simulation (at varying degrees of abstraction and accuracy)
  - Prototyping with a real system (e.g., FPGAs)