Computer Architecture

Lecture 14: Simulation

(with a Focus on Memory)

Prof. Onur Mutlu

ETH Zürich
Fall 2020
12 November 2020

Simulating (Memory) Systems

Evaluating New Ideas for New (Memory) Architectures

Potential Evaluation Methods
• How do we assess how an idea will affect a target metric X?
• A variety of evaluation methods are available:
  - Theoretical proof
  - Analytical modeling/estimation
  - Simulation (at varying degrees of abstraction and accuracy)
  - Prototyping with a real system (e.g., FPGAs)
  - Real implementation
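
Analytical modeling and simulation are often combined. As a minimal sketch (the cache geometry, latencies, and trace below are hypothetical and do not come from the lecture), a tiny trace-driven simulator of a set-associative LRU cache measures the miss rate, and the textbook analytical formula AMAT = hit time + miss rate × miss penalty turns that miss rate into an average memory access time.

```python
from collections import OrderedDict

# Hypothetical latencies, chosen only for illustration.
HIT_TIME = 1        # cycles for a cache hit
MISS_PENALTY = 100  # additional cycles for a miss


def amat_analytical(miss_rate: float) -> float:
    """Analytical model: AMAT = hit time + miss rate * miss penalty."""
    return HIT_TIME + miss_rate * MISS_PENALTY


def simulate_miss_rate(trace, num_sets: int, ways: int, block_bytes: int) -> float:
    """Trace-driven simulation of a set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // block_bytes
        cache_set = sets[block % num_sets]
        tag = block // num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)          # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)   # evict the least recently used tag
            cache_set[tag] = True
    return misses / len(trace)


# Toy trace: stream twice over a 4 KiB array in 8-byte steps.
trace = [a for _ in range(2) for a in range(0, 4096, 8)]

# Simulation measures the miss rate; the analytical model converts it to AMAT.
mr = simulate_miss_rate(trace, num_sets=64, ways=2, block_bytes=64)
print(f"miss rate = {mr:.4f}, AMAT = {amat_analytical(mr):.2f} cycles")
```

For this streaming trace the 64-block array fits in the 8 KiB cache, so only the 64 first-pass (compulsory) misses occur out of 1,024 accesses: a 6.25% miss rate and an AMAT of 7.25 cycles under the assumed latencies.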

The Difficulty in Architectural Evaluation
• The answer is usually workload dependent
  - E.g., think caching
  - E.g., think pipelining
  - E.g., think any idea we talked about (RAIDR, Mem. Sched., …)
• Workloads change
• System has many design choices and parameters
  - Architect needs to decide many ideas and many parameters for a design
  - Not easy to evaluate all possible combinations!
• System parameters may change
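
A quick count shows why evaluating every combination is hard. The parameter names and values below are hypothetical, loosely following the design choices listed in this lecture, and the one-hour cost per simulation run is an assumption.

```python
from itertools import product

# Hypothetical design parameters (illustrative values, not from the lecture).
design_space = {
    "cache_size_kib": [256, 512, 1024, 2048, 4096],
    "associativity": [1, 2, 4, 8, 16],
    "block_bytes": [32, 64, 128],
    "mem_scheduler": ["FCFS", "FR-FCFS"],
    "core": ["in-order", "out-of-order"],
    "register_file_entries": [64, 128, 192, 256],
}
num_workloads = 22    # assumed size of the workload set
hours_per_run = 1.0   # assumed cost of one detailed simulation run

# Every point of the design space is one tuple in the Cartesian product.
configs = list(product(*design_space.values()))
runs = len(configs) * num_workloads
years = runs * hours_per_run / (24 * 365)
print(f"{len(configs)} configurations x {num_workloads} workloads = {runs} runs "
      f"(~{years:.1f} years on one machine)")
```

Even this small space yields 1,200 configurations and 26,400 runs, roughly three years of single-machine time at one hour per run.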

Simulation: The Field of Dreams

Dreaming and Reality
• An architect is in part a dreamer, a creator
• Simulation is a key tool of the architect
  - Allows the evaluation & understanding of non-existent systems
• Simulation enables
  - The exploration of many dreams
  - A reality check of the dreams
  - Deciding which dream is better
• Simulation also enables
  - The ability to fool yourself with false dreams

Why High-Level Simulation?
• Problem: RTL simulation is intractable for design space exploration → too time consuming to design and evaluate
  - Especially over a large number of workloads
  - Especially if you want to predict the performance of a good chunk of a workload on a particular design
  - Especially if you want to consider many design choices
    - Cache size, associativity, block size, algorithms
    - Memory control and scheduling algorithms
    - In-order vs. out-of-order execution
    - Reservation station sizes, ld/st queue size, register file size, …
    - …
• Goal: Explore design choices quickly to see their impact on the workloads we are designing the …

Different Goals in Simulation
• Explore the design space quickly and see what you want to
  - potentially implement in a next-generation platform
  - propose as the next big idea to advance the state of the art
  - the goal is mainly to see relative effects of design decisions
• Match the behavior of an existing system so that you can
  - debug and verify it at cycle-level accuracy
  - propose small tweaks to the design that can make a difference in performance or energy
  - the goal is very high accuracy
• Other goals in between:
  - Refine the explored design space without going into a full detailed, cycle-accurate design
  - Gain confidence in your design decisions made by …

Tradeoffs in Simulation
• Three metrics to evaluate a simulator
  - Speed
  - Flexibility
  - Accuracy
• Speed: How fast the simulator runs (xIPS, xCPS, slowdown)
  - xIPS/xCPS: simulated instructions/cycles per second, with x a prefix such as K or M; slowdown: simulation time relative to native execution
• Flexibility: How quickly can one modify the simulator to evaluate different algorithms and design choices?
• Accuracy: How accurate are the performance (energy) numbers the simulator generates vs. a real design? (simulation error)
• The relative importance of these metrics varies depending on where you are in the design process
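
The speed metrics above are simple ratios. A minimal sketch, assuming one hypothetical simulation run (the instruction count, cycle count, and timings are made-up illustrative values):

```python
def simulation_speed(instructions: int, cycles: int, wall_seconds: float,
                     native_seconds: float) -> dict:
    """Report common simulator speed metrics.

    KIPS: thousands of simulated instructions per wall-clock second.
    KCPS: thousands of simulated cycles per wall-clock second.
    slowdown: simulation wall time / native execution time of the same program.
    """
    return {
        "KIPS": instructions / wall_seconds / 1e3,
        "KCPS": cycles / wall_seconds / 1e3,
        "slowdown": wall_seconds / native_seconds,
    }


# Hypothetical run: 1 billion instructions in 2 billion simulated cycles (IPC = 0.5),
# simulated in 2000 s; the same program runs natively in 0.5 s.
m = simulation_speed(1_000_000_000, 2_000_000_000, 2000.0, 0.5)
print(m)  # {'KIPS': 500.0, 'KCPS': 1000.0, 'slowdown': 4000.0}
```

At 500 KIPS, simulating 10 billion instructions would take about 20,000 seconds (roughly 5.6 hours), so a slowdown of several thousand times directly limits how many design points and workloads can be evaluated.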

Trading Off Speed, Flexibility, Accuracy
• Speed & flexibility affect:
  - How quickly you can make design tradeoffs
• Accuracy affects:
  - How good your design tradeoffs may end up being
  - How fast you can build your simulator (simulator design time)
• Flexibility also affects:
  - How much human effort you need to spend modifying the simulator
• You can trade off between the three to achieve design exploration and decision goals

High-Level Simulation
• Key Idea: Raise the abstraction level of modeling to give up some accuracy to enable speed & flexibility (and quick simulator design)
• Advantage
  + Can still make the right tradeoffs, and can do it quickly
  + All you need is to model the key high-level factors; you can omit corner-case conditions
  + All you need is to get the “relative trends” accurately, not exact performance numbers
• Disadvantage
  -- Opens up the possibility of potentially wrong decisions
  -- How do you ensure you get the “relative trends” accurately?
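
One way to check whether a high-level simulator preserves the “relative trends” is to compare how it ranks design options against a more detailed reference model. The sketch below uses the Kendall rank correlation coefficient (tau-a) for this; the choice of metric and the speedup numbers are assumptions for illustration, not a method prescribed by the lecture.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a): +1 if both lists rank the design
    points identically, -1 if one ranking is the exact reverse of the other."""
    assert len(xs) == len(ys) and len(xs) > 1
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical speedups of four design options (A, B, C, D) from two simulators.
fast_model = [1.10, 1.25, 0.95, 1.40]  # high-level simulator
detailed = [1.04, 1.12, 0.99, 1.21]    # detailed (slower) simulator

print(f"tau = {kendall_tau(fast_model, detailed):.2f}")  # tau = 1.00
```

Here the fast model overestimates the speedups, yet tau is 1.0 because both models order the four options identically; a tau well below 1 would warn that the fast model could lead to wrong design decisions.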

Simulation as Progressive Refinement
• High-level models (Abstract, C)
• …
• Medium-level models (Less abstract)
• …
• Low-level models (RTL with everything modeled)
• …
• Real design
• As you refine (go down the above list)
  - Abstraction level reduces
  - Accuracy (hopefully) increases (not necessarily, if not careful)
  - Flexibility reduces; Speed likely reduces except for real design

Making The Best of Architecture
• A good architect is comfortable at all levels of refinement
  - Including the extremes
• A good architect knows when to use what type of simulation
  - And, more generally, what type of evaluation method
• Recall: A variety of evaluation methods are available:
  - Theoretical proof
  - Analytical modeling
  - Simulation (at varying degrees of abstraction and accuracy)
  - Prototyping with a real system (e.g., FPGAs)

An Example Simulator

Ramulator: A Fast and Extensible DRAM Simulator
[IEEE Comp Arch Letters’15]

Ramulator Motivation
• DRAM and Memory Controller landscape is changing
  - Many new and upcoming standards
  - Many new controller designs
• A fast and easy-to-extend simulator is very much needed

Ramulator
• Provides out-of-the-box support for many DRAM standards:
  - DDR3/4, LPDDR3/4, GDDR5, WIO1/2, HBM, plus new proposals (SALP, AL-DRAM, TLDRAM, RowClone, and SARP)
• ~2.5X faster than fastest open-source simulator
• Modular and extensible to different standards
18
Case Study: Comparison of DRAM Standards
[Figure: comparison of DRAM standards across 22 workloads, using a simple CPU model]

Ramulator Paper and Source Code
• Yoongu Kim, Weikun Yang, and Onur Mutlu,
  "Ramulator: A Fast and Extensible DRAM Simulator"
  IEEE Computer Architecture Letters (CAL), March 2015.
  [Source Code]
• Source code is released under the liberal MIT License
  - https://github.com/CMU-SAFARI/ramulator

Bonus Assignment as Part of HW #4
• Review the Ramulator paper
  - Same points as any other BONUS review in HW #4

An Example Study using Ramulator

An Example Study with Ramulator (I)
• Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu,
  "Demystifying Workload–DRAM Interactions: An Experimental Study"
  Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Phoenix, AZ, USA, June 2019.
  [Preliminary arXiv Version]
  [Abstract]
  [Slides (pptx) (pdf)]
  [MemBen Benchmark Suite]
  [Source Code for GPGPUSim-Ramulator]

Why Study Workload–DRAM Interactions?
• Manufacturers are developing many new types of DRAM
  - DRAM limits performance, energy improvements: new types may overcome some limitations
  - Memory systems now serve a very diverse set of applications: can no longer take a one-size-fits-all approach
• So which DRAM type works best with which application?
  - Difficult to understand intuitively due to the complexity of the interaction
  - Can’t be tested methodically on real systems: new type needs a new CPU
• We perform a wide-ranging experimental study to uncover the combined behavior of workloads and DRAM types

Modern DRAM Types: Comparison to DDR3

DRAM Type                      Banks per Rank   Bank Groups   3D-Stacked   Low-Power
DDR3                           8
DDR4                           16               ✓
GDDR5                          16               ✓
HBM (High-Bandwidth Memory)    16               ✓             ✓
HMC (Hybrid Memory Cube)       256                            ✓
Wide I/O                       4                              ✓            ✓
Wide I/O 2                     8                              ✓            ✓
LPDDR3                         8                                           ✓

Slide annotations: bank groups → increased latency; increased area/power; 3D-stacked DRAM: high bandwidth with Through-Silicon Vias (TSVs); HMC: narrower rows, higher latency
[Diagram: 3D-stacked DRAM with memory layers on a dedicated logic layer, attached to the memory channel]

4. Need for Lower Access Latency: Performance
• New DRAM types often increase access latency in order to provide more banks, higher throughput
• Many applications can’t make up for the increased latency
  - Especially true of common OS routines (e.g., file I/O, process forking)
[Figure: speedup (y-axis, 0.8 to 1.2) of DDR4, GDDR5, HBM, and HMC for forkbench, shell, bootup, Netperf (TCP_STREAM, TCP_RR, UDP_STREAM, UDP_RR), and IOZone with a 64MB file (Tests 0 to 12)]
• Several applications don’t benefit from more parallelism
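
A toy calculation illustrates why more banks do not help every application. It assumes a hypothetical baseline with 8 banks and a hypothetical newer type with 16 banks and 20% higher access latency, and it ignores bus, refresh, and other timing constraints; it is not the methodology of the study above.

```python
import math


def service_time(n_requests: int, banks: int, latency: float, dependent: bool) -> float:
    """Toy model of the time to serve n_requests reads.

    dependent=True: each address depends on the previous load (e.g., pointer
    chasing), so requests are serialized no matter how many banks exist.
    dependent=False: requests are independent and spread evenly over banks,
    so up to `banks` of them overlap. Bus, refresh, and timing constraints are ignored.
    """
    if dependent:
        return n_requests * latency
    return math.ceil(n_requests / banks) * latency


# Hypothetical memories: a baseline with 8 banks, and a newer type with
# 16 banks but 20% higher access latency (arbitrary time units).
old = dict(banks=8, latency=50.0)
new = dict(banks=16, latency=60.0)

for dep in (True, False):
    t_old = service_time(64, dependent=dep, **old)
    t_new = service_time(64, dependent=dep, **new)
    print(f"dependent={dep}: speedup of new type = {t_old / t_new:.2f}")
```

Dependent (serialized) requests see only the higher latency, a speedup of 0.83 (i.e., a slowdown), while independent requests that can use the extra banks see a speedup of 1.67.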
Key Takeaways
1. DRAM latency remains a critical bottleneck for many applications
2. Bank parallelism is not fully utilized by a wide variety of our applications
3. Spatial locality continues to provide significant performance benefits if it is exploited by the memory subsystem
4. For some classes of applications, low-…

Conclusion
• Manufacturers are developing many new types of DRAM
  - DRAM limits performance, energy improvements: new types may overcome some limitations
  - Memory systems now serve a very diverse set of applications: can no longer take a one-size-fits-all approach
  - Difficult to intuitively determine which DRAM–workload pair works best
• We perform a wide-ranging experimental study to uncover the combined behavior of workloads and DRAM types
  - 115 prevalent/emerging applications and multiprogrammed workloads
• Open-source tools: https://github.com/CMU-SAFARI/ramulator
• Full paper: https://arxiv.org/pdf/1902.07609

For More Information…
• Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu,
  "Demystifying Workload–DRAM Interactions: An Experimental Study"

Ramulator for Processing in Memory

Simulation Infrastructures for PIM
• Ramulator extended for PIM
  - Flexible and extensible DRAM simulator
  - Can model many different memory standards and proposals
  - Kim+, “Ramulator: A Fast and Extensible DRAM Simulator”, IEEE CAL 2015.
  - https://github.com/CMU-SAFARI/ramulator-pim
  - https://github.com/CMU-SAFARI/ramulator
  - [Source Code for Ramulator-PIM]

Ramulator for PIM
• Gagandeep Singh, Juan Gomez-Luna, Giovanni Mariani, Geraldo F. Oliveira, Stefano Corda, Sander Stuijk, Onur Mutlu, and Henk Corporaal,
  "NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning"
  Proceedings of the 56th Design Automation Conference (DAC), Las Vegas, NV, USA, June 2019.
  [Slides (pptx) (pdf)]
  [Poster (pptx) (pdf)]
  [Source Code for Ramulator-PIM]

What We Discussed Is Applicable to Other Types of Simulation

Case Study: COVID-19 Spread Modeling and Prediction

COVID-19 Measures: Evaluation Methods
• How do we assess how an idea will affect a target metric X?
• A variety of evaluation methods are available:
  - Theoretical proof
  - Analytical modeling/estimation
  - Simulation (at varying degrees of abstraction and accuracy)
  - Prototyping with a real system (e.g., FPGAs)
  - Real implementation

Simulating COVID-19 Spread
• An architect is in part a dreamer, a creator
• Simulation is a key tool of the architect
  - Allows the evaluation & understanding of non-existent systems
• Simulation enables
  - The exploration of many dreams
  - A reality check of the dreams
  - Deciding which dream is better
• Simulation also enables
  - The ability to fool yourself with false dreams

Goals in Simulating COVID-19 Spread
• Explore the design space quickly and see what you want to
  - potentially implement in a next-generation platform
  - propose as the next big idea to advance the state of the art
  - the goal is mainly to see relative effects of design decisions
• Match the behavior of an existing system so that you can
  - debug and verify it at cycle-level accuracy
  - propose small tweaks to the design that can make a difference in performance or energy
  - the goal is very high accuracy
• Other goals in between:
  - Refine the explored design space without going into a full detailed, cycle-accurate design
  - Gain confidence in your design decisions made by …
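
The slides do not specify a model for COVID-19 spread. As an assumption for illustration only, the sketch below uses the classic SIR (susceptible, infected, recovered) compartmental model, a deliberately high-level abstraction, stepped one day at a time. In the spirit of the “relative effects” goal, it compares a baseline with a hypothetical measure that halves the transmission rate; the parameter values are not calibrated to COVID-19 data.

```python
def simulate_sir(beta: float, gamma: float, days: int,
                 population: float = 1e6, initial_infected: float = 10.0):
    """Discrete-time SIR model with a one-day step.

    beta:  new infections caused per infected person per day (in a fully susceptible population)
    gamma: fraction of infected people who recover per day
    Returns (peak number infected at once, day of that peak).
    """
    s, i, r = population - initial_infected, initial_infected, 0.0
    peak, peak_day = i, 0
    for day in range(1, days + 1):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        if i > peak:
            peak, peak_day = i, day
    return peak, peak_day


# Hypothetical scenarios: baseline vs. a measure that halves the transmission rate.
peak0, day0 = simulate_sir(beta=0.3, gamma=0.1, days=365)
peak1, day1 = simulate_sir(beta=0.15, gamma=0.1, days=365)
print(f"baseline:     peak {peak0:,.0f} infected on day {day0}")
print(f"with measure: peak {peak1:,.0f} infected on day {day1}")
```

The comparison (a lower and later peak with the measure) is the kind of relative trend such a model is meant to expose; the absolute numbers depend strongly on the assumed parameters.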

Tradeoffs in Simulation
• Three metrics to evaluate a simulator
  - Speed
  - Flexibility
  - Accuracy
• Speed: How fast the simulator runs (xIPS, xCPS, …)
• The relative importance of these metrics varies depending on where you are in the design process

Trading Off Speed, Flexibility, Accuracy
• Speed & flexibility affect:
  - How quickly you can make design tradeoffs
• Accuracy affects:
  - How good your design tradeoffs may end up being
  - How fast you can build your simulator (simulator design time)
• Flexibility also affects:
  - How much human effort you need to spend modifying the simulator
• You can trade off between the three to achieve design exploration and decision goals

High-Level Simulation
• Key Idea: Raise the abstraction level of modeling to give up some accuracy to enable speed & flexibility (and quick simulator design)
• Disadvantage
  -- Opens up the possibility of potentially wrong decisions
  -- How do you ensure you get the “relative trends” accurately?

Simulation as Progressive Refinement
• High-level models (Abstract, C)
• …
• Medium-level models (Less abstract)
• …
• Low-level models (RTL with everything modeled)
• …
• Real design
• As you refine (go down the above list)
  - Abstraction level reduces
  - Accuracy (hopefully) increases (not necessarily, if not careful)
  - Flexibility reduces; Speed likely reduces except for real design

Making The Best of Architecture
• A good architect is comfortable at all levels of refinement
  - Including the extremes
• A good architect knows when to use what type of simulation
  - And, more generally, what type of evaluation method
• Recall: A variety of evaluation methods are available:
  - Theoretical proof
  - Analytical modeling
  - Simulation (at varying degrees of abstraction and accuracy)
  - Prototyping with a real system (e.g., FPGAs)
