ECE 6775
High-Level Digital Design Automation
Fall 2024
Hardware Specialization
Announcements
▸First reading assignment
– A. Boutros and V. Betz, “FPGA Architecture:
Principles and Progression”, IEEE CAS-M 2021
– Complete reading before Thursday 9/5
▸Lab 1 and an HLS tool setup guide will be
released soon (by Monday)
1
Recap: Our Interpretation of E-D-A
▸ Exponential in complexity (or Extreme scale)
▸ Diverse: increasing system heterogeneity
▸ Algorithmic: intrinsically computational
2
Significance of EDA: Another Proof
Productivity innovation: reduce custom design (structured synthesis)
[Chart: "# of Customs over Time" — normalized count of custom-designed blocks, showing a >10x reduction over 5 processor generations. Milestone: digital logic in 22nm server-class microprocessors is 99% synthesized and signed off at the gate level, with custom-like results achieved via data-flow alignment*.]
Ruchir Puri, High Performance Microprocessor Design and Synthesis Automation: Challenges and Opportunities, TAU 2013 keynote.
3
*ISPD 2013 Best Paper Award: "Network Flow Based Datapath Bit Slicing", H. Xiang et al.
Agenda
▸Motivation for hardware specialization
– Key driving forces from applications and technology
– Main sources of inefficiency in general-purpose
computing
▸A taxonomy of common specialization
techniques
▸Introduction to fixed-point types
4
A Golden Age of Hardware Specialization
▸ Higher demand for efficient compute acceleration,
especially for machine learning (ML) workloads
▸ Lower barrier to entry, as open-source hardware and
accelerators in the cloud come of age
5
Rising Computational Demands of Emerging
Applications
▸ Deep neural networks (DNNs) require an enormous amount of compute
– Consider ResNet50, a 70-layer model that performs 7.7 billion operations
to classify an image (a relatively small model by today's standards)
[Figure: compute demand of ML models over time (LeNet, decision tree, LSTM, AlexNet, NPLM, ResNet, Minerva, Transformer, AlphaGo, GPT-3, PaLM), growing far faster than the capability of contemporary hardware (Intel 386, Intel Pentium 4, NVIDIA Kepler, Intel Haswell 18-Core, NVIDIA H100, Intel SPR 60-Core)]
6
Figure source: Cornell Zhang Research Group
On a Crash Course with the End of “Cheap”
Technology Scaling
7
Dennard Scaling in a Nutshell
▸ Classical Dennard scaling
– Frequency increases at constant power profiles
– Performance improves “for free”!
Dennard scaling (scale factor S per generation):
– Transistor (trans.) count: S^2
– Capacitance / trans.: 1/S
– Voltage (Vdd): 1/S
– Frequency: S
– Total power: 1 (constant)
Note: dynamic power ∝ C·V^2·F
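Combining the rows with the dynamic-power relation shows why performance came “for free”:
\[ P_{\text{total}} \;\propto\; S^{2} \cdot \frac{C}{S} \cdot \left(\frac{V}{S}\right)^{2} \cdot (S\,F) \;=\; C\,V^{2}F \]
S^2 times as many transistors switch S times faster at the same total power.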
8
End of Dennard Scaling and its Implications
▸ Power limited scaling
– Vth scaling halted due to exponentially increasing leakage power
– VDD scaling nearly stopped as well to maintain performance
Leakage-limited scaling (scale factor S per generation):
– Transistor (trans.) count: S^2
– Capacitance / trans.: 1/S
– Voltage (Vdd): ~1
– Frequency: ~1
– Total power: S
Note: dynamic power ∝ C·V^2·F
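With Vdd and frequency held roughly constant, the same bookkeeping gives power that grows by S every generation:
\[ P_{\text{total}} \;\propto\; S^{2} \cdot \frac{C}{S} \cdot V^{2} \cdot F \;=\; S \cdot C\,V^{2}F \]
which is the power-limited regime behind the “dark silicon” concern below.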
▸ Implication: “Dark silicon”?
– Power limits restrict how much of the chip can be activated simultaneously
– Can no longer activate 100% of the chip without more power
9
Trade-off Between Flexibility and Efficiency
FLEXIBILITY ← CPUs — GPUs — FPGAs — ASICs → EFFICIENCY
[Diagram: a general-purpose CPU shown as a control unit (CU), an arithmetic logic unit (ALU), and registers]
Why are general-purpose
CPUs less energy efficient?
10
CPU Core Architecture
▸ Core = complex control + limited # of compute units + large caches
– Scalar & vector instructions
• Backward-compatible ISA
– Complex control logic: decoding, hazard detection, exception handling, etc.
[Diagram: a single core with a control block, a few ALUs, and a cache]
▸ Mainly optimized to reduce latency of running serial code
– Shallow pipelines (< 30 stages)
– Superscalar, OOO, speculative execution, branch prediction,
prefetching, etc.
– Low throughput, even with multithreading
11
Poll & Discussion
Shallow vs. Deep Pipelining in the context of CPU design
http://pollev.com/ece6775
Sign in or register using your Cornell email
12
Multi-Core Architecture
[Diagram: four cores, each with its own control logic, ALUs, and private L1/L2 cache, all sharing a last-level cache (LLC)]
With four cores, should we expect a 4x speedup
on an arbitrary application?
13
Graphics Processing Unit (GPU)
▸ A GPU has thousands of cores to run many threads in parallel
– Cores are simpler (compared to CPU cores)
• No support for superscalar, OOO, speculative execution, etc.
• ISA is not backward compatible
– Amortize overhead with SIMD + single instruction, multiple threads (SIMT)
▸ Optimized to increase the throughput of data-parallel applications
– Initially targeted graphics code
– Latency tolerant, with many concurrent threads in flight
[Diagram: CPU with a few large cores and a big cache vs. GPU with many small cores]
14
It’s Not Just About Performance: Computing’s
Energy Problem
Reading two 32b words from:
– DRAM: 1.3 nJ (≈65,000x the energy of an integer add)
– Large SRAM (256MB): 58 pJ
– Small SRAM (8KB): 5 pJ (≈250x the energy of an integer add)
Moving two 32b words by:
– 40mm (across a 400mm^2 chip): 77 pJ
– 1mm (local communication): 1.9 pJ
Arithmetic on two 32b words:
– FMA (float fused multiply-add): 1.2 pJ
– IADD (integer add): 0.02 pJ
Data from [1], based on a 14nm process
Data supply far outweighs arithmetic operations in energy cost
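The highlighted ratios follow directly from the table entries:
\[ \frac{1.3\ \text{nJ}}{0.02\ \text{pJ}} = \frac{1300\ \text{pJ}}{0.02\ \text{pJ}} = 65{,}000 \qquad \frac{5\ \text{pJ}}{0.02\ \text{pJ}} = 250 \]
i.e., a single DRAM access costs as much energy as tens of thousands of integer adds.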
15
[1] William Dally and Uzi Vishkin, On the Model of Computation, CACM’2022.
Rough Energy Breakdown for an Instruction
>20pJ 5pJ Control
I-Cache Register Control overheads 32-bit
access file access (clocking, decoding, ALU
pipeline control, ….)
Diagram adapted from W. Qadeer et al., Convolution Engine: Balancing Efficiency & Flexibility in Specialized Computing, ISCA 2013.
16
Principles for Improving Energy Efficiency
Do less work!
– Amortize overhead in control and data supply across
multiple instructions
17
Amortizing the Overhead
A sequence of energy-inefficient instructions: each instruction pays for an I-cache access, register file (RF) access, and control, for just one arithmetic operation
Single instruction, multiple data (SIMD): tens of arithmetic operations share one instruction's I-cache/RF/control overhead
Further specialization (what we achieve using accelerators): hundreds of operations or more share that same overhead
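A rough way to quantify this amortization (a back-of-the-envelope model, where N is the number of arithmetic operations sharing one instruction's overhead):
\[ E_{\text{per op}} \;\approx\; \frac{E_{\text{I-cache}} + E_{\text{RF}} + E_{\text{ctrl}}}{N} \;+\; E_{\text{arith}} \]
As N grows from 1 (scalar) to tens (SIMD) to hundreds or more (accelerators), the overhead term shrinks toward the cost of the arithmetic itself.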
Diagram adapted from W. Qadeer et al., Convolution Engine: Balancing Efficiency & Flexibility in Specialized Computing, ISCA 2013.
18
Principles for Improving Energy Efficiency
Do less work!
– Amortize overhead in control and data supply across
multiple instructions
Do even less work!
– Use smaller (or simpler) data => cheaper operations,
lower storage & communication costs
– Move data locally and directly
• Store data nearby in simpler memory (e.g., scratchpads are
cheaper than cache)
• Wire compute units for direct (or even combinational)
communication when possible
19
Tensor Processing Unit (TPU)
▸ A domain-specific accelerator specialized for deep learning
– Main focus: accelerating matrix multiplication (MatMul) with a systolic array
• Uses CISC-style instructions: a single MatMul instruction may run for thousands of cycles
– TPUv1 does 8-bit integer (INT8) inference; TPUv2 supports a customized floating-
point type (bfloat16) for training
[Diagram: a 2-D grid of processing elements (PEs) — the 256x256 systolic array in Google TPU v1]
20
Source: Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017
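For reference, the computation the MatMul unit performs is an INT8 matrix multiply with wider accumulation; a minimal C++ sketch of that computation (the function name and the plain loops are illustrative — the systolic array evaluates the same multiply-accumulates by streaming operands through the PE grid instead):

#include <cstdint>

const int N = 256;  // matches the 256x256 array dimension above

// C = A x B with 8-bit integer inputs and 32-bit accumulation (TPUv1-style inference)
void matmul_int8(const int8_t A[N][N], const int8_t B[N][N], int32_t C[N][N]) {
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      int32_t acc = 0;
      for (int k = 0; k < N; ++k) {
        acc += (int32_t)A[i][k] * (int32_t)B[k][j];
      }
      C[i][j] = acc;
    }
  }
}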
Common Hardware Specialization Techniques:
A Taxonomy
▸ Custom Compute Units: Use complex instructions to
amortize overhead (e.g., SIMD, “ASIC”-in-an-instruction)
▸ Custom Numeric Types: Trade off accuracy and
efficiency with data types that use smaller bit widths or
simpler arithmetic
▸ Custom Memory Hierarchy: Exploit data access
patterns to reduce energy per memory operation
▸ Custom Communication Architecture: Tailor on-chip
networks to data movement patterns
21
Common Hardware Specialization Techniques:
A Taxonomy
▸ Custom Compute Units: Use complex instructions to
amortize overhead (e.g., SIMD, “ASIC”-in-an-instruction)
▸ Custom Numeric Types: Trade off accuracy and
efficiency with data types that use smaller bit widths or
simpler arithmetic
▸ Custom Memory Hierarchy: Exploit data access
patterns to reduce energy per memory operation
▸ Custom Communication Architecture: Tailor on-chip
networks to data movement patterns
22
Customizing Compute Units: An Intuitive View
[Diagram: datapath of a simple single-cycle CPU — program counter (PC), instruction RAM, instruction decoder, register file (RF), ALU, and data RAM, together with the usual control and mux-select signals]
A simple single-cycle CPU
23
Evaluating a Simple Expression on CPU
Cycle-by-cycle CPU activities: the expression R8 = R9 + (R1*R3 - R2*R4) is evaluated one instruction at a time, each passing through PC update, instruction decode, register file (RF) access, the ALU, and data RAM
R5 <= R1 * R3
R6 <= R2 * R4
R7 <= R5 - R6
R8 <= R9 + R7
24
Source: Adapted from Desh Singh’s talk at HCP’14 workshop
“Unrolling” the Instruction Execution
1. Replicate the CPU hardware: one copy per instruction, laid out in space
Each copy executes a single, fixed instruction, so its fetch and decode logic can be disabled
CPU1: R5 <= R1 * R3
CPU2: R6 <= R2 * R4
CPU3: R7 <= R5 - R6
CPU4: R8 <= R9 + R7
25
Source: Adapted from Desh Singh’s talk at HCP’14 workshop
Removing Unused Logic
2. Remove unused logic: each copy keeps only its register file (RF) and the one functional unit it needs, so the ALU is simplified as well
R5 <= R1 * R3 → RF + multiplier (x)
R6 <= R2 * R4 → RF + multiplier (x)
R7 <= R5 - R6 → RF + subtractor (–)
R8 <= R9 + R7 → RF + adder (+)
26
Source: Adapted from Desh Singh’s talk at HCP’14 workshop
An Application-Specific Compute Unit
3. Wire up the registers and functional units directly:
R1, R3 → multiplier → R5;   R2, R4 → multiplier → R6
R5, R6 → subtractor → R7 (a combinational connection can be used when timing constraints allow, e.g., for R7)
R9, R7 → adder → R8
27
Source: Adapted from Desh Singh’s talk at HCP’14 workshop
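For intuition, the specialized unit evaluates the same expression as the four instructions did, but as a single fixed-function operation; a minimal C-style sketch (variable names simply mirror the register names on the slides):

// The whole expression becomes one "operation": two multipliers, a subtractor,
// and an adder wired together, with no instruction fetch or decode.
int custom_unit(int r1, int r2, int r3, int r4, int r9) {
  int r5 = r1 * r3;   // multiplier
  int r6 = r2 * r4;   // multiplier
  int r7 = r5 - r6;   // subtractor (can be a purely combinational path)
  int r8 = r9 + r7;   // adder
  return r8;
}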
Common Hardware Specialization Techniques:
A Taxonomy
▸ Custom Compute Units: Use complex instructions to
amortize overhead (e.g., SIMD, “ASIC”-in-an-instruction)
▸ Custom Numeric Types: Trade off accuracy and
efficiency with data types that use smaller bit widths or
simpler arithmetic
▸ Custom Memory Hierarchy: Exploit data access
patterns to reduce energy per memory operation
▸ Custom Communication Architecture: Tailor on-chip
networks to data movement patterns
28
Customized Data Types
▸ Using custom numeric types tailored for a given
application/domain improves performance & efficiency
[Figure: bit layouts (sign / exponent / mantissa) of example numeric types]
– Half float (fp16), bfloat16, block floating point (block-fp)
– fixed<9,4>, int4, uint256, …, uint1 (covered in lectures & labs)
29
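As a preview of the types used in the labs, here is a minimal sketch assuming the Vivado/Vitis HLS arbitrary-precision headers, where ap_fixed<W, I> has W total bits with I integer bits, and ap_int<W>/ap_uint<W> are W-bit signed/unsigned integers:

#include <ap_fixed.h>
#include <ap_int.h>

ap_uint<4>     u = 11;     // 4-bit unsigned: 1011 = 11
ap_int<4>      s = -5;     // 4-bit two's complement: 1011 = -5
ap_fixed<9, 4> x = 1.25;   // 9 bits total: 4 integer bits, 5 fractional bits

// Fixed-point multiply maps to cheap integer hardware; the result type is
// widened here to keep all integer bits of the product.
ap_fixed<16, 8> scale(ap_fixed<9, 4> a, ap_fixed<9, 4> b) {
  return a * b;
}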
Binary Representation – Positional Encoding
▸ Unsigned number: the most significant bit (MSB) has a place value (weight) of 2^(n-1); the binary point is implicit
– Weights 2^3 2^2 2^1 2^0: 1 0 1 1 = 11 (unsigned)
▸ Two's complement: the MSB weight is -2^(n-1)
– Weights -2^3 2^2 2^1 2^0: 1 0 1 1 = -5 (two's complement)
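Writing out the two interpretations of the same 4-bit pattern:
\[ 1011_2 = 2^{3} + 2^{1} + 2^{0} = 11 \ \text{(unsigned)} \qquad 1011_2 = -2^{3} + 2^{1} + 2^{0} = -5 \ \text{(two's complement)} \]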
30
Fixed-Point Representation of Fractional Numbers
▸ The positional binary encoding can also represent fractional values,
by using a fixed position of the binary point and place values with
negative exponents
(–) Less convenient to use in software, compared to floating point
(+) Much more efficient in hardware
Integer part (4 bits), fractional part (2 bits); place values 2^3 2^2 2^1 2^0 . 2^-1 2^-2, with the binary point between the two parts
– Unsigned fixed-point number: 1 0 1 1 . 0 1 = 11.25
– Signed (two's complement) fixed-point number: 1 0 1 1 . 0 1 = ??
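The unsigned value works out as follows; the signed interpretation uses the same digits with the MSB weighted -2^3 instead (the slide leaves that value as a question):
\[ 1011.01_2 = 2^{3} + 2^{1} + 2^{0} + 2^{-2} = 8 + 2 + 1 + 0.25 = 11.25 \]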
31
Next Lecture
▸More Hardware Specialization
32
Acknowledgements
▸These slides contain/adapt materials developed
by
– Bill Dally, NVIDIA
– System for AI Education Resource by Microsoft
Research
33