
High Performance Computing Notes Unit-1

The document discusses high-performance computing, focusing on modern processor architecture, including stored-program architecture, cache-based microprocessor architecture, and performance metrics. It highlights the evolution of computing from hard-wired programs to flexible stored-program systems, emphasizing the importance of components like the CPU, memory, and I/O systems. Key concepts such as Moore's Law, pipelining, and multicore processors are also introduced, illustrating advancements in computing technology.

HIGH PERFORMANCE COMPUTING – UNIT-1

Chapter 1: Modern Processor

1.1 Stored-program computer architecture

1.2 General-purpose cache-based microprocessor architecture.


1.2.1 Performance metrics and benchmarks.
1.2.2 Transistors galore: Moore’s Law.
1.2.3 Pipelining
1.2.4 Superscalarity
1.2.5 SIMD

1.3 Memory hierarchies


1.3.1 Cache
1.3.2 Cache mapping
1.3.3 Pre-fetch

1.4 Multicore processors


1.5 Multithreaded processors
1.6 Vector processors
Stored Program Architecture:

Before Stored Program Architecture:

 Early computers like ENIAC used hard-wired programs:


 Programs were not stored in memory.
 Changing a program required manually rewiring the hardware.
 This process was time-consuming and error-prone.

Evolution with EDVAC:

 EDVAC (Electronic Discrete Variable Automatic Computer) was one of the first computers to use stored-program architecture.
 It was proposed in 1945 by John von Neumann, who suggested that both
instructions and data should be stored in the same memory.
 Hence, the architecture is often called the Von Neumann Architecture.

What is Stored Program Architecture?

A computer model where:

 Program instructions (code) and data are both stored in main memory
(RAM).
 The CPU fetches and executes instructions sequentially using a common
bus.
 This model is used by almost all general-purpose computers today.

Based on SISD Model:

 SISD: Single Instruction, Single Data


 A single processor executes one instruction at a time on one data item.
 Describes most traditional serial processors.
 Represents the basic sequential processing model.

Why is Stored Program Architecture Important?

 Flexible programming: Programs can be loaded, modified, and executed easily.
 Automatic execution: No need to rewire hardware to run different programs.
 Efficient use of hardware: One memory for both code and data reduces
complexity.
 Laid the foundation of modern computing.

How is Stored Program Architecture Structured?

Main Components:

CPU (Central Processing Unit):

 Includes ALU, registers, and control unit

Memory:

 Stores data and instructions

I/O System:

 Handles input/output devices

Instruction Cycle:

 Fetch: Get the instruction from memory


 Decode: Identify the operation
 Execute: Perform the operation (e.g., add, load, store)
Von Neumann Bottleneck

 Instructions and data use the same bus for communication with memory.
 Only one access can happen at a time (either data or instruction).
 This causes a bottleneck and slows down performance, especially in high-speed computing.

General-Purpose Cache-Based Microprocessor Architecture

Why the name?

 General purpose because these microprocessors are designed to execute a wide range of applications, including scientific computing, everyday software, and operating systems.
 Cache-based indicates that the design includes multiple levels of cache
memory (like L1, L2) to reduce memory latency and increase speed.
 The term microprocessor refers to a CPU implemented on a single chip.

What is a General-Purpose Cache-Based Microprocessor Architecture?

 It is a hardware architecture for CPUs that:


 Implements the stored-program digital computer model
 Includes arithmetic units (for FP and INT operations), registers,
caches, and control logic
 Executes code using a structured pipeline and execution units
 Though extremely complex, only a small portion of the chip actually
performs computations (INT/FP units); the rest supports data movement and
control.
Components:

 Main Memory
 Memory Interface
 L2 Unified Cache
 L1 Data Cache
 L1 Instruction Cache
 Memory Queue
 INT/FP Queue
 FP Register File
 INT Register File
 Shift/Mask Unit
 INT Operation Unit
 LD: Load (data transfer, memory to register)
 ST: Store (data transfer, register to memory)
 FP mult: floating-point multiply
 FP add: floating-point add
Example:

LOAD R1, [R2 + 8]: Load the value from memory address R2 + 8 into register R1.

Components Used:

 L1 Instruction Cache – fetches the LOAD instruction.


 INT Reg. File – provides the address base (value of R2).
 Memory Queue – queues the load request.
 L1 Data Cache – checks if data is cached.
 L2 Unified Cache / Main Memory – accessed if L1 cache misses.
 LD Unit – performs the actual data fetch.
 INT Reg. File – stores the result in R1.
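The walk-through above can be sketched in code. This is a deliberately simplified model (caches as plain dicts, no cache lines, associativity, or eviction; the addresses and values are invented), but it shows the same path: register file supplies the base address, L1 data cache is checked first, then the unified L2, then main memory, with cache fills on the way back.

```python
# Hypothetical sketch of the path a LOAD takes through the memory
# hierarchy. Caches are modeled as plain dicts; hit/miss handling is
# greatly simplified (no lines, associativity, or eviction policy).

main_memory = {108: 99}        # address 108 holds the value 99
l2_cache = {}                  # unified L2: instructions + data
l1_data_cache = {}             # L1D: data only
registers = {"R1": 0, "R2": 100}

def load(dest, base, offset):
    addr = registers[base] + offset        # INT register file supplies the base
    if addr in l1_data_cache:              # L1 data cache: hit?
        value = l1_data_cache[addr]
    elif addr in l2_cache:                 # L1 miss -> try unified L2
        value = l2_cache[addr]
        l1_data_cache[addr] = value        # fill L1 on the way back
    else:                                  # L2 miss -> go to main memory
        value = main_memory[addr]
        l2_cache[addr] = value             # fill both cache levels
        l1_data_cache[addr] = value
    registers[dest] = value                # result lands in the register file

load("R1", "R2", 8)     # LOAD R1, [R2 + 8]
print(registers["R1"])  # 99
```

A second load of the same address would now hit in the L1 data cache and skip the memory access entirely, which is the whole point of the cache hierarchy.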

Transistors galore: Moore’s Law

 Even before personal computers appeared, computers were used heavily in science, creating constant demand for faster chips.
 Roughly every two years, the number of transistors on a chip doubles.
 More transistors allow more complex logic and therefore better performance.
 Even though fabrication processes have changed dramatically (e.g., from 90 nm down to 5 nm), the doubling trend has largely stayed on track.
 More transistors allowed: better CPUs, more cores, more cache, faster instruction execution.
 In short, Moore’s Law ties the growth in transistor count to growth in performance.

Advanced Techniques Enabled by More Transistors:

 Pipelined Functional Units


 Superscalar Architecture
 Data parallelism through SIMD ( Single Instruction , Multiple Data )
 Out of Order Execution
 Larger caches
 Simplified Instruction Set

Pipelined Functional Units:

 The term “pipelined” means that multiple steps can operate simultaneously on different inputs.
 “Functional units” are hardware blocks inside the CPU, such as adders and multipliers, that perform specific operations.
 So pipelined functional units break a complex operation into smaller stages, allowing different inputs to be processed concurrently at different pipeline stages.
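The throughput benefit can be sketched with simple cycle counts. Assume a functional unit with S stages, each taking one cycle (the numbers are illustrative): unpipelined, each input occupies the whole unit for S cycles; pipelined, after an initial fill of S cycles, one result comes out every cycle.

```python
# Sketch of pipelining's effect on throughput for a unit with S stages,
# each stage taking one cycle. Stage count and input count are examples.

def cycles_unpipelined(num_inputs, stages):
    return num_inputs * stages          # each input occupies the whole unit

def cycles_pipelined(num_inputs, stages):
    return stages + (num_inputs - 1)    # fill latency, then 1 result/cycle

print(cycles_unpipelined(100, 5))  # 500
print(cycles_pipelined(100, 5))    # 104
```

For large input counts the pipelined unit approaches a speedup equal to the number of stages (here, 500 / 104 ≈ 4.8x for a 5-stage pipeline).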
