Instruction-Level Parallelism: Introduction

VLIW and superscalar processors aim to increase instruction-level parallelism (ILP) to improve processor throughput. VLIW exposes more parallelism to software by having the compiler schedule instructions, while superscalar does dynamic scheduling in hardware. VLIW simplifies the processor design by removing complex decode and dispatch logic. It relies on the compiler to schedule instructions for parallel execution on functional units, whereas superscalar identifies parallelism at runtime. Both can achieve ILP but take different approaches through static versus dynamic scheduling.


INTRODUCTION :

Very Long Instruction Word (VLIW) processors [2, 3] are examples of architectures for which the program provides explicit information regarding parallelism. The compiler identifies the parallelism in the program and communicates it to the hardware by specifying which operations are independent of one another. This information is of direct value to the hardware, since it knows with no further checking which operations it can start executing in the same cycle. In this report, we introduce the Explicitly Parallel Instruction Computing (EPIC) style of architecture, an evolution of VLIW which has absorbed many of the best ideas of superscalar processors, albeit in a form adapted to the EPIC philosophy. EPIC is not so much an architecture as it is a philosophy of how to build ILP processors, along with a set of architectural features that support this philosophy. In this sense EPIC is like RISC: it denotes a class of architectures, all of which subscribe to a common architectural philosophy. Just as there are many distinct RISC architectures (Hewlett-Packard's PA-RISC, Silicon Graphics' MIPS and Sun's SPARC), there can be more than one instruction set architecture (ISA) within the EPIC fold.
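To make the idea of explicitly communicated parallelism concrete, the toy sketch below (in Python, using an invented encoding rather than any real EPIC instruction format) marks the end of each group of mutually independent operations with a stop flag; the "hardware" can then start every operation in a group in the same cycle without doing any dependence checking of its own.

    # Sketch of how a compiler could communicate independence explicitly
    # (EPIC-style "stop" markers). The encoding is illustrative only and
    # is NOT the real IA-64 bundle format.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Op:
        text: str            # e.g. "add r1 = r2, r3"
        stop: bool = False   # True => a dependence boundary follows this op

    def issue_groups(ops: List[Op]) -> List[List[Op]]:
        """Hardware view: every op up to and including a 'stop' can start
        in the same cycle, with no further dependence checking needed."""
        groups, current = [], []
        for op in ops:
            current.append(op)
            if op.stop:
                groups.append(current)
                current = []
        if current:
            groups.append(current)
        return groups

    # The compiler has already proven the first three ops independent,
    # and marked the boundary before the dependent fourth op.
    program = [
        Op("add  r1 = r2, r3"),
        Op("ld   r4 = [r5]"),
        Op("fmul f1 = f2, f3", stop=True),   # end of independence group
        Op("sub  r6 = r1, r4"),              # uses r1 and r4 from above
    ]

    for cycle, group in enumerate(issue_groups(program)):
        print(f"cycle {cycle}: " + " | ".join(op.text for op in group))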

A. Instruction-level parallelism
A common design goal for general-purpose processors is to maximize throughput, which may be defined broadly as the amount of work performed in a given time. Average processor throughput is a function of two variables: the average number of clock cycles required to execute an instruction, and the frequency of clock cycles. To increase throughput, then, a designer could increase the clock rate of the architecture, or increase the average instruction-level parallelism (ILP) of the architecture. Modern processor design has focused on executing more instructions in a given number of clock cycles, that is, on increasing ILP. A number of techniques may be used. One technique, pipelining, is particularly popular because it is relatively simple and can be used in conjunction with superscalar and VLIW techniques. All modern CPU architectures are pipelined.
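As a quick illustration of the throughput relation above, the sketch below divides a clock frequency by an average CPI; both figures are invented for the example. Lowering the average CPI through more ILP raises throughput without touching the clock rate.

    # Throughput relation described above:
    #   instructions per second = clock frequency / average CPI.
    # The numbers below are invented purely for illustration.

    clock_hz = 2.0e9   # 2 GHz clock
    avg_cpi  = 1.6     # average cycles per instruction

    throughput = clock_hz / avg_cpi
    print(f"baseline: {throughput / 1e9:.2f} billion instructions/s")

    # More ILP lowers the average CPI (ideally below 1 on a superscalar),
    # which raises throughput at the same clock rate.
    improved_cpi = 0.8
    print(f"with more ILP: {clock_hz / improved_cpi / 1e9:.2f} billion instructions/s")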

B. Pipelining
All instructions are executed in multiple stages. For example, a simple processor may have five stages: first the instruction must be fetched from cache, then it must be decoded, then it must be executed, and any memory referenced by the instruction must be loaded or stored. Finally, the result of the instruction is written to registers. The output from one stage serves as the input to the next stage, forming a pipeline of instruction execution. These stages are frequently independent of each other, so, if separate hardware is used to perform each stage, multiple instructions may be "in flight" at once, with each instruction at a different stage in the pipeline. Ignoring potential problems, the theoretical increase in speed is proportional to the length of the pipeline: longer pipelines mean more simultaneous in-flight instructions and therefore fewer average cycles per instruction. The major potential problem with pipelining is the possibility of hazards. A hazard occurs when an instruction in the pipeline cannot be executed. Hennessy and Patterson identify three types of hazards: structural hazards, where there simply isn't sufficient hardware to execute all parallelizable instructions at once; data hazards, where an instruction depends on the result of a previous instruction; and control hazards, which arise from instructions that change the program counter (i.e., branch instructions). Various techniques exist for managing hazards. The simplest is to stall the pipeline until the instruction causing the hazard has completed.
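The toy model below (the instruction names and the single one-cycle stall are illustrative assumptions, not a real pipeline controller) shows the five stages and the simplest hazard response described above: delaying a dependent instruction by one cycle.

    # Toy illustration of the five-stage pipeline described above
    # (fetch, decode, execute, memory access, write-back). One data hazard
    # is resolved the simplest possible way: stalling the dependent
    # instruction for a cycle. Forwarding and real hazard detection are omitted.

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def schedule(instructions, stall_cycles):
        """Return {name: {stage: cycle}} assuming one issue per cycle.
        stall_cycles[i] = extra cycles instruction i waits before issuing."""
        table, issue = {}, 0
        for i, name in enumerate(instructions):
            issue += stall_cycles.get(i, 0)
            table[name] = {stage: issue + s for s, stage in enumerate(STAGES)}
            issue += 1
        return table

    # "add" depends on the value produced by "load", so (without forwarding)
    # it is stalled one cycle; "store" simply follows in order.
    prog = ["load r1", "add r2,r1", "store r2"]
    for name, stages in schedule(prog, stall_cycles={1: 1}).items():
        print(f"{name:10s} " + "  ".join(f"{st}@{cy}" for st, cy in stages.items()))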

VLIW :

All of the additional hardware a superscalar needs for dynamic scheduling is complex, and contributes to the transistor count of the processor. All other things being equal, more transistors mean more power consumption, more heat, and less on-die space for cache. Thus it seems beneficial to expose more of the architecture's parallelism to the programmer. This way, not only is the architecture simplified, but programmers have more control over the hardware, and can take better advantage of it. VLIW is an architecture designed to help software designers extract more parallelism from their software than would be possible using a traditional RISC design. It is an alternative to the better-known superscalar architectures. VLIW is a lot simpler than superscalar designs, but has not so far been commercially successful. The figure shows a typical VLIW architecture; note the simplified instruction decode and dispatch.
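As a rough picture of what exposing parallelism to the programmer means at the instruction level, the sketch below models a very long instruction word as one slot per functional unit. The three-slot layout (integer ALU, FPU, load/store unit) is invented for illustration and does not correspond to any particular VLIW machine.

    # Minimal sketch of a VLIW instruction word: one slot per functional unit,
    # filled by the compiler. The three-slot layout is an invented example.

    from typing import NamedTuple, Optional

    class LongWord(NamedTuple):
        alu: Optional[str]   # integer operation, or None => NOP in that slot
        fpu: Optional[str]   # floating-point operation
        mem: Optional[str]   # load/store operation

    # The compiler has already decided these operations are independent,
    # so the hardware issues the whole word in one cycle with no dispatch logic.
    word = LongWord(alu="add r1,r2,r3", fpu="fmul f1,f2,f3", mem="ld r4,[r5]")

    for slot, op in word._asdict().items():
        print(f"{slot}: {op if op is not None else 'nop'}")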

A. ILP in VLIW
VLIW and superscalar approach the ILP problem differently. The key difference between the two is where instruction scheduling is performed: in a superscalar architecture, scheduling is performed in hardware (and is called dynamic scheduling, because the schedule of a given piece of code may differ depending on the code path followed), whereas in a VLIW scheduling is performed in software (static scheduling, because the schedule is "built in to the binary" by the compiler or assembly-language programmer).
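The sketch below illustrates static scheduling in the simplest possible terms: a greedy pass that packs operations into words while respecting a given dependence relation. Functional-unit types, latencies, and register allocation are deliberately ignored; the point is only that the packing decision is made ahead of time, in software.

    # Rough sketch of static scheduling as a VLIW compiler might perform it:
    # greedily pack operations into words, never placing an operation in the
    # same word as (or earlier than) one it depends on.

    def pack_words(ops, deps, width=3):
        """ops: names in program order; deps: {op: set of ops whose results it reads}."""
        words, placed = [], {}                  # placed: op -> word index
        for op in ops:
            earliest = max((placed[d] + 1 for d in deps.get(op, ()) if d in placed),
                           default=0)
            w = earliest
            while w < len(words) and len(words[w]) >= width:
                w += 1                          # word already full, try the next one
            while len(words) <= w:
                words.append([])
            words[w].append(op)
            placed[op] = w
        return words

    ops  = ["ld a", "ld b", "add c=a+b", "mul d=c*2", "st d"]
    deps = {"add c=a+b": {"ld a", "ld b"},
            "mul d=c*2": {"add c=a+b"},
            "st d": {"mul d=c*2"}}
    for i, word in enumerate(pack_words(ops, deps)):
        print(f"word {i}: " + " | ".join(word))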

B. Superscalar
Usually, the execution phase of the pipeline takes the longest. On modern hardware, the execution of the instruction may be performed by one of a number of functional units. For example, integer instructions may be executed by the ALU, whereas floating-point operations are performed by the FPU. On a traditional, scalar pipelined architecture, either one or the other of these units will always be idle, depending on the instruction being executed. On a superscalar architecture, instructions may be executed in parallel on multiple functional units. The pipeline is essentially split after instruction issue.
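A minimal sketch of the corresponding dynamic decision is shown below: a two-way issue check that lets two adjacent instructions leave the front end together only if they target different functional units and carry no register dependence. Real superscalar issue logic (register renaming, issue queues, and so on) is of course far more elaborate.

    # Sketch of the decision a two-way superscalar makes at issue time.

    from typing import NamedTuple, FrozenSet

    class Instr(NamedTuple):
        unit: str                 # "ALU" or "FPU"
        writes: FrozenSet[str]
        reads: FrozenSet[str]

    def can_dual_issue(a: Instr, b: Instr) -> bool:
        different_units = a.unit != b.unit
        no_dependence   = not (a.writes & b.reads) and not (a.writes & b.writes)
        return different_units and no_dependence

    add  = Instr("ALU", frozenset({"r1"}), frozenset({"r2", "r3"}))
    fmul = Instr("FPU", frozenset({"f1"}), frozenset({"f2", "f3"}))
    sub  = Instr("ALU", frozenset({"r4"}), frozenset({"r1", "r5"}))  # reads r1 written by add

    print(can_dual_issue(add, fmul))   # True: independent, different units
    print(can_dual_issue(add, sub))    # False: same unit and a data dependence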

C. Interlocking
Another architectural feature present in some RISC and VLIW architectures, but never in superscalars, is the lack of interlocks. In a pipelined processor, it is important to ensure that a stall somewhere in the pipeline won't result in the machine performing incorrectly. This could happen if later stages of the pipeline do not detect the stall, and thus proceed as if the stalled stage had completed. To prevent this, most architectures incorporate interlocks on the pipeline stages. Removing interlocks from the architecture is beneficial, because they complicate the design and can take time to set up, lowering the overall clock rate. However, doing so means that the compiler (or assembly-language programmer) must know details about the timing of pipeline stages for each instruction in the processor, and insert NOPs into the code to ensure correctness. This makes code incredibly hardware-specific. Both the architectures studied in detail below are fully interlocked, though Sun's ill-fated MAJC architecture was not, and relied on fast, universal JIT compilation to solve the hardware problems.
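The sketch below illustrates why a machine without interlocks ties code to pipeline timing. Assuming, purely for illustration, a two-cycle load latency, the compiler must pad dependent instructions with NOPs itself; on an interlocked machine the hardware would insert the equivalent stalls and the same code would remain correct across implementations.

    # Compiler-side NOP insertion for a machine without interlocks.
    # The two-cycle load latency is an invented figure, not any real
    # processor's timing.

    LOAD_DELAY = 2

    def insert_nops(program, writes, reads):
        """Pad dependent instructions so none starts before its operands are ready
        (one instruction issues per cycle, so position in 'out' == issue cycle)."""
        out, ready_at = [], {}          # register -> cycle its value becomes available
        for instr in program:
            while any(ready_at.get(r, 0) > len(out) for r in reads.get(instr, ())):
                out.append("nop")
            out.append(instr)
            for r in writes.get(instr, ()):
                delay = LOAD_DELAY if instr.startswith("ld") else 1
                ready_at[r] = len(out) - 1 + delay
        return out

    program = ["ld r1,[r2]", "add r3,r1,r4"]
    writes  = {"ld r1,[r2]": {"r1"}, "add r3,r1,r4": {"r3"}}
    reads   = {"add r3,r1,r4": {"r1", "r4"}}
    print(insert_nops(program, writes, reads))
    # -> ['ld r1,[r2]', 'nop', 'add r3,r1,r4']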
