COMPUTER ORGANIZATION & ARM MICROCONTROLLERS
21ECT602
Department of Electronics & Communication Engineering
Dr. Ambedkar Institute of Technology
Bengaluru-56
Basic Processing Unit
Prepared by Prof. Anand H. D., Dept. of ECE, Dr. AIT, Bengaluru-56 1
Topics to be covered:
Some Fundamental Concepts
Execution of a Complete Instruction
Multiple Bus Organization
Hard-wired Control
Microprogrammed Control
Basic concepts of pipelining
➢ In this and the next chapter we focus on the processing unit, which executes machine instructions and coordinates the
activities of other units. This unit is often called the Instruction Set Processor (ISP), or simply the processor.
➢ We examine its internal structure and how it performs the tasks of fetching, decoding, and executing instructions of a
program. The processing unit used to be called the central processing unit (CPU).
➢ The term "central" is less appropriate today because many modern computer systems include several processing
units.
➢ The organization of processors has evolved over the years, driven by developments in technology and the need to
provide high performance.
➢ A common strategy in the development of high-performance processors is to make various functional units operate
in parallel as much as possible.
➢ High-performance processors have a pipelined organization where the execution of one instruction is started before
the execution of the preceding instruction is completed.
➢ In another approach, known as superscalar operation, several instructions are fetched and executed at the same
time.
➢ A typical computing task consists of a series of steps specified by a sequence of machine instructions that constitute a
program.
➢ An instruction is executed by carrying out a sequence of more rudimentary operations. These operations and the
means by which they are controlled are the main topic of this chapter.
Some Fundamental Concepts
➢ To execute a program, the processor fetches one instruction at a time and performs the operations specified.
Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
➢ The processor keeps track of the address of the memory location containing the next instruction to be fetched
using the program counter, PC.
➢ After fetching an instruction, the contents of the PC are updated to point to the next instruction in the sequence.
A branch instruction may load a different value into the PC.
➢ Another key register in the processor is the instruction register, IR. Suppose that each instruction comprises 4
bytes, and that it is stored in one memory word. To execute an instruction, the processor has to perform the
following three steps:
1. Fetch the contents of the memory location pointed to by the PC. The contents of this location are interpreted as
an instruction to be executed. Hence, they are loaded into the IR. Symbolically, this can be written as
IR ← [[PC]]
2. Assuming that the memory is byte addressable, increment the contents of the PC by 4, that is,
PC ← [PC] + 4
3. Carry out the actions specified by the instruction in the IR.
➢ In cases where an instruction occupies more than one word, steps 1 and 2 must be repeated as many times as
necessary to fetch the complete instruction. These two steps are usually referred to as the fetch phase; step 3
constitutes the execution phase.
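The fetch and execution phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of any real processor: the memory is a hypothetical dictionary keyed by byte address, each 4-byte word holds one instruction, and the instruction names `NOP`, `BRANCH` and `HALT` are invented for the example.

```python
WORD_SIZE = 4  # bytes per instruction, as assumed in the text


def run(memory, start_address=0, max_steps=100):
    """Fetch and execute instructions until HALT; return the fetch addresses."""
    pc = start_address
    trace = []
    for _ in range(max_steps):
        ir = memory[pc]          # step 1: IR <- [[PC]]
        trace.append(pc)
        pc = pc + WORD_SIZE      # step 2: PC <- [PC] + 4
        opcode, operand = ir     # step 3: carry out the instruction in IR
        if opcode == "HALT":
            break
        if opcode == "BRANCH":
            pc = operand         # a branch loads a different value into PC
        # "NOP" leaves the PC pointing at the next word
    return trace


# Hypothetical program stored at byte addresses 0, 4, 8 and 12.
program = {
    0: ("NOP", None),
    4: ("BRANCH", 12),
    8: ("NOP", None),    # skipped by the branch
    12: ("HALT", None),
}
print(run(program))  # [0, 4, 12]
```

Note that the PC is updated before the instruction is carried out, which is why the branch simply overwrites an already incremented value.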
➢ To study these operations in detail, we first need to
examine the internal organization of the processor. The
main building blocks of a processor were introduced earlier.
➢ They can be organized and interconnected in a variety of
ways. We will start with a very simple organization.
➢ Figure 1 shows an organization in which the arithmetic
and logic unit (ALU) and all the registers are
interconnected via a single common bus.
➢ This bus is internal to the processor and should not be
confused with the external bus that connects the processor
to the memory and I/O devices.
➢ The data and address lines of the external memory bus are
shown in Figure 1 connected to the internal processor bus
via the memory data register, MDR, and the memory
address register, MAR, respectively.
➢ Register MDR has two inputs and two outputs. Data may
be loaded into MDR either from the memory bus or from
the internal processor bus.
Figure 1: Single bus organization of datapath
inside a processor
➢ The data stored in MDR may be placed on either bus. The
input of MAR is connected to the internal bus, and its output is
connected to the external bus.
➢ The control lines of the memory bus are connected to the
instruction decoder and control logic block.
➢ This unit is responsible for issuing the signals that control the
operation of all the units inside the processor and for
interacting with the memory bus.
➢ The number and use of the processor registers R0 through
R(n − 1) vary considerably from one processor to another.
➢ Registers may be provided for general-purpose use by the
programmer. Some may be dedicated as special-purpose
registers, such as index registers or stack pointers.
➢ Three registers, Y, Z, and TEMP in Figure 1, have not been
mentioned before. These registers are transparent to the
programmer, that is, the programmer need not be concerned
with them because they are never referenced explicitly by any
instruction.
➢ They are used by the processor for temporary storage during
execution of some instructions. These registers are never used
for storing data generated by one instruction for later use by
another instruction.
➢ The multiplexer MUX selects either the output of register Y
or a constant value to be provided as input A of the ALU.
➢ The constant 4 is used to increment the contents of the
program counter. We will refer to the two possible values of
the MUX control input Select as Select4 and SelectY for
selecting the constant 4 or register Y, respectively.
➢ As instruction execution progresses, data are transferred from
one register to another, often passing through the ALU to
perform some arithmetic or logic operation.
➢ The instruction decoder and control logic unit is
responsible for implementing the actions specified by the
instruction loaded in the IR register.
➢ The decoder generates the control signals needed to select the
registers involved and direct the transfer of data.
➢ The registers, the ALU, and the interconnecting bus
are collectively referred to as the datapath.
➢ With few exceptions, an instruction can be executed
by performing one or more of the following
operations in some specified sequence:
• Transfer a word of data from one processor register to
another or to the ALU
• Perform an arithmetic or a logic operation and store
the result in a processor register
• Fetch the contents of a given memory location and
load them into a processor register
• Store a word of data from a processor register into a
given memory location
➢ We now consider in detail how each of these
operations is implemented, using the simple
processor model in Figure 1.
REGISTER TRANSFERS
➢ Instruction execution involves a sequence of steps in which data are
transferred from one register to another.
➢ For each register, two control signals are used to place the contents
of that register on the bus or to load the data on the bus into the
register.
➢ This is represented symbolically in Figure 2. The input and output of
register Ri are connected to the bus via switches controlled by the
signals Riin and Riout, respectively.
➢ When Riin is set to 1, the data on the bus are loaded into Ri.
Similarly, when Riout is set to 1, the contents of register Ri are
placed on the bus. While Riout is equal to 0, the bus can be used for
transferring data from other registers.
➢ Suppose that we wish to transfer the contents of register R1 to
register R4. This can be accomplished as follows:
➢ Enable the output of register R1 by setting R1out to 1. This places
the contents of R1 on the processor bus.
➢ Enable the input of register R4 by setting R4in to 1. This loads data
from the processor bus into register R4.
Figure 2: Input and output gating for registers in Figure 1
➢ All operations and data transfers within the processor take place
within time periods defined by the processor clock. The control
signals that govern a particular transfer are asserted at the start of the
clock cycle. In our example, R1out and R4in are set to 1.
➢ The registers consist of edge-triggered flip-flops. Hence, at the next
active edge of the clock, the flip-flops that constitute R4 will load the
data present at their inputs.
➢ At the same time, the control signals R1out and R4in will return to 0.
We will use this simple model of the timing of data transfers for the
rest of this chapter. However, we should point out that other schemes
are possible. For example, data transfers may use both the rising and
falling edges of the clock.
➢ Also, when edge-triggered flip-flops are not used, two or more clock
signals may be needed to guarantee proper transfer of data. This is
known as multiphase clocking.
➢ An implementation for one bit of register Ri is shown in Figure 3 as an example.
➢ A two-input multiplexer is used to select the data applied to the input of an edge-triggered D flip-flop. When
the control input Riin is equal to 1, the multiplexer selects the data on the bus. This data will be loaded into
the flip-flop at the rising edge of the clock. When Riin is equal to 0, the multiplexer feeds back the value
currently stored in the flip-flop.
➢ The Q output of the flip-flop is connected to the bus via a tri-state gate. When Riout is equal to 0, the gate's
output is in the high-impedance (electrically disconnected) state. This corresponds to the open-circuit state
of a switch. When Riout = 1, the gate drives the bus to 0 or 1, depending on the value of Q.
Figure 3: Input and output gating for one register bit
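The behaviour of the one-bit circuit can be mimicked in software. The sketch below is a simplified model, assuming an ideal clock edge and using Python's `None` to stand for the high-impedance state; the class and method names are hypothetical and chosen only for this example.

```python
class RegisterBit:
    """One bit of register Ri: input multiplexer, D flip-flop, tri-state output."""

    def __init__(self, q=0):
        self.q = q  # value currently stored in the flip-flop

    def clock_edge(self, bus, ri_in):
        # The multiplexer selects the bus when Riin = 1; otherwise it feeds
        # back the stored value, so the flip-flop keeps its contents.
        d = bus if ri_in else self.q
        self.q = d

    def drive(self, ri_out):
        # Tri-state gate: None models the high-impedance (disconnected) state.
        return self.q if ri_out else None


bit = RegisterBit(q=0)
bit.clock_edge(bus=1, ri_in=0)   # Riin = 0: contents unchanged
print(bit.q)                     # 0
bit.clock_edge(bus=1, ri_in=1)   # Riin = 1: bus value is loaded
print(bit.q)                     # 1
print(bit.drive(ri_out=0))       # None
print(bit.drive(ri_out=1))       # 1
```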
PERFORMING AN ARITHMETIC OR LOGIC OPERATION
➢ The ALU is a combinational circuit that has no internal storage. It performs
arithmetic and logic operations on the two operands applied to its A and B
inputs.
➢ In Figures 1 and 2, one of the operands is the output of the multiplexer
MUX and the other operand is obtained directly from the bus.
➢ The result produced by the ALU is stored temporarily in register Z.
Therefore, a sequence of operations to add the contents of register R1 to
those of register R2 and store the result in register R3 is
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
➢ The signals whose names are given in any step are activated for the duration
of the clock cycle corresponding to that step. All other signals are inactive.
➢ Hence, in step 1, the output of register R1 and the input of register Y are
enabled, causing the contents of R1 to be transferred over the bus to Y.
➢ In step 2, the multiplexer's Select signal is set to SelectY, causing the
multiplexer to gate the contents of register Y to input A of the ALU. At the
same time, the contents of register R2 are gated onto the bus and, hence, to
input B.
➢ The function performed by the ALU depends on the signals applied to its
control lines. In this case, the Add line is set to 1, causing the output of the
ALU to be the sum of the two numbers at inputs A and B.
➢ This sum is loaded into register Z because its input control signal is activated.
In step 3, the contents of register Z are transferred to the destination register,
R3. This last transfer cannot be carried out during step 2, because only one
register output can be connected to the bus during any clock cycle.
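The three-step addition sequence can be traced with a small simulation. This is a rough sketch, assuming signal names of the form `<register>out` and `<register>in` as used in the text; the function `run_steps` and the dictionary of register values are hypothetical. It also enforces the rule that only one register may drive the bus in a clock cycle.

```python
def run_steps(regs, steps):
    """Apply control steps to a single-bus datapath, one step per clock cycle."""
    for signals in steps:
        outs = [s for s in signals if s.endswith("out")]
        if len(outs) > 1:
            raise ValueError("only one register may drive the bus per cycle")
        bus = regs[outs[0][:-3]] if outs else None
        updates = {}
        if "Add" in signals and "Zin" in signals:
            # MUX feeds input A with register Y or the constant 4;
            # input B comes directly from the bus.
            a = regs["Y"] if "SelectY" in signals else 4
            updates["Z"] = a + bus
        for s in signals:
            if s.endswith("in") and s != "Zin":
                updates[s[:-2]] = bus   # load the register from the bus
        regs.update(updates)            # all loads take effect at the clock edge
    return regs


regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}
add_r1_r2_r3 = [
    ["R1out", "Yin"],
    ["R2out", "SelectY", "Add", "Zin"],
    ["Zout", "R3in"],
]
print(run_steps(regs, add_r1_r2_r3)["R3"])  # 12
```

Trying to merge steps 2 and 3 into one step would put both R2out and Zout in the same list, and the model rejects it for the same reason the hardware cannot do it.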
➢ In this introductory discussion, we assume that there is a dedicated signal for
each function to be performed. For example, we assume that there are separate
control signals to specify individual ALU operations, such as Add, Subtract,
XOR, and so on.
➢ In reality, some degree of encoding is likely to be used. For example, if the
ALU can perform eight different operations, three control signals would
suffice to specify the required operation.
FETCHING A WORD FROM MEMORY
➢ To fetch a word of information from memory, the processor has to specify the address of the memory location
where this information is stored and request a Read operation.
➢ This applies whether the information to be fetched represents an instruction in a program or an operand specified
by an instruction. The processor transfers the required address to the MAR, whose output is connected to the
address lines of the memory bus.
➢ At the same time, the processor uses the control lines of the memory bus to indicate that a Read operation is
needed.
➢ When the requested data are received from the
memory they are stored in register MDR, from
where they can be transferred to other registers in
the processor.
➢ The connections for register MDR are illustrated in
Figure 4.
➢ It has four control signals: MDRin and MDRout
control the connection to the internal bus, and
MDRinE and MDRoutE control the connection to
the external bus.
Figure 4: Connection and control signals for register MDR
➢ The circuit in Figure 3 is easily modified to provide the additional connections. A three-input multiplexer can
be used, with the memory bus data line connected to the third input. This input is selected when MDRinE = 1.
A second tri-state gate, controlled by MDRoutE, can be used to connect the output of the flip-flop to the
memory bus.
➢ During memory Read and Write operations, the timing of internal processor operations must be coordinated
with the response of the addressed device on the memory bus. The processor completes one internal data
transfer in one clock cycle.
➢ The speed of operation of the addressed device, on
the other hand, varies with the device. We saw that
modern processors include a cache memory on the
same chip as the processor.
➢ Typically, a cache will respond to a memory read
request in one clock cycle. However, when a cache
miss occurs, the request is forwarded to the main
memory, which introduces a delay of several clock
cycles.
➢ A read or write request may also be intended for a register in a memory-mapped I/O device. Such I/O
registers are not cached, so their accesses always take a number of clock cycles.
➢ To accommodate the variability in response time, the processor waits until it receives an indication that the
requested Read operation has been completed.
➢ We will assume that a control signal called Memory-Function-Completed (MFC) is used for this purpose.
The addressed device sets this signal to 1 to indicate that the contents of the specified location have been read
and are available on the data lines of the memory bus.
➢ As an example of a read operation, consider the instruction Move (R1),R2. The actions needed to execute this
instruction are:
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
➢ These actions may be carried out as separate steps, but some can be combined into a single step. Each action
can be completed in one clock cycle, except action 3 which requires one or more clock cycles, depending on
the speed of the addressed device.
➢ For simplicity, let us assume that the output of MAR is
enabled all the time. Thus, the contents of MAR are
always available on the address lines of the memory bus.
This is the case when the processor is the bus master.
➢ When a new address is loaded into MAR, it will appear on
the memory bus at the beginning of the next clock cycle,
as shown in Figure 5.
➢ A Read control signal is activated at the same time MAR
is loaded. This signal will cause the bus interface circuit to
send a read command, MR, on the bus. With this
arrangement, we have combined actions 1 and 2 above
into a single control step.
➢ Actions 3 and 4 can also be combined by activating
control signal MDRinE while waiting for a response from
the memory.
➢ Thus, the data received from the memory are loaded into
MDR at the end of the clock cycle in which the MFC
signal is received.
Figure 5: Timing of a memory Read operation
➢ In the next clock cycle, MDRout is activated to transfer
the data to register R2.
➢ This means that the memory read operation requires
three steps, which can be described by the signals being
activated as follows:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
➢ where WMFC is the control signal that causes the
processor's control circuitry to wait for the arrival of the
MFC signal.
➢ Figure 5 shows that MDRinE is set to 1 for exactly the
same period as the read command, MR.
➢ Hence, in subsequent discussion, we will not specify the
value of MDRinE explicitly, with the understanding that it
is always equal to MR.
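The effect of WMFC on timing can be illustrated with a short sketch. It assumes a hypothetical `latency` parameter giving the number of clock cycles the addressed device needs before it asserts MFC (for instance 1 for a cache hit and more for a miss); the function name and the dictionary-based memory are also assumptions made for the example.

```python
def move_indirect(regs, memory, latency):
    """Simulate Move (R1),R2 and return the number of clock cycles used."""
    cycles = 0

    # Step 1: R1out, MARin, Read
    regs["MAR"] = regs["R1"]
    cycles += 1

    # Step 2: MDRinE, WMFC. The processor stays in this step until MFC
    # arrives, so the step lasts `latency` cycles; MDR is loaded at the
    # end of the cycle in which MFC is received.
    cycles += latency
    regs["MDR"] = memory[regs["MAR"]]

    # Step 3: MDRout, R2in
    regs["R2"] = regs["MDR"]
    cycles += 1
    return cycles


regs = {"R1": 0x40, "R2": 0, "MAR": 0, "MDR": 0}
memory = {0x40: 99}
print(move_indirect(regs, memory, latency=1), regs["R2"])  # 3 99
print(move_indirect(regs, memory, latency=4))              # 6
```

The three control steps are the same in both runs; only the time spent in step 2 changes with the speed of the addressed device.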
STORING A WORD IN MEMORY
➢ Writing a word into a memory location follows a similar procedure. The desired
address is loaded into MAR.
➢ Then, the data to be written are loaded into MDR, and a Write command is issued.
Hence, executing the instruction Move R2,(R1) requires the following sequence:
1. R1 out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
➢ As in the case of the read operation, the Write control signal causes the memory bus
interface hardware to issue a Write command on the memory bus.
➢ The processor remains in step 3 until the memory operation is completed and an
MFC response is received.
Execution of a Complete Instruction
➢ Let us now put together the sequence of elementary operations required to execute one instruction. Consider the instruction
Add (R3), R1 which adds the contents of a memory location pointed to by R3 to register R1.
➢ Executing this instruction requires the following actions:
1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.
➢ Figure 6 gives the sequence of control steps required to perform these operations for the single-bus architecture of Figure 1.
Instruction execution proceeds as follows.
➢ In step 1, the instruction fetch operation is initiated by loading the contents of the
PC into the MAR and sending a Read request to the memory. The Select signal is
set to Select4, which causes the multiplexer MUX to select the constant 4.
➢ This value is added to the operand at input B, which is the contents of the PC, and
the result is stored in register Z.
➢ The updated value is moved from register Z back into the PC during step 2, while
waiting for the memory to respond. In step 3, the word fetched from the memory is
loaded into the IR.
Figure 6: Control sequence for execution of the instruction Add (R3), R1
➢ Steps 1 through 3 constitute the instruction fetch phase, which is the same for all instructions. The instruction decoding
circuit interprets the contents of the IR at the beginning of step 4.
➢ This enables the control circuitry to activate the control signals for steps 4 through 7, which constitute the execution phase.
The contents of register R3 are transferred to the MAR in step 4, and a memory read operation is initiated.
➢ Then the contents of R1 are transferred to register Y in step 5, to prepare for the addition operation. When the Read operation
is completed, the memory operand is available in register MDR, and the addition operation is performed in step 6.
➢ The contents of MDR are gated to the bus, and thus also to the B input of the
ALU, and register Y is selected as the second input to the ALU by choosing
SelectY. The sum is stored in register Z, then transferred to R1 in step 7. The
End signal causes a new instruction fetch cycle to begin by returning to step 1.
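The seven steps can be traced in a small simulation. The signal lists below are reconstructed from the prose description, since the figure itself is not reproduced in this text, so they should be read as an assumption rather than as a copy of the figure. The interpreter, the register dictionary and the memory contents are likewise hypothetical, and the memory is assumed to respond within the step that carries WMFC.

```python
# Control steps reconstructed from the description in the text (assumption).
ADD_R3_R1 = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],  # step 1
    ["Zout", "PCin", "Yin", "WMFC"],                      # step 2
    ["MDRout", "IRin"],                                   # step 3
    ["R3out", "MARin", "Read"],                           # step 4
    ["R1out", "Yin", "WMFC"],                             # step 5
    ["MDRout", "SelectY", "Add", "Zin"],                  # step 6
    ["Zout", "R1in", "End"],                              # step 7
]


def execute(regs, memory, steps):
    """Run the control steps on a single-bus datapath model."""
    for signals in steps:
        out = next(s for s in signals if s.endswith("out"))
        bus = regs[out[:-3]]                 # the one register driving the bus
        new = {}
        if "Add" in signals:
            a = 4 if "Select4" in signals else regs["Y"]
            new["Z"] = a + bus               # Zin accompanies Add in every step here
        for s in signals:
            if s.endswith("in") and s != "Zin":
                new[s[:-2]] = bus
        regs.update(new)
        if "WMFC" in signals:
            # Memory has responded: the word addressed by MAR arrives in MDR.
            regs["MDR"] = memory[regs["MAR"]]
    return regs


regs = {"PC": 100, "MAR": 0, "MDR": 0, "IR": None, "Y": 0, "Z": 0, "R1": 10, "R3": 200}
memory = {100: "Add (R3),R1", 200: 32}
execute(regs, memory, ADD_R3_R1)
print(regs["R1"], regs["PC"], regs["IR"])  # 42 104 Add (R3),R1
```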
➢ This discussion accounts for all control signals in Figure 6 except Yin in step
2. There is no need to copy the updated contents of PC into register Y when
executing the Add instruction.
➢ But, in Branch instructions the updated value of the PC is needed to compute the
Branch target address. To speed up the execution of Branch instructions, this
value is copied into register Y in step 2. Since step 2 is part of the fetch phase,
the same action will be performed for all instructions. This does not cause any
harm because register Y is not used for any other purpose at that time.
BRANCH INSTRUCTIONS
➢ A branch instruction replaces the contents of the PC with the branch target address. This address is usually obtained
by adding an offset X, which is given in the branch instruction, to the updated value of the PC. Figure 7 gives a
control sequence that implements an unconditional branch instruction.
➢ Processing starts, as usual, with the fetch phase. This phase ends when the instruction is loaded into the IR in step 3.
➢ The offset value is extracted from the IR by the instruction decoding circuit, which will also perform sign extension
if required.
➢ Since the value of the updated PC is already available in register Y, the offset X is gated onto the bus in step 4, and
an addition operation is performed. The result, which is the branch target address, is loaded into the PC in step 5.
➢ The offset X used in a branch instruction is usually the difference
between the branch target address and the address immediately
following the branch instruction.
➢ For example, if the branch instruction is at location 2000 and if the
branch target address is 2050, the value of X must be 46. The reason for
this can be readily appreciated from the control sequence in Figure 7.
➢ The PC is incremented during the fetch phase, before knowing the type
of instruction being executed.
Figure 7: Control sequence for an unconditional branch instruction
➢ Thus, when the branch address is computed in step 4, the PC value used is the updated value, which
points to the instruction following the branch instruction in the memory.
➢ Consider now a conditional branch. In this case, we need to check the status of the condition codes
before loading a new value into the PC. For example, for a Branch-on-negative (Branch<0)
instruction, step 4 in Figure 7 is replaced with
Offset-field-of-IRout, Add, Zin, If N = 0 then End
➢ Thus, if N=0 the processor returns to step 1 immediately after step 4. If N=1, step 5 is performed to
load a new value into the PC, thus performing the branch operation.
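The offset arithmetic and the conditional update of the PC can be checked with a short sketch. The function names are hypothetical, and 4-byte instructions are assumed as in the rest of the chapter.

```python
INSTRUCTION_SIZE = 4  # bytes, as assumed throughout the chapter


def branch_offset(branch_address, target_address):
    """Offset X relative to the updated PC (the address after the branch)."""
    updated_pc = branch_address + INSTRUCTION_SIZE
    return target_address - updated_pc


def branch_if_negative(updated_pc, offset, n_flag):
    """Branch<0: load the target into the PC only when N = 1."""
    if n_flag == 0:
        return updated_pc           # End after step 4, PC keeps its updated value
    return updated_pc + offset      # step 5: PC receives the branch target


x = branch_offset(2000, 2050)
print(x)                                  # 46
print(branch_if_negative(2004, x, 1))     # 2050
print(branch_if_negative(2004, x, 0))     # 2004
```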
Multiple Bus Organization
➢ We used the simple single-bus structure of Figure 1 to illustrate the
basic ideas. The resulting control sequences in Figures 6 and 7 are
quite long because only one data item can be transferred over the bus
in a clock cycle.
➢ To reduce the number of steps needed, most commercial processors
provide multiple internal paths that enable several transfers to take
place in parallel.
➢ Figure 8 depicts a three-bus structure used to connect the registers
and the ALU of a processor. All general-purpose registers are
combined into a single block called the register file.
➢ In VLSI technology, the most efficient way to implement a number of
registers is in the form of an array of memory cells similar to those
used in the implementation of random-access memories (RAMs). The
register file in Figure 8 is said to have three ports.
➢ There are two outputs, allowing the contents of two different registers
to be accessed simultaneously and have their contents placed on buses
A and B.
Figure 8: Three-bus organization of the datapath
➢ The third port allows the data on bus C to be loaded into a third
register during the same clock cycle.
➢ Buses A and B are used to transfer the source operands to the A and B
inputs of the ALU, where an arithmetic or logic operation may be
performed.
➢ The result is transferred to the destination over bus C. If needed, the
ALU may simply pass one of its two input operands unmodified to
bus C. We will call the ALU control signals for such an operation
R=A or R=B. The three-bus arrangement obviates the need for
registers Y and Z in Figure 1.
➢ A second feature in Figure 8 is the introduction of the Incrementer
unit, which is used to increment the PC by 4. Using the Incrementer
eliminates the need to add 4 to the PC using the main ALU, as was
done in Figures 6 and 7.
➢ The source for the constant 4 at the ALU input multiplexer is still
useful. It can be used to increment other addresses, such as the
memory addresses in LoadMultiple and StoreMultiple instructions.
➢ Consider the three-operand instruction
Add R4,R5,R6
➢ The control sequence for executing this instruction is given in Figure 9.
➢ In step 1, the contents of the PC are passed through the ALU, using the R=B control signal, and loaded into the MAR
to start a memory read operation. At the same time the PC is incremented by 4.
➢ Note that the value loaded into MAR is the original contents of the PC. The incremented value is loaded into the PC at
the end of the clock cycle and will not affect the contents of MAR. In step 2, the processor waits for MFC and loads
the data received into MDR, then transfers them to IR in step 3.
➢ Finally, the execution phase of the instruction requires only one control step to complete, step 4. By providing more
paths for data transfer a significant reduction in the number of clock cycles needed to execute an instruction is
achieved.
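Why a single control step is enough for the execution phase can be seen from a simple model of the three-port register file. This is a sketch under assumptions: the class, its `cycle` method and the register values are hypothetical, and the ALU operation is passed in as an ordinary Python function.

```python
class RegisterFile:
    """Register file with two read ports (buses A, B) and one write port (bus C)."""

    def __init__(self, values):
        self.regs = dict(values)

    def cycle(self, src_a, src_b, dst, alu_op):
        # Both source registers are read in the same clock cycle ...
        bus_a = self.regs[src_a]
        bus_b = self.regs[src_b]
        # ... the ALU result travels over bus C ...
        bus_c = alu_op(bus_a, bus_b)
        # ... and is written into the destination at the end of the cycle.
        self.regs[dst] = bus_c
        return bus_c


rf = RegisterFile({"R4": 3, "R5": 9, "R6": 0})
rf.cycle("R4", "R5", "R6", lambda a, b: a + b)   # Add R4,R5,R6 in one step
print(rf.regs["R6"])                             # 12
rf.cycle("R4", "R5", "R4", lambda a, b: b)       # R=B: pass bus B unmodified
print(rf.regs["R4"])                             # 9
```

On the single-bus datapath the same addition needs three steps, because each operand and the result must cross the one bus in turn and pass through registers Y and Z.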
Figure 9: Control sequence for the instruction Add R4,R5,R6 for
the three-bus organization in Figure 8
Hard-wired Control
➢ To execute instructions, the processor must have some means of generating the control signals needed in the proper
sequence. Computer designers use a wide variety of techniques to solve this problem. The approaches used fall into
one of two categories: hardwired control and microprogrammed control.
➢ Consider the sequence of control signals given in Figure 6. Each step in this sequence is completed in one clock
period. A counter may be used to keep track of the control steps, as shown in Figure 10.
➢ Each state, or count, of this counter corresponds to one control step.
➢ The required control signals are determined by the following
information:
• Contents of the control step counter
• Contents of the instruction register
• Contents of the condition code flags
• External input signals, such as MFC and interrupt requests
➢ To gain insight into the structure of the control unit, we start with a
simplified view of the hardware involved. The decoder/encoder block
in Figure 10 is a combinational circuit that generates the required
control outputs, depending on the state of all its inputs.
Figure 10: Control unit organization
➢ By separating the decoding and encoding functions, we obtain the more detailed block diagram in Figure 11.
➢ The step decoder provides a separate signal line for each step, or time slot, in the control sequence. Similarly, the
output of the instruction decoder consists of a separate line for each machine instruction.
➢ For any instruction loaded in the IR, one of the output lines INS₁ through INSn is set to 1, and all other lines are set to
0.
➢ The input signals to the encoder block in Figure 11 are combined
to generate the individual control signals Yin, PCout, Add, End,
and so on.
Figure 11: Separation of the decoding and
encoding functions
➢ An example of how the encoder generates the Zin control signal for the processor organization in Figure 1 is given
in Figure 12.
➢ This circuit implements the logic function
Zin = T1 + T6 · ADD + T4 · BR + . . . [7.1]
➢ This signal is asserted during time slot T1 for all instructions, during T6 for an Add instruction, during T4 for an
unconditional branch instruction, and so on.
➢ The logic function for Zin is derived from the control sequences in Figures 6 and 7.
Figure 12: Generation of the Zin control
signal for the processor in Figure 1
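➢ The encoder logic for Zin can be sketched in software as follows. This is only an illustration of the terms of Equation 7.1 that the text spells out; the function name `z_in` and the instruction mnemonic strings are assumptions made for this example, and the terms hidden behind the ellipsis in the equation are not modelled.

```python
def z_in(step: int, instruction: str) -> bool:
    """Evaluate the spelled-out terms of Equation 7.1 (hypothetical helper)."""
    # One-hot outputs of the step decoder: t[k] is True only in time slot Tk.
    t = {k: step == k for k in range(1, 8)}
    # Outputs of the instruction decoder: exactly one line is active.
    add = instruction == "ADD"
    br = instruction == "BR"
    # Zin = T1 + T6.ADD + T4.BR (remaining terms omitted)
    return t[1] or (t[6] and add) or (t[4] and br)


for step, ins in [(1, "BR"), (6, "ADD"), (6, "BR"), (4, "BR")]:
    print(f"T{step}, {ins}: Zin = {int(z_in(step, ins))}")
```

Each product term corresponds to one AND gate in the encoder, and the OR of the terms drives the Zin line.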
➢ As another example, Figure 13 gives a circuit that generates the End control signal from the logic function
$$\mathrm{End} = T_7 \cdot \mathrm{ADD} + T_5 \cdot \mathrm{BR} + (T_5 \cdot N + T_4 \cdot \overline{N}) \cdot \mathrm{BRN} + \cdots \qquad (7.2)$$
➢ The End signal starts a new instruction fetch cycle by resetting the control step counter to its starting value.
➢ Figure 11 contains another control signal called RUN. When set to 1, RUN causes the counter to be incremented by one at
the end of every clock cycle.
➢ When RUN is equal to 0, the counter stops counting. This is needed whenever the WMFC signal is issued, to cause the
processor to wait for the reply from the memory.
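➢ The interplay of End, RUN and the control step counter can be sketched as below. The function names are hypothetical, only the terms written out in Equation 7.2 are modelled, and the rule that RUN = 0 simply holds the current count is an assumption about how the counter is gated.

```python
def end_signal(step: int, instruction: str, n_flag: bool) -> bool:
    """Spelled-out terms of Equation 7.2 (hypothetical helper)."""
    if instruction == "ADD":
        return step == 7
    if instruction == "BR":
        return step == 5
    if instruction == "BRN":
        # End in T5 if the branch is taken (N = 1), in T4 if it is not (N = 0).
        return (step == 5 and n_flag) or (step == 4 and not n_flag)
    return False


def next_step(step: int, run: bool, end: bool) -> int:
    """Next value of the control step counter."""
    if not run:
        return step      # counter frozen, e.g. while waiting for the memory
    if end:
        return 1         # restart with the first step of instruction fetch
    return step + 1


print(next_step(3, run=False, end=False))   # counter holds
print(next_step(7, run=True, end=end_signal(7, "ADD", False)))
```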
➢ The control hardware shown in Figure 10 or 11 can be viewed
as a state machine that changes from one state to another in
every clock cycle, depending on the contents of the instruction
register, the condition codes, and the external inputs.
➢ The outputs of the state machine are the control signals. The
sequence of operations carried out by this machine is
determined by the wiring of the logic elements, hence the name
"hardwired."
➢ A controller that uses this approach can operate at high speed.
However, it has little flexibility, and the complexity of the instruction
set it can implement is limited.
Figure 13: Generation of the End control signal
A COMPLETE PROCESSOR
➢ A complete processor can be designed using the structure shown in
Figure 14.
➢ This structure has an instruction unit that fetches instructions from an
instruction cache or from the main memory when the desired
instructions are not already in the cache. It has separate processing units
to deal with integer data and floating-point data.
➢ Each of these units can be organized as shown in Figure 8. A data
cache is inserted between these units and the main memory. Using
separate caches for instructions and data is common practice in many
processors today.
➢ Other processors use a single cache that stores both instructions and
data. The processor is connected to the system bus and, hence, to the rest
of the computer, by means of a bus interface.
➢ Although we have shown just one integer and one floating-point unit in
Figure 14, a processor may include several units of each type to
increase the potential for concurrent operations.
Figure 14: Block diagram of a complete processor.
Micro programmed Control
➢ In the previous section, we saw how the control signals required inside the processor can be generated using a
control step counter and a decoder/encoder circuit.
➢ Now we discuss an alternative scheme, called microprogrammed control, in which control signals are
generated by a program similar to machine language programs.
➢ First, we introduce some common terms. A control word (CW) is a word whose individual bits represent the
various control signals in Figure 11.
➢ Each of the control steps in the control sequence of an instruction defines a unique combination of 1s and 0s in
the CW. The CWs corresponding to the 7 steps of Figure 6 are shown in Figure 15.
➢ We have assumed that SelectY is represented by Select
= 0 and Select4 by Select = 1. A sequence of CWs
corresponding to the control sequence of a machine
instruction constitutes the microroutine for that
instruction, and the individual control words in this
microroutine are referred to as microinstructions.
➢ The microroutines for all instructions in the instruction
set of a computer are stored in a special memory
called the control store.
Figure 15: An example of microinstructions for Figure 6.
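➢ As a rough illustration of how a control word packs one bit per control signal, consider the sketch below. The signal order in `SIGNALS` and the three fetch steps shown are assumptions chosen for this example; they are not a reproduction of Figure 15, which should be consulted for the actual bit layout.

```python
# Hypothetical bit order of the control word, one bit per control signal.
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin",
           "Select", "Add", "Zin", "Zout", "R1out", "R1in", "R3out",
           "WMFC", "End"]


def control_word(*active: str) -> str:
    """Return the CW as a bit string: 1 where a signal is asserted."""
    unknown = set(active) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return "".join("1" if s in active else "0" for s in SIGNALS)


# Assumed instruction fetch steps; Select = 1 stands for Select4.
microroutine = [
    control_word("PCout", "MARin", "Read", "Select", "Add", "Zin"),
    control_word("Zout", "PCin", "Yin", "WMFC"),
    control_word("MDRout", "IRin"),
]

for address, cw in enumerate(microroutine):
    print(address, cw)
```

Stored at consecutive addresses, such words form a microroutine; reading them out one per clock cycle reproduces the control sequence.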
➢ The control unit can generate the control signals for any instruction by sequentially reading the CWs of the
corresponding microroutine from the control store. This suggests organizing the control unit as shown in Figure 16.
➢ To read the control words sequentially from the control store, a microprogram counter (μPC) is used. Every time
a new instruction is loaded into the IR, the output of the block labeled "starting address generator" is loaded into
the μPC.
➢ The μPC is then automatically incremented by the clock, causing successive
microinstructions to be read from the control store. Hence, the control signals
are delivered to various parts of the processor in the correct sequence.
➢ One important function of the control unit cannot be implemented by the simple
organization in Figure 16.
➢ This is the situation that arises when the control unit is required to check the
status of the condition codes or external inputs to choose between alternative
courses of action.
➢ In the case of hardwired control, this situation is handled by including an
appropriate logic function, as in Equation 7.2, in the encoder circuitry. In
microprogrammed control, an alternative approach is to use conditional branch
microinstructions.
Figure 16: Block diagram of a microprogrammed control unit.
➢ In addition to the branch address, these microinstructions specify which of the external inputs, condition codes,
or, possibly, bits of the instruction register, should be checked as a condition for branching to take place.
➢ The instruction Branch<0 may now be implemented by a microroutine such as that shown in Figure 17. After
loading this instruction into IR, a branch microinstruction transfers control to the corresponding microroutine,
which is assumed to start at location 25 in the control store.
➢ This address is the output of the starting address generator block in Figure 16. The microinstruction at location
25 tests the N bit of the condition codes.
➢ If this bit is equal to 0, a branch takes place to location
0 to fetch a new machine instruction. Otherwise, the
microinstruction at location 26 is executed to put the
branch target address into register Z, as in step 4 in
Figure 7.
➢ The microinstruction in location 27 loads this address
into the PC.
Figure 17 Microroutine for the instruction Branch <0.
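➢ The flow through this microroutine can be traced with a small sketch. Only the addresses 25, 26, 27 and 0 come from the text; the function name and the way each microinstruction is reduced to a "next address" decision are simplifications made for this example.

```python
def trace_branch_lt0(n_flag: bool) -> list:
    """Return the control-store addresses visited before returning to 0."""
    upc = 25                      # starting address of the microroutine
    visited = []
    while upc != 0:
        visited.append(upc)
        if upc == 25:
            # Conditional branch microinstruction: if N = 0, go to address 0.
            upc = 26 if n_flag else 0
        elif upc == 26:
            upc = 27              # branch target address placed in Z
        else:
            upc = 0               # Z loaded into PC, then back to fetch
    return visited


print(trace_branch_lt0(False))    # branch not taken
print(trace_branch_lt0(True))     # branch taken
```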
➢ To support microprogram branching, the organization of the control unit should be modified as shown in Figure
18. The starting address generator block of Figure 16 becomes the starting and branch address generator.
➢ This block loads a new address into the μPC when a microinstruction instructs it to do so. To allow
implementation of a conditional branch, inputs to this block consist of the external inputs and condition codes
as well as the contents of the instruction register.
➢ In this control unit, the μPC is incremented every time a new microinstruction is fetched from the microprogram
memory, except in the following situations:
1. When a new instruction is loaded into the IR, the μPC is loaded
with the starting address of the microroutine for that instruction.
2. When a Branch microinstruction is encountered and the branch
condition is satisfied, the μPC is loaded with the branch
address.
3. When an End microinstruction is encountered, the μPC is
loaded with the address of the first CW in the microroutine for
the instruction fetch cycle (this address is 0 in Figure 17).
Figure 18 Organization of the control unit to
allow conditional branching in the
microprogram.
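➢ The three exceptions listed above can be summarized as a next-address function for the μPC. This is a sketch only: the parameter names are hypothetical, and the priority in which the cases are checked is an assumption, since the text does not state one.

```python
def next_upc(upc: int, *, new_instruction: bool = False, start_address: int = 0,
             branch: bool = False, condition: bool = False,
             branch_address: int = 0, end: bool = False) -> int:
    """Next value of the microprogram counter."""
    if new_instruction:
        return start_address      # case 1: start of the microroutine
    if branch and condition:
        return branch_address     # case 2: branch taken
    if end:
        return 0                  # case 3: back to the instruction fetch routine
    return upc + 1                # default: sequential execution


print(next_upc(2, new_instruction=True, start_address=25))
print(next_upc(25, branch=True, condition=True, branch_address=0))
print(next_upc(26))
```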
Basic concepts of pipelining
➢ The basic building blocks of a computer are introduced in preceding chapters. In this chapter, we discuss in
detail the concept of pipelining, which is used in modern computers to achieve high performance. We begin
by explaining the basics of pipelining and how it can lead to improved performance.
➢ Then we examine machine instruction features that facilitate pipelined execution, and we show that the
choice of instructions and instruction sequencing can have a significant effect on performance.
➢ Pipelined organization requires sophisticated compilation techniques, and optimizing compilers have been
developed for this purpose. Among other things, such compilers rearrange the sequence of operations to
maximize the benefits of pipelined execution.
BASIC CONCEPTS
➢ The speed of execution of programs is influenced by many factors. One way to improve performance is to
use faster circuit technology to build the processor and the main memory.
➢ Another possibility is to arrange the hardware so that more than one operation can be performed at the same
time. In this way, the number of operations performed per second is increased even though the elapsed time
needed to perform any one operation is not changed.
➢ We have encountered concurrent activities several times before. For example, DMA devices allow I/O transfers and
computation to proceed at the same time, because they can perform I/O transfers independently once these transfers
are initiated by the processor.
➢ Pipelining is a particularly effective way of organizing concurrent
activity in a computer system.
➢ The basic idea is very simple. It is frequently encountered in
manufacturing plants, where pipelining is commonly known as an
assembly-line operation. Readers are undoubtedly familiar with the
assembly line used in car manufacturing.
➢ The first station in an assembly line may prepare the chassis of a car,
the next station adds the body, the next one installs the engine, and so
on.
➢ While one group of workers is installing the engine on one car, another
group is fitting a car body on the chassis of another car, and yet another
group is preparing a new chassis for a third car.
➢ It may take days to complete work on a given car, but it is possible to
have a new car rolling off the end of the assembly line every few
minutes.
➢ Consider how the idea of pipelining can be used in a computer. The
processor executes a program by fetching and executing instructions, one
after the other.
Figure 1: Basic idea of instruction pipelining.
➢ Let Fᵢ and Eᵢ refer to the fetch and execute steps for instruction Iᵢ.
Execution of a program consists of a sequence of fetch and execute
steps, as shown in Figure 1a.
➢ Now consider a computer that has two separate hardware units, one for
fetching instructions and another for executing them, as shown in
Figure 1b.
➢ The instruction fetched by the fetch unit is deposited in an
intermediate storage buffer, B1. This buffer is needed to enable the
execution unit to execute the instruction while the fetch unit is fetching
the next instruction.
➢ The results of execution are deposited in the destination location
specified by the instruction. For the purposes of this discussion, we
assume that both the source and the destination of the data operated on
by the instructions are inside the block labeled "Execution unit."
➢ The computer is controlled by a clock whose period is such that the
fetch and execute steps of any instruction can each be completed in
one clock cycle. Operation of the computer proceeds as in Figure 1c.
➢ In the first clock cycle, the fetch unit fetches an instruction I₁ (step F₁)
and stores it in buffer B1 at the end of the clock cycle. In the second
clock cycle, the instruction fetch unit proceeds with the fetch
operation for instruction I₂ (step F₂).
➢ Meanwhile, the execution unit performs the operation specified by
instruction I₁, which is available to it in buffer B1 (step E₁). By the
end of the second clock cycle, the execution of instruction I₁ is
completed and instruction I₂ is available. Instruction I₂ is stored in B1,
replacing I₁, which is no longer needed. Step E₂ is performed by the
execution unit during the third clock cycle, while instruction I₃ is
being fetched by the fetch unit.
➢ In this manner, both the fetch and execute units are kept busy all the
time. If the pattern in Figure 1c can be sustained for a long time, the
completion rate of instruction execution will be twice that achievable
by the sequential operation depicted in Figure 1a.
➢ In summary, the fetch and execute units in Figure 1b constitute a two-stage
pipeline in which each stage performs one step in processing an
instruction.
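➢ The overlap described above can be tabulated with a short sketch. The function name and the "--" marker for an idle unit are choices made for this example; the timing follows the description of Figure 1c, with Fᵢ in cycle i and Eᵢ in cycle i + 1.

```python
def two_stage_schedule(n: int) -> list:
    """Return (cycle, fetch step, execute step) rows for n instructions."""
    rows = []
    for cycle in range(1, n + 2):
        fetch = f"F{cycle}" if cycle <= n else "--"
        execute = f"E{cycle - 1}" if cycle >= 2 else "--"
        rows.append((cycle, fetch, execute))
    return rows


for cycle, fetch, execute in two_stage_schedule(3):
    print(f"cycle {cycle}: fetch unit {fetch}, execution unit {execute}")
```

With n instructions the pipelined organization needs n + 1 cycles, compared with 2n cycles when fetch and execute steps strictly alternate, so the completion rate approaches twice the sequential rate for long instruction sequences.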
➢ An interstage storage buffer, B1, is needed to hold the information
being passed from one stage to the next. New information is loaded
into this buffer at the end of each clock cycle.
➢ The processing of an instruction need not be divided into only two
steps. For example, a pipelined processor may process each
instruction in four steps, as follows:
F Fetch: read the instruction from the memory.
D Decode: decode the instruction and fetch the source operand(s).
E Execute: perform the operation specified by the instruction.
W Write: store the result in the destination location.
➢ The sequence of events for this case is shown in Figure 2a. Four
instructions are in progress at any given time. This means that four
distinct hardware units are needed, as shown in Figure 2b.
➢ These units must be capable of performing their tasks
simultaneously and without interfering with one another.
Information is passed from one unit to the next through a storage
buffer.
Figure 2: A 4-stage pipeline.
➢ As an instruction progresses through the pipeline, all the information
needed by the stages downstream must be passed along.
➢ For example, during clock cycle 4, the information in the buffers is as
follows:
• Buffer B1 holds instruction I₃, which was fetched in cycle 3 and
is being decoded by the instruction-decoding unit.
• Buffer B2 holds both the source operands for instruction I₂ and the
specification of the operation to be performed. This is the information
produced by the decoding hardware in cycle 3. The buffer also holds the
information needed for the write step of instruction I₂ (step W₂). Even
though it is not needed by stage E, this information must be passed on to
stage W in the following clock cycle to enable that stage to perform the
required Write operation.
• Buffer B3 holds the results produced by the execution unit and the
destination information for instruction I₁.
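➢ Which instruction occupies which stage in a given cycle can be worked out with a small sketch, assuming an ideal pipeline in which instruction Iᵢ enters the Fetch stage in cycle i and advances one stage per cycle. The function name is hypothetical.

```python
STAGES = "FDEW"   # Fetch, Decode, Execute, Write


def stage_occupancy(cycle: int, n_instructions: int) -> dict:
    """Map each busy stage to the number of the instruction it is processing."""
    occupancy = {}
    for position, stage in enumerate(STAGES):
        instruction = cycle - position   # I_i reaches stage `position` in cycle i + position
        if 1 <= instruction <= n_instructions:
            occupancy[stage] = instruction
    return occupancy


print(stage_occupancy(4, 4))
```

For cycle 4 this gives I₄ in F, I₃ in D, I₂ in E and I₁ in W, which agrees with the buffer contents listed above.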
ROLE OF CACHE MEMORY
➢ Each stage in a pipeline is expected to complete its
operation in one clock cycle. Hence, the clock period should
be sufficiently long to complete the task being performed in
any stage.
➢ If different units require different amounts of time, the clock
period must allow the longest task to be completed. A unit
that completes its task early is idle for the remainder of the
clock period.
➢ Hence, pipelining is most effective in
improving performance if the tasks being performed in
different stages require about the same amount of time.
➢ This consideration is particularly important for the
instruction fetch step, which is assigned one clock period in
Figure 2a.
➢ The clock cycle has to be equal to or greater than the time
needed to complete a fetch operation.
➢ However, the access time of the main memory may be as much
as ten times greater than the time needed to perform basic
pipeline stage operations inside the processor, such as adding
two numbers.
➢ Thus, if each instruction fetch required access to the main
memory, pipelining would be of little value.
➢ The use of cache memories solves the memory access problem.
In particular, when a cache is included on the same chip as the
processor, access time to the cache is usually the same as the time
needed to perform other basic operations inside the processor.
➢ This makes it possible to divide instruction fetching and
processing into steps that are more or less equal in duration.
➢ Each of these steps is performed by a different pipeline stage,
and the clock period is chosen to correspond to the longest one.
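➢ The effect of the slowest stage on the clock period can be illustrated as follows. The delay values are hypothetical, in arbitrary time units; the factor of ten for a main memory fetch is taken from the "as much as ten times" remark above.

```python
def clock_period(stage_delays: dict) -> int:
    """The clock period must accommodate the slowest stage."""
    return max(stage_delays.values())


def idle_time(stage_delays: dict) -> dict:
    """Time each stage sits idle in every clock period."""
    period = clock_period(stage_delays)
    return {stage: period - delay for stage, delay in stage_delays.items()}


# Hypothetical stage delays, in arbitrary time units.
fetch_from_main_memory = {"F": 10, "D": 1, "E": 1, "W": 1}
fetch_from_cache = {"F": 1, "D": 1, "E": 1, "W": 1}

print(clock_period(fetch_from_main_memory), idle_time(fetch_from_main_memory))
print(clock_period(fetch_from_cache), idle_time(fetch_from_cache))
```

When every fetch goes to the main memory, the other three stages are idle for most of each clock period; with an on-chip cache the stage delays are balanced and no time is wasted.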
PIPELINE PERFORMANCE
➢ The pipelined processor in Figure 2 completes the processing of one instruction in each clock
cycle, which means that the rate of instruction processing is four times that of sequential
operation. The potential increase in performance resulting from pipelining is proportional to the
number of pipeline stages.
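➢ The ideal speedup can be checked numerically with the sketch below. It assumes that every stage takes exactly one clock cycle, that sequential operation spends k cycles on each instruction, and that no stalls occur; the function names are hypothetical.

```python
def cycles_sequential(n: int, k: int) -> int:
    """n instructions, each taking k one-cycle steps, executed one after another."""
    return n * k


def cycles_pipelined(n: int, k: int) -> int:
    """k cycles to complete the first instruction, then one more per instruction."""
    return k + n - 1


def speedup(n: int, k: int) -> float:
    return cycles_sequential(n, k) / cycles_pipelined(n, k)


for n in (4, 100, 10_000):
    print(n, cycles_pipelined(n, 4), round(speedup(n, 4), 3))
```

For short instruction sequences the time needed to fill the pipeline is noticeable; as n grows, the speedup approaches the number of stages k.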
➢ However, this increase would be achieved only if pipelined operation as depicted in Figure 2a
could be sustained without interruption throughout program execution. Unfortunately, this is not
the case.
➢ For a variety of reasons, one of the pipeline stages may not be able to complete its processing
task for a given instruction in the time allotted.
➢ For example, stage E in the four-stage pipeline of Figure 2b is responsible for arithmetic and
logic operations, and one clock cycle is assigned for this task.
➢ Although this may be sufficient for most operations, some operations, such as divide, may
require more time to complete.
➢ Figure 3 shows an example in which the operation specified in instruction I2 requires three cycles to complete,
from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write stage must be told to do nothing, because it has
no data to work with.
➢ Meanwhile, the information in buffer B2 must remain intact until the Execute stage has completed its operation.
This means that stage 2 and, in turn, stage 1 are blocked from accepting new instructions because the information
in B1 cannot be overwritten. Thus, steps D₄ and F₅ must be postponed as shown.
➢ Pipelined operation in Figure 3 is said to have
been stalled for two clock cycles.
➢ Normal pipelined operation resumes in cycle 7.
Any condition that causes the pipeline to stall is
called a hazard. We have just seen an example of
a data hazard.
➢ A data hazard is any condition in which either the
source or the destination operands of an instruction
are not available at the time expected in the
pipeline. As a result some operation has to be
delayed, and the pipeline stalls.
Figure 3: Effect of an execution operation taking more than one clock cycle
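➢ The stall in Figure 3 can be reproduced with a simple timing sketch. The blocking rule used here is an assumption made for this example: an instruction enters a stage only after it has finished the previous stage and after the preceding instruction has moved on to the next stage. The function name is hypothetical.

```python
def schedule(exec_cycles: list) -> list:
    """Return, per instruction, the cycle in which it enters F, D, E and W.

    exec_cycles[i] is the number of cycles instruction i+1 needs in stage E;
    stages F, D and W are assumed to take one cycle each.
    """
    durations = [[1, 1, e, 1] for e in exec_cycles]
    enter = []
    for i, dur in enumerate(durations):
        row = []
        for s in range(4):
            # Earliest cycle after this instruction has finished the previous stage.
            ready = 1 if s == 0 else row[s - 1] + dur[s - 1]
            if i > 0:
                prev = enter[i - 1]
                # Stage s is free once the preceding instruction has moved on.
                free = prev[s + 1] if s < 3 else prev[3] + durations[i - 1][3]
                ready = max(ready, free)
            row.append(ready)
        enter.append(row)
    return enter


# I2 needs three cycles in the Execute stage, as in Figure 3.
for number, row in enumerate(schedule([1, 3, 1, 1, 1]), start=1):
    print(f"I{number}: F={row[0]} D={row[1]} E={row[2]} W={row[3]}")
```

Under these assumptions the Write stage receives nothing in cycles 5 and 6, D₄ and F₅ are postponed to cycle 7, and each later instruction completes two cycles later than it would in the ideal case.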
➢ The pipeline may also be stalled because of a delay in the availability of an instruction. For example, this may
be a result of a miss in the cache, requiring the instruction to be fetched from the main memory.
➢ Such hazards are often called control hazards or instruction hazards. The effect of a cache miss on pipelined
operation is illustrated in Figure 4.
➢ Instruction I₁ is fetched from the cache in cycle 1,
and its execution proceeds normally. However, the
fetch operation for instruction I2, which is started
in cycle 2, results in a cache miss.
➢ The instruction fetch unit must now suspend any
further fetch requests and wait for I2 to arrive. We
assume that instruction I₂ is received and loaded
into buffer B1 at the end of cycle 5. The pipeline
resumes its normal operation at that point.
➢ An alternative representation of the operation of a
pipeline in the case of a cache miss is shown in
Figure 4b.
Figure 4 Pipeline stall caused by a cache miss in F2
➢ This figure gives the function performed by each pipeline stage in each clock cycle. Note that the Decode unit is idle
in cycles 3 through 5, the Execute unit is idle in cycles 4 through 6, and the Write unit is idle in cycles 5 through 7.
Such idle periods are called stalls. They are also often referred to as bubbles in the pipeline. Once created as a
result of a delay in one of the pipeline stages, a bubble moves downstream until it reaches the last unit.
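➢ The idle cycles quoted above can be derived with a short sketch. It assumes that only the Fetch stage is ever delayed, that fetches are issued back to back, and that the D, E and W stages each take one cycle; the function name and the fetch durations are illustrative.

```python
def idle_cycles(fetch_cycles: list) -> dict:
    """Return the cycles in which the D, E and W stages have nothing to do.

    fetch_cycles[i] is the number of cycles needed to fetch instruction i+1
    (1 on a cache hit, more on a miss).
    """
    if not fetch_cycles:
        return {"D": [], "E": [], "W": []}
    busy = {"D": set(), "E": set(), "W": set()}
    start = 1                          # cycle in which the current fetch starts
    for f in fetch_cycles:
        done = start + f - 1           # last cycle of this fetch
        busy["D"].add(done + 1)
        busy["E"].add(done + 2)
        busy["W"].add(done + 3)
        start = done + 1               # next fetch begins in the following cycle
    idle = {}
    for stage, cycles in busy.items():
        span = range(min(cycles), max(cycles) + 1)
        idle[stage] = [c for c in span if c not in cycles]
    return idle


# I2 misses in the cache and occupies the Fetch stage in cycles 2 through 5.
print(idle_cycles([1, 4, 1]))
```

The three-cycle bubble created in the Decode stage shows up one cycle later in the Execute stage and one cycle later again in the Write stage.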
➢ A third type of hazard that may be encountered in pipelined
operation is known as a structural hazard. This is the
situation when two instructions require the use of a given
hardware resource at the same time.
➢ The most common case in which this hazard may arise is in
access to memory. One instruction may need to access
memory as part of the Execute or Write stage while another
instruction is being fetched.
➢ If instructions and data reside in the same cache unit, only
one instruction can proceed and the other instruction is
delayed. Many processors use separate instruction and data
caches to avoid this delay.
➢ An example of a structural hazard is shown in Figure 5. This figure shows how the load instruction
Load X(R1),R2
can be accommodated in our example 4-stage pipeline. The memory address, X+[R1], is computed in step E2
in cycle 4, then memory access takes place in cycle 5.
➢ The operand read from memory is written into register R2 in cycle 6. This means that the execution step of this
instruction takes two clock cycles (cycles 4 and 5).
➢ It causes the pipeline to stall for one cycle, because both
instructions I₂ and I₃ require access to the register file in cycle
6.
➢ Even though the instructions and their data are all available, the
pipeline is stalled because one hardware resource, the register
file, cannot handle two operations at once.
➢ If the register file had two input ports, that is, if it allowed two
simultaneous write operations, the pipeline would not be stalled.
➢ In general, structural hazards are avoided by providing
sufficient hardware resources on the processor chip.
Figure 5: Effect of a load instruction on pipeline timing
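➢ The register file conflict can be sketched as a simple port-allocation problem. This is a simplified model, not a full pipeline simulation: it assumes that instructions write in program order and that a write is pushed to the next cycle whenever all write ports are taken. The function and parameter names are hypothetical.

```python
def resolve_write_conflicts(wanted: list, ports: int = 1) -> list:
    """Return the cycle in which each instruction actually writes its result.

    wanted[i] is the cycle in which instruction i+1 would write the register
    file if there were no conflicts.
    """
    used = {}            # cycle -> number of write ports already taken
    actual = []
    earliest = 0         # writes happen in program order
    for w in wanted:
        cycle = max(w, earliest)
        while used.get(cycle, 0) >= ports:
            cycle += 1   # stall: the write port is busy in this cycle
        used[cycle] = used.get(cycle, 0) + 1
        actual.append(cycle)
        earliest = cycle
    return actual


# I1 writes in cycle 4; the Load (I2) and I3 both want cycle 6; I4 wants cycle 7.
print(resolve_write_conflicts([4, 6, 6, 7], ports=1))
print(resolve_write_conflicts([4, 6, 6, 7], ports=2))
```

With a single write port, I₃ and the instructions behind it slip by one cycle; with two write ports the conflict disappears, as noted in the text.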
➢ It is important to understand that pipelining does not result in individual instructions being
executed faster; rather, it is the throughput that increases, where throughput is measured by the
rate at which instruction execution is completed.
➢ Any time one of the stages in the pipeline cannot complete its operation in one clock cycle, the
pipeline stalls, and some degradation in performance occurs.
➢ Thus, the performance level of one instruction completion in each clock cycle is actually the upper
limit for the throughput achievable in a pipelined processor organized as in Figure 2b.
➢ An important goal in designing processors is to identify all hazards that may cause the
pipeline to stall and to find ways to minimize their impact.
For Further Studies:
Reference
Carl Hamacher, Zvonko Vranesic, Safwat Zaky,
Computer Organization, 5th Edition, Tata
McGraw-Hill, 2002.
THANK YOU
Prof. Anand H. D.
M. Tech. (PhD.)
Assistant Professor,
Department of Electronics & Communication Engineering
Dr. Ambedkar Institute of Technology, Bengaluru-56
Email: [email protected]
Phone: 9844518832