# R Assignment Help | STAT346: Statistical Data Science I

## 1. (16 points) Critical Path.

Given the datapath and delays in the following figure, calculate the time it takes to execute the following
instructions. You may assume that the delay of anything not mentioned in the table (including wires and
ALU control) is zero.

(a) lw \$t0, 8(\$t1)
(b) bne \$t0, \$t1, 0x100
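
As a hedged illustration of the method (not the answer to this question), an instruction's execution time can be computed as the longest delay path through the units it uses. The delays and the dependency graph below are hypothetical placeholders, since the actual figure and delay table are not reproduced here; substitute the real values before relying on the result.

```python
from functools import lru_cache

# HYPOTHETICAL component delays in picoseconds. The real values come from
# the delay table in the assignment figure.
DELAY_PS = {
    "pc": 0,
    "imem": 200,
    "regread": 100,
    "signext": 20,
    "alu": 150,
    "dmem": 200,
    "regwrite": 100,
}

# HYPOTHETICAL dependency graph for a load word: each unit maps to the
# units whose outputs it must wait for.
LW_INPUTS = {
    "pc": [],
    "imem": ["pc"],
    "regread": ["imem"],
    "signext": ["imem"],
    "alu": ["regread", "signext"],
    "dmem": ["alu"],
    "regwrite": ["dmem"],
}


def critical_path_ps(inputs, delay, sink):
    """Return the longest accumulated delay from any source unit to `sink`."""

    @lru_cache(maxsize=None)
    def finish(unit):
        # A unit starts once its slowest input has finished.
        start = max((finish(u) for u in inputs[unit]), default=0)
        return start + delay[unit]

    return finish(sink)


lw_time = critical_path_ps(LW_INPUTS, DELAY_PS, "regwrite")
print(lw_time)
```

Units that operate in parallel (here, the register read and the sign extension) contribute only the slower of the two, which is why the computation takes a maximum over inputs rather than summing every unit.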

## 2. (10 points) Branch Prediction.

Assume the following instruction mix for a 5-stage MIPS pipeline:

• The processor’s base CPI = 1.
• We use the “always not taken” scheme for branch prediction.
• 55% of the branches are not taken.
• 35% of the load instructions are immediately followed by an instruction that uses the loaded value.
• There are no other stalls in the pipeline.
• A branch misprediction has a 4-cycle penalty.
• A stall due to a load instruction has a 2-cycle overhead.

Calculate the CPI of this pipeline.
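
The calculation has the general shape of base CPI plus the average stall cycles per instruction from each source. The sketch below shows that formula under stated assumptions: the parameter names are invented for illustration, and the branch and load fractions in the usage line are hypothetical, because the instruction mix itself does not appear in the text above.

```python
def pipeline_cpi(branch_frac, load_frac,
                 base_cpi=1.0,
                 not_taken_rate=0.55,
                 load_use_rate=0.35,
                 mispredict_penalty=4,
                 load_stall=2):
    """Effective CPI = base CPI + average stall cycles per instruction.

    branch_frac and load_frac are the fractions of all instructions that
    are branches and loads (assumed inputs, taken from the instruction mix).
    """
    # With an "always not taken" predictor, every taken branch is mispredicted.
    mispredict_rate = 1.0 - not_taken_rate
    branch_stalls = branch_frac * mispredict_rate * mispredict_penalty
    # Only loads immediately followed by a dependent instruction stall.
    load_stalls = load_frac * load_use_rate * load_stall
    return base_cpi + branch_stalls + load_stalls


# HYPOTHETICAL mix: 20% branches, 25% loads.
example_cpi = pipeline_cpi(branch_frac=0.20, load_frac=0.25)
print(round(example_cpi, 3))
```

The defaults mirror the figures listed above (55% of branches not taken, 35% load-use rate, 4-cycle and 2-cycle penalties); only the two mix fractions need to be supplied.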

## 3. (24 points) Pipelining Hazards.

Consider the MIPS assembly code given below.

            xor  r0, r0, r0
            j    L1
    loop:   lw   r3, 0(r2)
            mul  r4, r3, r3
            mul  r3, r3, r1
            div  r3, r4, r3
            sw   r3, 0(r2)
    L1:     bne  r0, r1, loop

We want to run this code on a 5-stage pipelined processor, with some modifications. The processor is a
typical 5-stage pipeline (F-D-X-M-W), with the following exceptions:

• The multiplier block used to execute the mul instruction is pipelined into four stages:

This means that a multiply instruction runs through the pipeline as follows: F-D-X0-X1-X2-X3-M-W, and
up to four multiply instructions may be in flight at a time. All other instruction types are blocked from the
execute stage while any of the multiply stages is in use.

• The divider block used to execute the div instruction is iterative and takes four cycles:

This means that a divide instruction runs through the pipeline as follows: F-D-X0-X0-X0-X0-M-W. All other
instructions are blocked from the execute stage while a division is being done.
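
The stage sequences described above can be sketched as a small lookup, which may help when filling in rows of a pipeline diagram. This is only an illustrative helper: the names `EXECUTE_STAGES`, `stage_sequence`, and `unstalled_row` are invented here, and the code does not model stalls or hazards.

```python
# Execute-stage occupancy per opcode, as described in the text.
EXECUTE_STAGES = {
    "mul": ["X0", "X1", "X2", "X3"],  # four-stage pipelined multiplier
    "div": ["X0", "X0", "X0", "X0"],  # iterative divider, four cycles in X0
}


def stage_sequence(opcode):
    """Return the list of stages one instruction passes through, unstalled."""
    execute = EXECUTE_STAGES.get(opcode, ["X"])  # all other opcodes: one X cycle
    return ["F", "D", *execute, "M", "W"]


def unstalled_row(opcode, issue_cycle):
    """Map cycle number -> stage for an instruction fetched at issue_cycle."""
    return {issue_cycle + i: stage
            for i, stage in enumerate(stage_sequence(opcode))}


print("-".join(stage_sequence("mul")))
print("-".join(stage_sequence("div")))
print("-".join(stage_sequence("lw")))
```

A full diagram additionally has to delay each row whenever a data or structural hazard applies, which this sketch deliberately leaves out.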

(a) Stalling for Structural Hazards

Draw a pipeline diagram (table) showing the execution of the MIPS code through the first iteration
of the loop, without bypassing. Assume data hazards and structural hazards are resolved using only
stalling. Assume branches are predicted not taken until they are resolved in the execute stage. What is
the CPI of the entire program?

Hint: Fill in this pipeline diagram

(b) Bypassing for Data Hazards

Draw a pipeline diagram similar to the one in part (a), but now assume the processor has data bypassing.
What is the CPI of the entire program?

Hint: Fill in this pipeline diagram