2023年11月25日

计算机代写｜ECE 570 High-Performance Computer Architecture Assignment #1

这是一个美国的计算机代写架构Assignment

1- For this problem, consider the following two code sequences executing on the 5-stage datapath with normal forwarding and bypassing shown below:

Code 1
…
LD R4, 0(R2)
ADD R1, R4, R2
…

Code 2
…
SUB R4, R3, R2
BNEZ R4, Loop
…

(a) The first code is an example of a “load followed by its immediate use”. Show and explain the correct execution timing that would allow the datapath to properly execute Code 1.

(b) The second code is an example where MEM-to-ID forwarding (also known as branch forwarding) will benefit on the 5-stage pipeline.

(i) Show and explain the execution timing as to why the datapath would not properly execute the above code sequence without the MEM-ID forwarding.

(ii) Make the necessary modifications to the datapath to implement the MEM-ID forwarding and allow Code 2 to run with minimal stalling and show the execution timing for the modified design. Clearly explain your design.

2- Consider the following code, which represents Y = a´X+Y operation for a vector length of 100. Assume the pipeline latencies from Figure 3.2 in the text and a 1-cycle delay branch that is resolved in the ID stage. In addition, as discussed in Problem #1, the pipeline uses branch forwarding that forwards the result of an ALU operation from the MEM stage to the ID stage (i.e., MEM-to-ID forwarding).

DADDIU R4, R1, #800 ; R1 = X, R4 = upper bound for X
foo: L.D F2, 0(R1) ; load X(i)
MUL.D F4, F2, F0 ; F0 = a, perform a*x(i)
L.D F6, 0(R2) ; load Y(i)
ADD.D F6, F4, F6 ; perform a*X(i) + Y(i)
S.D F6, 0(R2) ; store Y(i)
DADDIU R1, R1, #8 ; increment X index
DADDIU R2, R2, #8 ; increment Y index
DSUBU R3, R4, R1 ; test if done
BNEZ R3, foo ; loop if not done

(a) Show how this loop would execute without any scheduling. Maximize the performance of this code by applying both instruction reordering (also known as pipeline scheduling) and delay branch techniques.
Ignoring the startup delays and assuming the loop executes 100 times, determine the number of cycles required to execute the code before and after the optimizations. Do not be concerned about what happens after the loop.

(b) Unroll the loop as many times as necessary to schedule it without stalls and show the instruction schedule.
Again, assuming the loop executes 100 times, determine the number of cycles required to execute the code before and after unrolling

3- Consider the following code segment within a loop body:

if (d==0)
d=1;
if (d==1)

Here is the equivalent MIPS code assuming d is assigned to registers R1.

bnez R1,L1 ; branch b1 (d!=0)
addi R1,R0,#1 ; d==0, so d=1
L1: subi R3,R1,#1
bnez R3,L2 ; branch b2 (d!=1)
…
L2:

Assume that the following list of 8 values of d is to be processed:

0, 2, 0, 2, 0, 2, 0, 2

(a) Suppose 2-bit BPBs are used to predict the execution of the two branches in this loop. 2-bit BPBs consists of four states: strongly not taken (sNT), weakly not taken (wNT), strongly taken (sT), and weakly taken (wT).
Show the trace of predictions and the actual outcomes of branches b1 and b2. Assume initial values of the 2–bit predictors are sNT. What are the prediction accuracies for b1 and b2? What is the overall prediction accuracy?

(b) Suppose two-level (1, 2) branch prediction is used predict the execution of the two branches in this loop. That is, in addition to the 2-bit predictor, a 1-bit global register (g) is used. Assume the 2-bit predictors are initialized to sNT and g is initialized to NT. Show the trace of predictions and the actual outcomes of branches b1 and b2. What are the prediction accuracies for b1 and b2? What is the overall prediction accuracy?

(c) Suppose a two-level (2, 2) branch prediction is used predict the execution of the two branches in this loop. That is, in addition to the 2-bit predictor, a 2-bit global register (g) is used. Assume the 2-bit predictors are all initialized to sNT and g is initialized to NT, NT, with the LSB representing the most recent branch outcome.
Therefore, initial predictions, g, and their meaning are given below

程序代写代做C/C++/JAVA/安卓/PYTHON/留学生/PHP/APP开发/MATLAB

CS代写,留学生编程代写,CS作业代写,Java代写,程序代写，代码代写 | ITCS代写

本网站支持淘宝支付宝微信支付 paypal等等交易。如果不放心可以用淘宝交易！

E-mail:itcsdx@outlook.com 微信:itcsdx

如果您使用手机请先保存二维码，微信识别。如果用电脑，直接掏出手机果断扫描。

美国cs代写

数据库代写 | FNDS015 ERD & MS Access Database java代写 | Assignment 2 CSSE7610

CONTACT

Assignment Example

Service Scope

Recent Case

2024年10月8日

ITCS代写

计算机代写｜ECE 570 High-Performance Computer Architecture Assignment #1

CONTACT

Assignment Example

Service Scope

Recent Case

MySQL数据库学习指南：留学生如何在不同国家的课程和就业形势下脱颖而出

北美计算机留学高校整理与热门专业前景分析

留学生计算机代写常见服务有哪些？

留学生程序代写靠谱吗

留学生如何选择机器学习方向的专业

Tags