Question 1 What out-of-order processor hardware structure can be used to enforce that instructions commit in order?
Reorder buffer
Question 2 Register renaming is able overcome which of the data hazards? Select all that apply
WAW WAR
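The reasoning behind this answer can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical four-instruction program encoded as `(dest, src1, src2)` tuples; the helper names `hazards` and `rename` and the physical register names `P0`, `P1`, ... are invented for the example and are not part of the quiz. The check is a simple pairwise comparison and ignores intervening writes.

```python
from itertools import count

def hazards(instrs):
    """Return the kinds of hazard (RAW, WAR, WAW) found between any ordered pair."""
    found = set()
    for i, (dst_i, *srcs_i) in enumerate(instrs):
        for dst_j, *srcs_j in instrs[i + 1:]:
            if dst_i in srcs_j:
                found.add("RAW")  # later instruction reads an earlier result
            if dst_j in srcs_i:
                found.add("WAR")  # later instruction overwrites an earlier source
            if dst_j == dst_i:
                found.add("WAW")  # both instructions write the same register
    return found

def rename(instrs):
    """Give every destination a fresh physical register (hypothetical names P0, P1, ...)."""
    fresh = (f"P{n}" for n in count())
    mapping = {}  # architectural register -> latest physical register
    renamed = []
    for dst, *srcs in instrs:
        new_srcs = [mapping.get(s, s) for s in srcs]  # read the current mapping
        mapping[dst] = next(fresh)                    # allocate a new destination
        renamed.append((mapping[dst], *new_srcs))
    return renamed

# Hypothetical program, not taken from the quiz.
program = [
    ("R1", "R2", "R3"),
    ("R4", "R1", "R5"),  # RAW on R1
    ("R2", "R6", "R7"),  # WAR on R2
    ("R1", "R8", "R9"),  # WAW on R1
]

print(sorted(hazards(program)))
print(sorted(hazards(rename(program))))
```

Only the true (RAW) dependence survives renaming, because WAR and WAW hazards come from reusing register names rather than from data flow.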
Question 3 How many SRAM bits are needed to implement an 8KB two-way set associative cache with 64B block size? Assume that each line (entry) has a single valid bit and no dirty bits. There is one bit per set for true LRU. Assume that the address size of the machine is 32 bits and that the machine is byte-addressable.
68288
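The arithmetic behind this answer can be checked with a short sketch. The function name `cache_sram_bits` and its parameters are hypothetical, and the sketch assumes all sizes are powers of two: 128 lines in 64 sets, 6 offset bits, 6 index bits, 20 tag bits, so each line stores 512 data bits + 20 tag bits + 1 valid bit.

```python
def cache_sram_bits(capacity_bytes, ways, block_bytes, addr_bits,
                    valid_bits=1, dirty_bits=0, lru_bits_per_set=1):
    """Total SRAM bits for a set-associative cache (sizes assumed powers of two)."""
    lines = capacity_bytes // block_bytes
    sets = lines // ways
    offset_bits = block_bytes.bit_length() - 1  # log2 of the block size
    index_bits = sets.bit_length() - 1          # log2 of the number of sets
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_line = block_bytes * 8 + tag_bits + valid_bits + dirty_bits
    return lines * bits_per_line + sets * lru_bits_per_set

# 128 lines * 533 bits + 64 LRU bits
print(cache_sram_bits(8 * 1024, ways=2, block_bytes=64, addr_bits=32))
```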
Question 4 Which of the following two processors will execute a program with the given instruction mix faster?
| Parameter | Processor A | Processor B |
|---|---|---|
| Frequency | 1 GHz | 2 GHz |
| CPI for ALU Instructions | 1 | 1.5 |
| CPI for Branch Instructions | 2 | 3 |
| CPI for Memory Instructions | 1 | 2 |
Instruction Mix: 50% ALU Instructions, 10% Branch Instructions, 40% Memory Instructions
Processor B
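A short sketch of the comparison, assuming execution time per instruction is the mix-weighted CPI divided by the clock frequency. The function and dictionary names are hypothetical. Processor A averages 1.1 cycles per instruction at 1 GHz (1.1 ns), while Processor B averages 1.85 cycles at 2 GHz (0.925 ns), so B is faster despite its higher CPI.

```python
def average_cpi(cpi, mix):
    """Mix-weighted cycles per instruction."""
    return sum(cpi[kind] * mix[kind] for kind in mix)

def ns_per_instruction(cpi, mix, freq_ghz):
    """Average time per instruction in nanoseconds (cycles / GHz)."""
    return average_cpi(cpi, mix) / freq_ghz

mix = {"alu": 0.5, "branch": 0.1, "memory": 0.4}
cpi_a = {"alu": 1, "branch": 2, "memory": 1}    # Processor A, 1 GHz
cpi_b = {"alu": 1.5, "branch": 3, "memory": 2}  # Processor B, 2 GHz

time_a = ns_per_instruction(cpi_a, mix, freq_ghz=1.0)
time_b = ns_per_instruction(cpi_b, mix, freq_ghz=2.0)
print("Processor B" if time_b < time_a else "Processor A")
```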
Question 5 In a pipelined processor, a single instruction takes the following synchronous exceptions (interrupts): Divide-by-Zero fault and Invalid Opcode. What should the interrupt cause be loaded with?
Invalid opcode
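One way to see why: when a single instruction raises several synchronous exceptions, the one detected in the earliest pipeline stage is reported. The sketch below assumes a classic five-stage ordering and assumes that an invalid opcode is detected in decode while a divide-by-zero is detected in execute; the stage names and the `interrupt_cause` helper are illustrative, not taken from the quiz.

```python
# Assumed stage order for this sketch; the quiz does not name the stages.
STAGE_ORDER = ["fetch", "decode", "execute", "memory", "writeback"]

# Assumed detection stage for each exception.
DETECTED_IN = {
    "invalid_opcode": "decode",
    "divide_by_zero": "execute",
}

def interrupt_cause(pending):
    """Pick the exception detected in the earliest pipeline stage."""
    return min(pending, key=lambda exc: STAGE_ORDER.index(DETECTED_IN[exc]))

print(interrupt_cause(["divide_by_zero", "invalid_opcode"]))
```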
Question 6 Use the following architecture for questions 6-9: Given a 3-wide in-order processor, draw the optimal pipeline diagram and answer questions 6-9, showing, for each instruction, what stage of the pipeline it is in for each cycle for the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can execute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and writeback three instructions per cycle but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the decode stage if there are structural or data hazards and stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined. [image] Code sequence for questions 6-9: [image] Which instructions stall due to a data hazard? Check all that apply
4: ADD R14, R11, R15; 11: OR R11, R26, R18
Question 7 Use the architecture and code sequence given in Question 6. Which instructions stall due to a structural hazard? Select all that apply
7: LW R22, 4(R19); 9: LW R25, 12(R19); 12: AND R13, R17, R29
Question 8 Use the architecture and code sequence given in Question 6. Which instructions stall in the fetch stage? Select all that apply
7: LW R22, 4(R19); 12: AND R13, R17, R29
Question 9 Use the architecture and code sequence given in Question 6. Which instructions stall in the decode stage? Select all that apply
4: ADD R14, R11, R15; 9: LW R25, 12(R19); 11: OR R11, R26, R18
Question 10 Use the following architecture for questions 10-14: Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below and answer questions 10-14. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and writeback one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can execute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty. [image] Code sequence for questions 10-17: [image] Which instructions stall in the issue queue (IQ)? Select all that apply
4: MUL R7, R5, R6; 6: ADDIU R14, R18, 1; 7: ADDIU R13, R18, 2
Question 11 Use the IO3 architecture and code sequence given in Question 10. Of those that stall in the issue queue (IQ), which instructions stall for at least one cycle due to a structural hazard? Select all that apply.
6: ADDIU R14, R18, 1; 7: ADDIU R13, R18, 2
Question 12 Use the IO3 architecture and code sequence given in Question 10. On what cycle does Instruction 4 write back its results into R7? (Assume that Instruction 0 is fetched on cycle 0 and writes back on cycle 4, Instruction 1 is fetched on cycle 1 and writes back on cycle 5, etc.)
14
Question 13 Use the IO3 architecture and code sequence given in Question 10. On what cycle does Instruction 6 write back its results into R14? (Assume that Instruction 0 is fetched on cycle 0 and writes back on cycle 4, Instruction 1 is fetched on cycle 1 and writes back on cycle 5, etc.)
12
Question 14 Use the IO3 architecture and code sequence given in Question 10. Would adding register renaming logic enable faster completion of the code sequence used in Questions 10-17 on an IO3 architecture?
No
Question 15 Use the following architecture for questions 15-17: Draw the optimal pipeline diagram for the following code executing on the IO2I processor from lecture as shown below. The IO2I processor fetches instructions in-order, issues instructions out-of-order, writes-back results out-of-order, and commits instructions in-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, writeback one result per cycle, and commit one instruction per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline L executes loads, pipeline S executes stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0 and the machine has no branch delay slots and always predicts the fallthrough path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case ‘i’ to denote if an instruction enters the issue queue, but does not immediately issue. Use a lower-case ‘r’ to denote if an instruction enters the reorder buffer, but does not immediately commit. Assume that the issue queue can hold 16 instructions and begins empty. [image] Code sequence for questions 10-17: [image] Which instructions spend multiple cycles waiting to commit after being written back into the ROB? Select all that apply
5: ADDIU R18, R11, 1; 6: ADDIU R14, R18, 1; 7: ADDIU R13, R18, 2
Question 16 Use the IO2I architecture and code sequence given in Question 15. On what cycle does the final instruction commit? (Assume that Instruction 0 is fetched on cycle 0 and writes back on cycle 4, Instruction 1 is fetched on cycle 1 and writes back on cycle 5, etc.)
18
Question 17 Use the IO2I architecture and code sequence given in Question 15. Would adding register renaming logic enable faster completion of the code sequence used in Questions 10-17 on an IO2I architecture?
No
Question 18 The following code is to be executed on a processor with 32 architectural registers. The processor is able to issue instructions out-of-order. The processor is a single-issue machine. The processor has different functional unit latencies, with multiply instructions having a latency of 4 cycles, ALU operations having a latency of 1 cycle, and loads and stores having a latency of 2 cycles. The processor stalls on WAW and WAR dependencies. Pretend that you are the compiler and perform changes to the following code to increase the performance of the code when executing on this out-of-order processor. Assume that all registers not used are free to be used by the compiler. Problem 18 Code Sequence: [image] Which of the following code sequences would increase the performance of the code on this OoO processor? Select all that apply.
MUL R5, R6, R7
ADD R8, R5, R6
MUL R10, R13, R8
SW R12, 0(R10)
SUB R18, R6, R4
MUL R17, R18, R15
ADDIU R19, R5, 1

MUL R5, R6, R7
ADD R8, R5, R6
MUL R18, R13, R8
SW R12, 0(R18)
SUB R10, R6, R4
MUL R17, R10, R15
ADDIU R19, R5, 1
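Since this processor stalls on WAW and WAR dependencies, a useful check is that the two selected sequences contain no such name dependencies. The sketch below is a rough illustration: the `parse` and `name_dependencies` helpers are hypothetical, the parser only understands the instruction forms that appear in these two sequences, and `SW` is assumed to be the only mnemonic without a destination register.

```python
import re

STORES = {"SW"}  # assumption: only stores have no destination register here

def parse(line):
    """Return (dest, sources) for an instruction such as 'ADD R8, R5, R6'."""
    mnemonic, operands = line.split(maxsplit=1)
    regs = re.findall(r"R\d+", operands)
    if mnemonic in STORES:
        return None, regs  # a store only reads its registers
    return regs[0], regs[1:]

def name_dependencies(seq):
    """List (kind, earlier, later) for every WAW or WAR pair in the sequence."""
    parsed = [parse(line) for line in seq]
    deps = []
    for i, (dst_i, srcs_i) in enumerate(parsed):
        for j in range(i + 1, len(parsed)):
            dst_j = parsed[j][0]
            if dst_j is None:
                continue
            if dst_j == dst_i:
                deps.append(("WAW", i, j))
            if dst_j in srcs_i:
                deps.append(("WAR", i, j))
    return deps

first = ["MUL R5, R6, R7", "ADD R8, R5, R6", "MUL R10, R13, R8", "SW R12, 0(R10)",
         "SUB R18, R6, R4", "MUL R17, R18, R15", "ADDIU R19, R5, 1"]
second = ["MUL R5, R6, R7", "ADD R8, R5, R6", "MUL R18, R13, R8", "SW R12, 0(R18)",
          "SUB R10, R6, R4", "MUL R17, R10, R15", "ADDIU R19, R5, 1"]

print(name_dependencies(first), name_dependencies(second))
```

Every destination register in each sequence is written once and is never read by an earlier instruction, so only true (RAW) dependencies remain.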