Data Dependencies PDF

Title	Data Dependencies
Author	Joshua Ahn
Course	Computer Architecture
Institution	University of Chicago
Pages	8
File Size	631 KB
File Type	PDF
Total Downloads	101
Total Views	123

Preview

CLICK TO PREVIEW PDF

Summary

Here, we talk about data dependencies which are a characteristic of pipelining a CPU...

Description

Pipeline Dependency Handling Data Dependency –  Flow dependency: read-after-write (RAW) o True dependence – this is the only data dependency type that can cause a problem o r3  r1 op r2 o r5  r3 op r4  Anti dependency: write-after-read (WAR) o r5  r3 op r4 o r3  r6 op r7  Output dependency: write-after-write (WAW) o r3  r1 op r2 o r5  r3 op r4 o r3  r6 op r7 Anti and output dependencies only exist due to a limited number of architectural registers  Dependent only the name of the register, not the value of the register  Easy to handle in in-order processor pipelines  Write to the destination (1) in one stage and (2) in program order Flow dependencies are more “interesting” –  Always need to be obeyed because they constitute true dependence on a value Ways of handling flow dependencies –  Detect and wait until value is available in register file  Detect and forward/bypass data to dependent instruction  Detect and eliminate the dependence at the software level o No need for the hardware to detect dependence  Predict the needed value(s), execute “speculatively”, and verify  Do something else (fine-grained multithreading) o No need to detect We’ll focus on the first two solutions – for there to be a flow dependency between two instructions, the “distance” between them must be less than 3 (i.e., 1 or 2 other instructions between them). This is how we detect a data dependency:

Now, to resolve the data dependency, we can do a few things – 1) Stall the pipeline (i.e., inserting bubbles)  You make the dependent instruction wait in IF until its source data becomes available



 

To implement the stall, you must prevent the update of the PC and the IF/ID registers  ensure stalled instructions stay in their stages o Use a “write enable” signal for the register Force control values in ID/EX registers to 0 during the stalled (bubble) cycles  nop (nooperations) It is crucial that the EX, MEM, and WB stages continue to advance normally during stall cycles

  

Impact on performance – each stall cycle corresponds to one lost cycle in which no instructions can be completed For a program with N instructions and S stall cycles, average CPI = (N + S)/N S is dependent on – o frequency of RAW dependences o exact distance between the dependent instructions o distance between dependencies  if i1, i2, i3 are all dependent on i0, then once i1’s dependence on i0 is resolved, i2 and i3 must be okay too

2) Data forwarding/bypassing  Instructions IA and IB have RAW dependency that needs to be handled  Here, we must retrieve the operand from the datapath instead of the register file  Forward the value to the dependent instruction as soon as it is available  Case: Double Data Dependency



o You should retrieve the operand from the younger instruction if data dependencies occur in multiple outstanding instructions Forwarding Conditions and Logic

o The forwarding logic is similar to the stall detection logic, but you should now check in the execute stage o And ordering matters! You must check the youngest match first o However, even with data-forwarding, RAW dependence still exists on an immediately preceding LDUR instruction, this is called a load-use dependency  Requires a stall – this is an edge case

o To detect a load-use dependency:

o Now, what if you have a dependent STUR immediately following an LDUR? It depends – you can either be forced to have a stall or not to have a stall. You only have a stall when you must store at a base register and incorporate the offset to the memory address. 3) Software optimization – just re-order the instructions as long as the program semantic is preserved. Control Dependency –  The central question is: what should the fetch PC be in the next cycle? o The answer is easy if you have a non-control-flow instruction o But if you have a control-flow instruction, you have to take an extra step to determine the next fetch PC  How to handle control dependency? o Critical to keep the pipeline full of the correct sequence of dynamic instructions o Potential solutions –  Stall the pipeline until you know the next fetch address  Guess the next fetch address (branch prediction)  Eliminate control-flow instructions (predicated execution)  Employ delayed branching (branch delay slot)  Do something else (fine-grained multithreading)  Fetch from both possible paths (if you know the addresses of both possible paths – multiple execution) o Branch target and condition available in the EX-stage of the branch instruction (instead of MEM)  makes it easier  Under the stalling method –



o Until you have completely decoded the next instruction, you won’t know the exact value of the next PC o All instructions must be stalled for at least 1 cycle, but if you have a branching instruction, you must stall for 2 cycles o Examples of control flow instructions – function calls, branching, jumps, if conditionals, loops Under the branch prediction method – o When a branch is resolved –  branch target (Instk) is fetched  all instructions fetched since Insth (“wrong-path” instructions) must be discarded/flushed

o To flush an instruction, you simply set all of its write enable control signals to 0

o Performance Analysis –  Correct guess  no penalty  Incorrect guess  2 bubbles  Assume no data dependency related stalls, 20% control flow instructions, 70% of control flow instructions taken

o To resolve the branch condition and target address early, you could move the PC+PC offset calculation moved from the EX to ID stage. This is OK but it increases the critical path of the ID.  For a branch condition like CBZ though, this causes trouble – not a good idea to move the calculation from EX to ID stage  You would need to implement a stall for one cycle – it’s not possible for you to do any forwarding. This will also make the data dependency handling much harder o We could also reduce the probability of the wrong guess  Maximize the chances that the next sequential instruction is the next instruction be executed



Lay out the control flow graph such that the “likely next instruction” is on the not-taken path of a branch  Can be done in software (static) or hardware (dynamic)

Slides 44 of the lecture slides was not covered in class...