Title | Data Dependencies |
---|---|
Author | Joshua Ahn |
Course | Computer Architecture |
Institution | University of Chicago |
Pages | 8 |
File Size | 631 KB |
File Type | |
Total Downloads | 101 |
Total Views | 123 |
Here, we talk about data dependencies which are a characteristic of pipelining a CPU...
Pipeline Dependency Handling Data Dependency – Flow dependency: read-after-write (RAW) o True dependence – this is the only data dependency type that can cause a problem o r3 r1 op r2 o r5 r3 op r4 Anti dependency: write-after-read (WAR) o r5 r3 op r4 o r3 r6 op r7 Output dependency: write-after-write (WAW) o r3 r1 op r2 o r5 r3 op r4 o r3 r6 op r7 Anti and output dependencies only exist due to a limited number of architectural registers Dependent only the name of the register, not the value of the register Easy to handle in in-order processor pipelines Write to the destination (1) in one stage and (2) in program order Flow dependencies are more “interesting” – Always need to be obeyed because they constitute true dependence on a value Ways of handling flow dependencies – Detect and wait until value is available in register file Detect and forward/bypass data to dependent instruction Detect and eliminate the dependence at the software level o No need for the hardware to detect dependence Predict the needed value(s), execute “speculatively”, and verify Do something else (fine-grained multithreading) o No need to detect We’ll focus on the first two solutions – for there to be a flow dependency between two instructions, the “distance” between them must be less than 3 (i.e., 1 or 2 other instructions between them). This is how we detect a data dependency:
Now, to resolve the data dependency, we can do a few things – 1) Stall the pipeline (i.e., inserting bubbles) You make the dependent instruction wait in IF until its source data becomes available
To implement the stall, you must prevent the update of the PC and the IF/ID registers ensure stalled instructions stay in their stages o Use a “write enable” signal for the register Force control values in ID/EX registers to 0 during the stalled (bubble) cycles nop (nooperations) It is crucial that the EX, MEM, and WB stages continue to advance normally during stall cycles
Impact on performance – each stall cycle corresponds to one lost cycle in which no instructions can be completed For a program with N instructions and S stall cycles, average CPI = (N + S)/N S is dependent on – o frequency of RAW dependences o exact distance between the dependent instructions o distance between dependencies if i1, i2, i3 are all dependent on i0, then once i1’s dependence on i0 is resolved, i2 and i3 must be okay too
2) Data forwarding/bypassing Instructions IA and IB have RAW dependency that needs to be handled Here, we must retrieve the operand from the datapath instead of the register file Forward the value to the dependent instruction as soon as it is available Case: Double Data Dependency
o You should retrieve the operand from the younger instruction if data dependencies occur in multiple outstanding instructions Forwarding Conditions and Logic
o The forwarding logic is similar to the stall detection logic, but you should now check in the execute stage o And ordering matters! You must check the youngest match first o However, even with data-forwarding, RAW dependence still exists on an immediately preceding LDUR instruction, this is called a load-use dependency Requires a stall – this is an edge case
o To detect a load-use dependency:
o Now, what if you have a dependent STUR immediately following an LDUR? It depends – you can either be forced to have a stall or not to have a stall. You only have a stall when you must store at a base register and incorporate the offset to the memory address. 3) Software optimization – just re-order the instructions as long as the program semantic is preserved. Control Dependency – The central question is: what should the fetch PC be in the next cycle? o The answer is easy if you have a non-control-flow instruction o But if you have a control-flow instruction, you have to take an extra step to determine the next fetch PC How to handle control dependency? o Critical to keep the pipeline full of the correct sequence of dynamic instructions o Potential solutions – Stall the pipeline until you know the next fetch address Guess the next fetch address (branch prediction) Eliminate control-flow instructions (predicated execution) Employ delayed branching (branch delay slot) Do something else (fine-grained multithreading) Fetch from both possible paths (if you know the addresses of both possible paths – multiple execution) o Branch target and condition available in the EX-stage of the branch instruction (instead of MEM) makes it easier Under the stalling method –
o Until you have completely decoded the next instruction, you won’t know the exact value of the next PC o All instructions must be stalled for at least 1 cycle, but if you have a branching instruction, you must stall for 2 cycles o Examples of control flow instructions – function calls, branching, jumps, if conditionals, loops Under the branch prediction method – o When a branch is resolved – branch target (Instk) is fetched all instructions fetched since Insth (“wrong-path” instructions) must be discarded/flushed
o To flush an instruction, you simply set all of its write enable control signals to 0
o Performance Analysis – Correct guess no penalty Incorrect guess 2 bubbles Assume no data dependency related stalls, 20% control flow instructions, 70% of control flow instructions taken
o To resolve the branch condition and target address early, you could move the PC+PC offset calculation moved from the EX to ID stage. This is OK but it increases the critical path of the ID. For a branch condition like CBZ though, this causes trouble – not a good idea to move the calculation from EX to ID stage You would need to implement a stall for one cycle – it’s not possible for you to do any forwarding. This will also make the data dependency handling much harder o We could also reduce the probability of the wrong guess Maximize the chances that the next sequential instruction is the next instruction be executed
Lay out the control flow graph such that the “likely next instruction” is on the not-taken path of a branch Can be done in software (static) or hardware (dynamic)
Slides 44 of the lecture slides was not covered in class...