RISCV-BOOM Documentation

Chris Celio, Jerry Zhao, Abraham Gonzalez, Ben Korpan

Feb 23, 2019

Contents:

1 Introduction and Overview
    1.1 The BOOM Pipeline
    1.2 The RISC-V ISA
    1.3 The Chisel HCL
    1.4 Quick-start
    1.5 The BOOM Repository
    1.6 The Rocket-Chip Repository
2 Instruction Fetch
    2.1 The Rocket I-Cache
    2.2 Fetching Compressed Instructions
    2.3 The Fetch Buffer
    2.4 The Fetch Target Queue
3 Branch Prediction
    3.1 The Next-line Predictor (NLP)
    3.2 The Backing Predictor (BPD)
4 The Decode Stage
    4.1 RVC Changes
5 The Rename Stage
    5.1 The Purpose of Renaming
    5.2 The Explicit Renaming Design
    5.3 The Rename Map Table
    5.4 The Busy Table
    5.5 The Free List
    5.6 Stale Destination Specifiers
6 The Reorder Buffer (ROB) and the Dispatch Stage
    6.1 The ROB Organization
    6.2 ROB State
    6.3 The Commit Stage
    6.4 Exceptions and Flushes
7 The Issue Unit
    7.1 Speculative Issue
    7.2 Issue Slot
    7.3 Issue Select Logic
    7.4 Un-ordered Issue Queue
    7.5 Age-ordered Issue Queue
    7.6 Wake-up
8 The Register Files and Bypass Network
    8.1 Register Read
    8.2 Bypass Network
9 The Execute Pipeline
    9.1 Execution Units
    9.2 Functional Units
    9.3 Branch Unit & Branch Speculation
    9.4 Load/Store Unit
    9.5 Floating Point Units
    9.6 Floating Point Divide and Square-root Unit
    9.7 Parameterization
    9.8 Control/Status Register Instructions
10 The Load/Store Unit (LSU)
    10.1 Store Instructions
    10.2 Load Instructions
    10.3 The BOOM Memory Model
    10.4 Memory Ordering Failures
11 The Memory System and the Data-cache Shim
12 Micro-architectural Event Tracking
    12.1 Setup HPM events to track
    12.2 Reading HPM counters in software
    12.3 Adding your own HPE
    12.4 External Resources
13 Verification
    13.1 RISC-V Tests
    13.2 RISC-V Torture Tester
    13.3 Continuous Integration (CI)
14 Debugging
    14.1 FireSim Debugging
    14.2 Pipeline Visualization
15 Physical Realization
    15.1 Register Retiming
    15.2 Pipelining Configuration Options
16 Future Work
    16.1 The Rocket Custom Co-processor Interface (ROCC)
    16.2 The Vector (“V”) ISA Extension
17 Parameterization
    17.1 BOOM Parameters
    17.2 Other Parameters
18 Frequently Asked Questions
19 The BOOM Ecosystem
    19.1 Scala, Chisel, Generators, Configs, Oh My!
20 Terminology
21 Bibliography
22 Indices and tables
Bibliography

1 Introduction and Overview

This document describes the design and implementation of the Berkeley Out-of-Order Machine (BOOM). BOOM is heavily inspired by the MIPS R10k and the Alpha 21264 out-of-order processors. Like the R10k and the 21264, BOOM is a unified physical register file design (also known as “explicit register renaming”). The source code for BOOM can be found at https://github.com/riscv-boom/riscv-boom.

1.1 The BOOM Pipeline

Fig. 1.1: Default BOOM Pipeline with Stages

1.1.1 Overview

Conceptually, BOOM is broken up into 10 stages: Fetch, Decode, Register Rename, Dispatch, Issue, Register Read, Execute, Memory, Writeback and Commit. However, many of those stages are combined in the current implementation, yielding seven stages: Fetch, Decode/Rename, Rename/Dispatch, Issue/RegisterRead, Execute, Memory and Writeback (Commit occurs asynchronously, so it is not counted as part of the “pipeline”).
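As a reading aid, the relationship between the ten conceptual stages and the seven combined stages listed above can be written down as a small table. The Python sketch below is only an illustration: the names `IMPLEMENTED_STAGES`, `ASYNCHRONOUS` and `conceptual_stages` are hypothetical and do not come from the BOOM source, and the split of Register Rename across two combined stages is inferred from the stage names alone.

```python
# Illustrative mapping only: the stage names come from the text above,
# the data structure and helper are hypothetical reading aids.
IMPLEMENTED_STAGES = {
    "Fetch": ["Fetch"],
    "Decode/Rename": ["Decode", "Register Rename"],
    "Rename/Dispatch": ["Register Rename", "Dispatch"],
    "Issue/RegisterRead": ["Issue", "Register Read"],
    "Execute": ["Execute"],
    "Memory": ["Memory"],
    "Writeback": ["Writeback"],
}

# Commit occurs asynchronously and is not counted as a pipeline stage.
ASYNCHRONOUS = ["Commit"]


def conceptual_stages():
    """Return the conceptual stages in pipeline order, without duplicates."""
    ordered = []
    for parts in IMPLEMENTED_STAGES.values():
        for name in parts:
            if name not in ordered:
                ordered.append(name)
    return ordered + ASYNCHRONOUS


if __name__ == "__main__":
    print(len(IMPLEMENTED_STAGES), "implemented stages")
    print(len(conceptual_stages()), "conceptual stages")
```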

1.1.2 Stages

Fetch
    Instructions are fetched from the Instruction Memory and pushed into a FIFO queue, known as the Fetch Buffer. Branch prediction also occurs in this stage, redirecting the fetched instructions as necessary. [1]

[1] While the Fetch Buffer is N-entries deep, it can instantly read out the first instruction on the front of the FIFO. Put another way, instructions don’t need to spend N cycles moving their way through the Fetch Buffer if there are no instructions in front of them.


Decode
    Decode pulls instructions out of the Fetch Buffer and generates the appropriate Micro-Op(s) to place into the pipeline. [2]

Rename
    The ISA, or “logical”, register specifiers (e.g. x0-x31) are then renamed into “physical” register specifiers.

Dispatch
    The Micro-Op is then dispatched, or written, into a set of Issue Queues.

Issue
    Micro-Ops sitting in an Issue Queue wait until all of their operands are ready and are then issued. [3] This is the beginning of the out-of-order piece of the pipeline.

Register Read
    Issued Micro-Ops first read their register operands from the unified physical register file (or from the bypass network)...

Execute
    ...and then enter the Execute stage where the functional units reside. Issued memory operations perform their address calculations in the Execute stage, and then store the calculated addresses in the Load/Store Unit, which resides in the Memory stage.

Memory
    The Load/Store Unit consists of three queues: a Load Address Queue (LAQ), a Store Address Queue (SAQ), and a Store Data Queue (SDQ). Loads are fired to memory when their address is present in the LAQ. Stores are fired to memory at Commit time (and naturally, stores cannot be committed until both their address and data have been placed in the SAQ and SDQ).

Writeback
    ALU operations and load operations are written back to the physical register file.

Commit
    The Reorder Buffer (ROB) tracks the status of each instruction in the pipeline. When the head of the ROB is not-busy, the ROB commits the instruction. For stores, the ROB signals to the store at the head of the Store Queue (SAQ/SDQ) that it can now write its data to memory.

[2] Because RISC-V is a RISC ISA, currently all instructions generate only a single Micro-Op. More details on how store Micro-Ops are handled can be found in The Memory System and the Data-cache Shim.

[3] More precisely, Micro-Ops that are ready assert their request, and the issue scheduler chooses which Micro-Ops to issue that cycle.
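The Commit behaviour described above (instructions may finish out of order, but only the head of the ROB may commit, and a store writes memory only when it commits) can be illustrated with a toy software model. This is a minimal sketch in Python, not BOOM's Chisel implementation: `RobEntry`, `ToyRob` and their methods are hypothetical names, the store address and data fields merely stand in for SAQ/SDQ entries, and the number of commits per call is left unbounded as a simplifying assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    name: str
    busy: bool = True                  # cleared once the instruction has finished
    is_store: bool = False
    store_addr: Optional[int] = None   # stands in for the SAQ entry
    store_data: Optional[int] = None   # stands in for the SDQ entry


class ToyRob:
    """Hypothetical in-order commit model; entries[0] is the ROB head."""

    def __init__(self):
        self.entries = deque()
        self.memory = {}
        self.committed = []

    def dispatch(self, entry):
        """Allocate an entry at the tail, in program order."""
        self.entries.append(entry)
        return entry

    def finish(self, entry, addr=None, data=None):
        """Mark an entry not-busy; a store also records its address and data."""
        if entry.is_store:
            entry.store_addr, entry.store_data = addr, data
        entry.busy = False

    def commit(self):
        """Commit from the head until the first busy entry is reached."""
        done = []
        while self.entries and not self.entries[0].busy:
            head = self.entries.popleft()
            if head.is_store:
                # Stores write memory only at commit time.
                self.memory[head.store_addr] = head.store_data
            done.append(head.name)
        self.committed.extend(done)
        return done
```

In this model, finishing a younger instruction first does not let it commit: `commit()` returns an empty list until the oldest entry is also marked not-busy, at which point all finished entries behind it commit in program order.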


1.1.3 Branch Support

BOOM supports full branch speculation and branch prediction. Each instruction, no matter where it is in the pipeline, is accompanied by a branch tag that marks which branches the instruction is “speculated under”. A mispredicted branch requires killing all instructions that depend on that branch. When a branch instruction passes through Rename, copies of the Register Rename Table and the Free List are made. On a mispredict, the saved processor state is restored.

Although Fig. 1.1 shows a simplified pipeline, BOOM implements the RV64G and privileged ISAs, which include single- and double-precision floating point, atomic memory support, and page-based virtual memory.
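The branch-tag mechanism can be sketched as a small behavioral model: every renamed instruction carries a bit mask of the unresolved branches it is speculated under, each branch snapshots the rename table and free list, and a mispredict kills every instruction whose mask contains the branch's bit before restoring the snapshot. The Python below is a simplified illustration under stated assumptions, not BOOM's code: the class `ToySpeculation`, its fields, and the register counts are all hypothetical, and restoring the free list by plain copy ignores registers freed by commits in the meantime.

```python
class ToySpeculation:
    """Hypothetical model of branch masks plus rename-state snapshots."""

    def __init__(self, num_logical=4, num_physical=8, max_branches=4):
        # At reset, logical register i maps to physical register i.
        self.map_table = list(range(num_logical))
        self.free_list = list(range(num_logical, num_physical))
        self.free_tags = list(range(max_branches))
        self.active_mask = 0    # one bit per unresolved branch
        self.snapshots = {}     # tag -> (map table copy, free list copy)
        self.in_flight = []     # (name, branch_mask, own_tag)

    def rename(self, name, dest=None, is_branch=False):
        # The instruction is speculated under every unresolved older branch.
        mask = self.active_mask
        if dest is not None:
            self.map_table[dest] = self.free_list.pop(0)
        own_tag = None
        if is_branch:
            own_tag = self.free_tags.pop(0)
            self.snapshots[own_tag] = (list(self.map_table), list(self.free_list))
            self.active_mask |= 1 << own_tag
        self.in_flight.append((name, mask, own_tag))
        return own_tag

    def _release(self, tag):
        self.snapshots.pop(tag, None)
        self.active_mask &= ~(1 << tag)
        self.free_tags.append(tag)

    def resolve(self, tag, mispredicted):
        """Resolve a branch; return the names of any killed instructions."""
        bit = 1 << tag
        killed = []
        if mispredicted:
            killed = [u for u in self.in_flight if u[1] & bit]
            # Younger branches die too, so their tags are released.
            for _, _, own_tag in killed:
                if own_tag is not None:
                    self._release(own_tag)
            saved_map, saved_free = self.snapshots[tag]
            self.map_table, self.free_list = list(saved_map), list(saved_free)
        # Survivors no longer depend on the resolved branch.
        self.in_flight = [
            (n, m & ~bit, None if t == tag else t)
            for n, m, t in self.in_flight
            if not (mispredicted and m & bit)
        ]
        self._release(tag)
        return [name for name, _, _ in killed]
```

For example, renaming `add`, a branch, `sub`, a second branch and `or`, then resolving the first branch as mispredicted, kills `sub`, the second branch and `or`, and leaves the map table as it was when the first branch passed through rename.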

1.2 The RISC-V ISA

BOOM implements the RV64GC variant of the RISC-V ISA. This includes the MAFDC extensions and the privileged specification (multiply/divide, AMOs, load-reserve/store-conditional, single- and double-precision IEEE 754-2008 floating point). More information about the RISC-V ISA can be found at http://riscv.org.

RISC-V provides the following features which make it easy to target with high-performance designs:

• Relaxed memory model
  – This greatly simplifies the Load/Store Unit, which does not need to have loads snoop other loads, nor does coherence traffic need to snoop the LSU, as sequential consistency would require.
• Accrued Floating Point (FP) exception flags
  – The FP status register does not need to be renamed, nor can FP instructions throw exceptions themselves.
• No integer side-effects
  – All integer ALU operations exhibit no side-effects, other than the writing of the destination register. This avoids the need to rename additional condition state.
• No cmov or predication
  – Although predication can lower the branch predictor complexity of small designs, it greatly complicates OoO pipelines, including the addition of a third read port for integer operations.
• No implicit register specifiers
  – Even JAL requires specifying an explicit register. This simplifies the rename logic: it avoids either having to decode the instruction before accessing the rename tables, or adding more ports to keep instruction decode off the critical path.
• Registers rs1, rs2, rs3, rd are always in the same place
  – This allows decode and rename to proceed in parallel.

BOOM (currently) does not implement the proposed “V” vector extension.
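The last two points can be made concrete with the standard 32-bit RISC-V instruction formats, in which rd occupies bits 11:7, rs1 bits 19:15, rs2 bits 24:20 and rs3 bits 31:27. The Python sketch below slices these fields out of an instruction word without looking at the opcode; the function name `register_fields` is hypothetical, and the sketch covers only 32-bit encodings (compressed RVC instructions use different layouts).

```python
def register_fields(instruction: int) -> dict:
    """Slice the register specifiers out of a 32-bit RISC-V instruction word.

    The slices sit at fixed bit positions, so they can be taken before
    (or in parallel with) decoding the opcode. Whether a field is actually
    used by the instruction is still decided by decode: in a store, for
    example, bits 11:7 hold part of the immediate rather than rd.
    """
    return {
        "rd": (instruction >> 7) & 0x1F,    # bits 11:7
        "rs1": (instruction >> 15) & 0x1F,  # bits 19:15
        "rs2": (instruction >> 20) & 0x1F,  # bits 24:20
        "rs3": (instruction >> 27) & 0x1F,  # bits 31:27 (R4-type only)
    }


# add x3, x1, x2 encodes as 0x002081B3.
ADD_X3_X1_X2 = 0x002081B3
# jal x1, 0 encodes as 0x000000EF; the link register is an explicit rd.
JAL_X1_0 = 0x000000EF

if __name__ == "__main__":
    print(register_fields(ADD_X3_X1_X2))
    print(register_fields(JAL_X1_0)["rd"])
```

Because the same shifts and masks apply to every 32-bit instruction, a rename stage can start its table lookups with these indices while decode is still working out which of them are meaningful.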

1.3 The Chisel HCL

1.3.1 The Chisel Hardware Construction Language

BOOM is implemented in the Chisel hardware construction language. Chisel is embedded within Scala, which allows highly parameterized designs. More information about Chisel can be found at http://chisel.eecs.berkeley.edu.


1.4 Quick-start

The best way to get started with the BOOM core is to use the BOOM project template located in the main GitHub organization. There you will find the main steps to set up your environment, build, and run the BOOM core on a C++ emulator. Here is a selected set of steps from that repository’s README:

Listing 1.1: Quick-Start Code

    # Download the template and set up the environment
    git clone https://github.com/riscv-boom/boom-template.git
    cd boom-template
    ./scripts/init-submodules.sh

    # You may want to add the following two lines to your shell profile
    export RISCV=/path/to/install/dir
    export PATH=$RISCV/bin:$PATH

    # Build the toolchain (run from the boom-template directory)
    ./scripts/build-tools.sh

    # Build and run the emulator
    cd verisim
    make run

Note: Listing 1.1 assumes you don’t have the riscv-tools toolchain installed. It will pull and build the toolchain for you.

1.5 The BOOM Repository

The BOOM repository holds the source code to the BOOM core; it is not a full processor and thus is NOT A SELF-RUNNING repository. To instantiate a BOOM core, the Rocket-Chip generator found in the rocket-chip git repository (https://github.com/freechipsproject/rocket-chip) must be used. It provides the caches, uncore, and other infrastructure needed to support a full processor.

The BOOM source code can be found in boom/src/main/scala. The code structure is shown below:

• boom/src/main/scala/
  – bpu/
    * bpd-pipeline.scala
    * bpd/
      · br-predictor.scala
      · gshare/
        · gshare.scala
      · simple-predictors/
        · base-only.scala
        · simple-predictors.scala
      · tage/
        · tage.scala
        · tage-table.scala
    * btb/
      · bim.scala
      · btb-sa.scala
      · btb.scala
      · dense-btb.scala
    * misc/
      · 2bc-table.scala
  – common/
    * configs.scala
    * consts.scala
    * micro-op.scala
    * package.scala
    * parameters.scala
    * tile.scala
    * types.scala
  – exu/
    * core.scala
    * decode.scala
    * fp-pipeline.scala
    * rob.scala
    * execution-units/
      · execution-units.scala
      · execution-unit.scala
      · functional-unit.scala
      · fpu.scala
      · fdiv.scala
    * issue-units/
      · issue-slot.scala
      · issue-unit-ageordered.scala
      · is...

