Title | IDS Zusammenfassung SS16 - Informatik der Systeme
---|---
Author | Philipp Friedrich
Course | Informatik der Systeme
Institution | Eberhard Karls Universität Tübingen
Pages | 17
IDS Zusammenfassung - SS 16
Philipp Friedrich
25 October 2016
1
Coding

1.1
Types of Coding
• Source coding (Quellkodierung)
  - Data often contains redundant information
  - Compression → goal: lower storage requirements or faster transmission of information
  - Lossy or lossless
• Examples
  - Compressed data formats: mpg, jpg, ...
  - Tools for compressing data: zip, rar, ...
  - Algorithms: e.g., run-length encoding, Huffman coding
• Encryption (covered in more detail, e.g., in the Kommunikationsnetze lecture)
• Channel coding (channel coding / Kanalkodierung)
  - Protects data against bit errors during storage or transmission
  - Adds redundant information
  - Bit errors can be detected and possibly even corrected
• Line coding (Leitungskodierung)
  - Prepares bits for a transmission line
• These techniques are also applicable in other systems, e.g., storage
1.2
Run-Length Encoding (Lauflängenkodierung)
• Symbol followed by its number of repetitions
• Example
  - Original: Auus die Maaaaauuuuus (21 characters)
  - Compressed: A1u2s1 1d1i1e1 1M1a5u5s1 (24 characters)
  - Problem: the compressed sequence is longer
• Improvement
  - Compress only from a minimum number of repetitions onward
  - Mark each compression with a marker (escape) character
  - If the marker itself occurs in the text, it must be compressed
• Applied to the example above
  - Compress only runs of 3 or more repetitions
  - Choose "s" as the marker
  - Compressed: Auuss1 die Msa5su5ss1 (21 characters)
• A simple, lossless (source) coding scheme
  - Removes redundant information by replacing it according to a reconstruction rule
  - Can provide input for Huffman coding
• Another example
  - Given: a sequence of hexadecimal digits
  - Marker: "A"
  - Compressed sequence: the repeated hexadecimal digit, followed by the number of repetitions as a two-digit hexadecimal number
  - Threshold: 4
  - Original: 00000000000000000000FF00000000000000000AAAFFFFFFFF (50 characters; 20×0, 2×F, 17×0, 3×A, 8×F)
  - Compressed: A014FFA011AA03AF08 (18 characters)
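The marker-based scheme above can be sketched in a few lines of Python (a minimal illustration; marker and threshold are parameters, and repetition counts are written as a single digit, as in the examples above):

```python
def rle_encode(text, marker, threshold=3):
    """Marker-based run-length encoding: runs of at least `threshold`
    repetitions (and any occurrence of the marker itself) are replaced
    by marker + symbol + count; shorter runs are copied literally.
    Counts are single digits, so runs of at most 9 are assumed."""
    out, i = [], 0
    while i < len(text):
        ch, run = text[i], 1
        while i + run < len(text) and text[i + run] == ch:
            run += 1
        if run >= threshold or ch == marker:
            out.append(f"{marker}{ch}{run}")
        else:
            out.append(ch * run)
        i += run
    return "".join(out)


def rle_decode(data, marker):
    """Inverse of rle_encode: expand marker + symbol + count triples."""
    out, i = [], 0
    while i < len(data):
        if data[i] == marker:
            out.append(data[i + 1] * int(data[i + 2]))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)
```

With marker "s" and threshold 3 this reproduces the lecture's example: encoding "Auus die Maaaaauuuuus" yields "Auuss1 die Msa5su5ss1", which decodes back to the original.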
1.3
Huffman Coding
• Huffman coding
  - Provably yields an optimal code tree for symbols with given frequencies (or, alternatively, occurrence probabilities)
• Algorithm
  1. Create a forest with one tree per symbol. Each of these trees contains only a single node: the symbol. Annotate each tree with the symbol's frequency.
  2. Find the two trees with the lowest frequencies. Merge both trees by making them the subtrees of a new root. The frequency of the new tree is the sum of the frequencies of its two subtrees.
  3. Repeat step 2 until only one tree remains.
  4. Use this tree as the code tree for encoding the symbols.
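The four steps above can be sketched with Python's heapq serving as the forest of weighted trees (a minimal sketch; the symbols and frequencies in the test below are only an illustrative example):

```python
import heapq


def huffman_codes(freqs):
    """Build a Huffman code from {symbol: frequency}; returns
    {symbol: bitstring}. Each heap entry is (frequency, tie-breaker,
    tree), where a tree is either a symbol (leaf) or a (left, right)
    pair of subtrees."""
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:                       # degenerate case: one symbol
        return {heap[0][2]: "0"}
    while len(heap) > 1:                 # merge the two lightest trees
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}

    def walk(tree, prefix):              # read codes off the final tree
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes
```

For the frequencies {a: 45, b: 13, c: 12, d: 16, e: 9, f: 5}, the most frequent symbol a receives a 1-bit code, and the resulting code is prefix-free.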
1.4
Hamming Distance
• Sum (Ci ⊕ Cj) of two codewords Ci and Cj
  - Bitwise addition using exclusive OR
  - Example: 101 ⊕ 100 = 001
• Weight w(C) of a codeword C
  - Number of 1s in C
• Distance d(Ci, Cj) of two codewords Ci and Cj
  - d(Ci, Cj) = w(Ci ⊕ Cj)
• Hamming distance of a code
  - h = min(d(Ci, Cj)) over all pairs of distinct codewords
  - At least h positions of a codeword would have to be corrupted during transmission for it to accidentally turn into another valid codeword
  - With a code of Hamming distance h, up to h − 1 bit errors can be detected and up to ⌊(h − 1)/2⌋ bit errors corrected
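The definitions above translate directly into code; a small sketch with codewords given as bit strings:

```python
def weight(cw):
    """w(C): number of 1s in the codeword."""
    return cw.count("1")


def distance(ci, cj):
    """d(Ci, Cj) = w(Ci XOR Cj): positions where the codewords differ."""
    assert len(ci) == len(cj)
    return sum(a != b for a, b in zip(ci, cj))


def hamming_distance(code):
    """h = minimum distance over all pairs of distinct codewords."""
    return min(distance(a, b)
               for i, a in enumerate(code) for b in code[i + 1:])

# With distance h, up to h - 1 bit errors are detectable and
# (h - 1) // 2 bit errors are correctable.
```

For the repetition code {000, 111}, h = 3, so two bit errors can be detected and one corrected.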
2
Chapter 4 - Computer Abstractions and Technology

2.1
ISA - Instruction Set Architecture
• Abstract interface between the hardware and the lowest level software (aka basic instruction set) • Encompasses all the information necessary to write a machine language program, including instructions, registers, memory access, I/O, ... • Enables implementations of varying cost and performance to run identical software
2.2
ABI - Application binary interface
• Combination of the user portion of the instruction set and the operating system interface • Used by application programmers • Defines a standard for binary portability across computers
2.3
Machine Clock Rate
• Machines proceed in clock cycles (CCs) → given in ns, ps
• Clock Rate = 1/CC → given in MHz, GHz
2.4
IS - Instruction Set
• The same IS may be supported by different machines
• Performance may vary
2.5
CPI - Clock cycles per instruction
• Depends on how the IS is implemented on a machine
• Complex instructions have a larger CPI
2.6
CPU Time
• Given: program (instructions)
  - IC: instruction count
  - IC_i: instruction count for CPI class i
• Instruction mix
  - Influences IC_i
  - Depends on the program
• Effective CPI (n: number of CPI classes)
  - CPI_eff = (Σ_{i=1..n} CPI_i · IC_i) / IC
• CPU time
  - t_CPU = (Σ_{i=1..n} CPI_i · IC_i) / clock rate

2.7
Performance
• The performance of machine X for program P is defined as Performance(X, P) = 1 / t_CPU(X, P)
• Faster is better
• Comparison of two machines X and Y for a specific program P:
  - Performance(X, P) / Performance(Y, P) = t_CPU(Y, P) / t_CPU(X, P) = n means
  - the performance of X is n times that of Y
  - X is n times faster than Y
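The effective-CPI and CPU-time formulas from the section above combine with the performance ratio in a short sketch (the instruction mix and clock rates below are hypothetical):

```python
def effective_cpi(mix):
    """CPI_eff = sum(CPI_i * IC_i) / IC for mix = {class: (CPI_i, IC_i)}."""
    ic = sum(count for _, count in mix.values())
    return sum(cpi * count for cpi, count in mix.values()) / ic


def cpu_time(mix, clock_rate_hz):
    """t_CPU = sum(CPI_i * IC_i) / clock rate, in seconds."""
    return sum(cpi * count for cpi, count in mix.values()) / clock_rate_hz


# Hypothetical instruction mix: {class: (CPI, instruction count)}
mix = {"alu": (1, 50_000), "load/store": (2, 30_000), "branch": (3, 20_000)}
t_x = cpu_time(mix, 1e9)   # machine X at 1 GHz
t_y = cpu_time(mix, 5e8)   # machine Y at 0.5 GHz
speedup = t_y / t_x        # Performance(X)/Performance(Y) = t_CPU(Y)/t_CPU(X)
```

With the same mix on both machines, halving the clock rate doubles the CPU time, so X is 2 times faster than Y.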
2.8
Response time (overall execution time)
• Time between the start and the completion of a task • Important to individual users
2.9
Throughput
• Total amount of work done in a given time • Important to data center managers
2.10
Energy efficiency
• Power consumption especially important for power-limited applications, e.g., in the embedded market where battery life matters
3
Chapter 5 - Language of the Computer

3.1
CISC - RISC
• CISC: Complex Instruction Set Computer
  - Single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) and/or are capable of multi-step operations or addressing modes within single instructions
• RISC: Reduced Instruction Set Computer
  - CPU design strategy based on the insight that simplified instructions can provide higher performance if this simplicity enables much faster execution of each instruction
  - E.g., ARM processors in smartphones and tablet computers (iPad) → MIPS
3.2
MIPS Instruction Set
• Microprocessor without Interlocked Pipeline Stages
• MIPS-32 ISA
• Instruction categories:
  - Computational
  - Load/Store
  - Jump and Branch
  - Floating Point
  - Coprocessor
  - Memory Management
  - Special
3.3
MIPS Arithmetic Instructions
• Each arithmetic instruction in MIPS assembly language performs one operation • Each specifies exactly three operands that are all contained in the datapath’s register file ($t0,$s1,$s2)
3.4
MIPS Formats and Fields
• R-Format: for instructions with up to 3 registers
• I-Format: for instructions with immediate operands (explicit numbers)
• J-Format: for jump instructions
R-Format example:
  - op: opcode that specifies the operation
  - rs: register file address of the first source operand
  - rt: register file address of the second source operand
  - rd: register file address of the result's destination
  - shamt: shift amount (for shift instructions)
  - funct: function code augmenting the opcode
I-Format Example:
3.5
MIPS Memory Access Instructions
• MIPS has two basic data transfer instructions for accessing memory
  - lw $t0, 4($s3)  # load word from memory: data is loaded from memory into register $t0
  - sw $t0, 8($s3)  # store word to memory: data is stored from register $t0 into memory
3.6
MIPS Branch Instructions
• MIPS conditional branch instructions (use the I-Format)
  - bne $s0, $s1, Lbl  → # go to Lbl if $s0 ≠ $s1
  - beq $s0, $s1, Lbl  → # go to Lbl if $s0 = $s1
• Example: if (i == j) h = i + j;

  bne $s0, $s1, Lbl1
  add $s3, $s0, $s1
  Lbl1: ...
• Set on less than instruction (R-Format):
  - slt $t0, $s0, $s1  # if ($s0 < $s1) then $t0 = 1 else $t0 = 0
• Less than or equal to: ble $s1, $s2, Label
• Greater than: bgt $s1, $s2, Label
• Greater than or equal to: bge $s1, $s2, Label
3.7
Summary of Operands in MIPS
• Register operands (R-Format):
  - Direct access
  - Used by arithmetic instructions
  - Only 32 registers, to enable fast access in implementations
• Immediate operands (I-Format):
  - Explicit numbers in instructions
• MIPS register 0 ($zero) is the constant 0
  - Cannot be overwritten
  - Useful for common operations: add $t2, $s1, $zero
• Memory operands
  - Only accessible via load and store instructions → compilers use registers for variables as much as possible
  - Only spill to memory for less frequently used variables
  - Register optimization is important!
3.8
Leaf- and Non-Leaf Procedures
• Procedures
  - Must save register values $sx on the stack if they make use of these registers
  - Must restore them before returning
  - Can overwrite arguments $ax and temporaries $tx
• Leaf procedure
  - Does not call other procedures
• Non-leaf procedure
  - Calls other procedures (nested calls)
  - For a nested call, the caller needs to save on the stack:
    → its return address
    → any arguments $ax and temporaries $tx needed after the call
  - Registers are restored from the stack after the call
3.9
Leaf Procedure Example

3.10
Non-Leaf Procedure Example
4
Chapter 6 - The Processor

4.1
Pipelining
• Subdivide single instruction into multiple stages • Make clock cycle shorter and process one stage of an instruction per clock cycle • Process stages of consecutive instructions in parallel
• IFetch: instruction fetch and update PC
• Dec: register fetch and instruction decode
• Exec: execute R-type; calculate memory address
• Mem: read/write the data from/to the data memory
• WB: write the result data into the register file
→ The clock cycle (pipeline stage time) is limited by the slowest stage
  - Some stages don't need the whole clock cycle (e.g., WB)
  - For some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction)
CPU time = CPI · CC · IC
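The effect of the clock cycle being limited by the slowest stage can be sketched numerically (the stage latencies below are hypothetical):

```python
def pipeline_metrics(stage_ns, n_instructions):
    """Compare single-cycle vs. pipelined execution time for a simple
    in-order pipeline with no hazards. The single-cycle design pays the
    sum of all stage latencies per instruction; the pipelined design's
    clock cycle is set by the slowest stage."""
    single_cycle = sum(stage_ns) * n_instructions
    cc = max(stage_ns)
    # The first instruction flows through all stages; afterwards one
    # instruction completes per clock cycle.
    pipelined = (len(stage_ns) + n_instructions - 1) * cc
    return single_cycle, pipelined


# Hypothetical latencies for IFetch, Dec, Exec, Mem, WB (in ns)
stages = [200, 100, 200, 200, 100]
```

For 1000 instructions this gives 800,000 ns single-cycle vs. 200,800 ns pipelined, i.e. a speedup close to (but below) the stage count, because the shorter stages waste part of each 200 ns cycle.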
4.2
Pipeline Hazards
• Structural hazards: attempt to use the same resource by two different instructions at the same time
• Data hazards: attempt to use data before it is ready
• Control hazards: attempt to make a decision about the program control flow before the condition has been evaluated and the new PC target address has been calculated, e.g., branch and jump instructions, exceptions
• Hazards can usually be resolved by waiting; the pipeline control must detect the hazard and take action to resolve it
4.3
Structural Hazard
Single Memory
Register File Access?
4.4
Data Hazard
Due to Register Usage
→ Pipeline stalling to "fix" a data hazard
→ Data forwarding to fix a Data Hazard
Due to Loads
4.5
Control Hazards
• When the flow of instruction addresses is not sequential (i.e., PC = PC + 4) • Caused by change of flow instructions - Unconditional branches (j, jal, jr) - Conditional branches (beq, bne) - Exceptions • Occur less frequently than data hazards, • But there is nothing as effective against control hazards as forwarding is for data hazards → How to deal with control hazards • Stall (impacts CPI) • Move decision point as early in the pipeline as possible, thereby reducing the number of stall cycles • Predict and hope for the best!
4.6
Unconditional Jumps
4.7
Conditional Jumps
→ One way to fix a Branch Control Hazard
→ Another way to fix a Branch Control Hazard
4.8
Dynamic Branch Prediction
• Predicts branches dynamically using run-time information - Remembers last (one or two) outcomes of a particular inst addr - Assumes same outcome for next decision of that inst addr • Branch prediction buffer (aka branch history table (BHT)) - Remembers outcome of last one/two decisions (one/two-bit predictors) - In the IF stage addressed by the lower bits of the PC for prediction - Contains bit(s) passed to the ID stage through the IF/ID pipeline register that tells whether the branch was taken the last time it was executed - Branch decision occurs in the ID stage after determining that the fetched instruction is a branch and checking the prediction bit(s) • When outcome of decision is available - If the prediction is wrong, flush the incorrect instruction(s) in pipeline, restart the pipeline with the right instruction - Modify the prediction bit(s) in the BHT accordingly • Wrong predictions don’t affect correctness, just performance
4.9
Exceptions - Other Form of Control Hazards
• Exceptions - Events requiring immediate attention by the processor - Control passed from the user program to an exception handler of the OS • Interrupts (exception type) - Caused by external events – asynchronous to program execution - Instructions in pipeline can complete before control passed to the OS interrupt handler - Simply suspend and resume user program • Traps (exception type) - Caused by internal events – synchronous to program execution - Offending instruction must be stopped midstream and control immediately passed to the OS trap handler to remedy the cause for the trap - The offending instruction may be retried (or simulated by the OS) and the program may continue or it may be aborted • Nomenclature for exceptions, interrupts, and traps not consistently used
5
Chapter 7 - Memory Hierarchy

5.1
Hierarchy Levels
• Disk - lowest level
• Main memory (DRAM)
• Cache memory (SRAM) - attached to the CPU, highest level
5.2
Actions on Cache Hit and Cache Miss
• On a cache hit: the CPU proceeds normally
• On a cache miss:
  - Stall the CPU pipeline
  - Fetch the block from the next level of the hierarchy
  - Instruction cache miss: restart the instruction fetch
  - Data cache miss: complete the data access
5.3
Data Transfer
• Example cache block read - 1 bus cycle for address transfer - 15 bus cycles per DRAM access - 1 bus cycle per data transfer • For 4-word block, 1-word-wide DRAM - Miss penalty = 1 + 4*15 + 4*1 = 65 bus cycles - Bandwidth = 16 bytes / 65 cycles ∼ 0.25 bytes/cycle
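The bus-cycle arithmetic above can be checked with a small helper (the default parameters are the example's values):

```python
def miss_penalty(block_words, addr_cycles=1, dram_cycles=15, xfer_cycles=1):
    """Bus cycles to read one cache block from a 1-word-wide DRAM:
    one address transfer, then one DRAM access and one data transfer
    per word of the block."""
    return addr_cycles + block_words * (dram_cycles + xfer_cycles)


penalty = miss_penalty(4)        # 1 + 4*15 + 4*1 = 65 bus cycles
bandwidth = 4 * 4 / penalty      # 16 bytes / 65 cycles ≈ 0.25 bytes/cycle
```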
5.4
Handling Write-Hits
• On a data-write hit, we could just update the block in the cache, but then cache and memory would be inconsistent
Write-through strategy
• Updates the block in cache and in memory
• Takes longer, e.g.:
  - Base CPI = 1
  - 10% of instructions are stores
  - A write to memory takes 100 cycles
  - Effective CPI = 1 + 0.1 · 100 = 11
• Solution: write buffer
  - Holds data waiting to be written to memory
  - The CPU continues immediately
  - Only stalls on a write if the write buffer is already full
Write-back strategy
• Updates only the block in the cache
  - Keeps track of whether blocks are 'dirty'
  - Writes a dirty block back to memory when it is replaced by another block
  - Can use a write buffer to allow the replacing block to be read first
• More state needed
• More complex on replacement
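The write-through CPI example above reduces to one line of arithmetic:

```python
def cpi_with_write_stalls(base_cpi, store_fraction, write_cycles):
    """Effective CPI when every store stalls the CPU for a full memory
    write (write-through without a write buffer)."""
    return base_cpi + store_fraction * write_cycles


# Numbers from the example above: base CPI 1, 10% stores, 100-cycle writes
```

A write buffer removes these stalls as long as writes are absorbed faster than they arrive, bringing the effective CPI back toward the base CPI.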
5.5
Handling Write-Misses
Strategies in combination with write-through • Write-allocate: fetch the block and write into cache • No write-allocate: don’t fetch the block - Write only into memory - Useful when programs write a whole block before reading it (e.g., initialization) • Strategy may be specific to memory page Strategies in combination with write-back • Usually only write-allocate • Takes longer than with write-through - Write-back needed for replaced block before overwriting cache memory
5.6
Direct-Mapped Cache Example
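Since the example itself is a figure, here is a hypothetical sketch of the underlying mechanics: splitting a byte address into tag, index, and block offset, and the resulting hit/miss behaviour (the cache geometry is chosen arbitrarily):

```python
def split_address(addr, block_bytes, num_blocks):
    """Split a byte address into (tag, index, offset) for a
    direct-mapped cache with num_blocks blocks of block_bytes each
    (both typically powers of two)."""
    offset = addr % block_bytes
    index = (addr // block_bytes) % num_blocks
    tag = addr // (block_bytes * num_blocks)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self, block_bytes, num_blocks):
        self.block_bytes, self.num_blocks = block_bytes, num_blocks
        self.tags = [None] * num_blocks   # None doubles as a cleared valid bit

    def access(self, addr):
        """Return True on hit; on a miss, fill the indexed entry."""
        tag, index, _ = split_address(addr, self.block_bytes, self.num_blocks)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag            # replace whatever was there
        return False
```

Two addresses whose index bits agree but whose tags differ evict each other, which is exactly the conflict-miss behaviour discussed below.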
5.7
Fully Associative Cache
• Allows a given block to go into any cache entry
  - The tag is the entire address
  - The index is not used
• Retrieval: search all entries of the cache at once
  - Requires one comparator per entry
  - Expensive hardware
  - Example: a cache with S = 2^n blocks of 2^m bytes each requires 2^n comparators
Tag in Set-Associative Caches
5.8
Comparison of Cache Variants
5.9
Replacement Policy for Caches
• Least recently used (LRU)
  - Choose the entry unused for the longest time
  - Simple for 2-way, manageable for 4-way, too hard beyond that
• Least frequently used (LFU)
• Random
  - Performance ≈ same as LRU for low associativity
• First-in first-out (FIFO)
  - Choose the entry that has been in the cache the longest
  - Not so appropriate for caches
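LRU bookkeeping can be sketched with an ordered dictionary standing in for a fully associative cache (a software model of the policy, not how hardware tracks recency):

```python
from collections import OrderedDict


def simulate_lru(accesses, capacity):
    """Count hits for an LRU-managed, fully associative cache holding
    `capacity` blocks; `accesses` is a sequence of block addresses."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)     # evict least recently used
            cache[block] = True
    return hits
```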
5.10
Sources of Misses
• Compulsory misses (aka cold-start misses)
  - First access to a block
• Capacity misses
  - Due to finite cache size
  - A replaced block is later accessed again
• Conflict misses (aka collision misses)
  - In a non-fully-associative cache
  - Due to competition for entries in a set
  - Would not occur in a fully associative cache of the same total size
5.11
AMAT
• Average Memory Access Time (AMAT)
  - AMAT = hit time + miss rate · miss penalty
• Example:
  - CPU with a 1 ns clock
  - Hit time = 1 cycle
  - Miss penalty = 20 cycles
  - I-cache miss rate = 5%
  - AMAT = 1 + 0.05 · 20 = 2 ns → 2 cycles per instruction
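The AMAT formula with the example's numbers:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty
    (hit time and penalty in the same unit, e.g. cycles)."""
    return hit_time + miss_rate * miss_penalty


# Example from above: 1-cycle hit, 5% miss rate, 20-cycle penalty
```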
5.12
Impact of Cache Performance on Actual CPI
5.13
Multilevel Caches
Primary (L-1) cache
• Attached to the CPU
• Focus on minimal hit time
• Small, but fast
• The L-1 cache in a multilevel cache hierarchy is usually smaller than a single cache
Level-2 (L-2) cache
• Services misses from the primary cache
• Focus on a low miss rate to avoid main memory accesses
• Larger and slower, but still faster than main memory
• Hit time has less overall impact
• L-2 block size is larger than the L-1 block size
Main memory services L-2 cache misses
Some high-end systems include an L-3 cache
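The division of labour between the levels shows up directly in a two-level AMAT sketch (all numbers below are hypothetical):

```python
def multilevel_amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_penalty):
    """AMAT for a two-level cache hierarchy: an L-1 miss pays the L-2
    hit time, and an L-2 miss additionally pays the main-memory
    penalty (all times in cycles, miss rates as local miss rates)."""
    l2_penalty = l2_hit + l2_miss_rate * mem_penalty
    return l1_hit + l1_miss_rate * l2_penalty


# Hypothetical: 1-cycle L-1 hit, 5% L-1 misses, 10-cycle L-2 hit,
# 20% local L-2 misses, 100-cycle main-memory penalty
```

This makes the stated priorities visible: the L-1 hit time enters every access, while the L-2 miss rate is multiplied by the large memory penalty, so L-1 optimizes hit time and L-2 optimizes miss rate.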
5.14
Multilevel Cache Example