Title | IDS Zusammenfassung SS16 - Informatik der Systeme
---|---
Author | Philipp Friedrich
Course | Informatik der Systeme
Institution | Eberhard Karls Universität Tübingen
Pages | 17
IDS Zusammenfassung - SS 16
Philipp Friedrich
25 October 2016
1
Coding

1.1
Types of Coding
• Source coding (Quellkodierung)
  - Data often contains redundant information
  - Compression → goal: lower storage requirements or faster transmission of information
  - Lossy or lossless
• Examples
  - Compressed data formats: mpg, jpg, ...
  - Tools for compressing data: zip, rar, ...
  - Algorithms: e.g., run-length encoding, Huffman coding
• Encryption (covered in more detail, e.g., in the Kommunikationsnetze lecture)
• Channel coding (channel coding / Kanalkodierung)
  - Protects data against bit errors during storage or transmission
  - Adds redundant information
  - Bit errors can be detected and possibly even corrected
• Line coding (Leitungskodierung)
  - Prepares bits for a transmission line
• These techniques are also applicable in other systems, e.g., storage
1.2
Run-Length Encoding (Lauflängenkodierung)
• Symbol followed by its number of repetitions
• Example
  - Original: Auus die Maaaaauuuuus (21 characters)
  - Compressed: A1u2s1 1d1i1e1 1M1a5u5s1 (24 characters)
  - Problem: the compressed sequence is longer
• Improvement
  - Compress only from a minimum number of repetitions onward
  - Mark each compression with a marker (escape) character
  - If the marker itself occurs in the text, it must be compressed
• Applied to the example above
  - Compress only runs of 3 or more repetitions
  - Choose "s" as the marker
  - Compressed: Auuss1 die Msa5su5ss1 (21 characters)
• A simple, lossless (source) coding scheme
  - Removes redundant information by replacing it according to a reconstruction rule
  - Can provide input for Huffman coding
• Another example
  - Given: a sequence of hexadecimal digits
  - Marker: "A"
  - Compressed sequence: the repeated hexadecimal digit, followed by the number of repetitions as a two-digit hexadecimal number
  - Threshold: 4
  - Original: 00000000000000000000FF00000000000000000AAAFFFFFFFF (50 characters; 20×0, 2×F, 17×0, 3×A, 8×F)
  - Compressed: A014FFA011AA03AF08 (18 characters)
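The marker-based scheme above can be sketched in a few lines of Python (a minimal illustration; marker and threshold are parameters, and repetition counts are written as a single digit, as in the examples above):

```python
def rle_encode(text, marker, threshold=3):
    """Marker-based run-length encoding: runs of at least `threshold`
    repetitions (and any occurrence of the marker itself) are replaced
    by marker + symbol + count; shorter runs are copied literally.
    Counts are single digits, so runs of at most 9 are assumed."""
    out, i = [], 0
    while i < len(text):
        ch, run = text[i], 1
        while i + run < len(text) and text[i + run] == ch:
            run += 1
        if run >= threshold or ch == marker:
            out.append(f"{marker}{ch}{run}")
        else:
            out.append(ch * run)
        i += run
    return "".join(out)


def rle_decode(data, marker):
    """Inverse of rle_encode: expand marker + symbol + count triples."""
    out, i = [], 0
    while i < len(data):
        if data[i] == marker:
            out.append(data[i + 1] * int(data[i + 2]))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)
```

With marker "s" and threshold 3 this reproduces the lecture's example: encoding "Auus die Maaaaauuuuus" yields "Auuss1 die Msa5su5ss1", which decodes back to the original.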
1.3
Huffman Coding
• Huffman coding
  - Provably yields an optimal code tree for symbols with given frequencies (or, alternatively, occurrence probabilities)
• Algorithm
  1. Create a forest with one tree per symbol. Each of these trees contains only a single node: the symbol. Annotate each tree with the symbol's frequency.
  2. Find the two trees with the lowest frequencies. Merge both trees by making them the subtrees of a new root. The frequency of the new tree is the sum of the frequencies of its two subtrees.
  3. Repeat step 2 until only one tree remains.
  4. Use this tree as the code tree for encoding the symbols.
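The four steps above can be sketched with Python's heapq serving as the forest of weighted trees (a minimal sketch; the symbols and frequencies in the test below are only an illustrative example):

```python
import heapq


def huffman_codes(freqs):
    """Build a Huffman code from {symbol: frequency}; returns
    {symbol: bitstring}. Each heap entry is (frequency, tie-breaker,
    tree), where a tree is either a symbol (leaf) or a (left, right)
    pair of subtrees."""
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:                       # degenerate case: one symbol
        return {heap[0][2]: "0"}
    while len(heap) > 1:                 # merge the two lightest trees
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}

    def walk(tree, prefix):              # read codes off the final tree
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes
```

For the frequencies {a: 45, b: 13, c: 12, d: 16, e: 9, f: 5}, the most frequent symbol a receives a 1-bit code, and the resulting code is prefix-free.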
1.4
Hamming Distance
• Sum (Ci ⊕ Cj) of two codewords Ci and Cj
  - Bitwise addition using exclusive OR
  - Example: 101 ⊕ 100 = 001
• Weight w(C) of a codeword C
  - Number of 1s in C
• Distance d(Ci, Cj) of two codewords Ci and Cj
  - d(Ci, Cj) = w(Ci ⊕ Cj)
• Hamming distance of a code
  - h = min(d(Ci, Cj)) over all pairs of distinct codewords
  - At least h positions of a codeword would have to be corrupted during transmission for it to accidentally turn into another valid codeword
  - With a code of Hamming distance h, up to h − 1 bit errors can be detected and up to ⌊(h − 1)/2⌋ bit errors corrected
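The definitions above translate directly into code; a small sketch with codewords given as bit strings:

```python
def weight(cw):
    """w(C): number of 1s in the codeword."""
    return cw.count("1")


def distance(ci, cj):
    """d(Ci, Cj) = w(Ci XOR Cj): positions where the codewords differ."""
    assert len(ci) == len(cj)
    return sum(a != b for a, b in zip(ci, cj))


def hamming_distance(code):
    """h = minimum distance over all pairs of distinct codewords."""
    return min(distance(a, b)
               for i, a in enumerate(code) for b in code[i + 1:])

# With distance h, up to h - 1 bit errors are detectable and
# (h - 1) // 2 bit errors are correctable.
```

For the repetition code {000, 111}, h = 3, so two bit errors can be detected and one corrected.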
2
Chapter 4 - Computer Abstractions and Technology

2.1
ISA - Instruction Set Architecture
• Abstract interface between the hardware and the lowest level software (aka basic instruction set) • Encompasses all the information necessary to write a machine language program, including instructions, registers, memory access, I/O, ... • Enables implementations of varying cost and performance to run identical software
2.2
ABI - Application binary interface
• Combination of the user portion of the instruction set and the operating system interface • Used by application programmers • Defines a standard for binary portability across computers
2.3
Machine Clock Rate
• Machines proceed in clock cycles (CCs) → given in ns, ps
• Clock Rate = 1/CC → given in MHz, GHz
2.4
IS - Instruction Set
• The same IS may be supported by different machines
• Performance may vary
2.5
CPI - Clock cycles per instruction
• Depends on how the IS is implemented on a machine
• Complex instructions have a larger CPI
2.6
CPU Time
• Given: program (instructions)
  - IC: instruction count
  - IC_i: instruction count for CPI class i
• Instruction mix
  - Influences IC_i
  - Depends on the program
• Effective CPI (n: number of CPI classes)
  - CPI_eff = (Σ_{i=1..n} CPI_i · IC_i) / IC
• CPU time
  - t_CPU = (Σ_{i=1..n} CPI_i · IC_i) / clock rate

2.7
Performance
• The performance of machine X for program P is defined as Performance(X, P) = 1 / t_CPU(X, P)
• Faster is better
• Comparison of two machines X and Y for a specific program P:
  - Performance(X, P) / Performance(Y, P) = t_CPU(Y, P) / t_CPU(X, P) = n means
  - the performance of X is n times that of Y
  - X is n times faster than Y
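The effective-CPI and CPU-time formulas from the section above combine with the performance ratio in a short sketch (the instruction mix and clock rates below are hypothetical):

```python
def effective_cpi(mix):
    """CPI_eff = sum(CPI_i * IC_i) / IC for mix = {class: (CPI_i, IC_i)}."""
    ic = sum(count for _, count in mix.values())
    return sum(cpi * count for cpi, count in mix.values()) / ic


def cpu_time(mix, clock_rate_hz):
    """t_CPU = sum(CPI_i * IC_i) / clock rate, in seconds."""
    return sum(cpi * count for cpi, count in mix.values()) / clock_rate_hz


# Hypothetical instruction mix: {class: (CPI, instruction count)}
mix = {"alu": (1, 50_000), "load/store": (2, 30_000), "branch": (3, 20_000)}
t_x = cpu_time(mix, 1e9)   # machine X at 1 GHz
t_y = cpu_time(mix, 5e8)   # machine Y at 0.5 GHz
speedup = t_y / t_x        # Performance(X)/Performance(Y) = t_CPU(Y)/t_CPU(X)
```

With the same mix on both machines, halving the clock rate doubles the CPU time, so X is 2 times faster than Y.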
2.8
Response time (overall execution time)
• Time between the start and the completion of a task • Important to individual users
2.9
Throughput
• Total amount of work done in a given time • Important to data center managers
2.10
Energy efficiency
• Power consumption especially important for power-limited applications, e.g., in the embedded market where battery life matters
3
Chapter 5 - Language of the Computer

3.1
CISC - RISC
• CISC: Complex Instruction Set Computer
  - Single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) and/or are capable of multi-step operations or addressing modes within single instructions
• RISC: Reduced Instruction Set Computer
  - CPU design strategy based on the insight that simplified instructions can provide higher performance if this simplicity enables much faster execution of each instruction
  - E.g., ARM processors in smartphones and tablet computers (iPad) → MIPS
3.2
MIPS Instruction Set
• Microprocessor without Interlocked Pipeline Stages
• MIPS-32 ISA
• Instruction categories:
  - Computational
  - Load/Store
  - Jump and Branch
  - Floating Point
  - Coprocessor
  - Memory Management
  - Special
3.3
MIPS Arithmetic Instructions
• Each arithmetic instruction in MIPS assembly language performs one operation • Each specifies exactly three operands that are all contained in the datapath’s register file ($t0,$s1,$s2)
3.4
MIPS Formats and Fields
• R-Format: for instructions with up to 3 registers
• I-Format: for instructions with immediate operands (explicit numbers)
• J-Format: for jump instructions
R-Format example:
  - op: opcode that specifies the operation
  - rs: register file address of the first source operand
  - rt: register file address of the second source operand
  - rd: register file address of the result's destination
  - shamt: shift amount (for shift instructions)
  - funct: function code augmenting the opcode
I-Format Example:
3.5
MIPS Memory Access Instructions
• MIPS has two basic data transfer instructions for accessing memory
  - lw $t0, 4($s3)  # load word from memory: data is loaded from memory into register $t0
  - sw $t0, 8($s3)  # store word to memory: data is stored from register $t0 into memory
3.6
MIPS Branch Instructions
• MIPS conditional branch instructions (use the I-Format)
  - bne $s0, $s1, Lbl  → # go to Lbl if $s0 ≠ $s1
  - beq $s0, $s1, Lbl  → # go to Lbl if $s0 = $s1
• Example: if (i == j) h = i + j;

  bne $s0, $s1, Lbl1
  add $s3, $s0, $s1
  Lbl1: ...
• Set on less than instruction (R-Format):
  - slt $t0, $s0, $s1  # if ($s0 < $s1) then $t0 = 1 else $t0 = 0
• Less than or equal to: ble $s1, $s2, Label
• Greater than: bgt $s1, $s2, Label
• Greater than or equal to: bge $s1, $s2, Label
3.7
Summary of Operands in MIPS
• Register operands (R-Format):
  - Direct access
  - Used by arithmetic instructions
  - Only 32 registers, to enable fast access in implementations
• Immediate operands (I-Format):
  - Explicit numbers in instructions
• MIPS register 0 ($zero) is the constant 0
  - Cannot be overwritten
  - Useful for common operations: add $t2, $s1, $zero
• Memory operands
  - Only accessible via load and store instructions → compilers use registers for variables as much as possible
  - Only spill to memory for less frequently used variables
  - Register optimization is important!
3.8
Leaf- and Non-Leaf Procedures
• Procedures
  - Must save register values $sx on the stack if they make use of these registers
  - Must restore them before returning
  - Can overwrite arguments $ax and temporaries $tx
• Leaf procedure
  - Does not call other procedures
• Non-leaf procedure
  - Calls other procedures (nested calls)
  - For a nested call, the caller needs to save on the stack:
    → its return address
    → any arguments $ax and temporaries $tx needed after the call
  - Registers are restored from the stack after the call
3.9
Leaf Procedure Example

3.10
Non-Leaf Procedure Example
4
Chapter 6 - The Processor

4.1
Pipelining
• Subdivide single instruction into multiple stages • Make clock cycle shorter and process one stage of an instruction per clock cycle • Process stages of consecutive instructions in parallel
• IFetch: instruction fetch and update PC
• Dec: register fetch and instruction decode
• Exec: execute R-type; calculate memory address
• Mem: read/write the data from/to the data memory
• WB: write the result data into the register file
→ The clock cycle (pipeline stage time) is limited by the slowest stage
  - Some stages don't need the whole clock cycle (e.g., WB)
  - For some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction)
CPU time = CPI · CC · IC
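The effect of the clock cycle being limited by the slowest stage can be sketched numerically (the stage latencies below are hypothetical):

```python
def pipeline_metrics(stage_ns, n_instructions):
    """Compare single-cycle vs. pipelined execution time for a simple
    in-order pipeline with no hazards. The single-cycle design pays the
    sum of all stage latencies per instruction; the pipelined design's
    clock cycle is set by the slowest stage."""
    single_cycle = sum(stage_ns) * n_instructions
    cc = max(stage_ns)
    # The first instruction flows through all stages; afterwards one
    # instruction completes per clock cycle.
    pipelined = (len(stage_ns) + n_instructions - 1) * cc
    return single_cycle, pipelined


# Hypothetical latencies for IFetch, Dec, Exec, Mem, WB (in ns)
stages = [200, 100, 200, 200, 100]
```

For 1000 instructions this gives 800,000 ns single-cycle vs. 200,800 ns pipelined, i.e. a speedup close to (but below) the stage count, because the shorter stages waste part of each 200 ns cycle.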
4.2
Pipeline Hazards
• Structural hazards: attempt to use the same resource by two different instructions at the same time
• Data hazards: attempt to use data before it is ready
• Control hazards: attempt to make a decision about the program control flow before the condition has been evaluated and the new PC target address has been calculated, e.g., branch and jump instructions, exceptions
• Hazards can usually be resolved by waiting; the pipeline control must detect the hazard and take action to resolve it
4.3
Structural Hazard
Single Memory
Register File Access?
4.4
Data Hazard
Due to Register Usage
→ Pipeline stalling to "fix" a data hazard
→ Data forwarding to fix a Data Hazard
Due to Loads
4.5
Control Hazards
• When the flow of instruction addresses is not sequential (i.e., PC = PC + 4) • Caused by change of flow instructions - Unconditional branches (j, jal, jr) - Conditional branches (beq, bne) - Exceptions • Occur less frequently than data hazards, • But there is nothing as effective against control hazards as forwarding is for data hazards → How to deal with control hazards • Stall (impacts CPI) • Move decision point as early in the pipeline as possible, thereby reducing the number of stall cycles • Predict and hope for the best!
4.6
Unconditional Jumps
4.7
Conditional Jumps
→ One way to fix a Branch Control Hazard
→ Another way to fix a Branch Control Hazard
4.8
Dynamic Branch Prediction
• Predicts branches dynamically using run-time information - Remembers last (one or two) outcomes of a particular inst addr - Assumes same outcome for next decision of that inst addr • Branch prediction buffer (aka branch history table (BHT)) - Remembers outcome of last one/two decisions (one/two-bit predictors) - In the IF stage addressed by the lower bits of the PC for prediction - Contains bit(s) passed to the ID stage through the IF/ID pipeline register that tells whether the branch was taken the last time it was executed - Branch decision occurs in the ID stage after determining that the fetched instruction is a branch and checking the prediction bit(s) • When outcome of decision is available - If the prediction is wrong, flush the incorrect instruction(s) in pipeline, restart the pipeline with the right instruction - Modify the prediction bit(s) in the BHT accordingly • Wrong predictions don’t affect correctness, just performance
4.9
Exceptions - Other Form of Control Hazards
• Exceptions - Events requiring immediate attention by the processor - Control passed from the user program to an exception handler of the OS • Interrupts (exception type) - Caused by external events – asynchronous to program execution - Instructions in pipeline can complete before control passed to the OS interrupt handler - Simply suspend and resume user program • Traps (exception type) - Caused by internal events – synchronous to program execution - Offending instruction must be stopped midstream and control immediately passed to the OS trap handler to remedy the cause for the trap - The offending instruction may be retried (or simulated by the OS) and the program may continue or it may be aborted • Nomenclature for exceptions, interrupts, and traps not consistently used
5
Chapter 7 - Memory Hierarchy

5.1
Hierarchy Levels
• Disk - lowest level
• Main memory (DRAM)
• Cache memory (SRAM) - attached to the CPU, highest level
5.2
Actions on Cache Hit and Cache Miss
• On a cache hit: the CPU proceeds normally
• On a cache miss:
  - Stall the CPU pipeline
  - Fetch the block from the next level of the hierarchy
  - Instruction cache miss: restart the instruction fetch
  - Data cache miss: complete the data access
5.3
Data Transfer
• Example cache block read - 1 bus cycle for address transfer - 15 bus cycles per DRAM access - 1 bus cycle per data transfer • For 4-word block, 1-word-wide DRAM - Miss penalty = 1 + 4*15 + 4*1 = 65 bus cycles - Bandwidth = 16 bytes / 65 cycles ∼ 0.25 bytes/cycle
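The bus-cycle arithmetic above can be checked with a small helper (the default parameters are the example's values):

```python
def miss_penalty(block_words, addr_cycles=1, dram_cycles=15, xfer_cycles=1):
    """Bus cycles to read one cache block from a 1-word-wide DRAM:
    one address transfer, then one DRAM access and one data transfer
    per word of the block."""
    return addr_cycles + block_words * (dram_cycles + xfer_cycles)


penalty = miss_penalty(4)        # 1 + 4*15 + 4*1 = 65 bus cycles
bandwidth = 4 * 4 / penalty      # 16 bytes / 65 cycles ≈ 0.25 bytes/cycle
```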
5.4
Handling Write-Hits
• On a data-write hit, we could just update the block in the cache, but then cache and memory would be inconsistent
Write-through strategy
• Updates the block in cache and in memory
• Takes longer, e.g.:
  - Base CPI = 1
  - 10% of instructions are stores
  - A write to memory takes 100 cycles
  - Effective CPI = 1 + 0.1 · 100 = 11
• Solution: write buffer
  - Holds data waiting to be written to memory
  - The CPU continues immediately
  - Only stalls on a write if the write buffer is already full
Write-back strategy
• Updates only the block in the cache
  - Keeps track of whether blocks are 'dirty'
  - Writes a dirty block back to memory when it is replaced by another block
  - Can use a write buffer to allow the replacing block to be read first
• More state needed
• More complex on replacement
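The write-through CPI example above reduces to one line of arithmetic:

```python
def cpi_with_write_stalls(base_cpi, store_fraction, write_cycles):
    """Effective CPI when every store stalls the CPU for a full memory
    write (write-through without a write buffer)."""
    return base_cpi + store_fraction * write_cycles


# Numbers from the example above: base CPI 1, 10% stores, 100-cycle writes
```

A write buffer removes these stalls as long as writes are absorbed faster than they arrive, bringing the effective CPI back toward the base CPI.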
5.5
Handling Write-Misses
Strategies in combination with write-through • Write-allocate: fetch the block and write into cache • No write-allocate: don’t fetch the block - Write only into memory - Useful when programs write a whole block before reading it (e.g., initialization) • Strategy may be specific to memory page Strategies in combination with write-back • Usually only write-allocate • Takes longer than with write-through - Write-back needed for replaced block before overwriting cache memory
5.6
Direct-Mapped Cache Example
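Since the example itself is a figure, here is a hypothetical sketch of the underlying mechanics: splitting a byte address into tag, index, and block offset, and the resulting hit/miss behaviour (the cache geometry is chosen arbitrarily):

```python
def split_address(addr, block_bytes, num_blocks):
    """Split a byte address into (tag, index, offset) for a
    direct-mapped cache with num_blocks blocks of block_bytes each
    (both typically powers of two)."""
    offset = addr % block_bytes
    index = (addr // block_bytes) % num_blocks
    tag = addr // (block_bytes * num_blocks)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self, block_bytes, num_blocks):
        self.block_bytes, self.num_blocks = block_bytes, num_blocks
        self.tags = [None] * num_blocks   # None doubles as a cleared valid bit

    def access(self, addr):
        """Return True on hit; on a miss, fill the indexed entry."""
        tag, index, _ = split_address(addr, self.block_bytes, self.num_blocks)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag            # replace whatever was there
        return False
```

Two addresses whose index bits agree but whose tags differ evict each other, which is exactly the conflict-miss behaviour discussed below.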
5.7
Fully Associative Cache
• Allows a given block to go into any cache entry
  - The tag is the entire address
  - The index is not used
• Retrieval: search all entries of the cache at once
  - Requires one comparator per entry
  - Expensive hardware
  - Example: a cache with S = 2^n blocks of 2^m bytes each requires 2^n comparators
Tag in Set-Associative Caches
5.8
Comparison of Cache Variants
5.9
Replacement Policy for Caches
• Least recently used (LRU)
  - Choose the entry unused for the longest time
  - Simple for 2-way, manageable for 4-way, too hard beyond that
• Least frequently used (LFU)
• Random
  - Performance ≈ same as LRU for low associativity
• First-in first-out (FIFO)
  - Choose the entry that has been in the cache the longest
  - Not so appropriate for caches
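LRU bookkeeping can be sketched with an ordered dictionary standing in for a fully associative cache (a software model of the policy, not how hardware tracks recency):

```python
from collections import OrderedDict


def simulate_lru(accesses, capacity):
    """Count hits for an LRU-managed, fully associative cache holding
    `capacity` blocks; `accesses` is a sequence of block addresses."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)     # evict least recently used
            cache[block] = True
    return hits
```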
5.10
Sources of Misses
• Compulsory misses (aka cold-start misses)
  - First access to a block
• Capacity misses
  - Due to finite cache size
  - A replaced block is later accessed again
• Conflict misses (aka collision misses)
  - In a non-fully-associative cache
  - Due to competition for entries in a set
  - Would not occur in a fully associative cache of the same total size
5.11
AMAT
• Average Memory Access Time (AMAT)
  - AMAT = hit time + miss rate · miss penalty
• Example:
  - CPU with a 1 ns clock
  - Hit time = 1 cycle
  - Miss penalty = 20 cycles
  - I-cache miss rate = 5%
  - AMAT = 1 + 0.05 · 20 = 2 ns → 2 cycles per instruction
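The AMAT formula with the example's numbers:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty
    (hit time and penalty in the same unit, e.g. cycles)."""
    return hit_time + miss_rate * miss_penalty


# Example from above: 1-cycle hit, 5% miss rate, 20-cycle penalty
```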
5.12
Impact of Cache Performance on Actual CPI
5.13
Multilevel Caches
Primary (L-1) cache
• Attached to the CPU
• Focus on minimal hit time
• Small, but fast
• The L-1 cache in a multilevel cache hierarchy is usually smaller than a single cache
Level-2 (L-2) cache
• Services misses from the primary cache
• Focus on a low miss rate to avoid main memory accesses
• Larger and slower, but still faster than main memory
• Hit time has less overall impact
• L-2 block size is larger than the L-1 block size
Main memory services L-2 cache misses
Some high-end systems include an L-3 cache
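The division of labour between the levels shows up directly in a two-level AMAT sketch (all numbers below are hypothetical):

```python
def multilevel_amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_penalty):
    """AMAT for a two-level cache hierarchy: an L-1 miss pays the L-2
    hit time, and an L-2 miss additionally pays the main-memory
    penalty (all times in cycles, miss rates as local miss rates)."""
    l2_penalty = l2_hit + l2_miss_rate * mem_penalty
    return l1_hit + l1_miss_rate * l2_penalty


# Hypothetical: 1-cycle L-1 hit, 5% L-1 misses, 10-cycle L-2 hit,
# 20% local L-2 misses, 100-cycle main-memory penalty
```

This makes the stated priorities visible: the L-1 hit time enters every access, while the L-2 miss rate is multiplied by the large memory penalty, so L-1 optimizes hit time and L-2 optimizes miss rate.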
5.14
Multilevel Cache Example