CS405 CSA Notes- Module 1 PDF

Title	CS405 CSA Notes- Module 1
Author	Shandry deepesh
Course	computer system architecture
Institution	APJ Abdul Kalam Technological University
Pages	38
File Size	1.8 MB
File Type	PDF

Summary

The first module covers different parallel computer architectures...

Description

CS405 COMPUTER SYSTEM ARCHITECTURE

Shandry K K AP CSE

COURSE OUTCOMES
Course name: Computer System Architecture
On successful completion of the course, the student will be able to:
• C403.1 Summarize different parallel computer models.
• C403.2 Analyze the advanced processor technologies and memory hierarchy.
• C403.3 Compare different multiprocessor system interconnecting and cache coherence mechanisms.
• C403.4 Analyze different message passing mechanisms.
• C403.5 Analyze different pipelining mechanisms.
• C403.6 Appraise concepts of multithreaded and data flow architectures.

Chapter 1

Parallel Computer Models
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani

In this chapter…
• THE STATE OF COMPUTING
• MULTIPROCESSORS AND MULTICOMPUTERS
• MULTIVECTOR AND SIMD COMPUTERS
• PRAM AND VLSI MODELS
• ARCHITECTURAL DEVELOPMENT TRACKS

THE STATE OF COMPUTING
Computer Development Milestones
• How it all started…
  o 500 BC: Abacus (China)
    • The earliest mechanical computer/calculating device
    • Operated to perform decimal arithmetic with carry propagation digit by digit
  o 1642: Mechanical Adder/Subtractor (Blaise Pascal)
  o 1827: Difference Engine (Charles Babbage)
  o 1941: First binary mechanical computer (Konrad Zuse, Germany)
  o 1944: Harvard Mark I (IBM)
    • The very first electromechanical decimal computer, as proposed by Howard Aiken

• Computer Generations
  o 1st, 2nd, 3rd, 4th, 5th
  o Division into generations marked primarily by changes in hardware and software technologies

THE STATE OF COMPUTING
Computer Development Milestones
• First Generation (1945 – 54)
  o Technology & Architecture:
    • Vacuum Tubes
    • Relay Memories
    • CPU driven by PC and accumulator
    • Fixed Point Arithmetic
  o Software and Applications:
    • Machine/Assembly Languages
    • Single user
    • No subroutine linkage
    • Programmed I/O using CPU
  o Representative Systems: ENIAC, Princeton IAS, IBM 701

THE STATE OF COMPUTING
Computer Development Milestones
• Second Generation (1955 – 64)
  o Technology & Architecture:
    • Discrete Transistors
    • Core Memories
    • Floating Point Arithmetic
    • I/O Processors
    • Multiplexed memory access
  o Software and Applications:
    • High level languages used with compilers
    • Subroutine libraries
    • Processing Monitor
  o Representative Systems: IBM 7090, CDC 1604, Univac LARC

THE STATE OF COMPUTING
Computer Development Milestones
• Third Generation (1965 – 74)
  o Technology & Architecture:
    • IC Chips (SSI/MSI)
    • Microprogramming
    • Pipelining
    • Cache
    • Look-ahead processors
  o Software and Applications:
    • Multiprogramming and Timesharing OS
    • Multiuser applications
  o Representative Systems: IBM 360/370, CDC 6600, TI-ASC, PDP-8

Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University

THE STATE OF COMPUTING
Computer Development Milestones
• Fourth Generation (1975 – 90)
  o Technology & Architecture:
    • LSI/VLSI
    • Semiconductor memories
    • Multiprocessors
    • Multi-computers
    • Vector supercomputers
  o Software and Applications:
    • Multiprocessor OS
    • Languages, Compilers and environment for parallel processing
  o Representative Systems: VAX 9000, Cray X-MP, IBM 3090

THE STATE OF COMPUTING
Computer Development Milestones
• Fifth Generation (1991 onwards)
  o Technology & Architecture:
    • Advanced VLSI processors
    • Scalable Architectures
    • Superscalar processors
  o Software and Applications:
    • Systems on a chip
    • Massively parallel processing
    • Grand challenge applications
    • Heterogeneous processing
  o Representative Systems: S-81, IBM ES/9000, Intel Paragon, nCUBE 6480, MPP, VPP500

THE STATE OF COMPUTING
Elements of Modern Computers
• Computing Problems
• Algorithms and Data Structures
• Hardware Resources
• Operating System
• System Software Support
• Compiler Support

THE STATE OF COMPUTING
Evolution of Computer Architecture
• The study of computer architecture involves both of the following:
  o Hardware organization
  o Programming/software requirements
• The evolution of computer architecture is believed to have started with the von Neumann architecture
  o Built as a sequential machine
  o Executing scalar data
• Major leaps in this context came as…
  o Look-ahead, parallelism and pipelining
  o Flynn’s classification
  o Parallel/Vector Computers
  o Development Layers

THE STATE OF COMPUTING
Evolution of Computer Architecture
• Fig 1.3

THE STATE OF COMPUTING
System Attributes to Performance
• Machine Capability and Program Behaviour
• Peak Performance
• Turnaround time
• Cycle Time, Clock Rate and Cycles Per Instruction (CPI)
• Performance Factors
  o Instruction Count, Average CPI, Cycle Time, Memory Cycle Time and No. of memory cycles
• System Attributes
  o Instruction Set Architecture, Compiler Technology, Processor Implementation and control, Cache and Memory Hierarchy
• MIPS Rate, FLOPS and Throughput Rate
• Programming Environments: Implicit and Explicit Parallelism

THE STATE OF COMPUTING
System Attributes to Performance
• Cycle Time (processor): $\tau$
• Clock Rate: $f = 1/\tau$
• Average no. of cycles per instruction: $CPI$
• No. of instructions in program: $I_c$
• CPU Time: $T = I_c \times CPI \times \tau$
• Memory Cycle Time: $k \times \tau$
• No. of Processor Cycles needed: $p$
• No. of Memory Cycles needed: $m$
• Effective CPU Time: $T = I_c \times (p + m \times k) \times \tau$
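The CPU-time relations on this slide can be checked numerically. The sketch below assumes the notation of the cited textbook: I_c is the instruction count, p the processor cycles per instruction, m the memory cycles (references) per instruction, k the ratio of memory cycle time to processor cycle time, and τ the processor cycle time. The function names and all numbers are hypothetical and chosen only for illustration.

```python
def cpu_time(instruction_count, cpi, cycle_time):
    """CPU time T = I_c * CPI * tau (seconds)."""
    return instruction_count * cpi * cycle_time


def effective_cpu_time(instruction_count, p, m, k, cycle_time):
    """Effective CPU time T = I_c * (p + m * k) * tau (seconds).

    The average CPI is split into processor cycles (p) and
    memory cycles (m references, each costing k processor cycles).
    """
    return instruction_count * (p + m * k) * cycle_time


# Hypothetical machine: 100 MHz clock, so tau = 10 ns.
tau = 1 / 100e6

# Hypothetical program: 200,000 instructions, p = 2, m = 0.5, k = 4.
t_eff = effective_cpu_time(200_000, p=2, m=0.5, k=4, cycle_time=tau)

# The same result via the plain formula with CPI = p + m * k = 4.
t_plain = cpu_time(200_000, cpi=4, cycle_time=tau)

print(f"effective CPU time = {t_eff * 1e3:.1f} ms")
```

With these assumed values, CPI works out to 4 and both functions give roughly 8 ms, which shows that the effective CPU time is the basic formula with CPI expanded into its processor and memory parts.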


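THE STATE OF COMPUTING
System Attributes to Performance
• MIPS Rate
  $\text{MIPS rate} = \dfrac{I_c}{T \times 10^6} = \dfrac{f}{CPI \times 10^6} = \dfrac{f \times I_c}{C \times 10^6}$
  where $C = I_c \times CPI$ is the total number of clock cycles needed to execute the program
• Throughput Rate
  $W_p = \dfrac{f}{I_c \times CPI} = \dfrac{\text{MIPS rate} \times 10^6}{I_c}$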

THE STATE OF COMPUTING
System Attributes to Performance
• A benchmark program contains 450000 arithmetic instructions, 320000 data transfer instructions and 230000 control transfer instructions. Each arithmetic instruction takes 1 clock cycle to execute, whereas each data transfer and control transfer instruction takes 2 clock cycles to execute. On a 400 MHz processor, determine:
  o Effective no. of cycles per instruction (CPI)
  o Instruction execution rate (MIPS rate)
  o Execution time for this program
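A minimal worked solution to this exercise, written as a short script. It reads the instruction mix as 450000 arithmetic, 320000 data transfer and 230000 control transfer instructions, with 1, 2 and 2 cycles each, on a 400 MHz clock. The names `MIX`, `CLOCK_HZ` and `analyse` are illustrative and not part of the original notes.

```python
# Instruction mix: class -> (instruction count, cycles per instruction)
MIX = {
    "arithmetic": (450_000, 1),
    "data_transfer": (320_000, 2),
    "control_transfer": (230_000, 2),
}
CLOCK_HZ = 400e6  # 400 MHz processor


def analyse(mix, clock_hz):
    """Return (effective CPI, MIPS rate, execution time in seconds)."""
    instruction_count = sum(count for count, _ in mix.values())
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    cpi = total_cycles / instruction_count   # weighted average CPI
    mips = clock_hz / (cpi * 1e6)            # MIPS rate = f / (CPI * 10^6)
    exec_time = total_cycles / clock_hz      # T = C / f = I_c * CPI * tau
    return cpi, mips, exec_time


cpi, mips, exec_time = analyse(MIX, CLOCK_HZ)
print(f"CPI = {cpi:.2f}")
print(f"MIPS rate = {mips:.2f}")
print(f"Execution time = {exec_time * 1e3:.3f} ms")
```

Under this reading the program has 1,000,000 instructions and needs 1,550,000 cycles, so the script reports a CPI of 1.55, a rate of about 258.06 MIPS and an execution time of 3.875 ms.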


THE STATE OF COMPUTING
System Attributes to Performance
• Performance Factors
  o Instruction Count ($I_c$)
  o Average Cycles per Instruction (CPI)
    • Processor Cycles per Instruction (CPI and p)
    • Memory References per Instruction (m)
    • Memory Access Latency (k)
  o Processor Cycle Time ($\tau$)
• System Attributes
  o Instruction-set Architecture
  o Compiler Technology
  o Processor Implementation and Control
  o Cache and Memory Hierarchy

Multiprocessors and Multicomputers
• Shared Memory Multiprocessors
  o The UMA Model
  o The NUMA Model
  o The COMA Model
  o The CC-NUMA Model
• Distributed-Memory Multicomputers
  o The NORMA Machines
  o Message Passing multicomputers
• Taxonomy of MIMD Computers
• Representative Systems
  o Multiprocessors: BBN TC-2000, MPP, S-81, IBM ES/9000 Model 900/VF
  o Multicomputers: Intel Paragon XP/S, nCUBE/2 6480, SuperNode 1000, CM5, KSR-1


Multivector and SIMD Computers
• Vector Processors
  o Vector Processor Variants
    • Vector Supercomputers
    • Attached Processors
  o Vector Processor Models/Architectures
    • Register-to-register architecture
    • Memory-to-memory architecture
  o Representative Systems:
    • Cray-1
    • Cray Y-MP (2, 4, or 8 processors with 16 Gflops peak performance)
    • Convex C1, C2, C3 series (C3800 family with 8 processors, 4 GB main memory, 2 Gflops peak performance)
    • DEC VAX 9000 (pipeline chaining support)

Multivector and SIMD Computers
• SIMD Supercomputers
  o SIMD Machine Model
    • S = < N, C, I, M, R >
    • N: No. of PEs in the machine
    • C: Set of instructions (scalar/program flow) directly executed by the control unit
    • I: Set of instructions broadcast by the CU to all PEs for parallel execution
    • M: Set of masking schemes
    • R: Set of data routing functions
  o Representative Systems:
    • MasPar MP-1 (1024 to 16384 PEs)
    • CM-2 (65536 PEs)
    • DAP600 Family (up to 4096 PEs)
    • Illiac-IV (64 PEs)
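To make the roles of I, M and R in the SIMD machine model more concrete, here is a toy sketch. It assumes N = 8 processing elements with one data item each, models a broadcast instruction as a Python function, a masking scheme as a list of booleans, and a data-routing function as a cyclic shift. The names `simd_step` and `route_shift` are hypothetical; a real SIMD machine does this in hardware, in lock-step.

```python
def simd_step(pe_values, operation, mask):
    """Broadcast one instruction to all PEs; only enabled PEs execute it."""
    return [operation(value) if enabled else value
            for value, enabled in zip(pe_values, mask)]


def route_shift(pe_values, distance=1):
    """Routing function: PE i sends its value to PE (i + distance) mod N."""
    n = len(pe_values)
    routed = [None] * n
    for i, value in enumerate(pe_values):
        routed[(i + distance) % n] = value
    return routed


N = 8                                   # number of PEs (N in the model)
local = list(range(N))                  # one data item per PE
mask = [i % 2 == 0 for i in range(N)]   # masking scheme: even PEs enabled

# One instruction from set I, broadcast by the control unit.
local = simd_step(local, lambda x: x * 10, mask)

# One routing function from set R.
shifted = route_shift(local)

print(local)    # values after the masked broadcast
print(shifted)  # values after the cyclic shift
```

After the masked broadcast, only the even-numbered PEs hold multiplied values (`[0, 1, 20, 3, 40, 5, 60, 7]`), and the shift moves each value to its right-hand neighbour with wrap-around (`[7, 0, 1, 20, 3, 40, 5, 60]`).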


PRAM and VLSI Models
• Parallel Random Access Machines
  o Time and Space Complexities
    • Time complexity
    • Space complexity
    • Serial and Parallel complexity
    • Deterministic and Non-deterministic algorithm
  o PRAM
    • Developed by Fortune and Wyllie (1978)
    • Objective: modelling idealized parallel computers with zero synchronization or memory access overhead
    • An n-processor PRAM has a globally addressable memory


PRAM and VLSI Models
• Parallel Random Access Machines
  o PRAM Models
  o PRAM Variants
    • EREW-PRAM Model
    • CREW-PRAM Model
    • ERCW-PRAM Model
    • CRCW-PRAM Model
  o Discrepancy with Physical Models
    • Most popular variants: EREW and CRCW
    • An SIMD machine with shared memory is the closest architecture modelled by PRAM
    • PRAM allows different instructions to be executed on different processors simultaneously. Thus, PRAM really operates in synchronized MIMD mode with shared memory
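The four PRAM variants differ only in whether several processors may read (E/C R) or write (E/C W) the same memory cell in the same step. The sketch below is a simplified illustration of that rule, not a PRAM simulator: it takes the addresses read and written in one step and lists the variants that would permit them. The function name `allowed_models` is hypothetical, and the sketch ignores how CRCW resolves conflicting writes and any read/write clash on the same cell.

```python
from collections import Counter


def allowed_models(reads, writes):
    """Return the PRAM variants under which one step's accesses are legal.

    reads / writes: lists of memory addresses, one entry per access
    issued by some processor during the same step.
    """
    concurrent_read = any(n > 1 for n in Counter(reads).values())
    concurrent_write = any(n > 1 for n in Counter(writes).values())

    # (name, concurrent reads allowed?, concurrent writes allowed?)
    variants = [
        ("EREW", False, False),
        ("CREW", True, False),
        ("ERCW", False, True),
        ("CRCW", True, True),
    ]
    return [name for name, cr_ok, cw_ok in variants
            if (cr_ok or not concurrent_read)
            and (cw_ok or not concurrent_write)]


# Two processors read cell 0 in the same step; one processor writes cell 1.
print(allowed_models(reads=[0, 0], writes=[1]))
```

In this example only the variants that allow concurrent reads accept the step, so the result is `['CREW', 'CRCW']`; a step with no shared addresses is legal under all four variants.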

PRAM and VLSI Models
• VLSI Complexity Model
  o The $AT^2$ Model
    • Memory Bound on Chip Area $A$
    • I/O Bound on Volume $AT$
    • Bisection Communication Bound (Cross-section area) $\sqrt{A}\,T$
      • Square of this area used as lower bound

...