UNIT I - Introduction - Lecture notes 5 PDF

Course: Distributed Systems
Institution: Rashtrasant Tukadoji Maharaj Nagpur University

“NIT- Department of Information Technology”

SESSION 2017-18

Name of Subject: Distributed Systems     Subject Code: BEIT801T     Semester: VIII-SEM

Topic Code   Topic to be Covered                                       Page No.
1.1          Distributed Computing System                              2
1.2          Architecture for Distributed System                       3
1.3          Hardware concepts                                         2-4
1.4          Software concepts                                         4-7
1.5          Advantages & Disadvantages of a distributed system        7-11
1.6          Issues in design of a distributed operating system        11

“Notes Compiled By: - Prof. Ashish Palandurkar”


1.1 Distributed Computing System

Over the past two decades, advances in microelectronic technology have made fast, inexpensive processors available, and advances in communication technology have made cost-effective, highly efficient computer networks available. The net result of the advances in these two technologies is that the price/performance ratio has now changed to favour the use of multiple interconnected processors in place of a single high-speed processor.

Computer architectures consisting of interconnected multiple processors are basically of two types:

1. Tightly coupled systems: In these systems there is a single system-wide primary memory (address space) that is shared by all the processors. If any processor writes, for example, the value 100 to memory location x, any other processor subsequently reading from location x will get the value 100. Therefore, in these systems, any communication between the processors usually takes place through the shared memory.

2. Loosely coupled systems: In these systems the processors do not share memory; each processor has its own local memory. If a processor writes the value 100 to memory location x, this write operation changes only the contents of its own local memory and does not affect the memory of any other processor. Hence, if another processor reads memory location x, it gets whatever value was there before in that location of its own local memory. In these systems, all physical communication between the processors is done by passing messages across the network that interconnects them.

Usually, tightly coupled systems are referred to as parallel processing systems, and loosely coupled systems are referred to as distributed computing systems, or simply distributed systems.
A distributed system is a collection of autonomous computers linked by a computer network that appears to its users as a single computer. Some comments:

• System architecture: the machines are autonomous; that is, they are computers which, in principle, could work independently.
• The user’s perception: the distributed system is perceived as a single system solving a certain problem (even though, in reality, it consists of several computers placed in different locations).

By running distributed system software, the computers are enabled to:
- coordinate their activities;
- share resources: hardware, software, and data.
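The loosely coupled (message passing) style described above can be sketched with Python's standard multiprocessing module. This is an illustrative sketch, not from the notes: each process keeps a private copy of "location x", so a write by one process is invisible to the other unless an explicit message is sent.

```python
from multiprocessing import Process, Queue

def writer(q):
    x = 100              # private to this process's local memory
    q.put(x)             # explicit message: the only way others see the value

def reader(q, out):
    x = 0                # the reader's own "location x" is unaffected by the write
    x = q.get()          # receive the message over the "network"
    out.put(x)

def exchange():
    # Run the two loosely coupled "processors" and return what the reader saw.
    q, out = Queue(), Queue()
    procs = [Process(target=writer, args=(q,)),
             Process(target=reader, args=(q, out))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return out.get()

if __name__ == "__main__":
    print(exchange())
```

In a tightly coupled system the reader would see the value through shared memory directly; here it arrives only because a message was passed.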


1.2 Architecture for Distributed System

Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.

Distributed programming typically falls into one of several basic architectures or categories:

• Client–server: Smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
• 3-tier architecture: Three-tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-tier.
• n-tier architecture: n-tier typically refers to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
• Tightly coupled (clustered): typically refers to a cluster of machines that work closely together, running a shared process in parallel. The task is subdivided into parts that are carried out individually by each machine and then put back together to form the final result.
• Peer-to-peer: an architecture in which there is no special machine or machines that provide a service or manage the network resources. Instead, all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and as servers.
• Space based: refers to an infrastructure that creates the illusion (virtualization) of a single address space. Data are transparently replicated according to application needs. Decoupling in time, space and reference is achieved.
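A minimal client–server exchange of the kind listed above can be sketched with TCP sockets. This is an illustrative sketch; the host, the upper-casing "service", and the message are invented for the example, not from the notes.

```python
import socket
import threading

def serve_once(server_sock):
    # Server side: accept one client and return its request upper-cased.
    conn, _ = server_sock.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())   # the "service" this server provides

def request(host, port, message):
    # Client side: contact the server for data, then hand it back for display.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message.encode())
        sock.shutdown(socket.SHUT_WR)   # signal end of request
        return sock.recv(1024).decode()

def demo():
    server = socket.socket()
    server.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    t = threading.Thread(target=serve_once, args=(server,))
    t.start()
    reply = request("127.0.0.1", port, "hello")
    t.join()
    server.close()
    return reply

if __name__ == "__main__":
    print(demo())
```

The client does the formatting and display; the server holds the data and logic, which is the division of labour the client–server category describes.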
Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.

1.3 Hardware and Software concepts

Hardware Concepts
1. Multiprocessors
2. Multicomputers
3. Networks of computers

Distinguishing features:
• Private versus shared memory
• Bus versus switched interconnection

Networks of computers show a high degree of node heterogeneity:
• High-performance parallel systems (multiprocessors as well as multicomputers)
• High-end PCs and workstations (servers)
• Simple network computers (offer users only network access)
• Mobile computers (palmtops, laptops)
• Multimedia workstations

They also show a high degree of network heterogeneity:
• Local-area gigabit networks
• Wireless connections
• Long-haul, high-latency connections
• Wide-area switched megabit connections

Observation: Ideally, a distributed system hides these differences.

1.4 Software Concepts
• Distributed operating system
• Network operating system
• Middleware

Distributed Computing System Models
Various models are used for building distributed computing systems. These models can be broadly classified into five categories – minicomputer, workstation, workstation-server, processor pool, and hybrid. They are described below.


1. Minicomputer Model: This model is a simple extension of the centralized time-sharing system. As shown in the figure, a distributed computing system based on this model consists of a few minicomputers interconnected by a communication network. Each minicomputer usually has multiple users simultaneously logged on to it; for this, several interactive terminals are connected to each minicomputer. Each user is logged on to one specific minicomputer, with remote access to the other minicomputers. The network allows a user to access remote resources that are available on some machine other than the one onto which the user is currently logged. The minicomputer model may be used when resource sharing (such as sharing of information databases of different types, with each type of database located on a different machine) with remote users is desired. Example: ARPAnet is an example of a distributed computing system based on the minicomputer model.

2. Workstation Model: As shown in the figure, a distributed computing system based on the workstation model consists of several workstations interconnected by a communication network. A company’s office or a university department may have several workstations scattered throughout a building or campus, each workstation equipped with its own disk and serving as a single-user computer. It has often been found that, in such an environment, at any one time a significant proportion of the workstations are idle, resulting in the waste of large amounts of CPU time. The idea of the workstation model is therefore to interconnect all these workstations so that idle workstations may be used to process jobs of users who are logged onto other workstations and do not have sufficient processing power at their own workstation to get their jobs processed efficiently. In this model, a user logs onto one of the workstations, called his or her “home” workstation, and submits jobs for execution.
When the system finds that the user’s workstation does not have sufficient processing power to execute the processes of the submitted jobs efficiently, it transfers one or more of those processes to some other workstation that is currently idle, gets the processes executed there, and finally returns the result of execution to the user’s workstation. This model is not as simple to implement as it might appear at first sight, because several issues must be resolved:

• How does the system find an idle workstation?
• How is a process transferred from one workstation to another to get it executed there?
• What happens to a remote process if a user logs onto a workstation that was idle until now and was being used to execute a process of another workstation?

Three commonly used approaches for handling the third issue are as follows:

• The first approach is to allow the remote process to share the resources of the workstation along with the logged-on user’s own processes. This method is easy to implement, but it defeats the main idea of workstations serving as personal computers, because if remote processes are allowed to execute simultaneously with the logged-on user’s own processes, the logged-on user does not get his or her guaranteed response.
• The second approach is to kill the remote process. The main drawbacks of this method are that all processing done for the remote process is lost and the file system may be left in an inconsistent state, making this method unattractive.
• The third approach is to migrate the remote process back to its home workstation, so that its execution can be continued there. This method is difficult to implement because it requires the system to support a pre-emptive process migration facility.
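The first of the issues above, finding an idle workstation, can be sketched as a simple load query. This is an illustrative sketch: the workstation names, the load figures, and the idleness threshold are all invented for the example; real systems use more elaborate distributed protocols for this.

```python
IDLE_THRESHOLD = 0.1   # invented: below this CPU load a workstation counts as idle

def find_idle_workstation(loads):
    """loads maps workstation name -> current CPU load in [0.0, 1.0].
    Return the least-loaded idle workstation, or None if all are busy."""
    idle = {ws: load for ws, load in loads.items() if load < IDLE_THRESHOLD}
    if not idle:
        return None
    # Prefer the least-loaded of the idle workstations.
    return min(idle, key=idle.get)

if __name__ == "__main__":
    loads = {"ws-a": 0.85, "ws-b": 0.02, "ws-c": 0.07}
    print(find_idle_workstation(loads))
```

The harder questions (transferring the process there, and what to do when the owner returns) are exactly the ones the three approaches above address.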


Examples: The Sprite system and an experimental system developed at Xerox PARC.

3. Workstation-Server Model: This model consists of a network of personal workstations, each with its own disk and a local file system. A workstation with its own local disk is usually called a diskful workstation, and a workstation without a local disk is called a diskless workstation. With the proliferation of high-speed networks, diskless workstations have become more popular in network environments than diskful workstations, making the workstation-server model more popular than the workstation model for building distributed computing systems. As shown in the figure, a distributed computing system based on the workstation-server model consists of a few minicomputers and several workstations interconnected by a communication network.

Advantages:
• In general, it is much cheaper to use a few minicomputers equipped with large, fast disks that are accessed over the network than a large number of diskful workstations, each with a small, slow disk.
• Diskless workstations are also preferred to diskful workstations from a system maintenance point of view. Backup and hardware maintenance are easier to perform with a few large disks than with many small disks scattered all over a building or campus. Furthermore, installing a new release of software is easier when the software is to be installed on a few file server machines than on every workstation.
• In the workstation-server model, since all files are managed by the file servers, users have the flexibility to use any workstation and access files in the same manner irrespective of which workstation the user is currently logged on to. Note that this is not true of the workstation model, in which each workstation has its own local file system, because different mechanisms are needed to access local and remote files.
• In the workstation-server model, a request-response protocol is mainly used to access the services of the server machines.
Therefore, unlike the workstation model, this model does not need a process migration facility, which is difficult to implement. A user has a guaranteed response time because workstations are not used for executing remote processes. However, the model does not utilize the processing capability of idle workstations. Example: the V-System.

4. Processor-Pool Model: This model is based on the observation that most of the time a user does not need any computing power, but once in a while he or she may need a very large amount of computing power for a short time. Therefore, unlike the workstation-server model, in which a processor is allocated to each user, in the processor-pool model the processors are pooled together to be shared by the users as needed. The pool of processors consists of a large number of microcomputers and minicomputers attached to the network. Each processor in the pool has its own memory to load and run a system program or an application program of the distributed computing system. As shown in the figure, in the pure processor-pool model the processors in the pool have no terminals attached directly to them, and users access the system from terminals that are attached to the network via special devices. These terminals are either small diskless workstations or graphic terminals, such as X terminals. A special server (the run server) manages and allocates the processors in the pool to different users on a demand basis. When a user submits a job for computation, an appropriate number of processors are temporarily assigned to his or her job by the run server. For example, if the user’s computation job is the compilation of a program having n segments, in which each of the segments can be compiled independently to produce separate relocatable object files, n processors from the pool can be allocated to this job to compile all the n segments in parallel.


When the computation is completed, the processors are returned to the pool for use by other users. In the processor-pool model there is no concept of a home machine; that is, a user does not log onto a particular machine but onto the system as a whole. This is in contrast to the other models, in which each user has a home machine onto which he or she logs and, by default, runs most of his or her programs there. Compared to the workstation-server model, the processor-pool model allows better utilization of the available processing power of a distributed computing system. Examples: Amoeba and the Cambridge Distributed Computing System.

5. Hybrid Model: To combine the advantages of the workstation-server and processor-pool models, a hybrid model may be used to build a distributed computing system. The hybrid model is based on the workstation-server model but with the addition of a pool of processors. The processors in the pool can be allocated dynamically for computations that are too large for workstations or that require several computers concurrently for efficient execution. This model gives guaranteed response to interactive jobs by allowing them to be processed on the local workstations of the users. However, it is more expensive to implement than either the workstation-server or the processor-pool model.
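The n-segment compilation example above can be sketched with a process pool. This is an illustrative sketch: `compile_segment` is an invented stand-in for a real compiler, and the pool plays the role of the processors that the run server temporarily assigns to the job.

```python
from multiprocessing import Pool

def compile_segment(segment):
    # Invented stand-in: "compile" one independent program segment
    # into the name of its relocatable object file.
    return segment + ".o"

def compile_program(segments, processors):
    # The "run server" assigns one pooled processor per segment,
    # compiles all segments in parallel, then returns the processors
    # to the pool when the with-block exits.
    with Pool(processes=processors) as pool:
        return pool.map(compile_segment, segments)

if __name__ == "__main__":
    objects = compile_program(["seg1", "seg2", "seg3"], processors=3)
    print(objects)
```

Because the segments are independent, the n compilations need no coordination beyond collecting the results, which is why the processor-pool model fits this workload so well.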

1.5 Advantages & Disadvantages of a Distributed System

Advantages of distributed systems over centralized ones:
1. Incremental growth: Computing power can be added in small increments.
2. Reliability: If one machine crashes, the system as a whole can still survive.
3. Speed: A distributed system may have more total computing power than a mainframe.
4. Open system: This is the most important and most characteristic point of a distributed system. Since it is an open system, it is always ready to communicate with other systems. An open system that scales has an advantage over a perfectly closed and self-contained system.
5. Economy: Microprocessors offer a better price/performance ratio than mainframes.

Disadvantages of distributed systems over centralized ones:
1. Security: Distributed systems have an inherent security problem.
2. Networking: If the network gets saturated, problems with transmission will surface.
3. Software: There is currently very little software support for distributed systems.
4. Troubleshooting: Troubleshooting and diagnosing problems in a distributed system can also become more difficult, because the analysis may require connecting to remote nodes or inspecting communication between nodes.

1.6 Design Issues with Distributed Systems

Design issues that arise specifically from the distributed nature of the application:
• Transparency
• Communication
• Performance & scalability


• Heterogeneity
• Openness
• Reliability & fault tolerance
• Security

Transparency
☞ How to achieve the single-system image?
☞ How to "fool" everyone into thinking that the collection of machines is a "simple" computer?

• Access transparency - Local and remote resources are accessed using identical operations.
• Location transparency - Users cannot tell where hardware and software resources (CPUs, files, databases) are located; the name of a resource should not encode its location.
• Migration (mobility) transparency - Resources should be free to move from one location to another without having their names changed.
• Replication transparency - The system is free to make additional copies of files and other resources (for purposes of performance and/or reliability) without the users noticing. Example: several copies of a file exist; on a certain request, the copy closest to the client is accessed.
• Concurrency transparency - Users will not notice the existence of other users in the system (even if they access the same resources).
• Failure transparency - Applications should be able to complete their tasks despite failures occurring in certain components of the system.
• Performance transparency - Load variation should not lead to performance degradation. This could be achieved by automatic reconfiguration in response to changes in load; it is difficult to achieve.

Communication
☞ Components of a distributed system have to communicate in order to interact. This implies support at two levels:
1. Networking infrastructure (interconnections & network software).
2. Appropriate communication primitives and models and their implementation:
• Communication primitives:
- send
- receive
- remote procedure call (RPC)
• Communication models:
- Client-server communication: implies a message exchange between two processes:


the process which requests a service and the one which provides it;
- Group multicast: the target of a message is a set of processes, which are members of a given group.

Performance and Scalability
Several factors influence the performance of a distributed system:
• The performance of individual workstations.
• The speed of the communication infrastructure.
• The extent to which reliability (fault tolerance) is provided (replication and preservation of coherence imply large overheads).
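The remote procedure call primitive mentioned above can be illustrated with Python's standard xmlrpc modules. This is an illustrative sketch (the `add` procedure and the loopback setup are invented for the example): the client invokes `add` as if it were a local call, while the library carries it as a client-server request/response exchange.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    # The remote procedure: runs inside the server process.
    return a + b

def start_server():
    # Port 0 lets the OS pick a free port; serve requests in the background.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address[1]

def rpc_add(port, a, b):
    # The client calls add() as if it were local; the proxy turns the call
    # into a request message and the return value into a response message.
    proxy = ServerProxy(f"http://127.0.0.1:{port}")
    return proxy.add(a, b)

if __name__ == "__main__":
    port = start_server()
    print(rpc_add(port, 2, 3))
```

This hides the send/receive message exchange behind an ordinary function call, which is exactly the convenience RPC adds over the raw primitives.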

