COEN 317 - Lecture note COEN 317 PDF

Title	COEN 317 - Lecture note COEN 317
Course	Web Services
Institution	Santa Clara University
Pages	44
File Size	3.3 MB
File Type	PDF



Description

Class 1: Transaction: a group of read and write operations. The Internet is a form of distributed system: hosts are connected by a network. Distributed system: reduce latency and increase throughput. Definition of Distributed Systems (3): "You know you have one when the crash of a computer you have never heard of stops you from getting any work done." Because of the transparency of a distributed system, when there is a crash you do not know which component caused it -> this makes a distributed system more error-prone.

Class 2:

Green layer: appears to clients as a single computer even if there are 3 computers. A WAN has higher scalability than a LAN. Centralized server (one server for all users): as more and more users hit that server, response time grows and the server becomes a performance bottleneck. Centralized data: a single file server / single database; if the file server is down, all data is inaccessible (single point of failure). To query the database, data is read from disk and loaded into a buffer, so read and write operations are slow. Response time goes up and the number of users you can serve becomes a problem. Decentralize the system: servers can make decisions locally. If one system dies, the other systems can still make decisions. Each system has its own local clock, depending on its geography.

Transmitting a message and waiting for a reply that confirms "I received your message" is called synchronous communication. If no reply comes back, assume the packet was lost and retransmit. -> Works for a small geographical area because of response time, thus not scalable in a wide area network. To address this issue, use the asynchronous communication model -> the client sends the message to the server and, once the message is transmitted, moves on with its own task; whenever the server receives the message, it replies back. -> May be unreliable: if the message is lost, there is no way for the client to know about it. -> The client may reinitiate the request. Synchronous communication -> has a scalability problem.
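The retransmit-on-missing-reply behaviour described above can be sketched as follows. This is only an illustration: `send_with_retransmit` and `make_lossy_channel` are hypothetical names, and the "channel" is a plain function that returns None to stand in for a timeout.

```python
def send_with_retransmit(message, channel, max_attempts=5):
    """Synchronous send: wait for an ack after every attempt, resend if none arrives."""
    for attempt in range(1, max_attempts + 1):
        ack = channel(message)      # blocks until a reply or a timeout (None)
        if ack is not None:
            return attempt          # number of transmissions that were needed
    raise TimeoutError(f"no ack after {max_attempts} attempts")


def make_lossy_channel(loss_pattern):
    """Hypothetical channel: drops the message whenever the next pattern entry is True."""
    drops = iter(loss_pattern)

    def channel(message):
        if next(drops, False):
            return None             # message lost: the sender only sees a timeout
        return ("ACK", message)

    return channel


# The first two transmissions are lost, the third one is acknowledged.
attempts = send_with_retransmit("hello", make_lossy_channel([True, True, False]))
print(attempts)
```

Every attempt blocks the sender for a full round trip, which is why this style becomes slow over a wide area network.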

(a) is better for clients with limited resources, such as mobile applications. In general, (b) is the better one.

Objects are scattered across multiple hosts. These objects use RMI (remote method invocation) or RPC to call methods of another object. Example: Scp / ring protocol: the client uses the ring protocol to get the highest ID from the server; in this case, the ring protocol is an RPC.

One component publishes, and other components subscribe to receive the message.
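A minimal in-process sketch of publish/subscribe, assuming a hypothetical `Broker` class with one inbox per subscriber (these names are not from the lecture). The publisher only drops the event into the inboxes and returns; it never waits for a subscriber and does not know who the subscribers are.

```python
from collections import defaultdict, deque


class Broker:
    """Hypothetical broker: maps a topic name to the inboxes of its subscribers."""

    def __init__(self):
        self._inboxes = defaultdict(list)

    def subscribe(self, topic):
        inbox = deque()                      # the subscriber reads from this later
        self._inboxes[topic].append(inbox)
        return inbox

    def publish(self, topic, event):
        # Fire and forget: enqueue for every current subscriber, then return.
        for inbox in self._inboxes.get(topic, []):
            inbox.append(event)


broker = Broker()
inbox_a = broker.subscribe("orders")
inbox_b = broker.subscribe("orders")
broker.publish("orders", {"id": 1})
broker.publish("billing", {"id": 2})         # no subscribers: nobody receives it
print(list(inbox_a), list(inbox_b))
```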

Asynchronous communication, thus highly scalable, because the component that publishes the data does not need a response from other components. After publishing the data, that component does other tasks instead of waiting for a response. Tier relates to physical hardware. Layers are within tiers. Two-tier: one tier for the client and one tier for the server. Publish event: unknown receiver; the publisher does not know who will receive the event.

Class 3:

(1) is direct end-to-end, process-to-process communication, which is the physical network topology; (2) is a layered network topology, because messages transmitted from P1 to P2 and vice versa need to go through a set of intermediate servers.

P1 can be another group of processes. Communication cannot be done without knowing the identifiers of the participating entities. Example: in (3), if P1 wants to send a message to P2, P1 needs to know the ID of process P2, such as its IP address or process ID. In the case of group communication in (1), P1 does not need to know the IDs of P2, P3, P4: the message is sent to a group ID and is then multicast to the members of that group.

Q: Because the user agent does not know which service agent can handle its request, does the user agent send a multicast message to all service agents? Suppose service agent A can handle the request: is a one-to-one connection then established between the user agent and service agent A, with the request sent from the user agent to service agent A and the response sent back? Could be either, depending on the protocol (type of data transmitted).

Where to specify myid in (1)? It is specified somewhere before MPI_Comm_size. Buffer is the information transmitted (the message). What does MPI_Bcast mean? Broadcast to all processes with the same group ID. Communication world: processes with the same group ID form a communication world.

Connection-oriented: a connection has to be established between sender and receiver before the message gets transmitted. Ex: a phone call to P2: the connection is established once you call and the person picks up the phone. Connectionless: no connection needs to be established before the message is sent.

If message transmission is non-blocking, then once the message is dropped off on the communication channel, the client continues to execute until the server triggers a callback method. Ex: voice transmission. Asynchronous: send is non-blocking; callback mechanism. Synchronous: send is blocking.
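The difference between a blocking send and a non-blocking send with a callback can be sketched with Python's standard `concurrent.futures`. This is an illustration only: `server_handle` and `on_reply` are hypothetical names, and a worker thread stands in for the remote server.

```python
from concurrent.futures import ThreadPoolExecutor


def server_handle(request):
    """Stand-in for the remote server: replies with the upper-cased request."""
    return request.upper()


replies = []


def on_reply(future):
    """Callback: runs once the reply to the non-blocking send is available."""
    replies.append(future.result())


with ThreadPoolExecutor(max_workers=1) as pool:
    # Asynchronous: submit() returns immediately and the client keeps executing.
    future = pool.submit(server_handle, "hello")
    future.add_done_callback(on_reply)
    local_work = sum(range(10))              # client work done while the request is in flight

    # Synchronous: result() blocks the client until the reply arrives.
    blocking_reply = pool.submit(server_handle, "world").result()

print(replies, local_work, blocking_reply)
```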

If requests have priorities, P1 has to know which port of P2 handles higher-priority requests and which port handles lower-priority requests before sending a request to P2.

Protocol supporting layered communication: protocol suite.

Q: if host A wants to send a message or request to host B, how does host A choose the IP address of Router 1, then Router 2, and then host B? How does Router 1 choose the IP address of Router 2, and how does Router 2 choose the IP address of host B? Host A knows the IP address of host B. The network layer determines the IP addresses of the intermediate routers through which host B can be reached.

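Q: Is HTTP an application-level protocol because it defines GET, POST, DELETE, PUT for a specific application only (for example, HTTP methods used for FB cannot be applied to Amazon)? Is SMTP an application-level protocol because it is for email only? Yes, both are application-level protocols, although the HTTP methods themselves are generic and are used by any web application, FB and Amazon alike. Q: Do we say UDP has better performance than TCP only because it does not need the 3-way handshake (which is costly)? After the 3-way handshake, does TCP have the same performance in transmitting data packets as UDP? Yes, roughly, though TCP still adds acknowledgements, retransmission, ordering, and flow and congestion control, which UDP does not have.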

Class 4: Remote method/procedure call: a process calls a remote procedure in another process. Remote object call: the method is part of an object, and a process remotely invokes that object's method in another process.

Client sends the message “Hello, world” and the server echoes it back (echo program). (1) Instantiate a socket object and open a stream; SOCK_STREAM: TCP (stream of bytes). Q: does s.connect((HOST, PORT)) perform the 3-way handshake? s.bind((HOST, PORT)): binds the socket to that particular host and port, so the socket is associated with that host and port. Once bind happens, no other server can bind to that port. s.accept(): accepts the connection. conn.recv(1024): 1024 is the buffer size. Q: in this case, is the while 1 statement in the server redundant, since the client sends at most 1024 bytes to the server? For the client: it is a one-time connection. For the server: conn.close() closes the connection to that client.
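A self-contained sketch of such an echo program using Python's standard `socket` module. It is not the lecture's exact code: running the server in a thread, binding to port 0 (so the OS picks a free port), and the `serve_once` helper are assumptions made so that client and server fit in one runnable script.

```python
import socket
import threading

HOST = "127.0.0.1"


def serve_once(server_sock):
    """Accept one client and echo every chunk back until the client closes."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)      # up to 1024 bytes per call, possibly fewer
            if not data:                # empty bytes: the client closed the connection
                break
            conn.sendall(data)


server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCK_STREAM = TCP
server_sock.bind((HOST, 0))             # port 0: let the OS choose a free port
server_sock.listen(1)
port = server_sock.getsockname()[1]
server_thread = threading.Thread(target=serve_once, args=(server_sock,))
server_thread.start()

message = b"Hello, world"
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, port))             # the TCP connection is established here
    s.sendall(message)
    reply = b""
    while len(reply) < len(message):    # recv may return the echo in several pieces
        chunk = s.recv(1024)
        if not chunk:
            break
        reply += chunk

server_thread.join()
server_sock.close()
print(reply)
```

In this sketch the server loop is what notices the client closing the connection (recv returns empty bytes), which is one reason to keep it even when a single message fits in one buffer.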

Q: Why is hello_1_svc called if the client does not explicitly invoke this method? Q: if we have hello_1_svc on the client side and hello_1_svc on the server side with different implementations, does the client know which hello_1_svc to call? Also, what should the client include in order to call hello_1_svc on the server side?

Q: once comp.executeTask(task) is executed, is the main function of ComputeEngine executed? Q: since Compute does not implement executeTask, why can the main function of ComputeEngine be executed?

Q: what is args[1]? Q: how can a Compute object be registered with the Compute Registry?

Iterative processing has a scalability problem.

Based on the rules defined in the repository of conversion rules and programs, decide which outgoing queue to send the message to in (1). The destination client subscribes to a particular message queue. The source publishes a message to that message queue, and the message is then sent to the destination client based on the rules defined.

(1) Listen: willing to accept connection

(1) Associate an ID with this communication world. (2) For each process you have, you make a 1-to-1 connection to that process. Q: what do i, 0 mean in (2)? Q: why do MPI_Send and MPI_Recv run in blocking mode? Q: synchronous does not help scalability because you need to wait for the response to each request you send?

POLL is done at the server side (the server polls the message queue to see whether there is any new message) and NOTIFY is done at the message queue layer. Why do we have NOTIFY? The message queue notifies the receiver that there are messages so that the receiver can GET them. What is the difference between GET and POLL?
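One common way to define these four primitives is sketched below, as an assumption rather than the lecture's definition: GET blocks until a message is available, POLL checks the queue and returns immediately, and NOTIFY installs a handler that the queue calls when a message is put. The `MessageQueue` class is a hypothetical wrapper around Python's standard `queue.Queue`.

```python
import queue


class MessageQueue:
    """Hypothetical wrapper illustrating PUT, GET, POLL and NOTIFY."""

    def __init__(self):
        self._q = queue.Queue()
        self._handler = None

    def notify(self, handler):
        """NOTIFY: install a handler that is called whenever a message is put."""
        self._handler = handler

    def put(self, message):
        """PUT: append a message, then tell the receiver that something arrived."""
        self._q.put(message)
        if self._handler is not None:
            self._handler(self)

    def get(self, timeout=None):
        """GET: block until a message is available, then remove and return it."""
        return self._q.get(timeout=timeout)

    def poll(self):
        """POLL: never block; return a message if there is one, else None."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


mq = MessageQueue()
empty_poll = mq.poll()                      # queue is empty: returns None immediately
received = []
mq.notify(lambda q: received.append(q.get()))
mq.put("m1")                                # the handler runs and GETs the message
print(empty_poll, received)
```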

Server receives the message containing the procedure name (add) and the arguments i, j, unpacks the message, executes add, and returns the result k back to the client. Q: how is the implementation of add added on the server? It is written by the server developer and exposed as an API to the client.
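The pack, unpack, execute, and return steps can be sketched as below. JSON is used as the wire format purely for illustration, and `client_stub`, `server_dispatch`, and `call` are hypothetical names; a direct function call stands in for the network.

```python
import json


def add(i, j):
    """Procedure implemented by the server developer."""
    return i + j


PROCEDURES = {"add": add}                   # what the server exposes to clients


def client_stub(name, *args):
    """Pack the procedure name and the arguments into a message."""
    return json.dumps({"proc": name, "args": list(args)}).encode()


def server_dispatch(raw):
    """Unpack the message, execute the named procedure, pack the result."""
    msg = json.loads(raw.decode())
    result = PROCEDURES[msg["proc"]](*msg["args"])
    return json.dumps({"result": result}).encode()


def call(name, *args):
    reply = server_dispatch(client_stub(name, *args))   # stands in for the network hop
    return json.loads(reply.decode())["result"]


k = call("add", 2, 3)
print(k)
```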

Client sends the message to the server and both client and server continue to execute. Q: will the callback function be triggered both when there is an error executing the local procedure and when the local procedure succeeds? Yes, the callback is triggered by the server.

Class 5: web services

Client marshals (creates) the message, which includes the method to call and the actual values passed to that function. The message is passed to the server machine; the server stub receives it, unpacks it, executes the call, and sends the result back to the client.

Client makes a call to a particular object, which is located on a server. The RMI Registry holds the mapping of objects to RMI Servers. Whenever you add an object to an RMI Server, the RMI Server publishes the object to the RMI Registry; the RMI client then sends a message to the RMI Registry to locate the object. Once the object is located, the RMI client invokes the method on the object. More related to object-oriented programming.

Web Services: server machines have backend functionality that needs to be exposed to clients. Web API / REST API: exposed for applications to call.

(1) Q: What does internal service do in client? Internal service performs the actual functions for each client such as billing.

Example: at BestBuy, if a phone submits an order to buy a Dell laptop, the client in (1) is the BestBuy user interface, company B is BestBuy, and company A is Dell. The HTTP request is sent to the internal service of company B, which issues a bill to the client as the response; the web service of company A sends the order through middleware to the internal service of company A, which does the actual job, such as getting the laptop from the Dell warehouse.

Asynchronous transmission mode: keep sending packets to the communication medium. Synchronous transmission mode: if the packet is not received, retransmit. Q: Isochronous transmission mode: the min time delay is for error detection, such as packets out of order. For example, if the receiver sends packets 1, 2 back to the sender, the sender should receive packets 1, 2; however, if packet 2 arrives in less than the min time delay, the sender may receive packets 2, 1, which is out of order.

Atomic Multicast: if one process does not receive the message, the other processes have to ignore the message.

Gossip protocol:

Q: If there is a new member (in this case: J), what criteria determine which nodes are connected to J? Same mechanism as the tree structure. If a node fails, we have to repair the link.

We also need to drop non-useful links. Adding useful links adds reliability. Failure Detection: two failure detector algorithms: Heart-beating and Ping-Ack. Under good network conditions, Heart-beating and Ping-Ack give 100% completeness but not 100% accuracy. Q: do we have an example that does not have 100% completeness? Heart-beating and Ping-Ack under a network partition, because we cannot detect the failure if a router crashes. Asynchronous system: there is no time bound for another process to reply, so with Heart-beating and Ping-Ack a process that is busy and replies late might be regarded as failed. Synchronous system: there is a time bound; process pj has to reply within that upper bound, so accuracy is achieved. If pj does not reply within the time bound, pj has failed.
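A small sketch of a timeout-based heart-beat detector shows why completeness can hold while accuracy does not. The `HeartbeatDetector` class, the process names, and the integer clock values are all assumptions for illustration.

```python
class HeartbeatDetector:
    """Suspect a process if no heartbeat arrived within `timeout` time units."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}                 # process name -> time of last heartbeat

    def heartbeat(self, process, now):
        self.last_seen[process] = now

    def suspected(self, now):
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


detector = HeartbeatDetector(timeout=3)
for p in ("p1", "p2", "p3"):
    detector.heartbeat(p, 0)
detector.heartbeat("p1", 2)
detector.heartbeat("p3", 2)
detector.heartbeat("p1", 4)                 # p2 crashed after time 0, p3 is only slow

suspects_at_6 = detector.suspected(6)       # p3 is a false positive here
detector.heartbeat("p1", 6)
detector.heartbeat("p3", 7)                 # the late heartbeat from p3 finally arrives
suspects_at_8 = detector.suspected(8)
print(suspects_at_6, suspects_at_8)
```

The crashed process p2 stays suspected (completeness), while the slow but alive p3 is wrongly suspected for a while (no accuracy without a time bound).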

Network failure could always be there. Heart-beating and Ping-Ack are used for detecting process failure. Q: in this case, can we assume the network condition is always good? Yes. Different types of heart-beating: Centralized heart-beating. Ring heart-beating: downsides: 1. Delay at some nodes delays the messages received by other nodes. 2. Nodes are unnecessarily busy forwarding messages. All-to-All heart-beating: downsides: nodes are also unnecessarily busy, and it is difficult for a node to know the complete list of members in a dynamic environment.

Q: Use an example to simulate? If a node does not receive a pair of messages, then its neighbor is broken.

For example, if process 2 is ok, process 1 can receive i = 1 from process 2 and process 5, and can also receive i = 4 from process 5. However, if process 2 is broken, process 1 can only receive i = 1 from process 5 and cannot receive i = 4; thus process 1 can determine that process 2 is broken.

Class 6: Two failure detection methods: Ping-Ack (2T) and the Heart-beating protocol (T). Completeness = every process failure is eventually detected. All-to-All Heart-beating: hard for each process to keep track of the overall picture of the network, because a P2P network is dynamic. Measuring speed: detection time, i.e. how quickly you can detect the failure. Measuring accuracy: depends on the distributed application. Number of false failure detections per time unit (false positives): the failure detector is inaccurate. False negative: a failure happens but the failure detector does not detect it; completeness is violated.

(1) Send omission: the message is lost when it is written to the outgoing message buffer, i.e. between the process itself and its outgoing buffer. May result from: 1. Process p crashing. 2. The buffer being full. (2) Channel omission: results from a router crash; the communication channel is unreliable. (The communication channel is generally reliable.) (3) Receive omission: may result from process q crashing.

Arbitrary failures: 1. Arbitrary process failure. Ex: code bug. 2. Arbitrary channel failure. Ex: packets are delivered in the incorrect order / duplicate packets are received / non-existent messages may be delivered.

Timing failures: a timing failure is a failure of a process, or part of a process, in a synchronous distributed system or real-time system to meet limits set on execution time, message delivery, clock drift rate, or clock skew. In synchronous distributed systems: there is a time bound (upper bound and lower bound), thus timing failures such as clock drift rate are applicable. In asynchronous distributed systems: not applicable. In multimedia distributed systems: applicable, because streaming data must be processed within a limited amount of time.

(1) Q: The process has more tasks than the threshold to execute, or the process takes longer than expected to execute a task -> solutions: 1. Add more hardware so that more processes can be created to reduce the execution time. 2. Vertical scaling. The process is following an algorithm: state1 -> state2, e.g. [unsorted array] -> [sorted array]; a timing failure occurs if it takes more time than the threshold to sort the array. (2) Solution: increase the number of communication channels.

Q: Process pi can determine whether processes 1, 2, 3, 4, 5 are alive, but how can we determine whether pi is alive? Processes 1, 2, 3, 4, 5 or another coordinator can take over the role of pi.

On top of a server, I can run different types of OS and consolidate different types of applications. VMs are isolated while providing the advantage of resource sharing. Q: are resources shared among VMs? Yes. I can expose one VM to the engineering department and another VM to the HR department. Without VMs, I have to buy one machine for HR and one machine for engineering. For this mechanism to work, we need a hypervisor (software that controls VMs: creating VMs, destroying VMs, managing VM resources). What resources does a VM need? CPU, memory, network interface, and disk space. The hypervisor exposes these hardware resources as virtualized resources to each of the VMs. Any request that gets initiated has to go through the hypervisor. The hypervisor manages virtual CPUs, virtual memory, virtual NICs, virtual disks, and so on. Virtualization layer: virtualizes the physical resources in your hardware (virtual CPU, virtual memory, virtual disk, and so on), managed by the hypervisor. There is overhead associated with the hypervisor because everything has to go through it. Performance is reduced by around 10%.

Full virtualization and Platform virtualization. Full virtualization: complete simulation of the underlying hardware. Platform virtualization: limited simulation of the underlying hardware; only a limited number of apps can run.

Native VM: the VMM (hypervisor) provides a virtualization layer. On top of the hypervisor I can have multiple VMs; those VMs are known as native VMs. The guest OS does not know it is running in a virtualized environment. The application has full control over the guest OS and the actual resources. Ex: CentOS.

Hypervisor provides these VMs virtual CPU, virtual memory, virtual NIC. Full virtualization. Full isolation (Q: isolation between VMs? Yes, so that VM is independent to each other) Virtual machine layer can run on bare metal, but also could run on its dedicated VM.

Hosted VM: the virtualization layer is within the VM itself. The OS is shared across VMs. If I want to install an application that deals with the file system, I need to install a kernel module in the actual host OS. From actual bare metal, I am not able to install a kernel module on the host OS for security reasons. -> Only a limited number of applications can run on a hosted VM. Ex: container on Linux. The OS is shared among VMs: if the OS is Linux, all VMs run on Linux. Rebooting a VM is faster because you do not need to reboot the whole OS, just the VM itself. The hypervisor is within the VM because the VMM virtualizes the host OS. VMM = Hypervisor = Virtual machine layer; they are all the same thing. Guest apps do not have access to the host OS. Full isolation but not full virtualization. Why not full virtualization? The host OS is being virtualized, with no direct access to the host OS. Highly scalable: I can create as many VMs as I want. Nice synchronization between VM and host OS changes. Only the administrator of the whole bare metal system has direct access to the host OS; a VM never has direct access to the host OS.

Hypervisor Types: Type 1: bare metal hypervisor. Sits on bare metal hardware and virtualizes hardware resources for the VMs. Ex: native VM. Type 2: hosted hypervisor. Runs on the host OS. Ex: hosted VM. The host OS is unaware of being virtualized. Full virtualization, Type 1 hypervisor, and native VM are the same thing.

XEN: popular virtualization technology used on Amazon. Type 1 hypervisor. (1) Domain 0, or DOM 0: the administrator logs in to DOM 0 to manage VMs. Not meant for installing applications and not meant for users to use.

Another view of XEN Architecture:

(1) Applications are the applications that manage domain 0. Ex: clustering software, password management software (passwords of the users of domain 0). For users from a guest domain, we do not manage their passwords.

The moment you install the XEN hypervisor, Domain 0 (the entry point into XEN) is created automatically. Domain 0 can assign resources such as memory and NICs to a guest domain. Full isolation: one VM cannot access any data of other VMs. Guest domain is for users such as engineering or HR departm...