HIGH AVAILABILITY IT SERVICES PDF

Title HIGH AVAILABILITY IT SERVICES
Author Lídia Leta
Pages 527
File Size 4.2 MB
File Type PDF


Description

High Availability IT Services

Terry Critchley

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20141020

International Standard Book Number-13: 978-1-4822-5591-1 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To my wife, Chris; children, Philip and Helen; and the rest of my now extended family—Matt, Louise, and grandchildren, Ava, Lucy, and Ben

Contents

Foreword
Preface
Acknowledgments
Author

Section I

An Availability Primer

1 Preamble: A View from 30,000 Feet
  Do You Know...?
  Availability in Perspective
    Murphy’s Law of Availability
    Availability Drivers in Flux: What Percentage of Business Is Critical?
    Historical View of Availability: The First 7 × 24 Requirements?
    Historical Availability Scenarios
      Planar Technology
      Power-On Self-Test
      Other Diagnostics
      Component Repair
      In-Flight Diagnostics
  Summary

2 Reliability and Availability
  Introduction to Reliability, Availability, and Serviceability
    RAS Moves Beyond Hardware
    Availability: An Overview
    Some Definitions
    Quantitative Availability
    Availability: 7 R’s (SNIA)
  Availability and Change
    Change All around Us
    Software: Effect of Change
    Operations: Effect of Change
    Monitoring and Change

  Automation: The Solution?
    Data Center Automation
    Network Change/Configuration Automation
    Automation Vendors
  Types of Availability
    Binary Availability
    Duke of York Availability
    Hierarchy of Failures
    Hierarchy Example
    State Parameters
  Types of Nonavailability (Outages)
    Logical Outage Examples
    Summary
  Planning for Availability and Recovery
    Why Bother?
    What Is a Business Continuity Plan?
    What Is a BIA?
    What Is DR?
  Relationships: BC, BIA, and DR
    Recovery Logistics
    Business Continuity
  Downtime: Who or What Is to Blame?
  Elements of Failure: Interaction of the Wares
  Summary
  DR/BC Source Documents

3 Reliability: Background and Basics
  Introduction
    IT Structure—Schematic
    IT Structure—Hardware Overview
  Service Level Agreements
    Service Level Agreements: The Dawn of Realism
    What Is an SLA?
    Why Is an SLA Important?
  Service Life Cycle
  Concept of User Service
  Elements of Service Management
    Introduction
    Scope of Service Management
    User Support
    Operations Support
    Systems Management
    Service Management Hierarchy
    The Effective Service
    Services versus Systems
  Availability Concepts

    First Dip in the Water
    Availability Parameters
  Summary

4 What Is High Availability?
  IDC and Availability
  Availability Classification
    Availability: Outage Analogy
      A Recovery Analogy
    Availability: Redundancy
    Availability: Fault Tolerance
    Sample List of Availability Requirements
      System Architecture
      Availability: Single Node
      Dynamic Reconfiguration/Hot Repair of System Components
      Disaster Backup and Recovery
      System Administration Facilities
      HA Costs Money, So Why Bother?
      Cost Impact Analysis
    HA: Cost versus Benefit
    Penalty for Nonavailability
      Organizations: Attitude toward HA
      Aberdeen Group Study: February 2012
      Outage Loss Factors (Percentage of Loss)
      Software Failure Costs
      Assessing the Cost of HA
    Performance and Availability
      HA Design: Top 10 Mistakes
  The Development of HA
    Servers
    Systems and Subsystems Development
      Production Clusters
  Availability Architectures
    RAS Features
      Hot-Plug Hardware
      Processors
      Memory
      Input/Output
      Storage
      Power/Cooling
      Fault Tolerance
  Outline of Server Domain Architecture
    Introduction
    Domain/LPAR Structure
  Outline of Cluster Architecture

    Cluster Configurations: Commercial Cluster
    Cluster Components
      Hardware
      Software
      Commercial LB
      Commercial Performance
      Commercial HA
    HPC Clusters
      Generic HPC Cluster
      HPC Cluster: Oscar Configuration
      HPC Cluster: Availability
      HPC Cluster: Applications
      HA in Scientific Computing
      Topics in HPC Reliability: Summary
      Errors in Cluster HA Design
  Outline of Grid Computing
    Grid Availability
    Commercial Grid Computing
  Outline of RAID Architecture
    Origins of RAID
    RAID Architecture and Levels
      Hardware
      Software
      Hardware versus Software RAID
      RAID Striping: Fundamental to RAID
    RAID Configurations
      RAID Components
      ECC
      Parity
      RAID Level 0
      RAID Level 1
      RAID Level 3
      RAID Level 5
      RAID Level 6
      RAID Level 10
      RAID 0 + 1 Schematic ...

