Chapter 9 Essay PDF

Title	Chapter 9 Essay
Course	Database Management
Institution	Arkansas State University
Pages	10
File Size	88.1 KB
File Type	PDF
Total Downloads	11
Total Views	156

Preview

CLICK TO PREVIEW PDF

Summary

Chapter 9 Essay...

Description

122) Discuss the four key terms (subject-oriented, integrated, time-variant, and nonupdateable) related to data warehousing. Answer: A data warehouse is subject-oriented, meaning that it is organized around key subjects of the enterprise. It is also integrated, meaning that the warehouse holds verified data that is the gold standard. It is also time-variant, meaning that we keep track of time using a time dimension. Finally, a data warehouse is nonupdateable, meaning that data are loaded from operational systems and not users. LO: 9.2: Give two important reasons why an "information gap" often exists between an information manager's need and the information generally available. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 123) What is data mining, and what are its goals? Answer: Data mining is knowledge discovery using a sophisticated blend of statistics, artificial intelligence, and computer graphics. The goals of data mining are: 1. To explain some observed event or condition 2. To confirm a hypothesis 3. To analyze data for new or unexpected relationships LO: 9.2: Give two important reasons why an "information gap" often exists between an information manager's need and the information generally available. Difficulty: Easy Classification: Synthesis AACSB: Reflective Thinking

1 Copyright © 2019 Pearson Education, Inc.

124) Discuss the history of data warehousing. Answer: Several key advances in information systems led to the emergence of data warehousing. In particular, as the relational model came to be popular and computer hardware advanced, data warehousing had a platform to build from. In addition, advances in middleware products as well as the emergence of end-user computing contributed to this. Perhaps the major player in the development of data warehousing was the separation of operational data from informational data. LO: 9.3: List two major reasons most organizations today need data warehousing. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 125) What types of applications would benefit from real-time data warehousing? Answer: Since the goal of real-time data warehousing is to take immediate action at touch points with customers, any type of E-commerce application would benefit–for example, if you wanted to design an application to send an email message when a user abandons a shopping cart. Another example would be fraud-detection with credit card payment processing. Any type of just-in-time system, such as a transportation or scheduling system, would benefit by using a realtime data warehouse. LO: 9.3: List two major reasons most organizations today need data warehousing. Difficulty: Easy Classification: Synthesis AACSB: Reflective Thinking 126) Why are data mining applications growing rapidly? Answer: Data mining applications are growing rapidly because of the growth of data in data marts and data warehouses, the continual introduction of new data mining tools, and increasing competitive pressures. LO: 9.3: List two major reasons most organizations today need data warehousing. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 127) Explain the difference between operational and informational systems as well as the primary factors that contribute to the need for separation. Answer: An operational system is built to run a business in real-time using current data. An informational system is designed to support decision making based on historical point-in-time data. There are three primary factors which dictate the need to separate operational and informational systems: 1. A data warehouse centralizes data that may reside on several different systems. 2. A properly designed data warehouse improves the quality and consistency of data. 3. Contention for resources can be reduced by separating the informational system from the operational system. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 2 Copyright © 2019 Pearson Education, Inc.

128) Explain the four basic steps to build an independent data mart. Answer: 1. Extraction–data are extracted from internal and external source system files and databases. 2. Transformation and loading–data are transformed, integrated and loaded. 3. At this point, data warehouse is a set of physically distinct databases. 4. Users access the data warehouse using query language and tools. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 129) Discuss the limitations of the independent data mart architecture. Answer: There are several limitations to the independent data mart including: 1. A separate ETL process for each data mart. 2. Inconsistencies between data marts because they are developed with different technologies sometimes. 3. It is difficult to relate data across data marts. 4. Scaling costs are excessive. Every new application that creates a separate data mart repeats the ETL process. 5. It is very expensive to make the data marts consistent. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 130) Explain the dependent data mart and operational data store architecture. Answer: The dependent data mart and operational data store architecture address some of the limitations of the independent data mart. Data is loaded from source data systems into a data staging area (called the operational data store). As part of this load process, the data are cleaned, reconciled, derived and transformed. The enterprise data warehouse is then loaded from the operational data store. Each individual data mart is loaded from the enterprise data warehouse. This is different from an independent data mart architecture, where each data mart is loaded from the data staging area. By having an enterprise data warehouse, the organization can maintain control of data quality and integrity by having one "gold standard," which is the enterprise data warehouse. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking

3 Copyright © 2019 Pearson Education, Inc.

131) What is a logical data mart and what makes this architecture unique? Answer: A logical data mart is created by a relational view of a data warehouse. Some of the unique characteristics include: 1. Rather than having separate data marts, one can have several logical data marts which are only relational views of one physical data warehouse. 2. The data warehouse is used for cleansing and transformation rather than a separate staging area. 3. New data marts can be created quickly. There is no need to create a new physical database or write any loading routines. 4. The logical data mart is always up to date because it is a view of the warehouse. The data warehouse only needs to be updated, since views are populated when used. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 132) Discuss the three-layer data architecture. Answer: The three-layer data architecture describes how we view data in the data warehouse. The architecture is characterized by operational data, which is stored in various operational systems; reconciled data, which are stored in the enterprise data warehouse. Reconciled data are intended to be the single source for accurate data used by all decision support applications. Derived data are used by the data marts. It has been selected, formatted and aggregated for decision support applications. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 133) What is the role of metadata in the three-layer data architecture? Answer: Metadata describe other data. There are three types of metadata used in the three-layer architecture: 1. Operational–describes the data in the operational systems that feed the enterprise data warehouse. May exist in different formats and is often poor quality. 2. Enterprise data warehouse metadata describe the reconciled data layer as well as the rules for extracting, transforming, and loading operational data into reconciled data. 3. Data mart metadata–describe the derived data layer and the rules for transforming reconciled data to derived data. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Easy Classification: Synthesis AACSB: Reflective Thinking

4 Copyright © 2019 Pearson Education, Inc.

134) Discuss the difference between transient data and periodic data. Answer: Transient data are data in which changes to existing records are written over previous records. Periodic data are data which are never physically altered or deleted once added to the store. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Easy Classification: Synthesis AACSB: Reflective Thinking 135) What is OLAP and what types of OLAP tools are available? Answer: OLAP stands for On-Line Analytical Processing. It is the use of a set of reporting and querying tools which allows the user to use simple windowing techniques to analyze multidimensional views of their data. The multi-dimensional view of the data can be thought of as a data cube. There are several variations of OLAP including: ROLAP (Relational OLAP) which uses SQL and view the database as a traditional relational database. MOLAP (Multi-dimensional OLAP) creates a data store where the data is viewed in its multi-dimensional format. Some less common tools are DOLAP, which builds OLAP into the query language, and HOLAP, which offers both multi-dimensional as well as relation query language access to the data. LO: 9.4: Name and briefly describe the three levels in a data warehouse architecture. Difficulty: Difficult Classification: Synthesis AACSB: Reflective Thinking 136) Explain what the star schema is. Answer: The star schema is a simple database design. Dimensional (or descriptive) data are separated from fact or event data. The star schema consists of a fact table, which contains event data as well as links to dimensions. Dimension tables contain descriptive information. For example, we might have a dimension for sales territory which would have a unique ID as well as several levels of granularity. There might be a territory description, state, city, zip code, etc. The dimension key is also in the fact table. In this way, it is fairly simple to look up information and do some drill-down type queries. LO: 9.5: Describe the two major components of a star schema. Difficulty: Easy Classification: Synthesis AACSB: Reflective Thinking

5 Copyright © 2019 Pearson Education, Inc.

137) What is the surrogate key rule for the star schema, and what are the main reasons for this rule? Answer: Each and every key used to join the fact table with a dimension table should be a surrogate key (meaning system-assigned) and not a key which uses a business value. The reasons that a surrogate key is required are the following: 1. Slowly, over time, business keys change. A surrogate key allows us to handle changing business keys easily. 2. Surrogate keys are shorter and simpler. 3. Surrogate keys can be the same length and format, regardless of the business dimension. LO: 9.5: Describe the two major components of a star schema. Difficulty: Difficult Classification: Synthesis AACSB: Reflective Thinking 138) Explain a situation where multiple fact tables may be necessary. Answer: Multiple fact tables are useful in situations where different groups of users may need different sets of facts. One example would be a fact table to keep track of when a product is manufactured and another to keep track of when it is sold to a wholesaler. In this case, we may have two shared dimensions: date and product. These are called conformed dimensions. LO: 9.7: Design a data mart using various schemes to normalize and denormalize dimensions and to account for fact history, hierarchical relationships between dimensions, and changing dimension attribute values. Difficulty: Difficult Classification: Synthesis AACSB: Reflective Thinking 139) When would it be appropriate to use factless fact tables? Answer: There are some situations where we would want to just keep keys to dimensions but no facts. One case would be a data warehouse to keep track of events, such as attendance in a course. Another case would be coverage, taking inventory of the set of possible occurrences. LO: 9.7: Design a data mart using various schemes to normalize and denormalize dimensions and to account for fact history, hierarchical relationships between dimensions, and changing dimension attribute values. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking

6 Copyright © 2019 Pearson Education, Inc.

140) What are some approaches to handling slowly changing dimensions? Answer: Dimensional data changes slowly over time. There are three possible ways to handle slowly changing dimensions: 1. Overwrite the current value with the new value. This is not an acceptable solution, since it eliminates the description of the past which might be needed to interpret some data. 2. If we know how many changes there will be, make a multi-valued attribute which will contain old values as dimensions change. This might make queries fairly complex, though. 3. Create a new dimensional table row each time the dimensional object changes. The surrogate key will contain the date. LO: 9.7: Design a data mart using various schemes to normalize and denormalize dimensions and to account for fact history, hierarchical relationships between dimensions, and changing dimension attribute values. Difficulty: Difficult Classification: Synthesis AACSB: Reflective Thinking 141) What are some of the approaches to data integration? Answer: Data integration creates a unified view of business data. There are three approaches to data integration. The first approach, which is widely used in data warehousing, is consolidation. In this approach, data are loaded and transformed into a dataset useful for reporting or decision support. Another approach, data federation, provides a virtual view of integrated data without actually bringing the data into one database. The third approach, data propagation, duplicates data across databases. LO: 9.10: Describe the three types of data integration approaches. Difficulty: Moderate Classification: Synthesis AACSB: Analytical Thinking 142) Discuss the ETL process. Answer: The goal of the ETL process is data reconciliation. This data reconciliation occurs in two stages: when the data is initially loaded and when updates occur. There are five steps to the process of reconciliation (called the ETL process): 1. Mapping and metadata management–here the data needed for the warehouse are mapped back to the data in the operational system. 2. Extract–in this step, we capture any relevant data from the source system. 3. Cleanse–in this step, the data are cleansed so that things such as misspelled names, duplicate data, etc. are removed. 4. Load and Index–here the cleansed data are loaded into the data warehouse and appropriate indexes are created. LO: 9.11: Describe the four steps and activities of the Extract, Transform, and Load (ETL) process for data integration for a data warehouse. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking

7 Copyright © 2019 Pearson Education, Inc.

143) What are the characteristics of data after ETL? Answer: The goal of ETL is to provide a single, authoritative source for data that support decision making. Ideally, the data layer will have the following characteristics: 1. Detailed, providing maximum flexibility 2. Historical 3. Normalized–fully normalized to third normal form 4. Comprehensive 5. Timely 6. Quality controlled LO: 9.11: Describe the four steps and activities of the Extract, Transform, and Load (ETL) process for data integration for a data warehouse. Difficulty: Moderate Classification: Synthesis AACSB: Information Technology 144) What is the difference between an incremental extract and a static extract? Answer: A static extract captures a snapshot of source data at a specific point in time. An incremental extract captures only the changes in the source data that have occurred since the last snapshot. LO: 9.11: Describe the four steps and activities of the Extract, Transform, and Load (ETL) process for data integration for a data warehouse. Difficulty: Moderate Classification: Synthesis AACSB: Information Technology 145) What are some of the errors and inconsistencies in data that the cleansing process is looking to rectify? Answer: There are many errors and inconsistencies in data which the cleansing process needs to take care of. Some of these are: - misspelled names and addresses - old formats for addresses - impossible or invalid dates of birth - fields used for purposes not intended - missing data - mismatched addresses - duplicate data - different primary keys across sources LO: 9.11: Describe the four steps and activities of the Extract, Transform, and Load (ETL) process for data integration for a data warehouse. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking

8 Copyright © 2019 Pearson Education, Inc.

146) In the load and index stage of ETL, what is the difference between refresh mode and update mode? Answer: Refresh mode is an approach to filling a data warehouse that involves bulk rewriting of the target data at periodic intervals. Initially, the warehouse is filled. Then, at periodic intervals, the warehouse is rewritten, replacing the previous contents. Update mode, on the other hand, only writes changes to the source data without overwriting old data. LO: 9.11: Describe the four steps and activities of the Extract, Transform, and Load (ETL) process for data integration for a data warehouse. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 147) Discuss data governance and what needs to be included in a data governance program. Answer: Data governance is a set of processes and procedures aimed at managing the data within an organization with an eye towards availability, integrity, and compliance with regulations. Data governance oversees security by providing data access policies. It also provides a mandate for dealing with data issues. A data governance program generally includes the following: 1. Sponsorship from senior management and business units 2. A data steward manager 3. Data stewards 4. A governance committee LO: 9.12: Explain the various forms of data transformations needed to prepare data for a data warehouse. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking 148) What is data transformation? Answer: Data transformation involves converting data in the format of the operational system into the format of the data warehouse. Data is accepted from the data capture component (after data scrubbing), is mapped to the format of the reconciled data layer, and then passed to the load and index component. LO: 9.12: Explain the various forms of data transformations needed to prepare data for a data warehouse. Difficulty: Moderate Classification: Synthesis AACSB: Reflective Thinking

9 Copyright © 2019 Pearson Education, Inc.

149) What are the record-level data transformation functions? Answer: Record-level data transformation functions operate on a set of records, such as a file or a table. The most important record level functions are: 1. Selection–this is the process of partitioning data according to some pre-defined conditions 2. Joining–combines the data from several tables into a single view 3. Normalization–decomposing relations with anomalies 4. Aggregation–process of transferring data from a detailed level to a summary level LO: 9.12: Explain the various forms of data transformations needed to prepare data for a data warehouse. Di...