Data Warehouse & Mining Viva Questions PDF

Title	Data Warehouse & Mining Viva Questions
Author	Durvank Deorukhkar
Course	Data Warehousing & Mining
Institution	University of Mumbai
Pages	5
File Size	149 KB
File Type	PDF

Description

1. What is data warehousing? Data warehousing (DW) is the process of collecting and managing data from varied sources to provide meaningful business insights. A data warehouse is typically used to connect and analyze business data from heterogeneous sources.

2. What is a data warehouse? A data warehouse is an electronic store of an organization’s historical data, kept for the purpose of reporting, analysis, and data mining or knowledge discovery.

3. What is data purging? Data purging is the process of cleaning junk data out of a database, for example getting rid of columns or rows that hold only unnecessary NULL values. It is usually done when the database has grown too large.

4. What are the different problems that data mining can solve?
• Data mining helps analysts make faster business decisions, which increases revenue at lower cost.
• Data mining helps to understand, explore, and identify patterns in data.
• Data mining automates the process of finding predictive information in large databases.
• Data mining helps to identify previously hidden patterns.

5. What is a dimension table? A dimension table is a table in the star schema or snowflake schema of a data warehouse. It stores the attributes, or dimensions, that describe the objects in a fact table.

6. What is a fact table? A fact table is the central table in the star schema or snowflake schema of a data warehouse. It contains the measurements (facts) of business processes, together with foreign keys that reference the dimension tables.

7. What is data mining? Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis.

8. Difference between OLAP and OLTP
OLAP
• OLAP is an acronym for Online Analytical Processing.
• Consists of historical data from various databases.
• Has long transactions.
• Based on SELECT commands that aggregate data for reporting.
• Uses complex queries.

OLTP
• OLTP is an acronym for Online Transaction Processing.
• Consists only of current operational data.
• Has short transactions.
• Based on INSERT, UPDATE, and DELETE commands.
• Uses simpler queries.
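The contrast can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical; a real OLAP workload would normally run on a separate warehouse rather than on the operational database.

```python
import sqlite3

# Hypothetical operational table, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
)

# OLTP style: short transactions that insert or update a few rows each.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders VALUES (1, 'West', 120.0, 'new')")
    conn.execute("INSERT INTO orders VALUES (2, 'East', 80.0, 'new')")
    conn.execute("INSERT INTO orders VALUES (3, 'West', 50.0, 'new')")
with conn:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 1")

# OLAP style: a read-only SELECT that aggregates many rows for reporting.
totals_by_region = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals_by_region)
```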

9. What is ETL? ETL stands for Extract, Transform and Load. An ETL tool reads data from a specified data source and extracts a desired subset of it. Next, it transforms the data using rules and lookup tables, converting it to the desired state. Finally, the load function writes the resulting data to the target database.
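The three steps can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than a real ETL tool: the CSV text, the `COUNTRY_LOOKUP` table, the `sales` target table, and the rule of skipping rows with an unknown code or a missing amount.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read a file or a source system.
RAW_CSV = """id,country_code,amount
1,in,100.50
2,us,20
3,xx,5
4,in,
"""

# Hypothetical lookup table used by the transform step.
COUNTRY_LOOKUP = {"in": "India", "us": "United States"}


def extract(text):
    """Extract: read the source and return its rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply rules and the lookup table, dropping bad rows."""
    cleaned = []
    for row in rows:
        country = COUNTRY_LOOKUP.get(row["country_code"])
        if country is None or not row["amount"]:
            continue  # unknown country code or missing amount
        cleaned.append((int(row["id"]), country, float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows to the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT id, country, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```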

10. What is a data mart? A data mart is a subset of the data stored within the overall data warehouse, built for the needs of a specific team, section, or department within the business enterprise. Data marts make it much easier for individual departments to access key data insights quickly, and they help prevent departments within the organization from interfering with each other’s data.

11. What is the difference between a data warehouse and OLAP? A data warehouse is the place where the data is stored for analysis, whereas OLAP is the technology used to analyze that data and manage its aggregations.

12. What is Star Schema? A star schema is a data warehousing architecture model where one fact table references multiple dimension tables, which, when viewed as a diagram, looks like a star with the fact table in the center and the dimension tables radiating from it. It is the simplest among the data warehousing schemas and is currently in wide use.
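As an illustration, a tiny star schema can be built with Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table at the center: measurements plus foreign keys.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    REAL);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (10, 2023, 1), (11, 2023, 2);
INSERT INTO fact_sales VALUES (1, 10, 5, 10.0), (1, 11, 3, 6.0), (2, 10, 1, 25.0);
""")

# A typical query joins the fact table directly to one of its dimensions.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Each dimension is one join away from the fact table, which is what keeps star-schema queries simple.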

13. What is a snowflake schema? The snowflake schema is an extension of the star schema. The main difference is that in this architecture each dimension table can itself be linked to one or more other dimension tables. The aim is to normalize the data.

14. What is metadata? Metadata is defined as data about the data. It contains information such as the number of columns used, fixed or limited column widths, the ordering of fields, and the data types of the fields.

15. What is a decision tree algorithm? A decision tree is a supervised learning algorithm used for classification. It uses a flowchart-like tree structure to show the predictions that result from a series of feature-based splits. It starts at a root node and ends with a decision made at a leaf.
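A minimal sketch of the idea in plain Python follows. It assumes numeric features, threshold splits of the form "feature <= value", and Gini impurity as the split criterion; these are common choices, not the only ones. The (age, income) rows and yes/no labels are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) pair with the lowest weighted Gini."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively; a leaf is simply the majority class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root node down to a leaf."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical training data: (age, income) -> did the customer buy?
rows = [(25, 30), (30, 60), (45, 80), (50, 20), (35, 90), (22, 25)]
labels = ["no", "yes", "yes", "no", "yes", "no"]
tree = build_tree(rows, labels)
print(tree, predict(tree, (40, 70)))
```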

16. What is the Naïve Bayes algorithm? Naïve Bayes is a supervised learning algorithm that is based on Bayes' theorem and is used for solving classification problems. It is called "naïve" because it assumes the features are independent of each other given the class. It is mainly used in text classification, where the training dataset is high-dimensional. It is one of the simplest and most effective classification algorithms.
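A small text classifier shows how this works in practice. The sketch below assumes the multinomial variant with add-one (Laplace) smoothing and whitespace tokenization; the training sentences and the `spam`/`ham` labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids a zero probability for unseen pairs.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training set.
model = train([
    ("win free prize now", "spam"),
    ("free cash offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
])
print(classify(model, "free prize offer"), classify(model, "meeting tomorrow"))
```

Working in log space keeps the product of many small probabilities from underflowing.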

17. Explain the clustering algorithm. A clustering algorithm groups data items with similar characteristics into sets called clusters. These clusters help in exploring data and making faster decisions. The algorithm first identifies relationships in a dataset and then generates a series of clusters based on those relationships. The process of creating clusters is iterative: the algorithm repeatedly redefines the groupings to create clusters that better represent the data.
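The iterative regrouping can be sketched with k-means, which is one common clustering algorithm among several. The example assumes 2-D points, squared Euclidean distance, and hand-picked starting centroids; the points themselves are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # groupings no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```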

18. Explain the association algorithm in data mining. Association rule mining is a procedure for finding frequent patterns, correlations, associations, or causal structures in data sets held in various kinds of databases, such as relational databases, transactional databases, and other data repositories. Given a set of transactions, association rule mining aims to find rules that let us predict the occurrence of a specific item based on the occurrences of other items in the transaction.
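Rules of this kind are usually judged by their support and confidence. The sketch below computes both for a hand-made list of market-basket transactions; the items and the example rules are hypothetical, and a full miner such as Apriori would additionally search for the frequent itemsets rather than score given ones.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions with the antecedent, the share that also
    contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


print(support({"bread", "milk"}))        # 3 of 5 transactions
print(confidence({"bread"}, {"milk"}))   # rule bread -> milk
print(confidence({"butter"}, {"bread"})) # rule butter -> bread
```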

19. Differentiate star schema and snowflake schema
Star Schema
• Contains a fact table surrounded by dimension tables.
• Simple database design.
• High level of data redundancy.
• Denormalized data structure; queries also run faster.
• A single dimension table contains aggregated data.

Snowflake Schema
• One fact table surrounded by dimension tables, which are in turn surrounded by further dimension tables.
• Very complex database design.
• Very low level of data redundancy.
• Normalized data structure.
• Data is split into different dimension tables.
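The normalization on the snowflake side can be seen in a short sqlite3 sketch. The tables `dim_category`, `dim_product`, and `fact_sales` and their rows are hypothetical: the category name is stored once in its own table instead of being repeated on every product row, at the cost of one extra join per query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimension: category is split out of the product dimension.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL);

INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 4.0), (3, 25.0), (1, 6.0);
""")

# Reaching the category name now takes two joins instead of one.
revenue_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(revenue_by_category)
```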

20. What are the characteristics of a data warehouse?
• Subject oriented
  o A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations.
  o These subjects can be product, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on ongoing operations; rather, it focuses on the modelling and analysis of data for decision making.
• Integrated
  o A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc.
  o This integration enhances the effective analysis of data.
• Time variant
  o The data collected in a data warehouse is identified with a particular time period.
  o The data in a data warehouse provides information from a historical point of view.
• Non-volatile
  o Non-volatile means that previous data is not erased when new data is added.

21. What are typical data mining techniques?
  1. Classification: This method assigns data items to different classes. It is used to retrieve important and relevant information about data and metadata.
  2. Clustering: Clustering analysis is a data mining technique for identifying data items that are similar to each other. This process helps to understand the differences and similarities between the data.
  3. Regression: Regression analysis is the data mining method of identifying and analyzing the relationship between variables. It is used to estimate the likely value of a specific variable, given the presence of other variables.
  4. Association Rules: This data mining technique helps to find the association between two or more items. It discovers a hidden pattern in the data set....