Big Data Chapter 1 Questions with Answers PDF

Title Big Data Chapter 1 Questions with Answers
Author Daniel Sabrel
Course Big Data Technologies (Elective II)
Institution Tribhuvan Vishwavidyalaya
Pages 3
File Size 100.7 KB
File Type PDF

Summary

Big Data Chapter One: Introduction to Big Data, Solutions to IoE Old questions...


Description

Chapter 1: Introduction to Big Data

1. What are the attributes of Big Data?
Volume - size
Velocity - speed of creation and processing
Variety - types of content
Variability - areas of use
Veracity - accuracy
Visualization & Value - presentation and some positive output

2. Define DFS. Why is distributed computing necessary for big data? [5] 2075
OR Explain, with an example, the distributed system in Big Data. [8] 2074
OR Explain the role of distributed computing in Big Data. [5] B2073
- A distributed system is a system composed of autonomous computers, connected through a network and middleware for coordinating and sharing resources, in such a way that the whole system appears to the user as a single integrated computing facility.
- Big data consists of a massive amount of data that cannot be stored on a single computer or node.
- So, big data must be distributed across multiple nodes.
- A distributed system helps solve big data problems without requiring a single resource capable of handling them.
- This makes big data analytics cost-efficient and improves performance.
- Distributed computing helps big data analytics with:
▪ Time constraints
▪ Response time
▪ Data availability
▪ Scalability
▪ System performance

3. Clock synchronization in DFS may be a big challenge. How can this clock synchronization problem be solved? [10] 2075

4. What is the role of a Data Scientist? [4] 2074
i. Data capture and interpretation: handle the raw data using the latest technologies and present the acquired knowledge in an understandable manner.
ii. New analytical techniques: make discoveries about business processes.
iii. Community of Science: communicate the results and suggest implications for new business direction.
iv. Provide solutions: develop data analysis solutions using modeling/analysis methods and languages.
v. Improve data management and performance.

5. Why do we need the data analytics process?
[5] B2073
We need the data analytics process to:
i. Avoid sampling and aggregation
ii. Reduce data movement and replication
iii. Bring the analytics as close as possible to the data
iv. Optimize computation speed
Example: Reducing Patient Readmission Rates (medical data), Analytics to Reduce the Student Dropout Rate (educational data)
The process consists of:
Phase 1 - Discovery Phase:
a) Acquisition – acquiring the data and the acquisition mechanism
b) Pre-processing – structuring the data into a consistent, standard, trusted format
c) Integration – consolidate relevant data, eliminate redundancy, clean and distill
d) Analysis – search for relationships, classification, association, descriptions, predictions, interpretations
e) Interpretation – review to understand the result, retrace, provide a foundation and check trustworthiness
f) Algorithm/model – classification, regression, segmentation, association, sequence analysis algorithms
Phase 2 - Application:
- The algorithmic model generated as a result of data analytics is used in the real domain.

6. What are the current trends in big data analytics? What are the technical challenges and characteristics of big data? [10] 2073
Big data analytics is the process of analysing a huge volume and variety of data so as to discover hidden patterns and other essential information that can be used in the decision-making process.
---------- remaining ----------
Technical Challenges:
i. Big data consists of huge data sets. The main challenge lies in identifying the relevant data within such a mass of data and determining how to make the best use of it.
ii. Even when the data and the analysis method have been determined, it is a struggle to find skilled manpower capable of working with both the new technology and data analysis to produce relevant business insight.
iii. The variety of data types and formats may hinder data analysis, as it is very difficult to connect a variety of data points into a single insight.
Data integration is an important aspect of effective big data analysis, but it also remains one of the major challenges.
iv. The technology landscape in the data world is evolving very fast. So, handling the technology efficiently and adapting to keep up with it is a must for big data analysis.
v. The organizational structure for big data project management should be separate from other project management tasks, because this field is very different and needs a strong and motivated project management team.
vi. During big data analysis, data analysts do not get the full benefit of the data they have, due to security concerns about data protection.
vii. The technology infrastructure necessary to work with big data is very expensive....
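
As a rough illustration of the distributed-computing answer to question 2 and of the goals listed under question 5 (reduce data movement, bring the analytics close to the data), the Python sketch below simulates nodes as in-memory lists. The function names (`partition`, `local_count`, `merge`), the sample records and the round-robin split are all assumptions made for illustration and are not part of the original notes. In a real cluster each partition would sit on a separate machine.

```python
from collections import Counter
from typing import Iterable


def partition(records: list[str], num_nodes: int) -> list[list[str]]:
    """Split the records round-robin across hypothetical nodes."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def local_count(node_records: Iterable[str]) -> Counter:
    """Runs 'on the node': count words using only that node's records."""
    counts: Counter = Counter()
    for line in node_records:
        counts.update(line.lower().split())
    return counts


def merge(partials: Iterable[Counter]) -> Counter:
    """Runs on the coordinator: combine the small per-node summaries."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Sample data (hypothetical), standing in for a data set too large for one node.
records = [
    "big data needs distributed storage",
    "distributed computing scales big data",
    "data moves less when analytics runs near data",
]

nodes = partition(records, num_nodes=2)            # data is spread over 2 nodes
partials = [local_count(node) for node in nodes]   # analytics runs next to the data
totals = merge(partials)                           # only the summaries are moved

print(totals["data"], totals["distributed"])
```

Only the per-node word counts travel to the coordinator, not the raw records, which is the point of items ii and iii under question 5. The merged result is the same as counting over the whole data set on one machine.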

