
Title: Introduction to big data beginners guide
Course: Elective Course (Mata Kuliah Pilihan)
Institution: Universitas Padjadjaran
Pages: 15

Summary

This is an introduction to big data that helps us learn more about data, especially big data...



An Introduction to Big Data: A Beginner’s Guide

TABLE OF CONTENTS

Introduction
What is big data?
The characteristics of big data
Applications of big data
Real-life examples of big data implementation
Key big data & analytics terms you should know
How to build your career in data analytics
Get ready to launch your career in data analytics

INTRODUCTION

Data analytics is the “brain” behind some of the biggest and most successful brands of our time. From big tech giants such as Facebook, Google, Amazon, and Netflix, to entertainment conglomerates like Disney, to disruptors like Uber and Airbnb, enterprises are increasingly leveraging data analytics to drive innovation, business growth, and profitability.

However, it’s not just these big names that are making use of data analytics. The year 2017 marked a turning point: 53% of organizations across telecom, finance, education, and healthcare were found to be adopting data analytics, a sharp jump from 17% in 2015. Adoption has grown substantially since then, and today 67% of small businesses spend more than $10K annually on analytics tools and technologies.

As businesses grapple with more data than ever, they increasingly rely on data analytics to gain insights and make informed decisions. This is driving demand for skilled specialists who can help them crunch through big data, unlock its potential, and predict trends and failures.

This beginner’s handbook introduces the concept of big data, its characteristics, and its applications. We’ll also discuss how to get started with a career in big data and the courses you should pursue to move up the career ladder in this emerging field.

1 | www.simplilearn.com

WHAT IS BIG DATA?

Big data is an all-inclusive term for the enormous volumes of complex data sets that companies and governments generate in today’s digital environment. Typically measured in terabytes or petabytes, big data comes from three major sources: transactional data, machine data, and social data.

Big data analytics is the comprehensive, systematic analysis of big data. Organizations carry it out to uncover correlations, hidden patterns, and insights that let them make sound business decisions quickly and precisely. The more accurate insights that big data generates enable enterprises to:

- Reduce operating costs
- Increase customer retention
- Gain a competitive advantage
- Improve their overall business strategy


THE CHARACTERISTICS OF BIG DATA

It is important to discuss the characteristics of big data because not all data is big data. So, what type of data constitutes ‘Big Data’? Big data is commonly defined using the 5Vs:

Volume: The amount of data created and collected.

Variability: The inconsistencies that data sets sometimes exhibit.

Velocity: The rate at which data is produced.

Veracity: Whether the data and its source are credible.

Variety: The range of data formats, such as sensor data, text data, video data, or numeric data.

These characteristics play a crucial role in quickly unlocking the value of data through big data analytics.
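The following minimal Python sketch profiles a small batch of records to make three of these Vs concrete. The record layout (the `timestamp`, `format`, and `payload` keys) and the function name `profile_batch` are hypothetical and chosen only for illustration. Veracity and variability usually need domain-specific checks, so the sketch leaves them out.

```python
from datetime import datetime


def profile_batch(records):
    """Rough Volume, Velocity, and Variety figures for a non-empty batch.

    Each record is a dict with hypothetical keys:
    'timestamp' (datetime), 'format' (str), and 'payload' (bytes).
    """
    # Volume: total payload size in bytes.
    volume_bytes = sum(len(r["payload"]) for r in records)
    # Velocity: records per second over the observed time window.
    times = sorted(r["timestamp"] for r in records)
    span_seconds = (times[-1] - times[0]).total_seconds()
    velocity = len(records) / span_seconds if span_seconds > 0 else float("inf")
    # Variety: number of distinct data formats in the batch.
    variety = len({r["format"] for r in records})
    return {"volume_bytes": volume_bytes, "velocity_per_s": velocity, "variety": variety}


# Made-up sample records for illustration only.
records = [
    {"timestamp": datetime(2024, 1, 1, 0, 0, 0), "format": "sensor", "payload": b"21.5"},
    {"timestamp": datetime(2024, 1, 1, 0, 0, 5), "format": "text", "payload": b"hello world"},
    {"timestamp": datetime(2024, 1, 1, 0, 0, 10), "format": "sensor", "payload": b"21.7"},
]
print(profile_batch(records))
```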


APPLICATIONS OF BIG DATA

This section of the big data handbook gives you a glimpse of how big data is transforming key industries, driving competitiveness and performance.

Retail

Leading online retail platforms deploy big data analytics throughout the customer’s purchase journey to predict trends, forecast demand, optimize pricing, and identify customer behavior patterns. Big data analytics helps retailers implement clear strategies that minimize risk and maximize profit.
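Demand forecasting can start very simply. The sketch below assumes a plain list of daily unit sales, with hypothetical numbers, and forecasts the next day as the moving average of the last few days. Production retail systems typically use far richer models, so this only illustrates the idea.

```python
def moving_average_forecast(daily_units, window=3):
    """Forecast the next period's demand as the mean of the last `window` periods."""
    if len(daily_units) < window:
        raise ValueError("need at least `window` observations")
    recent = daily_units[-window:]
    return sum(recent) / window


sales = [120, 135, 128, 150, 142, 160]  # hypothetical daily unit sales
print(moving_average_forecast(sales))  # (150 + 142 + 160) / 3
```

A larger `window` smooths out day-to-day noise but reacts more slowly to real shifts in demand.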

Healthcare

Big data is revolutionizing the healthcare industry, especially the way medical professionals diagnose and treat diseases. Machine learning algorithms that analyze and process big data now make it much easier to evaluate and integrate complex clinical data. This helps healthcare workers detect early warning signs and symptoms, which can prevent deaths and improve quality of life.

Financial Services and Insurance

The growing ability to analyze and process big data is dramatically reshaping the financial services, banking, and insurance landscape. Beyond using big data to detect fraudulent transactions quickly, lower risk, and supercharge marketing efforts, some companies are taking these applications further. Insurers such as Aviva and Progressive offer vehicle owners discounts on insurance premiums in exchange for monitoring and studying their driving activity via in-car devices or smartphone applications.


Manufacturing

Thanks to advances in robotics and automation, modern manufacturers are becoming increasingly data-focused. They are investing heavily in automated factories that use big data to streamline production and lower operational costs. Top global manufacturers are also integrating sensors into their products and capturing big data that provides valuable insights into product performance and usage.

Energy

The energy industry faces rising oil extraction costs and exploration difficulties caused by economic and political turmoil. To counter these pressures, it is turning to data-driven solutions to increase profitability. Big data is optimizing every stage of the process, from drilling and exploring new reserves to production and distribution, while cutting down energy waste.

Logistics & Transportation

State-of-the-art warehouses use digital cameras to capture stock-level data. When this data is fed into machine learning (ML) algorithms, it enables intelligent inventory management that predicts when restocking is required. In the transportation industry, leading companies collect and analyze vehicle telematics data, using big data to optimize routes, driving behavior, and maintenance.
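As a rough sketch, the restocking prediction described above can be approximated in two steps. First, estimate the average daily consumption from stock-level counts. Then project when stock will reach a reorder point. The function name, the reorder-point input, and the linear-consumption assumption are all illustrative; this is a simple projection, not a trained ML model.

```python
def days_until_restock(stock_levels, reorder_point):
    """Estimate days until stock falls to `reorder_point`.

    `stock_levels` holds one stock count per day (for example from
    camera-based counts), oldest first. The average daily drop is used
    as the consumption rate.
    """
    if len(stock_levels) < 2:
        raise ValueError("need at least two observations")
    daily_drops = [a - b for a, b in zip(stock_levels, stock_levels[1:])]
    rate = sum(daily_drops) / len(daily_drops)
    current = stock_levels[-1]
    if current <= reorder_point:
        return 0.0  # already at or below the reorder point
    if rate <= 0:
        return float("inf")  # stock is not being depleted
    return (current - reorder_point) / rate


levels = [500, 470, 445, 410, 380]  # hypothetical daily counts
print(days_until_restock(levels, reorder_point=200))  # 6.0
```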

Government

Cities worldwide are undergoing large-scale transformations to become “smart” by using data collected from Internet of Things (IoT) sensors. Governments are leveraging this big data to support good governance through efficient management of resources and assets. This increases urban mobility, improves solid waste management, and enables better delivery of public utility services.


REAL-LIFE EXAMPLES OF BIG DATA IMPLEMENTATION

Here are some real-life examples of how top brands use big data insights to make data-driven decisions.

Amazon Fresh and Whole Foods

The American multinational supermarket chain Whole Foods and Amazon Fresh, a subsidiary of the e-commerce company Amazon.com, are fantastic examples of how big data analytics promotes innovation and improves product development. Both leverage big data analytics to understand how users buy products and how sellers engage with suppliers. These business-critical insights help the organizations continually innovate personalized solutions.

Coca-Cola

In 2015, the US multinational beverage corporation Coca-Cola used big data analytics to develop a data-driven customer loyalty program that significantly helped the company retain its customers. Justin de Graaf, Director of Data Strategy and Precision Marketing at Coca-Cola, spoke about the program in an interview with ADMA Managing Editor Alicia Tan. He said the organization had succeeded in collecting critical “first-party” big data, which enabled it to strengthen customer engagement, improve retention, and increase consumption of both new and existing products.


Netflix

California-based global media services provider Netflix implemented big data analytics to enhance the experience of its 100 million subscribers with targeted advertising and recommendations based on their preferences. To achieve this, the company analyzes massive data sets to learn what its subscribers like, watch, and search for.

PepsiCo

Food, snack, and beverage corporation PepsiCo, Inc. relies heavily on big data to manage its supply chains efficiently. The company uses warehouse and point-of-sale (POS) inventory data to predict and reconcile shipments and manufacturing needs. Here’s what the Customer Supply Chain Analyst at PepsiCo says about the relevance of big data analytics in its supply chain management.
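Preference-based recommendations like the Netflix example above are often built with collaborative filtering. The sketch below is a tiny user-based version. It weights other users' ratings by their cosine similarity to the target user and suggests titles the target has not watched yet. This is a generic textbook technique applied to made-up ratings, not a description of how Netflix's system actually works.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(ratings, user, k=1):
    """Suggest up to k unseen title indices, scored by similar users' ratings.

    `ratings` maps user -> list of ratings per title (0 = not watched).
    """
    target = ratings[user]
    scores = [0.0] * len(target)
    for other, vec in ratings.items():
        if other == user:
            continue
        sim = cosine(target, vec)
        for i, r in enumerate(vec):
            scores[i] += sim * r  # similar users' ratings count for more
    unseen = [i for i, r in enumerate(target) if r == 0]
    return sorted(unseen, key=lambda i: scores[i], reverse=True)[:k]


ratings = {  # hypothetical users and ratings for four titles
    "ana": [5, 4, 0, 0],
    "ben": [5, 5, 4, 1],
    "cho": [1, 0, 2, 5],
}
print(recommend(ratings, "ana"))  # [2]
```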

UOB

Singaporean multinational bank United Overseas Bank (UOB) applied big data analytics to build a solid risk management strategy, which allowed it to cut the processing time for risk calculation. Risk calculation previously took approximately 18 hours; with big data analytics, the bank can now assess its risk in a few minutes.


KEY BIG DATA & ANALYTICS TERMS YOU SHOULD KNOW

In this section, we present some basic big data and analytics terms that you should be familiar with when working in this field.

Descriptive Analytics

A preliminary stage of data processing that interprets historical data to give a clearer understanding of what has happened and, often, prepares the data for further analysis.

Prescriptive Analytics

Builds on predictive analytics but goes a step further: it recommends actions and makes data-driven decisions based on the likely impact of each possible action.
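Here is a hedged illustration of the prescriptive step. The sketch assumes a predictive model has already estimated the revenue for each candidate action (the figures are hypothetical). It then recommends the action with the highest expected profit.

```python
def recommend_action(actions):
    """Pick the action with the highest expected profit.

    `actions` maps an action name to (predicted_revenue, cost); the
    predicted revenue is assumed to come from a predictive model.
    """
    return max(actions, key=lambda a: actions[a][0] - actions[a][1])


options = {  # hypothetical predictions and costs
    "no_discount": (10_000, 0),
    "discount_10pct": (12_500, 1_500),
    "discount_20pct": (14_000, 3_500),
}
print(recommend_action(options))  # discount_10pct
```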

Predictive Analytics

Analytics that processes recent and historical data to identify future probabilities and trends.

Geospatial Analytics

Analytics used to analyze data about physical objects tied to a geographical location. Examples of such data include GPS data, satellite photography, and historical data.
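A basic building block of geospatial analytics is the distance between two GPS coordinates. This sketch uses the haversine formula with a mean Earth radius of 6,371 km. Because the formula treats the Earth as a sphere, the results are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Bangalore to San Francisco, using approximate city-centre coordinates
print(round(haversine_km(12.97, 77.59, 37.77, -122.42)))
```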

Behavioral Analytics

Analytics that uses data about people’s behavior to understand their intent and predict their future actions.

Anomaly Detection

Also referred to as ‘Outlier Analysis,’ anomaly detection is a data mining step that identifies items or events in a data set that deviate from its expected pattern or behavior. Anomalies can indicate exceptions, exclusions, or contaminants and often deliver vital, actionable information.
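One simple way to flag outliers is the z-score: measure how many standard deviations each value lies from the mean. The sketch below applies it to hypothetical transaction amounts. It is one basic technique among many and not the only way to detect anomalies.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


amounts = [20, 22, 19, 21, 23, 20, 250, 22]  # hypothetical transaction amounts
print(zscore_anomalies(amounts, threshold=2.0))  # [6]
```

In a population of n values, no z-score can exceed the square root of n - 1, which is about 2.65 for the eight values here. The default threshold of 3 would therefore miss this outlier, which is why the example lowers it. For the same reason, robust variants based on the median are often preferred for small samples.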

Diagnostic Analytics

Analytics that supports root cause analysis, reviewing past performance to provide insights into what happened and why.
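A common first step in diagnostic analytics is to drill down by segment and find which segment drove a change. The sketch below uses hypothetical regional revenue for two quarters and returns the segment with the largest decline. It ignores segments that appear only in the later period.

```python
def biggest_contributor(before, after):
    """Return (segment, change) for the segment with the largest decline.

    `before` and `after` map segment name -> metric value (e.g. revenue
    by region). Segments missing from `after` count as 0 there.
    """
    changes = {seg: after.get(seg, 0) - before[seg] for seg in before}
    worst = min(changes, key=changes.get)
    return worst, changes[worst]


q1 = {"north": 400, "south": 350, "east": 300}  # hypothetical figures
q2 = {"north": 390, "south": 220, "east": 310}
print(biggest_contributor(q1, q2))  # ('south', -130)
```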


Anonymization

The act of making data anonymous by breaking the links between users in a database and their records, so that the source of the records cannot be identified.

Correlation Analysis

A technique for determining the statistical relationship between variables, often to identify predictive factors among them.
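Correlation analysis most often starts with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it directly from its definition: the covariance divided by the product of the standard deviations. On Python 3.10+, the standard library's `statistics.correlation` gives the same result. The sample figures are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


ad_spend = [1, 2, 3, 4, 5]  # hypothetical figures
sales = [2, 4, 5, 4, 5]
print(round(pearson(ad_spend, sales), 3))  # 0.775
```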

Batch Processing

A technique for processing massive data volumes in which a batch of transactions is collected over a period of time and then processed together. Hadoop is based on batch processing of data.

Cluster Computing

Computing that uses a ‘cluster’ of pooled resources from multiple servers.
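The batch-processing idea of collecting records and then processing them together fits in a few lines of plain Python. This is not Hadoop code. It splits a hypothetical list of transaction amounts into fixed-size batches and totals each batch. It uses the walrus operator, so it needs Python 3.8+; Python 3.12 also ships `itertools.batched` for the same purpose.

```python
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


transactions = [12.5, 3.0, 7.25, 10.0, 4.5]  # hypothetical amounts
totals = [sum(b) for b in batches(transactions, size=2)]
print(totals)  # [15.5, 17.25, 4.5]
```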

NoSQL

Database management systems designed to handle large volumes of unstructured data.

Bayes’ Theorem

One of the most important rules of probability theory used in data science and analytics.
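Bayes’ theorem states that P(A|B) = P(B|A) · P(A) / P(B). The worked example below applies it to a hypothetical fraud screen. It expands P(B) with the law of total probability, and all probabilities are made up for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).

    prior: P(A); likelihood: P(B|A); false_positive_rate: P(B|not A).
    P(B) is expanded with the law of total probability.
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical fraud screen: 1% of transactions are fraudulent, the screen
# flags 90% of fraud, and it wrongly flags 5% of legitimate transactions.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a screen that catches 90% of fraud, only about 15% of flagged transactions are actually fraudulent. The reason is that fraud is rare to begin with.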

Cassandra

A distributed, open-source NoSQL database management system designed to handle large volumes of data across distributed servers. It is managed by the Apache Software Foundation.

Classification Analysis

A systematic process for extracting important and relevant information about data and assigning it to a particular group or class.
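Nearest centroid is one of the simplest classifiers and a good way to illustrate classification. It averages each class's training examples, then assigns a new item to the class whose average is closest. The customer segments and features below are hypothetical, and `math.dist` requires Python 3.8+.

```python
import math


def nearest_centroid(train, point):
    """Assign `point` to the class whose training-set centroid is closest.

    `train` maps class label -> list of feature tuples.
    """
    def centroid(rows):
        return tuple(sum(col) / len(rows) for col in zip(*rows))

    centroids = {label: centroid(rows) for label, rows in train.items()}
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Hypothetical customers described by (orders per month, average basket in $)
train = {
    "occasional": [(1, 20), (2, 25), (1, 30)],
    "loyal": [(8, 60), (10, 55), (9, 70)],
}
print(nearest_centroid(train, (7, 50)))  # loyal
```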

Clustering Analysis

A means of recognizing similar items and grouping them into clusters in order to spot the differences as well as the similarities within the data.
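Clustering is often introduced through k-means, which alternates between two steps. It assigns each point to its nearest centroid, then moves each centroid to the mean of its points. The sketch below assumes the caller supplies the starting centroids (here, two hypothetical guesses) rather than choosing them randomly, which keeps the result deterministic.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


def kmeans(points, centroids, iterations=10):
    """Plain k-means from given starting centroids; returns (centroids, labels)."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Move the centroid to its members' mean; keep it if the cluster is empty.
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else c)
        if new == centroids:
            break  # converged
        centroids = new
    return centroids, assign(points, centroids)


pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
cents, labels = kmeans(pts, centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```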


HOW TO BUILD YOUR CAREER IN DATA ANALYTICS

If you are looking to carve out a career path in data analysis, there are many data analytics skills to master and relevant tools to become familiar with. Let’s look at some of them.

Programming

R and Python are two common programming languages you should be familiar with when taking on data analyst roles. R is built for statistical computing and graphics, while Python’s ease of use makes it a good choice for large projects. Other useful languages include SAS, Java, MATLAB, SQL, Scala, and Julia, along with libraries such as TensorFlow.

Math and Statistics

When it comes to data, math and statistics are bound to be on your list. You need many mathematical and statistical skills to succeed as a data analyst. These include forming data sets; a basic knowledge of the mean, median, mode, standard deviation (SD), and other statistics; advanced knowledge of linear algebra and matrices; relational algebra; the CAP theorem; framing data; and series.
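Python’s standard `statistics` module covers the basic measures listed above. The sketch uses hypothetical daily order counts. Note that `stdev` computes the sample standard deviation, while `pstdev` computes the population version.

```python
import statistics

orders = [3, 7, 7, 2, 9, 7, 4]  # hypothetical daily order counts

print(statistics.mean(orders))    # 39 / 7, about 5.571
print(statistics.median(orders))  # 7 (middle of the sorted values)
print(statistics.mode(orders))    # 7 (most frequent value)
print(statistics.stdev(orders))   # sample standard deviation (n - 1 denominator)
```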

Data Processing Platforms

Data analysts often need to use big data processing platforms such as Hadoop and Apache Spark to crunch large data sets. Knowledge of these frameworks is necessary to gather data from multiple devices and to scrub, model, and interpret the data sets for deeper insight into trends and relationships.
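Hadoop’s classic processing model is MapReduce. A map step turns each input record into key/value counts, and a reduce step merges those counts by key. The sketch below imitates that flow on a single machine using only the standard library; it is not the Hadoop API. On a real cluster, the framework runs map tasks in parallel across nodes and shuffles their output by key before reducing.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map: count the words in one input line (a per-line partial count)."""
    return Counter(line.lower().split())


def reduce_phase(a, b):
    """Reduce: merge two partial word counts."""
    return a + b


lines = ["big data needs big tools", "data tools scale"]  # hypothetical input
counts = reduce(reduce_phase, map(map_phase, lines), Counter())
print(counts["big"], counts["data"], counts["tools"])  # 2 2 2
```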


Visualization

The insights derived from data analysis amount to nothing if they are not presented clearly and in a way that stakeholders understand. Working knowledge of Tableau, one of the most widely used data visualization tools, is a great skill for a data analyst to have.

Machine Learning

The heart of any large-scale data analysis lies in automation. Machine learning (ML) enables computers to learn patterns from data and perform tasks without being explicitly programmed for each one. Data analysts should know how to choose, train, and apply the most appropriate models and algorithms to data sets to solve specific problems.

Apart from these skills, a questioning mind and a genuine interest in working with data, numbers, and technology will take you further in this field. Hiring companies also prize the ability to work independently and as part of a team, along with a good understanding of visualization tools such as ggplot, matplotlib, d3.js, and seaborn.
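Returning to the Machine Learning skill above, the sketch below shows a small instance of training a model and then applying it. It fits a straight line by ordinary least squares, the same idea behind simple predictive analytics, and uses the line to predict the next month. The sign-up figures are hypothetical. Real projects would normally use a library such as scikit-learn rather than hand-written code.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Train on hypothetical monthly sign-ups, then apply the model to month 6.
months = [1, 2, 3, 4, 5]
signups = [110, 125, 138, 152, 165]
slope, intercept = fit_line(months, signups)
print(round(slope * 6 + intercept, 1))  # 179.1
```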

GET READY TO LAUNCH YOUR CAREER IN DATA ANALYTICS

As businesses race to deploy big data analytics, demand is rising for Database Developers, Data Analysts, Data Scientists, Big Data Engineers, Database Administrators, and Data Modelers. To land a dream job in this domain, a bachelor’s degree in information management, mathematics, computer science, or statistics is a strong foundation, but it is not sufficient on its own. A specialized certification gives an aspiring candidate’s resume an edge by showcasing highly sought-after data analytics skills.

This big data handbook recommends Simplilearn’s Big Data Hadoop Certification Training Course, which helps learners like you become industry-ready. Data science specialists and industry experts designed the course to help you build a strong portfolio of big data skills. These include Spark SQL, Spark RDD optimization techniques, parallel processing, functional programming, and real-time data processing, to name a few. Aligned to Cloudera’s CCA175 exam, the course offers ten hours of self-paced video, 48 hours of instructor-led training, and four real-life, industry-based projects using the Big Data stack, Hive, and Hadoop.

Register now to boost your career opportunities in big data analytics.

Other related courses we offer:
- Big Data Hadoop Certification Training Course
- MongoDB Certification Training Course
- Apache Scala and Spark Certification Training


INDIA
Simplilearn Solutions Pvt Ltd.
# 53/1 C, Manoj Arcade, 24th Main, Harlkunte
2nd Sector, HSR Layout
Bangalore - 560102
Call us at: 1800-212-7688

USA
Simplilearn Americas, Inc.
201 Spear Street, Suite 1100
San Francisco, CA 94105
United States
Phone No: +1-844-532-7688

www.simplilearn.com...

