Chapter 2 Business Problems and Data Science Solutions PDF

Title	Chapter 2 Business Problems and Data Science Solutions
Author	Eric Hutchinson
Course	Mba Internship
Institution	Clemson University
Pages	5
File Size	177.7 KB
File Type	PDF
Total Downloads	78
Total Views	139

Description

10/12/2018

Data Science Solutions to Business Problems by Provost and Fawcett - Noteshelf

Data Science Solutions to Business Problems by Provost and Fawcett
Published on September 11, 2017
Author: Kristoffer (http://noteshelf.org/author/noteshelf/)

This note is part of a collection of notes from the book “Data Science for Business” by Provost and Fawcett (http://noteshelf.org/data-science-business-provost-fawcett/).

A critical skill in data science is the ability to decompose a data analytics problem into pieces such that each piece matches a known task for which tools are available.

The following is a list of the most fundamental tasks described by Provost and Fawcett.

1. Classification and class probability estimation: Attempts to predict, for each individual in a population, which of a set of classes that individual belongs to. It predicts whether something will happen. Example: Among all the customers of a company, which are likely to respond to a given offer?

2. Regression: Attempts to estimate or predict, for each individual, the numerical value of some variable. It predicts how much something will happen. Example: How much will a given customer use the service?

3. Similarity matching: Attempts to identify similar individuals in a population based on the data known about them. Example: Product recommendation (finding people who are similar to you in terms of the products they have liked or purchased).

4. Clustering: Attempts to group individuals in a population together by their similarity. It looks at similarities between objects based on the objects’ attributes. Example: Do our customers form natural groups or segments?

5. Co-occurrence grouping: Attempts to find associations between entities based on transactions involving them. It looks at the similarity of objects based on their appearing together. Example: What items are commonly purchased together?

6. Profiling: Attempts to characterize the typical behavior of an individual, group, or population. Example: What is the typical cell phone usage of this customer segment?

7. Link prediction: Attempts to predict connections between data items, and can also estimate the strength of a link. Example: Social media suggestions: “Since you and Karen share 10 friends, maybe you’d like to be Karen’s friend?”

8. Data reduction: Attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set. Example: Converting a massive data set on consumer movie-viewing to a smaller data set of consumer preferences that are latent in the data.

9. Causal modeling: Attempts to understand how events influence each other. Example: Which of our different marketing actions led to the increase in sales?
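Two of the tasks in the list, similarity matching (3) and co-occurrence grouping (5), can be sketched on the same toy purchase data. This is only an illustration and not a method taken from the book: the customer names, the baskets, and the helper functions `jaccard`, `most_similar`, and `pair_counts` are all hypothetical, and Jaccard similarity is just one of many possible similarity measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each customer maps to the set of items they bought.
baskets = {
    "ann": {"bread", "butter", "jam"},
    "bob": {"bread", "butter"},
    "cat": {"beer", "chips"},
    "dan": {"bread", "butter", "eggs", "milk"},
}

def jaccard(a, b):
    """Similarity of two item sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(customer, baskets):
    """Similarity matching: the other customer whose basket is closest."""
    others = (name for name in baskets if name != customer)
    return max(others, key=lambda name: jaccard(baskets[customer], baskets[name]))

def pair_counts(baskets):
    """Co-occurrence grouping: how often each pair of items is bought together."""
    counts = Counter()
    for items in baskets.values():
        # Sort so that each pair is always counted under the same key.
        counts.update(combinations(sorted(items), 2))
    return counts

print(most_similar("bob", baskets))
print(pair_counts(baskets).most_common(1))
```

Similarity matching compares individuals (customers), while co-occurrence grouping compares the items that appear together in their transactions. The same data supports both questions.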

Unsupervised vs. supervised data mining problems

Unsupervised is when there is no specific target defined. Subclasses: typically clustering, co-occurrence grouping, and profiling. Example: “Do our customers naturally fall into different groups?”

Supervised is when a specific target is defined. Subclasses: Classification and regression. Example: “Can we find groups of customers who have particularly high likelihoods of canceling their service soon after their contracts expire?”
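The difference can be made concrete with a small sketch in which the same feature is used once with a target and once without. Everything here is an assumption made for illustration: the usage figures, the churn labels, and the two helpers `fit_threshold` (a one-feature cut-off rule) and `two_means` (a one-dimensional k-means with two groups).

```python
# Hypothetical data: monthly minutes of use for six customers.
usage = [5, 7, 6, 40, 42, 45]
# The target, available only in the supervised case: 1 = cancelled the service.
churned = [1, 1, 1, 0, 0, 0]

def fit_threshold(xs, ys):
    """Supervised: choose the cut-off that best predicts the target (churn if x < cut)."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def accuracy(cut):
        return sum((x < cut) == bool(y) for x, y in zip(xs, ys)) / len(xs)

    return max(candidates, key=accuracy)

def two_means(xs, rounds=10):
    """Unsupervised: split values into two groups with 1-D k-means; no target is used.

    Assumes xs contains at least two distinct values.
    """
    low, high = min(xs), max(xs)
    for _ in range(rounds):
        groups = [0 if abs(x - low) <= abs(x - high) else 1 for x in xs]
        lows = [x for x, g in zip(xs, groups) if g == 0]
        highs = [x for x, g in zip(xs, groups) if g == 1]
        low, high = sum(lows) / len(lows), sum(highs) / len(highs)
    return groups

print(fit_threshold(usage, churned))  # needs the churned labels
print(two_means(usage))               # uses only the usage values
```

The supervised function cannot run without `churned`, while the unsupervised one never sees it. Whether the groups it finds have anything to do with churn is a separate question.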

Distinction between data mining and results

It is important to distinguish between (1) mining data to find patterns and (2) using the results of data mining.

The CRISP data mining process

The process diagram shows that it is very often necessary to go through the process more than once before you can solve the problem.

1. Business Understanding: Think carefully about the problem to be solved and the use scenario.

2. Data Understanding: Estimate the costs and benefits of each data source. Keep in mind that your data might have been collected for a different purpose.

3. Data Preparation: For example, converting data to tabular format, removing or inferring missing values, and converting data to different types.

4. Modeling: Pattern capturing, that is, finding regularities in the data. This is where data mining techniques are applied to the data.

5. Evaluation: Assess the data mining results rigorously to gain confidence that they are valid and reliable before moving on.

6. Deployment: Implement a predictive model in some information system or business process.

Note that it is not necessary to fail in deployment to start the cycle again. The Evaluation stage may reveal that the results are not good enough to deploy, and that we need to adjust the problem definition or get different data.
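The Evaluation stage and its link back to the start of the cycle can be sketched as a holdout check followed by a simple gate. This is a minimal sketch under stated assumptions, not a procedure prescribed by CRISP or the book: the helpers `holdout_split`, `accuracy`, and `next_stage`, the `min_gain` threshold, and the toy data are all hypothetical, and the fixed rule stands in for a model produced in the Modeling stage.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and set aside a fraction that modeling never sees."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def accuracy(predict, rows):
    """Share of (feature, label) rows that the model labels correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def next_stage(model_acc, baseline_acc, min_gain=0.05):
    """Evaluation gate: deploy only if the model clearly beats the baseline."""
    if model_acc >= baseline_acc + min_gain:
        return "deployment"
    return "business understanding"

# Hypothetical data: (monthly minutes, churned) pairs.
rows = [(minutes, int(minutes < 20)) for minutes in range(1, 41)]
train, test = holdout_split(rows)

# Baseline: always predict the majority class seen in the training rows.
majority = round(sum(y for _, y in train) / len(train))
baseline_acc = accuracy(lambda x: majority, test)

# Stand-in for a fitted model: a fixed cut-off rule.
model_acc = accuracy(lambda x: int(x < 20), test)

print(next_stage(model_acc, baseline_acc))
```

Comparing against a baseline on rows held out from modeling is one common way to gain confidence in a result. A return value of "business understanding" corresponds to starting the cycle again without ever reaching deployment.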

5 groups of related analytical techniques in data science

1. Statistics: Provides us with summary knowledge (averages, sums, ...) that underlies analytics, and can be thought of as a component of the larger field of data science. In relation to data mining, hypothesis testing can help determine whether an observed pattern is likely to be a valid, general regularity as opposed to a chance occurrence in some particular data set.

2. Database querying: A query is a specific request for a subset of data or for statistics about data. A query tool could help us answer: “Who are the most profitable customers in the Northeast?” This activity differs fundamentally from data mining in that there is no discovery of patterns or models.

3. Data warehousing: Collects and coalesces data from across an enterprise, often from multiple transaction-processing systems.

4. Regression analysis: Involves estimating or predicting values for cases that are not in the analyzed data set.

5. Machine learning and data mining: Concerned with methods for improving the knowledge or performance of an intelligent agent over time, in response to the agent’s experience in the world. Involves analyzing data from the environment and making predictions about unknown quantities.

Other notes from the book “Data Science for Business” by Provost and Fawcett: http://noteshelf.org/data-science-business-provost-fawcett/

Source: Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking (https://www.amazon.com/gp/product/1449361323/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1449361323&linkCode=as2&tag=noteshelf0d-20&linkId=6185cc77bf5e3ec75a02d7fb77a315c5)
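The database querying question quoted above, “Who are the most profitable customers in the Northeast?”, can be answered with a plain query. The sketch below uses Python's built-in `sqlite3` module; the `customers` table, its columns, the rows, and the helper `top_customers` are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical in-memory table of customers with a region and a profit figure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, region TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Acme", "Northeast", 1200.0),
        ("Birch", "Northeast", 300.0),
        ("Cedar", "South", 5000.0),
        ("Dune", "Northeast", 2500.0),
    ],
)

def top_customers(conn, region, limit=2):
    """Return the most profitable customers in one region, highest profit first."""
    cur = conn.execute(
        "SELECT name, profit FROM customers "
        "WHERE region = ? ORDER BY profit DESC LIMIT ?",
        (region, limit),
    )
    return cur.fetchall()

print(top_customers(conn, "Northeast"))
```

The query returns exactly what was asked for and nothing more. No pattern or model is discovered, which is the distinction from data mining drawn above.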


http://noteshelf.org/9-tasks-in-big-data-analytics/
