Exam 13 2017, questions and answers

Author: TSU TUNG KU
Course: Data analysis
Institution: 元智大學 (Yuan Ze University)
Pages: 30



THIS DOCUMENT CONTAINS QUESTIONS THAT REPRESENT THE SORT OF QUESTIONS THAT MIGHT APPEAR ON THE FINAL QUIZ FOR DATA MINING FOR BUSINESS ANALYTICS (MANAGERIAL). This document includes answers for some of the questions. THESE ARE INTENDED TO REPRESENT THE FORMAT AND STYLE OF QUESTIONS, NOT NECESSARILY THE CONTENT. The first part contains questions that are specifically associated with particular chapters of the DS for Biz book. The second part then contains questions that span multiple chapters of the DS for Biz book. NB: On the Final Quiz, the questions will not be associated with particular chapters of the book.

Chapters 1 & 2 

data analytic thinking, supervised vs unsupervised, the data mining process

 

2.1 Multiple Choice
In the following, choose the single best answer.

1) (True/False) We can build unsupervised data mining models when we lack labels for the target variable in the training data.
2) (True/False) For supervised data mining, the value of the target variable is known when the model is used.
3) (True/False) Estimating the probability of a fraudulent transaction is an example of data mining.
4) (True/False) Finding the most profitable customer is an example of an unsupervised learning task.
5) (True/False) Finding the characteristics that differentiate my most profitable customers from my less profitable customers is an example of an unsupervised learning task.
6) (True/False) Choosing which customers are most likely to leave is an example of the use of DM results.

7) (True/False) Discovering patterns of thedefaultsonautoloansis not an example of the model in use.  8) Which is not  a reason why data mining technologiesareattractingsignificant attention nowadays? a) There is too much data for manual analysis b) Data are difficult to transfer from databases c) Data can be a resource for competitive advantage d) Machine learning algorithms are easily available  9) Regression is distinguished from classification by: a) class probability estimation b) numerical attributes c)  numerical target variable d) hypothesis testing   2.2 Short Answer In the following, give brief answers (at most 2 sentences per question).  10)  What  is  a  leak  in  predictive  modeling? Are leaks really a problem? Give a brief example. A leak is a situation where  a  variable  collected in historical data gives information on the  target  variable—information  that appears in historical data but is not actually available when thedecisionhas to bemade. Itwill make us overestimate the predictive performance of our models. An example is predicting whether a customer will be a big spender knowing the categories/numbers of items they have purchased.         


Chapter 3 

feature selection with entropy and information gain, tree induction

 

3.1 Multiple Choice
In the following, choose the single best answer:

1) (True/False) Induction reasons from general knowledge to specific facts.
2) Entropy
a) is a measure of information gain
b) is used to calculate information gain
c) is a measure of correlation between numeric variables
d) has a strong odor
3) (True/False) In a classification tree, a non-leaf node is referred to as a “decision node” because it allows us to give a class prediction.
4) (True/False) Tree-structured models cannot give us estimates of the probability of customer churn.
5) (True/False) In a classification tree, decision nodes can only ask questions about the attributes of the examples we want to classify.

3.2 Short Answer
In the following, give brief answers (at most 2 sentences per question).

6) What does it mean for one attribute to give information about another attribute? Give an example of how one would find an attribute that gives information about another attribute.
It means that the first attribute reduces the uncertainty about the second attribute. Example: old pirate (Book, p. 43-44).
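To make "reduces the uncertainty" concrete, here is a minimal sketch of entropy and information gain for one split. The function names and the churn data are hypothetical illustrations (this is not the book's old-pirate example); the gain is the parent entropy minus the size-weighted entropy of the child segments.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_lists):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    n = len(parent_labels)
    weighted = sum(len(child) / n * entropy(child) for child in child_label_lists)
    return entropy(parent_labels) - weighted

# Hypothetical data: 10 customers, 5 churn ("yes") and 5 stay ("no").
parent = ["yes"] * 5 + ["no"] * 5
# A hypothetical attribute splits them into two fairly pure segments.
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(entropy(parent))                                   # 1.0 (maximum uncertainty)
print(round(information_gain(parent, [left, right]), 3))  # 0.278
```

An attribute whose split leaves each segment as mixed as the parent would have a gain of 0: it tells us nothing about the target.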

 

Chapter 4 

linear discriminants, linear regression, logistic regression, SVMs

 

4.1 Multiple Choice
In the following, choose the single best answer:

1) (True/False) Support Vector Machines (SVMs) approach the classification problem by finding the widest possible bar that fits between points of two different classes.
2) Which of the following is not true about logistic regression:
a) Logistic regression can be used to predict the probability of membership in a certain class.
b) Logistic regression takes a categorical target variable in training data.
c) A logistic regression represents the odds of class membership as a linear function of the attributes.
d) Logistic regression requires numeric attributes, and categorical attributes should be converted to numeric attributes.
3) Which of the following does not describe SVM (support vector machine)?
a) SVMs are based on supervised learning
b) SVM chooses the line to minimize the margin between two classes
c) SVM can be applied when the data are not linearly separable

4.2 Short Answer
In the following, give brief answers (at most 2 sentences per question).

4) When we fit a parameterized numeric model to data, we find the optimal model parameters. What does this mean?
By optimal parameters we mean the values of the parameters that best fit the training data. "Best fit" is defined with respect to the objective function of our learning procedure; this translates to minimizing an error/loss/cost function (e.g. minimizing the number of misclassified data points, the mean-squared error, or the negative log-likelihood).
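As an illustration of "optimal parameters", the sketch below fits a one-attribute linear regression by minimizing the mean-squared error, one of the loss functions named in the answer. It assumes a single numeric attribute, for which the closed-form least-squares solution applies; the data points are hypothetical.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) minimizing the mean squared error of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mse(xs, ys, intercept, slope):
    """Mean squared error on the training data: the objective being minimized."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]          # hypothetical noisy observations
a, b = fit_least_squares(xs, ys)
print(a, b, mse(xs, ys, a, b))
# Moving away from the optimum raises the training loss:
print(mse(xs, ys, a, b + 0.1) > mse(xs, ys, a, b))  # True
```

Logistic regression works the same way in spirit but minimizes the negative log-likelihood, which has no closed form and is usually optimized iteratively.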

4.3 Matching
In the following, choose the best matching for each set; each letter should be used once.

__ Logistic regression        a. numerical target variable not bounded
__ Support Vector Machines    b. decision nodes
__ Linear Regression          c. log odds
__ Classification Trees       d. widest margin

Answer: c d a b




Chapter 5 

cross-validation, overfitting

 

5.1 Multiple Choice
In the following, choose the single best answer:

1) (True/False) Cross-validation is used to estimate generalization performance.
2) (True/False) Adding more complexity to a model will generally increase its performance on the training set.
3) (True/False) Complex models generally give better generalization performance than simple models.
4) A fitting curve plots:
a) True positive rate vs. false positive rate
b) True positive rate vs. false negative rate
c) Generalization performance vs. size of training set
d) Generalization performance vs. model complexity
5) Which is not a technique for reducing/avoiding overfitting in tree induction?
a) choose largest improvement in information gain
b) stop growing tree based on the number of training examples at a leaf
c) select tree size based on validation data
d) reduce tree size by cutting off branches and replacing them with leaves
6) Which is not a benefit of using cross-validation for model induction evaluation?
a) It provides an estimate of generalization performance
b) It provides statistics on estimated performance, so that we can understand how performance will vary across data sets
c) It’s quick to compute relative to other holdout methods
d) It makes better use of limited data by using all data for both training and testing
7) Learning curves
a) Are used to select an optimal parameter complexity
b) Are equivalent to fitting curves
c) Plot true positive rate vs false positive rate
d) Can illustrate whether obtaining more data would be a good investment
e) Are shown for a given amount of training data

8) More complex models
a) have better predictive performance
b) tend to overfit more
c) are easier to train than simpler models
d) are very interpretable

5.2 Short Answer

1) Using a linear model that perfectly separates a set of data points with two labels is not always a good idea. Why is that? Give an example.
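Cross-validation comes up in 5.1 Q1 and Q6. The sketch below shows one common way to do k-fold cross-validation: every example is held out exactly once, and the k held-out scores give both an estimate and a spread. The `fit`/`score` callables and the majority-class "model" are hypothetical placeholders, not a specific library's API.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n examples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # shuffle once so folds are random
    folds = [idx[i::k] for i in range(k)]     # k disjoint, roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_validate(xs, ys, fit, score, k=5):
    """Return the k held-out scores; their mean and spread estimate generalization."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        scores.append(score(model, [xs[j] for j in test], [ys[j] for j in test]))
    return scores

# Usage with a deliberately trivial model that always predicts the training majority class.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)

def accuracy(model, test_x, test_y):
    return sum(y == model for y in test_y) / len(test_y)

xs = list(range(20))
ys = [1] * 15 + [0] * 5                       # hypothetical labels
scores = cross_validate(xs, ys, fit_majority, accuracy, k=5)
print(sum(scores) / len(scores), min(scores), max(scores))
```

The min/max across folds is the "statistics on estimated performance" benefit mentioned in Q6 b).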


Chapter 6 

similarity, neighbours, clusters

 



6.1 Multiple Choice
In the following, choose the single best answer:

1) (True/False) Evaluation is more difficult for unsupervised data mining than for supervised data mining.
2) (True/False) When using clustering, a target variable does not have to be precisely defined at training time.
3) (True/False) kNN techniques are computationally efficient in the “use” phase of predictive modeling.
4) (True/False) In the use phase, k-means classifies new instances by finding the k most similar training instances and applying a combination function to the known values of their target variables.
5) (True/False) A 2-nearest neighbor model is more likely to overfit than a 20-nearest neighbor model (cf. Chapter 5).
6) Similarity measures are most essential for
a) Naïve Bayes
b) Tree Induction
c) Hierarchical Clustering
d) Logistic Regression
7) Which is not true of k-Nearest Neighbor (k-NN)?
a) It can incorporate domain knowledge
b) It builds a simple induced model
c) It is robust to noisy data
d) It is easy to explain how it works




6.2 Short Answer 8) Distance is a key notion  underlying  many data  mining algorithms, such as k-nearest neighbor  (k-NN).  What  problem  is  there  with  comparing consumers using regular Euclidean distance, for example when they are described by age (in years), income (in dollars), and number of credit cards? How can this problem be fixed?  9) Similarity is a key notionunderlyingmanydatamining techniques. If you use Euclidean distance  to  find  similar  examples, how can  you deal with categorical attributes? The k-nearest-neighbor technique estimates thetargetvariablebasedonthe kmost similar examples.  How  exactly  would you  estimate the target variable for a regression problem?  Explain  the pros and cons of using different values for k, for example k=1 and k=N, where N is the total number of training examples. How would you choose k?  10)  Evaluation  for  clustering can be challenging; briefly discuss two different ways to understand the meaning of the clusters found by k-means clustering.  11)  Give  an  example  where clustering can be used to improve business decisions. Explain briefly.    

 




Chapter 7  

7.1 Multiple Choice

1) (True/False) The error rate of a classifier is equal to the number of incorrect decisions made over the total number of decisions made.
2) A binary classifier achieves 95% accuracy on a test set consisting of 95% positive and 5% negative instances. If we use the same classifier on a test set composed of 50% positive and 50% negative instances, we expect to get:
a) higher accuracy
b) lower accuracy
c) the same accuracy
d) cannot be determined

7.2 Short Answer

1) Two of your data scientists, A and B, are working on a project for preliminary screening of a population of people for the early detection of Provost’s Quizinoma. Although very rare, this disease is deadly for the person bearing it if not identified in time, so your task is quite important. After preliminary screening, a $750 blood test can determine the presence of the disease with almost perfect accuracy. You decided to motivate your analysts by structuring their work as a competition: data scientists A and B have to work independently on the problem and then present their results separately. After the competition period is over, on the test data, data scientist A reports 99.9% correctly classified instances from her model, while data scientist B reports only 86.3% correctly classified instances from his model. Describe carefully how you would determine which algorithm is preferable. Illustrate with some hypothetical example numbers.
2) In a classification application we are asked to predict whether kids are going to be infected with the flu virus during 2018 or not, and if yes, vaccinate them against it. The vaccine costs $10. If a child is vaccinated, there is only a 10% chance that she will be infected. If a kid gets infected, the cost of treatment is about $1000. Write down the cost-benefit matrix for the problem.



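Cost matrix in dollars (rows: predicted class, P = vaccinate; columns: actual class, P = would be infected):

              actual P   actual N
predicted P     $110       $10
predicted N     $1000      $0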
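The $110 entry is the $10 vaccine plus a 10% chance of the $1000 treatment. The sketch below encodes the matrix and derives the break-even infection probability above which vaccinating is cheaper in expectation; it assumes that reading of the matrix (predicted P = vaccinate, actual P = the child would be infected without vaccination). Variable names are illustrative.

```python
# Numbers taken from the question (dollars and probabilities).
VACCINE_COST = 10
TREATMENT_COST = 1000
P_INFECTED_IF_VACCINATED = 0.10

# Cost matrix keyed by (predicted, actual).
# Assumption: predicted "P" = vaccinate, actual "P" = would be infected without the vaccine.
cost = {
    ("P", "P"): VACCINE_COST + P_INFECTED_IF_VACCINATED * TREATMENT_COST,  # 10 + 0.1 * 1000 = 110
    ("P", "N"): VACCINE_COST,                                              # vaccinated, would not be infected
    ("N", "P"): TREATMENT_COST,                                            # not vaccinated, infected
    ("N", "N"): 0,                                                         # nothing spent
}

def expected_cost(decision, p):
    """Expected cost of a decision ("P" or "N") for a child with infection probability p."""
    return p * cost[(decision, "P")] + (1 - p) * cost[(decision, "N")]

# Break-even probability where both decisions cost the same:
# p*C_PP + (1-p)*C_PN = p*C_NP + (1-p)*C_NN
numerator = cost[("P", "N")] - cost[("N", "N")]
denominator = (cost[("N", "P")] - cost[("N", "N")]) - (cost[("P", "P")] - cost[("P", "N")])
break_even = numerator / denominator   # 10 / 900, about 0.011

for p in (0.005, 0.05):
    choice = min(("P", "N"), key=lambda d: expected_cost(d, p))
    print(p, choice, expected_cost("P", p), expected_cost("N", p))
```

So under these numbers, a child with an estimated infection probability above roughly 1.1% should be vaccinated.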

  

7.3 Matching

1) ___ accuracy     a. TP/(TP+FP)
   ___ recall       b. TP/(TP+FN)
   ___ precision    c. 1 - (FP+FN)/(P+N)

Answer: c b a




Chapter 8  

8.1 Multiple Choice
In the following, choose the single best answer:

1) The area under the ROC curve is not:
a) equal to the Mann-Whitney-Wilcoxon statistic
b) a measure of the quality of a model’s probability estimates
c) likely to be at least 0.5
d) larger when false positive errors cost more
1) (True/False) Adding a budget constraint to the problem formulation might change the choice of the best ranking classifier.
2) (True/False) A profit curve can assume negative values.
1) The points on a model’s ROC curve
a) represent the performance of different thresholds
b) represent different rankings of examples
c) represent the cost of different classifications
2) I want to rank credit applicants by their estimated likelihood of default. Which technique would be least helpful in assessing the quality of a ranking model mined from data? (cf. prior chapters)
a) holdout testing
b) calculate area under the ROC curve
c) calculate percent correctly classified instances
d) cross-validation
e) domain knowledge validation

8.2 Short Answer

3) What exactly does the area under the ROC curve represent? Be as precise as possible.
The area under the ROC curve represents the probability that a randomly selected positive example will be ranked above a randomly selected negative example. This is the same as the Mann-Whitney-Wilcoxon statistic.

4) Give a shortexample of using the same model used in different contexts with different thresholds to make different decisions. An examplewould be a model that predicts the GPA a student will achieve. When used used on prospective students, we might want toset some threshold A to extend offers 12

to theonesabovethat threshold. When used on current students, we might want to set a different threshold B to discover the students performing poorly and offer them help (e.g. free tutoring).  5) Give  two  different reasons why using ROC curves  canbe  more effective for assessing model quality than the percent of classifications that are correct (a.k.a. "vanilla" accuracy). 6) Last month your boss sent a mailing to 20,000 of your existing customers with a special offer on a Hoosfoos credeen. The response was exciting: 1% of them responded, which brought in $200,000 in revenue. She has now delegated to you the task of continuing the program, and has given you a budget of $10,000, which will allow you to target another 20,000 customers (out of your customer base of 100,000). You don’t want to just target them randomly, as your boss did. You build a tree model and a logistic regression. Describe how to evaluate them as follows. Describe (a) the confusion matrix and (b) how you will fill it out for one of the models. Describe  (c)  the  cost/benefit matrix for this problem, including the costs and benefits for this case. (d) Show the evaluation function you will use to compare your systems. (e) How do (a) and (c) come into play in this evaluation function?       

   

 13

Chapter 9  

9.1 Multiple Choice

1) You roll a trick 6-sided die twice. The trick is that the die has the same number on all sides. What is the conditional probability that the sum of the numbers that come up on the two rolls will be greater than 7, given that the first roll is 5?
a) 1/3
b) 1/6
c) 2/3
d) 3/6
e) 6/6

9.2 Short Answer

2) Explain why Naive Bayes is naive.
3) Explain the meaning of each of the different terms in Bayes' Rule. Describe one way that this rule is used for data mining.
The different terms are explained in Chapter 9 of the book.
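A small sketch of how Bayes' Rule, p(C|E) = p(E|C) p(C) / p(E), is used for classification in Naive Bayes. The "naive" step replaces p(E|C) with the product of per-feature likelihoods p(e_i|C), as if the features were independent given the class. The spam-filter probabilities are hypothetical, and for brevity the sketch only multiplies in the features that are present.

```python
from math import prod

def naive_bayes_posterior(priors, likelihoods, evidence):
    """Class posteriors p(c | e_1..e_k) under the naive conditional-independence assumption.

    priors:      {class: p(c)}
    likelihoods: {class: {feature: p(feature present | c)}}
    evidence:    list of features observed for the example
    """
    unnormalized = {
        c: priors[c] * prod(likelihoods[c][f] for f in evidence)   # p(c) * prod_i p(e_i | c)
        for c in priors
    }
    p_evidence = sum(unnormalized.values())                         # p(E), the normalizer
    return {c: v / p_evidence for c, v in unnormalized.items()}

# Hypothetical spam-filter numbers.
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {
    "spam": {"free": 0.6, "offer": 0.5},
    "ham": {"free": 0.05, "offer": 0.1},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["free", "offer"])
print(posterior)   # spam is about 0.96, ham about 0.04
```

Here p(C) is the prior, p(E|C) the likelihood, p(E) the evidence (normalizer), and p(C|E) the posterior.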

 




Chapter 10  10.1 Multiple Choice Choice  1) (True/False) What is considered a stopword depends on the context of the textual data.  2) One key part of the data miningprocessis creating attributes to describe examples. In order to represent documents (such as emails) as examples, we create term (e.g., word) based attributes to describe  the  documents. Which of the following is not a common approach? a) whether or not the term appears in the document (binary attribute) b) term frequency (number of times term appears in document) c) term frequency/total number of terms in document d) t erm frequency times the term’s frequency in the document corpus  10.2 Short Answer Answer  3) The  word ‘good’ does not always imply positive sentiment in a review. Give an example. Describe a way that we can circumvent this problem. An  example  is  ‘The  movie  was not good’. Using 2-grams instead we can catch ‘not good’ as a different feature than ‘really good’. However this is approach is far from perfect - for example: ‘I can not understand why people say the movie is not good’.



 




Chapter 13   Q) You are on an interview where they notice that you've taken a data mining class. (a) They ask you about what you learned there, and besides talking about nitty-gritty modeling stuff, you want to give a bigger picture. Explain why it is important to think about data mining project strategically, with respect to making internal investments. What sort of investments might you have to make? (b) Now they're interested and ask you if you believe a firm can achieve sustained competitive advantage from data...

