Anl303&305 PCQ - Answers to Pre-course quiz PDF

Title	Anl303&305 PCQ - Answers to Pre-course quiz
Course	Quantitative Methods
Institution	Singapore University of Social Sciences
Pages	51
File Size	465.8 KB
File Type	PDF
Total Downloads	35
Total Views	136

Description

Which of the following is not a measure of location? standard deviation

Which of the following is a data mining solution for mitigating risks in procurement frauds? Use a predictive model to detect fraudulent procurement transactions

Which of the following about clustering is false? Clustering helps analysts to find items that co-occur frequently.

Which of the following data attributes are categorical in type? ethnicity of individuals in a country

What are the possible types of predictive models? Estimation and Classification models

Consider the following statements A and B. A: "Which customers are likely to buy the product?" and B: "Which insurance claims are likely to be fraudulent?" A and B are data mining problems

The diagram shows the output of a/an ______. decision tree

Which node in IBM SPSS Modeler provides visualisation tools for multivariate data exploration? Sample node (Wrong) Graphboard is the correct answer

For 10 consecutive months, a convenience store owner keeps track of the number of customers who purchase brand XYZ of detergent in a month. The recorded data was {1, 2, 3, 3, 3, 4, 5, 3, 2, 1}. What is the median, and the 80th percentile of the data? 3, 3

Which of the following data attributes are of nominal scale of measurement? gender of political candidates in an election
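The "3, 3" answer for the detergent data can be checked with a short sketch. It assumes the nearest-rank definition of a percentile (sort the values and take the one at rank ceil(p/100 × n)); the quiz does not state which definition it uses, and interpolating definitions can give a different 80th percentile. The helper name `nearest_rank_percentile` is made up for this illustration.

```python
import math
import statistics


def nearest_rank_percentile(values, p):
    """Return the p-th percentile using the nearest-rank rule (an assumed convention)."""
    ordered = sorted(values)
    # 1-based rank of the value that has at least p% of the data at or below it
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


purchases = [1, 2, 3, 3, 3, 4, 5, 3, 2, 1]

median = statistics.median(purchases)          # average of the two middle values
p80 = nearest_rank_percentile(purchases, 80)   # 8th of the 10 sorted values

print(median, p80)
```

Under this assumption both the median and the 80th percentile come out as 3, which matches the recorded answer.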

Which of the following is not a data mining technique? Business Process Reengineering

Instead of ________, a heat map uses _____ to represent the values in a cell. numbers, colors

When discussing the use of multivariate data exploration tool, which of the following is not an appropriate tool? Bar chart

Mr. X keeps track of the number of times he goes running per week, for 8 consecutive weeks. His data looks like this: {3, 1, 3, 2, 3, 0, 3, 1}. The frequency of 3 is ____. 4

Data mining produces ______ which can make predictions automatically. data mining models

Which of the following business problems can be solved using data mining? All of the listed choices

In the modelling stage of data mining, the key task is to _________. Identify the appropriate tools and techniques to use in a data mining application

Parallel coordinates plots are only useful when the objects fall into a small number of groups, and the number of data objects is not too large

In the output of the Data Audit node, which of the following is most useful for deciding whether normalisation should be carried out? Range

Which of the following is an open source data mining software package? Weka

The ____ of a categorical attribute is the value that has the highest frequency Mode

In post-modelling stage, the key task is to ___________. deploy the data mining model or results

Graphical techniques help analysts to _____. display data profile using graphs
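The frequency and mode answers above can be reproduced with a small sketch using Python's `collections.Counter`. The running data is taken from the question about Mr. X; the variable names are chosen for this illustration only.

```python
from collections import Counter

# Number of runs per week over 8 weeks, as given in the quiz question
runs_per_week = [3, 1, 3, 2, 3, 0, 3, 1]

counts = Counter(runs_per_week)

# Frequency of a value = how many times it occurs in the data
frequency_of_3 = counts[3]

# Mode = the value with the highest frequency
mode_value, mode_count = counts.most_common(1)[0]

print(frequency_of_3, mode_value, mode_count)
```

The same counting approach works for a categorical attribute, since it only relies on equality between values and not on their numeric size.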

When a data file is being accessed by another application (such as MS Excel), the IBM SPSS Modeler will have problem reading the data values of the data file because of ____. a file access conflict

Which of the following is/are appropriate example(s) of data mining applications? all of the listed choices

Which of the following about the given rule X -> Y is true: X is the antecedent and Y is the consequent.

Data mining focuses on the exploration and discovery of _______. previously unknown patterns or trends

A model that predicts who is most likely to purchase a product is best built on: data that describes existing customers who have purchased the product recently.

Consider the following statements A and B. A: "How can I sell more of my product to customers?" and B: "Which customers are most likely to purchase the product?" A is a business problem and B is a data mining problem

Which of the following is not on a ratio scale of measurement? The temperature, in Celsius, of the hood of a car

Variables that are measured on a ______ scale are qualitative or categorical variables. nominal

In post-modelling stage, the key task is to ___________. Deploy the data mining model or results

Which of the following is the proper form of an association rule: If antecedent(s) then consequent(s)

Which of the following nodes can be used to generate a heatmap? Graphboard node

Which of the following is not part of the pre-modelling stage in data mining? Interpreting the generated model

Data mining ______ which is used by people to improve the strategy, tactics and operational decision-making of an organization. Discovers new and actionable business knowledge

Which of the following nodes in IBM SPSS Modeler can be used to obtain information regarding outliers in a dataset? Data Audit node

Data exploration is performed before data preparation is attempted in order to All of the listed choices

In IBM SPSS Modeler, the data type 'Flag' is used for data with _____ distinct values 2

Which of the following is not a known clustering technique? W-means

Consider the following statements A and B. A: "How to increase profit margin?" and B: "How do I reduce customer churns?" A and B are business problems

The first step of pre-modelling in data mining is to ____________. Define the business problem

Which of the following nodes is suitable for reading in a CSV data file? Var. File node

Which of the following Field delimiters is not readily available in the Var. File node?

When analysing bivariate data, which of the following is a suitable tool for data exploration? Scatterplot

When differentiating the various aspects of data mining, which of the following is true about the term "data mining"? There is currently no universally accepted definition of data mining.

Which of the following data attributes is numerical in type? number of hours spent on outdoor activities per week by children below 15

Which of the following is an appropriate application of cluster analysis? Market Segmentation

In a parallel coordinates plot, each record is represented as a _____ instead of a _____. Line, point

Which of the following nodes can be used to generate a histogram? Histogram node

In the modeling stage of data mining, the key task is to _________.

Identify the appropriate tools and techniques to use in a data mining application

In the quality tab of the Data Audit node of IBM SPSS Modeler, the "Complete fields (%)" refers to the percentage of _____ that contain _________.

fields, no missing values

Which of the following factor(s) is/are potential driver(s) of Data Mining?

All of the listed choices

In many large datasets, there is typically an "ID" field that is also recorded. It typically distinguishes records from one another. How can this field be used?

It can be used to identify duplicate records in the data

Boxplots are useful graphical tools for exploring quantitative attributes with a five number summary: minimum, lower quartile, ______ , upper quartile and maximum

median
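A minimal sketch of the five-number summary behind a boxplot, using the detergent data from the earlier question. Quartile conventions differ between textbooks and tools; this sketch relies on the default ("exclusive") method of Python's `statistics.quantiles`, which is an assumption and may not match the values IBM SPSS Modeler reports. The function name `five_number_summary` is illustrative.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # quantiles(..., n=4) returns the three cut points that split the data into quarters
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


purchases = [1, 2, 3, 3, 3, 4, 5, 3, 2, 1]
summary = five_number_summary(purchases)
print(summary)
```

The five values are always in non-decreasing order, which is what lets a boxplot draw them as the whisker ends, the box edges and the line inside the box.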

Which of the following nodes can be used to generate a scatter plot?

Graphboard node

In a scatterplot, which of the following attributes can be used in order to allow multiple variables to be presented instead of just two? I. Colors II. Shapes III. Sizes of points All of the listed choices

When discussing the use of multivariate data exploration tool, which of the following is not an appropriate tool? Bar chart

When recommending data analytics tools, one should note that data description techniques help analysts to _____. summarise data using descriptive statistics

Which of the following nodes can be used to generate a bar chart? Distribution node

Which of the following is/are important factor(s) for successful data mining? All of the listed choices

Quiz 2

When assessing the application of association analysis, which of the following is not a reason to perform association analysis? To find groups in data

In association analysis, we are on the lookout for interesting rules with _____ support and ____ confidence. High, high

The ordinal scale of measurement assigns numbers that serve the purpose of: Ranking

The nominal scale of measurement assigns numbers that serve the purpose of: Classification

Which column in the Data Audit node gives you no information about missing values? Outliers

Which of the following (in the IBM SPSS Modeler) is the most suitable type (i.e., Measurement) for a variable that can have two distinct values? Flag

Clustering is a method that aims at grouping data with _____ together. Similar characteristics

Variables that could be measured on an interval or ratio scale are: All of the listed choices

Which of the following types (in the IBM SPSS Modeler) should be used for numeric data? Continuous

Variables that could be measured on a nominal scale are: All of the listed choices

Which of the following about data cleaning is not true? Data cleaning may involve validating data at the point it is entered into the system

In association analysis, the confidence of a rule X -> Y measures how often items in Y occur in transactions containing X

Market basket analysis is the discovery of the buying habits of customers through searching for: items that are frequently purchased together

Which of the following is not a parameter of the Apriori node? Only false values for flags

The mode of a categorical attribute is the value that has the: Highest frequency

In IBM SPSS Modeler, a "continuous" variable can be: I - a real number

The interval scale of measurement assigns numbers such that: the intervals between numbers can be meaningfully interpreted

Which of the following types (in the IBM SPSS Modeler) should be used for a data variable with multiple distinct values that have an inherent order? Ordered set

Which of the following is not a parameter of the Apriori node? Minimum rule support

__________________ converts the scale of data into a form that is appropriate for the data mining technique Data transformation

Which of the following nodes can be used to perform association rule mining? Apriori node

Data quality may be compromised by: All of the listed choices

In web mining, association analysis is applied to understand: the frequency with which combinations of web pages are visited by a given user

Which of the following is not a proper normalization technique when preparing data for mining? Normalization to the third normal form

Types of data commonly encountered in data mining exclude: Imaginary data
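For contrast with database normal forms, the kind of normalization meant in data preparation rescales numeric values. The sketch below shows min-max normalization, one common technique, mapping values linearly onto a target range. The function name, the default range of 0 to 1 and the sample prices are assumptions made for this illustration; they are not taken from the quiz or from IBM SPSS Modeler.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


prices = [10.0, 20.0, 40.0]  # hypothetical sample data
scaled = min_max_normalize(prices)
print(scaled)
```

Because the formula depends on the minimum and maximum, the range of a field is a natural thing to inspect before deciding whether to normalize it.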

Which of the following about k-means is true? K-means requires distance measure

_____________ enables one to obtain a smaller data-set that is almost as informative as the original one. Data reduction

When selecting clustering techniques for execution, which of the following is not a known clustering technique? W-means

Given a field named “Price”, which of the following nodes can be used to create a new field named GST, where GST = 0.07*Price? Derive node

In association analysis the rule support of X -> Y is defined as the frequency of simultaneous occurrence of X and Y

Possible processes that may introduce noisy or anomalous data are: All of the listed choices

Which of the following is not a data reduction technique? Min-max normalization

Which of the following nodes can be used to perform min-max normalization on continuous data? Auto Data Prep node

Which of the following nodes can be used to treat missing values for continuous or nominal data? Auto Data Prep node

Which of the following nodes can be used to convert continuous data into categorical data? Binning node

Which of the following is not an example of numeric data: Gender of a person

Which of the following is not a method of dealing with missing data? Replacement with the null value

Given a rule X -> Y, the rule support is interpreted as: The fraction of total transactions that contains both X and Y

I – Two distinct values

In cluster analysis, a centroid means: The centre or average of a group of observations within a cluster

Which of the following is NOT an application of association rule mining? Market segmentation

When evaluating an association rule, which of the following is the proper form? If antecedent(s) then consequent(s)
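The support definition above (the fraction of all transactions that contain both X and Y) can be sketched in a few lines. Confidence is computed here in the usual way, as the share of transactions containing X that also contain Y. The basket data and the function names are invented for illustration, and this is not how the Apriori node is implemented.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of the antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule: bread -> milk
rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
print(rule_support, rule_confidence)
```

A rule is usually considered interesting only when both numbers are high, which is why minimum support and minimum confidence are typical thresholds.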

Measure(s) of dispersion is/are: Range and Standard deviation

Measure(s) of location is/are: Mean and median

The variance indicates how ______ the distribution of the data is around its mean. Spread out

Which of the following is not an appropriate parameter setting in the Apriori node? Maximum antecedent support = 25%
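As a rough illustration of the measures named above, the sketch below computes the location measures (mean, median) and dispersion measures (range, variance, standard deviation) for the detergent data from the earlier question. It assumes the population forms of variance and standard deviation (`pvariance`, `pstdev`); the quiz does not say whether population or sample formulas are intended.

```python
import statistics

data = [1, 2, 3, 3, 3, 4, 5, 3, 2, 1]

# Measures of location: where the data is centred
mean = statistics.mean(data)
median = statistics.median(data)

# Measures of dispersion: how spread out the data is
data_range = max(data) - min(data)      # difference between maximum and minimum
variance = statistics.pvariance(data)   # population variance (assumed convention)
std_dev = statistics.pstdev(data)       # square root of the population variance

print(mean, median, data_range, variance, std_dev)
```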

Possible approach(es) to identify outliers and unusual cases in the data include: all of the listed choices

A heat map is like a table that uses ____ instead of numbers to represent the values for the cells. Colours

To correct incomplete or missing data, one can: All of the above

eliminate records which contain missing values

encode and identify missing values

substitute the missing values with values recommended by domain experts

The range is defined as the: Difference between the maximum and minimum value

Which of the following is not an algorithm that performs association analysis? Support Vector Machines

“There are four different algorithms in IBM SPSS Modeler that perform association analysis; they are Apriori, Generalised Rule Induction (GRI), Carma and Sequence” So it's the last option. I forgot what it was haha

When categorising data preparation tasks, which of the following is not an appropriate category? Data virtualization

Which of the following is not a known binning technique? Horizontal Discretisation Putative subdivision

Which of the following nodes can be used to reduce the number of records to be used for analysis? Transpose node Filler node

Which of the following is not a rule evaluation measure in the Apriori node? Consequent Importance

Which of the following does not belong to ratio scale? Temperature

In IBM SPSS Modeler, a "flag" variable is used for data with: III- no distinct values Used for data with two distinct values

Which of the following types (in the IBM SPSS Modeler) should be used for a variable that can have multiple distinct values with no inherent order? Continuous Nominal

In association analysis, the support count for a rule X -> Y is defined as

the frequency of occurrence of X

____________ identifies and removes data anomalies and inconsistencies Data cleaning

Records may contain missing values because: All of the listed choices

Quiz 3

The __________ of a predictive model is defined as the ratio of the number of incorrectly predicted cases to the total number of cases. misclassification rate
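The misclassification rate defined above is simply the share of cases where the prediction differs from the actual value. A minimal sketch follows; the labels and the function name are made up for illustration.

```python
def misclassification_rate(actual, predicted):
    """Number of incorrectly predicted cases divided by the total number of cases."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical target values and model predictions
actual = ["yes", "no", "yes", "no", "no"]
predicted = ["yes", "yes", "yes", "no", "yes"]

rate = misclassification_rate(actual, predicted)  # 2 wrong out of 5 cases
accuracy = 1 - rate                               # the two measures always sum to 1
print(rate, accuracy)
```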

Which of the following nodes can be used to generate a decision tree model?

CART node

Which of the following is not an impurity measure for categorical targets when using the CART node? Entropy

[...]he most appropriate technique for customer segmenta[...]

Which of the following nodes is most likely to be used during the data preparation stage? Auto Data Prep node

Which of the following nodes is most likely to be used during the data preparation stage?

Auto data prep node Derive node

Estimation refers to the prediction of a target variable that is ____ in nature

continuous

Which of the following tasks is not part of the Business Understanding phase? Describing and exploring data

[...]f we are looking to predict whether a certain credit card transaction was fraudulent or[...]

The error rate of a predictive model is defined as: 1 minus the accuracy rate

Consider the following CRISP-DM diagram. The Modeling stage may need to link back to the Data Preparation stage to ______. prepare the data in a form required by some data mining techniques

The appropriate technique for predicting customers who are likely to buy a product is based on predictive analysis

Which of the following is not true of decision trees? They are not able to handle missing values easily

Once a business problem is well understood, __________ can be formulated. business objectives

Which of the following is/are not part of a decision tree? flowers

A medical test falsely suggesting that a person is healthy when the person is really sick is a

false negative

A medical test falsely suggesting that a person is sick when in fact the person is healthy is a false positive
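The two medical-test cases above are two of the four cells of a confusion matrix. The sketch below counts all four, treating "sick" as the positive class. The test results and the function name `confusion_counts` are hypothetical and only illustrate the terminology.

```python
def confusion_counts(actual, predicted, positive="sick"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the case really is positive
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the case really is positive
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical true conditions and test results
actual = ["sick", "sick", "healthy", "healthy", "sick"]
predicted = ["sick", "healthy", "healthy", "sick", "sick"]

counts = confusion_counts(actual, predicted)
print(counts)
```

In this made-up data the second person is a false negative (sick but reported healthy) and the fourth is a false positive (healthy but reported sick).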

Which of the following is the most appropriate technique and defence for market basket analysis? association rule mining, because it finds commonly purchased items.

Unlike a business goal, a data mining goal is: specified in technical terms

In a decision tree, a ____ node is a child node with no further subdivisions or splitting. leaf

The accuracy of a predictive model is defined as the ratio of the number of ____ predicted cases to the total number of cases correctly

When evaluating decision tree models, which of the following is not a model evaluation measure? True neutral rate

When interpreting a decision tree, a node which contains the larger set of observations is known as the parent node while its subdivisions are known as: child nodes

Which of the following is not a task in the data understanding phase? Selection of data mining techniques

In CRISP-DM, which of the following is not a subtask in the "modelling" phase? deploying the model

In data mining, what are the two major types of predictive modeling? Classification and Estimation

Which of the following is not a predictive model? Self-organizing map

True negatives refer to the number of correct predictions for the negative cases

Which of the following tasks is not part of the Modeling phase? Deployment planning

Which of the following statements is not true about CRISP-DM? It forces the analyst to focus on modeling issues only.

Which of the following subtasks is not part of the "Deployment" phase in CRISP-DM?

Which of the following is not a task under "data preparation" in CRISP-DM? Verifying data quality

When executing a CART for classification, one is concerned with the prediction of a target variable that is ___ in nature.

categorical

CRISP-DM stands for: Cross-Industry Standard Process for Data Mining

When assessing the application of decision trees, which of the following is generally not true?

Which of the following is the most appropriate description of an unseen data example? A data example used for testing A data example that is never used (wrong) A data example that is used occasionally A data example used for training

In CRISP-DM, which of the following is not a task in the "data understanding" phase? report generation from data

The sensitivity of a predictive model is defined as: I. the true positive rate II. the ratio of the correctly predicted positive cases to the total number of actual positive cases III. the ratio of the wrongly predicted positive cases to the total number of actual positive cases Both I and II

Which of the following about CRISP-DM is true?

CRISP-DM is independent of analytic tasks or platform

Which of the following parameters cannot affect the size of a decision tree? Name of the C&R Tree node

Which of the following nodes is most likely to be used during the data understanding stage? Data Audit node

It is important to ensure that the data recorded really meet the data specification in terms of: all of the listed choices

Which of the following is the default setting for Maximum Tree Depth?

5

Which of the following is NOT an algorithm for generating a decision tree? Apriori

The resulting visual representation and explicit rules make decision t...