D Silytc Notes + Reviewer PDF

Title D Silytc Notes + Reviewer
Course Introduction to analytics
Institution De La Salle University
Pages 9



Description

TERMS:

● Analytics - an encompassing and multidimensional field that uses mathematics, statistics, predictive modeling, and machine learning techniques to find meaningful patterns and knowledge in recorded data.
● Bar chart - a chart in which a vertical or horizontal rectangle represents the frequency for each category.
● Big data - a term applied to a dataset that exceeds the processing capacity of conventional database systems or doesn’t fit the structural requirements of traditional database architecture.
● Bullet graph - features a single measure that extends into ranges representing qualitative measures of performance.
● Business analytics - the use of traditional and newly developed statistical methods, advances in IS, and techniques from management science to explore and investigate past performance.
● Census - an examination of all of the population of measurements.
● Central Limit Theorem - even if the population of individual items is not normal, there are circumstances when the population of all sample means is approximately normal.
● Census Sample - data gathered on every member of the population.
● Census Study - occurs if the entire population is very small or it is reasonable to include the entire population.
● Coefficient of Variation - measures the size of the standard deviation relative to the size of the mean.
● Continuous - can be meaningfully divided into finer levels; can be measured on a scale or continuum and can have any numerical value.
● Convenience sampling - sampling where we select elements because they are convenient to sample.
● Correlation coefficient - a measure of the strength of the relationship that does not depend on the magnitude of the data.
● Cross-sectional data - data collected at the same or approximately the same point in time.
● Dashboard - provides a graphical presentation of the current status and historical trends of key performance indicators.
● Gauges - charts that present data similar to a speedometer.
● Data - facts and figures from which conclusions can be drawn.
● Data Set - the data that are collected for a particular study. Elements may be people, objects, events, or other entities.
● Data Mining - the use of predictive analytics, algorithms, and IS techniques to extract useful knowledge from huge amounts of data.
● Data Warehousing - a process of centralized data management and retrieval. Its objective is the creation and maintenance of a central repository for all of an organization’s data.

● Descriptive Analytics - the use of traditional and newer graphics to present easy-to-understand visual summaries of up-to-the-minute data.
● Descriptive Statistics - the science of describing the important aspects of a set of measurements.
● Discrete - a count that involves only integers; discrete values cannot be subdivided into parts.
● Element - an object on which a measurement is taken.
● Empirical Rule - also called the “68–95–99.7” rule; for a normally distributed population, about 68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
● Error - a mistake that we can make when judging the null hypothesis.
● Existing Sources - data already gathered by public or private sources.
● Experimental and observational studies - data we collect ourselves for a specific purpose.
● Finite population - a population of limited size.
● Frequency distribution - a table that summarizes the number of items in each of several nonoverlapping classes; a list of data classes with the count of values that belong to each class.
● Histogram - a picture of the frequency distribution; a bar graph-like representation of data that buckets a range of outcomes into columns along the x-axis. The y-axis represents the count or percentage of occurrences in the data for each column and can be used to visualize data distributions.
● Hypothesis - a prediction about the relationship among two or more variables or groups based on a theory or previous research; an assumption or theory that a researcher makes and tests.
● Infinite population - a population of unlimited size.
● Interval - a set of real numbers that contains all real numbers lying between any two numbers of the set.
● Judgement sampling - sampling in which a person who is extremely knowledgeable about the population selects the population elements he or she feels are most representative.
● Mean - average or expected value.
● Measurement - a way to assign a value of a variable to an element.
● Median - middle point of the ordered measurements.
● Mode - most frequent value.
● Nominal - any numeral used for identification, however it was assigned.
● Non-probability Sample - members are selected from the population in some nonrandom manner.
● Ogive - a graph of a cumulative distribution.
● Ordinal - a number that indicates the position or order of something in relation to other numbers, such as first, second, third, and so on.
● Outliers - measurements that are very different from other measurements.

● Parameter - numerical characteristic of a population.
● Pareto chart - a bar chart having the different kinds of defects listed on the horizontal scale.
● Pie chart - a circle divided into slices where the size of each slice represents its relative frequency or percent frequency.
● Population - a set of all elements about which we wish to draw conclusions; a collective term used to describe the total quantity of things (or cases) of the type which are the subject of your study; can consist of certain types of objects, organizations, people, or even events.
● Population of Measurements - measurement of the variable of interest for each and every population unit.
● Population Mean - the value to expect on average and in the long run.
● Population Parameter - a number calculated from all the population measurements that describes some aspect of the population.
● Predictive Analytics - methods used to find anomalies, patterns, and associations in data sets to predict future outcomes.
● Prescriptive Analytics - looks at the variables and constraints, along with predictions from predictive analytics, to recommend courses of action.
● Probability sampling - sampling where we know the chance that each element in the population will be included in the sample.
● Probability Sample - each member of the population has a known non-zero probability of being selected.
● Process - a sequence of operations that takes inputs and turns them into outputs.
● Qualitative - the possible measurements fall into several categories; can’t be expressed as a number and can’t be measured; consists of words, pictures, and symbols, not numbers.
● Quantitative - the possible measurements of the values of a variable are numbers that represent quantities; can be expressed as a number or quantified; can be measured by numerical values.
● Range - largest measurement minus the smallest measurement.
● Ratio - the quantitative relation between two amounts showing the number of times one value contains or is contained within the other.
● Relative frequency - summarizes the proportion of items in each class.
● Sample - a subset of the population; a collection of sampling units drawn from a sampling frame.
● Sample Correlation Coefficient - point estimate for the population correlation coefficient.
● Sample Mean - a point estimate of the population mean.
● Sample Statistic - a number calculated using the sample measurements that describes some aspect of the sample.
● Sampling - a process of selecting just a small group of cases from a large group.

● Sampling Error - the deviation between an estimate from an ideal sample and the true population value.
● Sampling Frame - a selected category pertaining to certain groups that will be of interest to your study; list of sampling units.
● Sampling Units - nonoverlapping collections of elements from the population that cover the entire population.
● Sparklines - line charts drawn without axes to embed in text.
● Standard Deviation - square root of the population variance.
● Statistic - numerical characteristic of a sample.
● Statistical Inference - the science of using a sample of measurements to make generalizations about the important aspects of a population of measurements.
● Statistical Power - the probability of rejecting a null hypothesis that is false; probability of finding relationships or differences that exist.
● Time series data - data collected over time periods.
● Treemaps - display information in a series of clustered rectangles.
● Variable - any characteristic of an element.
● Variance - average of the squared deviations of all the population measurements from the population mean.
● Voluntary response sampling - samples in which participants self-select.
● Z-score - number of standard deviations that x is from the mean.
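Several of the terms above (mean, median, mode, range, variance, standard deviation, coefficient of variation, z-score) can be computed directly. The following is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical and chosen only for illustration, and the population formulas are used because that is how the notes define variance and standard deviation.

```python
import statistics

# Hypothetical measurements, chosen only for illustration.
data = [10, 12, 12, 15, 18, 21, 24]

mean = statistics.mean(data)          # average value
median = statistics.median(data)      # middle point of the ordered measurements
mode = statistics.mode(data)          # most frequent value
data_range = max(data) - min(data)    # largest minus smallest measurement

# Population variance: average squared deviation from the population mean.
variance = statistics.pvariance(data)
# Population standard deviation: square root of the population variance.
std_dev = statistics.pstdev(data)

# Coefficient of variation: standard deviation relative to the mean.
coeff_variation = std_dev / mean


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


z_scores = [z_score(x, mean, std_dev) for x in data]

print(mean, median, mode, data_range)
print(round(variance, 4), round(std_dev, 4), round(coeff_variation, 4))
```

If the data were a sample rather than the whole population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual substitutes.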

CHAPTER 1: Introduction to Business Analytics

AREAS OF ANALYTICS:
1. Text Analytics
2. Data Mining
3. Statistics
4. Optimization
5. Visualization

DATA ANALYTICS METHODOLOGY:
1. Data collection
2. Data preparation
3. Data analysis
4. Model building
5. Results
6. Put into use

ANALYTIC PROCESS:
1. Data
2. Analyze
3. Generate reports
4. Smarter decision making

WHEN IS ANALYTICS NOT PRACTICAL?
1. When there’s no data.
2. When there’s no precedent.
3. When the decision maker has considerable experience.
4. When the variables can’t be measured.

CHALLENGES OF BUSINESS ANALYTICS:
1. Environment
2. Competition
3. Customers

INDUSTRIES BENEFITING FROM ANALYTICS:
1. Insurance
2. Travel and tourism
3. Finance
4. Health
5. Telecommunications
6. Retail
7. Agriculture

CURRENT TRENDS IN ANALYTICS:
1. Data Quality Management (DQM)
2. Data discovery
3. Artificial intelligence
4. Predictive and prescriptive analytics tools
5. Connected clouds
6. Data governance and trust
7. Security - digital ethics and privacy
8. Growing importance of the CDO & CAO
9. Collaborative business intelligence
10. Consumer experience

CHAPTER 2: Descriptive Statistics and Analytics: Tabular and Graphical Methods

CONSTRUCTING A FREQUENCY DISTRIBUTION:
1. Find the no. of classes. (Group all of the n data into K number of classes.)
   a. n = the number of data values
   b. K = no. of classes; the smallest whole number for which 2^K is greater than or equal to n.
   Example: For n = 65
   ● K = 6 is too small because 2^6 = 64, and 64 < n
   ● K = 7 works because 2^7 = 128, and 128 > n
   ● So use K = 7 classes
2. Find the class length.
   a. Formula: class length = (largest measurement - smallest measurement) / K, where K is the no. of classes
   Example: (29 - 10) / 7 = 2.7143
   Then, round off as needed.
3. Form nonoverlapping classes of equal width.
   a. The classes start on the smallest value.
      i. Lower limit of first class = smallest value
      ii. Upper limit of first class = smallest value + class length
   Example:
   Class 1: 10-16
   Class 2: 17-23
   Class 3: 24-30
4. Tally and count.
   Example: Class 10...
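The four steps for constructing a frequency distribution can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the sample data are hypothetical, and each class is treated as a half-open interval [lower, upper), which for whole-number data with width 7 starting at 10 corresponds to the classes 10-16, 17-23, 24-30.

```python
def number_of_classes(n):
    """Step 1: smallest whole number K for which 2**K >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def class_length(measurements, k):
    """Step 2: (largest measurement - smallest measurement) / K, before rounding."""
    return (max(measurements) - min(measurements)) / k


def frequency_distribution(measurements, k, width):
    """Steps 3 and 4: form k equal-width classes starting at the smallest
    value, then tally how many measurements fall in each class.

    Assumption: classes are half-open [lower, upper); any value that would
    land past the last class is counted in the last class.
    """
    low = min(measurements)
    counts = [0] * k
    for x in measurements:
        index = min(int((x - low) // width), k - 1)
        counts[index] += 1
    classes = [(low + i * width, low + (i + 1) * width) for i in range(k)]
    return list(zip(classes, counts))


# n = 65 gives K = 7, since 2**6 = 64 < 65 and 2**7 = 128 >= 65.
print(number_of_classes(65))

# (29 - 10) / 7, rounded to four decimal places.
print(round(class_length([10, 29], 7), 4))

# Hypothetical data tallied into 3 classes of width 7 starting at 10.
sample = [10, 12, 16, 17, 20, 23, 24, 29]
for (lower, upper), count in frequency_distribution(sample, 3, 7):
    print(f"[{lower}, {upper}): {count}")
```

The rounding of the class length is left to the caller, since the notes only say to round off as needed.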

