Title | STAT 1401 - Chapter 1 |
---|---|
Course | Discrete Mathematics |
Institution | University of Winnipeg |
Pages | 8 |
Sohail Khan...
DATA COLLECTION & ITS GRAPHICAL PRESENTATION
UNIT 1 - LECTURE 1

What is Statistics?
● STATISTICS is the science of collecting, summarizing, analyzing, and interpreting information (data) and drawing conclusions from it
  ○ NOTE: Example of a test question - “Is population a statistic?” etc.
● Statistics has 2 types:
  ○ DESCRIPTIVE statistics is the science of describing the important aspects of a set of measurements. It consists of methods for organizing and summarizing information
  ○ INFERENTIAL statistics uses a sample to draw conclusions about a population. If the population is small enough, we could take a census and would not have to sample or make any statistical inferences
    ■ CENSUS is a complete count of a population
● DESCRIPTIVE STATISTICS includes the construction of graphs, charts, and tables and the calculation of various descriptive measures such as averages, measures of variation, and percentiles
  ○ i.e. Describes data (eg. a chart or a graph)
  ○ Helps to summarize and analyze data but does not draw conclusions beyond the data
  ○ QUESTION: Can I use descriptive statistics on the population and the sample?
    ■ A: Yes
● INFERENTIAL STATISTICS consists of methods which use sample information to draw conclusions about the population
  ○ i.e. Allows you to make predictions (“inferences”) from the data
  ○ Inferential statistics is only needed if you don’t know the truth
  ○ Helps you infer or conclude what is unknown to you
  ○ The purpose is to use a subset of the population to learn about the whole population
● Statisticians analyze the information obtained from a sample of the (eg. voting) population to make inferences (draw conclusions) about the preferences of the entire voting population (18+ individuals)
  ○ Inferential statistics provides methods for drawing such conclusions

Populations and Samples
● POPULATION is the entire group of individuals that we want information about
  ○ i.e. The largest collection of values of a random variable in which we have an interest at a particular time
  ○ eg. We are interested in the weights of all the children enrolled in a certain county elementary school system
  ○ Populations can be finite (the values consist of a fixed number) or infinite (the values consist of an endless succession)
● A SAMPLE is a group of individuals from the population that we examine in order to gather information
  ○ i.e. Part of a population
  ○ If the entire population is known, there is no need to take a sample from it

Example: Descriptive Statistics
● The table below shows data on 25 mutual funds. Descriptive statistics can be used to provide summaries of this data. For example, a tabular summary of the data for the categorical variable fund type is shown in the table. A graphical summary of the same data, called a bar chart, is shown in the figure. These types of tabular and graphical summaries generally make the data easier to interpret. From the table and figure, we can see that the majority of the mutual funds are of the Domestic Equity type.
● FREQUENCY is the number of times a data value occurs (count)
● The percent is found by dividing the frequency by the total number of data values (25) and multiplying by 100

Example: Inferential Statistics
● Suppose a researcher wants to use sample data to make an inference about the average hours of useful life for the population of all light bulbs that could be produced with the new filament
● Suppose he has a random sample of 200 light bulbs and the average lifetime of the bulbs in the sample is 76 hours
● We can use this sample result to estimate that the average lifetime for the light bulbs in the population is 76 hours
● This is an example of survival analysis
● What we would do instead of testing every bulb: take a random sample, let the bulbs run, and base the results on the sampled light bulbs to generalize to the population

Example: Descriptive and Inferential Statistics
● You might stand in a mall and ask a sample of 100 people if they like shopping at Forever 21. You could make a bar chart of the yes or no answers (this is descriptive statistics) or you could use your research (this is inferential statistics) to reason that around 75-80% of the population (all shoppers in all malls) like shopping at Forever 21

Parameters and Statistics
● A PARAMETER is a number that describes some aspect of the population
  ○ Usually unknown
  ○ The Greek letter μ (mu) denotes the average (mean) of the population
    ■ Generally not known because the whole population is not known
● A STATISTIC is a number that is computed from the sample
  ○ Often used to estimate the unknown parameter
● If parameters are known to us, they are constants and we won’t need statistics
  ○ If they’re not known, we use statistics to estimate them
● x̄ (x-bar), a statistic, is a variable because it will vary from sample to sample

Example: Descriptive vs. Inferential Statistics
● Example 1: 30 of the 198 students enrolled in Statistics 1501 were asked if they wanted Test 2 to be a take-home or an in-class assessment. 20, or about 67%, of the students polled indicated a preference for an in-class exam. The professor concluded that all students in Statistics 1501 would prefer an in-class examination for the second assessment
● QUESTION: Did the professor perform a descriptive study or an inferential study?
● ANSWER: He performed an inferential study because he took a sample (30) of students from the total number of students (198) and generalized the sample result to all of them

Basic Terms
● An ELEMENT OR MEMBER of a sample or population is a specific subject or object (eg. a person, firm, item, state, or country) about which the information is collected
● A VARIABLE is a characteristic or condition that can change or take on different values
  ○ Most studies begin with a general question about the relationship between two variables for a specific group of individuals. The value of a variable for an element is called an OBSERVATION OR MEASUREMENT
● A DATA SET is a collection of observations on one or more variables

Variables and Data
● A VARIABLE is a characteristic which may change its value while it is under observation
  ○ i.e. A characteristic that varies
  ○ eg. Height of a child, temperature across a province
● A variable can be quantitative (numerical) or qualitative (attribute, categorical)
● A RANDOM VARIABLE is one whose values arise as a result of chance factors, so that they cannot be exactly predicted in advance
● A QUANTITATIVE (NUMERICAL) VARIABLE takes meaningful numeric values that represent a measurable quantity
  ○ eg. Height, weight, etc.
  ○ There are different ways to sort quantitative data
● A QUALITATIVE (ATTRIBUTE, CATEGORICAL) VARIABLE is one that does not take meaningful numerical values
  ○ A categorical variable places an individual into one of several groups or categories
  ○ You can assign numerical values to it, but they have no meaning
  ○ eg. A sick person is given a medical diagnosis; a person, place, or object is designated as belonging to a group (such as an ethnic group)
● Quantitative variables can be:
  ○ DISCRETE, which takes a finite (or countably infinite) number of values
    ■ eg. Shoe size
  ○ CONTINUOUS, which can take any value in an interval (infinitely many values)
    ■ eg. Shoe length, time, height, weight
● Values taken by the variables are called DATA, which are the raw materials of statistics
  ○ Data are typically numbers
  ○ Data are the values taken by the variable
  ○ Data can be quantitative or qualitative
  ○ Look at the variable FIRST (if the variable is quantitative, the data are quantitative; if the variable is qualitative, the data are qualitative)

Example: Types of Variables
● A (QUALITATIVE) CATEGORICAL/ATTRIBUTE VARIABLE places an individual into one of several groups or categories
  ○ Recall: You can assign numerical values but they have no mathematical meaning
  ○ A yes or no question (eg. Did you pay income tax?)
  ○ eg. Your blood type (A, B, AB, O), your hair colour, your ethnicity, whether you paid income tax last year or not
  ○ The genders (male/female) of professional athletes, shirt numbers on professional athletes' uniforms (the numbers are substitutes for names)
● A QUANTITATIVE VARIABLE takes on numerical values on which arithmetic operations can be performed
  ○ Recall: Numerical values have mathematical meaning
  ○ Talking about an amount (eg. How much income tax did you pay?)
  ○ eg. How tall you are, your age, your blood cholesterol level, the number of credit cards you own

Distribution
● The DISTRIBUTION of a variable describes what values the variable takes on and how often

Example: Distribution of a Categorical Variable
● M&M’s plain chocolate candies: 30% browns, 20% each of yellows and reds, and 10% each of oranges, greens, and tans
● Colour = qualitative
● Number of coloured M&M’s = quantitative
● This is a colour distribution

Observational Studies and Designed Experiments
● An OBSERVATIONAL STUDY observes individuals and measures variables of interest but does not attempt to influence the responses
  ○ i.e. You observe, you record, you summarize, you analyze, you interpret - all on the basis of your observation
  ○ Cannot establish cause and effect relationships
● A DESIGNED EXPERIMENT (or simply EXPERIMENT) deliberately imposes some treatment on individuals in order to observe their responses
  ○ i.e. The researcher gets actively involved and changes conditions to see what happens in response
  ○ Can establish cause and effect

Data Collection: Samples, Surveys, and Experiments
● Data can be collected through surveys (observational studies) or experiments
● The goal of every statistical study is to collect data and then use the data to make a good decision
● Any decision you make using the results of a statistical study is only as good as the process used to obtain the data
● If the process is flawed, then the resulting decision will be questionable and meaningless

Planning a Survey:
● (1) Objectives of the study
  ○ Why I want to do this study
  ○ What is the purpose
● (2) Define the target population
  ○ eg. The voting population is not everyone in Manitoba, only those able to vote (18+)
● (3) Sampling methods
● (4) Prepare questionnaire
● (5) Conduct a survey
● (6) Administer survey

Data Collection Types
● CENSUS is a survey that includes every member of the population
  ○ i.e. Complete count of the population
● SAMPLE SURVEY is the technique of collecting information from a portion of the population
● REPRESENTATIVE SAMPLE is a sample that represents the characteristics of the population as closely as possible
  ○ eg. If I wanted to say something about the entire class, my population should be the entire class

Random Sample vs. Non-Random Sample
● A RANDOM SAMPLE is a sample drawn in such a way that each member of the population has some chance of being selected
● In a NON-RANDOM SAMPLE, some members of the population may not have any chance of being selected in the sample
● Random does not mean chaos; it means chance has to play a role

Application of Randomness to Statistics
● If each unit of the population has the same chance of selection (or if each sample of the same size from the same population has the same chance of selection), then the sample is called a SIMPLE RANDOM SAMPLE
● SAMPLING WITHOUT REPLACEMENT is when each sampling unit of the population has only one chance to be selected in the sample
  ○ eg. A ballot draw: winner 1 is selected and their ticket is not put back before the next draw
● SAMPLING WITH REPLACEMENT is when a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn
  ○ eg. A ballot draw: winner 1 is selected and their ticket is put back before the next draw
● “Each sample of the same size from the same population has the same chance of selection”
  ○ eg. A wedding with 320 people with 40 tables.
A card is drawn from a deck and whoever has that card under their table may get food first

Sorting and Classifying Data
● It is necessary to start by sorting or classifying the data in some way. There are two methods for sorting data:
  ○ (1) Sorting data into a stem-and-leaf plot
  ○ (2) Sorting data into a frequency distribution table
● A STEM-AND-LEAF PLOT is a good way to obtain an informative visual display of a data set in which each value has at least two digits. To construct a stem-and-leaf plot, use the following steps:
  ○ (1) Divide each number into two parts: a STEM, consisting of one or more of the leading digits, and a LEAF, consisting of the remaining digit
  ○ (2) List the stem values in a vertical column to the left
  ○ (3) Record the leaf for each observation beside its stem on the right

Stem-and-Leaf Plot
● Example 1: Consider the data in the following table. These data result from a 150-question aptitude test given to 50 individuals recently interviewed for a position at a manufacturing company. The data indicate the number of questions answered correctly.

112 72 69 97 107 73 92 76 86 73
126 128 118 127 124 82 104 132 134 83
92 108 96 100 92 115 76 91 102 81
95 141 81 80 106 84 119 113 98 75
68 98 115 106 95 100 85 94 106 119
● Refer to the table below
● To construct a stem-and-leaf display:
  ○ (1) We arrange the leading digits of each data value to the left of a vertical line
  ○ (2) To the right of the vertical line, we record the last digit for each data value

 6 | 89
 7 | 233566
 8 | 01123456
 9 | 12224556788
10 | 002466678
11 | 2355899
12 | 4678
13 | 24
14 | 1
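
The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `stem_and_leaf` is made up for this sketch, and it assumes every value is a whole number whose last digit is the leaf.

```python
from collections import defaultdict

# The 50 aptitude-test scores from Example 1.
scores = [
    112, 72, 69, 97, 107, 73, 92, 76, 86, 73,
    126, 128, 118, 127, 124, 82, 104, 132, 134, 83,
    92, 108, 96, 100, 92, 115, 76, 91, 102, 81,
    95, 141, 81, 80, 106, 84, 119, 113, 98, 75,
    68, 98, 115, 106, 95, 100, 85, 94, 106, 119,
]


def stem_and_leaf(values):
    """Return {stem: leaves}, where leaves is a string of sorted last digits."""
    plot = defaultdict(list)
    for value in sorted(values):
        # Step 1: split each number into a stem (leading digits) and a leaf (last digit).
        stem, leaf = divmod(value, 10)
        plot[stem].append(str(leaf))
    # Steps 2 and 3: stems in increasing order, leaves recorded beside each stem.
    return {stem: "".join(leaves) for stem, leaves in sorted(plot.items())}


plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {leaves}")
```

Each printed row is one stem followed by its leaves, so the length of a row equals the number of observations with that stem.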

Frequency Histogram
● A FREQUENCY HISTOGRAM is a more compact summary of data than a stem-and-leaf diagram
● To construct a frequency distribution, we must divide the range of the data into intervals, which are usually called class intervals, cells, or bins

Aptitude Bin | Frequency | Aptitude Bin | Frequency |
---|---|---|---|
75 | 6 | 1-75 | 6 |
90 | 10 | 76-90 | 10 |
105 | 15 | 91-105 | 15 |
120 | 12 | 106-120 | 12 |
135 | 6 | 121-135 | 6 |
136 | 1 | 136 + | 1 |
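
As a rough check, the bin frequencies can be recounted from the 50 raw scores. The sketch below assumes each bin includes its upper limit (so 75 falls in 1-75) and that the last bin is open-ended; `bin_frequencies` is a hypothetical helper name, not something from the notes.

```python
from bisect import bisect_left

# The 50 aptitude-test scores from the stem-and-leaf example.
scores = [
    112, 72, 69, 97, 107, 73, 92, 76, 86, 73,
    126, 128, 118, 127, 124, 82, 104, 132, 134, 83,
    92, 108, 96, 100, 92, 115, 76, 91, 102, 81,
    95, 141, 81, 80, 106, 84, 119, 113, 98, 75,
    68, 98, 115, 106, 95, 100, 85, 94, 106, 119,
]

# Upper limit of each closed bin; values above the last limit go to "136 +".
upper_limits = [75, 90, 105, 120, 135]
labels = ["1-75", "76-90", "91-105", "106-120", "121-135", "136 +"]


def bin_frequencies(values, limits):
    """Count how many values fall into each bin defined by its upper limit."""
    counts = [0] * (len(limits) + 1)
    for value in values:
        # bisect_left gives the index of the first limit >= value,
        # or len(limits) when the value exceeds every limit.
        counts[bisect_left(limits, value)] += 1
    return counts


frequencies = bin_frequencies(scores, upper_limits)
for label, count in zip(labels, frequencies):
    print(f"{label:>8}: {count}")
```

The counts add up to 50, which is a quick way to confirm that no observation was dropped or counted twice.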
● An UNGROUPED FREQUENCY TABLE is one in which each class consists of only one value
● RELATIVE FREQUENCY is the frequency of a class divided by the total number of observations
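
To make the relative-frequency calculation concrete, here is a small sketch that starts from the aptitude bin frequencies in the table and divides each by the total count. The variable names are illustrative only.

```python
# Bin frequencies taken from the aptitude-test frequency table.
frequencies = {
    "1-75": 6,
    "76-90": 10,
    "91-105": 15,
    "106-120": 12,
    "121-135": 6,
    "136 +": 1,
}

total = sum(frequencies.values())  # total number of observations

# Relative frequency = class frequency / total number of observations.
relative = {label: count / total for label, count in frequencies.items()}

# Percent frequency = relative frequency * 100.
percent = {label: 100 * value for label, value in relative.items()}

for label in frequencies:
    print(f"{label:>8}: {frequencies[label]:>2}  {relative[label]:.2f}  {percent[label]:.0f}%")
```

The relative frequencies always sum to 1 (and the percent frequencies to 100), which is a useful check on the arithmetic.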

Constructing a Histogram (Equal Bin Widths)
1. Label the bin (class interval) boundaries on a horizontal scale
2. Mark and label the vertical scale with the frequencies or relative frequencies
3. Above each bin, draw a rectangle whose height is equal to the frequency (or relative frequency) corresponding to that bin

Aptitude Bin | Frequency |
---|---|
1-75 | 6 |
76-90 | 10 |
91-105 | 15 |
106-120 | 12 |
121-135 | 6 |
136 + | 1 |
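
A plotted histogram needs a graphics package, but the idea can be sketched as text. The example below is only an approximation of the steps above: it turns the histogram on its side, so each bin is a row and the bar length (rather than height) equals the frequency. `text_histogram` is a name invented for this sketch.

```python
# (bin label, frequency) pairs from the aptitude-test table.
bins = [
    ("1-75", 6),
    ("76-90", 10),
    ("91-105", 15),
    ("106-120", 12),
    ("121-135", 6),
    ("136 +", 1),
]


def text_histogram(rows, symbol="*"):
    """Return one text line per bin, with a bar as long as the bin's frequency."""
    width = max(len(label) for label, _ in rows)  # align the bin labels
    return [f"{label:>{width}} | {symbol * freq} ({freq})" for label, freq in rows]


lines = text_histogram(bins)
print("\n".join(lines))
```

Even in this rough form the shape of the distribution is visible: the 91-105 bin has the longest bar, and the bars shorten toward both ends.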

Line Graph
● A LINE GRAPH shows how a variable changes over time. A line graph can be used for quantitative data. It shows the values of a variable over a given interval of time
● Example: the number of employees of a computer games company over nine years:

Year | Employees |
---|---|
1992 | 7 |
1993 | 15 |
1994 | 38 |
1995 | 112 |
1996 | 149 |
1997 | 371 |
1998 | 371 |
1999 | 508 |
2000 | 422 |
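
What a line graph makes visible is the change from one time point to the next. As a small sketch (the variable names are illustrative), the year-over-year changes in the employee table can be computed directly:

```python
# Employees per year, taken from the table above.
employees = {
    1992: 7, 1993: 15, 1994: 38, 1995: 112, 1996: 149,
    1997: 371, 1998: 371, 1999: 508, 2000: 422,
}

years = sorted(employees)

# Change from each year to the next: positive = growth, negative = decline.
changes = {
    year: employees[year] - employees[previous]
    for previous, year in zip(years, years[1:])
}

# Year with the most employees (the highest point on the line graph).
peak_year = max(employees, key=employees.get)

for year, change in changes.items():
    print(f"{year}: {change:+d}")
print("Peak year:", peak_year)
```

On the graph, a positive change is a rising segment, zero is a flat segment, and a negative change is a falling segment.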

Charts: Bar Charts, Pie Charts, and Plotting in Excel
● Charts and graphs are common features on the business pages of most daily newspapers, journals, internet sites, etc. Charts may be drawn for numeric and categorical data either manually or by using packages such as Microsoft Excel
● Qualitative data use: pie charts, bar charts
● Quantitative data use: stem-and-leaf displays, histograms
● On one axis of the graph (usually the horizontal axis), we specify the labels that are used for the classes (categories)
● A frequency, relative frequency, or percent frequency scale can be used for the other axis of the chart (usually the vertical axis)
● Then, using a bar of fixed width drawn above each class label, we extend the length of the bar until we reach the frequency, relative frequency, or percent frequency of the class
● For categorical data, the bars should be separated to emphasize the fact that each class is separate

Bar Charts
● A BAR CHART is a graphical tool to display categorical data summarized in a frequency, relative frequency, or percent frequency distribution

Example: Bar Charts
● Example 1: The statistics relating to the England-Ecuador match in the defensive phase for the 2006 World Cup are given in the following table. Use a suitable plot to compare the performance of the teams and comment

England | Defensive Phase | Ecuador |
---|---|---|
36 | Header | 26 |
32 | Balls won | 32 |
2.46 | Balls won/Fouls committed | 1.28 |
22 | Successful tackles | 14 |
59 | % of successful tackles | 38 |
25 | Interceptions | 24 |
1 | Saves | 3 |
● England scored higher in every category except balls won (a tie at 32 each) and saves, where Ecuador was higher (this is the analysis)
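
The comparison can also be checked category by category. The sketch below uses the values from the defensive-phase table; the helper name `leader` is an assumption made for this illustration.

```python
# (England, Ecuador) values for each defensive-phase category in the table.
stats = {
    "Header": (36, 26),
    "Balls won": (32, 32),
    "Balls won/Fouls committed": (2.46, 1.28),
    "Successful tackles": (22, 14),
    "% of successful tackles": (59, 38),
    "Interceptions": (25, 24),
    "Saves": (1, 3),
}


def leader(england, ecuador):
    """Name the team with the higher value, or report a tie."""
    if england > ecuador:
        return "England"
    if ecuador > england:
        return "Ecuador"
    return "Tie"


leaders = {category: leader(*values) for category, values in stats.items()}
for category, team in leaders.items():
    print(f"{category}: {team}")
```

In a side-by-side bar chart the same result shows up as the taller bar in each pair, with equal bars where the teams tie.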
● Example 2: Coke, Diet Coke, Dr. Pepper, Pepsi, and Sprite are five popular soft drinks. Assume that the data in the table show the soft drink selected in a sample of 50 soft drink purchases. Draw the frequency distribution

Soft Drink | Frequency | Relative Frequency |
---|---|---|
Coke | 19 | 0.38 |
Sprite | 5 | 0.10 |
Pepsi | 13 | 0.26 |
Diet Coke | 8 | 0.16 |
Dr. Pepper | 5 | 0.10 |

Soft drink purchases (Diet = Diet Coke, Dr. P = Dr. Pepper):
Coke, Pepsi, Sprite, Pepsi, Pepsi
Coke, Diet, Pepsi, Coke, Coke
Coke, Coke, Pepsi, Pepsi, Diet
Dr. P, Coke, Pepsi, Diet, Pepsi
Coke, Sprite, Coke, Coke, Coke
Coke, Diet, Coke, Pepsi, Sprite
Coke, Dr. P, Coke, Diet, Dr. P
Dr. P, Dr. P, Pepsi