STAT 1401 - Chapter 1 PDF

Title	STAT 1401 - Chapter 1
Course	Discrete Mathematics
Institution	University of Winnipeg


Summary

Sohail Khan...

Description

DATA COLLECTION & ITS GRAPHICAL PRESENTATION
--------------------------------------------------------------------------------
UNIT 1 - LECTURE 1

What is Statistics?
● STATISTICS is the science of collecting, summarizing, analyzing, and interpreting information (data) and drawing conclusions from it
○ NOTE: Example of a test question: “Is a population a statistic?” etc.
● Statistics has 2 types:
○ DESCRIPTIVE statistics is the science of describing the important aspects of a set of measurements. It consists of methods for organizing and summarizing information
○ INFERENTIAL statistics is needed because we usually have to sample; if the population is small enough, we could take a census instead and would not have to sample or make any statistical inferences
■ CENSUS is a complete count of a population
● DESCRIPTIVE STATISTICS includes the construction of graphs, charts, and tables and the calculation of various descriptive measures such as averages, measures of variation, and percentiles
○ i.e. Describes data (e.g., with a chart or graph)
○ Helps to summarize and analyze data but does not draw conclusions beyond the data
○ QUESTION: Can I use descriptive statistics on both the population and a sample?
■ A: Yes
● INFERENTIAL STATISTICS consists of methods that use sample information to draw conclusions about the population
○ i.e. Allows you to make predictions (“inferences”) from the data
○ Inferential statistics is needed only when you don’t know the truth (the population values)
○ Helps you infer or conclude what is unknown to you
○ The idea is to use a subset (sample) of the population
● Statisticians analyze the information obtained from a sample of the (e.g., voting) population to make inferences (draw conclusions) about the preferences of the entire voting population (individuals aged 18+)
○ Inferential statistics provides methods for drawing such conclusions

Populations and Samples
● POPULATION is the entire group of individuals that we want information about
○ i.e. The largest collection of values of a random variable in which we have an interest at a particular time
○ e.g. We are interested in the weights of all the children enrolled in a certain county elementary school system
○ Populations can be finite (consisting of a fixed number of values) or infinite (consisting of an endless succession of values)
● A SAMPLE is a group of individuals from the population that we examine in order to gather information
○ i.e. Part of a population
○ If I know the entire population, there is no use in taking a sample from it
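To make the population/sample distinction concrete, here is a minimal Python sketch. The population is a hypothetical list of 1,000 voter ages generated at random (it is not data from these notes). The population average is one fixed number, while the average of each random sample of 30 changes from sample to sample; this is the parameter/statistic idea discussed under Parameters and Statistics.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 eligible voters (illustrative only).
population_rng = random.Random(1)
population = [population_rng.randint(18, 90) for _ in range(1000)]

# Population mean (a parameter): uses every member of the population.
mu = statistics.mean(population)


def sample_mean(pop, n, rng):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.mean(rng.sample(pop, n))


sampling_rng = random.Random(42)
x_bars = [sample_mean(population, 30, sampling_rng) for _ in range(5)]

print(f"population mean: {mu:.2f}")
for i, x_bar in enumerate(x_bars, start=1):
    print(f"sample {i} mean: {x_bar:.2f}")  # differs from sample to sample
```

If the "sample" is the entire population (a census), the sample mean equals the population mean exactly, which is why sampling is pointless when the whole population is known.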

Example: Descriptive Statistics
● The table below shows data on 25 mutual funds. Descriptive statistics can be used to provide summaries of these data. For example, a tabular summary of the data for the categorical variable fund type is shown in the table. A graphical summary of the same data, called a bar chart, is shown in the figure. These types of tabular and graphical summaries generally make the data easier to interpret. From the table and figure, we can see that the majority of the mutual funds are of the Domestic Equity type.
● FREQUENCY is the number of times a data value occurs (count)
● The percent frequency is found by dividing the frequency by the total number of data values (25) and multiplying by 100
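The frequency and percent calculations above can be sketched in Python with `collections.Counter`. The fund list below is hypothetical (five made-up entries, not the 25-fund table from the notes); only the arithmetic, frequency divided by the number of values, is taken from the text.

```python
from collections import Counter


def frequency_table(values):
    """Map each category to (frequency, relative frequency, percent frequency)."""
    n = len(values)
    counts = Counter(values)
    return {
        category: (freq, freq / n, 100 * freq / n)
        for category, freq in counts.items()
    }


# Hypothetical fund types for 5 funds (not the 25-fund table from the notes).
fund_types = ["Domestic Equity", "Domestic Equity", "Other",
              "Domestic Equity", "Other"]

for category, (freq, rel, pct) in frequency_table(fund_types).items():
    print(f"{category:<16} {freq:>3} {rel:>5.2f} {pct:>6.1f}%")
```

With the 25 mutual funds, n would be 25, so a fund type that appears 5 times would have a percent frequency of 5/25 × 100 = 20%.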

Example: Inferential Statistics
● Suppose a researcher wants to use sample data to make an inference about the average hours of useful life for the population of all light bulbs that could be produced with the new filament
● Suppose he has a random sample of 200 light bulbs and the average lifetime of these bulbs is 76 hours
● We can use this sample result to estimate that the average lifetime of the light bulbs in the population is 76 hours
● This is an example of survival analysis
● What we would do instead (of testing every bulb): take a random sample, let the bulbs run, and base the results on the 200 sampled bulbs to generalize to the population

Example: Descriptive and Inferential Statistics
● You might stand in a mall and ask a sample of 100 people if they like shopping at Forever 21. You could make a bar chart of the yes or no answers (this is descriptive statistics), or you could use your research (this is inferential statistics) to reason that around 75-80% of the population (all shoppers in all malls) like shopping at Forever 21

Parameters and Statistics
● A PARAMETER is a number that describes some aspect of the population
○ Usually unknown
○ The Greek letter μ (mu) denotes the average (mean) of the population
■ Generally not known because the population is not known
● A STATISTIC is a number that is computed from the sample
○ Often used to estimate the unknown parameter
● If parameters are known to us, they are constants and we won’t need statistics
○ If they’re not known, we use statistics to estimate them
● x̄ (x-bar), the sample mean, is a statistic; it is a variable because its value varies from sample to sample

Example: Descriptive vs. Inferential Statistics
● Example 1: 30 of the 198 students enrolled in Statistics 1501 were asked whether they wanted test 2 to be a take-home or an in-class assessment. 20, or about 67%, of the students polled indicated a preference for an in-class exam.
The professor concluded that all students in Statistics 1501 would prefer an in-class examination for the second assessment
● QUESTION: Did the professor perform a descriptive study or an inferential study?
● ANSWER: He performed an inferential study, because he took a sample (30) of the students from the total number of students (198) and drew a conclusion about all of them

Basic Terms
● An ELEMENT OR MEMBER of a sample or population is a specific subject or object (e.g., a person, firm, item, state, or country) about which the information is collected
● A VARIABLE is a characteristic or condition that can change or take on different values
○ Most studies begin with a general question about the relationship between two variables for a specific group of individuals
○ The value of a variable for an element is called an OBSERVATION OR MEASUREMENT
● A DATA SET is a collection of observations on one or more variables

Variables and Data
● A VARIABLE is a characteristic which may change its value while it is under observation
○ i.e. A characteristic that varies
○ e.g. Height of a child, temperature across a province
● A variable can be quantitative (numerical) or qualitative (attribute, categorical)
● A RANDOM VARIABLE is a variable whose values arise as a result of chance factors, so that they cannot be exactly predicted in advance
● A QUANTITATIVE (NUMERICAL) VARIABLE takes meaningful numeric values that represent a measurable quantity
○ e.g. Height, weight, etc.

○ There are different ways to sort quantitative data
● A QUALITATIVE (ATTRIBUTE, CATEGORICAL) VARIABLE does not take meaningful numerical values
○ A categorical variable places an individual into one of several groups or categories
○ You can assign numerical values to it, but they have no meaning
○ e.g. A sick person is given a medical diagnosis; a person, place, or object is designated as belonging to a group (such as an ethnic group)
● Quantitative variables can be:
○ DISCRETE, which take a finite (or countably infinite) number of values
■ e.g. Shoe size
○ CONTINUOUS, which can take any value in an interval (infinitely many values)
■ e.g. Shoe length, time, height, weight
● Values taken by the variables are called DATA, which are the raw materials of statistics
○ Data are often numbers
○ Data are the values taken by the variable
○ Data can be quantitative or qualitative
○ Look at the variable FIRST (if the variable is quantitative, the data are quantitative, and likewise if it is qualitative)

Example: Types of Variables
● A (QUALITATIVE) CATEGORICAL/ATTRIBUTE VARIABLE places an individual into one of several groups or categories
○ Recall: You can assign numerical values, but they have no mathematical meaning
○ A yes or no question (e.g. Did you pay income tax?)
○ e.g. Your blood type (A, B, AB, O), your hair colour, your ethnicity, whether or not you paid income tax last year
○ The genders (male/female) of professional athletes; the shirt numbers on professional athletes’ uniforms (substitutes for names)
● A QUANTITATIVE VARIABLE takes on numerical values on which arithmetic operations can be performed
○ Recall: Numerical values have mathematical meaning
○ Talking about the amount (e.g. How much income tax did you pay?)
○ e.g. How tall you are, your age, your blood cholesterol level, the number of credit cards you own

Distribution
● The DISTRIBUTION of a variable describes what values the variable takes on and how often

Example: Distribution of a Categorical Variable
● M&M’s plain chocolate candies: 30% browns, 20% each of yellows and reds, and 10% each of oranges, greens, and tans
● Colour = qualitative
● Number of M&Ms of each colour = quantitative
● This is a colour distribution

Observational Studies and Designed Experiments
● An OBSERVATIONAL STUDY observes individuals and measures variables of interest but does not attempt to influence the responses
○ i.e. You observe, record, summarize, analyze, and interpret, all on the basis of your observations
○ Cannot establish cause-and-effect relationships
● A DESIGNED EXPERIMENT (or simply EXPERIMENT) deliberately imposes some treatment on individuals in order to observe their responses
○ i.e. The researcher gets actively involved and changes conditions to see what happens in response
○ Can establish cause and effect

Data Collection: Samples, Surveys, and Experiments
● Data can be collected through surveys (observational studies) or experiments

● The goal of every statistical study is to collect data and then use the data to make a good decision
● Any decision you make using the results of a statistical study is only as good as the process used to obtain the data
● If the process is flawed, then the resulting decision will be questionable and meaningless

Planning a Survey:
● (1) Objectives of the study
○ Why do I want to do this study?
○ What is the purpose?
● (2) Define the target population
○ e.g. The voting population is not everyone in Manitoba, only those eligible to vote (18+)
● (3) Sampling methods
● (4) Prepare the questionnaire
● (5) Conduct the survey
● (6) Administer the survey

Data Collection Types
● CENSUS is a survey that includes every member of the population
○ i.e. A complete count of the population
● SAMPLE SURVEY is the technique of collecting information from a portion of the population
● REPRESENTATIVE SAMPLE is a sample that represents the characteristics of the population as closely as possible
○ e.g. If I wanted to say something about the entire class, my population should be the entire class

Random Sample vs. Non-Random Sample
● A RANDOM SAMPLE is a sample drawn in such a way that each member of the population has some chance of being selected
● In a NON-RANDOM SAMPLE, some members of the population may not have any chance of being selected in the sample
● Random does not mean chaos; it means chance has to play a role

Application of Randomness to Statistics
● If each unit of the population has the same chance of selection (or if each sample of the same size from the same population has the same chance of selection), then the sample is called a SIMPLE RANDOM SAMPLE
● SAMPLING WITHOUT REPLACEMENT is when each sampling unit of the population has only one chance to be selected in the sample
○ e.g. A ballot draw: winner 1 is selected and their ticket is not put back before the next draw
● SAMPLING WITH REPLACEMENT is when a sampling unit is drawn from a finite population and is returned to that population after its characteristic(s) have been recorded, before the next unit is drawn
○ e.g. A ballot draw: winner 1 is selected and their ticket is put back before the next draw
● “Each sample of the same size from the same population has the same chance of selection”
○ e.g. A wedding with 320 people at 40 tables. A card is drawn from a deck, and whoever has that card under their table may get food first

Sorting and Classifying Data
● It is necessary to start by sorting or classifying the data in some way. There are two methods for sorting data:
○ (1) Sorting data into a stem-and-leaf plot
○ (2) Sorting data into a frequency distribution table
● A STEM-AND-LEAF PLOT is a good way to obtain an informative visual display of a data set whose values have at least two digits. To construct a stem-and-leaf plot, use the following steps:
○ (1) Divide each number into two parts: a STEM, consisting of one or more of the leading digits, and a LEAF, consisting of the remaining digit
○ (2) List the stem values in a vertical column to the left
○ (3) Record the leaf for each observation beside its stem on the right

Stem-and-Leaf Plot
● Example 1: Consider the data in the following table. These data result from a 150-question aptitude test given to 50 individuals recently interviewed for a position at a manufacturing company. The data indicate the number of questions answered correctly.

112   72   69   97  107   73   92   76   86   73
126  128  118  127  124   82  104  132  134   83
 92  108   96  100   92  115   76   91  102   81
 95  141   81   80  106   84  119  113   98   75
 68   98  115  106   95  100   85   94  106  119

 ● ●

Refer to the table below To construct a stem-and-leaf display: ○ (1) We arrange the leading digits of each data value to the left of a vertical line ○ (2) To the right of the vertical line, we record the last digit for each data value

 6

89

7

233566

8

01123456

9

12224556788

10

002455578

11

2355899

12

4678

13

24

14

1
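The stem-and-leaf steps can be sketched in Python: split each score with `divmod(value, 10)` into a stem (leading digits) and a leaf (last digit), then list the leaves in order beside each stem. The function name `stem_and_leaf` is illustrative, not from the notes. Run on the 50 aptitude scores, the 10 stem comes out as 002466678 (scores 100, 100, 102, 104, 106, 106, 106, 107, 108).

```python
from collections import defaultdict

# The 50 aptitude-test scores from Example 1.
scores = [112, 72, 69, 97, 107, 73, 92, 76, 86, 73,
          126, 128, 118, 127, 124, 82, 104, 132, 134, 83,
          92, 108, 96, 100, 92, 115, 76, 91, 102, 81,
          95, 141, 81, 80, 106, 84, 119, 113, 98, 75,
          68, 98, 115, 106, 95, 100, 85, 94, 106, 119]


def stem_and_leaf(data):
    """Return {stem: [leaves]}, with stem = leading digits and leaf = last digit."""
    plot = defaultdict(list)
    for value in sorted(data):          # sorting first keeps leaves in order
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    # Include any empty stems between the smallest and largest stem.
    lo, hi = min(plot), max(plot)
    return {stem: plot.get(stem, []) for stem in range(lo, hi + 1)}


for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```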

Frequency Histogram
● A FREQUENCY HISTOGRAM is a more compact summary of data than a stem-and-leaf diagram
● To construct a frequency distribution, we must divide the range of the data into intervals, which are usually called class intervals, cells, or bins

Aptitude Bin   Frequency      Aptitude Bin   Frequency
75             6              1-75           6
90             10             76-90          10
105            15             91-105         15
120            12             106-120        12
135            6              121-135        6
136            1              136 +          1
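Building the frequency distribution means counting how many scores fall in each class. A minimal Python sketch, assuming inclusive integer class limits (this works because every score is a whole number) and using `None` as a stand-in for the open-ended upper limit of the 136 + class; `class_frequencies` is an illustrative name.

```python
# The 50 aptitude-test scores from Example 1.
scores = [112, 72, 69, 97, 107, 73, 92, 76, 86, 73,
          126, 128, 118, 127, 124, 82, 104, 132, 134, 83,
          92, 108, 96, 100, 92, 115, 76, 91, 102, 81,
          95, 141, 81, 80, 106, 84, 119, 113, 98, 75,
          68, 98, 115, 106, 95, 100, 85, 94, 106, 119]

# Classes from the table; (136, None) represents the open-ended "136 +" class.
classes = [(1, 75), (76, 90), (91, 105), (106, 120), (121, 135), (136, None)]


def class_frequencies(data, classes):
    """Count how many values fall in each class (limits are inclusive)."""
    counts = []
    for low, high in classes:
        if high is None:
            counts.append(sum(1 for x in data if x >= low))
        else:
            counts.append(sum(1 for x in data if low <= x <= high))
    return counts


print(class_frequencies(scores, classes))
```

The counts reproduce the Frequency column and add up to the 50 individuals tested.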



 ● ●

An UNGROUPED  FREQUENCY TABLE is  when there is only one value RELATIVE FREQUENCY is the total number of frequency divided by the frequency
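Applied to the aptitude-test classes, relative frequency is each class frequency divided by n = 50. A short Python sketch (the frequencies are copied from the table; the variable names are illustrative):

```python
labels = ["1-75", "76-90", "91-105", "106-120", "121-135", "136 +"]
frequencies = [6, 10, 15, 12, 6, 1]

n = sum(frequencies)                      # total number of observations (50)
relative = [f / n for f in frequencies]   # relative frequency = frequency / n

for label, f, r in zip(labels, frequencies, relative):
    print(f"{label:<8} {f:>3} {r:.2f}")
print(f"{'total':<8} {n:>3} {sum(relative):.2f}")
```

The relative frequencies are 0.12, 0.20, 0.30, 0.24, 0.12, and 0.02, and they always sum to 1.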

Constructing a Histogram (Equal Bin Widths)
1. Label the bin (class interval) boundaries on a horizontal scale
2. Mark and label the vertical scale with the frequencies or relative frequencies
3. Above each bin, draw a rectangle whose height is equal to the frequency (or relative frequency) corresponding to that bin

Aptitude Bin   Frequency
1-75           6
76-90          10
91-105         15
106-120        12
121-135        6
136 +          1
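As a rough text-only stand-in for step 3, the sketch below prints one bar per bin whose length equals the bin's frequency. It turns the histogram on its side (bins listed vertically, bars drawn horizontally), and `text_histogram` is an illustrative name rather than a library function.

```python
labels = ["1-75", "76-90", "91-105", "106-120", "121-135", "136 +"]
frequencies = [6, 10, 15, 12, 6, 1]


def text_histogram(labels, freqs, symbol="#"):
    """One bar per bin; the bar length equals the bin's frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * f} ({f})"
            for label, f in zip(labels, freqs)]


for line in text_histogram(labels, frequencies):
    print(line)
```

Replacing the frequencies with relative frequencies would not change the shape of the bars, only the scale.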

 



Line Graph
● A LINE GRAPH shows how a variable changes over time. Line graphs are used for quantitative data: they show the values of a variable over a given interval of time
● The example below shows the number of employees of a computer games company over nine years:

Year   Employees
1992   7
1993   15
1994   38
1995   112
1996   149
1997   371
1998   371
1999   508
2000   422
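A line graph joins consecutive (year, employees) points, so the rise or fall of each segment is the change from one year to the next. Instead of drawing the chart, this minimal Python sketch computes those changes from the table; it shows, for example, the flat segment from 1997 to 1998 and the drop in 2000.

```python
years = list(range(1992, 2001))
employees = [7, 15, 38, 112, 149, 371, 371, 508, 422]

# Change from the previous year = rise (or fall) of each line segment.
changes = {year: now - before
           for year, now, before in zip(years[1:], employees[1:], employees)}

for year, change in changes.items():
    print(f"{year}: {change:+d}")
```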

  

Charts: Bar Charts, Pie Charts, and Plotting in Excel
● Charts and graphs are common features on the business pages of most daily newspapers, journals, internet sites, etc. Charts may be drawn for numeric and categorical data either manually or by using packages such as Microsoft Excel
● Qualitative data use: pie charts, bar charts
● Quantitative data use: stem-and-leaf displays, histograms
● On one axis of the chart (usually the horizontal axis), we specify the labels that are used for the classes (categories)
● A frequency, relative frequency, or percent frequency scale can be used for the other axis of the chart (usually the vertical axis)
● Then, using a bar of fixed width drawn above each class label, we extend the length of the bar until we reach the frequency, relative frequency, or percent frequency of the class

● For categorical data, the bars should be separated to emphasize the fact that each class is separate

Bar Charts
● A BAR CHART is a graphical tool to display categorical data summarized in a frequency, relative frequency, or percent frequency distribution

Example: Bar Charts
● Example 1: The statistics relating to the England-Ecuador match in the defensive phase for the 2006 World Cup are given in the following table. Use a suitable plot to compare the performance of the teams and comment.

England   Defensive Phase              Ecuador
36        Header                       26
32        Balls won                    32
2.46      Balls won/Fouls committed    1.28
22        Successful tackles           14
59        % of successful tackles      38
25        Interceptions                24
1         Saves                        3

 ●



England scored higher in all categories, except for saves (this is the analysis)
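One suitable plot is a side-by-side bar chart with a pair of bars (England, Ecuador) for each category. Before drawing it, the category-by-category comparison can be checked with a short Python sketch; the values are copied from the table, and `compare_teams` is an illustrative helper, not from the notes.

```python
# (England, Ecuador) values for each defensive-phase statistic in the table.
defence = {
    "Header": (36, 26),
    "Balls won": (32, 32),
    "Balls won/Fouls committed": (2.46, 1.28),
    "Successful tackles": (22, 14),
    "% of successful tackles": (59, 38),
    "Interceptions": (25, 24),
    "Saves": (1, 3),
}


def compare_teams(stats, first="England", second="Ecuador"):
    """Return which team has the larger value in each category ("Tie" if equal)."""
    result = {}
    for category, (a, b) in stats.items():
        if a > b:
            result[category] = first
        elif b > a:
            result[category] = second
        else:
            result[category] = "Tie"
    return result


for category, leader in compare_teams(defence).items():
    print(f"{category:<27} {leader}")
```

The output shows England ahead in five categories, a tie in balls won, and Ecuador ahead in saves.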

 ●

Example 2: Coke, Diet Coke, Dr. Pepper, Pepsi, and Sprite are five popular soft drinks. Assume that the data in the table show the soft drink selected in a sample of 50 soft drink purchases. Draw the frequency distribution







Soft Drink   Frequency   Relative Frequency
Coke         19          0.38
Diet Coke    8           0.16
Dr Pepper    5           0.10
Pepsi        13          0.26
Sprite       5           0.10

Sample purchases (Diet = Diet Coke, Dr. P = Dr Pepper):
Coke    Pepsi   Sprite  Pepsi   Pepsi
Coke    Diet    Pepsi   Coke    Coke
Coke    Coke    Pepsi   Pepsi   Diet
Dr. P   Coke    Pepsi   Diet    Pepsi
Coke    Sprite  Coke    Coke    Coke
Coke    Diet    Coke    Pepsi   Sprite
Coke    Dr. P   Coke    Diet    Dr. P
Dr. P   Dr. P   Pepsi