Hypothesis Testing Notes with Examples
Course: Quantitative Business Analysis
Institution: University of Strathclyde


University of Strathclyde, MS922D Quantitative Business Analysis

T7. Hypothesis Testing & Contingency Tables

Introduction

This section will explore two topics. Firstly, through hypothesis testing we will consider such problems as assessing whether a specific probability distribution is an adequate model for a population. Secondly, through contingency tables we will consider hypothesis testing with categorical data, such as whether or not there are significant differences between genders or nationalities. The objectives within this section are:

• Understand the steps involved in hypothesis testing
• Understand what is meant by Type I and Type II errors
• Understand how to test for independence

Steps in Constructing a Hypothesis Test

A statistical hypothesis is a statement about a parameter in a probability distribution, which describes some characteristic of the population of interest. Hypothesis testing begins with a statement of the hypothesis that you wish to test and an alternative hypothesis that you wish to test it against.

For example, we may have a claim that the proportion of defective items from a manufacturing process is 20%. If we wish to confirm or refute this, we could state that the hypothesis we are testing is that the proportion of defectives is 20%. We then need to specify what we are going to test this against. You are probably thinking "against it not being 20%", and we can do this. However, we may only be interested in whether the proportion of defectives is greater than 20%, in which case the alternative hypothesis is that it is greater than 20%. As such, sometimes we use two-sided alternatives (e.g. not 20%) and sometimes one-sided alternatives (e.g. greater than 20%).

A simple description of the process of hypothesis testing is this: assume that your hypothesis is true (we call this the null hypothesis), determine a reasonable range of values for a particular statistic under this assumption, then collect the data and evaluate the statistic to see whether it lies in this reasonable range of values. If it does, the data supports the hypothesis; if it falls outside, it does not.

Reasonable range of values

That very quick description provokes many questions; the first we want to answer is: what is a reasonable range of values? This is answered through the determination of a significance level. We are assessing populations, so our observations are subject to variation. As such, by pure chance we can obtain data that is not representative of the population.
In our manufacturing process example, even if the defective rate is 20%, there is a chance, albeit very small, that we could observe 100 defectives in a sample of 100. However, if you witnessed 100 defectives in a sample of 100 items you would be more inclined to believe that the defective rate was much higher than 20%. As such, we need to assess a reasonable interval in which we are likely to observe our statistic of interest if the null hypothesis is true. "Likely" is the key to defining that reasonable interval. We can define an interval such that there is a 90% chance that the test statistic lies in the interval, which we would call a 10% significance level, or a bigger interval such that the test statistic has a 95% chance of falling in the interval, which we would call a 5% significance level. The significance level measures the probability that, if the null hypothesis were true, the test statistic falls outside the reasonable interval.

Example: Defects

Let's re-visit our manufacturing example. What would be a reasonable interval to construct if our test statistic were the number of defectives in a sample of 100 items chosen at random, and we wanted a one-sided test where the alternative hypothesis is that the defective proportion is greater than 20%?
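One way to answer this is to evaluate the Binomial(100, 0.2) distribution directly; a minimal sketch using only the standard library:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# If the true defective rate is 20%, chance of at most 26 / at most 27 defectives
p26 = binom_cdf(26, 100, 0.2)   # about 0.944
p27 = binom_cdf(27, 100, 0.2)   # about 0.966

# Choosing "26 or fewer" as the accept region gives the significance level
significance = 1 - p26          # about 0.056, i.e. roughly 5.6%
```

The same cumulative probabilities could equally be read off statistical tables or computed with scipy.stats.binom.cdf.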

From the plots of the Binomial(100, 0.2) distribution we can see that if the defective rate is 20% then there is a 94.4% chance of observing 26 or fewer defectives, and a 96.6% chance of observing 27 or fewer defectives. So we make a choice for our reasonable region to be 26 or fewer, and this corresponds to a significance level of 5.56%. After determining this, all we need to do is take our random sample of 100 and note whether the number of defectives lies in our interval or not. If we were interested in a two-sided alternative hypothesis, we would construct an interval such that there was a 2.5% chance (or close to it) of the number of defectives being above the region (assuming the null hypothesis) and a 2.5% chance of the number of defectives being below the region.

Exercise 1: Biscuit Machine

The foreman on the shop floor, where machines produce chocolate biscuits, suspects that one machine has been set wrongly, so that it no longer produces biscuits with, on average, the same weight as the other machines. He takes a random sample of 60 biscuits from the output of the suspect machine (let's call this machine 1). These have a mean weight of 12.7 gms and a standard deviation of 0.9 gms. A random sample of 140 biscuits from the other machines has a mean weight of 13 gms and a standard deviation of 0.5 gms.

• Test the hypothesis that the suspect machine is producing biscuits, on average, lighter than the other machines. Use a 5% significance level.

Solution: Biscuit Machine

The hypotheses for this test are as follows:

Null hypothesis H0: the average weight of biscuits from machine 1 is the same as the average weight of biscuits from the other machines, i.e. H0: μ1 = μ2

Alternative hypothesis H1: the average weight of biscuits from machine 1 is less than the average weight of biscuits from the other machines, i.e. H1: μ1 < μ2

As both samples are large (n > 30), the difference in sample means is approximately normally distributed, with mean μ1 - μ2 and standard error √(0.9²/60 + 0.5²/140) ≈ 0.124.
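The arithmetic can be sketched as follows (the course solution expresses the result as the upper limit of a one-sided 95% confidence interval for μ1 - μ2; the equivalent z statistic leads to the same conclusion):

```python
from math import sqrt

# Sample summaries: machine 1 versus the other machines
n1, mean1, sd1 = 60, 12.7, 0.9
n2, mean2, sd2 = 140, 13.0, 0.5

diff = mean1 - mean2                     # -0.3 gms
se = sqrt(sd1**2 / n1 + sd2**2 / n2)     # standard error of the difference, ~0.124

z = diff / se                            # test statistic, roughly -2.43
upper = diff + 1.645 * se                # upper one-sided 95% limit, roughly -0.096

reject_h0 = upper < 0                    # equivalently, z < -1.645
```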

As the upper limit of the one-sided confidence interval for μ1 - μ2, -0.096, is still smaller than zero, we can reject H0. Thus machine 1 is producing biscuits that are, on average, lighter than those from the other machines. [Note that the case of comparing the means of two small samples (less than 30) is beyond the course scope.]

A Tale of Two Errors

When we conduct a hypothesis test we are subject to two different types of error. A Type I error occurs when we reject the null hypothesis when it is true. This occurs when we obtain a test statistic outside our region. The probability of this occurring is measured by the significance level of the test we are conducting. The notion of Type I errors might lead you to think that we should have big regions and hence reduce the likelihood of this occurring. Hold that thought for a moment! We have to draw the line somewhere, or we'll be accepting any observation as support for our hypothesis. As we increase the size of the region we decrease the likelihood of a Type I error, but increase the likelihood of accepting the null hypothesis when it is not true. This error is called a Type II error.

Let's re-visit our example with the manufacturing process. We defined the region for a one-sided hypothesis test to be 26 or fewer defects from a sample of 100, within which we would accept the null hypothesis that the proportion of defects is 20%. Assuming that the defective rate was actually 30%, what is the probability that we could observe 26 or fewer defects?
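The question above can be answered with the same binomial calculation as before (a sketch):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability of wrongly accepting H0 (p = 0.2) when the true rate is 0.3,
# i.e. observing 26 or fewer defectives from Binomial(100, 0.3)
type_ii = binom_cdf(26, 100, 0.3)   # about 0.224
```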

Example: Defects - cont'd

There is in fact a 22.4% chance that a process with a defective rate of 30% will result in 26 or fewer defectives. Illustrated in the figure is the probability of a Type II error for various proportions in this situation.

Example: Defects - cont'd 2

We can see that the test discriminates very well against proportions in excess of 0.3, but there is a noticeable probability of a Type II error for proportions between 0.2 and 0.3.

Exercise 2: Normal

Work it out for a Normal

A random sample of 100 people is to be selected and their salaries recorded. We wish to test the hypothesis that the mean salary is £25000.

i. Assuming the population standard deviation is £10000, construct a two-sided accept region with a 5% significance level.
ii. What is the probability of a Type II error, testing against a mean of £22500?

Solution: Normal

The hypotheses are as follows:

Null hypothesis: mean salary = £25000
Alternative hypothesis: mean salary ≠ £25000

i. Accept Region

The standard error of the sample mean is 10000/√100 = 1000, so a two-sided accept region at the 5% significance level is 25000 ± 1.96 × 1000, i.e. from £23040 to £26960.
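Both parts of this exercise can be checked numerically; a sketch using the standard normal CDF built from math.erf:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

mu0, sigma, n = 25000, 10000, 100
se = sigma / sqrt(n)                  # standard error of the mean, 1000

# i. Two-sided accept region at the 5% significance level
lower = mu0 - 1.96 * se               # 23040
upper = mu0 + 1.96 * se               # 26960

# ii. Type II error against a true mean of 22500: the probability that the
# sample mean still lands inside the accept region
mu1 = 22500
type_ii = phi((upper - mu1) / se) - phi((lower - mu1) / se)   # about 0.29
```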

If the mean salary of the sample of 100 people is within this region, then we accept the null hypothesis and conclude that the data is consistent with a population mean salary of £25000.

ii. Probability of a Type II Error

The probability of an average of 100 normally distributed random variables with a mean of 22500 and a standard deviation of 1000 being greater than 23040 is about 0.3. This is calculated as follows:

P(Z > (23040 - 22500)/1000) = P(Z > 0.54) = 0.2946 ≈ 0.3

Testing Goodness of Fit

We have covered hypothesis testing for parameters within models. Now we turn our attention to testing models against data. We can assess the goodness of fit of a probability distribution against a set of data as a hypothesis test, where the null hypothesis is that the proposed distribution is a true reflection of the distribution of the population from which the sample was chosen. As in the previous section, we determine a significance level and an appropriate test statistic, assess its distribution to determine the acceptance region, then calculate the statistic and test the hypothesis. There are a few different approaches to assessing the goodness of fit of a probability model to data. We will consider the Pearson chi-squared test, which is appropriate for situations where our data is recorded in intervals. Even if your data is not in intervals, you can always group it into intervals.

The Pearson χ² Test

For this test we start with our proposed model as the null hypothesis. Using this distribution we estimate the number of observations we would expect to obtain in specified intervals for a sample of the same size as the one we are assessing the distribution against. These intervals may be decided based on the data with which we are working or chosen arbitrarily. We then compare this with the observed frequencies within each interval. The test statistic considers the squared differences between the observed frequencies and the expected number of observations for each interval, divided by the expected number of observations. The formula is:

χ² = Σ (Oi - Ei)²/Ei, summing over the k intervals

Where:

• k is the number of intervals
• Oi is the number of observations in the i-th interval
• Ei is the expected number of observations in the i-th interval

Under the assumption of the null hypothesis, i.e. that the underlying distribution is correct, this statistic has a χ² distribution with k - 1 degrees of freedom.

What is a χ² distribution?

The χ² distribution is a probability distribution. It has one parameter, which is referred to as the degrees of freedom. The distribution comes about as the sum of squared independent standard normal random variables. Consider the following: let Z1, Z2, ..., Zk be independent normal random variables, each with a mean of 0 and a standard deviation of 1. Then:

X = Z1² + Z2² + ... + Zk²

has a χ² distribution with k degrees of freedom, which has the following density:

f(x) = x^(k/2 - 1) e^(-x/2) / (2^(k/2) Γ(k/2)), for x > 0

This is in fact a special case of a Gamma distribution. Its mean is k and its variance is 2k; as such, both the mean and the variance grow linearly with the degrees of freedom.

Exercise 3: Call Centre

16 observations were made on inter-arrival times at a call centre. The data was recorded in intervals, as summarised in the table below.

Intervals (minutes)    Number of Observations
0-20                   2
21-50                  9
51-80                  1
81-100                 1
101-500                3

Test the Exponential distribution with a mean of 100 minutes against this data at the 1% significance level.
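A sketch of the calculation follows. The exact statistic depends on how the interval boundaries and the open upper tail are handled: treating the intervals as [0, 20], (20, 50], (50, 80], (80, 100] and (100, ∞) gives roughly 11.9, while the course spreadsheet reports 12.53; either way the statistic is below the 1% critical value of 13.2767, so the conclusion is the same.

```python
from math import exp

observed = [2, 9, 1, 1, 3]
bounds = [0, 20, 50, 80, 100]   # interval lower edges; the last interval is open-ended
mean = 100                      # hypothesised Exponential mean
n = sum(observed)               # 16

def exp_cdf(x):
    """CDF of the Exponential distribution with the hypothesised mean."""
    return 1 - exp(-x / mean)

# Expected counts under Exponential(mean=100) for each interval
probs = []
for i, lo in enumerate(bounds):
    hi = bounds[i + 1] if i + 1 < len(bounds) else None
    p = (exp_cdf(hi) - exp_cdf(lo)) if hi is not None else (1 - exp_cdf(lo))
    probs.append(p)
expected = [n * p for p in probs]

chi_sq = sum((o - e)**2 / e for o, e in zip(observed, expected))   # roughly 11.9
accept = chi_sq < 13.2767       # critical value at the 1% level, 4 df
```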

Solution: Call Centre

The null hypothesis is that the data has an Exponential distribution with a mean of 100, and the alternative hypothesis is that it does not. The test statistic is 12.53 (see the Excel spreadsheet for details). From the chi-squared tables, the critical value for a 1% significance level and 4 degrees of freedom is 13.2767. The test statistic is less than the critical value; therefore we accept the null hypothesis at the 1% significance level, and conclude that the data is consistent with an Exponential distribution with a mean of 100.

Contingency Tables

The last topic for this section is contingency tables. We use these to explore the differences between categorical data, e.g. gender or nationality. Consider the following problem. There are two factories and we wish to assess whether one has a significantly higher defective rate than the other. We collect data on the number of defective and good items produced in both factories. We organise the data into a table as below.

           Factory 1    Factory 2
Defects    D1           D2
Good       G1           G2

Where:

• G1 is the number of good items produced in Factory 1
• G2 is the number of good items produced in Factory 2
• D1 is the number of defective items produced in Factory 1
• D2 is the number of defective items produced in Factory 2

We wish to test whether or not the probability that an item is defective is independent of the factory in which it was manufactured. To do this we construct the following statistics (as shown in the following animation). Where:

• D = D1 + D2, the total number of defects
• G = G1 + G2, the total number of good items
• T1 = D1 + G1, the total number of items produced in factory 1
• T2 = D2 + G2, the total number of items produced in factory 2
• T = T1 + T2, the total number of items produced

From these summaries we calculate the following statistics:

pD = D/T and pG = G/T

which are the proportion of defects and the proportion of good items in the total production. Essentially, what we are asking is whether these are acceptable probabilities to use for both factories in assessing the likelihood of an item being produced that is defective or good.

If these were the true probabilities, how many defectives would we expect to be produced in each factory? How many good items would we expect to be produced? For factory 1:

E[D1] = T1 × D/T and E[G1] = T1 × G/T

For factory 2:

E[D2] = T2 × D/T and E[G2] = T2 × G/T

We use these estimates to construct the following table.

       F1                         F2
D      (D1 - E[D1])²/E[D1]        (D2 - E[D2])²/E[D2]
G      (G1 - E[G1])²/E[G1]        (G2 - E[G2])²/E[G2]

In each cell we have the squared difference between the observed frequency and the frequency expected under the null hypothesis, divided by the expected frequency. The test statistic is the sum of these cells; under the null hypothesis it has a χ² distribution with 1 degree of freedom.

After obtaining the test statistic, we compare it to the critical value associated with the relevant significance level and degrees of freedom. In general, for bigger tables with many factors, a contingency table has (r - 1)(c - 1) degrees of freedom, where c is the number of columns and r is the number of rows.
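As an illustration with hypothetical numbers (30 defects out of 100 items in factory 1, and 20 out of 100 in factory 2), the whole procedure can be sketched as:

```python
observed = [[30, 70],   # factory 1: defects, good
            [20, 80]]   # factory 2: defects, good

row_totals = [sum(row) for row in observed]          # T1, T2
col_totals = [sum(col) for col in zip(*observed)]    # D, G
total = sum(row_totals)                              # T

# Expected counts under independence: row total times column proportion
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_sq = sum((observed[i][j] - expected[i][j])**2 / expected[i][j]
             for i in range(2) for j in range(2))

# Compare with the 5% critical value for (2-1)(2-1) = 1 degree of freedom
significant = chi_sq > 3.841
```

With these made-up counts the statistic is 8/3 ≈ 2.67, below 3.841, so the difference between the two factories would not be significant at the 5% level.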

Exercise 4: Political Party Preference

An experiment was conducted to determine whether gender was related to political party preference. The following data was collected on which party people preferred. There are three political parties, namely Labour, Lib-Dem and Conservative.

          Labour    Lib-Dem    Conservative
Male      90        70         20
Female    40        80         100

Conduct a Pearson's χ² test to assess the significance of the association between gender and political party preference at the 5% significance level.

Solution: Political Party Preference

Null hypothesis: political party preference and gender are independent
Alternative hypothesis: political party preference and gender are not independent

OBSERVED DATA:

          Labour    Lib-Dem    Conservative    Sum
Male      90        70         20              180
Female    40        80         100             220
Sum       130       150        120             400

EXPECTED:

          Labour    Lib-Dem    Conservative
Male      58.5      67.5       54
Female    71.5      82.5       66

The test statistic is

χ² = (90 - 58.5)²/58.5 + (70 - 67.5)²/67.5 + (20 - 54)²/54 + (40 - 71.5)²/71.5 + (80 - 82.5)²/82.5 + (100 - 66)²/66 = 69.93

with 2 degrees of freedom.

At a 5% level, the critical value is 5.99. Since the test statistic is greater than the critical value, it is indeed significant at the 5% level. We reject the null hypothesis and conclude that there is an association between political party preference and gender.
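The expected counts and the statistic above can be reproduced with the same recipe (a sketch):

```python
observed = [[90, 70, 20],    # Male:   Labour, Lib-Dem, Conservative
            [40, 80, 100]]   # Female: Labour, Lib-Dem, Conservative

row_totals = [sum(r) for r in observed]          # [180, 220]
col_totals = [sum(c) for c in zip(*observed)]    # [130, 150, 120]
total = sum(row_totals)                          # 400

expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_sq = sum((observed[i][j] - expected[i][j])**2 / expected[i][j]
             for i in range(2) for j in range(3))
# chi_sq is about 69.93, well above the 5% critical value of 5.99 for 2 df
```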

Exercise 5: Siblings

A sample of 200 sibling pairs was taken. The older and younger siblings were asked separately whether they preferred classical, folk or rock music. The following data were recorded (so that, for example, in six sibling pairs the younger sibling prefers classical music while the older prefers folk):

Younger\Older    Classical    Folk    Rock
Classical        15           6       10
Folk             11           10      20
Rock             23           15      90

A contingency table is said to be symmetric if the probability of a response being in row i and column j, i.e. πij, is equal to the probability of a response being in row j and column i, i.e. πji. To test whether a contingency table is symmetric we apply Pearson's χ² test with c(c - 1)/2 degrees of freedom, where c is the number of columns.

i. Conduct a test for symmetry with the data provided at the 5% significance level.

We can also test for symmetry and independence simultaneously. To do this, we apply Pearson's χ² test with c(c - 1) degrees of freedom.

ii. Conduct a test for symmetry and independence with the data provided at the 5% significance level.

Siblings – Solution

OBSERVED:

Younger\Older    Classical    Folk    Rock
Classical        15           6       10
Folk             11           10      20
Rock             23           15      90

i) Symmetry

EXPECTED:

Younger\Older    Classical    Folk    Rock
Classical        15           8.5     16.5
Folk             8.5          10      17.5
Rock             16.5         17.5    90

Null hypothesis: the contingency table is symmetric

Test statistic: χ² = Σ (Oij - Eij)²/Eij = 7.31 with 3 degrees of freedom, where the expected count is Eij = (Oij + Oji)/2.
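The symmetry statistic can be verified directly, taking each expected cell (i, j) as the average of the observed (i, j) and (j, i) cells (a sketch):

```python
obs = [[15, 6, 10],
       [11, 10, 20],
       [23, 15, 90]]   # rows: younger; columns: older (Classical, Folk, Rock)

# Expected counts under symmetry: E[i][j] = (O[i][j] + O[j][i]) / 2
exp_sym = [[(obs[i][j] + obs[j][i]) / 2 for j in range(3)] for i in range(3)]

chi_sq = sum((obs[i][j] - exp_sym[i][j])**2 / exp_sym[i][j]
             for i in range(3) for j in range(3))
# about 7.31, just below the 5% critical value of 7.81 for 3 df
```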

The critical value for 3 df is 7.81 at the 5% significance level. Conclusion: the test is not significant at the 5% level, thus we accept the null hypothesis.

ii) Symmetry & Independence

EXPECTED:

Younger\Older    Classical    Folk     Rock
Classical        8            7.2      24.8
Folk             7.2          6.48     22.32
Rock             24.8         22.32    76.88

(Each expected count is 200 × pi × pj, where the pooled marginal probabilities are p = (0.2, 0.18, 0.62); the table is symmetric, so the (Classical, Rock) entry equals the (Rock, Classical) entry, 24.8.)

Null hypothesis: the contingency table is symmetric and independent

Test statistic:

= 24.09 with 6 degrees of freedom.
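The combined statistic can be checked in the same way, building the expected counts from the pooled marginal probabilities p = (0.2, 0.18, 0.62) (a sketch):

```python
obs = [[15, 6, 10],
       [11, 10, 20],
       [23, 15, 90]]   # rows: younger; columns: older (Classical, Folk, Rock)

n = sum(sum(r) for r in obs)                    # 200
row = [sum(r) for r in obs]                     # [31, 41, 128]
col = [sum(c) for c in zip(*obs)]               # [49, 31, 120]

# Pooled marginal probabilities under symmetry & independence
p = [(row[k] + col[k]) / (2 * n) for k in range(3)]   # [0.2, 0.18, 0.62]

expected = [[n * p[i] * p[j] for j in range(3)] for i in range(3)]

chi_sq = sum((obs[i][j] - expected[i][j])**2 / expected[i][j]
             for i in range(3) for j in range(3))
# about 24.09, above the 5% critical value of 12.59 for 6 df
```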

The critical value for 6 df is 12.59 at the 5% significance level. Conclusion: the test is significant at the 5% level, thus we reject the null hypothesis.

Summary

In this section we have explored two topics:

• Hypothesis testing - e.g. to assess whether a specific probability distribution is an adequate model for a population.
• Contingency tables - e.g. to assess whether or not there are significant differences between categories (e.g. genders or nationalities).

You should now go through the Hypothesis Testing and Contingency Tables exercises. You are expected to post all questions and problems in the first instance to the discussion forum.

