STAT1600 Excel Analysis Tool Pak PDF

Title STAT1600 Excel Analysis Tool Pak
Author Alison Wu
Course Statistics: Ideas and Concepts
Institution The University of Hong Kong
Pages 32
File Size 2.7 MB
File Type PDF
Total Downloads 74
Total Views 147

Summary

excel analysis...


Description

STAT1600 Getting Started with Excel Microsoft Excel is a spreadsheet program that allows users to store, organize, and analyse data. After inputting data, Excel can be used to store the data and perform financial, mathematical or statistical calculations. Getting Started with Excel Before introducing the Analysis ToolPak, it is useful to first understand the interface of Excel.

Excel Statistics Tools (Analysis ToolPak) The Analysis ToolPak is an Excel add-in program, which contains data analysis function for financial, statistical and engineering data analysis. To load in the Analysis ToolPak add-in, follow the following steps. 1. Open a new workbook. Click on the “File” tab, then the “Option” button.

1

2. A dialog box will pop up. Under the tab “Add-Ins”, choose “Analysis ToolPak” and click the “Go” button.

3. Check the “Analysis ToolPak” box, and click the “OK” button.

4. Click on the “Data” tab, and you will see a button called “Data Analysis”.

2

5. Click on the “Data Analysis” button. A dialog box will pop up, and a statistical analysis tool can be chosen from the list.

6. The Analysis ToolPak in Excel includes the following statistical analysis functions: 

ANOVA: Single Factor



ANOVA: Two-Factor With Replication



ANOVA: Two-Factor Without Replication



Correlation



Covariance



Descriptive Statistics



Exponential Smoothing



F-Test Two-Sample for Variances



Fourier Analysis



Histogram



Moving Average



Random Number Generation



Rank and Percentile



Regression



Sampling



T-Test: Paired Two Sample for Means



T-Test: Two Sample Assuming Equal Variances



T-Test: Two Sample Assuming Unequal Variances



Z-Test: Two Sample for Means

Reference Excel Easy (2014). Analysis ToolPak. http://www.excel-easy.com/data-analysis/analysis-toolpak.html. (Retrieved 29 September 2014).

3

ANOVA: Single Factor Single Factor ANOVA is used to decide whether the differences in the samples' averages across different groups are significant enough to conclude that the population averages between these groups are unequal. In other words, the following set of hypotheses will be tested: Null hypothesis (฀฀0 ): The means are equal vs Alternative hypothesis (฀฀1 ): not all means are equal. The following example illustrates how a Single Factor ANOVA can be performed in Excel. As an example, some sample scores of staff performances from 4 different stores are given. A Single Factor ANOVA is performed to decide whether the mean scores are equal for all stores. 1. Input data into the cells.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Anova: Single Factor” at the pop-up dialog box and press “OK”. 3. Input the data range (in which cells are the data stored). Depending on how the data are grouped, check the box “columns” or “rows”. In this case the data are grouped by stores, so “rows” is selected. Under “Output Options”, check the box “Output Range” and select a cell to output the results. Then press “OK”.

4

4. Since the output cell is B7, the results are shown in the same workbook, from cell B7 onwards. The following figure shows the output.

5. Hypothesis Testing The hypothesis testing carried out here is actually an F-Test. We will reject ฀฀0 if the sample F-statistic is greater than the critial F value. In this example, from the ANOVA table, the sample F-statistic (1.34) is smaller than the critical value (3.34). Therefore, ฀฀0 cannot be rejected at the 5% level of signficance.

5

ANOVA: Two-Factor With Replication The Two-Factor (or Two-Way) ANOVA is an extension to the Single Factor ANOVA. In the two-factor ANOVA, there are two independent variables (hence the name Two-Factor), and given the sample, it is to decide a) whether each individual factor will affect the result, and b) if there exists any interaction effect between the two factors. The respective hypotheses will be a) ฀฀0 : There is no main effect for factor A (B). vs ฀฀1 : There exists main effect for factor A (B). b) ฀฀0 : There is no interaction between the two factors. vs ฀฀1 : There is interaction between the two factors. In two-factor ANOVA with replication, there are more than one observations under each treatment. As an illustration, suppose there are 2 machines and 3 operators in a factory. 3 sample output quantities for each combination of machine and operator are collected. 1. Input the data together with the factors into the cells.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Anova: Two-factor with replication” at the pop-up dialog box and press “OK”. 3. For input range, select the whole data set together with the factors. In this example, since there are 3 samples under each combination of factors, enter “3” in the “Rows per sample”. Select the Output Range and press OK.

6

4. The following figure shows the output.

5. Hypothesis Testing (a) In this example, from the first two rows of ANOVA table, the sample F-statistics for both row and columns (13.5, 16.74) are greater than their respective critical values (4.75, 3.89). Therefore, ฀฀0 is rejected in both cases at the 5% level of signficance, i.e. both “machine” and “operator” have main effects on the output quantities. (b) From the third row of ANOVA table, the sample F-statistic for interaction effect (8.82) is greater than the critical value (3.89). Therefore, ฀฀0 is rejected at the 5% level of signficance, i.e. there exists interaction effect between the factors “machine” and “operator” .

7

ANOVA: Two-Factor Without Replication This is almost the same as ANOVA: Two-Factor With Replication. There are only two differences. Firstly, in twofactor ANOVA without replication, there is only one observation under each combination of factors. Secondly, we have to assume that the two factors have no interaction effect, which means that we cannot test for the interaction effect between the two factors. Consider the following hypothetical example, in which the electrical conductivity of a chemical compound is measured under different temperatures and copper contents. 1. Input the data together with the factors into the cells.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Anova: Two-factor without replication” at the pop-up dialog box and press “OK”. 3. Input the data range (remember to select also the factors) and select the output range. Then press “OK”.

8

4. The figure below shows the output.

5. Hypothesis Testing Hypothesis testing can then be carried out as usual. The only point to note here is that only the main effects for each of the two factors can be tested under Two-Way ANOVA Without Replication.

9

Correlation Using this function, Excel calculates the correlation coefficients (an indication of linear relationship) between different variables, which can help determine which variables tend to move together, in what direction, and the strength of this relationship . See the following example. 1. Input the data.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Correlation” at the pop-up dialog box and press “OK”. 3. Input the data range and select a destination for the output. Depending on how the data are grouped, select either “columns” or “rows” under “Grouped By”. Then press “OK”.

4. The left panel below shows the output from the Covariance function. It can be seen that A and B are positively correlated (0.997). Variables A and C do not show a strong correlation (0.108), so as B and C (0.103). This can be shown graphically by plotting a line graph (right panel).

10

Covariance The covariance between two variables (same or different) can be calculated by this function, which can help determine which pairs of variables have the tendency to move together, and in what direction. Consider the same data set as in the previous section (Correlation). 1. Input the data.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Covariance” at the pop-up dialog box and press “OK”. 3. Input the data range and select a destination for the output. Depending on how the data are grouped, select either “columns” or “rows” under “Grouped By”. Then press “OK”.

4. The output is as follows. Variables A and B tends to move in the same direction, so as A and C, and B and C. This is in agreement with what is being observed using the Correlation function.

11

Descriptive Statistics This function generates a report of descriptive statistics for inputted data.

Consider the following set of data

consisting of the test scores of 10 students. 1. Input the scores in the cells.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Descriptive Statistics” at the pop-up dialog box and press “OK”. 3. Input the data range and choose the Output Range. Also, choose the appropriate box to indicate how your data is grouped (by column in this example). Check the box for “Summary Statistics”, and press “OK”.

4. The output is as follows. Various descrptive statistics can be seen from the table, including mean, standard deviation, kurtosis and skewness.

12

Exponential Smoothing Using this function, Excel predicts a value based on the forecast for the prior period and the damping factor (1 α). Consider the following example. 1. Input the actual data for different periods.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Exponential Smoothing” at the pop-up dialog box and press “OK”. 3. Select the input range (which is the actual data) and input the damping factor (1–α). Select the output range and check the box for “Chart Output” (a chart will be displayed in the output). Then press “OK”.

4. The following figure shows the output. Note that by changing the damping factor, different forecast values and forecast curves can be obtained.

13

F-Test Two-sample for Variances This function is used to decide whether two data sets have significantly different variances by conducting an F-test. As an illustrative example, after obtaining tests scores from two different classes, it is required to decide whether variances of test scores are significantly different between the two classes. 1. Input the two sets of data.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “F-Test Two-sample for Variances” at the popup dialog box and press “OK. 3. Input the data range for the two sets of data (class A and class B in this example). Choose a suitable alpha level (significance level). Select the output range, and press “OK”.

4. The output is as follows. Notice that the sample F-statistic is smaller than the critical value in this example, so ฀฀0 cannot be rejected at the 5% level of significance, i.e. the variances of test scores are equal for the two classes.

14

Fourier Analysis Fourier analysis is the study of how general functions can be expressed or approximated by the sums of simple trigonometric functions. This example illustrates how to perform Fourier analysis in Excel. 1. Input the data in a single column. Note that number of data points must be in the power of 2 (2, 4, 8, 16, etc.) with a maximum of 4096. If the number of your data points is not enough (say, 9 or 18), insert extra zeroes into the data set to make up to enough data points, so that Fourier analysis can be used.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Fourier Analysis” at the pop-up dialog box and press “OK”. 3. Input the data range and select the output range. Check the box “inverse” if required. Then press “OK”.

4. The output is as follows. The “I” here refers to complex numbers. The green triangle is the reminder from Excel to convert the text to number.

15

Histogram This example illustrates how to create histograms in Excel. 1. Enter the data to be used as input for the Histogram tool in a single column. 2. In another column, enter the bin numbers (Upper Boundaries) to be used for the analysis. The bin numbers have to be entered in ascending order.

3. Under the “Data” tab, click on the “Data Analysis” button. Choose “Histogram” at the pop-up dialog box and press “OK”. 4. Select the data range and the Bin range. Decide the Output Range and check the box “chart output”. Then press “OK”.

5. A histogram, together with a frequency distribution table, is being generated.

16

Further Formatting 6. To delete the default legend, right-click on the legend on the right, and select “Delete” in the pull-down menu.

7. If required, rename the title of the histogram by clicking on the title. 8. To change the spaces between the bars, right click any one of the bars and select “Format Data Series”. Under the tap “Series Options”, adjust the gap width. 9. A border colour can also be added to the bars, under the tap “Border Color”.

10. The following figure shows the output after formatting the histogram.

17

Moving Average This example illustrates how to use the moving average tool in Excel, which is useful in observing trends and producing forecasts. 1. Input the data for analysis.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Moving Average” at the pop-up dialog box and press “OK”. 3. Select the data range and interval. Interval here refers to the number of past data points Excel will use to calculate the moving average. In this example, 5 would mean that Excel will use the previous 5 data points to calculate the moving average for each subsequent point. Next, select the Output Range, and check the boxes for “chart output” and “standard errors” if necessary (here only the box for “Chart Output” is checked). Then press “OK”.

4. The following figure shows the output. Notice that there is no moving averages for the first four years, as there are insufficient data points to calculate the moving average for these four years.

18

Random Number Generation The Random Number Generation function can be used to generate random numbers in Excel. This example demonstrates how to generate 5 random numbers from a uniform distribution in Excel. 1. Under the “Data” tab, click on the “Data Analysis” button. Choose “Random Number Generation” at the pop-up dialog box and press “OK”. 2. Input the number of variables (how many sets of random numbers we would like to generate) and number of random numbers (how many random numbers to be generated in each set). Specify the distribution from which the random numbers would be generated. Select the output range and press “OK”.

3. The following figure shows the output. 5 random numbers from the uniform distribution U(0, 1) are generated.

19

Rank and Percentile The Rank and Percentile function can be used to rank a set of data in descending order, and calculate the percentiles for each data point in the data set. For example, it is required to rank the marks of students in the class. 1. Input the data for analysis.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Rank and Percentile” at the pop-up dialog box and press “OK”. 3. Select the data range and output range. Depending on how the data are grouped, select either “columns” or “rows” under “Grouped By”. Then press “OK”.

4. Here is the function output. Notice that the marks have been ranked in descending order, and the percentiles for every data point are calculated and shown together.

20

Regression Linear regression attempts to relate a response variable to one or more explanatory variables by finding out the “best-fit” equation. This example illustrates how to perform linear regression analysis in Excel. Here it is assumed that there is only one explanatory variable X (the linear regression in this case is also called simple linear regression). 1. Input the data for both X (explanatory variable) and Y (response variable).

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Regression” at the pop-up dialog box and press “OK”. 3. Select the ranges of Y and X data respectively. Check the box for “confidence level” and select a suitable level (for performing hypothesis testing).

Also, select the output range, and check the boxes for residuals analysis

whenever necessary (for example, if to have a list of residuals, check the box “residuals). Then press “OK”.

21

4. The following shows the output of the function. The first table shows the summary statistics of the regression, including the R-Square (R2), which measures how good the regressed line can fit in the original data points. The second table is the ANOVA table, which provides a breakdown of variances into those explained and unexplained by regression. The third table provides the estimated intercept and coefficient of X for the linear equation, together with the associated t- statistics (for hypothesis testing) and confidence intervals.

22

Sampling A random sample is a sample randomly selected from the data set, i.e. simple random sampling. A periodic sample is a sample in which Excel chooses the n-th data in the data set to include in your sample, similar to systematic sampling. This example illustrates how to select periodic sample and random sample from a data set using Excel. Notice that the sample size has to be predetermined. 1. Input the data set into Excel. The data can be entered into rows or columns, but be aware that the number of rows and columns should be even numbers.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “Sampling” at the pop-up dialog box and press “OK”. 3. For input range, select the entire data set. For sampling method, select either periodic or random sampling. If periodic sampling is chosen, enter the period n (every nth data in the data set will be extracted). If random sampling is chosen, enter the sample size to be extracted from the data set. Press “OK” after selecting the output range.

23

4. The following shows the output when a periodic sampling with a period of 18 is selected. It can be seen that every 18th data in the original data set (i.e. the last row of the data set) is chosen to be the sample.

24

T-Test: Paired Two Sample for Means This test is used to test whether the means between two groups of samples are significantly different from our belief or not. The main idea is to look at the sum of the differences between each pair of data coming from the two groups. It is usually applied on two samples representing the same group of subjects but under two different treatments. As the name suggests, we should also have pairs of data if we want to use this test. Consider the following trial in deciding whether the tutorial class is effective in raising their test scores. To test this, the test scores for five students before and after the tutorial class are obtained. 1. Input the data into the worksheet. Divide the scores into two columns “Before” and “After”, so that the differences between the two groups can be compared using the T-Test.

2. Under the “Data” tab, click on the “Data Analysis” button. Choose “T-Test: Paired Two Sample for Means” at t...


Similar Free PDFs