QADM

Author: Erum Bukhari
Course: Statistics
Institution: University of Engineering and Technology Lahore


Contents

1. Correlation Analysis
   1.1. Definition
   1.2. Assumptions of Correlation
   1.3. Bivariate Correlation
   1.4. Correlation Coefficients: Pearson, Kendall, Spearman
   1.5. Applications of Correlations
   1.6. Strengths of Correlations
   1.7. Limitations of Correlations
   1.8. Case Study of Correlations
2. Regression Analysis
   2.1. Definition
   2.2. Objectives of Regression Analysis
   2.3. Assumptions of Regression Analysis
   2.4. Comparison
   2.5. Simple Regression Model
   2.6. Assumptions of Simple Linear Regression
   2.7. Analysis of Variance Approach to Test the Significance of Regression
   2.8. Degrees of Freedom (df)
   2.9. Mean Squared Errors
   2.10. Significance F
   2.11. R² (R Square)
   2.12. Application
   2.13. Advantages and Disadvantages
   2.14. Limitations
   2.15. Case Study of Simple Linear Regression Model


1. Correlation Analysis

1.1. Definition

Correlation is a statistical measure that indicates the extent to which two or more variables move together. A positive correlation indicates that the variables increase or decrease together; a negative correlation indicates that as one variable increases, the other decreases. When the variation of one variable tracks the variation of another, it is tempting to assume that a change in one causes the change in the other. However, correlation does not necessarily mean causation: there may be some unknown factor that influences both variables in the same way.

Correlation analysis shows which pairs of variables are related and how strongly. Some relationships in a data set are obvious, but the data may also contain unexpected ones, or you may suspect a relationship without knowing which is the strongest. Careful correlation analysis can therefore lead to a greater understanding of your data. Correlation is positive (direct) when the values increase together, and correlation is negative when one value decreases as the other increases; the latter is also called inverse or contrary correlation.

 

[Figure: five scatter diagrams, labelled a to e, illustrating the degrees of correlation described below.]

If the points plotted all lay on a straight line we would have perfect correlation, which could be positive or negative, as shown in the diagrams above:

a. Strong positive correlation between x and y. The points lie close to a straight line, with y increasing as x increases.
b. Weak positive correlation between x and y. The trend shown is that y increases as x increases, but the points are not close to a straight line.
c. No correlation between x and y; the points are distributed randomly on the graph.
d. Weak negative correlation between x and y. The trend shown is that y decreases as x increases, but the points do not lie close to a straight line.
e. Strong negative correlation between x and y. The points lie close to a straight line, with y decreasing as x increases.

The correlation coefficient always lies between -1 and +1:

• +1 is a perfect positive correlation
• 0 is no correlation (the values do not seem linked at all)
• -1 is a perfect negative correlation


The value shows how strong the correlation is (not how steep the line is), and whether it is positive or negative. In statistics there are three commonly used correlations: the Pearson correlation, the Kendall rank correlation and the Spearman correlation.

1.2. Assumptions of Correlation

The use of correlation rests on certain basic assumptions. The observations are assumed to be independent and randomly selected from the population; the two variables are assumed to follow normal distributions; the data are assumed to be homoscedastic (homoscedastic data have the same variance across different groups, whereas heteroscedastic data have unequal variances across groups); and the relationship between the two variables is assumed to be linear. The correlation coefficient is unreliable and difficult to interpret when the data contain outliers.

Inspecting a scatterplot can give an idea of whether two variables are related and what the direction of their relationship is. But by itself it is not enough to determine whether there is an association between the two variables: the relationship suggested by the scatterplot needs to be quantified in a standard way. Descriptive statistics that express the degree of relationship between variables are called correlation coefficients. The most commonly used correlation coefficients are the Pearson correlation, the Kendall rank correlation and the Spearman correlation. A correlation coefficient is used to assess the presence of a linear relationship between two variables, provided certain assumptions about the data are satisfied. The results of the analysis, however, need to be interpreted with care, particularly when looking for causal relationships.

1.3. Bivariate Correlation

Bivariate correlation is a measure of the relationship between two variables; it measures both the strength and the direction of their relationship. The strength can range in absolute value from 0 to 1: the stronger the relationship, the closer the value is to 1. The direction of the relationship can be positive (direct) or negative (inverse or contrary). Correlation generally describes the extent to which two or more phenomena occur together and are therefore linked.

1.4. Correlation Coefficients

Pearson, Kendall and Spearman correlations are bivariate analyses that measure the strength of association between two variables. The value of the correlation coefficient varies between +1 and -1. When the value lies near ±1, there is said to be a near-perfect degree of association between the two variables; as the value approaches 0, the relationship between the two variables becomes weaker.

Types of correlation

Pearson r correlation: The Pearson correlation is widely used in statistics to measure the degree of linear relationship between variables. For example, in the stock market, if we want to measure how two commodities are related to each other, the Pearson correlation can be used to measure the degree of relationship between the two commodities. The following formula is used to calculate the Pearson correlation coefficient r:


$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^{2} - (\sum x)^{2}\right]\left[n\sum y^{2} - (\sum y)^{2}\right]}} $$

Kendall's tau rank correlation: The Kendall rank correlation is a non-parametric test that measures the strength of dependence between two variables. If we consider two samples, x and y, each of sample size n, the total number of possible pairings of x with y observations is n(n-1)/2. Writing n_c for the number of concordant pairs and n_d for the number of discordant pairs, the following formula is used to calculate the value of the Kendall rank correlation:

$$ \tau = \frac{n_{c} - n_{d}}{\tfrac{1}{2}\,n(n-1)} $$
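To make the formulas concrete, here is a minimal Python sketch (assuming numpy and scipy are available; the data values are hypothetical, chosen only for illustration). It computes Pearson's r directly from the summation formula above and checks the result against the library routines:

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

n = len(x)
# Pearson's r via the summation formula above
r = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / np.sqrt(
    (n * np.sum(x**2) - np.sum(x)**2) * (n * np.sum(y**2) - np.sum(y)**2)
)

r_lib, p = stats.pearsonr(x, y)      # library equivalent of the formula
tau, p_tau = stats.kendalltau(x, y)  # Kendall's tau (tau-b; identical to the
                                     # formula above when there are no ties)
print(r, r_lib, tau)
```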

Spearman rank correlation: The Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables. It was developed by Spearman, thus it is called the Spearman rank correlation. The test does not make any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal. Writing d_i for the difference between the ranks of the i-th pair of observations, the following formula is used to calculate the Spearman rank correlation coefficient:

$$ \rho = 1 - \frac{6\sum d_{i}^{2}}{n(n^{2} - 1)} $$

The Spearman correlation coefficient ρ can take values from +1 to -1. A ρ of +1 indicates a perfect association of ranks, a ρ of zero indicates no association between ranks, and a ρ of -1 indicates a perfect negative association of ranks. The closer ρ is to zero, the weaker the association between the ranks.
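The shortcut formula can be verified directly in Python; the sketch below (with hypothetical, tie-free scores) ranks both samples with scipy.stats.rankdata and compares the hand-computed coefficient with scipy.stats.spearmanr:

```python
import numpy as np
from scipy import stats

# Hypothetical scores with no tied values, so the shortcut formula is exact
x = np.array([35, 23, 47, 17, 10, 43, 9, 6, 28])
y = np.array([30, 33, 45, 23, 8, 49, 12, 4, 31])

d = stats.rankdata(x) - stats.rankdata(y)  # rank differences d_i
n = len(x)
rho = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

rho_lib, p = stats.spearmanr(x, y)  # library equivalent
print(rho, rho_lib)
```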

1.5. Applications of Correlations

• Prediction: if there is a relationship between two variables, we can make predictions about one from the other.
• Validity: for example, concurrent validity (the correlation between a new measure and an established measure).
• Reliability: test-retest reliability (are measures consistent over time?) and inter-rater reliability (do observers agree?).
• Theory verification: for example, predictive validity.

1.6. Strengths of Correlations



• Correlation allows the researcher to investigate naturally occurring variables that it may be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.
• Correlation allows the researcher to see clearly and easily whether there is a relationship between the variables. This can be displayed graphically.

1.7. Limitations of Correlations



• Correlation is not causation, and cause cannot be taken for granted. Even if there is a very strong association between two variables, we cannot assume that one causes the other. For example, suppose we found an association between watching violence on TV and violent behavior in youth. It is possible that the cause of both of these is a third (extraneous) variable, say growing up in a violent home, and that both the watching of TV and the violent behavior are the outcome of this.
• Correlation does not allow us to go beyond the data that are given. For example, suppose it was found that there was an association between time spent on homework (1/2 hour to 3 hours) and the number of G.C.S.E. passes (1 to 6). It would not be legitimate to infer from this that spending 6 hours on homework would be likely to generate 12 G.C.S.E. passes.

1.8. Case Study

A group of 12 children participated in a psychological study designed to assess the relationship, if any, between age (x, years) and total average sleep time (ATST; y, minutes). To obtain a measure for ATST, recordings were taken for each child on five consecutive nights and then averaged. The results obtained are shown in the table below.

Child      A     B     C     D     E     F     G     H     I     J     K     L
Age (x)    4.4   6.7   10.5  9.6   12.4  5.5   11.1  8.6   14.0  10.1  7.2   7.9
ATST (y)   586   565   515   532   478   560   493   533   575   490   530   515

Tests of Normality

        Kolmogorov-Smirnov(a)          Shapiro-Wilk
        Statistic   df    Sig.         Statistic   df    Sig.
Age     .084        12    .200*        .990        12    1.000
ATST    .144        12    .200*        .955        12    .707


*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

H0: the data follow a normal distribution.
HA: the data do not follow a normal distribution.

Since the p-values from the Shapiro-Wilk test for both variables are greater than 0.05, the null hypothesis is not rejected; hence it is confirmed that the data follow a normal distribution. Since the data are normally distributed, we will use the Pearson correlation.
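The same normality check can be reproduced with a minimal Python sketch, assuming scipy is available (the W statistics should agree with the Shapiro-Wilk column above; the p-values may differ slightly from SPSS because of different approximations):

```python
from scipy import stats

age  = [4.4, 6.7, 10.5, 9.6, 12.4, 5.5, 11.1, 8.6, 14.0, 10.1, 7.2, 7.9]
atst = [586, 565, 515, 532, 478, 560, 493, 533, 575, 490, 530, 515]

# Shapiro-Wilk test per variable; p > 0.05 means we fail to reject normality
for name, data in [("Age", age), ("ATST", atst)]:
    w, p = stats.shapiro(data)
    print(f"{name}: W = {w:.3f}, p = {p:.3f}")
```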

Correlations

                              Age      ATST
Age    Pearson Correlation    1        -.481
       Sig. (2-tailed)                 .114
       N                      12       12
ATST   Pearson Correlation    -.481    1
       Sig. (2-tailed)        .114
       N                      12       12

The correlation between age and ATST is -.481.
Strength: weak. Direction: negative. Significance: insignificant (p = .114 > 0.05).
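The coefficient can be reproduced with one call to scipy.stats.pearsonr (a sketch assuming scipy; the data are those of the table in section 1.8):

```python
from scipy import stats

age  = [4.4, 6.7, 10.5, 9.6, 12.4, 5.5, 11.1, 8.6, 14.0, 10.1, 7.2, 7.9]
atst = [586, 565, 515, 532, 478, 560, 493, 533, 575, 490, 530, 515]

r, p = stats.pearsonr(age, atst)
print(f"r = {r:.3f}, p = {p:.3f}")  # matches the SPSS output: r = -.481, p = .114
```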

2. Regression Analysis

2.1. Definition

Regression analysis is one of the most widely used statistical methods in the social, behavioral and environmental sciences. It involves identifying and evaluating the relationship between a dependent variable and one or more independent variables, which are also called predictor or explanatory variables. A model of the relationship is hypothesized, and estimates of the parameter values are used to develop an estimated regression equation. Various tests are then employed to determine whether the model is satisfactory. If the model is deemed satisfactory, the estimated regression equation can be used to predict the value of the dependent variable given values of the independent variables. Linear regression explores relationships that can be readily described by straight lines, or by their generalization to many dimensions. A surprisingly large number of problems can be solved by linear regression, and even more by transformations of the original variables that result in linear relationships among the transformed variables.

Types of regression

When there is one continuous dependent variable and one independent variable, the analysis is called simple linear regression. This analysis assumes that there is a linear association between the two variables. Multiple regression is used to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. Independent variables are factors that can be measured directly; these variables are also called predictor or explanatory variables, and they are used to predict or explain the behavior of the dependent variable. The dependent variable is the factor whose value depends on the values of the independent variables.

Reliability and validity:
• Is the model logical? Is the model easy to understand and interpret?
• Are all coefficients statistically significant (p-values less than 0.05)?
• Are the signs of the coefficients as expected?
• Does the model predict values that are reasonably close to the actual values?
• Is the model good enough (high R², low standard error, etc.)?

2.2. Objectives of Regression Analysis

Regression analysis is used to explain the variability of the dependent variable by means of one or more independent or control variables, to analyze the relationships among the variables in order to answer the question of how much of the variability of the dependent variable is accounted for by each independent variable, and to predict or forecast the value of the dependent variable from the values of the independent variables. The main objective of regression is to develop a functional relationship between the response variable and the explanatory variables for prediction purposes, under the assumption that a functional linear relationship holds.

2.3. Assumptions of Regression Analysis

The regression model is based on the following assumptions (a short diagnostic sketch in Python follows the list):

• The relationship between the independent variable and the dependent variable is linear.
• The expected value of the error term is zero.
• The variance of the error term is constant for all values of the independent variable (the assumption of homoscedasticity).
• There is no autocorrelation.
• The independent variable is uncorrelated with the error term.
• The error term is normally distributed.
• On average, the difference between the observed value (yᵢ) and the predicted value (ŷᵢ) is zero.


• On average, the estimated values of the errors and the values of the independent variables are not related to each other.
• The squared differences between the observed values and the predicted values are similar across observations.
• There is some variation in the independent variable.
• If there is more than one independent variable in the equation, then no two independent variables should be perfectly correlated.
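The assumptions about the error term can be checked on the fitted residuals. Below is a minimal sketch, assuming numpy and scipy and using hypothetical data; it fits a least-squares line and then tests the residuals for normality and for correlation with the independent variable:

```python
import numpy as np
from scipy import stats

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 4.1, 5.8, 8.2, 9.9, 12.1, 13.8, 16.2])

fit = stats.linregress(x, y)                 # least-squares fit
resid = y - (fit.intercept + fit.slope * x)  # estimated error terms

# Is the error term normally distributed? (Shapiro-Wilk on the residuals)
print(stats.shapiro(resid))

# Are the residuals unrelated to the independent variable?
# (For ordinary least squares this correlation is zero by construction.)
print(stats.pearsonr(x, resid))
```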

Intercept or constant
• The intercept is the point at which the regression line intercepts the y-axis.
• The intercept provides a measure of the mean of the dependent variable when the slope(s) are zero.
• If the slope(s) are not zero, then the intercept is equal to the mean of the dependent variable minus slope × mean of the independent variable.

Slope
• The slope is the change in the dependent variable as the independent variable changes.
• A zero slope means that the independent variable has no influence on the dependent variable.
• For a linear model, the slope is not equal to the elasticity. That is because the elasticity is the percentage change in the dependent variable that results from a one-percent change in the independent variable.
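Written out (a standard definition, not specific to this document), the elasticity of y with respect to x in the linear model, evaluated at the sample means, is

$$ E = \frac{\partial y}{\partial x}\cdot\frac{x}{y} = \beta_{1}\,\frac{\bar{x}}{\bar{y}} $$

so the elasticity varies along the regression line even though the slope β₁ is constant.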

Advantage of regression
• Regression provides a more detailed analysis, which includes an equation that can be used for prediction and/or optimization.

2.4. Comparison Between Correlation and Regression

Basis                                | Correlation                                                                              | Regression
Meaning                              | A statistical measure that defines the co-relationship or association of two variables. | Describes how an independent variable is associated with the dependent variable.
Dependent and independent variables  | No difference.                                                                           | Both variables are different.
Usage                                | To describe a linear relationship between two variables.                                 | To fit the best line and estimate one variable based on another variable.
Objective                            | To find a value expressing the relationship between variables.                           | To estimate the values of a random variable based on the values of a fixed variable.

2.5. Simple Regression Model


Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables. In a cause-and-effect relationship, the independent variable is the cause and the dependent variable is the effect. Least-squares linear regression is a method for predicting the value of a dependent variable y based on the value of an independent variable x.

• One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
• The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

Mathematically, the regression model is represented by the following equation:

$$ y = \beta_{0} + \beta_{1}x + \varepsilon $$

The simple linear regression equation is graphed as a straight line, where:
1. β₀ is the y-intercept of the regression line.
2. β₁ is the slope.
3. y is the dependent variable.
4. x is the independent variable.
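As a sketch of how the model is estimated in practice (assuming scipy, and reusing the case-study data from section 1.8), scipy.stats.linregress returns the least-squares estimates of β₀ and β₁ together with r:

```python
from scipy import stats

age  = [4.4, 6.7, 10.5, 9.6, 12.4, 5.5, 11.1, 8.6, 14.0, 10.1, 7.2, 7.9]
atst = [586, 565, 515, 532, 478, 560, 493, 533, 575, 490, 530, 515]

fit = stats.linregress(age, atst)  # least-squares slope and intercept
print(f"b0 = {fit.intercept:.1f}, b1 = {fit.slope:.2f}, R^2 = {fit.rvalue**2:.3f}")

# Predicted ATST for a hypothetical nine-year-old child
print(fit.intercept + fit.slope * 9.0)
```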

2.6. Assumptions of Simple Linear Regression

We make a few assumptions when we use linear regression to model the relationship between a response and a predictor. These assumptions are essentially conditions that should be met before we draw inferences regarding the model estimates or before we use the model to make a prediction.

1. Linear relationship: There exists a linear ...

