• Explain what multivariate analysis is and when its application is appropriate.
• Discuss the nature of measurement scales and their relationship to multivariate techniques.
• Understand the nature of measurement error and its impact on multivariate analysis.
• Determine which multivariate technique is appropriate for a specific research problem.
• Define the specific techniques included in multivariate analysis.
• Discuss the guidelines for application and interpretation of multivariate analyses.
• Understand the six-step approach to multivariate model building.

CHAPTER PREVIEW This chapter presents a simplified overview of multivariate analysis. It stresses that multivariate analysis methods will increasingly influence not only the analytical aspects of research but also the design and approach to data collection for decision making and problem solving. Although multivariate techniques share many characteristics with their univariate and bivariate counterparts, several key differences arise in the transition to a multivariate analysis. To illustrate this transition, this chapter presents a classification of multivariate techniques. It then provides general guidelines for the application of these techniques as well as a structured approach to the formulation, estimation, and interpretation of multivariate results. The chapter concludes with a discussion of the databases utilized throughout the text to illustrate application of the techniques.

KEY TERMS Before starting the chapter, review the key terms to develop an understanding of the concepts and terminology used. Throughout the chapter, the key terms appear in boldface. Other points of emphasis in the chapter are italicized. Also, cross-references within the key terms appear in italics.

Alpha (α): See Type I error.
Beta (β): See Type II error.
Bivariate partial correlation: Simple (two-variable) correlation between two sets of residuals (unexplained variances) that remain after the association of other independent variables is removed.

From Chapter 1 of Multivariate Data Analysis, 7/e. Joseph F. Hair, Jr., William C. Black, Barry J. Babin, Rolph E. Anderson. Copyright © 2010 by Pearson Prentice Hall. All rights reserved.

Bootstrapping: An approach to validating a multivariate model by drawing a large number of subsamples and estimating models for each subsample. Estimates from all the subsamples are then combined, providing not only the "best" estimated coefficients (e.g., means of each estimated coefficient across all the subsample models), but also their expected variability and thus their likelihood of differing from zero; that is, are the estimated coefficients statistically different from zero or not? This approach does not rely on statistical assumptions about the population to assess statistical significance, but instead makes its assessment based solely on the sample data.
Composite measure: See summated scales.
Dependence technique: Classification of statistical techniques distinguished by having a variable or set of variables identified as the dependent variable(s) and the remaining variables as independent. The objective is prediction of the dependent variable(s) by the independent variable(s). An example is regression analysis.
Dependent variable: Presumed effect of, or response to, a change in the independent variable(s).
Dummy variable: Nonmetrically measured variable transformed into a metric variable by assigning a 1 or a 0 to a subject, depending on whether it possesses a particular characteristic.
Effect size: Estimate of the degree to which the phenomenon being studied (e.g., correlation or difference in means) exists in the population.
Independent variable: Presumed cause of any change in the dependent variable.
Indicator: Single variable used in conjunction with one or more other variables to form a composite measure.
Interdependence technique: Classification of statistical techniques in which the variables are not divided into dependent and independent sets; rather, all variables are analyzed as a single set (e.g., factor analysis).
Measurement error: Inaccuracies of measuring the "true" variable values due to the fallibility of the measurement instrument (i.e., inappropriate response scales), data entry errors, or respondent errors.
Metric data: Also called quantitative data, interval data, or ratio data, these measurements identify or describe subjects (or objects) not only on the possession of an attribute but also by the amount or degree to which the subject may be characterized by the attribute. For example, a person's age and weight are metric data.
Multicollinearity: Extent to which a variable can be explained by the other variables in the analysis. As multicollinearity increases, it complicates the interpretation of the variate because it is more difficult to ascertain the effect of any single variable, owing to their interrelationships.
Multivariate analysis: Analysis of multiple variables in a single relationship or set of relationships.
Multivariate measurement: Use of two or more variables as indicators of a single composite measure. For example, a personality test may provide the answers to a series of individual questions (indicators), which are then combined to form a single score (summated scale) representing the personality trait.
Nonmetric data: Also called qualitative data, these are attributes, characteristics, or categorical properties that identify or describe a subject or object. They differ from metric data by indicating the presence of an attribute, but not the amount. Examples are occupation (physician, attorney, professor) or buyer status (buyer, nonbuyer). Also called nominal data or ordinal data.
Power: Probability of correctly rejecting the null hypothesis when it is false; that is, correctly finding a hypothesized relationship when it exists. Determined as a function of (1) the statistical significance level set by the researcher for a Type I error (α), (2) the sample size used in the analysis, and (3) the effect size being examined.
Practical significance: Means of assessing multivariate analysis results based on their substantive findings rather than their statistical significance. Whereas statistical significance determines whether the result is attributable to chance, practical significance assesses whether the result is useful (i.e., substantial enough to warrant action) in achieving the research objectives.
Reliability: Extent to which a variable or set of variables is consistent in what it is intended to measure. If multiple measurements are taken, the reliable measures will all be consistent in their values. It differs from validity in that it relates not to what should be measured, but instead to how it is measured.
Specification error: Omitting a key variable from the analysis, thus affecting the estimated effects of included variables.
Summated scales: Method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement through multivariate measurement. In most instances, the separate variables are summed and then their total or average score is used in the analysis.
Treatment: Independent variable the researcher manipulates to see the effect (if any) on the dependent variable(s), such as in an experiment (e.g., testing the appeal of color versus black-and-white advertisements).
Type I error: Probability of incorrectly rejecting the null hypothesis; in most cases, it means saying a difference or correlation exists when it actually does not. Also termed alpha (α). Typical levels are 5 or 1 percent, termed the .05 or .01 level, respectively.
Type II error: Probability of incorrectly failing to reject the null hypothesis; in simple terms, the chance of not finding a correlation or mean difference when it does exist. Also termed beta (β), it is inversely related to Type I error. The value of 1 minus the Type II error (1 - β) is defined as power.
Univariate analysis of variance (ANOVA): Statistical technique used to determine, on the basis of one dependent measure, whether samples are from populations with equal means.
Validity: Extent to which a measure or set of measures correctly represents the concept of study, that is, the degree to which it is free from any systematic or nonrandom error. Validity is concerned with how well the concept is defined by the measure(s), whereas reliability relates to the consistency of the measure(s).
Variate: Linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher.
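
The bootstrapping entry among the key terms can be illustrated with a minimal sketch. It assumes the coefficient of interest is a one-predictor regression slope, that the data are toy values generated for illustration, and that a percentile interval is used to judge whether the coefficient differs from zero. The function names (`ols_slope`, `bootstrap_slope`) and the number of resamples are hypothetical choices, not taken from the text.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope(x, y, n_resamples=2000, seed=0):
    """Resample observations with replacement and re-estimate the slope each time."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined when every resampled x is identical
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    # Percentile interval covering the middle 95% of the resampled estimates.
    low = slopes[int(0.025 * len(slopes))]
    high = slopes[int(0.975 * len(slopes)) - 1]
    return {
        "mean": statistics.fmean(slopes),   # combined ("best") estimate
        "std": statistics.stdev(slopes),    # expected variability
        "low": low,
        "high": high,
        "differs_from_zero": low > 0 or high < 0,
    }


# Toy data (hypothetical): y is roughly 2 * x plus random noise.
data_rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [2.0 * xi + data_rng.gauss(0, 1) for xi in x]
result = bootstrap_slope(x, y)
print(result)
```

The interval here comes only from the resampled estimates, which mirrors the point in the definition that the assessment rests on the sample data rather than on distributional assumptions about the population.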

WHAT IS MULTIVARIATE ANALYSIS? Today businesses must be more profitable, react more quickly, and offer higher-quality products and services, and do it all with fewer people and at lower cost. An essential requirement in this process is effective knowledge creation and management. There is no lack of information, but there is a dearth of knowledge. As Tom Peters said in his book Thriving on Chaos, "We are drowning in information and starved for knowledge" [7]. The information available for decision making has exploded in recent years, and will continue to do so in the future, probably even faster. Until recently, much of that information just disappeared. It was either not collected or discarded. Today this information is being collected and stored in data warehouses, and it is available to be "mined" for improved decision making. Some of that information can be analyzed and understood with simple statistics, but much of it requires more complex, multivariate statistical techniques to convert these data into knowledge. A number of technological advances help us to apply multivariate techniques. Among the most important are the developments in computer hardware and software. The speed of computing equipment has doubled every 18 months while prices have tumbled. User-friendly software packages brought data analysis into the point-and-click era, and we can quickly analyze mountains of complex data with relative ease. Indeed, industry, government, and university-related research centers throughout the world are making widespread use of these techniques. We use the generic term researcher when referring to a data analyst within either the practitioner or academic communities. We feel it inappropriate to make any distinction between these two areas, because research in both relies on theoretical and quantitative bases.
Although the research objectives and the emphasis in interpretation may vary, a researcher within either area must address all of the issues, both conceptual and empirical, raised in the discussions of the statistical methods.

MULTIVARIATE ANALYSIS IN STATISTICAL TERMS Multivariate analysis techniques are popular because they enable organizations to create knowledge and thereby improve their decision making. Multivariate analysis refers to all statistical techniques that simultaneously analyze multiple measurements on individuals or objects under investigation. Thus, any simultaneous analysis of more than two variables can be loosely considered multivariate analysis. Many multivariate techniques are extensions of univariate analysis (analysis of single-variable distributions) and bivariate analysis (cross-classification, correlation, analysis of variance, and simple regression used to analyze two variables). For example, simple regression (with one predictor variable) is extended in the multivariate case to include several predictor variables. Likewise, the single dependent variable found in analysis of variance is extended to include multiple dependent variables in multivariate analysis of variance. Some multivariate techniques (e.g., multiple regression and multivariate analysis of variance) provide a means of performing in a single analysis what once took multiple univariate analyses to accomplish. Other multivariate techniques, however, are uniquely designed to deal with multivariate issues, such as factor analysis, which identifies the structure underlying a set of variables, or discriminant analysis, which differentiates among groups based on a set of variables. Confusion sometimes arises about what multivariate analysis is because the term is not used consistently in the literature. Some researchers use multivariate simply to mean examining relationships between or among more than two variables. Others use the term only for problems in which all the multiple variables are assumed to have a multivariate normal distribution. 
To be considered truly multivariate, however, all the variables must be random and interrelated in such ways that their different effects cannot meaningfully be interpreted separately. Some authors state that the purpose of multivariate analysis is to measure, explain, and predict the degree of relationship among variates (weighted combinations of variables). Thus, the multivariate character lies in the multiple variates (multiple combinations of variables), and not only in the number of variables or observations. For our present purposes, we do not insist on a rigid definition of multivariate analysis. Instead, multivariate analysis will include both multivariable techniques and truly multivariate techniques, because we believe that knowledge of multivariable techniques is an essential first step in understanding multivariate analysis.

SOME BASIC CONCEPTS OF MULTIVARIATE ANALYSIS Although the roots of multivariate analysis lie in univariate and bivariate statistics, the extension to the multivariate domain introduces additional concepts and issues of particular relevance. These concepts range from the need for a conceptual understanding of the basic building block of multivariate analysis, the variate, to specific issues dealing with the types of measurement scales used and the statistical issues of significance testing and confidence levels. Each concept plays a significant role in the successful application of any multivariate technique.

The Variate As previously mentioned, the building block of multivariate analysis is the variate, a linear combination of variables with empirically determined weights. The variables are specified by the researcher, whereas the weights are determined by the multivariate technique to meet a specific objective. A variate of n weighted variables ($X_1$ to $X_n$) can be stated mathematically as:

\[ \text{Variate value} = w_1 X_1 + w_2 X_2 + w_3 X_3 + \cdots + w_n X_n \]

where $X_n$ is the observed variable and $w_n$ is the weight determined by the multivariate technique.

The result is a single value representing a combination of the entire set of variables that best achieves the objective of the specific multivariate analysis. In multiple regression, the variate is determined in a manner that maximizes the correlation between the multiple independent variables and the single dependent variable. In discriminant analysis, the variate is formed so as to create scores for each observation that maximally differentiate between groups of observations. In factor analysis, variates are formed to best represent the underlying structure or patterns of the variables as represented by their intercorrelations. In each instance, the variate captures the multivariate character of the analysis. Thus, in our discussion of each technique, the variate is the focal point of the analysis in many respects. We must understand not only its collective impact in meeting the technique's objective but also each separate variable's contribution to the overall variate effect.
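
As a rough illustration of how a technique derives the weights, the sketch below uses the multiple regression case with two independent variables. It is a sketch under stated assumptions, not the text's procedure: the data are hypothetical toy values, the weights come from ordinary least squares solved through the normal equations, and an intercept term (`weights[0]`) is added even though the variate as defined above has none. The helper names (`solve_linear_system`, `fit_weights`, `variate_value`) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(rows, y):
    """Least-squares weights (intercept first) via the normal equations."""
    design = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def variate_value(weights, row):
    """Intercept plus w1*X1 + ... + wn*Xn for one observation."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Toy data (hypothetical): the dependent variable equals 1 + 2*X1 + 3*X2 exactly.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

weights = fit_weights(rows, y)
scores = [variate_value(weights, r) for r in rows]
print(weights)
print(scores)
```

The researcher supplies `rows` (the variables); the technique supplies `weights`. Each entry of `scores` is the single value that represents one observation's whole set of variables.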

Measurement Scales Data analysis involves the identification and measurement of variation in a set of variables, either among themselves or between a dependent variable and one or more independent variables. The key word here is measurement because the researcher cannot identify variation unless it can be measured. Measurement is important in accurately representing the concept of interest and is instrumental in the selection of the appropriate multivariate method of analysis. Data can be classified into one of two categories, nonmetric (qualitative) and metric (quantitative), based on the type of attributes or characteristics they represent. The researcher must define the measurement type (nonmetric or metric) for each variable. To the computer, the values are only numbers. As we will see in the following section, defining data as either metric or nonmetric has substantial impact on what the data can represent and how they can be analyzed.

NONMETRIC MEASUREMENT SCALES Nonmetric data describe differences in type or kind by indicating the presence or absence of a characteristic or property. These properties are discrete in that by having a particular feature, all other features are excluded; for example, if a person is male, he cannot be female. An "amount" of gender is not possible, just the state of being male or female. Nonmetric measurements can be made with either a nominal or an ordinal scale.

Nominal Scales. A nominal scale assigns numbers as a way to label or identify subjects or objects. The numbers assigned to the objects have no quantitative meaning beyond indicating the presence or absence of the attribute or characteristic under investigation. Therefore, nominal scales, also known as categorical scales, can only provide the number of occurrences in each class or category of the variable being studied. For example, in representing gender (male or female) the researcher might assign numbers to each category (e.g., 2 for females and 1 for males). With these values, however, we can only tabulate the number of males and females; it is nonsensical to calculate an average value of gender. Nominal data only represent categories or classes and do not imply amounts of an attribute or characteristic. Commonly used examples of nominally scaled data include many demographic attributes (e.g., individual's sex, religion, occupation, or political party affiliation), many forms of behavior (e.g., voting behavior or purchase activity), or any other action that is discrete (happens or not).

Ordinal Scales. Ordinal scales are the next "higher" level of measurement precision. In the case of ordinal scales, variables can be ordered or ranked in relation to the amount of the attribute possessed. Every subject or object can be compared with another in terms of a "greater than" or "less than" relationship. The numbers utilized in ordinal scales, however, are really nonquantitative because they indicate only relative positions in an ordered series. Ordinal scales provide no measure of the actual amount or magnitude in absolute terms, only the order of the values. The researcher knows the order, but not the amount of difference between the values.
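
The limits of both nonmetric scales can be seen in a short sketch. It assumes a hypothetical list of five responses and two arbitrary numeric codings for each scale; none of these values come from the text. For the nominal variable, the category counts are the same under any labeling while the "average" changes with the labels. For the ordinal variable, any order-preserving recoding keeps the ranking but changes the gaps between values.

```python
from collections import Counter

# Nominal scale: the numbers are labels only (all values below are hypothetical).
responses = ["female", "male", "female", "female", "male"]
coding_a = {"male": 1, "female": 2}
coding_b = {"male": 7, "female": 3}  # an equally valid relabeling


def mean_code(values, coding):
    """Average of the numeric labels; depends entirely on the arbitrary coding."""
    codes = [coding[v] for v in values]
    return sum(codes) / len(codes)


counts = dict(Counter(responses))       # meaningful: occurrences per category
mean_a = mean_code(responses, coding_a)  # not meaningful
mean_b = mean_code(responses, coding_b)  # not meaningful, and different from mean_a

# Ordinal scale: only the order of the codes carries information.
items = ["first", "second", "third"]
rank_codes = {"first": 1, "second": 2, "third": 3}
alt_codes = {"first": 10, "second": 20, "third": 90}  # same order, different spacing


def ordering(coding):
    """Items sorted by their code, i.e., the ranking the coding implies."""
    return sorted(items, key=coding.get)


def gaps(coding):
    """Differences between adjacent codes; not interpretable for ordinal data."""
    ordered = sorted(coding[i] for i in items)
    return [b - a for a, b in zip(ordered, ordered[1:])]


print(counts, mean_a, mean_b)
print(ordering(rank_codes) == ordering(alt_codes), gaps(rank_codes), gaps(alt_codes))
```

Because both codings of each variable are equally legitimate, any statistic that changes between them (the mean of the nominal codes, the gaps between the ordinal codes) cannot be describing the data themselves.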

For example, different levels of an individual consumer's satisfaction with several new products can be illustrated, first using an ordinal scale. The following scale shows a respondent's view of three products.

[Figure: satisfaction scale anchored by "Very Satisfied" and "Not At All Satisfied," with the three products marked along it.]

When we measure this variable with an ordinal scale, we "rank order" the products based on satisfaction level. We want a measure that reflects that the respondent is more satisfied with Product A than Product B and more satisfied with Product B than Product C, based solely on their position on the scale. We could assign "rank order" values (1 = most satisfied, 2 = next most satisfied, etc.) of 1 for Product A (most satisfaction), 2 for Product B, and 3 for Product C. When viewed as ordinal data, we know that Product A has the most satisfaction, followed by Product B and then Product C. However, we cannot make any statements on the amount of the differences between products (e.g., we cannot answer the question whether the difference between Products A and B is greater than the difference between Products B and C). We have to use an interval scale (see next section) to assess what is the magnitude of differ...

