Summary Business Statistics - summaries of the entire textbook PDF

Title Summary Business Statistics - summaries of the entire textbook
Course Introductory to Statistics
Institution The University of British Columbia
Pages 49




Ch. 2 - Data
Tuesday, January 04, 2011, 7:16 PM

2.1 What are Data?
○ Data: Systematically recorded information, whether numbers or labels, together with its context.
○ Data can come in many forms, such as numbers, names, and other labels.
○ Data can have values that look numerical but are just numerals serving as labels.
  ▪ Ex. ISBN numbers on books.
○ Data values are useless without their context.
  ▪ To provide context, data must be accompanied by an establishment of the 5 W's (who, what, when, where, why).
    □ In particular, if you can't answer who or what, you don't have data, and you don't have useful information.
  ▪ Data can be made useful with added context and once they have been organized in data tables.
    □ Data Table: An arrangement of data in which each row represents a case and each column represents a variable.
○ In general, the rows of a data table correspond to the individual cases (the who, not necessarily people) about which we record the data.
○ DATA VOCAB!
  ▪ Respondents: Individuals who answer a survey.
  ▪ Subjects: People on whom we experiment.
    □ Participants: Much the same thing, but the term acknowledges the importance of their role in the experiment.
  ▪ Experimental Units: Animals, plants, and inanimate subjects.
  ▪ Records: The rows in a database.
  ▪ Cases: The things described in each row.
  ▪ Variables: The characteristics recorded about each individual or case.
    □ Usually shown as the columns of a data table, with a name indicating the 'what' being measured.
  ▪ Spreadsheet: A general term for a data table.
    □ Ideal for relatively small data sets.
  ▪ Relational Database: Two or more separate data tables linked together so that information can be merged across them.
2.2 Variable Types
○ Variables play different roles, and knowing a variable's type is crucial to knowing what to do with it and what it can tell us.
  ▪ Categorical Variable: A variable that names categories and answers questions about how cases fall into those categories.
    □ Often descriptive responses to questions.
    □ If the values of a variable are words rather than numbers, it is likely to be categorical.
      ▪ Ex. Male or Female
  ▪ Quantitative Variable: A variable that has measured numerical values with units; it tells us about the quantity of what is measured.
    □ Just because a piece of data contains numbers does not mean it is quantitative; the numbers need to be actual measurements, and their values need to be meaningful.
      ▪ 'What is your area code?' does not give a quantitative variable: 604 is a number, but its numerical value doesn't matter at all.
    □ The units tell us how each value has been measured and the scale of measurement.
      ▪ They tell us how much of something we have, or how far apart two values are.
    □ Ex. Height
○ Some variables can answer both kinds of questions.
  ▪ If it isn't clear whether to treat a variable as quantitative or categorical, think about why you are looking at it and what you want it to tell you.
○ Counts
  ▪ In statistics we often count things as a natural way to summarize categorical variables.
    □ With categorical variables, counts are used to summarize what the variable tells us.
      ▪ Ex. How many purchases were shipped by plane vs. train? So should we use plane or train?
  ▪ However, we can also use counts to measure the amounts of quantitative variables.
    □ Ex. How many songs are in your iTunes library?
○ Identifiers
  ▪ Identifier variables are categorical variables in which each category contains exactly one case.
  ▪ There are exactly as many categories as there are individuals.
    □ Ex. Student Number
  ▪ Alone they don't tell us anything useful; however, they are crucial in the era of large data sets because they provide unique labels.
    □ They make it possible to combine data from different sources and to protect confidentiality.
    □ They are essential in relational databases for linking one data table to another.
  ▪ It is important to know which variables are identifiers so that you don't analyze them.
○ Other Data Types
  ▪ Nominal Variables: Categorical variables used only to name categories.
  ▪ Ordinal Variables: When all we want to know about a variable is the order of its values.
    □ Can be ordered individually (place in a race) or in groups (grade in school).
    □ The order depends on the purpose.
      ▪ If ordering by age, it would go baby, kid, teen, adult.
      ▪ If ordering by most likely to buy a CD, it would probably go teen, adult, kid, baby.
○ Cross Sectional and Time Series Data
  ▪ Time Series: The same variable measured at regular intervals over time.
    □ Common in business
      ▪ Ex. Rain in Vancouver in the month of June, measured in 2005, 2006, and 2007.
  ▪ Cross Sectional Data: Several variables are measured at the same point in time.
      ▪ Ex. Rain, temperature, and humidity in Vancouver, all measured in June 2006.
2.3 Where, How, and When
○ Where and When
  ▪ Where something was measured can make a huge difference.
    □ Ex. Something measured in Canada may be more credible than something measured in Iraq.
  ▪ When something is measured also affects its value.
    □ Ex. Sea level measured in 2010 vs. 1876.
○ How
  ▪ How the data were gathered strongly affects them.
    □ People can lie.
    □ People can be biased.
    □ People can be influenced easily.

Statistics.one (On 11-8-2011) Page 1

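As a rough illustration of the Chapter 2 vocabulary, the sketch below stores a small data table as rows (cases) and columns (variables), counts a categorical variable, and uses an identifier to link two tables the way a relational database would. All names and values (`purchases`, `customers`, `customer_id`, the shipping data) are invented for illustration and are not from the textbook.

```python
from collections import Counter

# Hypothetical data table: each dict is a case (row), each key is a variable (column).
purchases = [
    {"customer_id": 101, "ship_mode": "plane", "amount": 25.00},
    {"customer_id": 102, "ship_mode": "train", "amount": 40.50},
    {"customer_id": 101, "ship_mode": "plane", "amount": 12.25},
    {"customer_id": 103, "ship_mode": "train", "amount": 8.00},
    {"customer_id": 102, "ship_mode": "plane", "amount": 19.75},
]

# Second hypothetical table, keyed by the identifier variable customer_id.
# The area code is stored as text because it is a label, not a quantity.
customers = {
    101: {"area_code": "604"},
    102: {"area_code": "250"},
    103: {"area_code": "604"},
}

# Counts summarize a categorical variable (ship_mode).
ship_counts = Counter(row["ship_mode"] for row in purchases)

# A quantitative variable (amount, in dollars) supports arithmetic such as a mean.
mean_amount = sum(row["amount"] for row in purchases) / len(purchases)

# The identifier is not analyzed itself; it only links rows across tables.
merged = [{**row, **customers[row["customer_id"]]} for row in purchases]

print(ship_counts)
print(round(mean_amount, 2))
print(merged[0])
```

Averaging `customer_id` or `area_code` would run without error but mean nothing, which is why it helps to decide each variable's type before analyzing it.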

Ch. 3 - Surveys and Sampling
Monday, January 10, 2011, 11:32 AM

3.1 Three Ideas of Sampling
○ Idea 1: Examine a Part of the Whole
  ▪ When we'd like to know about an entire population of individuals, we settle for examining a smaller group of them for practical reasons.
    □ Sample: A subset of the population studied.
  ▪ We trust that the information provided by the sample will be representative of the information provided by the entire population.
    □ Sample Survey: A poll designed to ask questions of a small group of people in the hope of learning something about the entire population.
  ▪ When selecting a sample, we need to make sure that it represents the kinds of people in the population.
    □ The way the sample is drawn may overlook subgroups that are hard to find.
      ▪ Ex. Telemarketers who phone in the middle of the day are far more likely to reach unemployed people or older retired people who are home during the day.
    □ Samples that over- or underemphasize some characteristics are said to be biased.
      ▪ The summary characteristics of a biased sample differ from the corresponding characteristics of the population it is trying to represent.
        ◊ Conclusions based on biased samples are inherently flawed, and there is usually no way to fix the bias after the sample is drawn or to salvage useful information from it.
    □ Individuals should be selected at random.
○ Idea 2: Randomize
  ▪ Randomization protects against factors that you aren't aware of, as well as those you know could introduce bias.
    □ It protects us from the influences of all the features of our population by making sure that, on average, the sample looks like the rest of the population.
  ▪ Randomness makes things fair.
    □ Nobody can guess the outcome before it happens.
    □ When fair, the underlying outcomes are equally likely.
  ▪ Many things aren't really random; we just think they are.
    □ Pseudorandom
      ▪ When you click shuffle on an iPod, it's not 100% random because it never plays a song twice. If it were entirely random, one song could be played 5 times in a row.
  ▪ We don't match our sample exactly to the population, because there are so many minor differences that it would be impossible to take all of them into consideration.
  ▪ Sampling Error: Sample-to-sample differences. When a survey is given to multiple samples, the samples will differ from each other and so will the responses.

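A small simulation can make sampling error concrete. The sketch below builds a made-up population in which 30% of individuals answer "yes", draws several random samples of the same size, and prints their proportions, which typically differ from one another while staying near the population value. The population, the 30% figure, the sample size, and the seed are all assumptions chosen for illustration.

```python
import random

rng = random.Random(2011)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 individuals; 1 = "yes", 0 = "no".
population = [1] * 3000 + [0] * 7000
true_proportion = sum(population) / len(population)

def sample_proportion(sample_size):
    """Draw one random sample without replacement and return its proportion of 'yes'."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Several samples of the same size give several estimates of the same proportion.
estimates = [sample_proportion(500) for _ in range(5)]

print("population proportion:", true_proportion)
print("sample proportions:   ", estimates)
```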

○ Idea 3: The Sample Size is What Matters
  ▪ The size of the sample is what determines the conclusions we can draw, not the size of the population that we've taken the sample from.
    □ We do not need a large percentage of a population just because the population is large. The fraction of the population you sample is not important, only the sample size itself.
  ▪ The sample size influences how much the sample costs and how well the survey can measure the population.
  ▪ Too small a sample won't be representative of the population.
    □ Usually need at least several hundred participants.
3.2 A Census - Does it Make Sense?
○ Census: A sample that is the entire population.
  ▪ Difficult to complete
    □ How likely are you to be able to survey absolutely everyone…
  ▪ The cost of locating everyone may be huge.
  ▪ The population being studied may change.
    □ People die, and are born.
    □ Advertising and news events change people's opinions.
3.3 Populations and Parameters
○ To generalize from a sample to the world, we need a model of reality.
○ The model doesn't need to be complete or perfect; it just needs to give us summaries that we can learn from and use, even though they don't fit each data value exactly.
○ Models use mathematics to represent reality.
  ▪ Parameter: A numerically valued attribute of a model for a population.
    □ We rarely expect to know the exact value of a parameter, but we aim to estimate it from the data.
    □ We use the data to try to estimate values for the population parameters.
      ▪ Statistic: Any summary found from the data.
        ◊ Sample Statistic: A statistic matched with the parameter it estimates.
3.4 Simple Random Sample (SRS)
○ How would you select a representative sample?
  ▪ Every individual in the population needs to have an equal chance of being selected.
    □ Every possible sample of the size we plan to draw has an equal chance of being selected.
      ▪ Each combination of individuals has an equal chance of being selected as well.
○ Simple Random Sample: A sample in which each set of n elements in the population has an equal chance of selection.
  ▪ The standard against which we measure other sampling methods, and the sampling method on which the theory of working with sampled data is based.
○ To Select a Sample at Random:
  ▪ First, define a sampling frame.
    □ Sampling Frame: A list of individuals from which the sample will be drawn.
      ▪ In defining the sampling frame, we must deal with the details of defining the population.
  ▪ Then we choose an SRS by assigning each individual a sequential number and drawing random numbers to see who will be in our sample.
    □ Sampling Variability: Differences that exist between possible samples from the same population.
3.5 Other Sample Designs
○ Stratified Sampling
  ▪ Strata: Subsets of the population that are internally homogeneous but differ from one another.
  ▪ Stratified Sampling: Use SRS within each stratum, then combine the results at the end to form results for the entire population.
  ▪ This can be used to represent a population to scale, making the results more accurate.
    □ Ex. 60% of mall-goers are female, 40% male. When conducting a survey about a new store, we would select 60 women and 40 men from their respective strata, instead of sampling purely at random and maybe getting 70 men and 30 women.
  ▪ In stratified sampling, different samples are generally more alike.
    □ BENEFICIAL.
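The steps for drawing an SRS, and the 60/40 mall example for stratified sampling, can be sketched as follows. The sampling frame here is a made-up list of 1,000 shoppers, and the names `frame`, `simple_random_sample`, and `stratified_sample` are hypothetical; this is an illustration of the idea under those assumptions, not a prescribed procedure.

```python
import random

rng = random.Random(42)  # fixed seed for repeatability

# Hypothetical sampling frame: 600 women and 400 men, numbered sequentially.
frame = [{"id": i, "sex": "F" if i < 600 else "M"} for i in range(1000)]

def simple_random_sample(frame, n):
    """SRS: every set of n individuals in the frame is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(frame, key, n):
    """Draw an SRS inside each stratum, sized in proportion to the stratum."""
    strata = {}
    for person in frame:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding works out exactly for this 60/40 frame.
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, share))
    return sample

def count_women(sample):
    return sum(1 for person in sample if person["sex"] == "F")

srs = simple_random_sample(frame, 100)
strat = stratified_sample(frame, "sex", 100)

print("women in SRS:", count_women(srs))                  # changes from sample to sample
print("women in stratified sample:", count_women(strat))  # fixed by the design
```

Because the stratified design fixes the number of women and men in advance, repeated stratified samples differ less from one another than repeated simple random samples do, which is the benefit noted above.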

○ Cluster and Multistage Sampling
  ▪ Sometimes SRS and stratified sampling are impractical.
    □ Cluster: A representative subset of a population chosen for reasons of convenience, cost, or practicality.
      ▪ Cluster Sampling: A sampling design in which clusters representative of the population are chosen at random and a census is taken of each.
        ◊ If each cluster fairly represents the population, the sample will be unbiased.
          ▪ A cluster is internally varied, like a mini version of the population, whereas a stratum is made up of people who are alike.
            – We stratify to reduce variability; we cluster to reduce costs or ensure practicality.
  ▪ Multistage Sample: A sampling scheme that combines several sampling methods.
○ Systematic Samples
  ▪ Systematic Sample: Selecting individuals systematically.
    □ Ex. Every tenth individual on an alphabetical list of employees.
  ▪ To make sure it is still random, we need to start the systematic selection at a randomly selected individual.
  ▪ The order of the list must not be associated in any way with the responses measured, or the sample will not be representative.
3.6 Defining the Population
○ Often the population is not evident.
  ▪ And sometimes, even when it is, it may not be a practical group to study.
○ You also have to specify the sampling frame.
  ▪ Limits what your survey can find out.
  ▪ May not be the group you really want to know about.
○ Select your target sample.
  ▪ The individuals for whom you intend to measure responses.
    □ You probably will not get a response from many of these people.
○ Select your sample.
  ▪ The actual respondents to your survey.
  ▪ They may or may not be representative of your target sample or your population.
○ People who conduct surveys rarely consider the actual people who will be filling out the survey, and whether they are individuals whose answers will be interesting or have meaningful business consequences.
3.7 The Valid Survey
○ The survey needs to be able to yield the information you need about the population you are interested in; that is what makes it a valid survey.
  ▪ Valid Survey
    □ What do I want to know?
      ▪ The survey instrument (questionnaire, online survey, etc.) can result in errors.
        ◊ Asking unnecessary questions
        ◊ Too long
      ▪ If you don't have a good use for an answer, don't ask the question.
    □ Who are the right respondents?
      ▪ Your selected sample needs to be representative and relevant to the question you want answered.
      ▪ The respondents also need to actually know the information you are hoping to discover.
    □ What are the right questions?
      ▪ Ask specific rather than general questions.
      ▪ Be careful with question phrasing.
        ◊ If a question is hard to understand, people may answer it incorrectly.
      ▪ Measurement Errors: Inaccurate responses, either intentional or unintentional.
        ◊ To cut down on measurement errors, survey writers often provide possible answers.
        ◊ To protect a survey from measurement errors, a pilot test may be performed.
          ▪ Pilot Test: A small sample is drawn from the sampling frame and a draft form of the survey is administered.
    □ What will be done with the results?
      ▪ Watch for biases.
        ◊ Nonresponse Bias: When a large fraction of those sampled don't respond.
          ▪ If the people who don't respond have something in common, a bias will occur.
        ◊ Voluntary Response Bias: When individuals can choose on their own whether they want to complete the survey.
          ▪ Individuals with strong feelings on either side are more likely to respond, with little representation of the middle ground.
○ What can go Wrong? How to Sample Badly.
  ▪ Voluntary Response Sample
    □ Online, call-in shows, etc.
    □ Always biased, so conclusions drawn from them are often wrong.
    □ Usually frequented by people with strong negative opinions.
  ▪ Convenience Sampling
    □ Using only the individuals who are convenient.
  ▪ Undercoverage
    □ Some portion of the population is not sampled at all or has a smaller representation in the sample than in the population.
      ▪ Leads to bias.
○ How to Think About Biases
  ▪ Look for biases in any survey.
  ▪ Spend time and resources reducing biases.
  ▪ Pretest or pilot your survey.
  ▪ Always report your sampling methods in detail.

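The systematic sampling rule from section 3.5 (every tenth individual, starting from a randomly chosen position) can be sketched as below. The employee list and the function name `systematic_sample` are invented for illustration.

```python
import random

def systematic_sample(items, step, rng):
    """Take every `step`-th item, beginning at a random position among the first `step` items."""
    start = rng.randrange(step)  # random start, so the selection is not fixed in advance
    return items[start::step]

rng = random.Random(7)

# Hypothetical alphabetical list of 95 employees.
employees = [f"employee_{i:03d}" for i in range(95)]

sample = systematic_sample(employees, 10, rng)
print(len(sample), sample[:3])
```

The sketch assumes the list order is unrelated to the responses being measured; if the order is associated with them, the resulting sample may not be representative.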

Ch. 4 - Displaying and Describing Categorical Data
Monday, January 10, 2011, 5:34 PM

4.1 The Three Rules of Data Analysis
○ Make a Picture
  ▪ A display of your data will reveal things you are not likely to see in a table of numbers and will help you to plan your approach to the analysis and think clearly about the patterns and relationships that may be hiding in your data.
○ Make a Picture
  ▪ Will do much of the analysis for you.
  ▪ Shows you patterns.
○ Make a Picture
  ▪ Makes it easy to report to others what you find in your data.
4.2 Frequency Tables
○ To make a picture of data, we first need to put the data into piles of things that seem to go together, so we can see how the cases are distributed across different categories.
  ▪ With categorical data this is easy: just count the number of cases in each category.
  ▪ All of the piles are organized into a frequency table.
    □ Frequency Table: A table that lists the categories of a categorical variable and gives the number of observations for each category.
○ Sometimes we want to know the proportion of the data in each category.
  ▪ Divide the counts by the total number of cases.
    □ Relative Frequency Table: Shows the percentages rather than the counts of the values in each category.
○ Both tables show the distribution of the data.
4.3 Charts
○ The Area Principle
  ▪ The area occupied by a part of the graph should correspond to the magnitude of the value it represents.
○ Bar Charts - Only for categorical
  ▪ Extremely accurate as a visual representation.
  ▪ The height of each bar shows the count for each category.
  ▪ Bar charts make comparisons between categories easy and natural.
  ▪ There should be small spaces between the bars to indicate that these are freestanding bars that could be arranged in any order.
  ▪ Generally drawn in vertical columns, but sometimes with horizontal bars.
  ▪ Relative Frequency Bar Chart: Shows the percentages rather than the counts of the values in each category.
○ Pie Charts - Only for categorical
  ▪ Show the whole group of cases as a circle, with slices proportional to the fraction of the whole in each category.
  ▪ Comparisons are sometimes harder, especially between categories of similar size.
  ▪ It's also harder to estimate the number in each category.
○ The decision of which picture to use depends on the data you have and on what you hope to communicate.
○ Categorical Data Condition: The data are counts or percentages of individuals in categories.
○ Additionally, if you want to draw a pie or bar chart, you need to make sure the categories don't overlap, so that no individual is counted in two categories.
4.4 Contingency Tables
○ Contingency Table: A table that displays counts of individuals falling into categories based on two or more variables.
  ▪ Shows how individuals are distributed along each variable, contingent on the value of the other variable.

           Agree    Neutral   Disagree   Don't Know   Total
  Canada   544      43        567        98           1,252
  USA      532      59        498        12           1,101
  Total    1,076    102       1,065      110          2,353

    □ Shows how individuals answer the agree-disagree variable, contingent upon which country they live in.
    □ The margins of a contingency table show totals.
      ▪ The frequency distribution of either variable in the margins is called its marginal distribution.
    □ Cell: Each cell in the table gives the count for a combination of values of the two variables.
    □ Most statistics programs offer a choice of total percent, row percent, or column percent.
      ▪ Total percent for Canada, Agree = 544/2,353 (of everyone surveyed)
      ▪ Row percent for Canada, Agree = 544/1,252 (of Canadians)
      ▪ Column percent for Canada, Agree = 544/1,076 (of those who agree)
○ Conditional Distributions
  ▪ Conditional Distribution: The distribution of a variable when the who is restricted to a smaller group of individuals.
  ▪ In a contingency table, when the distribution of one variable is the same for all categories of another, we say that the variables are independent.
    □ There's no association between these variables.
○ Segmented Bar Charts
  ▪ Each bar is a 'whole', divided proportionally into segments corresponding to the percentage in each group.
Simpson's Paradox
○ Combining percentages across very different values or groups can give absurd results.
  ▪ Sometimes one person can be better at doing A and better at doing B than another person, but the other person can still have a better overall average.
    □ Ex.

                Peter               Katrina
      A         90/100 = 0.9        19/20 = 0.95
      B         10/20 = 0.5         75/100 = 0.75
      Overall   100/120 = 0.8333    94/120=0....
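The Peter and Katrina numbers can be checked with a few lines of arithmetic. The sketch below assumes, as the layout of the example suggests, that the first pair of fractions is task A, the second pair is task B, and the last pair is the combined record; the labels `A` and `B` and the variable names are assumptions.

```python
from fractions import Fraction

# (successes, attempts) per task, read from the example above.
peter = {"A": (90, 100), "B": (10, 20)}
katrina = {"A": (19, 20), "B": (75, 100)}

def rate(successes, attempts):
    return Fraction(successes, attempts)

def overall(record):
    """Pool successes and attempts across tasks before dividing."""
    total_successes = sum(s for s, _ in record.values())
    total_attempts = sum(a for _, a in record.values())
    return rate(total_successes, total_attempts)

for task in ("A", "B"):
    print(task, float(rate(*peter[task])), float(rate(*katrina[task])))

print("overall", float(overall(peter)), float(overall(katrina)))

# Katrina has the higher rate on each task, yet Peter has the higher pooled rate.
katrina_wins_each_task = all(rate(*katrina[t]) > rate(*peter[t]) for t in peter)
peter_wins_overall = overall(peter) > overall(katrina)
print(katrina_wins_each_task, peter_wins_overall)
```

The reversal comes from the unequal group sizes: most of Peter's attempts are on the task where both people score high, while most of Katrina's are on the task where both score lower.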


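Returning to the contingency table in section 4.4, the total, row, and column percentages for the Canada/Agree cell can be computed directly from the counts. The dictionary layout and the function name `percent` below are assumptions made for this sketch; only the counts come from the notes.

```python
# Counts from the contingency table in section 4.4.
table = {
    "Canada": {"Agree": 544, "Neutral": 43, "Disagree": 567, "Don't Know": 98},
    "USA":    {"Agree": 532, "Neutral": 59, "Disagree": 498, "Don't Know": 12},
}

# Marginal distributions: totals along each variable.
row_totals = {country: sum(cells.values()) for country, cells in table.items()}
col_totals = {}
for cells in table.values():
    for answer, count in cells.items():
        col_totals[answer] = col_totals.get(answer, 0) + count
grand_total = sum(row_totals.values())

def percent(part, whole):
    return round(100 * part / whole, 1)

cell = table["Canada"]["Agree"]
total_pct = percent(cell, grand_total)         # share of everyone surveyed
row_pct = percent(cell, row_totals["Canada"])  # share of Canadians
col_pct = percent(cell, col_totals["Agree"])   # share of those who agree

print(row_totals, col_totals, grand_total)
print(total_pct, row_pct, col_pct)

# Conditional distribution of the answers within each country.
conditional = {
    country: {answer: percent(count, row_totals[country]) for answer, count in cells.items()}
    for country, cells in table.items()
}
print(conditional)
```

Comparing the two rows of `conditional` is one way to judge whether the answer looks independent of country: if the variables were independent, the two conditional distributions would be about the same.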