Lecture notes, Data Analysis, course 1-9 PDF

Course Data Analysis
Institution Queensland University of Technology


BSB123 – DATA ANALYSIS NOTES
LECTURE 1: INTRODUCTION TO STATISTICS; INTRODUCTION TO EXCEL
LECTURE NOTES

Key Definitions
- Population: all members of a group about which you want to draw a conclusion, i.e. everything you are interested in.
- Sample: the portion or subset of the population selected for analysis.
- Parameter: a numerical measure that describes a characteristic of a population, e.g. an average based on population data.
- Statistic: a numerical measure that describes a characteristic of a sample, e.g. an average based on a sample. Statistics are more practical to obtain and are used more often than parameters.

Population vs. Sample
- Measures used to describe a population are called parameters.
- Measures computed from sample data are called statistics.
- The size of the sample has a big impact on the result.
- Analysing a sample allows you to say something about a larger group, but there is always a chance of error.

Population and Sample Data
- A sample is a selection of measurements (a subset) from all measurements (the population).
- The purpose of analysing a sample is to make a statistical inference.

A note on notation:
- Greek letters (µ, θ) are used for population measures, with capital N for the population size
- Roman letters (x, s) are used for sample measures, with lowercase n for the sample size
*Note: many measures have two formulas, one for population data and one for sample data.*
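To make the parameter/statistic distinction and the "two formulas" note concrete, here is a minimal Python sketch using the standard `statistics` module. The store sales figures are hypothetical, chosen only for illustration. `statistics.pvariance` uses the population formula (divides by N) and `statistics.variance` uses the sample formula (divides by n - 1).

```python
import statistics

# Hypothetical population: weekly sales ($'000) for all N = 8 stores
population = [12, 15, 11, 18, 14, 16, 13, 17]
# Hypothetical sample: n = 4 of those stores selected for analysis
sample = [15, 11, 18, 16]

mu = statistics.mean(population)   # parameter: population mean (µ)
x_bar = statistics.mean(sample)    # statistic: sample mean

# Two formulas for the same measure:
# population variance divides by N, sample variance divides by n - 1.
sigma_sq = statistics.pvariance(population)
s_sq = statistics.variance(sample)

print(f"N = {len(population)}, mu = {mu}, population variance = {sigma_sq}")
print(f"n = {len(sample)}, x_bar = {x_bar}, sample variance = {s_sq:.3f}")
```

The sample mean (15) differs from the population mean (14.5), which is the "chance of error" mentioned above when a sample stands in for the whole population.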

Types of Data
Data does not have to be numerical. There are two types of data:
- categorical/qualitative
- numerical/quantitative
Numerical data are measured on a numerical scale. Categorical data can only be named or categorised, and can be further classified as:
- nominal: no natural/implied order
- ordinal: there is an implied order

Further Classifications of Numerical Data
Continuous or discrete
- continuous: can take any real value within a range, so there is an infinite number of possible values, e.g. time
- discrete: a countable number of possible responses, usually integers. Note: whilst there cannot be half a person, there can be half a shoe size.
Interval or ratio
- interval: differences between measurements are meaningful, but there is no true zero, e.g. temperature. It is untrue to say a size 10 shoe is double a size 5. The shoe size example uses discrete data.
- ratio: differences between measurements are meaningful and a true zero exists. It is true to say $100 is double $50. The money example uses continuous data.
Time series or cross sectional
- time series: data collected through time (look for trends)
- cross sectional: data collected at a certain point in time

Jessica King

BSB123 – Data Analysis

Semester 1, 2009
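A short Python sketch of why the interval/ratio distinction matters, using the temperature and $100/$50 examples from these notes. The Celsius-to-Fahrenheit conversion is the standard formula; `celsius_to_fahrenheit` is just an illustrative helper name.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert Celsius to Fahrenheit; both scales have an arbitrary zero."""
    return c * 9 / 5 + 32

# Interval scale: the "ratio" of two temperatures changes with the unit,
# so "20 degrees is twice as hot as 10 degrees" is not meaningful.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: money has a true zero, so the ratio survives a change of
# unit (dollars to cents) and "$100 is double $50" holds in any unit.
ratio_dollars = 100 / 50
ratio_cents = (100 * 100) / (50 * 100)

print(ratio_c, round(ratio_f, 2))   # 2.0 1.36
print(ratio_dollars, ratio_cents)   # 2.0 2.0
```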

TEXTBOOK NOTES – Chapter 1: Presenting and Describing Information

Definitions
- Variables: characteristics of items or individuals.
- Data: observed values of variables.
- Population: all the members of a group about which you want to draw a conclusion. Two factors need to be specified when defining a population: the entity (e.g. people or vehicles) and the boundary (e.g. those registered to vote, or vehicles registered in QLD for road use).
- Sample: the portion of the population selected for analysis. The people or vehicles in the sample are a subset of the people or vehicles that make up the population.
- Parameter: a numerical measure that describes a characteristic of a population.
- Statistic: a numerical measure that describes a characteristic of a sample.
- Categorical variables: yield categorical responses, such as yes or no answers. Categorical variables can also yield more than two possible responses.
- Continuous variables: produce numerical responses that arise from a measuring process. The more precise the measuring device, the greater the likelihood of detecting small differences between measurements, and therefore the more precise the data.
- Descriptive statistics: focuses on collecting, summarising and presenting a set of data.
- Discrete variables: produce numerical responses that arise from a counting process.
- Focus group: a market research tool used to elicit unstructured responses to open-ended questions.
- Inferential statistics: uses sample data to draw conclusions about a population.
- Interval scale: an ordered scale in which the difference between measurements is a meaningful quantity but does not involve a true zero point.
- Nominal scale: classifies data into distinct categories in which no ranking is implied. Nominal scaling is the weakest form of measurement because you cannot specify any ranking across the categories.
- Numerical variables: yield numerical responses, such as your height in centimetres. There are two types of numerical variables: discrete and continuous.
- Ordinal scale: classifies data into distinct categories in which ranking is implied, e.g. responses ranked in order of satisfaction level. Ordinal scaling is a stronger form of measurement than nominal scaling because an observed value classified into one category possesses more of a property than an observed value classified into another category. However, ordinal scaling is still relatively weak because the scale does not account for the size of the differences between categories: ordering only shows which category is greater or preferred, not by how much.
- Operational definition: a universally accepted meaning that is clear to all associated with an analysis.
- Ratio scale: an ordered scale in which the difference between measurements involves a true zero point, e.g. weight, length, age or salary.

Basic Concepts of Statistics: Chapter Summary
- Statistics examines ways to process and analyse data, and provides procedures to collect and transform data in ways that are useful to business decision-makers.
- Identifying the most appropriate source of data is a critical aspect of statistical analysis.


- Data from a categorical variable are measured on a nominal scale or an ordinal scale.
- Data from a numerical variable are measured on an interval scale or a ratio scale.
- Data measured on an interval or ratio scale constitute the highest levels of measurement. They are stronger forms of measurement than an ordinal scale because you can determine not only which observed value is the largest but also by how much.
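The ordinal-scale definition can be illustrated with a hypothetical five-level satisfaction scale in Python: sorting by the scale's implied rank gives a different order from a plain alphabetical sort, which treats the labels as if they were nominal. The level names and responses are assumptions invented for this sketch.

```python
# Hypothetical satisfaction scale, listed from lowest to highest.
# The list order is the ranking that an ordinal scale implies.
LEVELS = ["very unsatisfied", "unsatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical survey responses
responses = ["satisfied", "neutral", "very satisfied", "unsatisfied", "satisfied"]

# Ordinal treatment: sort by the implied rank
by_rank = sorted(responses, key=RANK.__getitem__)

# Nominal-style treatment: alphabetical order ignores the ranking
alphabetical = sorted(responses)

print(by_rank)
print(alphabetical)

# Order comparisons are meaningful on an ordinal scale...
print(RANK["satisfied"] > RANK["neutral"])  # True
# ...but rank gaps are not measured amounts: the scale does not say that
# "neutral" -> "satisfied" is the same size step as
# "satisfied" -> "very satisfied".
```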


LECTURE 2: PRESENTING DATA IN TABLES AND CHARTS
LECTURE NOTES
Presenting data is about what to do with the information: it allows the data to be seen visually.

Tables and Charts for Categorical Data
Categorical data
- tables: summary table
- graphing data: bar charts, pie charts
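The tabulating step for categorical data can be sketched in Python with `collections.Counter`. Only the eleven gender responses printed in the summary-table example are used here (the full example totals 1000), and percentages are rounded to whole numbers as they would be for a pie chart. Dividing every count by one fixed total mirrors what an absolute cell reference does in an Excel percentage column.

```python
from collections import Counter

# The gender responses printed in the summary-table example (a subset of the data)
genders = ["M", "F", "M", "M", "F", "F", "M", "F", "M", "F", "F"]

counts = Counter(genders)
total = sum(counts.values())  # one fixed total, used for every row

# Percentage of the total for each category, rounded to a whole number
percentages = {category: round(100 * n / total) for category, n in counts.items()}

# Summary table: category, frequency, percentage
for category in sorted(counts):
    print(f"{category}\t{counts[category]}\t{percentages[category]}%")
print(f"Total\t{total}")
```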

The Summary Table/Frequency Table
Choose the key points of the information and communicate these in the table.
Example: Gender; M, F, M, M, F, F, M, F, M, F, F ...

Gender   Tally      Frequency
M        IIII ...   570
F        II ...     430
Total               1000

Bar Chart
- Best for discussing the best/worst amount, the preferred option, etc.
- Each category is discrete and separate.
- In presentations, always use an appropriate title and bar headings.
- In Excel, use absolute references when calculating percentages, e.g. =B2/$B$6, so the reference to the total in $B$6 does not change when the formula is copied to other rows.

Pie Chart
- Useful for graphing percentages; percentages are rounded to the nearest whole percentage.
- More useful for discussing portions of a whole.
- Not for use with more than 6-8 categories, as the chart is no longer visually effective and does not display the figures clearly.

Tables and Charts for Numerical Data
Numerical data
- ordered array ↓ stem and leaf display
- frequency distributions and cumulative distributions ↓ histogram, polygon, ogive

The Ordered Array
- A sequence of data in rank order. It gives signals of the variability within the range and may help identify outliers.
- Abnormal values/outliers: values that are much smaller or much larger than the other results. They can have a significant impact on the result by distorting it, and may be removed if their effect is too great.

Stem and Leaf Diagram
- A quick and simple way to see distribution details in a data set.
- Separate sorted values into groups (stems) and the values within each group (leaves).
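A minimal Python sketch of an ordered array and a stem-and-leaf display, using the six ages listed in the frequency-distribution example of these notes. It assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the units digit; `stem_and_leaf` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

# Ages listed in the frequency-distribution example of the notes
ages = [41, 39, 21, 35, 65, 54]

# Ordered array: the same values in rank order
ordered = sorted(ages)

def stem_and_leaf(values):
    """Split two-digit integers into stems (tens digit) and leaves (units digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    # Include empty stems so gaps in the distribution stay visible
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}

display = stem_and_leaf(ages)

print(ordered)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```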


- Data may be heavily skewed (i.e. more of certain results than others), and the distribution can be either symmetrical or non-symmetrical.
- Round off numbers that have more than 2 digits, so that only one digit is left as the leaf. This results in the loss of some detail (e.g. Slide 11).

Tabulating Numerical Data: Frequency Distributions
- A summary table in which data are arranged into numerically ordered classes or intervals.
- A condensed summary of numerical data: it condenses the raw data and makes it more useful, i.e. it allows quick visual interpretation.

Example: Ages; 41, 39, 21, 35, 65, 54 ...

Age   Frequency
21    1
22    0
23    1
...   ...
65    0

This is not a good/accurate/feasible representation, so group the responses into CLASSES.
Grouped in classes: Age 20-...
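Grouping values into classes can be sketched in Python as below. The ages are the six listed in the example; the class width of 10 and the number of classes are assumptions, since the notes only show that the first class starts at 20. Each class counts values in the half-open interval [lower, lower + width), and `frequency_distribution` is a hypothetical helper.

```python
# Ages listed in the notes' example (the remaining responses are not shown)
ages = [41, 39, 21, 35, 65, 54]

# Assumed class layout: start at 20, width 10, five classes
CLASS_START, CLASS_WIDTH, N_CLASSES = 20, 10, 5

def frequency_distribution(values, start, width, n_classes):
    """Count the values falling in [lower, lower + width) for each class."""
    freq = {}
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        label = f"{lower}-{upper - 1}"  # e.g. "20-29" for whole-number ages
        freq[label] = sum(lower <= v < upper for v in values)
    return freq

table = frequency_distribution(ages, CLASS_START, CLASS_WIDTH, N_CLASSES)
for label, count in table.items():
    print(f"{label}\t{count}")
```

With classes, the table has five rows instead of one row per possible age, which is the condensing effect described above.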

