Further 3/4 Bound Reference (PDF)

Title: Further 3/4 Bound Reference
Course: Further Mathematics
Institution: Victorian Certificate of Education
Pages: 50
File Size: 3.6 MB
File Type: PDF

Summary

95 SAC average, raw 44 (2019) Further Bound Reference. Concise explanation of concepts + examples. Covers: data & statistics, finance, matrices, graphs & network....


Description

CLASSIFYING DATA

Categorical
• Ordinal: data that can be used to group individuals and to order values
• Nominal: data that can be used to group individuals

Numerical
• Discrete: quantities that are counted
• Continuous: quantities that are measured

Examples of categorical variables
Ordinal • Grades (A, B, C, D) • Rating (1 – strongly agree, 2 – agree, etc.) • Release date • Age range (over 50, under 50) • Floor level (1, 2, 3, 4) • Shoe size (6, 8, 10)
Nominal • Bank account • Post code • Sugar type

Examples of numerical variables
Discrete • Price paid for… • Amount of… • Number of…
Continuous • Time taken to… • Temperature

DISPLAYING AND DESCRIBING THE DISTRIBUTIONS OF CATEGORICAL VARIABLES
1. The Frequency Table
Frequency can be recorded as a number (the number of times a value occurs) or as a percentage:

$\text{per cent} = \dfrac{\text{count}}{\text{total count}} \times 100\%$
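A minimal Python sketch of the count and percentage columns of a frequency table, assuming the categorical data is a plain list of labels. The blood-type data and the `frequency_table` helper are hypothetical, for illustration only.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percentage)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    # per cent = count / total count x 100%
    return {cat: (n, n * 100 / total) for cat, n in counts.items()}

# Hypothetical data: blood types of 10 people
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]
table = frequency_table(blood_types)
for cat, (n, pct) in table.items():
    print(f"{cat}: {n} ({pct:.0f}%)")
```

In this made-up data the modal category is O (50%).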

2. The Bar Chart
The bar chart graphically displays information to help identify any features that stand out in the data. In a bar chart:
• The variable being displayed is on the x-axis and frequency is on the y-axis
• Axes are labelled and lines must be ruled
• Bars are of equal width, with a space between them

3. Segmented/Stacked Bar Charts
The segmented/stacked bar chart is a compact display which is useful when comparing two or more categorical variables. Be careful: stacked bar charts have frequencies on the y-axis, while percentage segmented bar charts have percentages on the y-axis.

Report writing:
When writing a report describing the distribution of a categorical variable:
• Briefly summarise the context of the data, including the number of individuals involved
• Ensure the mode/modal category is mentioned
• Include frequencies or percentages (percentages are preferred)
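The segment heights in a percentage segmented bar chart are each group's counts divided by that group's total. A small sketch, using a hypothetical two-way table stored as nested dictionaries (the data and the `column_percentages` name are illustrative assumptions):

```python
# Hypothetical two-way frequency table: response counts for each year level
counts = {
    "Year 11": {"Yes": 30, "No": 20},
    "Year 12": {"Yes": 45, "No": 5},
}

def column_percentages(table):
    """Convert each group's counts into percentages of that group's total,
    i.e. the segment heights of a percentage segmented bar chart."""
    result = {}
    for group, cells in table.items():
        total = sum(cells.values())
        result[group] = {k: v * 100 / total for k, v in cells.items()}
    return result

pcts = column_percentages(counts)
print(pcts)
```

Each group's segments sum to 100%, which is why every bar in a percentage segmented bar chart has the same height.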


DISPLAYING AND DESCRIBING THE DISTRIBUTIONS OF NUMERICAL VARIABLES
1. The Grouped Frequency Table

A grouped frequency table groups data values into intervals, so that we can organise and compare numerical variables which can take a large range of values (e.g. age).
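A possible way to build a grouped frequency table in Python, assuming equal-width class intervals written in the "lower to less than upper" form (e.g. 10–<20). The ages and the `grouped_frequency` helper are hypothetical.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count values in the class intervals [start + k*width, start + (k+1)*width)."""
    table = {}
    for k in range(n_classes):
        lo = start + k * width
        hi = lo + width
        # a value on a boundary (e.g. 20) goes in the class that starts there
        table[f"{lo}-<{hi}"] = sum(1 for v in values if lo <= v < hi)
    return table

# Hypothetical ages of 12 people
ages = [12, 17, 21, 25, 28, 33, 34, 39, 41, 45, 52, 58]
print(grouped_frequency(ages, start=10, width=10, n_classes=5))
```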

2. The Histogram
A histogram is a graphical display of the information in a grouped frequency table.

Shape of distributions
Skewed distribution

Symmetric distribution


Outliers

Centre
The centre of a histogram can be found by locating the middle of the distribution.

Spread

Report writing:
• Shape and outliers
• Centre
• Spread

Histogram making:


USING LOG SCALE TO DISPLAY DATA
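The two worked examples in this section can be checked with Python's standard `math` module. `round_sig` is a hypothetical helper for rounding to significant figures, not a library function.

```python
import math

def round_sig(x, sig):
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

# Find log of 45, correct to two significant figures
log45 = math.log10(45)          # 1.653...
print(round_sig(log45, 2))      # 1.7

# Find the number whose log is 2.7125, correct to the nearest whole number
x = 10 ** 2.7125                # 515.82...
print(round(x))                 # 516
```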

Examples: Find log of 45, correct to two significant figures.

$\log_{10}(45) = 1.65\ldots \approx 1.7$ (to two significant figures)

Find the number whose log is 2.7125, correct to the nearest whole number.
$10^{2.7125} = 515.82\ldots \approx 516$ (to the nearest whole number)

DOT AND STEM PLOTS
1. The Dot Plot
A dot plot is particularly suitable for displaying discrete numerical data and provides a very quick way to order and display a small dataset.

2. The Stem Plot
The stem-and-leaf plot, or stem plot, is particularly useful for displaying small- to medium-sized sets of data (up to about 50 data values).
• Always include a key/legend with a stem plot
• The leaves are ordered
• Split the stems into halves or fifths if told to

MEDIAN, RANGE AND IQR
Median
The median is the middle value in an ordered data set. For n data values, the median is found at the $\frac{n+1}{2}$th position. When n is odd, the median will be the middle data value. When n is even, the median will be the average of the two middle data values.

Range
Range is the simplest measure of spread of a distribution. It is calculated by: largest data value − smallest data value.
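A sketch of the (n + 1)/2 rule for the median, and of the range, in Python; the helper names and data are illustrative only.

```python
def median(values):
    """Median: the value at the (n + 1)/2 th position of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based position of the middle value
        return ordered[position - 1]
    # n even: (n + 1)/2 falls between positions n/2 and n/2 + 1, so average them
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

def data_range(values):
    """Range = largest data value - smallest data value."""
    return max(values) - min(values)

print(median([7, 2, 9, 4, 5]))      # 5
print(median([7, 2, 9, 4, 5, 10]))  # 6.0
print(data_range([7, 2, 9, 4, 5]))  # 7
```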


A problem with range as a measure of spread
Because the range depends only on the two extreme values in the data, it is not always an informative measure of spread; for example, one or other of these two values might be an outlier. Hence, when a distribution is skewed or has outliers, the IQR is a more informative measure of spread.

IQR
The interquartile range (IQR) is the spread of the middle 50% of data values. It is found by: IQR = Q3 − Q1.

THE 5 NUMBER SUMMARY

The five-number summary is:
• Minimum
• Q1
• Median
• Q3
• Maximum

Box plot
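The five-number summary, the IQR and the 1.5 × IQR outlier fences can be sketched as below. This assumes the common textbook convention of taking Q1 and Q3 as the medians of the lower and upper halves, leaving out the middle value when n is odd; calculators and software packages may use slightly different quartile rules. The data and function names are hypothetical.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """(min, Q1, median, Q3, max), with quartiles as medians of each half."""
    x = sorted(values)
    n = len(x)
    lower = x[: n // 2]            # excludes the middle value when n is odd
    upper = x[(n + 1) // 2 :]
    return x[0], median(lower), median(x), median(upper), x[-1]

def outliers(values):
    """Values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [2, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(data))
print(outliers(data))
```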

Outliers
To calculate the upper fence: Q3 + 1.5 × IQR
To calculate the lower fence: Q1 − 1.5 × IQR
Data values above the upper fence or below the lower fence are classified as possible outliers.

RELATING BOX PLOT TO SHAPE
A symmetric distribution


Positively skewed distribution

Negatively skewed distribution

Distributions with outliers

USING BOX PLOTS TO DESCRIBE AND COMPARE DISTRIBUTIONS
Using a box plot to describe a distribution with outliers

Using a box plot to compare distributions


DESCRIBING THE CENTRE AND SPREAD OF SYMMETRIC DISTRIBUTIONS
Choosing between the mean and the median

The mean and median are both measures of the centre of a distribution. If the distribution is:
• Symmetric with no outliers, either the mean or the median can be used
• Clearly skewed and/or has outliers, it is more appropriate to use the median to indicate the centre of the distribution

The standard deviation measures the spread of the data values around the mean. It is the square root of the sum of the squared deviations of each data value from the mean, divided by n − 1:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

THE NORMAL DISTRIBUTION AND THE 68–95–99.7% RULE
The following rules can be used when examining a symmetric distribution that is approximately bell-shaped:
• Approximately 68% of the data values will lie within one standard deviation above/below the mean
• Approximately 95% of the data values will lie within two standard deviations above/below the mean
• Approximately 99.7% of the data values will lie within three standard deviations above/below the mean
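A minimal sketch of the sample standard deviation, s = √(Σ(x − x̄)² / (n − 1)), and of the intervals given by the 68–95–99.7% rule. The function names and data are hypothetical; a CAS calculator or Python's `statistics.stdev` would give the same s.

```python
import math

def sample_sd(values):
    """s = sqrt( sum of (x - mean)^2 / (n - 1) )"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95% and 99.7% of the values
    of a bell-shaped distribution."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, 68), (2, 95), (3, 99.7))}

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))
print(rule_intervals(mean=50, sd=5))
```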


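STANDARD SCORES
Standardised scores (z-scores) are a way of comparing data values from data sets with different means and standard deviations. A standardised score is calculated as $z = \dfrac{x - \bar{x}}{s}$.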

• A positive z-score indicates that the actual score it represents is above the mean
• A negative z-score indicates that the actual score it represents is below the mean
• A zero z-score indicates that the actual score is equal to the mean

Converting standardised scores into actual scores
Rearranging $z = \dfrac{x - \bar{x}}{s}$ gives $x = \bar{x} + z \times s$.
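Standardising and converting back can be sketched in Python using z = (x − x̄)/s and x = x̄ + z × s. The test marks, means and standard deviations are hypothetical.

```python
def z_score(x, mean, sd):
    """Standardised score: z = (x - mean) / sd"""
    return (x - mean) / sd

def actual_score(z, mean, sd):
    """Convert a standardised score back: x = mean + z * sd"""
    return mean + z * sd

# Hypothetical: 72 on a test with mean 60 and sd 8,
# compared with 80 on a test with mean 75 and sd 10
print(z_score(72, 60, 8))       # 1.5
print(z_score(80, 75, 10))      # 0.5, so 72 is the relatively better result
print(actual_score(-2, 60, 8))  # 44
```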


INVESTIGATING ASSOCIATION BETWEEN A NUMERICAL AND A CATEGORICAL VARIABLE
There are several ways of identifying and systematically describing an association using parallel box plots. You can use:

Median


IQR/range

Comparing shapes

When using a back-to-back stem plot, the median is most commonly used to compare distributions.
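Comparing a numerical variable across the groups of a categorical variable comes down to computing the same summary statistics for each group. A small sketch using Python's `statistics.median`; the pulse-rate data is hypothetical, and range is used for spread (an IQR could be substituted using whichever quartile rule you prefer).

```python
from statistics import median

def summarise_groups(groups):
    """For each category, report the median and range of the numerical variable."""
    return {name: {"median": median(vals), "range": max(vals) - min(vals)}
            for name, vals in groups.items()}

# Hypothetical pulse rates (numerical) by activity level (categorical)
pulse = {
    "Active": [60, 62, 65, 66, 70],
    "Inactive": [70, 72, 75, 80, 88],
}
summary = summarise_groups(pulse)
print(summary)
```

A clearly higher median for one group suggests an association between activity level and pulse rate.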


INVESTIGATING ASSOCIATION BETWEEN TWO NUMERICAL VARIABLES
• The vertical axis (y-axis) is used for the response variable
• The horizontal axis (x-axis) is used for the explanatory variable

HOW TO INTERPRET A SCATTERPLOT
The things we look for in a scatterplot are direction, form, strength and outliers (if any).

Guidelines for classifying the strength of a linear association

WARNING: If you use the correlation coefficient as a measure of the strength of an association, you are implicitly assuming that:
• The variables are numerical
• The association is linear
• There are no outliers in the data
The correlation coefficient can give a misleading indication of the strength of a linear association if there are outliers present.

CALCULATING THE CORRELATION COEFFICIENT


Pearson’s correlation coefficient, r, gives a numerical measure of the degree to which the points in the scatterplot tend to cluster around a straight line.

THE COEFFICIENT OF DETERMINATION

Interpreting the coefficient of determination
The coefficient of determination, r² (expressed as a percentage), tells us the percentage of the variation in the response variable that is explained by the variation in the explanatory variable.
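A sketch of Pearson's r computed from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² × Σ(y − ȳ)²), together with r². The study-hours data and the `pearson_r` name are hypothetical; in practice a CAS calculator does this step.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient from its definition."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (explanatory) vs test score (response)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
r2 = r ** 2
print(f"r = {r:.3f}, r^2 = {r2:.3f}")
print(f"{r2 * 100:.0f}% of the variation in score is explained by the variation in hours")
```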

CORRELATION AND CAUSALITY
Correlation does not imply causation. There are many possible non-causal explanations for an association, including:
1. Common response

2. Confounding variables

3. Coincidence

Sometimes, a correlation is due to coincidence. For example, there is a strong correlation between margarine consumption and the divorce rate in the US state of Maine. This is probably due to coincidence.


WHICH GRAPH?

LEAST SQUARES REGRESSION LINE
The least squares regression line is the line that minimises the sum of the squares of the residuals. The residuals, d, are the vertical distances between the data points and the line.

The assumptions for fitting a least squares line to data are the same as for using the correlation coefficient. These are that: • The data is numerical • The association is linear • There are no clear outliers
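A minimal sketch of fitting the least squares line y = a + bx and computing residuals (actual y − predicted y). It assumes the slope formula b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², which is equivalent to b = r × s_y / s_x, and a = ȳ − b x̄. The data and function names are hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for y = a + b*x, minimising the sum of squared residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = ybar - b * xbar      # the line passes through (xbar, ybar)
    return a, b

def residuals(x, y, a, b):
    """residual = actual y - predicted y"""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
a, b = least_squares(hours, score)
print(f"score = {a:.1f} + {b:.1f} x hours")
print(residuals(hours, score, a, b))
```

The residuals of a least squares line always sum to zero (up to rounding), and they are the values plotted in a residual plot.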


PERFORMING A REGRESSION ANALYSIS
The scatterplot and correlation coefficient

Interpreting the slope and intercept of a regression line

Interpolation and extrapolation
Interpolation: predicting within the range of the data (reasonably reliable)
Extrapolation: predicting outside the range of the data (not reliable)

The coefficient of determination
If r = −0.964, then r² = 0.93 (to two decimal places). Then 93% of the variation in ____ can be explained by the variation in ____.
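A short sketch of using a fitted line for prediction while flagging interpolation versus extrapolation, plus the r² arithmetic for r = −0.964. The line y = 46.9 + 4.5x, its data range of 1 to 5 and the `predict` helper are hypothetical.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y = a + b*x and label it interpolation (inside the data range,
    reasonably reliable) or extrapolation (outside it, not reliable)."""
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical line fitted to data with x values between 1 and 5
a, b = 46.9, 4.5
print(predict(a, b, 3.5, 1, 5))   # inside the data range
print(predict(a, b, 12, 1, 5))    # outside the data range

r = -0.964
print(round(r ** 2, 2))           # 0.93, i.e. 93% of the variation explained
```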

The residual plot
A residual plot is a plot of the residual for each data value against the explanatory (independent) variable. If there is no clear pattern (i.e. the points are randomly scattered around the zero line), this supports the assumption of a linear relationship.

CIRCLE OF TRANSFORMATIONS


In each case, there is more than one type of transformation that might work. The best option is found by comparing the r² values: the transformation that gives the highest r² gives the best fit.
WARNING: If a transformation is made, write it in the final equation.

THE SQUARED TRANSFORMATION
The squared transformation is a stretching transformation. It works by stretching out the upper end of the scale on either the x- or y-axis.

THE LOG TRANSFORMATION
The logarithmic transformation is a compressing transformation. It works by compressing the upper end of the scale on either the x- or y-axis.

THE RECIPROCAL TRANSFORMATION
The reciprocal transformation is also a compressing transformation. It works by compressing the upper end of the scale on either the x- or y-axis.

TIME SERIES DATA
Time series data are a special kind of bivariate data, where the explanatory variable is time. A time series plot is a line graph with time plotted on the horizontal axis. The variable under investigation, the response variable, is plotted on the vertical axis.

Trend
• General upward or downward movement over time


One way to identify trends on a time series graph is to draw a line that ignores the fluctuations, but which reflects the overall increasing or decreasing nature of the plot. If the line is horizontal, there is no trend.

Sometimes, there can be different trends in a time series for different time periods.

Cycle
• Periodic movement in a time series over a period greater than one year
• Has peaks and troughs
• Peaks are irregularly spaced, unpredictable and of different magnitudes

Seasonality


• Time between peaks is consistent
• Peaks are often of consistent size
• Periodic movement is related to a calendar-related period (e.g. year, month or week)

We can see that there is an increasing trend and seasonality. The demand for accommodation is at its lowest in the June quarter and highest in the December quarter.

Structural change



Significant change in conditions that causes an abrupt change in the data

Outliers



Outliers are present when there are individual values that stand out from the general body of the data

Random fluctuations
• All time series will show some random fluctuation unless they form a perfectly straight line


SMOOTHING A TIME SERIES USING MOVING MEANS
One effect of the irregular fluctuations and seasonality can be to obscure an underlying trend. The technique of smoothing can sometimes be used to overcome this problem.

Smoothing a time series plot using moving means
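A sketch of k-point moving mean smoothing for an odd number of points (e.g. three- or five-mean smoothing), assuming each smoothed value is placed at the middle time point and the ends, which have no full window, are left blank (`None`). The sales figures and the `moving_mean` name are hypothetical.

```python
def moving_mean(values, k):
    """k-point moving mean for odd k: each smoothed value is the mean of the
    k values centred on it; the first and last (k - 1)/2 points get None."""
    if k % 2 == 0:
        raise ValueError("an even number of points needs centring")
    half = k // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        smoothed[i] = sum(values[i - half:i + half + 1]) / k
    return smoothed

sales = [10, 14, 9, 15, 12, 18, 13]
print(moving_mean(sales, 3))
print(moving_mean(sales, 5))
```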

Five-mean smoothing is more effective in reducing irregular fluctuations than three-mean smoothing.

Two-mean smoothing with centring


Four-mean smoothing with centring
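When the number of points is even (e.g. two or four), each moving mean falls between two time points, so the means are centred by averaging neighbouring pairs. A sketch of that two-step process; the quarterly data and the `centred_moving_mean` name are hypothetical.

```python
def centred_moving_mean(values, k):
    """Even k-point moving mean with centring."""
    if k % 2 != 0:
        raise ValueError("centring is only needed for an even number of points")
    # Step 1: uncentred k-point means; means[j] covers values[j : j + k]
    # and so sits halfway between two time points
    means = [sum(values[j:j + k]) / k for j in range(len(values) - k + 1)]
    # Step 2: average neighbouring means; the pair (means[j], means[j + 1])
    # is centred on time point j + k // 2
    smoothed = [None] * len(values)
    for j in range(len(means) - 1):
        smoothed[j + k // 2] = (means[j] + means[j + 1]) / 2
    return smoothed

quarterly = [20, 30, 40, 10, 24, 34, 44, 14]
print(centred_moving_mean(quarterly, 4))
print(centred_moving_mean(quarterly, 2))
```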

SEASONAL INDICES
Seasonal indices are calculated so that the average seasonal index is 1 (or 100%). This means that the sum of the seasonal indices equals the number of seasons. Thus, if the seasons are months, the seasonal indices add to 12. If the seasons are quarters, the seasonal indices add to 4, and so on.
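A sketch of one common way to calculate seasonal indices from several years of data (divide each value by its year's seasonal average, then average those ratios for each season), together with the deseasonalising rule deseasonalised figure = actual figure ÷ seasonal index. The quarterly data and function names are hypothetical.

```python
def seasonal_indices(data):
    """data: a list of years, each a list of seasonal values (e.g. 4 quarters).
    Divide each value by its year's average, then average each season's ratios."""
    n_seasons = len(data[0])
    ratios = []
    for year in data:
        avg = sum(year) / n_seasons
        ratios.append([v / avg for v in year])
    return [sum(r[s] for r in ratios) / len(ratios) for s in range(n_seasons)]

def deseasonalise(value, index):
    """deseasonalised figure = actual figure / seasonal index"""
    return value / index

quarters = [[20, 30, 40, 10], [24, 34, 44, 14]]
si = seasonal_indices(quarters)
print([round(i, 3) for i in si])
print(round(sum(si), 6))          # 4.0, the number of seasons
print(deseasonalise(44, si[2]))
```

Because each year's ratios sum to the number of seasons, the averaged indices do too.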

Seasonal indices tell us how a particular season compares to the average season. For example:
• If the seasonal index for February is 1.2 (or 120%), sales in February tend to be 20% higher than the monthly average. Remember, the average seasonal index is 1 (or 100%).
• If the seasonal index for August is 0.9 (or 90%), sales in August tend to be 10% lower than the monthly average.

Deseasonalising data
$\text{deseasonalised figure} = \dfrac{\text{actual figure}}{\text{seasonal index}}$
...

