Further Bound Reference PDF

Title Further Bound Reference
Author Benjamin Nguyen
Course Law Reform
Institution University of Melbourne
Pages 42
File Size 2 MB
File Type PDF


2020 Further Mathematics: Bound Reference
Emma Zanghi

Core: Data Analysis
Core: Recursion and Finance
Module: Networks


Data Analysis
- Data distributions
- Associations between variables
- Modelling linear associations
- Time series


Data distributions

Categorical
Categorical variables represent characteristics or qualities of people or things, for example a person’s eye colour, sex or fitness level. Data generated by a categorical variable can be used to organise individuals into one of several groups or categories that characterise this quality or attribute. For example, an ‘F’ in the Sex column indicates that the student is female, while a ‘3’ in the Fitness level column indicates that their fitness level is low. Categorical variables come in two types: nominal and ordinal.
- Nominal variables have data values that can be used to group individuals according to a particular characteristic. The variable sex is an example of a nominal variable. The data values for the variable sex, for example M or F, can be used to group students according to their sex. It is called a nominal variable because the data values name the group to which the students belong, in this case the group called ‘males’ or the group called ‘females’.
- Ordinal variables have data values that can be used to both group and order individuals according to a particular characteristic. The variable fitness level is an example of an ordinal variable. The data generated by this variable contains two pieces of information. First, each data value can be used to group the students by fitness level. Second, it allows us to logically order these groups according to their fitness level, in this case as ‘low’, ‘medium’ or ‘high’.


Numerical
Numerical variables are used to represent quantities, things that we can count or measure. For example, a ‘179’ in the Height column indicates that the person is 179 cm tall, while an ‘82’ in the Pulse rate column indicates that they have a pulse rate of 82 beats/minute. Numerical variables come in two types: discrete and continuous.
- Discrete variables represent quantities that are counted. The number of mobile phones in a house is an example. Counting leads to discrete data values such as 0, 1, 2, 3, . . . with nothing in between. As a guide, discrete variables arise when we ask the question ‘How many?’
- Continuous variables represent quantities that are measured rather than counted. Thus, even though we might record a person’s height as 179 cm, their actual height could be any value from 178.5 cm up to (but not including) 179.5 cm. We have just rounded to 179 cm for convenience, or to match the accuracy of the measuring device. As a guide, continuous variables arise when we ask the question ‘How much?’

Numerical or categorical?
Deciding whether data are numerical or categorical is not an entirely trivial exercise. Two things that can help your decision-making are:
1. Numerical data can always be used to perform arithmetic computations. This is not the case with categorical data. For example, it makes sense to calculate the average weight of a group of individuals, but not the average house number in a street. This is a good test to apply when in doubt.
2. It is not the variable name alone that determines whether data are numerical or categorical; it is also the way the data are recorded. For example, if the data for the variable weight are recorded in kilograms, they are numerical. However, if the data are recorded as ‘underweight’, ‘normal weight’ or ‘overweight’, they are categorical.

Distribution shapes (figures not reproduced): symmetric distributions, positively skewed distributions, negatively skewed distributions, and outliers.

The median
The median is the middle value in an ordered dataset. For n data values, the median is located at the ((n + 1)/2)th position. When:
- n is odd, the median will be the middle data value
- n is even, the median will be the average of the two middle values.
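A minimal Python sketch of the (n + 1)/2 position rule, useful for checking hand calculations. The function name `median_by_position` is hypothetical.

```python
def median_by_position(values):
    """Return the median using the (n + 1)/2 position rule on ordered data."""
    data = sorted(values)               # the rule only applies to ordered data
    n = len(data)
    if n == 0:
        raise ValueError("cannot take the median of an empty dataset")
    position = (n + 1) / 2              # 1-based position; ends in .5 when n is even
    if n % 2 == 1:
        return data[int(position) - 1]  # n odd: the single middle value
    # n even: average the two values either side of the .5 position
    return (data[n // 2 - 1] + data[n // 2]) / 2


print(median_by_position([7, 3, 9, 1, 5]))      # 5
print(median_by_position([7, 3, 9, 1, 5, 11]))  # 6.0
```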

Range
The range, R, is the simplest measure of spread of a distribution. It is the difference between the largest and smallest values in the dataset:
R = largest data value – smallest data value

The interquartile range
The interquartile range (IQR) is defined as the spread of the middle 50% of data values, so that:
IQR = Q3 – Q1

Five-number summary
A listing of the median, M, the quartiles Q1 and Q3, and the smallest and largest data values of a distribution, written in the order minimum, Q1, median, Q3, maximum, is known as a five-number summary.

The box plot
A box plot is a graphical display of the five-number summary. In a box plot:
- a box extends from Q1 to Q3, locating the middle 50% of the data values
- the median is shown by a vertical line drawn within the box
- lines (called whiskers) extend out from the lower and upper ends of the box to the smallest and largest data values of the dataset respectively.

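Using a box plot to display outliers
In a box plot, possible outliers are defined as being those values that are:
- greater than Q3 + 1.5 × IQR (upper fence)
- less than Q1 – 1.5 × IQR (lower fence).
Possible outliers are plotted as separate points, and the whiskers then extend only to the most extreme data values that are not outliers.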
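The sketch below computes the five-number summary, the range, the IQR and the box plot fences in Python. It assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the middle value when n is odd; other software may use a different quartile convention and give slightly different quartiles. The function names are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max).

    Assumes Q1 and Q3 are the medians of the lower and upper halves of the
    ordered data, leaving out the middle value when n is odd.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = data[:half]             # values below the median position
    upper = data[half + n % 2:]     # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


def box_plot_outliers(values):
    """Return (possible outliers, (lower fence, upper fence))."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return outliers, (lower_fence, upper_fence)


data = [2, 4, 5, 6, 7, 8, 9, 30]          # hypothetical data
mn, q1, m, q3, mx = five_number_summary(data)
print(mn, q1, m, q3, mx)       # 2 4.5 6.5 8.5 30
print("R =", mx - mn)          # R = 28
print("IQR =", q3 - q1)        # IQR = 4.0
print(box_plot_outliers(data)) # ([30], (-1.5, 14.5))
```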

Choosing between the mean and the median
The mean and the median are both measures of the centre of a distribution. If the distribution is:
- symmetric and there are no outliers, either the mean or the median can be used to indicate the centre of the distribution
- clearly skewed and/or there are outliers, it is more appropriate to use the median to indicate the centre of the distribution.

Standard deviation
To measure the spread of a data distribution around the median (M) we use the interquartile range (IQR). To measure the spread of a data distribution about the mean (x̄) we use the standard deviation (s).

The 68-95-99.7% rule
For a normal distribution, approximately:
- 68% of the observations lie within one standard deviation of the mean
- 95% of the observations lie within two standard deviations of the mean
- 99.7% of the observations lie within three standard deviations of the mean.
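A small Python sketch of the mean, the standard deviation and the 68-95-99.7% intervals. It assumes s is the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` returns; the function names and the example mean of 170 with s = 8 are hypothetical.

```python
from statistics import mean, stdev


def mean_and_sd(values):
    """Mean and standard deviation s (sample version, dividing by n - 1)."""
    return mean(values), stdev(values)


def rule_68_95_997(x_bar, s):
    """Intervals containing roughly 68%, 95% and 99.7% of a normal distribution."""
    return {
        "68%": (x_bar - s, x_bar + s),
        "95%": (x_bar - 2 * s, x_bar + 2 * s),
        "99.7%": (x_bar - 3 * s, x_bar + 3 * s),
    }


print(mean_and_sd([1, 3, 5]))   # (3, 2.0)
print(rule_68_95_997(170, 8))
# {'68%': (162, 178), '95%': (154, 186), '99.7%': (146, 194)}
```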

Calculating standardised (z) scores
To obtain a standard score for an actual score, subtract the mean from the score and then divide the result by the standard deviation. That is:
standard score = (actual score – mean) / standard deviation

Converting standardised scores into actual scores
By making the actual score the subject of the rule for calculating standard scores, we arrive at:
actual score = mean + standard score × standard deviation
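Both rules can be checked with a short Python sketch; the function names and the test mean of 70 with standard deviation 6 are hypothetical.

```python
def standard_score(actual, mean, sd):
    """z = (actual score - mean) / standard deviation."""
    return (actual - mean) / sd


def actual_score(z, mean, sd):
    """actual score = mean + z * standard deviation."""
    return mean + z * sd


# Hypothetical test results: mean 70, standard deviation 6
print(standard_score(82, 70, 6))   # 2.0  (two standard deviations above the mean)
print(actual_score(-1.5, 70, 6))   # 61.0
```

Because the second rule is the first rearranged, converting a score to z and back returns the original score.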

Associations between variables

Response and explanatory variables
When investigating the association between two variables, the explanatory variable (EV) is the variable we expect to explain or predict the value of the response variable (RV). Note:
- EV = x-axis
- RV = y-axis

Direction of an association
- Two variables have a positive association when the value of the response variable tends to increase as the value of the explanatory variable increases.
- Two variables have a negative association when the value of the response variable tends to decrease as the value of the explanatory variable increases.
- Two variables have no association when there is no consistent change in the value of the response variable as the value of the explanatory variable increases.

Form
The form of an association can be linear or non-linear (example scatterplots not reproduced).

Strength of a linear relationship: the correlation coefficient (r)
The strength of a linear association is an indication of how closely the points in the scatterplot fit a straight line. If the points in the scatterplot lie exactly on a straight line, we say that there is a perfect linear association. If there is no fit at all, we say there is no association. In general, we have an imperfect fit. To measure the strength of a linear relationship, the statistician Karl Pearson developed a correlation coefficient, r, which has the following properties:
- If there is no linear association, r = 0.
- If there is a perfect positive linear association, r = +1.
- If there is a perfect negative linear association, r = −1.
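As a sketch of how r is computed from paired data, the Python below uses the standard sums-of-deviations form of Pearson's formula. The function name and example data are hypothetical; the first two examples reproduce the r = +1 and r = −1 properties listed above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))   # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]))   # -1.0 (perfect negative)
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
```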

Classifying the strength of a linear association


Interpreting the correlation coefficient
If asked to interpret the value of the correlation coefficient, use the following template sentences.
- Linear, positive and strong: It can be concluded that the y variable should increase as the x variable increases.
- Linear, negative and strong: It can be concluded that the y variable should decrease as the x variable increases.
- Linear, positive and moderate: There is some evidence to suggest that the y variable should increase as the x variable increases.
- Linear, negative and moderate: There is some evidence to suggest that the y variable should decrease as the x variable increases.
- Linear, positive and weak: There is limited evidence to suggest that the y variable should increase as the x variable increases.
- Linear, negative and weak: There is limited evidence to suggest that the y variable should decrease as the x variable increases.

The coefficient of determination
The degree to which one variable can be predicted from another linearly related variable is given by a statistic called the coefficient of determination. The coefficient of determination is calculated by squaring the correlation coefficient:
coefficient of determination = r²

Interpreting the coefficient of determination
The coefficient of determination (as a percentage) tells us the percentage of the variation in the response variable that is explained by the variation in the explanatory variable.
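A short Python sketch that squares r and fills in the interpretation sentence. The function names and the variable names in the example (resting pulse rate, weekly exercise hours) are hypothetical; note that a negative r gives the same r² as the positive value.

```python
def coefficient_of_determination(r):
    """r squared, a value between 0 and 1."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2


def interpret_r_squared(r, response, explanatory):
    """Build the template sentence, with r squared written as a percentage."""
    pct = coefficient_of_determination(r) * 100
    return (f"{pct:.0f}% of the variation in {response} is explained by "
            f"the variation in {explanatory}.")


print(interpret_r_squared(-0.8, "resting pulse rate", "weekly exercise hours"))
# 64% of the variation in resting pulse rate is explained by the variation in weekly exercise hours.
```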

Which graph?
- Response variable categorical, explanatory variable categorical: segmented bar chart, side-by-side (parallel) bar chart
- Response variable numerical, explanatory variable categorical: parallel box plots, parallel dot plots
- Response variable numerical, explanatory variable categorical (two categories only): back-to-back stem plot, parallel dot or box plots
- Response variable numerical, explanatory variable numerical: scatterplot

Modelling linear associations

The least squares regression line
The least squares line is the line that minimises the sum of the squares of the residuals.

The equation of the least squares regression line
The equation of the least squares regression line is given by y = a + bx, where the slope (b) is given by
$b = r \dfrac{s_y}{s_x}$
and the intercept (a) is then given by
$a = \bar{y} - b\bar{x}$
Here:
- r is the correlation coefficient
- $s_x$ and $s_y$ are the standard deviations of x and y
- $\bar{x}$ and $\bar{y}$ are the mean values of x and y.
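The slope and intercept formulas can be applied directly in Python, as sketched below with hypothetical data. It assumes the sample standard deviations (`statistics.stdev`); since the same convention is used for both $s_x$ and $s_y$, the ratio is unaffected by the choice. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (a, b) for y = a + bx, using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Pearson's r from sums of deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    r = s_xy / sqrt(s_xx * s_yy)
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b


a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {a:.1f} + {b:.1f}x")   # y = 2.2 + 0.6x
```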

Interpreting the slope and intercept of a regression line
For the regression line y = a + bx:
- the slope (b) estimates the average change (increase/decrease) in the response variable (y) for each one-unit increase in the explanatory variable (x)
- the intercept (a) estimates the average value of the response variable (y) when the explanatory variable (x) equals 0.

Residuals
residual value = actual data value – predicted value
Residuals can be positive, negative or zero.
- Data points above the regression line have a positive residual.
- Data points below the regression line have a negative residual.
- Data points on the line have a zero residual.
Note: if there is no clear pattern in a residual plot, this supports the assumption that the association is linear.
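A Python sketch of the residual rule, using the hypothetical line y = 2.2 + 0.6x and hypothetical data. The signs follow the rule: a point above the line gives a positive residual.

```python
def residuals(xs, ys, a, b):
    """residual = actual y - predicted y, where predicted y = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, a=2.2, b=0.6)   # line y = 2.2 + 0.6x
print([round(e, 2) for e in res])       # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least squares line, the residuals add to zero (up to rounding), which is a quick check on hand calculations.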

The circle of transformations (figure not reproduced)

x² transformation: Spreads out the high x-values relative to the lower x-values, leaving the y-values unchanged. This has the effect of straightening out curves of the matching shape in the circle of transformations. The y-squared transformation works in a similar manner but stretches out the scale on the y-axis.

log x transformation: Compresses the higher x-values relative to the lower x-values, leaving the y-values unchanged. This has the effect of straightening out curves of the matching shape. The log y transformation works in a similar manner but compresses the scale on the y-axis.

1/y transformation: The reciprocal y transformation works by compressing larger values of y relative to lower values of y. This has the effect of straightening out curves of the matching shape. The reciprocal x transformation works the same way but in the x-direction.

Time series

Trend
The tendency for values in a time series to generally increase or decrease over a significant period of time is called a trend.

Cycles
Cycles are periodic movements in a time series over a period greater than 1 year.

Seasonality
Seasonality is present when there is a periodic movement in a time series that has a calendar-related period, for example a year, a month or a week.

Structural change
Structural change is present when there is a sudden change in the established pattern of a time series plot.

Outliers
Outliers are present when there are individual values that stand out from the general body of data.

Irregular (random) fluctuations
Irregular (random) fluctuations include all the variations in a time series that we cannot reasonably attribute to systematic changes like trend, cycles, seasonality, structural change or outliers.

Three-moving mean
To use three-moving mean smoothing, replace each data value with the mean of that value and the values of its two neighbours, one on each side. That is, if y1, y2 and y3 are sequential data values, then:
smoothed $y_2 = \dfrac{y_1 + y_2 + y_3}{3}$
The first and last points do not have a value on each side, so leave them out.

Five-moving mean
To use five-moving mean smoothing, replace each data value with the mean of that value and the two values on each side. That is, if y1, y2, y3, y4 and y5 are sequential data values, then:
smoothed $y_3 = \dfrac{y_1 + y_2 + y_3 + y_4 + y_5}{5}$
The first two and last two points do not have two values on each side, so leave them out.
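Three- and five-moving mean smoothing follow the same pattern, so one Python sketch covers both. It handles odd windows only and marks the left-out end points as `None`; the function name and the sales figures are hypothetical.

```python
def moving_mean(values, window):
    """Centred moving-mean smoothing with an odd window (3, 5, ...).

    Points without enough neighbours on each side are left out (None).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be an odd positive integer")
    k = window // 2                     # number of neighbours on each side
    smoothed = [None] * len(values)
    for i in range(k, len(values) - k):
        smoothed[i] = sum(values[i - k:i + k + 1]) / window
    return smoothed


sales = [12, 15, 9, 18, 21, 15, 24]     # hypothetical data
print(moving_mean(sales, 3))  # [None, 12.0, 14.0, 16.0, 18.0, 20.0, None]
print(moving_mean(sales, 5))  # [None, None, 15.0, 15.6, 17.4, None, None]
```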

Seasonal indices
Key fact 1
Seasonal indices are calculated so that their average is 1. This means that the sum of the seasonal indices equals the number of seasons. Thus, if the seasons are months, the seasonal indices add to 12. If the seasons are quarters, then the seasonal indices add to 4, and so on.
Key fact 2
Seasonal indices tell us how a particular season (generally a day, month or quarter) compares to the average season. For example:
- a seasonal index for unemployment for the month of February of 1.2 or 120% tells us that February unemployment figures tend to be 20% higher than the monthly average. Remember, the average seasonal index is 1 or 100%.
- a seasonal index for August of 0.90 or 90% tells us that August unemployment figures tend to be only 90% of the monthly average. Alternatively, August unemployment figures are 10% lower than the monthly average.
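The notes state the properties of seasonal indices but not how to calculate them. The Python sketch below assumes the common method: divide each value by its year's average, then average those ratios for each season across the years. The function name and the quarterly data are hypothetical; the last line checks Key fact 1.

```python
def seasonal_indices(years):
    """Seasonal indices from complete years of data (one list per year).

    Assumed method: divide each value by its year's average, then average
    these ratios for each season across the years.
    """
    n_seasons = len(years[0])
    if any(len(year) != n_seasons for year in years):
        raise ValueError("every year needs a value for every season")
    ratios = []
    for year in years:
        year_mean = sum(year) / n_seasons
        ratios.append([value / year_mean for value in year])
    return [sum(r[s] for r in ratios) / len(years) for s in range(n_seasons)]


# Hypothetical quarterly data for two years
indices = seasonal_indices([[10, 20, 30, 40], [20, 30, 50, 60]])
print([round(si, 3) for si in indices])  # [0.45, 0.775, 1.225, 1.55]
print(round(sum(indices), 6))            # 4.0, the number of seasons
```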

De-seasonalising data
Time series data are de-seasonalised using the relationship:
de-seasonalised figure = actual figure / seasonal index
De-seasonalising can:
- remove the seasonality from the time series plot
- reveal a clear underlying trend in the data.

Re-seasonalising data
Time series data are re-seasonalised using the rule:
actual figure = de-seasonalised figure × seasonal index
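The two rules are inverses of each other, as this Python sketch shows. The function names and the winter sales figure of 390 with a seasonal index of 1.30 are hypothetical.

```python
def deseasonalise(actual, seasonal_index):
    """de-seasonalised figure = actual figure / seasonal index"""
    return actual / seasonal_index


def reseasonalise(deseasonalised, seasonal_index):
    """actual figure = de-seasonalised figure x seasonal index"""
    return deseasonalised * seasonal_index


# Hypothetical: winter sales of 390 with a winter seasonal index of 1.30
ds = deseasonalise(390, 1.30)
print(round(ds, 2))                        # 300.0
print(round(reseasonalise(ds, 1.30), 2))   # 390.0
```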

Interpreting the seasonal indices
The seasonal index of:
- 1.03 for summer tells us that summer sales are typically 3% above average
- 1.15 for autumn tells us that autumn sales are typically 15% above average
- 1.30 for winter tells us that winter sales are typically 30% above average
- 0.52 for spring tells us that spring sales are typically 48% below average.

Correcting for seasonality
Using the rule de-seasonalised figure = actual figure / seasonal index, we can work out how much we need to increase or decrease the actual sales figures to correct for seasonality: dividing by the seasonal index is the same as multiplying by 1/seasonal index.
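A Python sketch that turns each seasonal index into the percentage change needed to de-seasonalise, using 1/seasonal index − 1. It reuses the summer, autumn, winter and spring indices from the interpretation example above; the function name is hypothetical.

```python
def seasonal_correction(seasonal_index):
    """Fractional change needed to de-seasonalise: 1/SI - 1 (negative = decrease)."""
    return 1 / seasonal_index - 1


# Seasonal indices from the interpretation example
for season, si in [("summer", 1.03), ("autumn", 1.15), ("winter", 1.30), ("spring", 0.52)]:
    change = seasonal_correction(si)
    direction = "increase" if change > 0 else "decrease"
    print(f"{season}: {direction} actual figures by {abs(change):.1%}")
# summer: decrease actual figures by 2.9%
# autumn: decrease actual figures by 13.0%
# winter: decrease actual figures by 23.1%
# spring: increase actual figures by 92.3%
```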

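Correlation or causality
Correlation is where there is an association between two variables. Causation is where a change in one variable directly causes a change in the other. An association between two variables does not by itself establish causation; possible non-causal explanations include:
- Common response: both variables are linked to a third, shared variable.
- Confounding variable: other factors also affect the response variable, so it is not possible to tell accurately which variable is causing the change.
- Coincidence: the association occurred simply by chance.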

Recursion
- Linear recurrence relations
- Arithmetic and geometric sequences
- Depreciation
- Interest
- Loans
- Effective interest rate
- Annuities and perpetuities

Linear Recurrence Relations
Terms of a sequence
- Initial (principal) value = V0 or PV
- Following terms = V1, V2, V3, …
- nth term = Vn
- Term after nth term = Vn+1
- Depreciation or payment amount = d
- Rate of increase or decrease = r, given by r = 1 + R/100 for an increase of R% per period (or r = 1 − R/100 for a decrease of R% per period)

Recurrence relations
A recurrence relation is the combination of a rule that links successive terms in a sequence together and the value of at least one of the terms in the sequence.

First-order linear recurrence relations are given by the rule:
$V_0 = PV, \quad V_{n+1} = r \times V_n + d$
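The recurrence rule can be applied step by step in a few lines of Python, which mirrors pressing "enter" repeatedly on a calculator. The function name and example values are hypothetical.

```python
def generate_terms(pv, r, d, n):
    """V0 = PV, V(n+1) = r * Vn + d; returns [V0, V1, ..., Vn]."""
    terms = [pv]
    for _ in range(n):
        terms.append(r * terms[-1] + d)
    return terms


print(generate_terms(100, 1, 5, 3))         # [100, 105, 110, 115]  (add 5 each step)
print(generate_terms(2000, 1.05, 0, 2))     # grow by 5% each step
print(generate_terms(5000, 1.01, -200, 2))  # grow by 1%, then subtract a payment of 200
```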

Rule for the nth term
A rule that gives the value of a term in a sequence based on its position, n, in the sequence. Given by:
$V_n = PV + d \times n$ when d is added to each previous term (d is negative when it is subtracted), OR
$V_n = PV \times r^n$ when each previous term is multiplied by r.
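A Python sketch comparing the two nth-term rules against direct recursion; both routes should give the same Vn. The function names and example values are hypothetical.

```python
def nth_term_arithmetic(pv, d, n):
    """Vn = PV + d * n  (d negative when an amount is subtracted each step)."""
    return pv + d * n


def nth_term_geometric(pv, r, n):
    """Vn = PV * r**n"""
    return pv * r ** n


def nth_term_by_recursion(pv, r, d, n):
    """Apply V(n+1) = r * Vn + d n times, for checking the rules above."""
    v = pv
    for _ in range(n):
        v = r * v + d
    return v


print(nth_term_arithmetic(100, -15, 4))                  # 40
print(nth_term_by_recursion(100, 1, -15, 4))             # 40
print(round(nth_term_geometric(1000, 0.9, 3), 2))        # 729.0
print(round(nth_term_by_recursion(1000, 0.9, 0, 3), 2))  # 729.0
```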

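Arithmetic and Geometric Sequences
- Arithmetic sequence: each term is found by adding the same amount, d, to the previous term (a constant difference).
- Geometric sequence: each term is found by multiplying the previous term by the same amount, r (a constant ratio).
- Neither arithmetic nor geometric: there is no constant difference or constant ratio between successive terms.
(Example sequences not reproduced.)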
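A Python sketch that checks for a constant difference, then a constant ratio. The function name and the tolerance `tol` are hypothetical choices; a constant sequence such as 5, 5, 5 is both arithmetic (d = 0) and geometric (r = 1), and this sketch simply reports it as arithmetic.

```python
def classify_sequence(seq, tol=1e-9):
    """Label a sequence 'arithmetic', 'geometric' or 'neither'.

    A constant sequence (e.g. 5, 5, 5) is both; this sketch reports 'arithmetic'.
    """
    if len(seq) < 3:
        raise ValueError("need at least three terms to judge the pattern")
    diffs = [b - a for a, b in zip(seq, seq[1:])]
    if all(abs(d - diffs[0]) <= tol for d in diffs):
        return "arithmetic"              # constant difference d
    if all(a != 0 for a in seq[:-1]):
        ratios = [b / a for a, b in zip(seq, seq[1:])]
        if all(abs(q - ratios[0]) <= tol for q in ratios):
            return "geometric"           # constant ratio r
    return "neither"


print(classify_sequence([2, 5, 8, 11]))    # arithmetic
print(classify_sequence([3, 6, 12, 24]))   # geometric
print(classify_sequence([1, 4, 9, 16]))    # neither
```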


Depreciation

Flat rate depreciation
Depreciation is either a fixed amount or a fixed percentage of the purchase price per annum. Used when the depreciation is constant throughout the life of the asset, based on time.
Recurrence relation: $V_0 = \text{principal}, \quad V_{n+1} = V_n - d$
Depreciation amount per period: $d = r\% \times V_0$
Rate per period: $r = \dfrac{d}{V_0} \times 100$

Effective life
The time the asset will remain useful, n; that is, the time until the asset reaches either its scrap value or is written off ($0 value).
Rule for the future value after n periods: $V_n = V_0 - d \times n$
Depreciation over n periods: $\text{Depreciation} = d \times n$ OR $\text{Depreciation} = r\% \times V_0 \times n$ OR $\text{Depreciation} = V_0 - V_n$
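The flat rate rules can be sketched in Python as below. The function names are hypothetical, and the asset (purchase price $20 000, 15% per year, $2000 scrap value) is a made-up example.

```python
def flat_rate_amount(v0, rate_percent):
    """d = r% x V0"""
    return rate_percent * v0 / 100


def flat_rate_percent(v0, d):
    """r = d / V0 x 100"""
    return 100 * d / v0


def value_after(v0, d, n):
    """Vn = V0 - d x n"""
    return v0 - d * n


def effective_life(v0, d, scrap_value=0):
    """Periods until the value falls to the scrap value (or $0)."""
    return (v0 - scrap_value) / d


# Hypothetical asset: purchase price $20 000, depreciated at 15% of that price per year
d = flat_rate_amount(20000, 15)
print(d)                                 # 3000.0
print(flat_rate_percent(20000, d))       # 15.0
print(value_after(20000, d, 4))          # 8000.0
print(20000 - value_after(20000, d, 4))  # 12000.0, depreciation over 4 years
print(effective_life(20000, d, 2000))    # 6.0 years to a $2000 scrap value
```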

Unit cost depreciation
Based on the maximum output (units) of the item, e.g. kilometres travelled, depreciated per kilometre. Used when the depreciation is constant throughout the life of the asset, based on usage. Recurrence relation V0 ...

