MATH2089-2 - Summary Numerical Methods and Statistics

Course: Numerical Methods and Statistics
Institution: University of New South Wales



Description

MATH2089 Numerical Methods and Statistics (by taimoor and janzen). [Index]: The notes are alphabetically ordered by topic. Open the ‘document outline’ to navigate more easily, or click one of the links below to go directly to the topic.

Statistics: Graphs, Probability, Random Variables I, Random Variables II, Statistical Inference I, Statistical Inference II, Statistical Inference III

Numerics: Approximations, Integration, Interpolation, Matrices I, Matrices II, Matrices III, ODEs, Polynomials (MATLAB)

[Approximations]: Floating Points: Computers operate on binary values. In order for them to represent non-integers, they store floating point numbers as a string of binary digits.
- The first binary digit stores the sign of the number; ‘0’ if positive and ‘1’ if negative.
- The following string of binary digits stores the value of the exponent, ‘2^(N-M)’, where ‘N’ is the number represented by the string of digits and ‘M = 2^(e-1) - 1’ is the bias (‘e’ being the number of exponent bits).
- The last string of binary digits stores the value of the mantissa (multiplying factor), ‘1 + 0.[string]’, where the string of digits represents the digits after the binary point.
- To get the actual floating point value, we multiply the exponent by the mantissa, using the sign from the first digit to determine the sign of the number. For a 32-bit precision system, the exponent uses 8 bits and the mantissa uses 23. For a 64-bit precision system, the exponent uses 11 bits and the mantissa uses 52.
- E.G.: ‘1 10000000 1100...000’ (32-bit precision) = -(2^(128-127))(1 + 0.5 + 0.25) = -3.5.
‘Floating Point Operations’ per second (or ‘flops’) is a unit of measurement used to approximate the time it takes for operations to be executed.
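As a check, the 32-bit example above can be decoded in MATLAB. This is a minimal sketch; the bit string is the one from the example, and only standard MATLAB functions are used:

bits = ['1', '10000000', '1100000000000000000000000'];  % sign | 8 exponent bits | 23 mantissa bits
s = bits(1) == '1';                                % sign bit
e = bin2dec(bits(2:9));                            % stored exponent N = 128
m = 1 + sum((bits(10:32) == '1') .* 2.^-(1:23));   % mantissa 1 + 0.11(binary) = 1.75
value = (-1)^s * 2^(e - 127) * m                   % gives -3.5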

Order Notation for Functions: ‘O(n^k)’ and ‘o(n^k)’ represent the ‘Big O’ and ‘Little O’ of a certain function - the value of ‘k’ depending on whether ‘n’ approaches zero or infinity. Considering a function ‘F’ of terms with variable ‘n’, as ‘n’ approaches infinity:
- For the Big O notation, ‘lim(n→∞) F(n)/n^k = constant’. In other words, ‘k’ represents the power of ‘n’ that dominates the function.
- For the Little O notation, ‘lim(n→∞) F(n)/n^k = 0’. In other words, ‘k’ represents the smallest integer greater than the power of ‘n’ that dominates the function.
Considering a function ‘F’ of terms with variable ‘n’, as ‘n’ approaches zero:
- For the Big O notation, ‘lim(n→0) F(n)/n^k = constant’. In other words, ‘k’ represents the power of ‘n’ that least dominates the function.
- For the Little O notation, ‘lim(n→0) F(n)/n^k = 0’. In other words, ‘k’ represents the greatest integer smaller than the power of ‘n’ that least dominates the function.

As an example, consider the function ‘F(n) = 3n^2.6 + n^2 + n’.
- For ‘n→∞’, ‘F(n) = O(n^2.6) = o(n^3)’.
- For ‘n→0’, ‘F(n) = O(n) = o(1)’.
- Note that we can use order notation (when ‘n’ approaches zero or infinity) to replace functions (e.g.: for Taylor polynomials).
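The dominant powers can be checked numerically. A minimal MATLAB sketch, using the example function above:

n = logspace(1, 6, 6)';          % n → ∞ in powers of 10
F = 3*n.^2.6 + n.^2 + n;
F ./ n.^2.6                       % ratio tends to the constant 3, so F = O(n^2.6)

n = logspace(-1, -6, 6)';        % n → 0
F = 3*n.^2.6 + n.^2 + n;
F ./ n                            % ratio tends to 1, so F = O(n) as n → 0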

Taylor Approximations: The Taylor polynomial formula gives us that ‘f(x) = f(x0)/0! + f’(x0)*(x-x0)/1! + f’’(x0)*(x-x0)^2/2! + f’’’(x0)*(x-x0)^3/3! + ...’.
- By defining ‘h = x - x0’, we can transform the equation into ‘f(x0 + h) = f(x0)/0! + f’(x0)*h/1! + f’’(x0)*h^2/2! + f’’’(x0)*h^3/3! + ...’.
- By relabelling ‘x0’ as ‘x’, we get ‘f(x + h) = f(x) + f’(x)*h + f’’(x)*h^2/2! + f’’’(x)*h^3/3! + ...’, which will be used to derive the expressions for the forward and central difference approximations.

Forward Difference Approximation: Suppose we can evaluate values of ‘F’ and want to evaluate values of ‘dF/dx’. We can use the forward difference approximation (of order ‘O(h)’) to do this. By rewriting the above Taylor polynomial expression, we get ‘f(x + h) = f(x) + f’(x)*h + O(h^2)’.
- Once rearranged, the expression gives ‘f’(x) = [f(x + h) - f(x)]/h + O(h)’, which allows us to approximate the derivative of a function.

- The rounding error and truncation error of this approximation are given by ‘RE(h) = O(ε/h)’ and ‘TE(h) = O(h)’. Hence, the total error is given by ‘RE(h) + TE(h) = O(ε/h) + O(h)’.

Central Difference Approximation: Suppose we can evaluate values of ‘F’ and want to evaluate values of ‘dF/dx’. We can also use the central difference approximation (of order ‘O(h^2)’) to do this. By rewriting the above Taylor polynomial expression, we get that ‘f(x + h) = f(x) + f’(x)*h + f’’(x)*h^2/2 + O(h^3)’. By replacing ‘h’ with ‘-h’, we get the expression ‘f(x - h) = f(x) - f’(x)*h + f’’(x)*h^2/2 + O(h^3)’.
- Rearranging the difference of those two expressions gives the central difference approximation of order ‘O(h^2)’: ‘f’(x) = [f(x + h) - f(x - h)]/(2h) + O(h^2)’, which similarly allows us to approximate the derivative of a function.

- The rounding error and truncation error of this approximation are given by ‘RE(h) = O(ε/h)’ and ‘TE(h) = O(h^2)’. Hence, the total error is given by ‘RE(h) + TE(h) = O(ε/h) + O(h^2)’.

The equation for the central difference approximation of ‘f’’(x)’ can be derived by adding an additional term to the polynomial equation (using ‘O(h^4)’ instead of ‘O(h^3)’) and rearranging the sum of the two expressions, giving ‘f’’(x) = [f(x + h) - 2f(x) + f(x - h)]/h^2 + O(h^2)’.

Optimal Step Size: As the step size ‘h’ approaches zero:
- The truncation error (‘O(h)’ and ‘O(h^2)’) approaches zero.
- The rounding error (‘O(ε/h)’) approaches infinity.
The optimal step size to minimise the total error can be found using the total error equations.
- For the forward difference approximation, the total error is given by ‘O(ε/h) + O(h) = c1*ε/h + c2*h’. The derivative of the total error is thus given by ‘-c1*ε/h^2 + c2’. When equated to zero (stationary point), it is revealed that the value of ‘h’ that minimises the total error (and thus the optimal step size) is ‘h = (c1*ε/c2)^(1/2) = O(ε^(1/2)) ≈ 10^-8’.
- Using the same process, we can find that the optimal step size for the central difference approximation is ‘h = O(ε^(1/3)) ≈ 10^-6’.
Suppose we want to find the optimal step size in the forward/central approximation of the first derivative of a function, in MATLAB (a sketch follows below).
- We can assign the approximation to a function (via ‘f = @(h) ...’).
- We can then use a while loop to loop through different values of ‘h’ in powers of ‘10’.
- Finally, we can graph the values of ‘h’ against the absolute error of the approximation (i.e.: ‘|F’(x)_actual - F’(x)_approx|’), and find where the absolute error is minimised.
Absolute and Relative Errors: Suppose that ‘x1’ is the computed approximation of ‘x0’.
- The absolute error of ‘x1’ would be given by ‘AE(x1) = |x1 - x0|’.
- The relative error of ‘x1’ would be given by ‘RE(x1) = |x1 - x0|/|x0|’.
If a value is stored using double precision floating point arithmetic (if the value is known to be “exact”/is given as an integer), then it has a relative error of ‘ε’.
- As such, ‘ε|x|’ gives an estimate of this absolute error, in which ‘ε’ is the relative precision on a computer, and ‘x’ is the stored value.
- This means that the ‘exact’ value is confined by the bounds ‘x - ε|x|’ and ‘x + ε|x|’.

As such, if a solution computed by MATLAB has a value near ‘ε’ (2.2204e-16), then it can be interpreted as ‘0’ within the limits of computational arithmetic.
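Here is a minimal MATLAB sketch of the step-size experiment described above; the test function f(x) = exp(x) at x = 1 is an assumed example, not from the notes:

f  = @(x) exp(x);                      % function to differentiate
df = exp(1);                           % exact derivative of exp at x = 1
h  = 1;  hs = [];  errs = [];
while h > 1e-16
    fd = (f(1 + h) - f(1)) / h;        % forward difference approximation
    hs(end+1)   = h;
    errs(end+1) = abs(fd - df);        % absolute error |F'(x)_actual - F'(x)_approx|
    h = h / 10;                        % step through powers of 10
end
loglog(hs, errs, 'o-');                % error is minimised near h ≈ 1e-8
xlabel('h'); ylabel('absolute error');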

Rounding Error: The rounding error refers to the error associated with rounding a value to ‘n’ number of significant figures or decimal places.
- If ‘x’ is accurate to (or has correct values for at least) ‘n’ significant digits, then ‘x’ has a relative error less than ‘0.5 × 10^-n’.
- If ‘x’ is accurate to (or has correct values for at least) ‘n’ decimal digits, then ‘x’ has an absolute error less than ‘0.5 × 10^-n’.
As the relative error approaches ‘1’ (and the absolute error approaches ‘|x|’), we have less ‘confidence’ in the solution.
- The ‘confidence’ refers to the number of significant figures to which we can accurately state the value of something.
- If the relative error is greater than ‘1’, then there are no significant digits in the computed solution.
- NOTE: Never give an answer to more accuracy than the input data!
Due to the way floating points are stored on computers, they have finite levels of accuracy. As such, subtracting two nearly equal numbers will cause ‘catastrophic cancellation’, in which the solution would be very inaccurate.
- This is caused by the relative error increasing substantially more than the absolute error, in which the confidence in the solution is decreased significantly.
- If a function is iterative or periodic, rounding errors will accumulate, which can cause the computed answer to be significantly different from the actual answer.
- To solve this, we can rearrange the expression with the catastrophic cancellation into a mathematically equivalent but numerically preferable expression.
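A minimal MATLAB sketch of catastrophic cancellation and its fix; the particular expression (1 - cos(x))/x^2 is our choice of example, not from the notes:

x = 1e-8;
bad  = (1 - cos(x)) / x^2       % cancellation in 1 - cos(x): returns 0
good = 2*sin(x/2)^2 / x^2       % mathematically equivalent form: returns 0.5000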

[Graphs]: Background: When analysing a large number of units, it can be impractical to analyse each individual unit, so a representative sample can be used to reach the same conclusion more efficiently.
- In order for the sample to be a generalisation of the units, random sampling methods must be used.
- This is done to make sure that the people responsible for sampling are impartial to the sampled units, so that the results will not be affected.
There are two types of variables - categorical and numerical.
- Categorical variables are qualitative, and have no numerical significance. They can either be ‘ordinal’ (with order) or ‘nominal’ (with no order).
- Numerical variables are quantitative, and do have numerical significance. They can either be discrete (have distinct values) or continuous.
By representing large amounts of data on a graph, the general pattern can be more easily established, and a conclusion about the data can be drawn.
- “We can visually discern the overall pattern of variation”.
- “Summary measures that tell where a sample is centred (measures of centre or location), and what is the extent of spread around its centre (measures of variability)”.

Dot Plots and Stem-and-Leaf Plots: A ‘dot plot’ is a type of graph where the data points are represented by dots. This is usually used for smaller sample sets. A ‘stem and leaf plot’ is another way of representing data on a graph, in which the ‘stem’ and ‘leaf’ of the graph can represent different things.
- E.G.: Suppose the ‘stem’ represents the 10s units, and the ‘leaves’ represent the 1s units, in which the data set is ‘{1, 5, 12, 13, 29}’. Then the data can be represented on a ‘stem and leaf plot’ as:
0 | 1 5
1 | 2 3
2 | 9

Histograms: A ‘histogram’ is a type of graph useful for representing both categorical and numerical data, where ‘classes’ are used to group similarly categorised data together.

- E.G.: [example histogram not reproduced]

If the data is numerical and continuous, then we have to create artificial intervals, where each interval is a class.
- Too few classes can cause information to be lost, while too many classes can cause information to be distorted and indiscernible. As a general rule of thumb, the number of classes should be the square root of the number of data points.
Typical terms used to describe histograms are ‘symmetric’, ‘skewed to the right/left’, ‘bimodal’ (two peaks), ‘unimodal’ (single peak), and ‘bell shaped’ (if symmetric and unimodal).
- The asymmetry of a histogram can be described by its skewness.
- If a histogram is skewed to the right/left, then the ‘tail’ of the histogram is on the right/left.
Sometimes equal-width classes may not be preferable (e.g.: when one of the classes has a high frequency value while another has a very low frequency value). When this happens, we can group multiple classes together, such that one of the classes has a larger width than the rest.

[Figure: on the left is the histogram before combining the classes, and on the right is after.]

Density Histograms: A density histogram is a histogram with the vertical axis representing the densities of each class, as opposed to the frequencies. That way, the relative frequency of a class would simply be the area of the rectangle representing the class (hence the sum of all the areas of the rectangles will equate to ‘1’).

- The relative frequency of a class is the frequency divided by the total number of observations (total frequency). The density of a class is the relative frequency divided by the class width.

Measures of Center and Variability: Once the histogram has been plotted, we can observe where the samples are centred (measure of centre/location) and what the extent of spread around the centre is (measure of variability).
The ‘measure of centre’ (or ‘measure of location’) is usually represented by the arithmetic mean/average of the observation values.
- The sample mean is given by the formula ‘x̄ = 1/n * ∑xi’.
- The ‘measure of centre’ can also be represented by the median (or the middle-most) of the observation values (if the number of observations is even, then we just take the average of the two middle-most values).
- Medians are preferable when there are many outliers (because medians aren’t as affected by the outliers); note that when calculating the mean and median, we still consider the outliers.
The ‘measure of variability’ (or ‘measure of dispersion’) is usually represented by the sample variance.
- The sample variance (s^2) of a random sample is given by ‘s^2 = 1/(n-1) * ∑(xi - x̄)^2’.
- Alternatively, we can use the more practical equation ‘s^2 = 1/(n-1) * [(∑xi^2) - n*x̄^2]’.
- The unit used by the variance is the square of the unit used by the data values. We square the deviations because otherwise the sum would always equal zero. Moreover, we use ‘n-1’ instead of ‘n’ because the sample variance has ‘n-1’ degrees of freedom.
- If you get a negative variance value, it means that you have made an error.
- The sample standard deviation (s) would simply be the square root of the variance.
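These quantities map directly onto MATLAB built-ins; a minimal sketch (the data is an assumed example). Note that ‘var’ and ‘std’ divide by ‘n-1’ by default, matching the definitions above:

x = [2 4 4 4 5 5 7 9];      % hypothetical observations
xbar = mean(x);              % sample mean, 1/n * sum(x)
med  = median(x);            % sample median
s2   = var(x);               % sample variance, 1/(n-1) * sum((x - xbar).^2)
s    = std(x);               % sample standard deviation, sqrt(s2)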

Quartiles and Percentiles: We can divide the sample into four equal parts, in which the division points are referred to as ‘sample quartiles’.
- The first/lower quartile (q1) is the median of the lower half of the sample.
- The third/upper quartile (q3) is the median of the upper half of the sample.
- The second/middle quartile (q2) is the median.
- Generally, the ‘nth’ percentile/quantile of a sample will have ‘n%’ of the observations below, and ‘(100-n)%’ of the observations above.
- When splitting an odd numbered sample set, we are to include the median in the upper and lower halves as well.

The ‘five number summary’ of a data set, is a set of 5 elements ‘{x(1), q1, q2 q3, x(n)}’, containing the ‘0th’, ‘25th’, ‘50th’, ‘75th’ and ‘100th’ percentile. - Make sure to answer in that exact order. - Note that ‘x(1)’ and ‘x(n)’ are just the smallest and largest elements. The ‘iqr’ (interquartile range) is given by ‘q3 - q1’. - The interquartile range describes the amount of variation in the middle half of the observations. - An outlier is an observation ’1.5 × iqr’ from the closest quartile. An extreme outlier is an observation ‘3 × iqr’ from the closest quartile. A mild outlier is an outlier that is not extreme.

Box Plots: A box plot (or a box-and-whisker plot) is a graphical display showing the five number summary as well as any outlier values. A central box spans the quartiles (from ‘q1’ to ‘q3’), and the horizontal line inside the box marks the median (‘q2’).
- Lines extend above and below the box representing the largest and smallest observations that are not outliers (the observations within ‘1.5 × iqr’).
- Observations more than ‘1.5 × iqr’ outside the central box are plotted individually as outliers.
- Due to their high visual impact, box plots are extremely useful and easy to understand.
- E.G.: [example box plot not reproduced; see the sketch below]
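In place of the missing example figure, a short MATLAB sketch (assumes the Statistics and Machine Learning Toolbox; the data is hypothetical):

x = [1 5 12 13 29 31 44 120];   % 120 lies more than 1.5*iqr above q3
boxplot(x);                      % box spans q1..q3, line at the median, outlier plotted as a point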

[Integration]: Trapezoidal Rule: The trapezoidal rule involves approximating the area under a function ‘f’ over the interval ‘[a,b]’, by using ‘N’ number of trapezoids.
- The interval ‘[a,b]’ is divided into ‘N’ equal width intervals (or ‘N+1’ points).
- The approximated area is defined by: ‘QN(f) = (b-a)/N * {f(x0)/2 + ∑(i=1..N-1) f(xi) + f(xN)/2}’.
- Note that ‘x0 = a’ and ‘xN = b’.
The error in the approximation would be given by ‘EN(f) = -[f^(2)(c)*(b-a)^3]/(12*N^2)’ for ‘c ∈ [a,b]’.
- In other words, ‘EN(f) = O(N^-2)’.
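A minimal MATLAB sketch of the composite trapezoidal rule (the integrand is an assumed example; MATLAB’s built-in ‘trapz’ does the same job):

f = @(x) exp(-x.^2);                 % hypothetical integrand
a = 0; b = 1; N = 100;
x = linspace(a, b, N+1);             % N+1 equally spaced points, x(1) = a, x(end) = b
Q = (b-a)/N * (f(x(1))/2 + sum(f(x(2:end-1))) + f(x(end))/2)
% equivalently: trapz(x, f(x))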

Simpson’s Rule: Simpson’s rule involves approximating the area under a function ‘f’ over the interval ‘[a,b]’, by fitting a parabola over each pair of intervals (so ‘N’ must be even).
- The interval ‘[a,b]’ is divided into ‘N’ equal width intervals (or ‘N+1’ points).
- The approximated area is defined by: ‘QN(f) = (b-a)/(3*N) * {f(x0) + 4*f(x1) + 2*f(x2) + 4*f(x3) + … + 4*f(xN-1) + f(xN)}’.
- Note that ‘x0 = a’ and ‘xN = b’.
The error in the approximation would be given by ‘EN(f) = -[f^(4)(c)*(b-a)^5]/(180*N^4)’ for ‘c ∈ [a,b]’.
- In other words, ‘EN(f) = O(N^-4)’.
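A minimal MATLAB sketch of composite Simpson’s rule (same assumed integrand as above):

f = @(x) exp(-x.^2);
a = 0; b = 1; N = 100;                        % N must be even
x = linspace(a, b, N+1);
w = [1, repmat([4 2], 1, N/2 - 1), 4, 1];     % weight pattern 1 4 2 4 ... 2 4 1
Q = (b-a)/(3*N) * sum(w .* f(x))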

[Interpolation]: Introduction: Interpolation is a method of constructing new data points (by approximating) within the range of a discrete set of known data points. Regarding polynomial interpolation, we are able to approximate the shape of a polynomial when given only a finite number of data points.
- Given ‘n’ number of data points, an interpolating polynomial will have degree ‘n-1’.

Lagrange Polynomials: We can interpolate a polynomial by placing the ‘n’ number of data points into a linear system, and solving to find the coefficients of the ‘n-1’ degree polynomial.
- The practicality of this approach decreases dramatically as the number of data points increases.
- E.G.: Consider an interpolating polynomial ‘p(x) = a0 + a1*x + a2*x^2 + a3*x^3’ that interpolates ‘(5,1)’, ‘(-7,-23)’, ‘(-6,-54)’ and ‘(0,-954)’, using standard basis functions ‘{1, x, x^2, x^3}’. The linear system would look like this (each row reads ‘1, xi, xi^2, xi^3 | yi’):
1   5   25   125 |    1
1  -7   49  -343 |  -23
1  -6   36  -216 |  -54
1   0    0     0 | -954
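A minimal MATLAB sketch solving this linear system for the coefficients (the data points are the ones from the example above):

x = [5; -7; -6; 0];  y = [1; -23; -54; -954];
A = [x.^0, x.^1, x.^2, x.^3];    % Vandermonde-style matrix: rows 1, xi, xi^2, xi^3
a = A \ y                         % coefficients a0, a1, a2, a3
% cross-check: flip(polyfit(x, y, 3)) lists the same values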

A more efficient way to interpolate ‘n’ number of data points is to use ‘Lagrange polynomials’.
- A Lagrange polynomial is given by ‘Lj(x) = ∏(k=0..n-1, k≠j) (x - xk)/(xj - xk)’ - note that ‘xj’ refers to the ‘x’ value of the ‘jth’ data point, and ‘xk’ refers to the ‘x’ values of the other data points.
- The interpolating polynomial will be given by ‘p(x) = ∑(j=0..n-1) yj*Lj(x)’ - note that ‘yj’ refers to the ‘y’ value of the ‘jth’ data point.
- For ‘n’ number of data points, the interpolated polynomial will be a sum of ‘n’ Lagrange polynomials - each Lagrange polynomial being the product of ‘n-1’ terms (since the degree of the polynomial is ‘n-1’ and since ‘k ≠ j’).
- E.G.: [worked example not reproduced; see the sketch below]
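A minimal MATLAB sketch evaluating the Lagrange form at a query point ‘xq’ (reusing the assumed data from above):

x = [5 -7 -6 0];  y = [1 -23 -54 -954];
xq = 1;                                          % point at which to evaluate p(x)
p = 0;
for j = 1:numel(x)
    others = x([1:j-1, j+1:end]);                % all xk with k ~= j
    Lj = prod((xq - others) ./ (x(j) - others)); % Lagrange basis polynomial Lj(xq)
    p = p + y(j) * Lj;                           % p(xq) = sum of yj * Lj(xq)
end
p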

[Matrices I]: Inverse and Transpose: If the determinant of a square matrix ‘A’ is non-zero, then the matrix has an inverse, ‘A^-1’.
- The inverse of the matrix is defined by ‘A*A^-1 = A^-1*A = I’, where ‘I’ is the identity matrix. As such, the inverse can be found by row reducing the augmented matrix ‘[A | I]’ into ‘[I | A^-1]’.
- Some properties of inverse matrices include: ‘(AB)^-1 = B^-1*A^-1’, ‘det(A^-1) = [det(A)]^-1’ and ‘B^-1*(A+C)*B = B^-1*A*B + B^-1*C*B’.
The transpose of a matrix is defined by ‘[A^T]ij = [A]ji’, in which the elements of the matrix are reflected along the main diagonal.
- If ‘A = A^T’, then the matrix is symmetric and square.
- Some properties of transpose matrices include: ‘(AB)^T = B^T*A^T’, ‘det(A^T) = det(A)’, ‘(A^-1)^T = (A^T)^-1’, ‘(A + B)^T = A^T + B^T’, and ‘(A^T*x)·y = x·(A*y)’.
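A minimal MATLAB sketch checking a couple of these properties numerically (the matrices are assumed examples):

A = [2 1; 1 3];  B = [0 1; 4 2];    % hypothetical invertible matrices
norm(A*inv(A) - eye(2))              % ≈ 0: A*A^-1 = I (up to roundoff)
norm(inv(A*B) - inv(B)*inv(A))       % ≈ 0: (AB)^-1 = B^-1*A^-1
norm((A*B)' - B'*A')                 % 0: (AB)^T = B^T*A^T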

Eigenvalues and Eigenvectors: An eigenvector ‘v’ and an eigenvalue ‘λ’ of a matrix ‘A’ obey the equation ‘Av = λv’. An ‘n × n’ matrix will have ‘n’ eigenvalues and, when it is diagonalisable, ‘n’ corresponding non-zero, linearly independent eigenvectors.
- In order to find the eigenvalues, solve ‘det(A - λI) = 0’ to find the characteristic equation, and thus the values for ‘λ’.
- In order to find the corresponding eigenvectors for each eigenvalue, substitute ‘λ’ into ‘(A - λI)v = 0’ and solve for ‘v’.
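A minimal MATLAB sketch using the built-in ‘eig’ (the matrix is an assumed example):

A = [2 0; 1 3];            % hypothetical matrix
[V, D] = eig(A);           % columns of V are eigenvectors, diag(D) the eigenvalues
norm(A*V - V*D)            % ≈ 0: verifies A*v = λ*v for every pair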

