Lecture Notes of Statistical Inference

Title: Lecture Notes of Statistical Inference
Author: Ozan Aksu
Course: Statistics for Industrial Engineers
Institution: Boğaziçi Üniversitesi

Summary

These are the lecture notes of the Statistical Inference course given at Boğaziçi University. This is a course that every graduate student of the Industrial Engineering department should take....


Description

Statistical Inference
Wolfgang Hörmann, M. Güray Güler
February 5, 2019

Contents

1 Introduction

2 The Likelihood Function and Parameter Estimation
  2.1 Models
  2.2 Likelihood
  2.3 Sufficient Statistics
    2.3.1 Statistics
    2.3.2 Sufficient Statistics
    2.3.3 Minimal Sufficient Statistic
    2.3.4 Exponential Families
    2.3.5 Expectation of t(y) for the Single-Parameter Case: k = r = 1
  2.4 Maximum Likelihood Estimation
  2.5 Estimates and the Fisher Information
    2.5.1 Cramer-Rao Inequality
    2.5.2 Observed Fisher Information
  2.6 Properties of the MLE
    2.6.1 MLE for Regular Exponential Families
    2.6.2 Multinomial Distribution
    2.6.3 Data Recorded as Frequency Table
  2.7 R Commands
    2.7.1 Binomial Likelihood
    2.7.2 Normal Likelihood
    2.7.3 MLE Examples
  2.8 Exercises

3 Hypothesis Testing
  3.1 General Aspects
    3.1.1 Motivation
    3.1.2 Statistical Test Procedures
    3.1.3 Power Function and Significance Level
    3.1.4 How to Design a Test
  3.2 The Likelihood Ratio Test
    3.2.1 The Neyman-Pearson Lemma
    3.2.2 Deriving the Test for the Mean
    3.2.3 The General Likelihood Ratio Test
    3.2.4 Comparing the Parameter Values of Two Independent Samples
    3.2.5 P-value and Practical Considerations
    3.2.6 Nuisance Parameters and Profile Likelihood
    3.2.7 Simulating the P-Value and the Power
  3.3 Important Examples of the LRT
    3.3.1 LRT for the Exponential Distribution
    3.3.2 LRT for the Poisson Distribution
    3.3.3 LRT for the Binomial Distribution
    3.3.4 Normal Distribution: One-Sample Two-Sided Student's t
    3.3.5 Asymptotic Distribution of the LRT for the Exponential Distribution
    3.3.6 Two-Sample LRT for the Normal Distribution (Two-Sample t-Test)
    3.3.7 Two-Sample LRT for the Exponential Distribution
  3.4 Interval Estimation (Confidence Intervals and Regions)
    3.4.1 Approximate Methods
  3.5 R Commands
    3.5.1 t-Test
    3.5.2 Two-Sample t-Test
    3.5.3 Exponential Two-Sided Test
  3.6 Exercises

4 Linear Models
  4.1 The Linear Model and Its Likelihood
  4.2 Least Squares Criterion
    4.2.1 Geometric Interpretation
    4.2.2 Statistical Properties of the Estimates
  4.3 Normal Theory
    4.3.1 Likelihood
    4.3.2 An Example for the LRT
    4.3.3 F-Tests for Model Selection
    4.3.4 Confidence Intervals
  4.4 Exercises

5 Applied Multivariate Statistics
  5.1 Checking the Model Assumptions
    5.1.1 Graphical Methods
    5.1.2 The Data Mining Paradigm
  5.2 Linear Regression in Practice
    5.2.1 Just One Variable
    5.2.2 Many Variables, No Theory
    5.2.3 A Few Variables, Theory Available
  5.3 One-Way Analysis of Variance
    5.3.1 The Basic F-Test
  5.4 Analysis of Covariance
    5.4.1 Including Nominal Variables into Regression
    5.4.2 Other Possible Models
  5.5 Box-Cox Transform
  5.6 Selecting the Correct Input Distribution
    5.6.1 Akaike's Information Criterion: AIC

6 The Generalized Linear Model (GLM) (New Version 2016)
  6.1 Introduction
  6.2 GLM with the Poisson Family
    6.2.1 The Log-Likelihood Function
    6.2.2 Deviance and Residuals
    6.2.3 Example for Regression with Poisson Responses
    6.2.4 Example for ANOVA with Poisson Responses
  6.3 GLM with the Binomial Family
    6.3.1 Example for Logistic Regression
    6.3.2 Example ANOVA with Binomial Data
  6.4 GLM with the Gamma Family
    6.4.1 Example Regression with Gamma Responses
    6.4.2 Regression with Gamma Responses: Checking the Link Function
    6.4.3 ANOVA with Gamma Errors
  6.5 Exercises

A An Introduction to R

B Basic Probability and Statistics Review
  B.1 Basic Concepts
    B.1.1 Measures of a RV
    B.1.2 Important Discrete RVs
    B.1.3 Important Continuous RVs
    B.1.4 Some Definitions
    B.1.5 Distributions for the Sample Mean $\bar{X}$
    B.1.6 Distributions for the Sample Variance $S^2$
    B.1.7 Confidence Intervals Concerning $\mu$
    B.1.8 Confidence Intervals Concerning $\sigma^2$
  B.2 Hypothesis Testing
    B.2.1 Some Intuition
    B.2.2 Sampling Distributions and Hypothesis Testing
    B.2.3 One-Sided and Two-Sided Tests
    B.2.4 $\alpha$ Value
    B.2.5 Relation to Confidence Intervals
    B.2.6 p-Value
    B.2.7 Type I and Type II Errors
  B.3 Transformation Theorem and Bias
    B.3.1 Bias of $\sigma^2$ Estimators for the Normal Distribution when $\mu$ Is Given
  B.4 Taylor's Expansion

C Solutions
  C.1 Solutions to the Exercises
    C.1.1 Exercises in Chapter 2
    C.1.2 Exercises in Chapter 3
    C.1.3 Exercises in Chapter 4
  C.2 Other Exercises and Their Solutions
  C.3 Initial Exercises and Their Solutions

D Assignments and Their Solutions
  D.1 Assignments
  D.2 Assignment Solutions
Chapter 1

Introduction

These lecture notes try to collect the main material for a graduate course on statistical inference for industrial engineers. The aim of this course is to present the main concepts of mathematical statistics and to demonstrate their practical application using the statistical software R. We felt the need to prepare such notes instead of using a standard textbook because we simply were not able to find a suitable book that combines the mathematical concepts with their applications to real-world problems.

As a basis for the mathematical parts of these lecture notes we use the book "Statistical Inference Based on the Likelihood" by Azzalini (1996). Chapter 2 on estimation, Chapter 3 on testing and Chapter 4 on linear models can be seen as simplified and much shortened reformulations of the corresponding chapters of that book; we added R code for producing plots and simulations, trying to clarify the mathematical concepts and to facilitate their application. For the mathematical details and for many of the proofs we refer the reader to Azzalini (1996). To reach the aims of the course we added Chapter 5, which presents multivariate regression applications, discusses practical model-checking approaches and also discusses one-way ANOVA examples. Finally, Chapter 6 contains our practically oriented introduction to Generalized Linear Models (GLM). A main reason for preparing these lecture notes was to make all our exercises, our Introduction to R and the R code of the many examples available to the students.

These lecture notes are not intended to replace the lectures, nor the book of Azzalini (1996). Therefore the motivations and discussions of many of the concepts are often short. The main aim is to facilitate the application, the R coding and the interpretation. We hope that this aim can be reached by studying the numerous exercises. As a special service, the solutions for most of the exercises are presented in Appendix C.1.

For the application of statistics, proper software is of the highest importance. We use the statistical software package R (R Development Core Team (2010)), which is, due to its programming language, very flexible and also well suited to coding small simulations. Another important advantage is that it is freely available. For a very short introduction to R see Appendix A.

The lecture notes of a graduate course are of course not a first introduction to statistics. We assume that the reader knows calculus, linear algebra and basic probability, and also has some ideas about statistical estimation and testing. For a very short introduction to basic statistical concepts and to special topics of probability and calculus that are required at certain places in these lecture notes, see Appendix B.

What is the difference between probability and statistics? (This section uses https://www.linkedin.com/pulse/whats-difference-between-probability-statisticsbobby-rigano and links given there.)

Probability and statistics are related areas of mathematics which concern themselves with analyzing the relative frequency of events. Still, there are fundamental differences in the way they see the world: probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events. Probability is primarily a theoretical branch of mathematics, which studies the consequences of mathematical definitions. Statistics is primarily an applied branch of mathematics, which tries to make sense of observations in the real world.

Both subjects are important, relevant, and useful. But they are different, and understanding the distinction is crucial for properly interpreting the relevance of mathematical evidence. Many a gambler has gone to a cold and lonely grave for failing to make the proper distinction between probability and statistics. :-)

The short answer to this, from Persi Diaconis, is the following: the problems considered by probability and statistics are inverse to each other. In probability theory we consider some underlying process which has some randomness or uncertainty modeled by random variables, and we figure out what happens. In statistics we observe something that has happened, and try to figure out what underlying process would explain those observations.

Some also like the example of a jar of red and green jelly beans. A probabilist starts by knowing the proportion of each and asks for the probability of drawing 2 red and 3 green jelly beans in 5 draws. A statistician infers the proportion of red jelly beans by sampling from the jar.

What is statistical inference? Statistical inference (also called statistical induction or inferential statistics) is the process of drawing conclusions from (random) samples. Statistical inference is applied to decide which type of probability model, and especially which parameter values for a certain probability model, are best suited to describe the real-world quantities we are interested in.
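In R, a small sketch of the two directions of the jelly-bean example might look as follows (this code is ours, not part of the original notes; the jar proportion 0.4 and the sample size 100 are invented for illustration):

```r
## Probabilist: the proportion of red beans is known; ask about an outcome.
p.red <- 0.4                              # assumed (hypothetical) proportion of red
dbinom(2, size = 5, prob = p.red)         # P(2 red, hence 3 green, in 5 draws)

## Statistician: the proportion is unknown; infer it from observed draws.
set.seed(1)                               # make the simulated sample reproducible
y <- rbinom(1, size = 100, prob = p.red)  # observed number of red beans in 100 draws
y / 100                                   # sample proportion as a simple estimate
```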

Chapter 2

The Likelihood Function and Parameter Estimation
(Azzalini (1996), Chapters 2 and 3)

2.1 Models

Statistical inference: we want to draw conclusions about the unknown distribution $F^*(\cdot)$ of $Y$.

Fundamental assumption: the experiment has a result $Y$, which can be described as a random variable. $\mathcal{F}$ is the set of all possible choices for the distribution $F^*(\cdot)$ of $Y$.

Parametric statistical model: all elements $F$ of $\mathcal{F}$ have the same mathematical form and differ only through their parameter values $\theta \in \Theta \subset \mathbb{R}^k$. Thus a parametric statistical model (with $k$ parameters) is a set $\mathcal{F}$ of CDFs:

$$\mathcal{F} = \{F(\cdot,\theta) : \theta \in \Theta \subset \mathbb{R}^k\}.$$

In practically all cases the CDFs have either a pdf (continuous case) or a pmf (discrete case), and we can write the statistical model as

$$\mathcal{F} = \{f(\cdot,\theta) : \theta \in \Theta \subset \mathbb{R}^k\},$$

with $f(\cdot,\theta)$ a density (either pdf or pmf) of the random variate. $\theta$ is called the parameter, $\Theta$ the parameter space and $\mathcal{F}$ the statistical model. (We only consider parametric models in this lecture.)

The sample space $\mathcal{Y}$ is the set of all possible sample outcomes $y$ compatible with a given parametric model: $\mathcal{Y}_\theta$ denotes the support (= domain) of the density $f(\cdot,\theta)$, and $\mathcal{Y}$ is the union of all $\mathcal{Y}_\theta$ with $\theta \in \Theta$.

Example (Binomial distribution): $Y \sim \mathrm{Bin}(n,\theta)$.

The index $n$ is fixed; the parameter is $\theta$, with $\Theta = (0,1)$. Here $\mathcal{Y}_\theta$ is the same for all $\theta$, and thus $\mathcal{Y}_\theta = \mathcal{Y} = \{0, 1, \ldots, n\}$.

Example (Uniform): $Y$ is a single observation from the uniform distribution on $[0,\theta]$. $\Theta = (0,\infty)$, $\mathcal{Y} = [0,\infty)$.

Example (Normal): $Y$ is a single observation from the normal distribution with unit variance, $N(\theta,1)$. $\Theta = \mathbb{R}$, $\mathcal{Y} = \mathbb{R}$.

Definition: A simple random sample (srs) is a subset of a statistical population in which each member of the subset has an equal probability of being chosen. Assuming that the parent population is very large, the results of consecutive random drawings form a sequence of iid random variates.

Example (srs of the $N(\theta,1)$ distribution): $Y$ is a vector of $n$ iid normal variates with unit variance. $\Theta = \mathbb{R}$, $\mathcal{Y} = \mathbb{R}^n$.

Example (srs of the Poisson($\theta$) distribution): $Y$ is a vector of $n$ iid Poisson variates. $\Theta = (0,\infty)$, $\mathcal{Y} = \mathbb{N}_0^n$.
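The last two examples are easy to simulate; the following short R sketch (ours, with arbitrarily chosen parameter values) draws a simple random sample from each model:

```r
n <- 20                                  # sample size
y.norm <- rnorm(n, mean = 1.5, sd = 1)   # srs of N(theta, 1), here with theta = 1.5
y.pois <- rpois(n, lambda = 2)           # srs of Poisson(theta), here with theta = 2
```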

2.2 Likelihood

For the statistical model $\mathcal{F}$, from which a sample $y \in \mathcal{Y}$ has been observed, the likelihood (function), a mapping from $\Theta$ into $\mathbb{R}^+$, is written as

$$L(\theta) = L(\theta; y) = c(y)\, f(y,\theta),$$

where $c(y)$ is a positive constant depending on $y$ but not on $\theta$. The log-likelihood function is the natural logarithm of the likelihood function:

$$l(\theta) = \log L(\theta) = \log c(y) + \log f(y,\theta).$$

As we can see above, the likelihood function is, as a formula, identical to the density. The only, but important, difference is that the likelihood function is considered as a function of the parameter $\theta$, not of the observation $y$. Therefore the likelihood is not a density! What is important is the shape of the likelihood for changing values of $\theta$. Therefore likelihoods are called equivalent if they are the same up to a multiplicative constant that can depend on $y$ but must not depend on $\theta$.

Example (Binomial): We are interested in the number $Y$ of defective pieces in 50 experiments:

$$f(y,\theta) = \binom{50}{y}\, \theta^y (1-\theta)^{50-y}.$$

So the likelihood function for $y = 4$ is

$$L(\theta) = \binom{50}{4}\, \theta^4 (1-\theta)^{46},$$

and the log-likelihood function is

$$l(\theta) = \log c + 4 \log \theta + 46 \log(1-\theta),$$

where $c = \binom{50}{4}$. The R commands in Section 2.7.1 plot the above (log-)likelihood function (see Figure 2.1(a)). They also implement the likelih...
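As a stand-in until Section 2.7.1, a minimal R sketch of such a plot could look as follows (our own code, not the version from the notes; the grid of 500 points is an arbitrary choice). The likelihood peaks at $\theta = 4/50 = 0.08$:

```r
theta <- seq(0.001, 0.999, length.out = 500)   # grid over the parameter space (0, 1)
L  <- choose(50, 4) * theta^4 * (1 - theta)^46                    # likelihood, y = 4
ll <- log(choose(50, 4)) + 4 * log(theta) + 46 * log(1 - theta)   # log-likelihood

op <- par(mfrow = c(1, 2))                     # likelihood and log-likelihood side by side
plot(theta, L,  type = "l", xlab = expression(theta), ylab = "likelihood")
abline(v = 4 / 50, lty = 2)                    # maximum at theta = y/n = 0.08
plot(theta, ll, type = "l", xlab = expression(theta), ylab = "log-likelihood")
abline(v = 4 / 50, lty = 2)
par(op)
```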
