Title | What is Regression? (Regression, BWSI152 Courseware, Beaver Works Summer Institute, edX) |
---|---|
Author | John Hussey |
Course | Financial Analysis For The Umd Student Managed Fund (Smf) |
Institution | University of Massachusetts Dartmouth |
Pages | 7 |
BWSI152 Medlytics 2020 > Module 2: Probability and Statistics > Regression
What is Regression?
Before You Start
Recall from the introduction of this module our motivating problem: "How does $X$ correlate with $Y$?" or "If I know $X$, what do I expect $Y$ will be?" For example: How does a child's age ($X$) correlate with their body-mass index ($Y$)? How does alcohol consumption ($X$) correlate with liver disease ($Y$)? How do pixels in a mammogram image ($X$) map to healthy or cancerous cells ($Y$)? Here, $X$ and $Y$ are random variables which represent quantities of interest such as exposures/treatments and outcomes/measurements in a trial. What we are typically trying to do is leverage data (many realizations or observations of $X$ and $Y$) to learn a function $f$ that describes how $Y$ is related to $X$, such that:

$$Y = f(X)$$
Let's assume there exists an unknown function $f$, and we want to learn $f$ from data. There are two general types of problems:
Regression: $f$ maps $X$ to a continuous-valued $Y$ (for example, $Y$ may correspond to a patient's respiratory rate)
Classification: $f$ maps $X$ to a discrete-valued $Y$ (for example, $Y$ may correspond to whether a cell is a normal tissue cell or a cancerous cell)
Regression is a statistical modeling technique and is also the basis for training many (supervised) machine learning algorithms. In this module we will cover the basics of regression and show how we can use it to learn model parameters.
In regression, we assume the function $f$ takes the form:

$$Y = f(X; \theta)$$

where $\theta$ are the unknown parameters of function $f$. For example, $\theta$ may include the mean and covariance of a Gaussian distribution, or the coefficients of a polynomial. Given $N$ observed data points (where the $i$-th data point is given by $(x_i, y_i)$), the goal of regression is to find the parameter values $\theta$ that best fit the data by minimizing the residuals. The residual of data point $i$ is simply the difference between the actual value of the dependent variable, $y_i$, and the value predicted by the model, $f(x_i; \theta)$. The residual is therefore:

$$r_i = y_i - f(x_i; \theta)$$
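To make the residual concrete, here is a minimal Python sketch. The straight-line model `line`, the helper `residuals`, and the four data points are hypothetical illustrations chosen for this example, not part of the course materials; any model written as `f(x, theta)` could be passed in instead.

```python
from typing import Callable, List, Sequence

# A model takes one input x and a parameter vector theta and returns a prediction.
Model = Callable[[float, Sequence[float]], float]


def residuals(f: Model, xs: Sequence[float], ys: Sequence[float],
              theta: Sequence[float]) -> List[float]:
    """Return r_i = y_i - f(x_i; theta) for every observed data point."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return [y - f(x, theta) for x, y in zip(xs, ys)]


def line(x: float, theta: Sequence[float]) -> float:
    """Hypothetical model: a straight line f(x; theta) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.2, 7.1]
# With theta = [1, 2] the predictions are 1, 3, 5, 7.
print(residuals(line, xs, ys, theta=[1.0, 2.0]))
```

A positive residual means the model under-predicts that point; a negative one means it over-predicts.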
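The least-squares regression method finds the "best" parameter values by minimizing the loss $L(\theta)$, which is defined as the sum of squared residuals:

$$L(\theta) = \sum_{i=1}^{N} r_i^2 = \sum_{i=1}^{N} \left(y_i - f(x_i; \theta)\right)^2$$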
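As a small worked example (our own illustration, not taken from the course), consider the simplest possible model, a constant $f(x; \theta) = \theta$ that ignores $x$. Setting the derivative of the least-squares loss to zero shows that the best constant is the sample mean of the $y_i$:

```latex
\begin{aligned}
L(\theta) &= \sum_{i=1}^{N} (y_i - \theta)^2 \\
\frac{dL}{d\theta} &= -2 \sum_{i=1}^{N} (y_i - \theta) = 0
\quad\Longrightarrow\quad \theta^{*} = \frac{1}{N} \sum_{i=1}^{N} y_i
\end{aligned}
```

Because $d^2L/d\theta^2 = 2N > 0$, this stationary point is a minimum.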
$L(\theta)$ is typically referred to as a "loss function" (other names include objective function, utility function, fitness function, ...). There are many different loss functions and a variety of regularization options, but we will just use the above quadratic loss for now. Depending on the form of the function $f$ (linear, nonlinear, categorical, etc.), there are different regression algorithms available. In the following sections, we will introduce you to a few of the most common regression techniques.
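The quadratic loss can be sketched the same way. In this minimal Python example, `squared_loss` and the straight-line model `line` are hypothetical names and the data are made up; the point is only that least squares prefers whichever parameter setting gives the smaller sum of squared residuals.

```python
from typing import Callable, Sequence

Model = Callable[[float, Sequence[float]], float]


def squared_loss(f: Model, xs: Sequence[float], ys: Sequence[float],
                 theta: Sequence[float]) -> float:
    """Least-squares loss L(theta): the sum of squared residuals y_i - f(x_i; theta)."""
    return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))


def line(x: float, theta: Sequence[float]) -> float:
    """Hypothetical straight-line model f(x; theta) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.2, 7.1]

# Two candidate parameter settings: least squares prefers the one with smaller loss.
good = squared_loss(line, xs, ys, [1.0, 2.0])  # residuals 0, -0.1, 0.2, 0.1
bad = squared_loss(line, xs, ys, [0.0, 1.0])   # predictions 0, 1, 2, 3
print(good, bad)
```

Squaring makes every residual count as a non-negative penalty and weights large errors more heavily than small ones.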
Regression is often thought of as finding the "best fit curve" that describes the relationship between $X$ and $Y$. Consider the toy example below, where we want to learn the relationship between $X$ and $Y$.
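Here, there are 30 data points $(x_i, y_i)$ that were generated using a polynomial function with some additive noise:

$$y_i = f(x_i) + \epsilon_i, \quad i = 1, \dots, 30$$

where $\epsilon_i$ is the random noise term.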
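As a reminder, a polynomial of degree $M$ takes the form:

$$f(x; \theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_M x^M = \sum_{j=0}^{M} \theta_j x^j$$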
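A minimal sketch of evaluating such a polynomial, assuming the coefficients are stored in a list ordered from $\theta_0$ up to $\theta_M$ (the name `polynomial` is our own). It uses Horner's rule rather than computing each power of $x$ separately; both give the same value in exact arithmetic.

```python
from typing import Sequence


def polynomial(x: float, theta: Sequence[float]) -> float:
    """Evaluate f(x; theta) = theta[0] + theta[1]*x + ... + theta[M]*x**M.

    Horner's rule rewrites the sum as (((theta[M])*x + theta[M-1])*x + ...) + theta[0],
    which needs only M multiplications and M additions.
    """
    result = 0.0
    for coeff in reversed(theta):
        result = result * x + coeff
    return result


# Degree-2 example: f(x) = 1 - 2x + 3x^2, so f(2) = 1 - 4 + 12 = 9
print(polynomial(2.0, [1.0, -2.0, 3.0]))  # → 9.0
```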
The purpose of regression is to learn the coefficients $\theta_0, \dots, \theta_M$ that best fit the data. The plots below show the optimal curves (learned using least-squares regression) for different polynomial degrees $M$. In this unit we will introduce several techniques for solving this type of problem.
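The toy problem can be reproduced in spirit with the sketch below. Everything here is an assumption for illustration: the true cubic, the noise level, the evenly spaced inputs on $[-1, 1]$, and the helper names `solve`, `fit_polynomial`, `polynomial`, and `squared_loss`. The fit minimizes the least-squares loss by solving the normal equations $(A^\top A)\theta = A^\top y$, where row $i$ of the design matrix $A$ is $[1, x_i, \dots, x_i^M]$; the course's own technique may differ.

```python
import random
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row swaps move the right-hand side too.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients theta[0..degree] via the normal equations."""
    # Row i of the design matrix A is [1, x_i, x_i^2, ..., x_i^degree].
    design = [[x ** j for j in range(degree + 1)] for x in xs]
    k = degree + 1
    ata = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    aty = [sum(row[p] * y for row, y in zip(design, ys)) for p in range(k)]
    return solve(ata, aty)


def polynomial(x: float, theta: Sequence[float]) -> float:
    """Evaluate theta[0] + theta[1]*x + ... with Horner's rule."""
    result = 0.0
    for coeff in reversed(theta):
        result = result * x + coeff
    return result


def squared_loss(xs: Sequence[float], ys: Sequence[float], theta: Sequence[float]) -> float:
    """Sum of squared residuals for a polynomial with coefficients theta."""
    return sum((y - polynomial(x, theta)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: 30 points from a cubic plus Gaussian noise (sigma = 0.1).
rng = random.Random(0)
true_theta = [0.5, -1.0, 0.0, 2.0]  # f(x) = 0.5 - x + 2x^3 (our choice, not the course's)
xs = [-1.0 + 2.0 * i / 29 for i in range(30)]
ys = [polynomial(x, true_theta) + rng.gauss(0.0, 0.1) for x in xs]

# Fit polynomials of increasing degree and report training loss and coefficients.
for degree in range(4):
    theta = fit_polynomial(xs, ys, degree)
    print(degree, round(squared_loss(xs, ys, theta), 4), [round(t, 2) for t in theta])
```

The training loss can only go down (or stay the same) as the degree grows, because each lower-degree polynomial is a special case of the higher-degree one. For high degrees the normal equations become badly conditioned, so in practice an SVD-based solver such as NumPy's `numpy.linalg.lstsq` is the safer choice.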
For the remainder of this module, we assume the student already has familiarity with the basics of linear algebra and calculus. If you are not yet comfortable in these areas, here are some links you may want to review first: Khan Academy linear algebra (vector math, matrix-vector products) and Khan Academy calculus (differentiating polynomials, sine and cosine derivatives, differentiating products, chain rule, gradients).
Types of Problems
Fill in the blanks: A _____ problem finds a function $f$ that maps $X$ to a continuous random variable $Y$. (Answer: regression)
A _____ problem finds a function $f$ that maps $X$ to a discrete random variable $Y$. (Answer: classification)
Regression Models
For the regression model $Y = f(X; \theta)$ above, state what $X$, $Y$, and $\theta$ each represent. (Answer: $X$ is the input random variable, $Y$ is the output random variable, and $\theta$ are the unknown parameters.)