Https- rstudio-pubs-static s3 amazonaws com 499684 c8b48a9b7f7c44ad97b83fcca623b9cd html PDF



Principal Component Analysis (PCA)
Evan
5/25/2019

1. Intro

As a data scientist, you’ll frequently have to deal with messy and high-dimensional datasets. In this chapter, you’ll learn how to use Principal Component Analysis (PCA) to effectively reduce the dimensionality of such datasets so that it becomes easier to extract actionable insights from them.

(1) Why is high dimensionality a problem? Handling high-dimensional data carries a high computational cost, estimation accuracy decreases, and the data become difficult to interpret.

(2) How do we trace correlation patterns? With a correlation matrix, which holds the correlation coefficient for every pair of variables. A smaller number of dimensions translates to a less complex correlation matrix.
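To make the link between dimensions and correlation-matrix complexity concrete, here is a minimal sketch. It assumes the built-in mtcars data set (the one these notes load later) as a stand-in; the names cor_mat, p and n_pairs are illustrative only.

```r
# Assumption: the built-in mtcars data set is used as example data.
data("mtcars")

# Pairwise Pearson correlations between all columns (all are numeric).
cor_mat <- cor(mtcars)

# With p variables there are p * (p - 1) / 2 distinct pairs to inspect.
p <- ncol(mtcars)
n_pairs <- p * (p - 1) / 2

# Show a small corner of the matrix and the size of the problem.
print(round(cor_mat[1:4, 1:4], 2))
cat("variables:", p, " distinct pairs:", n_pairs, "\n")
```

With 11 variables there are already 55 distinct pairs, and the count grows quadratically as variables are added.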

(3) How do we deal with the curse of dimensionality? There are two solutions: feature engineering, which requires domain knowledge, and removing redundancy.
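One way to spot redundancy is to list the variable pairs whose correlation is strong. The sketch below is only an illustration: it assumes the built-in mtcars data set, and the 0.8 cutoff and the name high_pairs are arbitrary choices, not something prescribed by the notes.

```r
# Assumption: mtcars as example data; the 0.8 cutoff is illustrative.
data("mtcars")
cor_mat <- cor(mtcars)

# Blank the diagonal and lower triangle so each pair appears only once.
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA

threshold <- 0.8
# which() skips NA entries; arr.ind = TRUE returns row/column positions.
idx <- which(abs(cor_mat) > threshold, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```

Pairs that show up here carry largely overlapping information, which is the redundancy that dimensionality reduction aims to remove.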

(4) Why reduce dimensionality? The goal is to explain as much of the variation in the data as possible while discarding highly correlated, and therefore redundant, variables.
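The idea of keeping most of the variation with fewer dimensions can be sketched with base R's prcomp(). This is a hedged example, not the notes' own code: it assumes the built-in mtcars data set, and the 90% threshold and the names var_explained, cum_var, k and reduced are illustrative.

```r
# Assumption: mtcars as example data; the 90% threshold is illustrative.
data("mtcars")

# Center and scale so that variables on large scales do not dominate.
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Share of total variance carried by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 3))

# Smallest number of components reaching the chosen threshold.
k <- which(cum_var >= 0.90)[1]

# Scores on the first k components: the reduced representation.
reduced <- pca$x[, 1:k, drop = FALSE]
cat("components kept:", k, "of", ncol(mtcars), "\n")
```

The component scores in pca$x are mutually uncorrelated, so the reduced data no longer contain the redundant, highly correlated directions.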

2. Exploring multivariate data

We’ve loaded a data frame called cars into your workspace. Go ahead and explore it! It includes features of a wide range of car brands from 2004. In this exercise, you will explore the dataset and attempt to draw useful conclusions from the correlation matrix. Recall that correlation reveals feature resemblance, and it will help us infer how cars are related to each other based on their features’ values. In doing so, you will discover how difficult it is to trace patterns based solely on the correlation structure.

(1) Explore cars with the summary() function and take a first look at the complexity of the data.

data("mtcars")
mtcars$cyl...
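A first look of this kind might be sketched as follows. It assumes the built-in mtcars data set that the excerpt above loads, rather than the course's cars data frame, and it does not attempt to reconstruct the truncated line.

```r
# Assumption: the built-in mtcars data set, as loaded in the excerpt.
data("mtcars")

# Number of observations and variables.
print(dim(mtcars))

# Column types and the first few values of each variable.
str(mtcars)

# Minimum, quartiles, mean and maximum of each variable.
print(summary(mtcars))
```

mtcars has 32 rows and 11 numeric columns, so even this small data set gives many per-variable summaries to scan.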

