ICA - Data Exploration (solutions) PDF

Title ICA - Data Exploration (solutions)
Author Harshad Patil
Course Introduction to Data Analytics for Engineers
Institution Northern Illinois University
Pages 8

Module 3 ICA - Data Exploration
Fall 2018
Dr. Christine Nguyen

This in-class activity was introduced on February 13, 2017.

Importing datasets available through R and its packages

# Automatically imports the data named "islands", which contains the areas
# (in thousands of square miles) of the landmasses that exceed 10,000 square miles
data("islands")
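Because `islands` comes from base R's datasets package, a quick structural check right after loading shows what the later steps work with. This is an illustrative sketch using standard base R functions (`data()`, `class()`, `str()`, `head()`); it is not part of the original activity, and the variable name `available` is arbitrary.

```r
# islands ships with base R in the "datasets" package, so no install is needed
data("islands")

# data(package = ...) lists a package's datasets; confirm islands is among them
available <- data(package = "datasets")
"islands" %in% available$results[, "Item"]

# islands is a numeric vector whose elements are labeled with landmass names
class(islands)
str(islands)
head(islands, 3)
```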

Some simple Data Exploration

# How many observations are there?
length(islands)
## [1] 48

# Central Tendency statistics
mean(islands) # 1252.729
## [1] 1252.729
median(islands) # 41
## [1] 41

# Find smallest and largest values
range(islands)[1] # 12
## [1] 12
range(islands)[2] # and 16988
## [1] 16988
max(islands)
## [1] 16988
min(islands)
## [1] 12
which.max(islands)
## Asia
##    3
which.min(islands)
## Vancouver
##        47

# Find how the data is distributed
sd(islands) # standard deviation
## [1] 3371.146

# range size
range(islands)[2] - range(islands)[1]
## [1] 16976

# Quantile
quantile(islands)
##       0%      25%      50%      75%     100%
##    12.00    20.50    41.00   183.25 16988.00
quantile(islands, c(.05, .95))
##      5%     95%
##   13.00 8481.75

# interquartile range
IQR(islands)
## [1] 162.75

# histogram with frequency
hist(islands)
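The statistics above can be cross-checked by hand. The sketch below uses only base R. `quantile_type7()` is a hypothetical helper written for illustration that mirrors the default (type 7) interpolation rule documented for `quantile()`, and the 10% trimmed mean is an extra not used in the original activity.

```r
data("islands")

# --- Central tendency ---------------------------------------------------
# summary() reports the minimum, quartiles, median, mean, and maximum at once
summary(islands)

# A 10% trimmed mean drops the most extreme 10% of values at each end,
# so it is far less sensitive to the few very large landmasses
mean(islands, trim = 0.1)

# which.max()/which.min() return a named position; names() keeps only the label
names(which.max(islands))
names(which.min(islands))

# --- Spread ---------------------------------------------------------------
# sd() uses the sample formula with an (n - 1) denominator
n <- length(islands)
sd_manual <- sqrt(sum((islands - mean(islands))^2) / (n - 1))

# Hypothetical helper mirroring quantile()'s default (type 7) rule:
# the target position in the sorted data is h = (n - 1) * p + 1, and
# positions between two order statistics are linearly interpolated
quantile_type7 <- function(x, p) {
  x <- sort(x)
  h <- (length(x) - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  unname(x[lo] + (h - lo) * (x[hi] - x[lo]))
}

q1 <- quantile_type7(islands, 0.25)
q3 <- quantile_type7(islands, 0.75)

# Each manual value should agree with its built-in counterpart
all.equal(sd_manual, sd(islands))
all.equal(c(q1, q3), unname(quantile(islands, c(0.25, 0.75))))
all.equal(q3 - q1, IQR(islands))
```

For the 25% quartile, h = 47 × 0.25 + 1 = 12.75, so the result lies three quarters of the way from the 12th to the 13th smallest area.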

# histogram on the density scale instead of frequency (prob = TRUE; total bar area equals 1)
hist(islands, prob = TRUE)
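`hist()` can also return its computed bins without drawing them, which makes the `prob = TRUE` scale easy to check numerically. This sketch relies on the documented `plot = FALSE` argument; the object name `h` is arbitrary.

```r
data("islands")

# plot = FALSE computes the histogram (bins, counts, densities) without drawing it
h <- hist(islands, plot = FALSE)

# Bin edges from the default breaks rule ("Sturges") and the number of islands per bin
h$breaks
h$counts

# Density = count / (n * bin width), so density * width summed over bins equals 1;
# these densities are the bar heights drawn when prob = TRUE
sum(h$density * diff(h$breaks))
```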

# box plot with outliers
boxplot(islands)

# boxplot without outliers
boxplot(islands, outline = FALSE)
# plot = FALSE computes the box plot statistics without drawing; $out holds the outliers
boxplot(islands, plot = FALSE)$out
##        Africa    Antarctica          Asia     Australia        Europe
##         11506          5500         16988          2968          3745
##     Greenland North America South America
##           840          9390          6795
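The eight landmasses listed above are the ones `boxplot()` marks as outliers. By default it flags values more than 1.5 times the hinge spread beyond the box, where the hinges come from `fivenum()` and can differ slightly from the quartiles reported by `quantile()` (here 20 and 183.5 versus 20.5 and 183.25). A sketch of that rule; the variable names are illustrative.

```r
data("islands")

# fivenum() returns Tukey's five-number summary; elements 2 and 4 are the hinges
five <- fivenum(islands)
hinge_spread <- five[4] - five[2]

# Fences sit 1.5 hinge spreads beyond the box (boxplot's default coef = 1.5)
upper_fence <- five[4] + 1.5 * hinge_spread
lower_fence <- five[2] - 1.5 * hinge_spread

# Landmasses outside the fences, kept in their original order with names
manual_out <- islands[islands > upper_fence | islands < lower_fence]
manual_out

# boxplot.stats() applies the same rule without drawing a plot
identical(manual_out, boxplot.stats(islands)$out)
```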

title("Boxplot of Islands", ylab = "Area (1000s of Square Miles)")

# stem and leaf plots
stem(islands)
##
##   The decimal point is 3 digit(s) to the right of the |
##
##    0 | 00000000000000000000000000000111111222338
##    2 | 07
##    4 | 5
##    6 | 8
##    8 | 4
##   10 | 5
##   12 |
##   14 |
##   16 | 0
##
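Reading the stem-and-leaf output is easier once you see where the leaves come from. For this data, the `0` stem line holds the 41 areas below 2000, and each leaf appears to be the value's hundreds digit after rounding to the nearest hundred. The sketch below tests that reading against the printed line; it is an interpretation of the output, not a reimplementation of `stem()`, and the names `small` and `leaves` are illustrative.

```r
data("islands")

# Areas below 2000 (thousand sq. miles) are the ones printed on the "0" stem line here
small <- islands[islands < 2000]

# Assumed leaf rule: the hundreds digit after rounding to the nearest hundred
leaves <- round(small / 100)

# Sorting and pasting the digits should rebuild the leaf string on that line
paste(sort(leaves), collapse = "")

# Counting leaves shows how many landmasses fall in each hundreds bucket
table(leaves)
```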

# Notice that in the Environment pane, the object islands is described as a "named num" vector. Apply names() to islands. What is the output?
names(islands)
##  [1] "Africa"           "Antarctica"       "Asia"
##  [4] "Australia"        "Axel Heiberg"     "Baffin"
##  [7] "Banks"            "Borneo"           "Britain"
## [10] "Celebes"          "Celon"            "Cuba"
## [13] "Devon"            "Ellesmere"        "Europe"
## [16] "Greenland"        "Hainan"           "Hispaniola"
## [19] "Hokkaido"         "Honshu"           "Iceland"
## [22] "Ireland"          "Java"             "Kyushu"
## [25] "Luzon"            "Madagascar"       "Melville"
## [28] "Mindanao"         "Moluccas"         "New Britain"
## [31] "New Guinea"       "New Zealand (N)"  "New Zealand (S)"
## [34] "Newfoundland"     "North America"    "Novaya Zemlya"
## [37] "Prince of Wales"  "Sakhalin"         "South America"
## [40] "Southampton"      "Spitsbergen"      "Sumatra"
## [43] "Taiwan"           "Tasmania"         "Tierra del Fuego"
## [46] "Timor"            "Vancouver"        "Victoria"

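Because `islands` is a named vector, its names can be used directly for lookup and filtering. A short sketch using base R indexing; the particular landmasses chosen are arbitrary examples.

```r
data("islands")

# Look up areas by landmass name instead of by position
islands[c("Greenland", "Iceland")]

# Pattern matching on the names selects related entries, e.g. both New Zealand islands
islands[grepl("New Zealand", names(islands))]

# Sorting keeps the names attached, so the largest landmasses stay labeled
head(sort(islands, decreasing = TRUE), 3)

# unname() strips the labels when only the numbers are needed
unname(islands[1:3])
```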
Part B #### A different data set scores...

