
Title: Data analysis 1
Course: Introductory Statistics I
Institution: George Mason University

Summary

Dr. Hunter, Data analysis 1. Topics: frequency tables, numerical and categorical graphs, z-scores...



XXX
STAT-250 Data Analysis Assignment 1

Problem 1:

a) Frequency table results for Genre of 2018 Movies:

| Genre | Frequency | Percent of Total |
|---|---|---|
| Action/Adventure | 25 | 14.79 |
| Children's/Animated | 21 | 12.43 |
| Comedy | 31 | 18.34 |
| Drama | 50 | 29.59 |
| Horror/Thriller | 8 | 4.73 |
| Mystery/Crime | 14 | 8.28 |
| SciFi/Fantasy | 20 | 11.83 |

b) The least popular genre by percent of total is Horror/Thriller, at 4.73%, and the most popular genre is Drama, with 29.59% of all views coming from this genre.

c) Contingency table results
Rows: Genre. Columns: Viewer Rating. Cell format: Count (Total percent).

| Genre | Disliked | Liked | Total |
|---|---|---|---|
| Action/Adventure | 2 (1.18%) | 23 (13.61%) | 25 (14.79%) |
| Children's/Animated | 7 (4.14%) | 14 (8.28%) | 21 (12.43%) |
| Comedy | 6 (3.55%) | 25 (14.79%) | 31 (18.34%) |
| Drama | 12 (7.1%) | 38 (22.49%) | 50 (29.59%) |
| Horror/Thriller | 2 (1.18%) | 6 (3.55%) | 8 (4.73%) |
| Mystery/Crime | 1 (0.59%) | 13 (7.69%) | 14 (8.28%) |
| SciFi/Fantasy | 6 (3.55%) | 14 (8.28%) | 20 (11.83%) |
| Total | 36 (21.3%) | 133 (78.7%) | 169 (100%) |
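Every percentage in the one-way table of part a) and in the two-way table of part c) is a count divided by the grand total of 169 movies. The short Python sketch below recomputes them from the counts shown above; it assumes the counts are entered by hand, and the names `counts` and `pct` are illustrative only.

```python
# (disliked, liked) counts per genre, entered by hand from the
# contingency table above. Names here are illustrative.
counts = {
    "Action/Adventure": (2, 23),
    "Children's/Animated": (7, 14),
    "Comedy": (6, 25),
    "Drama": (12, 38),
    "Horror/Thriller": (2, 6),
    "Mystery/Crime": (1, 13),
    "SciFi/Fantasy": (6, 14),
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


grand_total = sum(dis + lik for dis, lik in counts.values())  # 169 movies

# One-way table: frequency and percent of total for each genre.
one_way = {g: (dis + lik, pct(dis + lik, grand_total)) for g, (dis, lik) in counts.items()}

# Two-way table: each (disliked, liked) cell as a percent of the grand total.
two_way = {g: (pct(dis, grand_total), pct(lik, grand_total)) for g, (dis, lik) in counts.items()}

# Part d): total number of disliked movies.
total_disliked = sum(dis for dis, _ in counts.values())

for genre, (freq, share) in one_way.items():
    print(f"{genre:<20} {freq:>3} {share:>6.2f}%")
print(f"Disliked: {total_disliked} ({pct(total_disliked, grand_total)}%)")
```

Because both tables divide by the same total of 169, each genre's percent of total in the one-way table equals the percentage in the Total column of the two-way table.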

d) The viewer disliked 36 movies in total, which is 21.3% of all the movies this individual viewed.

e) The Percent of Total column in the one-way table has the same values as the percentages in the Total column of the two-way table: in each of the seven genre rows, both tables give the same percentage of all movies viewed that fall in that genre. Likewise, the Frequency column of the one-way table matches the counts in the Total column of the two-way table (the numbers shown before the percentages), which give the total number of movies viewed in each of the seven genres.

f) Contingency table results
Rows: Genre. Columns: Viewer Rating. Cell format: Count (Row percent).

| Genre | Disliked | Liked | Total |
|---|---|---|---|
| Action/Adventure | 2 (8%) | 23 (92%) | 25 (100%) |
| Children's/Animated | 7 (33.33%) | 14 (66.67%) | 21 (100%) |
| Comedy | 6 (19.35%) | 25 (80.65%) | 31 (100%) |
| Drama | 12 (24%) | 38 (76%) | 50 (100%) |
| Horror/Thriller | 2 (25%) | 6 (75%) | 8 (100%) |
| Mystery/Crime | 1 (7.14%) | 13 (92.86%) | 14 (100%) |
| SciFi/Fantasy | 6 (30%) | 14 (70%) | 20 (100%) |
| Total | 36 (21.3%) | 133 (78.7%) | 169 (100%) |
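The row percentages in part f) divide each cell by its genre's row total instead of by 169, so every row sums to 100%. A minimal Python sketch, again assuming hand-entered counts and a hypothetical helper `row_percents`:

```python
# (disliked, liked) counts per genre, entered by hand from the table above.
counts = {
    "Action/Adventure": (2, 23),
    "Children's/Animated": (7, 14),
    "Comedy": (6, 25),
    "Drama": (12, 38),
    "Horror/Thriller": (2, 6),
    "Mystery/Crime": (1, 13),
    "SciFi/Fantasy": (6, 14),
}


def row_percents(disliked, liked):
    """Express one genre's cells as percentages of that genre's row total."""
    row_total = disliked + liked
    return (round(100 * disliked / row_total, 2),
            round(100 * liked / row_total, 2))


rows = {genre: row_percents(dis, lik) for genre, (dis, lik) in counts.items()}

for genre, (dis_pct, lik_pct) in rows.items():
    print(f"{genre:<20} disliked {dis_pct:>6.2f}%  liked {lik_pct:>6.2f}%")

print(rows["Children's/Animated"])  # (33.33, 66.67)
```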


Contingency table results
Rows: Genre. Columns: Viewer Rating. Cell format: Count (Column percent).

| Genre | Disliked | Liked | Total |
|---|---|---|---|
| Action/Adventure | 2 (5.56%) | 23 (17.29%) | 25 (14.79%) |
| Children's/Animated | 7 (19.44%) | 14 (10.53%) | 21 (12.43%) |
| Comedy | 6 (16.67%) | 25 (18.8%) | 31 (18.34%) |
| Drama | 12 (33.33%) | 38 (28.57%) | 50 (29.59%) |
| Horror/Thriller | 2 (5.56%) | 6 (4.51%) | 8 (4.73%) |
| Mystery/Crime | 1 (2.78%) | 13 (9.77%) | 14 (8.28%) |
| SciFi/Fantasy | 6 (16.67%) | 14 (10.53%) | 20 (11.83%) |
| Total | 36 (100%) | 133 (100%) | 169 (100%) |

g) The row percentage in the Children's/Animated row under the Liked column (66.67%) means that, of the 21 Children's/Animated movies this individual viewed, they liked 66.67% (14 of 21).

h) The column percentage in the Children's/Animated row under the Liked column (10.53%) means that, of the 133 movies the viewer liked, 10.53% (14 of 133) were from the Children's/Animated genre.
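The column percentages divide each cell by its viewer-rating total (36 disliked, 133 liked), so each column sums to 100%; this is why part h) reads 10.53% as a share of liked movies rather than of all movies viewed. A small sketch under the same assumptions (hand-entered counts, illustrative names):

```python
# (disliked, liked) counts per genre, entered by hand from the table above.
counts = {
    "Action/Adventure": (2, 23),
    "Children's/Animated": (7, 14),
    "Comedy": (6, 25),
    "Drama": (12, 38),
    "Horror/Thriller": (2, 6),
    "Mystery/Crime": (1, 13),
    "SciFi/Fantasy": (6, 14),
}

# Column totals: all disliked movies and all liked movies.
total_disliked = sum(dis for dis, _ in counts.values())  # 36
total_liked = sum(lik for _, lik in counts.values())     # 133

# Column percent: divide each cell by its viewer-rating column total,
# so the Disliked and Liked columns each sum to about 100% (up to rounding).
cols = {
    genre: (round(100 * dis / total_disliked, 2), round(100 * lik / total_liked, 2))
    for genre, (dis, lik) in counts.items()
}

print(cols["Children's/Animated"])  # (19.44, 10.53)
```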

Problem 2:


a)

b)

c)

d) The Drama genre was the most watched genre for this individual: they watched 50 drama films, so 50/169 = 29.59% of all the movies they viewed were dramas.


e)

f) In the graph titled Relative Frequency of Viewer Rating by Genre, where the categories along the x-axis are the viewer ratings, adding the percentages within the Disliked category and within the Liked category gives the total percentage of movies disliked and liked, respectively, and together these account for all movies viewed. In the graph titled Relative Frequency of Movie Genre by Viewer Rating, adding the Liked and Disliked percentages within each genre gives the total percentage of movies viewed in that genre. Also, whereas Problem 1(h) found that 10.53% of all liked movies were from the Children's/Animated genre, the graph in Problem 2(e), grouped by viewer rating, shows the relative frequency of each viewer rating within each genre, so the liked and disliked percentages for a genre are relative to the total number of movies viewed in that genre.


Problem 3: A)

B) The distribution is skewed right because its long tail extends to the right, toward longer durations.

C) Summary statistics:

| Column | n | Mean | Std. dev. |
|---|---|---|---|
| Duration | 300 | 12.71 | 9.15 |

D) Summary statistics:

| Column | Min | Q1 | Median | Q3 | Max | IQR |
|---|---|---|---|---|---|---|
| Duration | 2.00 | 7.00 | 10.00 | 16.00 | 55.00 | 9.00 |

E) For a skewed-right distribution, the appropriate measure of center is the median, 10.00 minutes, and the appropriate measure of spread is the IQR, 9.00 minutes.

F) Lower fence: Q1 - 1.5(IQR) = 7 - 1.5(9) = 7 - 13.5 = -6.5. The minimum duration is 2 minutes, which is above the lower fence, so there are no low outliers.
Upper fence: Q3 + 1.5(IQR) = 16 + 1.5(9) = 16 + 13.5 = 29.5.
Outliers are therefore any durations greater than 29.5 minutes.

G)

H) There is a total of 17 outliers in this dataset.
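The fence arithmetic in F) and the outlier rule behind H) can be checked with a short Python sketch. It takes the quartiles from part D) as inputs; the helper names `iqr_fences` and `count_outliers` are hypothetical, and because the 300 raw durations are not reproduced in this document, `count_outliers` is demonstrated only on a small made-up list.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3):
    """Count values strictly below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3)
    return sum(1 for v in values if v < lower or v > upper)


# Quartiles of Duration from part D).
lower, upper = iqr_fences(7.00, 16.00)
print(lower, upper)  # -6.5 29.5

# Hypothetical toy values, NOT the real 300 durations.
print(count_outliers([2, 10, 30, 55], 7.00, 16.00))  # 2
```

Applied to the full list of 300 durations, `count_outliers` would be expected to agree with the count reported in part H), since it applies the same fences.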


Problem 4: a)

b) The histogram of relative frequencies of SAT math scores for females is symmetric, with no long tail extending to either the left or the right. The histogram of relative frequencies of SAT math scores for males is skewed left, with an outlier at the far left.

c) Summary statistics for Math, grouped by Gender:

| Gender | n | Mean | Std. dev. |
|---|---|---|---|
| F | 145 | 592 | 94 |
| M | 155 | 628 | 89 |

d) For the female scores:
68% interval: (498, 686), since 592 - 94 = 498 and 592 + 94 = 686
95% interval: (404, 780), since 592 - 94(2) = 404 and 592 + 94(2) = 780


99.7% interval: (310, 874), since 592 - 94(3) = 310 and 592 + 94(3) = 874

e) 68% interval count: 96; percentage: 96/145 = 0.662 = 66.2%
95% interval count: 140; percentage: 140/145 = 0.966 = 96.6%
99.7% interval count: 145; percentage: 145/145 = 1 = 100%

f) The three percentages found in 4(e) do not exactly match the percentages predicted by the Empirical Rule, but they are very close. The Empirical Rule predicts that 68% of values fall within one standard deviation of the mean; the observed percentage for the first interval is 66.2%. For two standard deviations, the rule predicts 95%, and the observed percentage is 96.6%. For three standard deviations, the rule predicts 99.7%, and the observed percentage is 100%, since every data point falls in this interval.

g) z-score = (700 - 592.8)/93.8 = 1.14


This z-score indicates that the new female student's score of 700 is 1.14 standard deviations above the mean, so 700 is a usual score: its z-score is neither greater than 2 nor less than -2....
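The Empirical Rule intervals in part d), the coverage percentages in part e), and the z-score in part g) all follow directly from the summary statistics. The Python sketch below reproduces that arithmetic; the helper names are hypothetical. Mirroring the document, it uses the rounded mean and standard deviation (592 and 94) for the intervals and the more precise 592.8 and 93.8 for the z-score, and it takes the interval counts 96, 140, and 145 from part e), since the 145 individual scores are not reproduced here.

```python
MEAN, SD, N = 592, 94, 145  # female SAT math summary from part c)


def empirical_interval(mean, sd, k):
    """Return the interval mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd


expected = {1: 68.0, 2: 95.0, 3: 99.7}  # Empirical Rule percentages
observed = {1: 96, 2: 140, 3: 145}      # counts inside each interval, part e)

for k in (1, 2, 3):
    low, high = empirical_interval(MEAN, SD, k)
    share = round(100 * observed[k] / N, 1)
    print(f"{k} SD: ({low}, {high})  observed {share}%  expected {expected[k]}%")

# Part g) uses the more precise mean and standard deviation.
z = round(z_score(700, 592.8, 93.8), 2)
print(z, "usual" if -2 <= z <= 2 else "unusual")  # 1.14 usual
```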

