Lab1 Exploratory Data Analysis PDF

Title: Lab1 Exploratory Data Analysis
Author: Kaitlyn Oliveri
Course: Introduction To Statistics
Institution: California State University Monterey Bay
Pages: 8


Description

STAT 100: Introduction to Statistics Lab 1: Exploratory Data Analysis

Lab Grading Scale

There will be a total of 20 points on each lab assignment. Every group member receives the grade awarded to the group as a whole.

Completeness (5 points total)
These points are awarded based solely on whether all parts of the lab are complete.
0 points: 0% of the lab complete
1 point: between 0% and 25% of the lab complete
2 points: between 25% and 50% of the lab complete
3 points: between 50% and 75% of the lab complete
4 points: between 75% and 99% of the lab complete
5 points: 100% of the lab complete

Correctness and Interpretation (15 points)
Point values will be assigned to each question. Points may be awarded on a partial scale based on correctness of the answer, interpretation, and proper use of vocabulary. For example, a 1-point question will be graded as follows:
0 points: incomplete or incorrect
½ point: answer mostly correct, but with some error
1 point: answer correct

Lab Submission Instructions
If a group member's name is not written in the lab, in the file name, and in the submission text box, that group member will not get credit.
1. Make sure all group member names are under NAMES at the top of the lab.
2. Save your file as Lab1_Name1_Name2_Name3.docx, where Name1, Name2, and Name3 are the names of your group members.
3. The Submitter will submit one copy of the lab for the whole group.
4. In iLearn, go to the Lab Assignment Location.
5. Under Online Text, enter all of your group member names.
6. Upload the lab document into the File Submission section.
7. Press Save Changes (note: you may edit the submission up until the due date).
8. Do a little dance: you are done!
NO LATE WORK IS ACCEPTED!!


Lab 1: Examining the Fall 2016 STAT 100 Students

PLEASE READ: This is your first lab using the software package StatCrunch, which you can access through www.statcrunch.com after you register and pay for your subscription. We are going to use graphics and summary statistics to explore the characteristics of the STAT 100 students. We have a sample of 251 students represented in the survey (some students and information were removed to keep the data anonymized). Keep in mind that not everyone answered every question, so for some variables the sample size is smaller.

Loading Data
Usually we make the data available on StatCrunch through a link, but since this is personal data, we will have you upload it from a file.
1. Download the data file from iLearn.
2. In StatCrunch, select Data>>From file>>on my computer.
3. Select the file from your computer. The remaining settings should populate automatically, but make sure that the Preview looks correct. There are many other settings; do not worry about these.
4. At the bottom of the page, select Load File.
5. Get started!

How to Make Graphs and Calculate Summary Statistics in StatCrunch
For each chart, before pressing Create Graph, press Next> to look at the different options for naming the axes and creating a title. COME BACK TO THESE INSTRUCTIONS AS NEEDED.
● Pie Chart: Graphics>>Pie Chart>>with data>>Select Column/Variable>>(Choose Settings for Axis Labels, Title, etc.)>>Compute!
● Bar Plot: Graphics>>Bar Plot>>with data>>Select Column/Variable>>(Choose Settings for Axis Labels, Title, etc.)>>Compute!


● Histogram: Graphics>>Histogram>>Select Column/Variable>>(Choose Settings for Axis Labels, Title, etc.)>>Compute!
● Summary Statistics: Stat>>Summary Stats>>Under Columns>>Select All Columns*>>Under Statistics>>Select Mean & Median* (or other statistics of interest)>>Compute!
*To select more than one item, hold ‘Ctrl’ while selecting the items.
● Boxplots: Graph>>Boxplot>>Select All Columns>>Under ‘Grouping options’: Check Use Fences and Draw Horizontally>>Under ‘Graph properties’: Label Axes and Create Title>>Under ‘For multiple graphs’: Check Use same X axis for multiple graphs>>Compute!
● Creating multiple histograms with the same x-axis: Graph>>Histogram>>Select All the Columns>>Set the bins to start at 0 and the Binwidth to a reasonable value>>Add Labels for Y Only (Do Not Put an X-label)>>Under ‘For multiple graphs’: Check Use Same X and Y axes>>Select Rows and Columns per page so there are enough cells for all graphs>>Compute!

If you mess up on a graphic, you can always select ‘Options’ on the graphic window and ‘Edit’ to change your settings instead of starting from scratch.

To copy graphs into the lab:
1. With the graphic open, go to Options and select Copy.
2. Right-click and select Copy Image.
3. In Microsoft Word, use the shortcut Ctrl+Alt+V, or select Paste>>Paste Special and paste as a Device Independent Bitmap.
Please make all graphs no bigger than 2 inches in height.
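StatCrunch's Summary Stats dialog does this computation for you. To cross-check its numbers, the same four statistics can be computed with Python's standard-library `statistics` module. The following is only a minimal sketch: the `summary_stats` helper is hypothetical, the `hours` list is made up rather than taken from the survey, and we assume skipped answers are stored as `None`.

```python
from statistics import mean, median

def summary_stats(values):
    """Return n, minimum, maximum, mean, and median of a list of numbers,
    ignoring skipped answers (assumed here to be stored as None)."""
    answered = [v for v in values if v is not None]  # drop non-responses
    return {
        "n": len(answered),
        "min": min(answered),
        "max": max(answered),
        "mean": mean(answered),
        "median": median(answered),
    }

# Hypothetical social-media hours; None marks a student who skipped the question.
hours = [0, 2, 5, 10, 10, 12, None, 20, 40]
print(summary_stats(hours))
```

Here `n` counts only the students who actually answered. This matches the point above that some variables have a smaller sample size than the 251 students surveyed.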

Part 1: Developing Intuition about Data
Before we begin our exploration of our class data with graphics, let us first try to develop some intuition about the variables we measured through our survey. Be sure to use the appropriate vocabulary in your responses. MAKE YOUR ANSWERS RED. Answer the following questions:

1. Here is a list of the variables we will focus on in our study. Indicate whether each is continuous, discrete, nominal/attribute, or ordinal/rank, as well as the types of graphs appropriate for the variable. [1 point]
VARIABLE: TYPE OF VARIABLE; TYPES OF GRAPHS
Biological Sex: Nominal; Pie/Bar
Class Rank: Ordinal; Pie/Bar
Hours on Social Media: Continuous; Histogram
Birth Year: Discrete; Histogram
Height: Continuous; Histogram
High School GPA: Continuous; Histogram
Favorite Whole Number: Discrete; Histogram

2. Define the population in our study. Are we taking a sample or a census of the population? [1/2 point]
The population is all Fall 2016 STAT 100 students, and we are taking a sample of that population.

3. Who are the individuals in the study from which we measured our variables? [1/2 point]
The individuals are the STAT 100 students who took the survey.

4. As previously mentioned, not everyone who took the survey answered all the questions, and not all STAT 100 students took the survey. How might this affect our confidence in using the summary statistics and graphics to make generalizations about all STAT 100 students in Fall 2016? Brainstorm with your group and provide at least two possible effects. [1/2 point]
The data may not accurately represent all STAT 100 students because not everyone in the class is represented: students who skipped the survey or individual questions may differ from those who answered. In addition, some variables have a smaller sample size, so their summaries are less reliable.

Part II: Data Exploration
Each group member should think independently about a good way to summarize and graph the data for Sex, Class Rank, Hours on Social Media, and Height. You may want to review notes from lectures 2-3 about graphics and numeric summaries. Remember that there is often not a single right answer: choose a chart that helps to accurately tell the story in the data. Then decide as a group which chart type is best for each variable, and be able to explain the reason for your choice. You may want to try 2 or 3 types of charts and see which looks best. Don't forget to give your chart a title and a legend, and to label the axes.

5. Create a graph of the Sex distribution of STAT 100 students. Provide a justification for your choice of graph. If there is more than one best option, provide all appropriate graphics. [2 points]
We used a pie chart to show each sex's percentage of the class and a bar graph to show the count (frequency) of each sex.


6. Are there more males or females in STAT 100? How does this relate to the CSUMB population as a whole (which was 62% female and 38% male in spring 2016)? What might explain the differences in the STAT 100 gender ratio compared to the university, if there are any? Hint: Consider that STAT 100 is generally not taken by students in Science, Technology, Engineering, and Mathematics (STEM) programs. [1 point]
There are more females in STAT 100. The class is 75.56 percent female and only 22.96 percent male, which is a higher female share than CSUMB as a whole (62%). This difference could be explained by STEM programs being made up primarily of males, while STAT 100 is not a required course for STEM students.

7. Create a graph of the distribution of the Class Rank for STAT 100 students. Provide a justification for your choice of graph. For this graphic, enter 1 in the "Other" if percent less than: option.

[1 point] We identified Class Rank as an ordinal variable and graphed its relative frequencies.
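The "Other" option in question 7 merges every category with less than 1 percent of the responses into a single slice. Below is a minimal Python sketch of that relative-frequency calculation. The function name `relative_frequencies`, its `other_below` parameter, and the class-rank counts are all hypothetical, and we assume StatCrunch uses a simple strict "less than" cutoff.

```python
from collections import Counter

def relative_frequencies(labels, other_below=1.0):
    """Percent of responses in each category. Categories whose percent is
    strictly below `other_below` are merged into a single "Other" entry."""
    counts = Counter(labels)
    total = sum(counts.values())
    table = {}
    other = 0.0
    for category, count in counts.items():
        pct = 100 * count / total
        if pct < other_below:
            other += pct              # lump rare categories together
        else:
            table[category] = pct
    if other > 0:
        table["Other"] = other
    return table

# Hypothetical class-rank responses (200 students), not the lab's data.
ranks = (["Freshman"] * 100 + ["Sophomore"] * 50 + ["Junior"] * 30
         + ["Senior"] * 19 + ["Graduate"] * 1)
print(relative_frequencies(ranks, other_below=1.0))
```

In this made-up example, "Graduate" accounts for only 0.5 percent of the responses, so it appears as "Other". The remaining percentages still add up to 100.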

8. What is the most common Class Rank among STAT 100 students? Provide a possible explanation for the Class Rank distribution. [1/2 point]
The most common class rank is freshman: freshmen make up 50.75 percent of the class.

9. Create a graph of the distribution of hours spent on social media by STAT 100 students. Provide a justification for your choice of graph (there may be more than one right choice). In addition, calculate the minimum, maximum, mean, and median using StatCrunch. [1 point]
We chose a histogram because hours spent on social media is a continuous variable. The minimum is 0 and the maximum is 100. The mean is 14.05 and the median is 10.

10. [2 points] Describe the shape, center, and spread of the distribution of Hours Spent on Social Media.
a. What is the shape of the distribution? The distribution is skewed right.
b. What is the smallest value? The largest value? The smallest value is 0 hours; the largest is 100 hours.
c. How do the mean and median compare to each other? Why are they similar or different? The mean (14.05) is larger than the median (10) because the distribution is skewed right. The few very large values pull the mean up, while the median is resistant to them.
d. Describe why the distribution has its shape. The distribution is skewed right because most students use relatively little social media, but some students use a lot of social media.
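The small Python sketch below illustrates the mean/median comparison in part c. The hours are made up to imitate a right-skewed variable and are not the class data.

```python
from statistics import mean, median

# Hypothetical social-media hours: most students report a few hours,
# while a couple report many, giving a long right tail.
hours = [1, 2, 3, 3, 4, 5, 6, 8, 40, 100]

print("mean:", mean(hours))      # pulled toward the long right tail
print("median:", median(hours))  # middle of the sorted values; ignores how far the tail extends
print("mean > median:", mean(hours) > median(hours))
```

With these numbers, the mean (17.2) sits far above the median (4.5). A mean noticeably larger than the median is a common sign of right skew, but it is not proof of skew, so the histogram should still be checked.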

11. Create a graph of the distribution of Heights of STAT 100 students. Provide a justification for your choice of graph (there may be more than one right choice). In addition, calculate the minimum, maximum, mean, and median using StatCrunch. [1 point]
We used a histogram because height is a continuous variable.

12. [1 point] Describe the shape, center, and spread of the distribution of Heights.
a. What is the shape of the distribution? The distribution is unimodal.
b. What is the smallest value? The largest value? The minimum value is 5 inches and the maximum is 510 inches. Neither is a plausible height, so both are most likely data-entry errors.
c. How do the mean and median compare to each other? Why are they similar or different? The mean (68.73) is larger than the median (64) because of the outliers in the data. The median is resistant to outliers; the mean is not.
d. Describe why the distribution has its shape. The distribution is unimodal because most heights cluster around a single central value. The few extreme values are outliers rather than a second peak.
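The boxplot instructions above use StatCrunch's "Use Fences" option to flag outliers. The Python sketch below assumes the conventional 1.5 × IQR fence rule and uses Python's default quartile method, which may give slightly different quartiles than StatCrunch. The `fence_outliers` helper and the height values are hypothetical. The list includes two impossible entries similar to the 5 and 510 inches reported above.

```python
from statistics import mean, median, quantiles

def fence_outliers(values, k=1.5):
    """Return the values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    # Python's default quartile method may differ slightly from StatCrunch's.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical heights in inches, including two impossible entries.
heights = [5, 60, 62, 63, 64, 64, 65, 66, 68, 70, 72, 510]

outliers = fence_outliers(heights)
cleaned = [h for h in heights if h not in outliers]
print("outliers:", outliers)
print("mean   with/without outliers:", round(mean(heights), 2), round(mean(cleaned), 2))
print("median with/without outliers:", median(heights), median(cleaned))
```

In this made-up data, removing the two flagged values shifts the mean from about 97.4 to 65.4 inches, while the median stays at 64.5 inches. This is the resistance property described in part c.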


13. Create two new histograms of Height split by Sex (make the Height histogram and use the Group By: option to select Sex). How does splitting the data by Sex influence your interpretation of the height data? You can also try the instructions to make side-by-side boxplots. Calculate the split mean and median too. [2 points]
Splitting the data lets us compare the two sexes side by side and gives a more accurate interpretation of the heights of students in the class. The male mean was 71.99 inches and the male median was 69.5 inches. The female mean was 68.36 inches and the female median was 63 inches.
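StatCrunch's Group By option computes each statistic separately for each group. The minimal Python sketch below does the same thing with a dictionary of lists. The `split_summary` helper and the (sex, height) pairs are hypothetical and are not the class data.

```python
from collections import defaultdict
from statistics import mean, median

def split_summary(rows):
    """Group (sex, height) pairs by sex and return each group's (mean, median)."""
    groups = defaultdict(list)
    for sex, height in rows:
        groups[sex].append(height)
    return {sex: (mean(h), median(h)) for sex, h in groups.items()}

# Hypothetical (sex, height in inches) responses.
rows = [("Female", 62), ("Female", 63), ("Female", 64), ("Female", 66),
        ("Male", 68), ("Male", 70), ("Male", 71), ("Male", 73)]

for sex, (avg, mid) in split_summary(rows).items():
    print(f"{sex}: mean {avg:.2f} in, median {mid:.2f} in")
```

Summarizing each group separately keeps two different centers from being blended into one overall mean. This is why the split histograms in question 13 are easier to interpret.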

14. Based on the explorations above, describe your “typical” STAT 100 student. [1 point]
The typical STAT 100 student is a female freshman who uses social media for an average of 14 hours per week and has an average height of 68.36 inches.
