Title | STA108 - Descriptive Statistic Project |
---|---|
Author | Ibrahim Alli |
Course | Statistics & Probability |
Institution | Universiti Teknologi MARA |
UNIVERSITI TEKNOLOGI MARA KAMPUS
COURSE: STATISTICS & PROBABILITY
COURSE CODE: STA108
SEMESTER: OCTOBER 2020 – FEBRUARY 2021
PROJECT TITLE “A study on weight and running speed: The descriptive measures and simple linear regression”
GROUP MEMBERS
NO. | NAME | STUDENT ID | SIGNATURE |
Marked.
16.67 / 20
TABLE OF CONTENTS
ACKNOWLEDGEMENT
CHAPTER 1: INTRODUCTION
1.1 Background of Study
1.2 Objectives of Study
1.3 Significance of Study
1.4 Limitation of Study
CHAPTER 2: METHODOLOGY
2.1 Data Description
2.2 Graphical Description
2.3 Numerical Technique
CHAPTER 3: RESULTS AND INTERPRETATION
3.1 Data Representation
3.2 Descriptive Statistics Analysis
3.3 Correlation and Regression
CHAPTER 4: CONCLUSION
4.1 Report Summary
4.2 Appendix
REFERENCES
ACKNOWLEDGEMENT
We are very grateful to have completed our analytical and statistical report on how age, gender and weight relate to running speed. We are also thankful to our lecturer, Madam X, for helping and guiding us with this assignment. This assignment was successful because of the effort and co-operation of our group members, A, B and C. This semester has been very challenging because classes were held online due to Covid-19, but good communication and teamwork allowed us to complete every assignment given.
CHAPTER 1: INTRODUCTION
1.1 Background of Study
This study analyzes the relationship between a person’s weight and his or her running speed. Forty (40) volunteers of both genders participated in the study. The data were taken from StatCrunch.com¹ and are therefore secondary data. The source does not state where the study was conducted.
The study was carried out by semester 4 students to satisfy the requirements of the syllabus for STA108: Basic Statistics and Probability. We chose to evaluate the relationship between weight and running speed. Although gender is included, the focus of the study is on weight and running speed.
Scientifically, one’s weight could significantly influence running speed. A heavier person needs more force to move his or her body against gravity, so running speed would be expected to suffer. In this study we therefore investigate the relationship between these two continuous quantitative variables, weight and running speed, through a statistical approach. A qualitative variable, the gender of the volunteers, is also included as another possible factor affecting running speed.
1.2 Objectives of Study
The objectives of this study are as follows:
1. To describe the gender and weight of the volunteers.
2. To investigate the relationship between the weight of the volunteers and their running speed.
¹ (n.d.). Retrieved January 01, 2021, from https://www.statcrunch.com/app/index.php?dataid=2862328
1.3 Significance of Study
It is very important to know how our body works. Learning about and understanding our body can lead us to a healthier lifestyle. This study will help everyone understand how a lighter weight could improve stamina, as reflected in a faster running speed. Because gender is also included as another possible factor, readers can better understand the physiological differences between men and women in terms of metabolic rate. They can therefore recognize their biological limits and avoid pushing their bodies beyond them, which could lead to unwanted adverse effects. Moreover, experts note that you would be able to run about two seconds faster per mile for every pound that you lose. This means that if you lose 15 pounds, you would run about 30 seconds per mile faster, cutting a 5k time by about a minute and a half, or a marathon time by about 13 minutes, from weight loss alone². In general, this study would motivate overweight men and women to lose weight and others to maintain their ideal weight.
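The arithmetic behind the quoted rule of thumb can be checked with a short sketch. It only restates the figures cited above (2 seconds per mile per pound, 15 pounds lost); the race distances in miles are standard conversions that we supply ourselves, and the function name is hypothetical.

```python
# Rule of thumb quoted in the text: about 2 seconds per mile faster per pound lost.
SECONDS_PER_MILE_PER_POUND = 2.0

# Race distances converted to miles (our own conversions, not from the report).
KM_PER_MILE = 1.609344
FIVE_K_MILES = 5.0 / KM_PER_MILE
MARATHON_MILES = 42.195 / KM_PER_MILE


def time_saved_seconds(pounds_lost, distance_miles):
    """Estimated time saved over a race, in seconds, under the rule of thumb."""
    return pounds_lost * SECONDS_PER_MILE_PER_POUND * distance_miles


five_k_saving = time_saved_seconds(15, FIVE_K_MILES)
marathon_saving = time_saved_seconds(15, MARATHON_MILES)

print(f"5k: about {five_k_saving:.0f} seconds saved")
print(f"Marathon: about {marathon_saving / 60:.1f} minutes saved")
```

Under these assumptions the 5k saving comes out at roughly a minute and a half and the marathon saving at roughly 13 minutes, in line with the figures cited.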
1.4 Limitation of Study
The main limitation of this study is the way the data set was obtained. The data come from an open online database and are therefore secondary data. Furthermore, the data set gives little background on how or when the data were collected. We therefore assume that the data were collected through a scientific experiment in which a specific number of participants had their weight recorded before running a designated distance. The data also do not include any existing health issues that might affect running speed.
² Basinger, R. (2019, August 12). Does Weight Loss Make You Run Faster? Retrieved January 27, 2021, from https://thewiredrunner.com/does-weight-loss-make-you-run-faster/
CHAPTER 2: METHODOLOGY
2.1 Data Description
This assignment is a study of how gender and weight relate to running speed. Because of the pandemic, the data set was taken from the internet. Forty people of different ages, genders and weights participated in the study.
The participants were aged 18 to 30 years, a range that spans teenagers to adults. There were 19 females and 21 males, and their weights ranged from approximately 120 lbs. to approximately 190 lbs. This case study helps us learn how running speed is affected by a person’s gender and weight. Speed was recorded in miles per hour.
In this study, gender is the qualitative variable, while weight and running speed are the quantitative variables.
I. Population
All members of the public in the unknown city/town of the study.
II. Sample
40 members of the public of both genders, aged 18 to 30.
III. Sampling Technique
Simple random sampling. The sample was chosen randomly from the sampling frame of the researcher’s list of volunteers.
IV. Data Collection Method
Direct observation from the experiment conducted.
V. Descriptions of Variable
The variables of this study are gender, weight (in lbs.), and running speed (in mph).
Variable | Type of Variable | Level of Measurement |
---|---|---|
Gender | Qualitative | Nominal Scale |
Weight | Quantitative Continuous | Ratio Scale |
Speed | Quantitative Continuous | Ratio Scale |
2.2 Graphical Description
Firstly, a pie chart is constructed from the table of the distribution of volunteers by gender. The pie chart shows the percentage of male and female volunteers in the study.
Secondly, the number of volunteers at each weight is tabulated, and a histogram is constructed from this table, with weight on the x-axis and the number of volunteers on the y-axis.
Thirdly, the number of volunteers at each running speed is tabulated, and another histogram is constructed with running speed on the x-axis and the number of volunteers on the y-axis.
Fourthly, box and whisker plots are used to examine the skewness of the volunteers’ weight by gender and of their running speed by gender.
In addition, a scatter plot is constructed from the speed and weight of the volunteers, with running speed on the y-axis and weight on the x-axis.
2.3 Numerical Technique
In this study, we report the mean, median, mode, variance, standard deviation, skewness, range, minimum, maximum, first quartile and third quartile.
The relationship between weight and running speed is also examined through correlation and regression analysis. Correlation analysis measures the strength of the linear relationship between the two variables. The value of the correlation coefficient for weight and running speed determines whether the correlation is strong or weak, and whether it is positive or negative.
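As an illustration of how a correlation coefficient is obtained, the following is a minimal sketch of the Pearson formula. The paired values are hypothetical and chosen only for illustration; they are not the study's data, and the function name `pearson_r` is our own.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired values for illustration only; NOT the study's data.
example_weight = [120, 140, 160, 180, 200]  # lbs.
example_speed = [14, 13, 11, 9, 8]          # mph

r = pearson_r(example_weight, example_speed)
print(f"r = {r:.3f}")
```

A value of r close to -1 indicates a strong negative correlation, a value close to +1 a strong positive correlation, and a value near 0 a weak linear relationship.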
Furthermore, simple linear regression models a linear relationship involving only two variables. One is the dependent variable (y) and the other is the independent variable (x). In this study the dependent variable is running speed, the outcome that is observed rather than controlled or manipulated, and the independent variable is weight. Regression also helps to determine the type of relationship between the two variables.
Simple Linear Regression Equation:
y = a + bx
where
x = independent variable
y = dependent variable
a = y-intercept
b = slope of the line
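The coefficients a and b are usually estimated by least squares. The sketch below shows that calculation on hypothetical paired values; it does not use the study's data, and the function name `fit_line` is our own.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx          # slope of the line
    a = mean_y - b * mean_x  # y-intercept
    return a, b


# Hypothetical paired values for illustration only; NOT the study's data.
example_weight = [120, 140, 160, 180, 200]  # x, in lbs.
example_speed = [14, 13, 11, 9, 8]          # y, in mph

a, b = fit_line(example_weight, example_speed)
print(f"y = {a:.2f} + ({b:.3f})x")
print(f"Predicted speed at 150 lbs.: {a + b * 150:.1f} mph")
```

A negative slope b would mean that predicted running speed falls as weight increases; a positive slope would mean the opposite.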
CHAPTER 3: RESULTS AND INTERPRETATION
3.1 Data Presentation
Based on the data obtained, we have identified the qualitative and quantitative variables according to their characteristics. The quantitative data comprise weight in lbs. and running speed in miles per hour, while gender is classified as qualitative data.
3.1.1 Gender
The data include a total of 19 (47.5%) female and 21 (52.5%) male volunteers, as shown in Table 3.1.1 below.
Table 3.1.1 Distribution of volunteers based on gender
Gender | Number of Volunteers | Percentage (%) |
---|---|---|
Male | 21 | 52.5 |
Female | 19 | 47.5 |
TOTAL | 40 | 100 |
[Pie chart: Distribution of Volunteers based on Gender. Male 52%, Female 48%.]
Figure 3.1.1 Pie Chart for the number of volunteers based on gender
3.1.2 Weight in lbs.
Table 3.1.2 Data distribution of volunteers based on weight (lbs.)
Weights (lbs.) | Number of volunteers |
---|---|
120 | 1 |
123 | 1 |
124 | 1 |
126 | 1 |
128 | 1 |
134 | 1 |
137 | 1 |
138 | 1 |
140 | 1 |
142 | 2 |
146 | 1 |
147 | 2 |
148 | 1 |
149 | 2 |
159 | 1 |
163 | 1 |
164 | 1 |
165 | 1 |
166 | 1 |
167 | 1 |
169 | 1 |
170 | 1 |
172 | 1 |
174 | 1 |
175 | 1 |
176 | 1 |
178 | 1 |
179 | 1 |
180 | 1 |
183 | 1 |
185 | 1 |
192 | 1 |
193 | 1 |
194 | 1 |
196 | 2 |
197 | 1 |
TOTAL | 40 |
Figure 3.1.2.1 Histogram of distribution of number of volunteers based on their weight (lbs.)
With a sample of 40 volunteers, we can construct the histogram shown in the figure above. The histogram is slightly skewed to the left, which indicates that more of the volunteers fall at the heavier end of the weight range. The heaviest weight recorded in this study was 197 lbs. (1 volunteer) and the lightest was 120 lbs. (1 volunteer). The mode is reported as 142 lbs.; in Table 3.1.2 the weights 147, 149 and 196 lbs. also occur twice each, so 142 lbs. is the smallest of several tied modes. From the data we can also construct a box & whisker plot to further examine the skewness of this variable.
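A histogram groups the weights into class intervals before counting. The sketch below shows one way to do that from the frequency table in Table 3.1.2. The class width of 10 lbs. is our assumption, since the report does not state the intervals used for its figure, and the function name `bin_counts` is hypothetical.

```python
from collections import Counter

# Frequency table transcribed from Table 3.1.2: weight (lbs.) -> number of volunteers.
WEIGHT_FREQ = {
    120: 1, 123: 1, 124: 1, 126: 1, 128: 1, 134: 1, 137: 1, 138: 1, 140: 1,
    142: 2, 146: 1, 147: 2, 148: 1, 149: 2, 159: 1, 163: 1, 164: 1, 165: 1,
    166: 1, 167: 1, 169: 1, 170: 1, 172: 1, 174: 1, 175: 1, 176: 1, 178: 1,
    179: 1, 180: 1, 183: 1, 185: 1, 192: 1, 193: 1, 194: 1, 196: 2, 197: 1,
}

CLASS_WIDTH = 10  # assumed class width; the report does not state its bins


def bin_counts(freq, width):
    """Count volunteers per class interval [lower, lower + width)."""
    counts = Counter()
    for weight, n in freq.items():
        lower = (weight // width) * width
        counts[lower] += n
    return dict(sorted(counts.items()))


weight_bins = bin_counts(WEIGHT_FREQ, CLASS_WIDTH)
for lower, n in weight_bins.items():
    # Simple text histogram: one '#' per volunteer in the class.
    print(f"{lower}-{lower + CLASS_WIDTH - 1} lbs. | {'#' * n} ({n})")
```

With a different class width the shape of the histogram can change, so the skew visible in a figure depends partly on the bins chosen.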
Figure 3.1.2.2 Box & Whiskers Plot for Skewness
Based on the data for this variable, we can calculate Q1, Q2 and Q3 to construct the box & whiskers plot. The overall skewness of the weight of the volunteers in lbs. is -0.091. Because this value is negative, the overall distribution of the variable is left skewed.
From the Box & Whiskers Plot, we can see that the weight of the volunteers in lbs. is skewed to the left (negatively skewed). The plot shows that the median line Q2 is located towards the right of the box. In addition, the left whisker of the plot is longer than the right whisker.
3.1.3 Running Speed
Table 3.1.3 Data distribution of volunteers based on running speed
Running Speed (miles per hour) | Number of volunteers |
---|---|
5 | 2 |
6 | 5 |
7 | 1 |
8 | 4 |
9 | 3 |
10 | 4 |
11 | 1 |
12 | 4 |
13 | 7 |
14 | 7 |
15 | 2 |
TOTAL | 40 |
Figure 3.1.3.1 Histogram of distribution of number of volunteers based on their running speed
With the same sample of 40 volunteers, we can construct the histogram shown in the figure above. The histogram is skewed to the left. This indicates that the running speeds of the volunteers are generally high; the highest recorded speed is 15 mph, reached by 2 volunteers. From the graph we can also identify two modes for this variable, 13 mph and 14 mph, each recorded for 7 volunteers. From these data we can also construct a box & whisker plot.
Figure 3.1.3.2 Box & Whiskers Plot for Skewness
Based on the data for this variable, we can calculate Q1, Q2 and Q3 to construct the box & whiskers plot. The skewness of the running speed of the volunteers in mph is -0.360. Because this value is negative, the overall distribution of the variable is left skewed.
From the Box & Whiskers Plot, we can see that the running speed of the volunteers is skewed to the left (negatively skewed). The plot shows that the median line Q2 is located towards the right of the box. In addition, the left whisker of the plot is longer than the right whisker.
3.2 Descriptive Statistics Analysis
Having identified and graphed the quantitative variables, we now look at the descriptive analysis of the weight in lbs. and the running speed in mph of the volunteers. This involves the measures of central tendency, measures of dispersion, measures of position, and the skewness of each variable.
3.2.1 The Weight of the volunteers in lbs.
Table 3.2.1 Descriptive statistics for weight of volunteers in lbs.
Descriptive Statistics | Weight (lbs.) |
---|---|
Mean, x̄ | 160.83 |
Median, x̃ | 164.5 |
Mode, x̂ | 142 |
Standard deviation, s | 23.0289 |
Variance, s² | 530.302 |
Skewness | -0.091 |
Range | 77.00 |
Minimum | 120 |
Maximum | 197 |
First Quartile, Q1 (25%) | 142 |
Second Quartile, Q2 (50%) | 164.5 |
Third Quartile, Q3 (75%) | 178.5 |
The first quantitative variable is the weight of the volunteers in lbs. The mean, x̄, is 160.83, which means that on average the volunteers weighed 160.83 lbs. From the median, x̃, we can interpret that 50% of the volunteers weigh less than 164.5 lbs. while 50% weigh more than 164.5 lbs. The mode, x̂, is reported as 142 lbs., the most frequently occurring weight; in Table 3.1.2 it is tied at two volunteers each with 147, 149 and 196 lbs. The standard deviation, s, is 23.0289 and the variance, s², is 530.302. These values indicate that the weights are widely dispersed around the mean.
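The central tendency and dispersion figures for weight can be recomputed from the frequency table in Table 3.1.2. The sketch below uses Python's standard `statistics` module and assumes the sample (n - 1) formulas for variance and standard deviation, which is what the reported variance is consistent with.

```python
import statistics

# Frequency table transcribed from Table 3.1.2: weight (lbs.) -> number of volunteers.
WEIGHT_FREQ = {
    120: 1, 123: 1, 124: 1, 126: 1, 128: 1, 134: 1, 137: 1, 138: 1, 140: 1,
    142: 2, 146: 1, 147: 2, 148: 1, 149: 2, 159: 1, 163: 1, 164: 1, 165: 1,
    166: 1, 167: 1, 169: 1, 170: 1, 172: 1, 174: 1, 175: 1, 176: 1, 178: 1,
    179: 1, 180: 1, 183: 1, 185: 1, 192: 1, 193: 1, 194: 1, 196: 2, 197: 1,
}

# Expand the table into one sorted observation per volunteer.
weights = sorted(w for w, n in WEIGHT_FREQ.items() for _ in range(n))

mean = statistics.mean(weights)
median = statistics.median(weights)
modes = statistics.multimode(weights)    # every value tied for most frequent
variance = statistics.variance(weights)  # sample variance, divides by n - 1
std_dev = statistics.stdev(weights)      # sample standard deviation

print(f"n = {len(weights)}")
print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
print(f"variance = {variance:.3f}, standard deviation = {std_dev:.2f}")
```

Run on the tabulated weights, this reproduces the reported mean, median and variance and gives a standard deviation of about 23.03 lbs. It also lists four tied modes, of which 142 lbs. is the smallest.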
Figure 3.2.1 Box & Whisker for the weight of both Female and Male volunteers
The overall skewness of the variable is -0.091. Because this value is negative, the distribution of the variable is left skewed. From the Box & Whiskers Plot above, we can see that the weights of both males and females are skewed to the left (negatively skewed). In a left-skewed distribution the mean is typically smaller than the median, which holds here (160.83 < 164.5). The range of the variable is 77.0 lbs., the difference between the largest and the smallest weights in the data set. The minimum weight is 120 lbs. and the maximum weight is 197 lbs. The first, second and third quartiles are 142 lbs., 164.5 lbs. and 178.5 lbs. respectively. This implies that about 25% of the volunteers weigh less than 142 lbs. and about 25% weigh more than 178.5 lbs. The second quartile is the median, x̃, since 50% of the data set lies below it.
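The skewness, range and quartiles can also be recomputed from Table 3.1.2. The sketch below makes two assumptions that the report does not state: skewness is taken as the adjusted Fisher-Pearson sample coefficient commonly reported by statistical software, and the quartiles are taken as the medians of the lower and upper halves of the sorted data.

```python
import statistics

# Frequency table transcribed from Table 3.1.2: weight (lbs.) -> number of volunteers.
WEIGHT_FREQ = {
    120: 1, 123: 1, 124: 1, 126: 1, 128: 1, 134: 1, 137: 1, 138: 1, 140: 1,
    142: 2, 146: 1, 147: 2, 148: 1, 149: 2, 159: 1, 163: 1, 164: 1, 165: 1,
    166: 1, 167: 1, 169: 1, 170: 1, 172: 1, 174: 1, 175: 1, 176: 1, 178: 1,
    179: 1, 180: 1, 183: 1, 185: 1, 192: 1, 193: 1, 194: 1, 196: 2, 197: 1,
}
weights = sorted(w for w, n in WEIGHT_FREQ.items() for _ in range(n))

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)

# Adjusted Fisher-Pearson sample skewness (assumed formula).
skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in weights)

# Quartiles as medians of the lower and upper halves (assumed method).
half = n // 2
q1 = statistics.median(weights[:half])
q2 = statistics.median(weights)
q3 = statistics.median(weights[n - half:])

data_range = max(weights) - min(weights)

print(f"skewness = {skewness:.3f}")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, range = {data_range}")
```

Other quartile conventions give slightly different values for Q3 on these data, so the choice of method matters when comparing against software output.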
3.2.2 The Running Speed of the volunteers in Miles per Hour
Table 3.2.2 Descriptive Statistics for running speed of volunteers in mph
Descriptive Statistics | Running Speed (mph) |
---|---|
Mean, x̄ | 10.6 |
Median, x̃ | 11.2 |
Mode, x̂ | 13 |
Standard deviation, s | 3.144 |
Variance, s² | 9.857 |
Skewness | -0.360 |
Range | 10 |
Minimum | 5 |
Maximum | 15 |
First Quartile, Q1 (25%) | 8 |
Second Quartile, Q2 (50%) | 11.2 |
Third Quartile, Q3 (75%) | 13.57 |
The second quantitative variable is the running speed of the volunteers in mph. The mean, x̄, is 10.6, which means that on average the volunteers ran at 10.6 mph. From the median, x̃, we can interpret that 50% of the volunteers ran slower than 11.2 mph while 50% ran faster than 11.2 mph. The mode, x̂, is reported as 13 mph, the most frequently occurring speed; in Table 3.1.3 it is tied with 14 mph at 7 volunteers each. The standard deviation, s, is 3.144 and the variance, s², is 9.857. These values indicate that the running speeds are not widely dispersed around the mean.
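The same calculation can be sketched for running speed using the frequency table in Table 3.1.3. This assumes the whole-number speeds in that table are the actual observations, which may not be the case if the original speeds were recorded with decimals and rounded for tabulation.

```python
import statistics

# Frequency table transcribed from Table 3.1.3: speed (mph) -> number of volunteers.
SPEED_FREQ = {5: 2, 6: 5, 7: 1, 8: 4, 9: 3, 10: 4, 11: 1, 12: 4, 13: 7, 14: 7, 15: 2}

# Expand the table into one sorted observation per volunteer.
speeds = sorted(v for v, n in SPEED_FREQ.items() for _ in range(n))

mean = statistics.mean(speeds)
modes = statistics.multimode(speeds)    # every value tied for most frequent
variance = statistics.variance(speeds)  # sample variance, divides by n - 1
std_dev = statistics.stdev(speeds)      # sample standard deviation
data_range = max(speeds) - min(speeds)

print(f"n = {len(speeds)}, mean = {mean:.1f}, modes = {modes}")
print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
print(f"range = {data_range}")
```

On the tabulated values the mean, standard deviation and range agree with Table 3.2.2, while the variance comes out at about 9.89 rather than the reported 9.857. The median of the tabulated values (11.5) also differs from the reported 11.2, so the reported positional measures may rest on unrounded data or on a different percentile method.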
Figure 3.2.2 Box & Whisker plot for R...