
Title Project 2 - Inferential Statistics
Course Probability and Statistical Data Analysis
Institution Universiti Teknologi Malaysia


SECI2143 – PROBABILITY & STATISTICAL DATA ANALYSIS

PROJECT 2 INFERENTIAL STATISTICS

SECTION: 03 – 1SECR
COURSE NAME: BACHELOR OF COMPUTER SCIENCE – COMPUTER NETWORKS & SECURITY
STUDENT’S NAME: MUHAMMAD ISKANDAR ZULQARNAIN BIN MOHD ISHAK
STUDENT’S ID: A19EC0098
LECTURER’S NAME: DR. ARYATI BAKRI
DATE OF SUBMISSION: 27th JUNE 2020


Table of Contents

Introduction
Statistical Analysis on Case Study
    Hypothesis Statement
    Execution of Tests
        Overall Execution – Compulsory Tests
        Overall Execution – Optional Tests
    Discussions on Results Interpretation
Conclusion
References

Introduction

Inferential statistics is concerned with drawing conclusions about relationships in a population from relationships found in a sample. It helps us decide, for instance, whether the differences between groups that we see in our data are strong enough to support the hypothesis that those differences also exist in the whole population. In Project 2 of the Probability and Statistical Data Analysis course, we apply the techniques learned previously to reach such conclusions. Each conclusion must follow a set procedure: state the hypotheses, test them using an appropriate method, and make a decision.

For this purpose, a dataset on Digital Single Lens Reflex (DSLR) cameras was selected for analysis. The dataset was retrieved from Kaggle, where it was uploaded by Chris Crawford, a Data Engineer at Team Rubicon in Seattle, Washington, US. It gathers the camera specifications that shape the photography experience, including maximum resolution, effective pixels, zoom telephoto, and many more.

Today, DSLR camera usage is declining sharply as smartphone cameras become able to produce images comparable to those of DSLRs and other high-performance cameras. Anyone can be a photographer simply by owning a phone with similar specifications. Statistical information for 2009 and 2016 is shown in Figure 1. This decline does not mean the end of demand for cameras. The numbers in the graph show only that most individuals no longer own a camera because a phone is enough for them.

Professionals, mass media staff, and film production crews still depend on these devices because they know that a real DSLR camera produces better pictures than a phone. Can we find a production team that films its work using only a smartphone, however powerful its camera? In most cases, the answer is no. Although tech companies may develop extraordinary devices, there are limitations and trade-offs that must be considered. Therefore, this study focuses on what makes the camera so special and why a powerful DSLR lens cannot be fitted in a smartphone, by considering several factors in the camera specification.

Figure 1 shows the statistical information retrieved from https://www.diyphotography.net/camera-sales-report-2016-lowest-sales-ever-dslrs-mirrorless/

Statistical Analysis on Case Study

Hypothesis Statement

The source of this camera dataset does not specify any relationship between one factor and another, so I am free to consider whichever aspects suit my case study and the statistical analysis. Hence, I will use a two-sample test to compare the mean price, over the 1038 records in the dataset, between maximum resolution and storage included. The means are defined as follows.

𝜇1 = mean of price over the 1038 records for maximum resolution
𝜇2 = mean of price over the 1038 records for storage included

Next, the null and alternative hypotheses are defined. The null hypothesis is that the mean price over the 1038 records for maximum resolution is equal to the mean price over the 1038 records for storage included. The alternative hypothesis is that the two means are not equal. The hypotheses are written as follows.

𝐻0 ∶ The mean price over the 1038 records for maximum resolution is equal to the mean price over the 1038 records for storage included
𝐻1 ∶ The mean price over the 1038 records for maximum resolution is not equal to the mean price over the 1038 records for storage included

These long definitions can be simplified into mathematical terms as follows, which makes the test easier to carry out.

𝐻0 ∶ 𝜇1 = 𝜇2
𝐻1 ∶ 𝜇1 ≠ 𝜇2

Execution of Tests

Overall Execution – Compulsory Tests

2 Sample Test on Price for Maximum Resolution and Storage Included

𝐻0 ∶ 𝜇1 = 𝜇2
𝐻1 ∶ 𝜇1 ≠ 𝜇2

Components        Values / Explanation
Test statistic    0.0000
Critical value    ±1.9611 (two-tailed)
Decision          −1.9611 < TS (0.0000) < 1.9611. Thus, we fail to reject 𝐻0.
Conclusion        At the 0.05 level of significance, there is not sufficient evidence to reject the claim that the mean of price for maximum resolution is equal to the mean of price for storage included.
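The two-tailed decision rule for this test can be sketched in Python. This is a minimal illustration, not the procedure used in the report: the report does not say which variant of the two-sample test was run, so Welch's unequal-variance form is assumed here, and the `prices` list is hypothetical rather than taken from the camera dataset. It also shows one way a test statistic of exactly zero can arise, namely when both samples have the same mean.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Two-sample t statistic without assuming equal variances (Welch)."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / standard_error


# Hypothetical prices for illustration only (not from the camera dataset).
prices = [179.0, 249.0, 399.0, 139.0, 1299.0, 62.0]

t_same = welch_t(prices, prices)                        # identical samples
t_diff = welch_t(prices, [p + 200.0 for p in prices])   # shifted copy

critical_value = 1.9611  # two-tailed critical value quoted in the report
reject_h0 = abs(t_same) > critical_value

print("t for identical samples:", t_same)
print("reject H0:", reject_h0)
print("t for shifted sample:", round(t_diff, 4))
```

Comparing the absolute value of the statistic with a single positive critical value is equivalent to checking that it lies between the negative and positive critical values.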

Correlation Test on Maximum Resolution and Minimum Resolution

𝐻0 ∶ 𝜌 = 0
𝐻1 ∶ 𝜌 ≠ 0

    Pearson's product-moment correlation

    data: x and y
    t = 49.506, df = 1036, p-value < 2.2e-16
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     0.819330 0.855582
    sample estimates:
          cor
    0.8383806

Components        Values / Explanation
Decision          TS = 49.506 with p-value (< 2.2e-16) < α (0.05). Thus, we reject 𝐻0.
Conclusion        At the 0.05 level of significance, there is sufficient evidence to support that there is a strong positive linear relationship between the maximum and minimum resolution of a camera.
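As a cross-check, the t statistic in the R output can be recomputed from the reported correlation using the standard formula t = r·sqrt(n − 2) / sqrt(1 − r²). The sketch below uses only the values printed in the output (r = 0.8383806, n = 1038); the function name `correlation_t` is illustrative, and the critical value 1.962 is the two-tailed value the report quotes for 1036 degrees of freedom.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


r, n = 0.8383806, 1038      # values reported in the R output
t_stat = correlation_t(r, n)

critical_value = 1.962      # two-tailed, alpha = 0.05, df = 1036 (from the report)
reject_h0 = abs(t_stat) > critical_value

print("t =", round(t_stat, 3))
print("reject H0:", reject_h0)
```

The decision must compare the test statistic with a critical value, or the p-value with α. Comparing the test statistic with the p-value mixes two different scales.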

Regression Test on Normal Focus Range and Macro Focus Range

𝐻0 ∶ 𝛽1 = 0
𝐻1 ∶ 𝛽1 ≠ 0

    Coefficients:
    (Intercept)            x
         1.8936       0.1333

    > summary(model)
    Call:
    lm(formula = y ~ x)

    Residuals:
        Min      1Q  Median      3Q     Max
    -15.229  -3.561  -1.894   1.773  71.772

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 1.893581   0.481214   3.935 8.87e-05 ***
    x           0.133349   0.009565  13.941  < 2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 7.436 on 1036 degrees of freedom
    Multiple R-squared: 0.158, Adjusted R-squared: 0.1572
    F-statistic: 194.4 on 1 and 1036 DF, p-value: < 2.2e-16

b0 = 1.8936
b1 = 0.1333
sb1 = 0.009565
p-value < 2.2e-16

Regression equation: 𝑦 = 1.8936 + 0.1333𝑥

Test statistic:
𝑡 = (𝑏1 − 𝛽1) / 𝑠𝑏1 = (0.1333 − 0) / 0.009565 = 13.936

Critical value:
df = 1038 − 2 = 1036, α = 0.05 (two-tailed)
t0.025,1036 = ±1.962

Components        Values / Explanation
Decision          TS (13.936) > CV (1.962). Thus, we reject 𝐻0.
Conclusion        At the 0.05 level of significance, there is sufficient evidence to support that there is a linear relationship between the normal focus range and the macro focus range.
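The slope test statistic can be recomputed from the coefficient table in the R summary as a sanity check. The sketch below takes the unrounded estimate and standard error printed there (0.133349 and 0.009565). The variable names are illustrative, and the critical value is the two-tailed 1.962 quoted in the report for 1036 degrees of freedom.

```python
b1 = 0.133349        # slope estimate from the R summary
sb1 = 0.009565       # standard error of the slope from the R summary
beta1_null = 0.0     # slope under H0

t_stat = (b1 - beta1_null) / sb1

critical_value = 1.962   # two-tailed, alpha = 0.05, df = 1036 (from the report)
reject_h0 = abs(t_stat) > critical_value

# In simple linear regression the overall F statistic equals t squared.
f_from_t = t_stat ** 2

print("t =", round(t_stat, 3))
print("reject H0:", reject_h0)
print("t^2 =", round(f_from_t, 1))
```

The result agrees with the t value of 13.941 and the F-statistic of 194.4 in the R output, which is a useful check against a misplaced decimal point when the division is done by hand.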

Overall Execution – Optional Tests

Chi Square Test of Independence – Two Way Contingency Table

𝐻0 ∶ The camera specifications on effective pixels, zoom wide, and zoom tele are independent
𝐻1 ∶ The camera specifications on effective pixels, zoom wide, and zoom tele are dependent

P-value: < 2.2e-16
Test statistic: χ² = 31701
Critical value: χ²0.05,2074 = 2181.0615

Components        Values / Explanation
Decision          TS (31701) > CV (2181.0615). Thus, we reject 𝐻0.
Conclusion        At the 0.05 level of significance, there is sufficient evidence to support that the camera specifications on effective pixels, zoom wide, and zoom tele are dependent on one another.
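The degrees of freedom and the critical value can be checked with a short Python sketch. Two assumptions are made here that the report does not state: the contingency table is taken to have 1038 rows and 3 columns, which is consistent with the reported 2074 degrees of freedom, and the critical value is computed with the Wilson-Hilferty approximation rather than an exact chi-square quantile.

```python
from statistics import NormalDist


def contingency_df(rows, cols):
    """Degrees of freedom of a two-way contingency table."""
    return (rows - 1) * (cols - 1)


def chi2_critical_approx(df, alpha):
    """Upper-tail chi-square critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1 - c + z * c ** 0.5) ** 3


df = contingency_df(1038, 3)        # assumed table shape: 1038 x 3
critical_value = chi2_critical_approx(df, 0.05)

test_statistic = 31701              # chi-square value from the report
reject_h0 = test_statistic > critical_value

print("df =", df)
print("approximate critical value =", round(critical_value, 2))
print("reject H0:", reject_h0)
```

The approximation is accurate for large degrees of freedom, so it should land very close to the critical value quoted in the report.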

ANOVA Among Effective Pixels, Zoom Wide and Zoom Tele

𝐻0 ∶ 𝜇1 = 𝜇2 = 𝜇3
𝐻1 ∶ at least one of the means is different from the others

    > summary(anova_result)
                 Df  Sum Sq Mean Sq F value Pr(>F)
    ind           2 7722782 3861391    1309

Numerator df: 2
Denominator df: 3111

Test statistic (mean square between groups / mean square within groups):
F = 3861391 / 2950 = 1308.9461

Critical value: F0.05,2,3111 ≈ 3.00

Components        Values / Explanation
Decision          TS (1308.9461) > CV (3.00). Thus, we reject 𝐻0.
Conclusion        At the 0.05 level of significance, there is sufficient evidence to support that at least one of the means of effective pixels, zoom wide, and zoom tele is different from the others.
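The F statistic, its degrees of freedom, and the critical value can be reproduced with a few lines of Python. The sketch assumes three groups of 1038 observations each, which is inferred from the reported degrees of freedom (2 and 3111) rather than stated in the report. With exactly 2 numerator degrees of freedom the F distribution has a closed-form upper tail, (1 + 2x/d)^(−d/2), so the critical value can be computed without a statistics library.

```python
def f_critical_two_numerator_df(alpha, df_within):
    """Exact upper-tail critical value of F(2, df_within).

    Solves (1 + 2x/d)^(-d/2) = alpha for x, valid only for 2 numerator df.
    """
    return (df_within / 2.0) * (alpha ** (-2.0 / df_within) - 1.0)


groups = 3
total_observations = 3 * 1038       # assumed: 1038 observations per group

df_between = groups - 1
df_within = total_observations - groups

ms_between = 3861391                # mean square for "ind" in the R output
ms_within = 2950                    # residual mean square used in the report

f_stat = ms_between / ms_within
critical_value = f_critical_two_numerator_df(0.05, df_within)
reject_h0 = f_stat > critical_value

print("df =", df_between, "and", df_within)
print("F =", round(f_stat, 4))
print("critical value =", round(critical_value, 4))
print("reject H0:", reject_h0)
```

For large denominator degrees of freedom this critical value approaches −ln(0.05), about 3.00, so a much larger tabulated value would signal a lookup error.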

Discussions on Results Interpretation

We have now computed the required results for our case study using several methods of data analysis. Throughout, we are looking for relationships and dependencies among the camera specifications. If they are related, how significant is the relationship? If not, why? In this discussion we go through the analyses one by one. There are two parts, compulsory and optional. The compulsory tests are the hypothesis test, correlation, and regression, while the chi-square test and ANOVA fall under the optional tests.

We begin with the two-sample test on camera price for maximum resolution and storage included. Based on the calculations, we fail to reject 𝐻0: the sample mean is the same for both specs, 457.3844, so there is no evidence that the two means differ. A possible explanation is that price rises with both specs. The higher the resolution a camera is built with, the higher its price, and likewise, the larger the included storage, the higher the price. Thus, the two obtain the same mean.

Moving on to the correlation test on the maximum and minimum resolution of the camera, there is sufficient evidence to support that there is a strong positive linear relationship between the two. Since the correlation is 0.8384, which is close to 1, the relationship is strong. This is plausible because when a camera is built, its minimum and maximum resolution must fall within a pre-set range. A combination of too high a resolution and too low a resolution might affect the quality of the pictures produced.

Next is the regression test on normal focus range and macro focus range. These ranges describe how near or how far a subject can be from the tip of the lens while still being captured with full focus and quality. Normal is how far, macro is how near. According to the test results, the slope is significantly different from zero (t = 13.94, p-value < 2.2e-16), so there is sufficient evidence of a linear relationship between the normal focus range and the macro focus range. However, the R-squared value of 0.158 shows that the model explains only about 15.8% of the variation, so the relationship is weak and the two ranges can still be designed largely independently of each other. For a typical DSLR, such as a Canon, the macro focus range is within 1.2 meters for lenses with a focal length of 70 mm to 200 mm. Professionals often change their lenses to ensure that they capture a picture at the right distance with the right focus range.

We have now reached the optional tests. The first is the chi-square test of independence on several camera specifications. Effective pixels are the pixels that capture the image data. They are called effective because "effective" means "successful in producing the desired effect or intended result." Zoom wide is the widest (most zoomed-out) end of the lens's zoom range, while zoom tele is the telephoto (most zoomed-in) end. We chose the two-way contingency table method to test whether the listed specs, effective pixels, zoom wide, and zoom tele, are independent. The test showed that there is sufficient evidence to support that these specifications are dependent on one another.

Finally, an ANOVA test was conducted on the same specifications used in the chi-square test: effective pixels, zoom wide, and zoom tele. There is sufficient evidence to support that at least one of the means of effective pixels, zoom wide, and zoom tele is different from the others.

Conclusion

In conclusion, I benefited greatly from going through this project, from choosing the data to finalizing the project report. During the project, I read several articles about my chosen topic, cameras, and found learning about them very interesting. However, I encountered some problems along the way. For instance, finding a dataset was quite challenging because it needed to contain a lot of ratio data for the analysis. There were also problems with R Studio. Some major errors occurred, and I had to call my friends for help, since this is the CMCO period and all of us are staying at home. Fortunately, everyone was willing to help me complete this project.

Lastly, to answer the problem statement in the introduction, professional camera specifications cannot be fitted in a smartphone because the components are too large. We cannot change the lens on a phone because it is fixed, and the resolution must also match what the phone can support. Thus, it is not practical to put professional specifications on an ordinary phone. Maybe one day the world will move to powerful smartphones with high camera specs, who knows? Let the technologies evolve.

References

Admin. (7 January, 2017). What does it mean when macro focus range and normal focus range are equal? Retrieved from Photo Stack: https://photo.stackexchange.com/questions/86323/what-does-it-mean-when-macro-focus-range-and-normal-focus-range-are-equal

Djudjic, D. (2 March, 2017). Camera sales report for 2016: Lowest sales ever on DSLRs and mirrorless. Retrieved from DIY Photography: https://www.diyphotography.net/camera-sales-report-2016-lowest-sales-ever-dslrs-mirrorless/

Plumridge, J. (13 September, 2019). What Are Effective Pixels? Retrieved from Lifewire: https://www.lifewire.com/what-are-effective-pixels-493741

Scholten, A. Z. (15 April, 2016). Inferential Statistics. Retrieved from Coursera: https://www.coursera.org/learn/inferential-statistics

(Scholten, 2016) (Djudjic, 2017) (Admin, 2017) (Plumridge, 2019)...

