STA108 Chapter 1 PDF

Title STA108 Chapter 1
Course Statistics & Probability
Institution Universiti Teknologi MARA


Chapter 1

Introduction to Statistics

CHAPTER 1 INTRODUCTION TO STATISTICS

1.1 WHAT IS STATISTICS?

Statistics refers to the practice or science of collecting and analyzing numerical data in large quantities. In general, one can say that statistics is the methodology for collecting, analyzing, interpreting and drawing conclusions from information. In other words, statistics is the methodology which scientists and mathematicians have developed for interpreting and drawing conclusions from collected data. Everything that deals even remotely with the collection, processing, interpretation and presentation of data belongs to the domain of statistics, and so does the detailed planning that precedes all these activities.

Collecting, Organizing, Analyzing, Interpreting, Presenting

Figure 1: Statistics involves scientific procedures and methods

These statistical processes form part of the decision-making process in many organizations. Today's managers need strong mathematical abilities to interpret statistical analyses before they can make informed decisions.


Quick Check 1
1. Explain what you understand by statistics.
2. Briefly describe two meanings of the word statistics.

1.2 TERMINOLOGIES

1.2.1 Variable
A variable is any characteristic, number, or quantity that can be measured or counted.

1.2.2 Data
Groups of information in raw or unorganized form (such as letters, numbers or symbols) that represent the qualitative or quantitative attributes of a variable or set of variables.
Example: Researchers may collect data on the amount of money spent by secondary school students on textbooks, the brand of detergent most preferred by housewives in Perak, the monthly income of rubber smallholders in Malaysia, the time taken by Malaysian-made cars to accelerate from 0 to 100 km/h, the average length of stay of foreign tourists in Malaysia and their favourite places to visit.

1.2.3 Population
A group of measurements about which one wishes to draw conclusions.
Example: An investigator may desire to draw conclusions about school-age children with asthma. The investigator may define her population to be all school-age children with asthma treated in government hospitals in Malaysia.

1.2.4 Sample
A subset of all the measurements in the population.
Example: If it is not possible to obtain information about all school-age children with asthma in Malaysia, the investigator may select just 50 school-age children with asthma treated in government hospitals in Malaysia and obtain sample data from these 50 children.

Figure 2: Population and sample


1.2.5 Census
A census is a study of the entire population.

1.2.6 Sample Survey
A sample survey is a study of a sample (a selected segment of the population).

1.2.7 Parameter
A numerical descriptive measure of a population.

1.2.8 Statistic
A numerical descriptive measure taken from a sample.

1.2.9 Pilot Study
A study conducted before the actual fieldwork is carried out.

1.2.10 Sampling Techniques
The methods used to select samples from a population.

1.2.11 Random sampling
Selection of a sample in which each member of the population has an equal and independent chance of being selected (http://explorable.com).
Random sampling techniques include simple random sampling, systematic sampling, cluster sampling and stratified sampling.

1.2.12 Non-random sampling
Selection of a sample that does not require each member of the population to have an equal and independent chance of being selected.


Non-random sampling techniques include convenience sampling, quota sampling and judgmental sampling.

1.2.13 Sampling frame
A list of all the elements in the population from which the sample is drawn. A sampling frame is needed so that every member of the population is identified and has an equal opportunity of being selected as a subject (element).
Example: If it is not possible to obtain information about all school-age children with asthma in Malaysia, the investigator may select just 50 school-age children with asthma treated in government hospitals in Malaysia and obtain sample data from these 50 children. The sampling frame would be a list of all school-age children with asthma treated in government hospitals in Malaysia.

1.3 TYPES OF STATISTICS

Broadly speaking, applied statistics can be divided into two areas: descriptive statistics and inferential (or inductive) statistics.

1.3.1 Descriptive Statistics
Descriptive statistics consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures. Raw data are thus transformed into meaningful forms so that users and managers can make generalizations or draw conclusions just by taking a quick look at a visual presentation.
Example: In the three years from 2011 to 2013, hundreds of thousands of foreign tourists flocked to Malaysia for medical appointments. These people are termed foreign medical tourists. Arrivals have grown significantly every year since 2011: there were 583,314 foreign patients in 2011, 671,727 in 2012 and 770,000 in 2013 seeking medical treatment in Malaysia. Thus, medical tourism is capable of generating foreign currency for the country and sustaining its balance of payments. Today, Malaysia is fast becoming the healthcare destination of choice for foreign patients from all over the world for reasons such as affordable costs, a wide range of specialists, more advanced technologies, high-quality medical care and shorter waiting periods. These factors, combined with the diverse tourism products available in the country, offer an attractive package for medical travellers. Some of the more popular treatments are orthopedics, gastroenterology, dental, cosmetic and general surgeries (Mann, 2007).

1.3.2 Inferential Statistics
In inferential statistics, we make generalizations about a population by analyzing a sample. If the sample is a good representation of the population, accurate conclusions about the population can be inferred from the analysis of this sample. This is because the sample values are close representations of the actual values of the population of interest. However, there is a certain amount of uncertainty about the estimates. Therefore, probability is often used when stating the conclusions. Thus, inferential statistical techniques are used to make inferences about the population based on measurements obtained from the sample. The procedure is to select a sample from a population, measure the variables of interest, analyze the data, interpret the output and draw conclusions based on the data analysis.
Example: The accompanying chart, reproduced from MALAYSIA TODAY, shows that 61% of workers surveyed said that they are not pressured by their boss or co-workers to come to work when they are sick with the flu, 38% said they feel pressured, and 1% said they are not sure.
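
As a small illustration of the "summary measures" mentioned under descriptive statistics, the sketch below turns the three foreign-patient counts quoted in the medical tourism example into year-on-year percentage growth. The function name `yearly_growth` is a hypothetical helper chosen for this illustration; only the three counts come from the notes.

```python
# Foreign patient arrivals quoted in the Section 1.3.1 example.
arrivals = {2011: 583_314, 2012: 671_727, 2013: 770_000}


def yearly_growth(counts):
    """Return {year: percentage change from the previous year}."""
    years = sorted(counts)
    return {
        later: (counts[later] - counts[earlier]) / counts[earlier] * 100
        for earlier, later in zip(years, years[1:])
    }


growth = yearly_growth(arrivals)
for year, pct in growth.items():
    print(f"{year}: {pct:.1f}% more foreign patients than the year before")
```

Two percentages summarise the trend more compactly than the raw counts, which is the kind of transformation descriptive statistics is meant to provide.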

Quick Check 2
1. What is the difference between descriptive statistics and inferential statistics?
2. Briefly explain the types of statistics.

1.4 TYPES OF VARIABLE

1.4.1 Quantitative Variables
Variables whose values are measurements or counts.
Example: age, weight, height, number of defective items produced or income.
There are two classifications:

Discrete Variable
A countable variable.
Example: number of children in your family, number of students in a class or number of television sets in your house.

Continuous Variable
A variable that can be measured, with responses taking values that lie within a continuum or interval.
Example: height of students, weights of babies, age and father’s income.

1.4.2 Qualitative Variables
Variables that are categorical in nature and whose values cannot be counted or measured.
Example: gender, father’s occupation, program of study, courses registered or place of birth.

1.5 TYPES OF DATA

1.5.1 Quantitative data
Data collected for a numeric variable. It is information that can be measured and written down with numbers.

1.5.2 Qualitative data
Data collected for a categorical variable. It is expressed not in terms of numbers, but rather by means of a natural language description such as attributes, characteristics, properties of a thing or phenomenon. In statistics, it is often used interchangeably with "categorical" data.

TYPES OF VARIABLE/DATA
Quantitative (Numerical): Discrete, Continuous
Qualitative (Categorical)

Figure 3: Types of Variable

Here's a quick look at the difference between qualitative and quantitative data.

Qualitative Data Overview:
• Deals with descriptions.
• Data can be observed but not measured.
• Colors, textures, smells, tastes, appearance, beauty, etc.
• Qualitative → Quality

Quantitative Data Overview:
• Deals with numbers.
• Data which can be measured.
• Length, height, area, volume, weight, speed, time, temperature, humidity, sound levels, cost, number of members, ages, etc.
• Quantitative → Quantity

Example 1: Oil Painting

Qualitative data:
• blue/green color, gold frame
• smells old and musty
• texture shows brush strokes of oil paint
• peaceful scene of the country
• masterful brush strokes

Quantitative data:
• picture is 10" by 14"
• with frame 14" by 18"
• weighs 8.5 pounds
• surface area of painting is 140 sq. in.
• cost $300

Example 2: Latte

Qualitative data:
• robust aroma
• frothy appearance
• strong taste
• burgundy cup

Quantitative data:
• 12 ounces of latte
• serving temperature 150°F
• serving cup 7 inches in height
• cost $4.95

Example 3: Freshman Class

Qualitative data:
• friendly demeanors
• civic minded
• environmentalists
• positive school spirit

Quantitative data:
• 672 students
• 394 girls, 278 boys
• 68% on honor roll
• 150 students accelerated in mathematics

1.6 TYPES OF MEASUREMENT SCALES

Data may be described in accordance with the level of measurement attained. The four levels of measurement are – from weakest to strongest level – nominal, ordinal, interval and ratio scales.

(Diagram: Ratio, Interval, Ordinal, Nominal, listed from strongest to weakest level)

1.6.1 Nominal
• Data classified into various distinct categories which have no numerical meaning and cannot be ranked.
• The weakest form of measurement.
• Example: Gender (Male & Female)

1.6.2 Ordinal
• Data for which order is meaningful and the categories can be ranked.
• Example: Education Qualification (PhD, Master, Degree, Diploma, SPM)

1.6.3 Interval
• An ordered scale that gives meaning to the difference between measurements but does not involve a true zero point.
• Example: Temperature Reading
• A temperature of 80°F is 4 degrees warmer than a temperature of 76°F. This 4-degree difference is the same if the two temperatures measured are 65°F and 61°F, so the meaning of a difference is the same throughout the temperature scale.

1.6.4 Ratio
• Like the interval scale, it is an ordered scale with meaningful differences between measurements. In addition, it has a true zero point, as in the measurement of height, weight, age or number of phone calls made per month.
• Example: A person who is 6 feet tall is twice as tall as a person who is 3 feet tall, and a 15-year-old boy is three times as old as a 5-year-old boy.
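
To see why the true zero point separates the interval and ratio scales, the sketch below re-expresses the same measurements in a different unit. The temperature readings 80°F, 76°F, 65°F and 61°F and the heights of 6 and 3 feet come from the examples above; the 40°F reading and the helper name `fahrenheit_to_celsius` are assumptions added only for this illustration.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32) * 5 / 9


# Interval scale: equal gaps in Fahrenheit remain equal gaps in Celsius.
gap_high = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(76)
gap_low = fahrenheit_to_celsius(65) - fahrenheit_to_celsius(61)
print(f"gaps in Celsius: {gap_high:.3f} and {gap_low:.3f}")

# Interval scale: ratios are NOT preserved, because 0 degrees is arbitrary.
# 40 is an assumed reading, chosen so that 80 is numerically twice as large.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print(f"ratio in Fahrenheit: {ratio_f:.1f}, ratio in Celsius: {ratio_c:.1f}")

# Ratio scale: height has a true zero, so the ratio survives a unit change.
FEET_TO_CM = 30.48
ratio_feet = 6 / 3
ratio_cm = (6 * FEET_TO_CM) / (3 * FEET_TO_CM)
print(f"height ratio in feet: {ratio_feet:.1f}, in cm: {ratio_cm:.1f}")
```

The temperature ratio changes with the unit, so a statement such as "twice as hot" has no fixed meaning on an interval scale, whereas "twice as tall" holds in any unit of length.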



1.7 SAMPLING TECHNIQUES

1.7.1 Simple Random Sampling (SRS)
A random sampling process in which each member of the population has an equal chance of being selected as an element in the sample.

Procedure
i. Select a suitable sampling frame.
ii. Assign a number to each element in the sampling frame (e.g. 001, 002, ..., 500 for population size N = 500).
iii. Select elements for study by:
   a. Drawing numbers from a box (‘lucky draw’).
   b. Using a table of random numbers. It is a table displaying hundreds of digits from 0 to 9 set up in such a way that each number is equally likely to follow any other. (See text for random sampling details & table of random numbers.)
   c. Using a computer-generated random number table.

Advantage
i. Easy to assemble the sample.
ii. The sample is unbiased and represents the population, thus allowing us to make generalizations from the result of the sample back to the population.

Disadvantage
i. Difficult to get a complete and up-to-date sampling frame.
ii. We may end up with a clustered selection of subjects.
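
The procedure above, with option (c) for the selection step, can be sketched in a few lines of Python. This is only a minimal illustration: the sample size of 20, the seed and the function name `simple_random_sample` are assumptions, while the numbering 001 to 500 follows the example in step ii.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)  # the seed is optional, used only for repeatability
    return rng.sample(frame, n)


# Step ii: number the elements 001, 002, ..., 500 (population size N = 500).
frame = [f"{i:03d}" for i in range(1, 501)]

# Step iii(c): let the computer choose; n = 20 is an assumed sample size.
sample = simple_random_sample(frame, 20, seed=1)
print(sample)
```

`random.sample` selects without replacement, so no element of the frame can appear twice in the sample.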

1.7.2 Systematic Sampling
A random sampling process in which every kth (e.g. every 4th) element or member of the population is selected for the sample after a random start is determined.

Procedure
i. Determine the value k = (population size) / (sample size).
ii. Use a table of random numbers to select one number between 1 and k. Say this number is m. This is the first element of the sample.
iii. The rest of the members of the sample will be elements m + k, m + 2k, m + 3k, ... until the desired sample size is obtained.


Example: Suppose population size (N) = 2000 and sample size (n) = 50. Hence k = 2000 / 50 = 40. Use a table of random numbers to select a number between 1 and 40. Suppose the number selected is 15. This is the starting point for selecting every 40th subject. With the list of the 2000 subjects in the sampling frame, we select subject number 15, 15 + 40 = 55, 55 + 40 = 95, ... until the sample size is reached.

Advantage
i. It is faster and simpler than simple random sampling.
ii. There is assurance that the population will be evenly sampled.

Disadvantage
The process of selection can interact with a hidden periodic trait within the population. If the sampling technique coincides with the periodicity of the trait, the sampling technique will no longer be random and representativeness of the sample is compromised.
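
The systematic sampling procedure can be checked with a short sketch that reproduces the example of N = 2000, n = 50 and a random start of 15. The function name `systematic_sample` is hypothetical, and the code assumes that the population size is an exact multiple of the sample size so that k is a whole number.

```python
import random


def systematic_sample(population_size, sample_size, start=None):
    """Return the positions m, m + k, m + 2k, ... selected from the frame."""
    k = population_size // sample_size  # sampling interval (assumes exact division)
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k inclusive
    return [start + i * k for i in range(sample_size)]


# Values from the example: N = 2000, n = 50, random start m = 15.
sample = systematic_sample(2000, 50, start=15)
print(sample[:4], "...", sample[-1])  # [15, 55, 95, 135] ... 1975
```

Only the starting point is random; every later position is fixed by k, which is why a periodic pattern in the frame with the same period can bias the sample.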

1.7.3 Cluster Sampling
The population is first listed by clusters or categories. Then we randomly select one or more clusters and take all of their elements.

Advantage
i. This sampling technique is cheap, quick and easy. Instead of sampling an entire country, as simple random sampling would require, the researcher can allocate limited resources to the few randomly selected clusters or areas.
ii. The researcher can also increase the sample size with this technique. Since the sample is taken from only a number of areas or clusters, more subjects can be selected because they are more accessible.

Disadvantage
i. The sample may not be representative of the population. If the individuals within a cluster have similar characteristics, a cluster may be overrepresented or underrepresented, which can skew the results of the study.
ii. There is a possibility of high sampling error. This is caused by the limited number of clusters included in the sample, which leaves a significant proportion of the population unsampled.
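
A minimal sketch of cluster sampling follows: clusters are chosen at random, and every element of each chosen cluster enters the sample. The population of six schools with 30 pupils each, the seed and the function name `cluster_sample` are all hypothetical values made up for this illustration.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly pick whole clusters and return all of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # pick cluster names
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample


# Hypothetical population: six schools (clusters) with 30 pupils each.
clusters = {
    f"School {c}": [f"{c}-{i:02d}" for i in range(1, 31)]
    for c in "ABCDEF"
}

chosen, sample = cluster_sample(clusters, 2, seed=7)
print(chosen, len(sample))
```

Four of the six schools contribute nothing to the sample, which illustrates the second disadvantage listed above.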


1.7.4 Stratified Sampling
The population is divided into subgroups, called strata, according to some variable or variables of importance to the study. Variables often used include: age, gender, ethnic origin, SES (socioeconomic status), diagnosis, geographic region, institution, or type of care. A common approach to stratification is the proportional method, whereby the subgroup sample sizes reflect the proportions of the subgroups in the population.
Example: A medical and health sciences college population has 15% medical students, 25% biomedical students, 25% nursing students and 35% nutrition students. With proportional sampling the sample has the same proportions as the population.
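
The proportional method can be sketched as a simple calculation: each stratum receives its population percentage of the total sample. The percentages below come from the college example; the total sample size of 200 and the function name `proportional_allocation` are assumptions chosen for illustration.

```python
# Population shares from the college example, in percent.
strata_percent = {"medical": 15, "biomedical": 25, "nursing": 25, "nutrition": 35}


def proportional_allocation(percentages, sample_size):
    """Split sample_size across strata in proportion to their population share."""
    return {
        name: round(sample_size * pct / 100)
        for name, pct in percentages.items()
    }


# 200 is an assumed total sample size, not a figure from the notes.
allocation = proportional_allocation(strata_percent, 200)
print(allocation)
# {'medical': 30, 'biomedical': 50, 'nursing': 50, 'nutrition': 70}
```

With other sample sizes the rounded stratum sizes may not add up exactly to the total, in which case one stratum has to be adjusted by hand.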

Differences between Cluster Sampling and Stratified Sampling
In stratified random sampling, all the strata of the population are sampled, while in cluster sampling the researcher randomly selects only a number of clusters from the collection of clusters that make up the entire population. Therefore, only a number of clusters are sampled and all the other clusters are left unrepresented.

Quick Check 3 For the Example above, calculate the number of students that must be selected from each department to form a sample of size 40.

1.7.5 Convenience Sampling
• Selection of the most readily available people or objects for a study.
• No way to determine representativeness.
• Saves time and money.

Advantage
This technique is considered the easiest, cheapest and least time-consuming.

Disadvantage
Generalization and inference about the entire population are limited. Since the sample is not representative of the population, the results of the study cannot speak for the entire population.


1.7.6 Quota Sampling
• Selection of a sample to reflect certain characteristics of the population.
• Similar to stratified sampling but does not involve random selection.
• Quotas for subgroups (proportions) are established. For example, if a quota of 50 males and 50 females has been set, the researcher will recruit the first 50 men and the first 50 women that meet certain inclusion criteria.

Advantages
i. It allows the researchers to sample a subgroup that is of great interest to the study. If a study aims to investigate a trait or a characteristic of a certain subgroup, this type of sampling is the ideal technique.
ii. It allows the researchers to observe relationships between subgroups. In some studies, traits of a certain subgroup interact with traits of another subgroup. In such cases, it is also necessary for the researcher to use this type of sampling technique.

Disadvantages
i. It may not be totally representative of the population since only the selected traits of the population were taken into account in forming the subgroups.
ii. Other traits in the sample may be overrepresented. In a study that considers gender, socioeconomic status and religion as the basis of the subgroups, the final sample may have skewed representation of age, race, educational attainment, marital status and many other traits.
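
The "first 50 men and first 50 women" rule can be sketched as a loop that accepts people in arrival order until each quota is full. The six volunteers, the quotas of 2 per group and the function name `quota_sample` are hypothetical, scaled down from the 50/50 example so the result is easy to follow.

```python
def quota_sample(volunteers, quotas):
    """Recruit volunteers in arrival order until every subgroup quota is filled."""
    remaining = dict(quotas)  # copy, so the caller's quotas are not modified
    recruited = []
    for name, group in volunteers:
        if remaining.get(group, 0) > 0:
            recruited.append((name, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are full
    return recruited


# Hypothetical arrival order of volunteers who meet the inclusion criteria.
volunteers = [
    ("P1", "male"), ("P2", "male"), ("P3", "female"),
    ("P4", "male"), ("P5", "female"), ("P6", "female"),
]

recruited = quota_sample(volunteers, {"male": 2, "female": 2})
print(recruited)
# [('P1', 'male'), ('P2', 'male'), ('P3', 'female'), ('P5', 'female')]
```

No random selection takes place: who is recruited depends entirely on who arrives first, which is why quota sampling is classed as a non-random technique.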

1.7.7 Judgmental Sampling
Judgmental sampling is more commonly known as purposive sampling. In this type of sampling, subjects are chosen to be part of the sample with a specific purpose in mind. With judgmental sampling, the researcher believes that some subjects are more fit for the research than other individuals. This is the reason why they are purposively chosen as subjects.

1.8 Data Collection Method

After samples are selected from the population, data are now ready to be gathered from the selected samples using data collection techniques. There are four major techniques:

1.8.1 Personal Interview ...

