Chapter 1 Introduction to Statistics PDF

Title Chapter 1 Introduction to Statistics
Course Introduction To Statistics
Institution University of Botswana


Chapter 1 Introduction to Statistics

1.1 Basic Concepts of Statistics

1.1.1 Definition of Statistics

The word Statistics can be defined in two ways:

Definition 1: Statistics, on the one hand, refers to a mass of data, for example, the number of births and deaths in a locality during a certain period.

Definition 2: As a discipline, Statistics refers to the science of collecting, organizing and interpreting numerical information (also called data) with a view to explaining relationships that may exist between events and drawing meaningful conclusions. However, the occurrence of these events cannot be predicted with certainty.

Using statistical methods, masses of raw data can be transformed into meaningful, useful and usable information for prediction and decision-making.

1.1.2 Types of Statistics

Descriptive statistics: used to reveal patterns or trends through the analysis of numeric data. Commonly used descriptive statistics include:
• Graphical displays such as bar diagrams, pie charts, histograms, boxplots and stem-and-leaf plots.
• Numerical measures such as frequency counts, measures of central tendency (mean, median and mode), measures of position (quartiles, deciles and percentiles), measures of variability/dispersion (range, inter-quartile range, mean absolute deviation, variance and standard deviation) and measures of shape (skewness and kurtosis).

Inferential statistics: used to draw conclusions and make generalizations or predictions about a population based on the analysis of sample data.

1.1.3 Application of Statistics

• Statistics permeates every walk of life.
• Statistical data and methods are required in social sciences, politics, sports, health, accounting, business, management, history, law, science, and even the kitchen. It has been said that no person who has not studied statistics could function intelligently in today’s world.


People from all walks of life (ordinary consumers, social scientists, permanent secretaries, ministers, accountants, business managers, social workers, political activists, students, etc.) need a good understanding of statistics because:
• A lot of information on radio, on TV and in newspapers uses numerical facts to persuade people to buy products, change behaviours, support certain political parties, enrol in particular programmes, etc.
• Most reports involve the collection, analysis and interpretation of data.
• Students, academics and graduates have to carry out their own investigations involving the collection, analysis and interpretation of data (as part of degree requirements, teaching and publications, or at work after they graduate).
• Knowing statistics is part of acquiring knowledge and achieving success.

1.1.4 Use and Misuse of Statistics

Despite its strengths, statistics is one of the most misused subjects. Some examples of misuse are given in D. Huff’s “How to Lie with Statistics”.

Misuse of statistics can arise at any stage of the statistical process, for example:
- Inappropriate methods of collecting the information (sampling)
- Use of unrepresentative subgroups in a study
- Inappropriate methods of summarizing data
- Wrong choice of data analysis and inferential technique
- Wrong conclusions from the results

1.1.5 Important Statistical Concepts

Population: The entire collection of objects or people about which information is required. An individual member of the population is called a population unit.
Sample: A representative subset of the population chosen in a scientific manner. An individual member of a sample is called a sampling unit.
Variable: A characteristic of a population that assumes different values for different entities.
Constant: A characteristic of a population that takes on the same value from entity to entity.
Observation: The particular value of a variable that one observes (or measures).

1.1.6 Statistical Data

Raw data are data in the original form in which they were collected. That is, raw data are unprocessed data.


Example 1.1: DST 123 final marks for Semester 2 of the 2018/2019 Academic Year

ID No.   Test 1   Test 2   CA   Exam   Final
1        87       61       74   69     72
2        87       59       73   53     63
3        88       60       74   32     53
4        87       72       80   55     68
5        88       55       72   21     47
6        88       0        44   51     48
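As a rough illustration of how raw data are turned into the numerical measures listed in 1.1.2, the sketch below summarises the Final column of Example 1.1 using Python's standard `statistics` module. The helper name `describe` is hypothetical, and the variance and standard deviation use the sample (n - 1) divisor, which is an assumption since the chapter does not say which divisor it intends.

```python
import statistics

# Final marks for the six students in Example 1.1 (ID No. 1 to 6).
final_marks = [72, 63, 53, 68, 47, 48]


def describe(values):
    """Return a few descriptive measures for a list of numbers."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "range": ordered[-1] - ordered[0],
        # Sample variance and standard deviation (divisor n - 1).
        "variance": statistics.variance(ordered),
        "std_dev": statistics.stdev(ordered),
    }


summary = describe(final_marks)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The six raw marks reduce to a handful of figures (a mean of 58.5 and a median of 58, for instance) that are easier to interpret than the unprocessed table.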

1.1.7 Types of Data

(a) Time series data: A sequence of observations collected at (usually regular) intervals of time or space. Examples: monthly sales of a department store in 2017, the annual youth unemployment rate for Botswana during 2014-2018, monthly or daily temperature and rainfall readings for Gaborone in 2018.

(b) Cross-sectional data: Observations taken on a group of individuals or animals at the same, or approximately the same, point in time. Examples include sample survey data and census data (census data arise from a complete enumeration of individuals, objects, animals, etc.).

(c) Longitudinal data: Data taken on a group of people at regular intervals of time, usually aimed at finding out whether there has been any change in behaviour or some other characteristic. For example, a baby’s weight and height can be measured every fortnight to determine whether these characteristics have changed; or measurements can be taken at regular intervals on women suffering from obesity after they have been placed on a diet.

1.1.8 Sources of Data

There are two kinds of data sources: primary and secondary.

Primary data: Primary data consist of raw information collected at first hand in order to satisfy the purposes of a particular statistical enquiry. Such data are called primary because they are collected from scratch, and their sources are primary data sources. Examples include census data from census surveys, sample survey data in their original form, etc.

Advantages of using primary data
i) The data are relevant to the problem at hand.
ii) Greater control and accuracy are ensured.
iii) Data can be easily manipulated in terms of definitions of basic terms and tabulations.

Disadvantages of using primary data
i) Data are expensive to collect and take a lot of time.
ii) Elaborate planning and training are essential for their collection.
iii) Collecting primary data can be a costly venture.

Secondary data: Secondary data consist of figures that were collected originally to satisfy a particular investigation but are now used at second hand as the basis for a different investigation. This is a cost-effective way of using data, even though the information may be inadequate. Examples of secondary data include journal articles, magazines, reports, etc. Note that census data used by a second user who did not collect them are also secondary data.

Advantages of using secondary data
i) Data are easier and less time-consuming to collect since they already exist.
ii) They are less expensive to collect.

Disadvantages of using secondary data
i) Data may not be very relevant to the problem at hand.
ii) No control can be exercised over the accuracy of the collection procedure or over rounding errors in the data.
iii) The data are no longer amenable to any changes.

Data are primary or secondary depending on who collects them and who uses them. To the original collector the data are primary; if someone else later uses them for any purpose, the data are secondary to that user.


1.1.9 Types of Variables

What is a variable? A characteristic that assumes different values for different entities is called a variable. Take, for instance, the age of STA 116 students. One student’s age may be 17 while another’s is 21. Thus, if there are, say, 120 students in the class, each student’s age can be any value within the allowable range of ages. By contrast, if the characteristic is the course, then it is the same for each of the 120 students, i.e. STA 116. A characteristic that retains the same value from entity to entity is called a constant.

Statistical data arise from variables, which can be either quantitative (numerical) or qualitative (categorical), depending on their possible values.

[Figure: classification of variables. A VARIABLE is either QUALITATIVE (NOMINAL or ORDINAL) or QUANTITATIVE (DISCRETE or CONTINUOUS).]

Quantitative Variables: variables that yield measurable values, scores or frequencies, e.g. heights and weights of first-year students at a university, the number of women employed in a company, time taken to complete a project.

Statisticians are more concerned with whether a variable is discrete or continuous. A discrete variable is a characteristic of an object that can take only countable values.

A continuous variable has values that arise out of a measurement. These values are uncountable, i.e. within a given interval a continuous variable can take any value.

Qualitative Variables: when the values of a variable cannot be expressed numerically but are descriptive in nature or just a classification, the variable is described as qualitative. Example: gender (male, female).

1.2 Levels or Scales of Measurement

In measuring attitude, it is assumed that there are underlying dimensions along which individual attitudes can be indicated. These attitudes can then be represented by a numerical score to indicate the individual’s position on the attribute of interest. Scaling is the process of assigning numbers or symbols to the various categories of a particular concept we may wish to measure.

The level of measurement of a variable dictates the type of analysis and summaries that can be done. For example, for certain levels of measurement, percentages, pie charts and bar charts can be used to adequately summarise the data, while averages and standard deviations would be meaningless.

The four levels of measurement are:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
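The idea that the level of measurement dictates which summaries make sense can be sketched as a small lookup. The assignment of summaries to levels below follows a common textbook convention and is an assumption of this sketch (the chapter itself names only a few of these summaries); the names `LEVELS`, `NEW_SUMMARIES`, `meaningful_summaries` and `is_meaningful` are hypothetical.

```python
# Levels of measurement, ordered from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Summaries that first become meaningful at each level (a common textbook
# convention, assumed here; the chapter only names a few of these).
NEW_SUMMARIES = {
    "nominal": ["frequency count", "percentage", "mode"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["ratio of two values"],
}


def meaningful_summaries(level):
    """List every summary that is meaningful at the given level.

    Each level inherits the summaries of all the levels below it.
    """
    position = LEVELS.index(level)  # raises ValueError for unknown levels
    result = []
    for lower in LEVELS[: position + 1]:
        result.extend(NEW_SUMMARIES[lower])
    return result


def is_meaningful(summary, level):
    """True if the named summary can sensibly be computed at this level."""
    return summary in meaningful_summaries(level)


print(meaningful_summaries("ordinal"))
print(is_meaningful("mean", "nominal"))
```

Under these assumptions a percentage is acceptable at every level, whereas a mean only becomes meaningful from the interval level upwards.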


a) Nominal Scale: A nominal variable is a variable whose possible values are categories.
o Furthermore, there is no ordering among these categories.
o Nominal variables allow for only qualitative classification.
o That is, they can be measured only in terms of whether the individual items belong to some distinctively different categories, but we cannot quantify or even rank order those categories.

Examples include:
• Sex of a person (male, female)
• Surname of a person
• Colour of paint
• Blood group (A, B, AB, O)
• Country of origin (Botswana, Cameroon, Ethiopia, U.S.A., India, etc.)
• Outcome of a toss of a coin (head, tail)
• Race, religion, government department, etc.

Each distinct value can be coded into an arbitrary numerical value. All we can say is that two individuals are of different race, gender or country; we cannot say which one “has more” of the quality (gender, race, etc.) represented by the variable.
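A minimal sketch of why the numerical codes of a nominal variable are arbitrary: the most frequent category stays the same whichever codes are used, whereas the average of the codes changes with the coding and so carries no meaning. The blood-group sample and both codings are hypothetical, made up purely for illustration, as are the function names.

```python
from collections import Counter

# Hypothetical blood groups recorded for ten people (illustrative only).
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

# Two arbitrary codings of the same categories (both hypothetical).
coding_1 = {"A": 1, "B": 2, "AB": 3, "O": 4}
coding_2 = {"A": 40, "B": 7, "AB": 0, "O": 13}


def most_common_category(values, coding):
    """Code the values, find the modal code, and decode it to a label."""
    decode = {code: label for label, code in coding.items()}
    coded = [coding[v] for v in values]
    modal_code, _ = Counter(coded).most_common(1)[0]
    return decode[modal_code]


def mean_code(values, coding):
    """Average of the numerical codes (not a meaningful summary here)."""
    return sum(coding[v] for v in values) / len(values)


print(most_common_category(blood_groups, coding_1))
print(most_common_category(blood_groups, coding_2))
print(mean_code(blood_groups, coding_1), mean_code(blood_groups, coding_2))
```

Frequency counts and the mode survive any relabelling, which is why they are the natural summaries for nominal data.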

b) Ordinal Scale: An ordinal variable is similar to a nominal variable in that its possible values are also categories.
o The two differ in that the values of an ordinal variable have some natural ordering or ranking.
o Ordinal variables allow us to rank order the items we measure in terms of which object has less and which has more of the quality represented by the variable.
o However, they still do not allow us to say “how much more”. A typical example of an ordinal variable is the examination grade of students (A, B, C, D, E).
o We know, for example, that grade A is higher/better than grade B, but we cannot say that it is, for example, 18% higher.

Examples:
a) Type of house in Gaborone: Low Cost, Medium Cost, High Cost and SHHA.
b) A patient’s reaction to a drug dosage. The possible reactions (values) may be: no reaction, mild, moderate or severe reaction. The difference between a mild and a moderate reaction is difficult or impossible to quantify and is based on perception. Moreover, the difference between a mild and a moderate response may be greater or less than the difference between a moderate and a severe response.

Other examples include:
• Performance in a job (Excellent, Very Good, Good, Poor, Very Poor)
• Ranking of a beauty contestant (Queen, 1st Princess, 2nd Princess, etc.)
• Level of satisfaction (Very Satisfied, Satisfied, Neutral, Dissatisfied, Very Dissatisfied)
• Dose level of a medication (Low, Medium, High), etc.
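The ordering property of an ordinal variable can be sketched in a few lines, using the reaction categories from the drug-dosage example. Only comparisons and positions are used, never differences between categories. The function names and the seven observed reactions are hypothetical, chosen only for illustration.

```python
# Reaction categories from the drug-dosage example, listed from least to
# most severe. The position in this list is the rank of the category.
REACTION_ORDER = ["no reaction", "mild", "moderate", "severe"]


def rank(reaction):
    """Return the rank (0 = least severe) of a reaction category."""
    return REACTION_ORDER.index(reaction)


def is_more_severe(first, second):
    """True if `first` is ranked above `second`."""
    return rank(first) > rank(second)


def median_category(reactions):
    """Middle category of an odd-sized sample after sorting by rank."""
    if len(reactions) % 2 == 0:
        raise ValueError("use an odd number of observations in this sketch")
    ordered = sorted(reactions, key=rank)
    return ordered[len(ordered) // 2]


# Hypothetical reactions observed in seven patients (illustrative only).
observed = ["mild", "severe", "no reaction", "moderate", "mild", "mild", "moderate"]

print(is_more_severe("severe", "mild"))
print(median_category(observed))
```

The sketch can say that a severe reaction ranks above a mild one and can pick out a middle category, but it deliberately offers no way to say how much more severe one reaction is than another.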

c) Interval Scale:
o The values have a natural ordering (just like in ordinal). Example:
  • An arrival time of 7:00am is earlier than one of 9:00am.
  • John was born at 1045hrs on 03/09/2000 while Peter was born at 0915hrs on the same day; therefore Peter is older than John.
o The difference between successive values also has a meaning. Example:
  • Motsami arrives at the wedding at 10:10hrs while Mpho, the bride, arrives at 13:10hrs. Motsami’s waiting time was 3hrs. In fact he almost left!
  • A temperature increase from 10°C to 30°C is twice as large as an increase from 30°C to 40°C.
o A value of zero does not indicate the “absence” of the characteristic. Example:
  • The temperature in Rebecca’s fridge is 0°C. This does not mean that there is no temperature in the fridge.
  • Mpho is born on Monday at 0000hrs. This does not imply that Mpho has no birth time. (What is Mpho’s day of birth?)
o No meaning is attached to the ratio of any two values. Example:
  • The ratio of arrival times (8am/10am) has no meaning.
  • Bath water at 80°F is not twice as hot as bath water at 40°F.

Summary: Interval variables allow us not only to rank order the items that are measured but also to quantify and compare the sizes of the differences between them, but not to take ratios of values.

d) Ratio Scale:
o The values have a natural ordering (just like in ordinal and interval). Example:
  • A waiting time of 20mins is longer than one of 10mins.
  • Naledi weighs 100kg while Sechele weighs 80kg. Therefore, Naledi is heavier (has more weight) than Sechele.
o The difference between successive values also has a meaning. Example:
  • Motsami waited 3hrs for his bride while Khama waited 1hr for his bride. Motsami waited 2hrs more than Khama.
  • Naledi weighs 100kg while Kago weighs 80kg. Therefore, Naledi is 20kg heavier than Kago.
o A value of zero indicates the “absence” of the characteristic. Example:
  • The number of telephone calls received in my office today is zero. Therefore, no phone call has been received in my office today.
  • If Mpho will be zero months old in a month’s time, then Mpho “does not exist” yet.
o The ratio of any two (non-zero) values has a meaning. Example:
  • Motsami waited three times as long as Khama for his bride.
  • Sechele is 0.8 (80/100) times as heavy as Naledi.

More examples: Number of … (students, books, countries, anything), waiting time, weight (kg), height (cm), area (hectares), yield.
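One way to see why ratios are meaningful on a ratio scale but not on an interval scale is to change the unit of measurement and check whether the ratio survives. The sketch below does this for the bath-water temperatures (80°F and 40°F) and for the weights of Sechele (80kg) and Naledi (100kg). The function names are hypothetical, and the kilogram-to-pound factor is the usual approximate value.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def kilograms_to_pounds(mass_kg):
    """Convert a mass from kilograms to pounds (1 kg is about 2.20462 lb)."""
    return mass_kg * 2.20462


# Interval scale: bath water at 80°F and 40°F.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: Sechele (80 kg) and Naledi (100 kg).
ratio_kg = 80 / 100
ratio_lb = kilograms_to_pounds(80) / kilograms_to_pounds(100)

print(f"temperature ratio in °F: {ratio_f:.2f}")
print(f"temperature ratio in °C: {ratio_c:.2f}")
print(f"weight ratio in kg: {ratio_kg:.2f}")
print(f"weight ratio in lb: {ratio_lb:.2f}")
```

The temperature ratio is 2 in Fahrenheit but 6 in Celsius, so it depends on where the scale places its zero and says nothing about the water itself. The weight ratio stays at 0.8 in both units because zero weight means the same thing in each.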

Coding of Qualitative Values (using numbers to represent categories)

The categories of nominal and ordinal variables are usually coded into numbers for analysis purposes. For example, the sex of a child may be coded (arbitrarily) as Male = 0, Female = 1; or Female = 1, Male = 2.

When coding ordinal variables:
o Any set of numbers can be used to represent the values, provided that the natural ordering of the values is maintained. For example, one may choose to code examination grades as A=1, B=2, C=3, D=4, E=5 and F=6, or as A=5, B=4, C=3, D=2, E=1 and F=0, but not as A=1, B=4, C=3, D=6, E=5 and F=0.
o Similarly, the level of satisfaction may be coded as 10 = ‘Very Satisfied’, 8 = ‘Satisfied’, 5 = ‘Neutral’, 3 = ‘Dissatisfied’, 0 = ‘Very Dissatisfied’.
o The level of severity of a patient’s condition may be coded as 0 = ‘Fine’, 1 = ‘Moderate’, 2 = ‘Severe’, 3 = ‘Very severe’, etc.
o In each of these cases, the differences between the magnitudes of the values do not carry any meaning.
o What is meaningful is that one number is more or less than the other.
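The rule that a coding of an ordinal variable must maintain the natural ordering can be checked mechanically. The sketch below treats a coding as acceptable when the codes either strictly increase or strictly decrease along the natural order of the categories, and applies this to the three grade codings and the satisfaction coding given above. The function name `preserves_order` is hypothetical.

```python
def preserves_order(ordered_categories, coding):
    """Check that the codes strictly increase or strictly decrease along
    the natural ordering of the categories."""
    codes = [coding[c] for c in ordered_categories]
    pairs = list(zip(codes, codes[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


# Examination grades in their natural order, best to worst.
GRADES = ["A", "B", "C", "D", "E", "F"]

coding_up = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
coding_down = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}
coding_bad = {"A": 1, "B": 4, "C": 3, "D": 6, "E": 5, "F": 0}

# Satisfaction levels in their natural order, most to least satisfied.
SATISFACTION = ["Very Satisfied", "Satisfied", "Neutral",
                "Dissatisfied", "Very Dissatisfied"]
coding_satisfaction = {"Very Satisfied": 10, "Satisfied": 8, "Neutral": 5,
                       "Dissatisfied": 3, "Very Dissatisfied": 0}

print(preserves_order(GRADES, coding_up))
print(preserves_order(GRADES, coding_down))
print(preserves_order(GRADES, coding_bad))
print(preserves_order(SATISFACTION, coding_satisfaction))
```

Note that the satisfaction codes (10, 8, 5, 3, 0) are unevenly spaced and still pass the check, since only the order of the codes matters and not the gaps between them.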

Refer to the Notes in the Reference below.

Understanding Levels and Scales of Measurement in Sociology: Nominal, Ordinal, Interval, and Ratio
by Ashley Crossman, updated June 30, 2019

Level of measurement refers to the particular way that a variable is measured within scientific research, and scale of measurement refers to the particular tool that a researcher uses to sort the data in an organized way, depending on the level of measurement that they have selected. Choosing the level and scale of measurement are important parts of the research design process because they are necessary for systematized measuring and categorizing of data, and thus for analysing it and drawing conclusions from it that are considered valid.

Within science, there are four commonly used levels and scales of measurement: nominal, ordinal, interval, and ratio. These were developed by psychologist Stanley Smith Stevens, who wrote about them in a 1946 article in Science, titled "On the Theory of Scales of Measurement." Each level of measurement and its corresponding scale is able to measure one or more of the four properties of measurement, which include identity, magnitude, equal intervals, and a minimum value of zero.

There is a hierarchy of these different levels of measurement. With the lower levels of measurement (nominal, ordinal), assumptions are typically less restrictive and data analyses are less sensitive. At each level of the hierarchy, the current level includes all the qualities of the one below it in addition to something new. In general, it is desirable to have higher levels of measurement (interval or ratio) rather than a lower one. Let’s examine each level of measurement and its corresponding scale in order from lowest to highest in the hierarchy.

The Nominal Level and Scale

A nominal scale is used to name the categories within the variables you use in your research. This kind of scale provides no ranking or ordering of values; it simply provides a name for each category within a variable so that you can track them among your data. Which is to say, it satisfies the measurement of identity, and identity alone. Common examples within sociology include the nominal tracking of sex (male or female), race (white, Black, Hispanic, Asian, American Indian, etc.), and class (poor, working class, middle class, upper class). Of course, there are many other variables one can measure on a nominal scale. The nominal level of measurement is also known as a categorical measure and is considered qualitative in nature. When doing statistical research and using this level of measurement...

