Engineering Data Analysis Handsout Module 1-6 PDF

Title	Engineering Data Analysis Handsout Module 1-6
Course	Engineering Data Analysis
Institution	Technological Institute of the Philippines


ENGINEERING DATA ANALYSIS (Summary)

1st Semester, S.Y. 2020-2021

Handsout for CE 023 (Engineering Data Analysis)

Lesson 1: Data Collection

Data Collection - is a systematic way of gathering and measuring information on different groups of people. The data collected can be used for research, hypothesis testing, and other intended purposes.

TYPES OF DATA

1. Quantitative - sets of data in numerical form that can be either counted or measured.

• Discrete Data - data that can be "counted" (e.g. No. of Pencils, No. of People)

• Continuous Data - data that can be "measured" (e.g. Height, Weight, and Temperature)

2. Qualitative - sets of data that describe characteristics and classifications.

• Binary Data - falls under two mutually exclusive categories (e.g. right/wrong, true/false)

• Nominal Data - named categories with no specific rank or order (e.g. blue/red/green)

• Ordinal Data - categories with a specific rank or natural order (e.g. short, medium, tall)

Note: Data collection involves either sampling or experimentation.

SAMPLING

If you are collecting data about a small group of people, say, about 10 students, it is easy to tally and record them accordingly. But if the statistical population is too large to survey, it is better to gather data from a sample only. This process is called sampling. Sampling is the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.

EXPERIMENTATION

On the other hand, experimentation is the collection of data in a more controlled manner. One example is the data you collect as a result of your laboratory experiments. Note that experimentation is not limited to the laboratory. Many companies use experimentation to test their hypotheses. For example, a company can launch a sales competition to test how salespeople react to different levels of performance incentives.

Lesson 2: Introduction to Probability

Probability - is a measure of the likelihood that a particular event will occur. To compute the probability of a particular event when all outcomes are equally likely:

Probability of an event = (number of ways it can happen) / (total number of outcomes)

If an event is certain to happen, its Probability = 1. If an event is impossible, its Probability = 0. Therefore, the Probability value ranges from 0 to 1.

Probability can be expressed as a decimal, fraction, or percentage. Let's take a look at these examples:
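As an illustrative sketch, the ratio "number of ways it can happen / total number of outcomes" can be evaluated in Python with exact fractions. The fair die and the 52-card deck are assumed examples, not taken from the handout, and the helper name `probability` is hypothetical.

```python
from fractions import Fraction


def probability(favorable: int, total: int) -> Fraction:
    """P(event) = favorable outcomes / total outcomes (equally likely outcomes assumed)."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need total > 0 and 0 <= favorable <= total")
    return Fraction(favorable, total)


# Rolling an even number on a fair six-sided die: 3 favorable outcomes out of 6.
p_even = probability(3, 6)

# Drawing an ace from a standard 52-card deck: 4 favorable outcomes out of 52.
p_ace = probability(4, 52)

for name, p in [("even roll", p_even), ("ace", p_ace)]:
    # The same probability written as a fraction, a decimal, and a percentage.
    print(f"{name}: {p} = {float(p):.4f} = {float(p):.2%}")
```

Using `Fraction` keeps the fraction form exact, and the decimal and percentage forms are only computed for display.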


Lesson 2.1: Basic Rules of Combining Probabilities

There are basic rules to follow when combining probabilities:

1. ADDITION RULE

(a) If the events are mutually exclusive, there is no overlap: if one event occurs, the other events cannot occur. In that case, the probability that one or another of the events occurs is the sum of the probabilities of the separate events. Mutually exclusive events are two or more events that cannot happen at the same time.

(b) If the events are not mutually exclusive, there can be overlap between them. This can be visualized using a Venn diagram. The probability of the overlap must be subtracted from the sum of the probabilities of the separate events.

Set Relations on Venn Diagram

Let's look at the Venn diagram (b) and (c):

• P [A ∩ B] = P [occurrence of both A and B], the intersection of events A and B.

• P [A ∪ B] = P [occurrence of A or B or both], the union of the two events A and B.

• If the two events being considered, A and B, are not mutually exclusive, so that there may be overlap between them, the Addition Rule becomes P [A ∪ B] = P [A] + P [B] – P [A ∩ B]

If three events A, B, and C are not mutually exclusive:

P [A ∪ B ∪ C] = P [A] + P [B] + P [C] – P [A ∩ B] – P [A ∩ C] – P [B ∩ C] + P [A ∩ B ∩ C]

2. MULTIPLICATION RULE

(a) The basic idea for calculating the number of ways can be described as follows: If an operation can be performed in n₁ ways and if for each of these ways a second operation can be performed in n₂ ways, then the two operations can be performed together in n₁n₂ ways.

Note: For more than two operations: If an operation can be performed in n₁ ways, and if for each of these a second operation can be performed in n₂ ways, and for each of the first two a third operation can be performed in n₃ ways, and so forth, then the sequence of k operations can be performed in n₁n₂ ··· nₖ ways.

(b) The simplest form of the Multiplication Rule for probabilities is as follows: If the events are independent, then the occurrence of one event does not affect the probability of occurrence of another event. In that case, the probability of occurrence of more than one event together is the product of the probabilities of the separate events. (This is consistent with the basic idea of counting stated above.) If A and B are two separate events that are independent of one another, the probability of occurrence of both A and B together is given by P [A ∩ B] = P [A] × P [B]

(c) If the events are not independent, one event affects the probability of the other event. In this case, conditional probability must be used. The conditional probability of B given that A occurs, or on condition that A occurs, is written P [B | A]. This is read as the probability of B given A, or the probability of B on condition that A occurs.

Note: The multiplication rule for the occurrence of both A and B together when they are not independent is the product of the probability of one event and the conditional probability of the other:

P [A ∩ B] = P [A] × P [B | A] = P [B] × P [A | B]
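The addition and multiplication rules can be checked on a small sample space. The sketch below assumes a standard 52-card deck with equally likely cards. The events "hearts" and "face card" and the helper name `prob` are illustrative choices, not part of the handout.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely outcomes


def prob(event: set) -> Fraction:
    """Probability of an event given as a subset of the deck."""
    return Fraction(len(event), len(deck))


hearts = {card for card in deck if card[1] == "hearts"}      # event A
faces = {card for card in deck if card[0] in {"J", "Q", "K"}}  # event B

# Addition rule for events that are not mutually exclusive:
# P[A ∪ B] = P[A] + P[B] - P[A ∩ B]
p_union = prob(hearts) + prob(faces) - prob(hearts & faces)

# Multiplication rule for independent events (suit and rank are independent here):
# P[A ∩ B] = P[A] × P[B]
p_both = prob(hearts) * prob(faces)

# Conditional probability: P[B | A] = P[A ∩ B] / P[A]
p_faces_given_hearts = prob(hearts & faces) / prob(hearts)

# Dependent events: two aces in a row WITHOUT replacement,
# P[A1 ∩ A2] = P[A1] × P[A2 | A1] = 4/52 × 3/51
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

print(p_union, p_both, p_faces_given_hearts, p_two_aces)
```

Representing events as Python sets lets the set operators `|` and `&` stand in for ∪ and ∩, so each rule can be compared directly against a count over the sample space.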


Lesson 3: Permutation and Combination

Permutation - is an arrangement of all or part of a set of objects. The number of permutations is the number of different arrangements in which items can be placed. Notice that if the order of the items is changed, the arrangement is different, so we have a different permutation. In permutations, the order is important!

• Rule 1. The number of permutations of n distinct objects is n!

• Rule 2. The number of permutations of n distinct objects taken r at a time is nPr = n! / (n − r)!

• Rule 3. If n items are arranged in a circle, the arrangement doesn't change if every item is moved by one place to the left or the right. Therefore, in this situation, one item can be placed anywhere, and all the other items are placed relative to the first item. The number of permutations of n objects arranged in a circle is (n − 1)!

• Rule 4. The number of distinct permutations of n things of which n₁ are of one kind, n₂ of a second kind, ... , nₖ of a kth kind is n! / (n₁! n₂! ··· nₖ!)

Combinations - are similar to permutations, but with the important difference that combinations take no account of order. Thus, AB and BA are different permutations but the same combination of letters. The number of permutations is therefore at least as large as the number of combinations, and the ratio between them is the number of ways the chosen items can be arranged, r!.

In general, the number of combinations of n items taken r at a time is nCr = nPr / r! = n! / (r! (n − r)!)

Lesson 3.1: SAMPLING DISTRIBUTION

Population and Sample

Often in practice, we are interested in drawing valid conclusions about a large group of individuals or objects. Instead of examining the entire group, called the population, which may be difficult or impossible to do, we may examine only a small part of this population, which is called a sample. We do this with the aim of inferring certain facts about the population from results found in the sample, a process known as statistical inference. The process of obtaining samples is called sampling. Let's take a look at these examples below.

a. We may wish to draw conclusions about the weights of 12,000 adult students (the population) by examining only 100 students (a sample) selected from this population.

b. We may wish to draw conclusions about the percentage of defective bolts produced in a factory during a given 6-day week by examining 20 bolts each day produced at various times during the day. In this case, all bolts produced during the week comprise the population, while the 120 selected bolts constitute a sample.

c. We may wish to draw conclusions about the fairness of a particular coin by tossing it repeatedly. The population consists of all possible tosses of the coin. A sample could be obtained by examining, say, the first 60 tosses of the coin and noting the percentages of heads and tails.

d. We may wish to draw conclusions about the colors of 200 marbles (the population) in an urn by selecting a sample of 20 marbles from the urn, where each marble selected is returned after its color is observed.
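The counting rules of Lesson 3 also say how many different samples of a given size can be drawn, so they can be checked numerically alongside these examples. The sketch below is a minimal check in Python (3.8 or later, for `math.perm` and `math.comb`). The 5-item set and the helper names `circular_permutations` and `multiset_permutations` are illustrative assumptions, not from the handout.

```python
import math
from itertools import combinations, permutations


def circular_permutations(n: int) -> int:
    """Rule 3: arrangements of n distinct items in a circle, (n - 1)!."""
    return math.factorial(n - 1)


def multiset_permutations(*counts: int) -> int:
    """Rule 4: n! / (n1! n2! ... nk!) where n is the sum of the counts."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)  # each division is exact
    return total


items = "ABCDE"
n, r = len(items), 2

n_factorial = math.perm(n)   # Rule 1: n!
n_p_r = math.perm(n, r)      # Rule 2: n! / (n - r)!
n_c_r = math.comb(n, r)      # combinations: n! / (r! (n - r)!)

# Brute-force enumeration agrees with the formulas.
assert len(list(permutations(items, r))) == n_p_r
assert len(list(combinations(items, r))) == n_c_r

# The ratio of permutations to combinations is r!, the ways to order r chosen items.
assert n_p_r // n_c_r == math.factorial(r)

# Ordered samples of size r drawn WITH replacement: n ** r.
ordered_with_replacement = n ** r

# Distinct arrangements of the letters in "MISSISSIPPI" (1 M, 4 I, 4 S, 2 P).
mississippi = multiset_permutations(1, 4, 4, 2)

print(n_factorial, n_p_r, n_c_r, ordered_with_replacement,
      circular_permutations(4), mississippi)
```

Ordered draws without replacement are counted by nPr, unordered ones by nCr, and ordered draws with replacement by n to the power r.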

Sampling With or Without Replacement

If we draw an object from an urn, we have the choice of replacing or not replacing the object into the urn before we draw again. In the first case, a particular object can come up again and again, whereas in the second it can come up only once. Sampling where each member of a population may be chosen more than once is called sampling with replacement, while sampling where each member cannot be chosen more than once is called sampling without replacement. A finite population that is sampled with replacement can theoretically be considered infinite since samples of any size can be drawn without exhausting the population. For most practical purposes, sampling from a finite population that is very large can be considered as sampling from an infinite population.

Sampling Distribution

The sampling distribution describes the expected behavior of a large number of simple random samples drawn from the same population.

Sample Mean

Sampling Distribution of Means
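A small simulation can make these ideas concrete. The sketch below assumes a hypothetical population of the integers 1 to 100 and a fixed random seed. The sample sizes (20 and 25) and the number of repeated samples (2000) are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical finite population

# Without replacement: each member can appear at most once.
without_replacement = rng.sample(population, 20)

# With replacement: the same member may be drawn more than once.
with_replacement = rng.choices(population, k=20)

# Sampling distribution of the mean: draw many samples of size 25
# (with replacement) and record the mean of each one.
sample_size = 25
sample_means = [
    statistics.fmean(rng.choices(population, k=sample_size))
    for _ in range(2000)
]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

print(f"population mean      = {pop_mean:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"sd of sample means   = {spread_of_means:.2f}")
print(f"pop sd / sqrt(n)     = {pop_sd / sample_size ** 0.5:.2f}")
```

In a run like this, the sample means should cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.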

Lesson 3.2: POINT ESTIMATION

Point Estimate

A point estimate of a parameter θ is a single number that can be regarded as a sensible value for θ. A point estimate is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called the point estimator of θ.
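As a minimal sketch of this idea, suppose θ is the true length of a part and the sample below holds eight measurements of it. The values are hypothetical and chosen only for illustration. Two different statistics, the sample mean and the sample median, each give a point estimate of the same θ.

```python
import statistics

# Hypothetical repeated measurements of one quantity (units arbitrary).
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.7]

# Estimator 1: the sample mean.
mean_estimate = statistics.fmean(sample)

# Estimator 2: the sample median (average of the two middle values here).
median_estimate = statistics.median(sample)

print(f"point estimate from the mean:   {mean_estimate:.3f}")
print(f"point estimate from the median: {median_estimate:.3f}")
```

The two estimators give slightly different numbers from the same data, which is why the choice of a "suitable statistic" matters.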

Unbiased Estimators

Suppose we have two measuring instruments; one instrument has been accurately calibrated, but the other systematically gives readings smaller than the true value being measured. When each instrument is used repeatedly on the same object, because of measurement error, the observed measurements will not be identical. However, the measurements produced by the first instrument will be distributed about the true value in such a way that on average this instrument measures what it purports to measure, so it is called an unbiased instrument. The second instrument yields observations that have a systematic error component or bias.

Note: A point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ. If θ̂ is not unbiased, the difference E(θ̂) − θ is called the bias of θ̂.

Point Estimates and Interval Estimates

An estimate of a population parameter given by a single number is called a point estimate of the parameter. An estimate of a population parameter given by two numbers between which the parameter may be considered to lie is called an interval estimate of the parameter.

Note: A statement of the error or precision of an estimate is often called its reliability.

Lesson 4: Introduction to Statistics

Statistics - is defined as a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of masses of numerical data.

What is the use of statistics?

(1) Statistics helps in providing a better understanding and exact description of a phenomenon of nature.

(2) Statistics helps in the proper and efficient planning of a statistical inquiry in any field of study.

(3) Statistics helps in collecting appropriate quantitative data.

(4) Statistics helps in presenting complex data in a suitable tabular, diagrammatic, and graphic form for easy and clear comprehension of the data.

(5) Statistics helps in understanding the nature and pattern of variability of a phenomenon through quantitative observations.

(6) Statistics helps in drawing valid inferences, along with a measure of their reliability, about the population parameters from the sample data.

Descriptive statistics - is the term given to the analysis of data that helps describe, show, or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics is at the heart of all quantitative analysis. Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analyzed or reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.

Typically, there are two general types of statistic that are used to describe data: Measures of Central Tendency and Measures of Variability.
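Both kinds of measure can be computed with Python's standard `statistics` module. The seven data values below are hypothetical. The quartiles use the module's default "exclusive" method, which is only one of several conventions and may differ from the one used in class. The variance and standard deviation shown are the sample versions (divisor n − 1), with the population variance (divisor n) given for comparison.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data set, already in numerical order

# Measures of central tendency
mean = statistics.mean(data)      # sum of the values / number of values
median = statistics.median(data)  # middle value of the ordered list
mode = statistics.mode(data)      # most frequent value

# Measures of variability
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)    # quartile cut points
iqr = q3 - q1                                   # interquartile range
sample_variance = statistics.variance(data)     # divisor n - 1
population_variance = statistics.pvariance(data)  # divisor n
sample_sd = statistics.stdev(data)              # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"sample variance={sample_variance}, sample sd={sample_sd:.4f}")
print(f"population variance={float(population_variance):.4f}")
```

For this data set, the default quartile method agrees with the "median of each half" approach, but the two can differ on other data.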

Measures of Central Tendency - use a single value to describe the center of a data set. The mean, median, and mode are the three measures of central tendency.

1. Mean - is the arithmetic average, calculated by finding the sum of the study data and dividing it by the total number of data values.

2. Median - is the middle value of the distribution. It is calculated by first listing the data in numerical order and then locating the value in the middle of the list.

Odd set of data - the middle value

Even set of data - the average of the two middle values

3. Mode - is the value that appears most frequently in the set of data.

Measures of Variation - indicate how spread out the study data is from a central value, i.e. the mean.

The following are the commonly used measures of variation:

1. Range - the difference between the maximum and minimum data values.

2. Interquartile Range - quartiles divide the range of values into four parts, each containing one quarter of the values. The difference between Q3 and Q1 is called the interquartile range. As in finding the median, it is necessary to list the values in numerical order. In case Q1 or Q3 falls between 2 values, get their average.

3. Variance - in statistics is a measurement of the spread between numbers in a data set. It is the average of the squared differences between each value and the mean, so it measures how far each number in the set is from the mean and therefore from every other number in the set.

4. Standard Deviation - is defined as the square root of the variance.

NOTE:

Kurtosis - the sharpness of the peak of a frequency-distribution curve.

Skewness - the measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.

Lesson 5: Curve Fitting, Regression, and Correlation

Curve Fitting

The general problem of finding equations of approximating curves that fit given sets of data is called curve fitting.

Underfitting (high bias, low variance) - too simple to explain the variance. If we have underfitted, this means that the model function does not have enough complexity (parameters) to fit the true function correctly.

Overfitting (low bias, high variance) - force-fitting, too good to be true. If we have overfitted, this means that we have too many parameters to be justified by the actual underlying data and have therefore built an overly complex model.

Regression - is a statistical method used to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).

Simple Linear Regression

In statistics, simple linear regression is a linear regression model with a single explanatory variable.

That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the x and y coordinates in a Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable.

The simplest situation is a linear or straight-line relation between a single input and the response. Say the input and response are x and y, respectively. For this simple situation:

E(Y) = α + βx

Where: α and β are constant parameters that we want to estimate, called the regression coefficients. From a sample consisting of n pairs of data (x, y), we calculate estimates, a for α and b for β.

The following are called the least-squares equations or normal equations for estimating the coefficients, a and b, from the points (xi, yi). In standard form:

Σy = an + bΣx

Σxy = aΣx + bΣx²

Referring to the equation y = a + bx, if we get the equation of regression given by the points (xi, yi), these are the formulas you need to remember (obtained by solving the normal equations):

b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)

a = (Σy − bΣx) / n
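The least-squares estimates for the line y = a + bx can be computed directly from sums over the data. The sketch below is a minimal illustration in plain Python, using the standard textbook solution of the normal equations. The function name `fit_line` and the five data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a + b*x.

    Solves the normal equations
        sum(y)  = a*n      + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are equal; the line would be vertical")
    b = (n * sxy - sx * sy) / denom  # slope
    a = (sy - b * sx) / n            # intercept
    return a, b


# Hypothetical sample of n = 5 pairs (x, y)
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

a, b = fit_line(xs, ys)
print(f"fitted line: y = {a:.2f} + {b:.2f}x")

# Predicted value of the dependent variable at x = 6
prediction = a + b * 6
print(f"predicted y at x = 6: {prediction:.2f}")
```

The guard on `denom` reflects the "non-vertical straight line" condition: if every x is the same, no slope can be estimated.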

estimat...