Biostat TA ans PDF

Title Biostat TA ans
Course Biostatistiek
Institution Rijksuniversiteit Groningen
Pages 111



Description

DAY 1

Exercise 1.1 Answer the following questions.

a. Use an example to explain the difference between a population and a sample.
A population is the entire group of observations, e.g. the ages of all RUG students in the year 2020-2021 → every student in RUG must be measured. A sample is a part of a population whose characteristics are used to make a statement (approximation) about those characteristics in the entire population, e.g. the ages of the RUG Pharmacy students in 2020-2021 → this is a subset, but NOT a good representative of all RUG students. A sample should be selected randomly and independently, with or without replacement. The sample standard deviation is only an estimate of the population standard deviation, not its true value.

b. Indicate which symbols are used for the mean and the standard deviation in a population and in a sample.
Population: µ (mean) and σ (standard deviation). Sample: x̄ (mean) and s (standard deviation). Take note of the correct notation → the two sets are NOT interchangeable.

c. Think of a numerical example that clearly illustrates the differences between mode, median and mean.
The mode is the most frequent value. The median is the middle value of the ranked data. The mean is the sum of all figures divided by the number of figures. For the following seven figures: 2 5 3 4 2 2 6, the mode is 2, the median is 3 and the mean is 24/7 = 3.4. The mode can be used for all types of variables that can be categorized (nominal variables). The median can be used for all variables that can be ranked (ordinal variables). The mean can be used for continuous variables with interval or ratio scales. You need to understand both the definition and the function of each measure.

d. Explain the meaning of variance and standard deviation of a set of numbers.
The variance is the mean squared deviation of the individual observations from their mean. It is a measure of the variation (= dispersion) of the data → the extent to which the observations differ from one another. A disadvantage of the variance is that it is expressed in units that differ from the units of the original data (squared versus non-squared). For that reason, the standard deviation is often used in practice. This is the square root of the variance, and because it has the same units as the observations, the standard deviation intuitively gives a better idea of the dispersion of the observations around the mean.

Exercise 1.2 The following variables differ in scale:
Speed: 3 m/s, 5 m/s, 10 m/s
Time: 1950, 1970, 1999
Exam grades: 5, 6, 7
Colours: yellow, blue, red, green
Determine for each variable whether the scale is a nominal, ordinal, interval or a ratio scale. Give reasons for your answer.
Speed: Quantitative; ratio scale (set unit; the zero point is fixed) → metres per second.
Time: Quantitative; interval scale (set unit; the zero point is arbitrary) → only one variable (time).
Exam grades: Qualitative; ordinal (with ranking).
Colour: Qualitative; nominal (only categories; another example is sex).

Exercise 1.3 A student measures the phosphate concentration of a solution. To increase the accuracy, he carries out the measurement three times, giving him results of 1.22, 1.40 and 1.30 mg/L. In his report, the student claims that the phosphate concentration is 1.30667 mg/L. Is it permitted to state the concentration in this manner? Give reasons for your answer.
No, the student does not state that this is the mean of 3 measurements (it must be reported as an average reading → it is not a definitive value). Furthermore, the suggested precision is much higher than the student can justify. The value should be written with the smallest number of significant figures found in the measurements → 3. The answer should be: 1.31 mg/L.

Exercise 1.4 The following values were obtained from a contents measurement: 100.3250; 99.9962; 100.0123; 100.2543 and 100.1234. Calculate the mean and the standard deviation. What is the number of significant figures of these observations? Round the measurements.
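The mean and sample standard deviation asked for in Exercise 1.4 can be checked with a short script. This is a minimal sketch using Python's standard library; the variable names are illustrative, and the rounding step of the exercise is left to the reader.

```python
import statistics

# The five contents measurements from Exercise 1.4
values = [100.3250, 99.9962, 100.0123, 100.2543, 100.1234]

mean = statistics.mean(values)
# statistics.stdev is the sample standard deviation (divides by n - 1)
sd = statistics.stdev(values)

print(f"mean = {mean:.5f}")
print(f"s    = {sd:.5f}")
```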

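Exercise 1.5 Someone calculates the standard deviation of a series of 10 observations. He finds the following intermediate results: ΣX = 24 and ΣX² = 56. How can he tell that he has made a calculation error?
The variance is Sxx / (n − 1), and Sxx is a sum of squares, so it cannot be negative. Here, with X̄ = 24/10 = 2.4:

$$S_{xx} = \sum (X - \bar{X})^2 = \sum X^2 - n\bar{X}^2 = 56 - 10 \times 2.4^2 = -1.6$$

A negative Sxx is impossible, so at least one of the intermediate results must be wrong.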
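The consistency check from Exercise 1.5 can be sketched in a few lines. The variable names below are illustrative assumptions; the only rule used is that a sum of squared deviations can never be negative.

```python
# Intermediate results reported in Exercise 1.5
n = 10
sum_x = 24     # sum of the observations
sum_x2 = 56    # sum of the squared observations

mean = sum_x / n
# Sum of squared deviations: S_xx = sum(X^2) - n * mean^2
s_xx = sum_x2 - n * mean ** 2

# A valid data set always gives S_xx >= 0
is_valid = s_xx >= 0
print(f"S_xx = {s_xx:.1f}, consistent: {is_valid}")
```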

Exercise 1.6 A researcher wants to use a sample to find out the mean income of pharmacists. He realizes that there is a big difference in income between managing pharmacists, second pharmacists, hospital pharmacists, pharmacists working in industry and pharmacists who work in the public sector.

a. He wants to take a representative sample of 100 pharmacists from the total population of approximately 2,100 pharmacists. How will he do this?
Separate the pharmacists according to their income group (first categorise them by the income range that fits the criteria) and determine the percentage of pharmacists in each group. From each group, select a number of pharmacists that corresponds to the percentage found for that group → see which group (low/mid/high) is the majority, then split accordingly. This ensures that no group is overrepresented or underrepresented. A simple random sample from the whole population is not used, because the results may be biased.

b. He decides to take a sample of 30 from the group of 630 second pharmacists. Which pharmacists will be selected?
The second pharmacists are numbered 1 to 630. Using random numbers, a random sample is taken (the pharmacists must be selected independently; the criterion "second pharmacist" is already fixed). We need three-digit numbers, e.g. the first 3 digits in columns 1 and 2 of a random number table; numbers over 630 are excluded: (993); 502; 261; (813); (697); (858); 196; etc., until there are 30 usable numbers.

Exercise 1.7 Use the example of weighing scales to explain that a precision instrument is not necessarily an accurate instrument. Use the terms accuracy, precision and bias.
Precision instrument: high precision means little dispersion. Repeated weighings are very similar → but this does not mean that the values are close to the actual value. For example: readings of 110 mg, 112 mg, 111 mg, 110.50 mg and 111.10 mg are close to each other, but the true value is 150 mg. If the mean of these weighings is close to the true weight, the apparatus is also accurate (no systematic error). If the mean weight deviates from the true value, a systematic error (bias) exists.

Exercise 1.8 The weight (in kilograms) of 129 men was determined. The obtained weights were ranked from low to high.
a. Calculate the mean, the median, the variance and the standard deviation.

b. Subdivide the weights into classes with a class width of 3 kg. Start with the group 57, 58, 59. Create a table in which you state the frequency and the cumulative relative frequency for each class. Draw a histogram and a cumulative frequency polygon. What is the mode?

Mode: Class 75-77 kg with class mean: 76 kg

DAY 2

Exercise 2.1 Of a group of patients, 25% are given a medicinal drug. Of the entire group of patients, 10% develop headaches. Of the patients who were given the medicinal drug (condition), 25% develop headaches. What percentage of patients with headaches have taken a medicinal drug?

Tips to approach a probability question:
1. Analyse the given variables → let A be the first event, B the second event, etc.
2. Write down the known information as numerical values → convert from %.
3. Express the problem in probability notation → a conditional probability is indicated by phrases such as "of this sample [outcome A], [probability] also has [outcome B]".
4. Use the appropriate equation.

Pr(A) = 0.25 (medicinal drug)
Pr(B) = 0.10 (headache)
Pr(B|A) = 0.25 (headache, given that the medicinal drug was taken)

$$\Pr(A \cap B) = \Pr(B|A) \times \Pr(A) = 0.25 \times 0.25 = 0.0625$$

$$\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{0.0625}{0.1} = 0.625 = 62.5\%$$
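The two steps of Exercise 2.1 (product rule, then the definition of conditional probability) can be written out as a small sketch. The variable names are illustrative assumptions, not notation from the course.

```python
# Known probabilities from Exercise 2.1
p_drug = 0.25                  # Pr(A): patient received the drug
p_headache = 0.10              # Pr(B): patient develops a headache
p_headache_given_drug = 0.25   # Pr(B|A)

# Product rule: Pr(A and B) = Pr(B|A) * Pr(A)
p_both = p_headache_given_drug * p_drug

# Conditional probability: Pr(A|B) = Pr(A and B) / Pr(B)
p_drug_given_headache = p_both / p_headache

print(f"Pr(A and B) = {p_both:.4f}")
print(f"Pr(A|B)     = {p_drug_given_headache:.3f}")
```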

Exercise 2.2 In Groningen, 40% of the population have blue eyes, 30% have blond hair and 20% of the population have both blue eyes and blond hair. Based on this information, show that blue eyes and blond hair do not occur independently in the population.
If A and B were independent, the product rule would give Pr(A ∩ B) = Pr(A) * Pr(B) = 0.4 * 0.3 = 0.12. The observed value is Pr(A ∩ B) = 0.20 ≠ 0.12. Therefore, A and B are not independent.

Exercise 2.3 A batch of 1,000 ampoules was prepared. Of these ampoules, 1% are non-sterile. A random sample of 5 ampoules is taken without putting the ampoules back. What is the probability that the batch will be approved? (Hint: none of the 5 ampoules are non-sterile.)
Of the 1,000 ampoules, 990 are good. This means that when taking the first ampoule, the probability of finding a good one is 990/1,000. Once the first ampoule has been taken, there will be 999 ampoules left, of which 989 are good. The probability of taking a good ampoule the second time is therefore 989/999, etc. Because the ampoules are not put back, each factor is a probability conditional on the previous draws, and the 5 factors are multiplied:
Pr(approval) = 990/1,000 * 989/999 * 988/998 * 987/997 * 986/996 = 0.951
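As a cross-check of Exercise 2.3, the same probability can be computed both as the step-by-step product and as a ratio of combinations (the hypergeometric probability of drawing 0 non-sterile ampoules). This is a minimal sketch with illustrative variable names.

```python
from math import comb

total, good, sample_size = 1000, 990, 5

# Step-by-step product: each draw removes one good ampoule from the batch
p_sequential = 1.0
for i in range(sample_size):
    p_sequential *= (good - i) / (total - i)

# Equivalent counting argument: ways to pick 5 good / ways to pick any 5
p_approval = comb(good, sample_size) / comb(total, sample_size)

print(f"Pr(approval) = {p_approval:.3f}")
```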

Exercise 2.4 A sample of 50 patients is prescribed the medicinal drug DUISAL. It is known that 90% of patients taking DUISAL are cured and that the drug causes dizziness as a side effect in 5% of patients. a. What is the probability that more than 5 patients in the sample will not be cured?

Binomial distribution → two outcomes (cured / not cured), and the patients are selected independently.

b. What is the probability that more than 6 patients in the sample will experience dizziness as a side effect?
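Both parts of Exercise 2.4 are upper-tail binomial probabilities with n = 50: p = 0.10 for "not cured" and p = 0.05 for dizziness. The sketch below computes them from the binomial formula; the helper names `binom_pmf` and `binom_tail_gt` are hypothetical and defined here only for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_tail_gt(k, n, p):
    """Pr(X > k) = 1 - Pr(X <= k)."""
    return 1 - sum(binom_pmf(i, n, p) for i in range(k + 1))

# a. More than 5 of 50 patients not cured (p = 1 - 0.90 = 0.10)
p_a = binom_tail_gt(5, 50, 0.10)

# b. More than 6 of 50 patients dizzy (p = 0.05)
p_b = binom_tail_gt(6, 50, 0.05)

print(f"a. Pr(X > 5) = {p_a:.4f}")
print(f"b. Pr(X > 6) = {p_b:.4f}")
```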

Exercise 2.5 2,423 samples of 5 mice were taken of a population of mice. Of this population 40% is infected. The number of infected mice was recorded for each sample. The data are shown in the table below.

a. Create a graph of the frequency distribution.
b. What is the theoretical frequency distribution of the sample called and which parameters characterize it? Calculate these parameters and draw the theoretical frequency distribution in the graph produced in part a.
Use the binomial distribution equation to obtain the expected number for each outcome → expected number = probability * number of samples (2,423).
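The expected frequencies for Exercise 2.5 follow from a binomial distribution with n = 5 mice per sample and p = 0.4. The sketch below is an illustration under that assumption; the observed counts from the exercise table are not reproduced here, so only the theoretical side is computed.

```python
from math import comb

n_mice = 5          # mice per sample
p_infected = 0.4    # proportion infected in the population
n_samples = 2423    # number of samples taken

# Expected number of samples containing k infected mice, k = 0..5
expected = [
    n_samples * comb(n_mice, k) * p_infected ** k * (1 - p_infected) ** (n_mice - k)
    for k in range(n_mice + 1)
]

# Mean and variance of the binomial distribution
mean = n_mice * p_infected
variance = n_mice * p_infected * (1 - p_infected)

for k, e in enumerate(expected):
    print(f"{k} infected: expected {e:.1f} samples")
print(f"mean = {mean:.1f}, variance = {variance:.1f}")
```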

Exercise 2.6 A filter from a Laminar Air Flow Cabinet lets an average of 3.5 particles through per m3 of air. One m3 of air is being tested. What is the probability of detecting more than 5 particles in this volume of air?

Poisson distribution → random events counted in a fixed time or space, from an effectively infinite population.
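For Exercise 2.6, the probability of more than 5 particles is one minus the sum of the Poisson probabilities for 0 to 5 particles, with mean 3.5. A minimal sketch with illustrative variable names:

```python
from math import exp, factorial

lam = 3.5  # mean number of particles per cubic metre

# Pr(X <= 5) = sum over k = 0..5 of exp(-lam) * lam^k / k!
p_at_most_5 = sum(exp(-lam) * lam ** k / factorial(k) for k in range(6))
p_more_than_5 = 1 - p_at_most_5

print(f"Pr(X > 5) = {p_more_than_5:.4f}")
```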

Exercise 2.7 It is known that a test to diagnose lung cancer has a sensitivity of 78% and a specificity of 98%. Lung cancer is diagnosed in 1% of the population.

a. Calculate the predictive values of the test.
b. What is the probability of a test being positive in the population of patients with lung cancer?
c. What is the probability of a test being negative in the population of patients with lung cancer?
d. If, following the test, lung cancer is detected in a patient, the patient will be referred for radiotherapy. Is the diagnostic test suitable to be applied to all 76,369 women in Groningen? Explain your answer.

Predictive values: diseased/total positive = 28.3% and healthy/total negative = 99.8%.
The probability of a positive test in the population of diseased individuals is a/(a+b) = 78% (sensitivity).
The probability of a negative test in the population of diseased individuals is b/(a+b) = 22%.

            positive   negative    total
diseased         596        168      764
healthy        1,512     74,093   75,605
total          2,108     74,261   76,369

Therefore, the number of false positives is 1,512 (healthy women who would be referred for radiotherapy) → too many, and testing everyone is expensive. The number of false negatives is 168 (women who have lung cancer but are not diagnosed) → a large number, since lung cancer is a severe, lethal disease and these patients would remain untreated. The test is therefore not suitable for all women in Groningen; its sensitivity must be improved.
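The 2×2 table and the predictive values of Exercise 2.7 can be rebuilt from the prevalence, sensitivity and specificity. The function `two_by_two` below is a hypothetical helper written for this illustration; it rounds counts to whole persons, which is an assumption about how the table was constructed.

```python
def two_by_two(n, prevalence, sensitivity, specificity):
    """Return (true pos, false neg, false pos, true neg) as whole persons."""
    diseased = round(n * prevalence)
    healthy = n - diseased
    tp = round(diseased * sensitivity)        # diseased, test positive
    fn = diseased - tp                        # diseased, test negative
    fp = round(healthy * (1 - specificity))   # healthy, test positive
    tn = healthy - fp                         # healthy, test negative
    return tp, fn, fp, tn

tp, fn, fp, tn = two_by_two(76369, 0.01, 0.78, 0.98)

ppv = tp / (tp + fp)   # positive predictive value
npv = tn / (tn + fn)   # negative predictive value

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```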

DAY 3

Exercise 3.1 For each of the following variables, state whether you would expect a binomial, Poisson or normal distribution. Explain your answer.
a. The number of viral infections contracted by an individual in one year
b. The dissociation time of tablets
c. The number of women in a sample of 20 patients taken from a population that has a male : female ratio of 2:3
d. The number of new mutations in a bacterial culture that occurs within a certain time period.

a. Poisson distribution → infections are not controlled and occur spontaneously; they are counted as events within a fixed period of time, and the count has no fixed upper limit.
b. Normal distribution (chosen by excluding the other distributions) → dissociation time is a continuous measurement that varies over a range of values across the tablets.
c. Binomial distribution → only two outcomes (female or male), a fixed sample size of 20, and patients selected independently.
d. Poisson distribution → mutations are spontaneous and random, so their number cannot be fixed in advance; they are counted within a time period.

Exercise 3.2 The weights of a population of tablets show a normal distribution with μ = 1,000 mg and σ = 10 mg.
a. What percentage of tablets has a weight of more than 1,010 mg?
b. What percentage of tablets has a weight of more than 930 mg?
c. Above which lower limit in tablet weight do 20% of the population lie? (→ solve for X)
d. Above which lower limit in tablet weight do 75% of the population lie? (→ solve for X)
e. Between which limits (choose symmetrically) do 25% of the population lie?
f. Between which limits (choose symmetrically) do 98% of the population lie?
g. What percentage of tablets have a weight between 990 and 1,025 mg?
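The questions of Exercise 3.2 can be explored with `statistics.NormalDist` from Python's standard library (available from Python 3.8). This is a sketch rather than a model answer: parts a, b and g use the cumulative distribution, while parts c to f invert it to find limits.

```python
from statistics import NormalDist

weights = NormalDist(mu=1000, sigma=10)  # tablet weights in mg

a = 1 - weights.cdf(1010)                # fraction heavier than 1,010 mg
b = 1 - weights.cdf(930)                 # fraction heavier than 930 mg
c = weights.inv_cdf(0.80)                # 20% of tablets lie above this weight
d = weights.inv_cdf(0.25)                # 75% of tablets lie above this weight
# Symmetric central 25%: from the 37.5th to the 62.5th percentile
e = (weights.inv_cdf(0.375), weights.inv_cdf(0.625))
# Symmetric central 98%: from the 1st to the 99th percentile
f = (weights.inv_cdf(0.01), weights.inv_cdf(0.99))
g = weights.cdf(1025) - weights.cdf(990)  # fraction between 990 and 1,025 mg

print(f"a. {a:.1%}   b. {b:.1%}")
print(f"c. {c:.1f} mg   d. {d:.1f} mg")
print(f"e. {e[0]:.1f} to {e[1]:.1f} mg   f. {f[0]:.1f} to {f[1]:.1f} mg")
print(f"g. {g:.1%}")
```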

Exercise 3.3 Codeine tablets are produced from a homogeneous mixture that contains exactly 10.0% codeine. A large number of tablets are weighed: the mean weight is 102.1 mg and the standard deviation is 1.8 mg. What is the mean codeine content of a tablet and what is the standard deviation of this content? What is the probability of there being a tablet in this batch with a codeine content of 10.57 mg or more?
It is a linear transformation: Y = aX. The codeine content is 10%, which means a = 0.1. Thus µ_codeine = a × µ_tablet and σ²_codeine = a² × σ²_tablet, so:
σ_codeine = |a| × σ_tablet = 0.1 × 1.8 = 0.18
µ_codeine = 0.1 × µ_tablet = 0.1 × 102.1 = 10.21

$$Z = \frac{10.57 - 10.21}{0.18} = 2.00; \quad \Pr(Z > 2.00) = 0.023$$
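The linear transformation and the tail probability of Exercise 3.3 can be reproduced as follows. It is a minimal sketch; the variable names are illustrative, and the tablet weights are assumed to be normally distributed, as the use of Z in the answer implies.

```python
from statistics import NormalDist

a = 0.10                                 # fraction of codeine in the mixture
mu_tablet, sigma_tablet = 102.1, 1.8     # tablet weight in mg

# Linear transformation Y = aX: the mean scales by a, the SD by |a|
mu_codeine = a * mu_tablet
sigma_codeine = abs(a) * sigma_tablet

z = (10.57 - mu_codeine) / sigma_codeine
p_high = 1 - NormalDist().cdf(z)         # Pr(codeine content >= 10.57 mg)

print(f"mu = {mu_codeine:.2f} mg, sigma = {sigma_codeine:.2f} mg")
print(f"Z = {z:.2f}, Pr(Z > z) = {p_high:.3f}")
```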

Exercise 3.4 A supplier supplies tritium-labelled flunitrazepam that has an activity of 5,500 Becquerel/ml (1 Bq is the decay of 1 atomic nucleus per second). Radioactivity follows a Poisson distribution. Between which limits do 95% of the population lie? (Hint: use the normal distribution.)
µ >> 15, and therefore the Poisson distribution may be approximated by a normal distribution. In a Poisson distribution, the variance is equal to the mean: σ² = µ = 5500, so σ = √5500 = 74.16.
X ~ N(5500, 5500)

$$X = \mu \pm (1.96 \times \sigma) = 5500 \pm (1.96 \times 74.16); \quad 5354 < X < 5645$$
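The normal approximation used in Exercise 3.4 can be sketched as below. The variable names are illustrative; the code uses the exact 97.5th percentile of the standard normal distribution instead of the rounded value 1.96, so the limits may differ slightly in the decimals.

```python
from math import sqrt
from statistics import NormalDist

mu = 5500                 # mean count per second per ml
sigma = sqrt(mu)          # Poisson: variance equals the mean

z = NormalDist().inv_cdf(0.975)   # about 1.96 for a central 95% interval
lower = mu - z * sigma
upper = mu + z * sigma

print(f"sigma = {sigma:.2f}")
print(f"95% limits: {lower:.1f} to {upper:.1f}")
```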

Exercise 3.5 When carrying out a titration three times, we obtained the following results: 99.135, 98.857 and 98.812 mg. What is the number of decimal places to which these results are rounded when presenting the data? Calculate the mean and the standard error. How many decimal places should we use to present the mean?

Data: {99.135; 98.857; 98.812} → standard deviation: s = 0.174947 [Note: n < 10]
For the observations we look at s/2 = 0.087, which gives 2 decimal places, plus 1 because n < 10. The results are therefore rounded to 3 decimal places and not to more.
se (standard error) = s/√3 = 0.1749/√3 = 0.1010; se/2 = 0.05 → 2 decimal places + 1 (n < 10). The mean is therefore also rounded to 3 decimal places: 98.935.
With the observations we look at s/2, but with the mean we look at se/2.

Exercise 3.6 A testing laboratory carries out a pharmacopoeia test using 2 different samples. The results are shown below.
Sample A: 91.82, 91.62
Sample B: 93.50, 93.17, 93.67, 93.10, 93.41
a. Calculate the mean and the standard deviation for both samples.
b. What is the standard error of the difference between the means?
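The quantities asked for in Exercise 3.6 can be computed with the sketch below. Two common definitions of the standard error of the difference are shown, because the source does not say which one the course uses: the pooled version assumes equal variances in both samples, the unpooled version does not. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

sample_a = [91.82, 91.62]
sample_b = [93.50, 93.17, 93.67, 93.10, 93.41]

n_a, n_b = len(sample_a), len(sample_b)
mean_a, mean_b = mean(sample_a), mean(sample_b)
var_a, var_b = variance(sample_a), variance(sample_b)   # sample variances (n - 1)
sd_a, sd_b = sqrt(var_a), sqrt(var_b)

# Assumption 1: equal variances, pooled over both samples
var_pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
se_pooled = sqrt(var_pooled * (1 / n_a + 1 / n_b))

# Assumption 2: unequal variances, each sample contributes its own term
se_unpooled = sqrt(var_a / n_a + var_b / n_b)

print(f"A: mean = {mean_a:.2f}, s = {sd_a:.4f}")
print(f"B: mean = {mean_b:.2f}, s = {sd_b:.4f}")
print(f"se(difference), pooled   = {se_pooled:.4f}")
print(f"se(difference), unpooled = {se_unpooled:.4f}")
```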

DAY 4

Exercise 4.1 A manufacturer tests a batch of aspirin, which has to contain at least 98% acetylsalicylic acid, by performing a titration in duplicate. The standard deviation of the titration is known to be 0.5%. The established acetylsalicylic acid contents were 98.7% and 99.1%.

a. Test whether the batch can be approved (meaning µ > 98%). State the hypotheses. Calculate the p-value and state your conclusions. Also test the approval using the confidence interval.
Note: H0 is the initial claim and HA is the claim that challenges it. HA contains the claim that we wish to demonstrate to be correct, so the test is set up such that rejecting H0 supports HA.

x1 = 98.7 and x2 = 99.1

The question is one-sided. The manufacturer wants to approve the batch (µ > 98). Thus, µ > 98 is placed in HA, with H0: µ = 98. The manufacturer wants to be very sure that he supplies a good product.

With x̄ = 98.9 and se = σ/√n = 0.5/√2 = 0.354, Z = (98.9 − 98)/0.354 = 2.55, so the one-sided p-value is about 0.005, which is smaller than 0.05. The lower limit of the one-sided 95% confidence interval is 98.9 − 1.645 × 0.354 = 98.32%. H0 (µ = 98) is not within the CI; therefore H0 must be rejected and HA must be accepted.
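The one-sided Z-test of part a, with known σ, can be sketched as follows. The function name `one_sided_z_test` is a hypothetical helper introduced for this illustration; it tests H0: µ = µ0 against HA: µ > µ0 and also returns the lower limit of the one-sided confidence interval.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sided_z_test(values, mu0, sigma, alpha=0.05):
    """Z-test of H0: mu = mu0 against HA: mu > mu0 with known sigma."""
    std_normal = NormalDist()
    xbar = mean(values)
    se = sigma / sqrt(len(values))            # standard error of the mean
    z = (xbar - mu0) / se
    p_value = 1 - std_normal.cdf(z)           # upper-tail probability
    lower_limit = xbar - std_normal.inv_cdf(1 - alpha) * se
    return z, p_value, lower_limit

# Duplicate titration, sigma = 0.5%
z, p_value, lower_limit = one_sided_z_test([98.7, 99.1], mu0=98.0, sigma=0.5)

print(f"Z = {z:.2f}, p = {p_value:.4f}")
print(f"one-sided 95% CI: mu > {lower_limit:.2f}")
```

Calling the same function with `sigma=1.0` shows how a less precise method changes Z, the p-value and the confidence limit for identical measurements.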


A more rapid spectrophotometric test has a standard deviation of 1%. Performing the test in duplicate results in the same values: 98.7% and 99.1%. b. Does this change the conclusion you reached under a.?

The standard error changes (se = 1/√2 = 0.707), so Z is different → calculate the new Z and find the corresponding p-value. Obtained p-value > 0.05: H0 cannot be rejected. The product cannot be approved.

c. How can you achieve the same performance for the spectrophotometric test as for the duplicate titrations?
Carry out more readings → this reduces the random error in the mean, so the standard error becomes smaller (less drastic fluctuation). With σ = 1%, n = 8 readings give the same standard error as the duplicate titration (1/√8 = 0.5/√2 = 0.354).

The batch is approved (established in the previous part) and supplied to a pharmacy. The pharmacist tests the batch using the spectrophotometric test (σ = 1%) and determines the duplicate values: 97.1% and 97.9%.

d. What hypotheses does the pharmacist draw up to test whether the batch meets the requirement (at least 98%)?
The product is assumed to be good → H0 states that it meets the requirement (µ ≥ 98). The pharmacist will only return the product if he can reject H0 (µ < 98) → the claim being challenged is that the product meets the requirement, which is already established or assumed to be true.
Therefore, µ < 98 in HA. The purchaser of the product will only reject the product and return it to the manufacturer if he is at least 95% certain that the product is not good.

e. Does the batch meet the requirement? P-value? Also test the approval using the confidence interval.
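The pharmacist's test points in the opposite direction (HA: µ < 98), so the p-value is a lower-tail probability and the one-sided confidence interval has an upper limit. The sketch below illustrates this under the stated assumptions (known σ = 1%, α = 0.05); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

values = [97.1, 97.9]      # pharmacist's duplicate measurements
mu0, sigma, alpha = 98.0, 1.0, 0.05

std_normal = NormalDist()
xbar = mean(values)
se = sigma / sqrt(len(values))

z = (xbar - mu0) / se
p_value = std_normal.cdf(z)                              # lower-tail probability
upper_limit = xbar + std_normal.inv_cdf(1 - alpha) * se  # one-sided 95% CI: mu < upper_limit

# H0 (mu >= 98) is rejected only if p < alpha, i.e. if 98 lies above the upper limit
reject_h0 = p_value < alpha

print(f"Z = {z:.2f}, p = {p_value:.3f}")
print(f"one-sided 95% CI: mu < {upper_limit:.2f}")
print(f"reject H0: {reject_h0}")
```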

What are your conclusions and recommendations to the pharmacist?

Exercise 4.2 In a clinical study comparing the effects of two medicinal drugs on blood pressure, 20 patients were tested with both drugs. The change in blood pressure compared to the baseline blood pressure was measured. The σ, measured as the difference between individuals, is estimated to be 5, based on previous observations.

a. If the statistical test is performed at a 5% level, what is the power of the test if you wish to see a difference of 3 mmHg (HA: μ1 - μ2 = 3 or μ1 - μ2 = -3)?
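The power in part a can be approximated with the normal distribution. The sketch below uses the standard two-sided formula for a Z-test on the mean difference, with se = σ/√n; this is an assumption about the intended method, and the function name `two_sided_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided Z-test to detect a true mean difference delta."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power = two_sided_power(delta=3, sigma=5, n=20)
print(f"power = {power:.3f}")
```

By symmetry, the same value is obtained for a true difference of −3.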

Two-sided → minus twice the Z value (for the cases diff = 3 and diff = -3).

b. How many patients must be tested to show a difference of 3 mmHg or more, using a β of 1% (power = 99%)?
The question remains two-sided (p = 0.01):

$$n = \left(\frac{\sigma}{\Delta}\right)^2 \left(Z_{1-\alpha/2} + Z_{1-\beta/2}\right)^2 = \left(\frac{5}{3}\right)^2 (1.96 + 2.576)^2 = 57.15 \Rightarrow 58$$

Exercise 4.3

Consider the following three problems. For each problem, state the null hypothesis and the alternative hypothesis. In each case, describe the meaning of type I and type II errors and which error is worse. Use this to determine whether you would recommend a significance level (α) of 0.01, 0.05 or 0.10. a. Someone has developed a new material to produce a reactor vessel for a nuclear power station. He claims that the new material is ch...

