BIOC0003 Data analysis test revision PDF

Title BIOC0003 Data analysis test revision
Course Experimental Biochemistry
Institution University College London
Pages 47



Description

BIOC0003 Data analysis test revision

Checklist

Unit conversion and constants

Stats (Dr Poolman): Formula, Glossary, Statistical methods, Tables, Exercises and answers

Acids, bases, and buffers (Dr Maréchal): Formula, Exercises and answers

Free Energy (G), Redox Potential and pH Electrodes (Dr Maréchal): Formula, Exercises and answers

Spectroscopy (Dr Maréchal and Dr Raleigh): Beer-Lambert Law, Absorbance, Plasmid

Protein Purification (Dr Katan): Calculations

Mass Spectra (Dr Thalassinos): Calculating mass from ESI spectra, FWHH (full width at half maximum height), TOF, Peptide sequencing (b, y, a ions), Amino acid mass sheet, Systematic approach (including tutorial 6)

Structure and Biophysics (Prof Andres Ramos): Binding affinity and kinetics, Förster resonance energy transfer (FRET), Isothermal titration calorimetry (ITC), Tools comparison

Bioinformatics and modelling (Prof Martin): Database analysis, Comparative modelling, Homology modelling

Practical 1 (Dr Cain): pKa determination, Extension questions

Tutorial 9

Amino acids

Unit conversion and constants

Stats (Dr Poolman) 

Formula

Standard deviation

The larger the standard deviation, the greater the variability

Standard error of the mean (SEM)

The greater the n, the smaller the SEM *SEM is always less than s (for n > 1)

Coefficient of variation (CV)

The smaller the CV, the greater the precision

Variance

s²

Confidence interval

A mathematical statement relating the sample mean to the population mean

*As the confidence level increases, the range of potential values for the population mean also increases
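The quantities listed under Formula can be computed as in the minimal sketch below. The data values are hypothetical, and the t value is assumed to be read from a table of Student's t (95% confidence, n - 1 = 4 degrees of freedom) rather than calculated.

```python
import math
import statistics

# Hypothetical replicate measurements (illustrative values, not from the notes)
data = [10.1, 9.9, 10.0, 10.2, 9.8]

n = len(data)
mean = statistics.mean(data)
variance = statistics.variance(data)  # sample variance s^2 (divides by n - 1)
s = statistics.stdev(data)            # sample standard deviation s
sem = s / math.sqrt(n)                # standard error of the mean
cv_percent = 100 * s / mean           # coefficient of variation, % of the mean

# Assumed table value: Student's t for 95% confidence and 4 degrees of freedom
t_crit = 2.776
half_width = t_crit * sem
ci = (mean - half_width, mean + half_width)

print(f"mean={mean:.3f} s={s:.4f} SEM={sem:.4f} CV={cv_percent:.2f}%")
print(f"95% CI: {ci[0]:.3f} to {ci[1]:.3f}")
```

Choosing a higher confidence level means a larger t value from the table, so the interval widens, which matches the note above.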



Types of t-test

Unpaired t-test

Tests whether two data sets have the same mean. Measures the overlap between the data sets, such that the smaller the value of tcalc, the greater the overlap between the two data sets

Paired t-test

Tests whether two data sets have the same mean where each value in one set is paired with a value in the other set

One-sample t-test

Tests whether the mean of a data set is equal to a particular value (e.g. analysis of a standard solution)
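A minimal sketch of how tcalc could be obtained for the three tests, assuming the usual textbook forms (pooled standard deviation for the unpaired test, which itself assumes similar variances in the two sets). The function names and data are hypothetical; tcalc would then be compared with the tabulated t value.

```python
import math
import statistics

def one_sample_t(data, known_value):
    """t_calc for comparing the mean of a data set with a known value."""
    n = len(data)
    return abs(statistics.mean(data) - known_value) * math.sqrt(n) / statistics.stdev(data)

def paired_t(a, b):
    """t_calc for paired data: a one-sample test on the differences against 0."""
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, 0.0)

def unpaired_t(a, b):
    """t_calc for two independent data sets using a pooled variance."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return abs(statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical measurements (illustrative only)
before = [5.1, 4.9, 5.0, 5.2, 4.8]
after = [5.4, 5.1, 5.4, 5.5, 5.1]

print(f"unpaired t_calc   = {unpaired_t(before, after):.3f}  (8 degrees of freedom)")
print(f"paired t_calc     = {paired_t(before, after):.3f}  (4 degrees of freedom)")
print(f"one-sample t_calc = {one_sample_t(before, 4.9):.3f}  (4 degrees of freedom)")
```

With the same numbers, the paired test gives a much larger tcalc than the unpaired test because pairing removes the variation between samples.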



Glossary

P value

The probability of obtaining a result equal to or more extreme than what was actually observed when the null hypothesis is true --> how likely a false positive is

Statistical power

Denotes how well a test avoids missing a real effect (1 – beta); it increases with the ability to correctly reject a false null hypothesis

True value

The accurate value that could be measured without error

Population

Complete set of all objects or people of interest (definite/indefinite)

Sample

A group selected from a population

Interquartile range (IQR)

Distance between the upper (75th percentile) and lower (25th percentile) quartiles

Standard deviation

A measure of the amount of dispersion of a set of values (variability)

Parameter

A numerical summary of a population; a value that establishes or limits how something can or must happen or be done



Statistical methods

Normal distribution (Gaussian distribution)

Defined by 2 parameters: population mean (μ) + population standard deviation (σ) (the area under the curve is normalised to 1)

The smaller the σ, the less likely it is to find values of a normally distributed random variable far from the population mean

Most observations are near the centre; the tails extend to infinity

The graph is symmetrical, so mean = median = mode

The median is less affected by outliers than the mean
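A short sketch, using Python's standard `statistics.NormalDist`, of how much of a normal distribution lies within a given number of standard deviations of the mean. The helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```

The value for two standard deviations is close to 0.95, which is the basis of the "approximately 95%" answer in the exercises.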

Symbol and equation

The more data points, the larger the sum --> divide by N or N-1 to normalise
Dividing by N --> population; dividing by N-1 --> sample
Units are different: variance --> unit is squared; standard deviation --> unit is not squared

Hypothesis method

Alternative hypothesis (H1)

A claim about the population that is contradictory to the null hypothesis

Null hypothesis (H0)

Assumed to be correct (no significant difference). Differences between sets of data are due to chance rather than predicted effects of interest

* Only reject / fail to reject the null hypothesis --> DO NOT ACCEPT H1

Type 1 error: the rejection of a true H0 --> false positive (alpha value)
Type 2 error: the non-rejection of a false H0 --> false negative (beta value)

One-tailed

Unidirectional prediction about the outcome

Two-tailed

No prediction about the direction of the outcome

Alpha level

Alpha level = likelihood that the true population parameter lies outside the confidence interval (usually expressed as a proportion)

Example graph: Confidence intervals are expected to span the population mean 19 out of 20 times for a 95% confidence level *The mean does not affect the width of the confidence interval



Tables

Values of Student's t

Critical values of F at the 95% confidence level



Exercises and answers

Q1. If the data form a "bell-shaped" normal distribution, approximately what percent of the observations will be contained within two standard deviations around the mean? A: 95%
Q2. In a hypothesis test, the probability of obtaining a value of the test statistic equal to or even more extreme than the value observed - given the null hypothesis is true - is referred to as? A: the p-value
Q3. If an individual rejects a true null hypothesis, then she/he has? A: made a type I error
Q4. A primary school has 335 children, in classes as follows; Reception: 60 Y1: 42 Y2: 59 Y3: 38 Y4: 57 Y5: 34 Y6: 40. From the above question, how many degrees of freedom are there? A: 6
Q5. The value in a data set that appears most frequently is called? A: the mode
Q6. If two events are both mutually exclusive and collectively exhaustive, the probability that one or the other occurs is? A: 1.00
Q7. In order to reduce the likelihood of committing a Type II error, the researcher could? A: increase the sample size
Q8. Which one of the following is the complement of alpha? A: the confidence coefficient
Q9. Which of the following statistics always corresponds to the 75th percentile in a distribution? A: 3rd quartile
Q10. If the p-value is greater than alpha in a two-tailed test? A: the null hypothesis should not be rejected
Q11. A t-test is a significance test that assesses? A: the means of two independent groups
Q12. A confidence interval estimate is constructed around? A: the point estimate
Q13. If the p-value is greater than alpha in a two-tailed test? A: the null hypothesis should not be rejected
Q14. If the mean of a numerical data set exceeds the median, the data are considered to be left-skewed. A: false
Q15. If the p-value is less than alpha in a one-tailed test? A: the null hypothesis should be rejected
Q16. A 95% confidence interval for the mean can be interpreted to mean that? A: if all possible samples are taken and confidence intervals are calculated, 95% of those intervals would include the true population mean somewhere in their interval
Q17. A height measurement of 6ft is at the 90th percentile. This means that? A: 90% of people are shorter than 6ft
Q18. The value of alpha for a 98% confidence interval would be? A: 0.02
Q19. When is a paired-samples t-test used? A: when comparing two separate groups that share a common feature
Q20. In a sample of size 25, in order to compute any sample statistic, how many sample values are free to vary? A: 24

Acids, bases, and buffers (Dr Maréchal)

Formula



Exercises and answers

Free Energy (G), Redox Potential and pH Electrodes (Dr Maréchal)

Formula



Exercise and answers

Spectroscopy (Dr Maréchal and Dr Raleigh)

Beer-Lambert Law



Exercises and answers

Q2. How many microlitres of your purified protein will you load per lane? Express your answer in microlitres to one decimal place.
A: Weight of protein per lane / concentration of purified protein = 10/0.64 = 15.6 µL

Q3. You have 15 µg of pUC19 plasmid DNA. The average weight of a nucleotide is 330 g/mol. How many moles of nucleotides is this? Express your answer in nanomoles to the nearest nmol.

Q4. You have 15 µg of pUC19 plasmid DNA, which is 2686bp in size. The average weight of a nucleotide is 330g/mol. How many moles of plasmid is this? Express your answer in picomoles to one decimal place
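The arithmetic behind Q3 and Q4 can be sketched as below. This is an illustration rather than the original model answer, and it assumes the plasmid is double-stranded, so each base pair contributes two nucleotides of 330 g/mol.

```python
# Values taken from the questions
dna_mass_g = 15e-6        # 15 µg of plasmid DNA
nucleotide_mw = 330.0     # average g/mol per nucleotide
plasmid_size_bp = 2686    # size of pUC19 in base pairs

# Q3: moles of nucleotides, expressed in nanomoles
nucleotide_nmol = dna_mass_g / nucleotide_mw * 1e9

# Q4: moles of plasmid, expressed in picomoles
# Assumption: double-stranded DNA, so 2 nucleotides per base pair
plasmid_mw = plasmid_size_bp * 2 * nucleotide_mw
plasmid_pmol = dna_mass_g / plasmid_mw * 1e12

print(f"Q3: {nucleotide_nmol:.0f} nmol of nucleotides")
print(f"Q4: {plasmid_pmol:.1f} pmol of plasmid")
```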

Protein Purification (Dr Katan)

Formula

Specific activity (SA) = activity/protein
Yield = activity/activity in previous step
Purification fold = SA/previous SA
Cumulative yield = activity/activity in step 1
Cumulative purification fold = SA/step 1 SA
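A minimal sketch of how the formulas above fill in a purification table. The step names, activities and protein amounts are hypothetical and are not the data from the exercise; `purification_table` is an illustrative helper.

```python
# Hypothetical steps: (name, total activity in units, total protein in mg)
steps = [
    ("Crude extract", 10000.0, 1000.0),
    ("Ion exchange", 8000.0, 200.0),
    ("Affinity", 6000.0, 30.0),
]

def purification_table(steps):
    """Apply the SA, yield and purification-fold formulas to each step."""
    first_activity = steps[0][1]
    first_sa = steps[0][1] / steps[0][2]
    prev_activity, prev_sa = first_activity, first_sa
    rows = []
    for name, activity, protein in steps:
        sa = activity / protein  # specific activity (units/mg)
        rows.append({
            "step": name,
            "specific_activity": sa,
            "yield_pct": 100 * activity / prev_activity,        # vs previous step
            "fold": sa / prev_sa,                                # vs previous step
            "cumulative_yield_pct": 100 * activity / first_activity,
            "cumulative_fold": sa / first_sa,
        })
        prev_activity, prev_sa = activity, sa
    return rows

table = purification_table(steps)
for row in table:
    print(f"{row['step']:<14} SA={row['specific_activity']:.0f} "
          f"yield={row['yield_pct']:.0f}% fold={row['fold']:.1f} "
          f"cum. yield={row['cumulative_yield_pct']:.0f}% cum. fold={row['cumulative_fold']:.1f}")
```

For the first step, yield is 100% and purification fold is 1 by definition, since it is compared with itself.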



Exercises and answers

Q1. The specific activity of enzyme at each step
Q2. The purification-fold and yield for each purification procedure (step) with respect to the previous step
Q3. The cumulative purification fold and yield after each procedure with respect to step 1, including the final purification fold and overall yield.

Q4. Which purification procedure used for this enzyme is most effective and which is least effective? Provide a brief explanation of your choice. Based on this, what changes would you recommend?
Most effective: Step 4 (purification 16-fold, yield 80%)
Least effective: Step 3 (decrease in purity/purification 0.42-fold, yield 33%). Step 3 should be omitted. Size exclusion chromatography may be either omitted or placed at a different stage (e.g. between ion exchange and affinity chromatography to avoid dialysis).

Q5. What could be done to confirm the identity and assess the purity of the final enzyme preparation?
- From the table, there is no increase in specific activity with further purification steps. Specific activity for a given enzyme at 100% purity is likely to be known and used for comparison.
- SDS PAGE: protein bands show enrichment of one band of predicted size after each purification step, with the final step showing just this band if the purification is complete.
- Mass spectrometry (MS) after the last step showing a single peak of the predicted size and sizes of derived peptide fragments
- Amino acid sequencing (by MS or other methods) shows the known primary structure
- Specific antibodies confirm the identity of the protein
- ELISA, size exclusion, enzyme assays etc.

Q6. Purification tables are no longer frequently used to document protein purification. Explain the main reasons for this change and relevant methodological advances.
Recombinant technology and tag-based affinity purification allow for a large purification fold in a single step. Often, the recombinant protein is the main protein present in such preparations. Analyses by SDS PAGE, mass spectrometry (advanced methods now routine) and use of specific antibodies (against the tag or the protein of interest) can be applied to identify the correct recombinant protein and assess purity. Functional assays for a given biological activity also need to be conducted for a purified preparation. In contrast to purification tables, where activity is measured in each fraction and their pools after each purification step, functional assays are usually performed at the end.

Mass Spectra (Dr Thalassinos)

Calculating mass from ESI Spectra

Example:
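As a hedged illustration of one common approach (not the worked example from the original notes), the sketch below assumes two adjacent peaks in an ESI spectrum come from the same molecule and differ by exactly one proton (charges n and n + 1). The peak positions are generated from a hypothetical 10 000 Da molecule, and the function names are illustrative.

```python
PROTON = 1.00728  # mass of a proton in Da

def mz(mass, charge):
    """m/z of a molecule of the given mass carrying `charge` protons."""
    return (mass + charge * PROTON) / charge

def mass_from_adjacent_peaks(mz_charge_n, mz_charge_n_plus_1):
    """Recover charge n and neutral mass M from two adjacent peaks.

    mz_charge_n is the peak at higher m/z (charge n);
    mz_charge_n_plus_1 is its neighbour at lower m/z (charge n + 1).
    """
    n = round((mz_charge_n_plus_1 - PROTON) / (mz_charge_n - mz_charge_n_plus_1))
    mass = n * (mz_charge_n - PROTON)
    return n, mass

# Hypothetical spectrum: a 10 000 Da molecule seen at charges 10 and 11
peak_a = mz(10000.0, 10)
peak_b = mz(10000.0, 11)

charge, mass = mass_from_adjacent_peaks(peak_a, peak_b)
print(f"peaks at m/z {peak_a:.2f} and {peak_b:.2f} --> n = {charge}, M = {mass:.1f} Da")
```

The expression for n follows from setting n(m1 - H) equal to (n + 1)(m2 - H), where m1 and m2 are the two m/z values and H is the proton mass.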



FWHH (full width at half maximum height)

Resolution can be determined by Mass/Δmass, where Δmass is the peak width at half maximum height.

The higher the resolution, the more discrete the peaks become on the graph. At high m/z + low resolution, only the average mass of ions can be obtained
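A small sketch of the resolution calculation, using a hypothetical peak at m/z 1000 with a full width at half height of 0.5; the function name is illustrative.

```python
def resolution(mass, fwhh):
    """Resolution = mass / delta-mass, with delta-mass taken as the FWHH."""
    return mass / fwhh

# Hypothetical peak: m/z 1000 with a width of 0.5 at half maximum height
r = resolution(1000.0, 0.5)
print(f"resolution = {r:.0f}")

# The same instrument resolution at a higher mass implies a wider peak
print(f"peak width at m/z 4000 = {4000.0 / r:.1f}")
```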



Quadrupole

Φ = U + V cos(ωt)

There are 4 metal rods in the instrument. A voltage of opposite polarity is applied to adjacent pairs of rods (it consists of a DC component whose amplitude is denoted U and a radio frequency (RF) component whose amplitude is denoted V)

This creates a complex electric field that can attract/repel the ions
The ions experience oscillations to and from the rods
Specific values of U and V will allow an ion of a particular m/z to traverse the rods safely --> the ion is transmitted along the quadrupole in a stable trajectory
Ions with a stable trajectory in the RF field can reach the detector
Particular frequencies correspond to particular ions



TOF

The time it takes for an ion to reach a detector at a known distance is measured
Time depends on the m/z (larger ions --> more time taken)
t = k√(m/z)
t = time of flight
k = constant depending on the instrumental settings and characteristics
A reflectron can be used to make ions of the same m/z with different kinetic energies reach the detector at the same time → a sharp peak is obtained
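A minimal sketch of the relation t = k√(m/z). The constant k depends on the instrument, so it is left at an arbitrary value of 1 here and only ratios of flight times are meaningful; the function name is illustrative.

```python
import math

def flight_time(m_over_z, k=1.0):
    """Time of flight t = k * sqrt(m/z); k = 1.0 is an arbitrary placeholder."""
    return k * math.sqrt(m_over_z)

# Quadrupling m/z doubles the flight time, whatever the value of k
ratio = flight_time(400.0) / flight_time(100.0)
print(f"t(m/z 400) / t(m/z 100) = {ratio:.1f}")
```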



Peptide Sequencing (b,y,a ions)

Low-energy collisions induce: fragmentation along the peptide backbone + side chain fragmentation

3 types of bonds can fragment along the peptide backbone: NH-CHR, CHR-CO, CO-NH
Each bond cleavage gives rise to 2 species (1 neutral + 1 charged)
6 possible fragment ions for each amino acid residue:
a, b, c ions have the charge retained on the N-terminal fragment
x, y, z ions have the charge retained on the C-terminal fragment

CO-NH bonds --> most common cleavage sites --> give rise to b/y ions
The major difference between ions is the residues (R-groups)
Trypsin is used to cut the peptide due to its high specificity
The maximum number of charges that a peptide can carry is related to the basic amino acid residues it contains (R, K, H)
y and b ion fragments containing R, K, Q or N can lose ammonia (-17); those containing S, T or E can lose water (-18)
The difference between neighbouring ions of the same series (b or y) corresponds to a residue mass
Residue mass = mass of amino acid - H2O

Low mass immonium ions

Immonium ions: an internal fragment with just a single side chain (H2N+=CHR). Usually the first (lowest m/z) peaks in the spectrum. Useful for detecting and confirming residues, but give no position information

MH+ = [NH2] + [M]n + [COOH] + H+
[NH2] = mass of N-terminating group (H = 1)
[COOH] = mass of C-terminating group (OH = 17)
[M]n = sum of the masses of n amino acid residues (+ any PTMs)
[Mass of bx] + [Mass of yn-x] = MH+ + 1

a ions

Mass difference: 28 Da (CO: 12 + 16) less than the corresponding b ion
Identification of ion pairs with this mass difference is useful during the sequencing process
ax = MH+ - (mass of residues lost + CO + H2O)
ax = [mass of residues retained] + [NH2] - CO
= mass of residues retained - 27 (when the N-terminal group is H) --> ax = 1 + [M]n - 28 = [M]n - 27

b ions

Series is formed by the loss of H2O and successive residues from the C-terminus, although it is rare to observe (stable in cyclic form)
bx = MH+ - (mass of residues lost + H2O)
bx = [mass of residues retained] + [NH2]
= mass of residues retained + 1 (when the N-terminal group is H) --> bx = [M]n + 1

y ions

Series consists of ions that lose successive amino acid residues from the N-terminus of the peptide. Difference between neighbouring y ions = mass of residue
yx = [COOH] + [M]n + H + H+
= [mass of residues retained] + 19 (when the C-terminal group is OH) --> yx = [M]n + H2O + H+ = [M]n + 19
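The a, b and y ion formulas above can be sketched as follows, using the residue masses from the amino acid mass sheet and the rounded increments used in these notes (+1, +19, -28). The peptide CSLK is hypothetical and the function names are illustrative.

```python
# Monoisotopic residue masses (Da) from the amino acid mass sheet
RESIDUE_MASS = {
    "G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05, "V": 99.07,
    "T": 101.05, "C": 103.01, "I": 113.08, "L": 113.08, "N": 114.04,
    "D": 115.03, "Q": 128.06, "K": 128.09, "E": 129.04, "M": 131.04,
    "H": 137.06, "F": 147.07, "R": 156.10, "Y": 163.06, "W": 186.08,
}

def mh_plus(peptide):
    """MH+ = [NH2] (1) + sum of residues + [COOH] (17) + H+ (1)."""
    return sum(RESIDUE_MASS[r] for r in peptide) + 19

def b_ions(peptide):
    """b1 .. b(n-1): N-terminal residues retained, + 1."""
    return [sum(RESIDUE_MASS[r] for r in peptide[:x]) + 1
            for x in range(1, len(peptide))]

def y_ions(peptide):
    """y1 .. y(n-1): C-terminal residues retained, + 19."""
    return [sum(RESIDUE_MASS[r] for r in peptide[-x:]) + 19
            for x in range(1, len(peptide))]

def a_ions(peptide):
    """Each a ion is 28 Da (CO) lighter than the corresponding b ion."""
    return [b - 28 for b in b_ions(peptide)]

peptide = "CSLK"  # hypothetical tryptic peptide
b, y, a = b_ions(peptide), y_ions(peptide), a_ions(peptide)
print("MH+:", round(mh_plus(peptide), 2))
print("b ions:", [round(m, 2) for m in b])
print("y ions:", [round(m, 2) for m in y])
print("a ions:", [round(m, 2) for m in a])
```

Complementary pairs can be checked against the rule [Mass of bx] + [Mass of yn-x] = MH+ + 1; for example, b2 and y2 of this four-residue peptide should add up to MH+ + 1.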



Amino acid mass sheet

Table 1 Atomic and amino acid residue masses

Amino Acid       3 Letter Code   Single Letter Code   Amino acid mass (Da)   Residue mass (Da)
                                                      (monoisotopic)         (monoisotopic)
Glycine          Gly             G                    75.03                  57.02
Alanine          Ala             A                    89.05                  71.04
Serine           Ser             S                    105.04                 87.03
Proline          Pro             P                    115.06                 97.05
Valine           Val             V                    117.08                 99.07
Threonine        Thr             T                    119.06                 101.05
Cysteine         Cys             C                    121.02                 103.01
Isoleucine       Ile             I                    131.09                 113.08
Leucine          Leu             L                    131.09                 113.08
Asparagine       Asn             N                    132.05                 114.04
Aspartic Acid    Asp             D                    133.04                 115.03
Glutamine        Gln             Q                    146.07                 128.06
Lysine           Lys             K                    146.10                 128.09
Glutamic Acid    Glu             E                    147.05                 129.04
Methionine       Met             M                    149.05                 131.04
Histidine        His             H                    155.07                 137.06
Phenylalanine    Phe             F                    165.08                 147.07
Arginine         Arg             R                    174.11                 156.10
Tyrosine         Tyr             Y                    181.07                 163.06
Tryptophan       Trp             W                    204.09                 186.08

Atoms
H                                                     1.01                   -
O                                                     15.99                  -
C                                                     12.01                  -
P                                                     30.97                  -

Table 2 m/z values of common immonium ions

Table 3 m/z values of b2 ions



Systematic approach

Mass of b-ions = ∑ (residue masses) + 1 (H+)
Mass of y-ions = ∑ (residue masses) + 19 (H2O + H+)
Mass of a-ions = mass of b-ions – 28 (CO)
[Mass of bx] + [Mass of yn-x] = MH+ + 1

Example:

1. Identify the MH+ ion (the peak with the highest m/z on the graph)
MH+ --> 875.54
2. Divide m/z by 110 to get the probable number of amino acids
875.54/110 = 7.9595 (~8 amino acids)
3. Identify immonium ions if possible (the peak with the lowest m/z)
[a1] = immonium ion --> 86.11
4. Identify the b2 ion at (residue 1 + residue 2 + 1) and look for the a2 ion 28 Da lower
[b2] = 191.06 --> C/S or S/C (N-terminal) *Use the table of b2 ion combinations
[a2] = 163.06
5. May suggest yn-2 and yn-1 and the N-terminal sequence
[Mass of bx] + [Mass of yn-x] = MH+ + 1
[yn-2] = MH+ + 1 - [b2] = 875.54 + 1 - 191.06 = 685.48
6. Identification of the y1 ion, e.g. m/z 147 or 175 for Lys-OH/Arg-OH from a tryptic peptide, helps to suggest the bn-1 ion
[y1] = 147.12 - 1.01 = 146.11 --> K (C-terminal)
7. Difference between neighbouring y ions = mass of residue *(Use the table of residue masses)
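The arithmetic in steps 2, 4, 5 and 6 can be reproduced with the sketch below. The peak values are those quoted in the example; the 0.05 Da matching tolerance and the helper name `residue_pairs_for_b2` are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) from the amino acid mass sheet
RESIDUE_MASS = {
    "G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05, "V": 99.07,
    "T": 101.05, "C": 103.01, "I": 113.08, "L": 113.08, "N": 114.04,
    "D": 115.03, "Q": 128.06, "K": 128.09, "E": 129.04, "M": 131.04,
    "H": 137.06, "F": 147.07, "R": 156.10, "Y": 163.06, "W": 186.08,
}

# Peaks quoted in the worked example
mh_plus = 875.54
b2 = 191.06
y1 = 147.12

approx_residues = round(mh_plus / 110)  # step 2: rough peptide length
a2 = b2 - 28                            # step 4: a2 sits 28 Da below b2
y_n_minus_2 = mh_plus + 1 - b2          # step 5: complement of b2
c_terminal_aa_mass = y1 - 1.01          # step 6: compare with amino acid masses

def residue_pairs_for_b2(b2_mass, tolerance=0.05):
    """Residue pairs whose masses sum to b2 - 1 (tolerance is an assumption)."""
    target = b2_mass - 1
    return [(p, q)
            for p, q in combinations_with_replacement(sorted(RESIDUE_MASS), 2)
            if abs(RESIDUE_MASS[p] + RESIDUE_MASS[q] - target) <= tolerance]

pairs = residue_pairs_for_b2(b2)
print("approximate number of residues:", approx_residues)
print("a2:", round(a2, 2), " y(n-2):", round(y_n_minus_2, 2))
print("candidate N-terminal pairs:", pairs)
print("C-terminal amino acid mass:", round(c_terminal_aa_mass, 2))
```

A pair search like this gives the two residues but not their order, which is why the example lists both C/S and S/C.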

Difficulties
Fragmentation can be more complicated
Lys and Gln have the same nominal mass (128) --> they cannot be distinguished
Ile and Leu are structural isomers --> same molecular weight
Quite sophisticated computer programs now exist to aid in the sequencing of peptides by mass spec, but they still make mistakes
X is used to indicate a residue that cannot be assigned
N-terminal cleavage at a Pro residue increases the y ion intensity --> proline effect

Structure and Biophysics (Prof Andres Ramos)

Binding affinity and kinetics



Förster resonance energy transfer (FRET)

Electrons in the acceptor molecule can decay radiatively
Limitation: orientation factor
Perturbation of the system
Distance conversion
Monitor either the decrease in donor fluorescence or the increase in acceptor fluorescence



Isothermal titration calorimetry (ITC)



Tools comparison

Bioinformatics and modelling (Prof Martin)

Database analysis



Comparative modelling



Homolog...

