Week2 recap PDF

Title Week2 recap
Author Anouck van der Vlist
Course Introduction to Bioinformatics
Institution Wageningen University & Research
Pages 43


BIF-20306 Introduction to Bioinformatics

Week 2 – Recap and review

Anne Kupczok & Rens Holmer

Should I still finish the practical?

§ It is strongly recommended to work your way through the provided materials

§ You need to obtain the necessary practical skills now

§ Just looking at the answers will likely not be enough to do well on the project, which is part of the exam

● Passive versus active knowledge/skills

§ Practice now, apply during the project

§ Consider giving yourself a time limit for each assignment to ensure you get through the exercises

● The aim is to deal with each topic at least once

How to formulate answers?

§ We need to be able to follow your reasoning

● A screenshot is not sufficient

§ Describe

● Your observations

● Your assumptions, if any

● Your interpretation (this requires some expectation)

§ Both in the exam and in the project report

Make your own checklist

§ Write down

● Tools

● Databases

● Formats

● Etc.

§ You can use this list during the project to see what tools and databases you can use for the analysis

How can we improve?

Self-test

§ 52 out of 149 took the self-test (status at 9:00)

§ Class average 78%

§ Note that the self-test is not at exam level; it checks whether you know the basic concepts (like in part A of the exam)

§ A test exam will be provided later in the course

Dot-plot analysis

§ Detect repeats

Example: MRRPDFMDALQGFLSPLNPAHQLDFMDSLGNLRLEECRIM
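As a rough illustration of how a dot plot exposes repeats, the sketch below compares the example sequence against itself and reports every off-diagonal window match. The function names and the window size of 4 are assumptions made for this example, not settings taken from the practical.

```python
def window_matches(seq_a, seq_b, window):
    """Return (i, j) pairs (0-based) where the windows starting at i and j are identical."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.append((i, j))
    return hits


def repeats(seq, window):
    """Self-comparison: keep only the dots above the main diagonal (i < j)."""
    return [(i, j) for i, j in window_matches(seq, seq, window) if i < j]


seq = "MRRPDFMDALQGFLSPLNPAHQLDFMDSLGNLRLEECRIM"
for i, j in repeats(seq, 4):
    # Report 1-based positions, as a dot-plot axis would show them.
    print(i + 1, j + 1, seq[i:i + 4])  # prints: 5 24 DFMD
```

Each reported pair corresponds to a dot away from the main diagonal; in a real dot plot a longer repeat shows up as a short diagonal line of such dots.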

Scoring alignments

Many typos in BLOSUM62

Up to date matrix: https://www.ncbi.nlm.nih.gov/Class/FieldGuide/BLOSUM62.txt

Scoring alignments

Gap opening -4
Gap extension -1

R L A - - V
T L T A S L

-1 + 4 + 0 - 4 - 1 + 4 = 2
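The scoring rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the first gap position costs the opening penalty and each further position costs the extension penalty, which is how the sum above is built. Only the three BLOSUM62 values needed here are included (R/T = -1, L/L = 4, A/T = 0), and the example alignment ends in an identical L/L pair, an assumption chosen so that every term matches the sum above. The function name is hypothetical.

```python
def score_alignment(row1, row2, pair_scores, gap_open=-4, gap_extend=-1):
    """Sum substitution scores and gap penalties over the alignment columns."""
    total = 0
    in_gap = False
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            # First gap position pays the opening penalty, later ones the extension.
            # Simplification: a gap that switches rows is still treated as an extension.
            total += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            total += pair_scores[frozenset((a, b))]
            in_gap = False
    return total


# Subset of BLOSUM62, keyed by unordered residue pair.
pair_scores = {
    frozenset("RT"): -1,
    frozenset("L"): 4,   # L aligned with L
    frozenset("AT"): 0,
}

print(score_alignment("RLA--L", "TLTASL", pair_scores))  # -1 + 4 + 0 - 4 - 1 + 4 = 2
```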

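Choice of substitution matrix depends on the problem to be solved

PAM (Point Accepted Mutations)
BLOSUM (BLOck SUbstitution Matrix)

Different substitution matrices are used for different purposes. Is the following statement correct?

PAM120 and BLOSUM80 substitution matrices work well on distantly related sequences. True or false?

False. Both are better suited to closely related sequences; for distant relationships a higher PAM number (e.g. PAM250) or a lower BLOSUM number (e.g. BLOSUM45) is the usual choice.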

How do I recognize a good BLAST hit?

§ A good combination of

● E-value

● Percent identity

● Query coverage

What does a BLAST hit mean?

A BLAST search returns only a single sequence with an E-value of 0.4. What does this suggest?

a) The two sequences are likely related

b) The two sequences are likely not related

You find the following hit with blast, what can you conclude?

a) The query has 70% local identity with the cytochrome P450 714C3-like protein

b) The query has 70% global identity with the cytochrome P450 714C3-like protein

c) The query has 70% homology with the cytochrome P450 714C3-like protein

d) The query is unrelated to the cytochrome P450 714C3-like protein

Flavours of BLAST

You sequenced parts of a gene and you want to know if it also exists in a distantly related organism for which you only have the nucleotide sequence available. Which BLAST algorithm do you choose to compare both sequences?

tblastx
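The standard BLAST programs differ in the type of query, the type of database, and whether the comparison is made at the protein level. The lookup below is a small sketch of that logic; the function and dictionary names are made up for this example.

```python
# (query type, database type, compare at protein level?) -> BLAST program
BLAST_FLAVOURS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", True): "blastp",
    ("nucleotide", "protein", True): "blastx",     # query is translated
    ("protein", "nucleotide", True): "tblastn",    # database is translated
    ("nucleotide", "nucleotide", True): "tblastx", # both are translated
}


def choose_blast(query_type, database_type, protein_level):
    """Return the BLAST program for this combination, or None if there is none."""
    return BLAST_FLAVOURS.get((query_type, database_type, protein_level))


# Nucleotide query, nucleotide database, distantly related organism:
# comparing the translations is more sensitive than comparing the DNA directly.
print(choose_blast("nucleotide", "nucleotide", True))  # tblastx
```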

Query

Database

Uniprot BLAST

Motifs

Take a look at the motif. Which of the statements is true?

a) Positions 1, 16, and 31 are conserved in all sequences and thus likely important for the function

b) At position 38, approximately half of the sequences have an F and the other half an I

c) Position 29 is highly variable and thus is likely less important for the function

d) All of the statements above are correct

Primer design

What is the ideal product length?

§ Ideal product length depends on many variables and design preferences!!!

§ Standard PCR: 150-1000 bp

● Typically 150-300 bp is used, but it strongly depends on the research question

§ Other flavors of PCR require different product lengths

PCR flavors

PCR: Polymerase Chain Reaction

rtPCR: Reverse Transcriptase PCR, used to amplify RNA

qPCR: Quantitative PCR, used to quantify DNA abundance in real time during amplification with the use of fluorescent dye (confusingly sometimes referred to as rtPCR)

qrtPCR: Quantitative Reverse Transcriptase PCR, used to quantify RNA (cDNA) abundance

PCR & introns

[Figure: PRIMER1 and PRIMER2 placed in exons, shown on genomic DNA and on mRNA (cDNA)]

Primers designed in the exons give a length difference between the products from DNA and mRNA
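The length difference equals the total length of the introns lying between the two primers. A minimal sketch, using invented coordinates for a hypothetical two-exon gene (1-based, inclusive), could look like this:

```python
def product_lengths(exons, fwd_start, rev_end):
    """Return (genomic product length, cDNA product length).

    exons     -- sorted list of (start, end) genomic coordinates, 1-based inclusive
    fwd_start -- first base of the forward primer (must lie in an exon)
    rev_end   -- last base covered by the reverse primer (must lie in an exon)
    """
    genomic = rev_end - fwd_start + 1
    # Introns are the gaps between consecutive exons.
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    # Only introns located between the primers are missing from the cDNA product.
    spliced_out = sum(end - start + 1
                      for start, end in introns
                      if fwd_start < start and end < rev_end)
    return genomic, genomic - spliced_out


# Hypothetical gene: exon 1 = 1..100, intron = 101..300, exon 2 = 301..400
print(product_lengths([(1, 100), (301, 400)], fwd_start=51, rev_end=350))  # (300, 100)
```

In this made-up case the genomic product is 300 bp and the cDNA product 100 bp, so the two templates can be told apart on a gel.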

PCR & introns

[Figure: PRIMER1 and PRIMER2 shown on mRNA (cDNA), with no binding on genomic DNA]

Primers designed on the exon-exon junction only amplify the mRNA (cDNA)

PCR & introns

[Figure: PRIMER1 and PRIMER2 shown on genomic DNA, with no binding on mRNA (cDNA)]

Primers designed on the splice site only amplify genomic DNA

Primer specificity

RefSeq representative genomes

§ Curated

§ Non-redundant

§ For the eukaryotes, only one genome is included per species

Nr

§ The default database for nucleotide BLAST (nr/nt) contains all RefSeq RNA records plus all GenBank sequences

§ For proteins, the default database (nr) is a non-redundant set of all CDS translations from GenBank along with all RefSeq, UniProtKB/SwissProt, PDB and PRF proteins

Check against nr instead of RefSeq representative genomes

Genome annotation

Structural versus functional annotation

§ Structural annotation: identification of genomic elements

● ncRNAs

● Repeats (TEs, satellites, terminal repeats, simple)

● Protein-coding genes

§ Functional annotation: assigning biological information

● Biological/biochemical function

● Cellular components

● Protein domains

● Enzyme codes (EC numbers)

Ab initio versus evidence-based

§ Ab initio methods

● Only use the genome sequence itself

● Predict based on a model of what a gene should look like

● Such models must be trained on appropriate data

§ Evidence-based methods

● Use external evidence, like RNAseq, ESTs, homologous proteins, genome-to-genome alignments

What is the correct order?

1. Blastp
2. Evidence-based predictions
3. Repeat masking
4. Integration/consensus
5. Ab-initio gene predictions
6. GO annotation

What is the correct order?

Structural

1. Repeat masking
2. Ab-initio gene predictions
3. Evidence-based predictions
4. Integration/consensus

Functional

5. Blastp
6. GO annotation

Protein-coding genes are described by

A GFF file

A GFF file

It’s a regular text file. Open it in WordPad, Notepad or Notepad++ on Windows, or in TextEdit on Mac.

Eukaryotic gene structure

[Figure: gene model with labels 5’ UTR, Exon, Intron, CDS, Exon, 3’ UTR]

Coding sequences - Phase

Chr1  TAIR10  CDS  24347792  24347897  .  +  0  Parent=AT1G65484.1,AT1G65484.1-Pr
Chr1  TAIR10  CDS  24348122  24348258  .  +  2  Parent=AT1G65484.1,AT1G65484.1-Pr

CDS1 (phase 0): ATGGGT...AAAATAG translates to M G ... K I; the final G is the first base of a codon that continues in CDS2

CDS2 (phase 2): TAGAAGAA... starts with TA, which completes that codon (GTA, V), followed by E E

        10         20         30         40         50
MGLKMSSNAL LLSLFLLLLC LFSEIGGSET THWKIVEEPV RGQIATPPSL

A phase of "0" indicates that the next codon begins at the first base of the region described by the current line, a phase of "1" indicates that the next codon begins at the second base of this region, and a phase of "2" indicates that the next codon begins at the third base of this region
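The phase of a CDS segment follows from the length and phase of the segment before it. The helper below is a small sketch of that arithmetic (the function name is made up), checked against the two TAIR10 CDS lines of AT1G65484.1.

```python
def next_phase(length, phase):
    """Phase of the following CDS segment, given this segment's length and phase.

    (length - phase) % 3 is the number of bases left over at the end of this
    segment; the next segment must supply the rest of that codon first.
    """
    return (3 - (length - phase) % 3) % 3


cds1_start, cds1_end = 24347792, 24347897
cds1_length = cds1_end - cds1_start + 1   # GFF coordinates are 1-based and inclusive
print(cds1_length)                         # 106 = 35 codons + 1 base
print(next_phase(cds1_length, 0))          # 2, the phase listed for the second CDS
```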

GFF file

A line in a GFF file describes the following feature:

Chr1 TAIR10 CDS 10 24 . + 0 Parent=AT1G65484.1,AT1G65484.1-Protein;

What is the sequence of this CDS given this Chr1 sequence?

AGAAGAATAATGGGTTTGAAAATGTCAAGCAATGCACTTC
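One way to check an answer to this kind of question is to slice the sequence using the GFF coordinates, which are 1-based and inclusive. The sketch below is an illustration only: real GFF files are tab-separated, while the line on the slide uses spaces, so the parser splits on any whitespace. The function name is hypothetical.

```python
def extract_feature(gff_line, chromosome_seq):
    """Return the sequence of the feature described by one GFF line."""
    fields = gff_line.split(None, 8)           # 9 columns; attributes stay in one piece
    start, end, strand = int(fields[3]), int(fields[4]), fields[6]
    # 1-based inclusive [start, end] becomes the 0-based slice [start - 1, end).
    seq = chromosome_seq[start - 1:end]
    if strand == "-":
        # Features on the minus strand are read as the reverse complement.
        seq = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return seq


chr1 = "AGAAGAATAATGGGTTTGAAAATGTCAAGCAATGCACTTC"
line = "Chr1 TAIR10 CDS 10 24 . + 0 Parent=AT1G65484.1,AT1G65484.1-Protein;"
print(extract_feature(line, chr1))  # ATGGGTTTGAAAATG
```

With phase 0 the extracted 15 bases read as five complete codons, ATG GGT TTG AAA ATG, which translate to M G L K M.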

ORFfinder

Nice for prokaryotes, less useful for eukaryotes

Week 3 – Sneak Preview

Course overview (weeks)

1 Introduction and databases
2 Sequences
3 Evolution
4 Structure and function
5 -Omics
6 Project (in groups)
7 Self study
8 Exam

Molecular evolution

§ Recover evolutionary history from sequence evidence (DNA or protein)

Phylogenetics

§ Tree reconstruction

§ Tree interpretation

Meredith & al. Science 2011...

