Bioinformatics 2 PDF

Title Bioinformatics 2
Course Molecular Biology
Institution Murdoch University
Pages 14

WORKSHOP 2: General Bioinformatics

Name: ______ANGELA BALUVURI_________

Student number:_____33695106_____

In this workshop, you will learn:

· Searching databases for DNA/protein sequences
· Finding ORFs and translating DNA sequences
· Designing PCR primers
· Carrying out BLAST searches of DNA/protein databases
· Aligning multiple sequences
· Building a phylogenetic tree
· Investigating the taxonomy of organisms

ASSESSMENT The assessment consists of a series of questions in the following exercises - your responses are to be entered into the space provided. Once you have completed the exercises, write your answers again in the answer sheet at the end of this document. Submit this whole document through the submission link.

LAUNCHING GENEIOUS: If you don’t have Geneious on your computer, download Geneious Prime from the following link: https://www.geneious.com/download/ Once downloaded, here are the steps you will need to follow to activate the license (internet required for this one-off process):
1. Open up Geneious Prime on the desired computer
2. Go to Help > Activate License via the menu bar
3. Select the ‘License Key’ option, and provide your institutional email address and the offline license serial number: ETAL-400ADA-170E-B42B-88FB-A3B1
4. Select OK, and this will activate the offline license on your machine

The Geneious workplace is shown in Fig B.1. Familiarize yourself with the terms Toolbar, Document Table, Document Viewer and Source Panel in the Geneious interface. Create your folder under ‘Local’ in the ‘Source panel’. Right click on ‘Local’ and select ‘New Folder’ and name it ‘Your Name-BIO282 Bioinformatics-2.’

Fig B.1: The main components of the Geneious interface

EXERCISE 1: ACCESSING GENBANK AND INTERPRETING ITS OUTPUT

Most frequently, you start with a DNA or RNA sequence. You need to find out some information about this sequence: what species does it come from, what gene does it come from, what protein(s) does it encode, and so on. A nucleotide sequence by itself tells us nothing. The starting point is to compare it to all the sequences in the database and find those that are the most similar. This is called a BLAST search (Basic Local Alignment Search Tool). It is also useful in taxonomy if you want to find the most closely related species. BLAST searches can be carried out with either nucleotide (blastn) or protein (blastp) sequences.

1.1 Carrying out a BLAST search of the Genbank Database.

In this exercise, you will be given an unknown DNA sequence to do a BLAST search. Like all databases, if many people are accessing it simultaneously then the output can be slow. Be patient. An estimate of the approximate search time will appear below the menu buttons.

Select your newly created folder ‘–Bioinformatics 2’ from the Source panel. To make a new sequence file, click on ‘Sequence’ (in the Toolbar panel) and choose ‘New Sequence’. Type or paste the following DNA sequence in the new popup window.

CAT CCG TTG CCC ACA CAT GTC GTG ATG TAC AGT ACG GCT GAT TAA TCC G

(double check your input – incorrect information will cause you to get all the wrong answers to the questions below)

Name the sequence Query seq 1 and select OK. Now if you select Query seq 1 (in the Document Table), you will see your sequence in the Document Viewer. Make sure that your sequence appears as a single-stranded sequence. On the right side of the Document Viewer, if you select the General tab (the top tab), you have some options to choose from. Make sure all options are unselected except ‘Wrap’.
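Because a mistyped query gives misleading BLAST results, it can help to check the sequence before pasting it. The following is a minimal Python sketch, not part of Geneious; the function names `clean_sequence` and `invalid_bases` are made up for this illustration. It strips the spaces from the triplet-formatted sequence and reports its length and any unexpected characters.

```python
# The query sequence exactly as printed in the exercise, in space-separated triplets.
RAW_QUERY = "CAT CCG TTG CCC ACA CAT GTC GTG ATG TAC AGT ACG GCT GAT TAA TCC G"


def clean_sequence(raw: str) -> str:
    """Remove all whitespace and upper-case a nucleotide string."""
    return "".join(raw.split()).upper()


def invalid_bases(seq: str) -> set:
    """Return any characters that are not A, C, G or T."""
    return set(seq) - set("ACGT")


query = clean_sequence(RAW_QUERY)
print(len(query), "bases")
print("unexpected characters:", invalid_bases(query) or "none")
```

If the length printed for your own typed copy differs from the length of the sequence given above, a base has been dropped or added.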

Click on BLAST in the Toolbar panel; in the popup window select the following options:

Query: Query seq 1 (your sequence)
Database: Nucleotide collection (nr/nt)
Program: blastn
Results: Hit table
Retrieve: Matching region
Maximum hits: 20

and then click on Search. The results will appear in a new file under your folder. In the Document Table, sort the results based on E value, the lowest value being on top. (Note: 3e-15 is lower than 8e-10)

Select ‘Query Centric View’, located at the top of the Document Table; you will see the 20 sequences aligned to your Query sequence. The type of view will depend on the selection of various parameters. For an optimum view, select the following settings under the General tab (the top tab on the right-hand side of the Document Viewer). Under Colors, through the pull-down menu, choose Clustal. Make sure the options Graphs, Consensus, Highlighting, Wrap, Show names, Show (description) and Show sequence numbers are selected. The green colour in the Consensus sequence indicates that the base at that particular position is the same in all aligned sequences. The mismatches in the database sequences are identified by highlights.

In this example, the first (or top) match under the ‘Query Centric View’ is an exact match with a Genbank record. If you look at the other matches on this list, you will notice that although most of the bases are the same as in the query sequence, there are many differences as well.

Now go back to the Hit Table view and select the top match that was identical to the query sequence. Select the Download tab in the Document Viewer and then click Download to get the full document. This will download the complete Genbank file. Read the information under the ‘Info’ tab. If you click on the hyperlink View GenBank record on NCBI website, it will open a page in your web browser; using the information from this page, answer the following question.
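E values are written in scientific notation, so a value such as 3e-15 is far smaller than 8e-10 even though 3 is smaller than 8 only by coincidence. A short Python sketch of the sort order follows; the hit names and E values are invented for illustration and are not real BLAST output.

```python
# Hypothetical hit table: (hit name, E value). All values are made up for illustration.
hits = [("hit_A", 8e-10), ("hit_B", 3e-15), ("hit_C", 2e-4)]

# Sort ascending so the lowest (most significant) E value comes first.
ranked = sorted(hits, key=lambda hit: hit[1])

for name, e_value in ranked:
    print(f"{name}\t{e_value:.0e}")
```

Sorting ascending puts the hit least likely to be a chance match at the top, which is the ordering the Hit Table should show after sorting by E value.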

Question 1.1.1: From the closest Genbank match to the query sequence, what is the:
Genus name _Raphus__ (e.g. Homo sapiens) (Hint: names for Genus and Species are under ORGANISM)
Species name __cucullatus___
Accession number ____AF483338.1 ___________________ (this is a unique number assigned to every DNA sequence in Genbank)
Name of gene _______cytb CDS________________________ (look under ‘gene’)
Is the origin of the DNA nuclear or mitochondrial? (circle correct answer)

1.2 Investigating the taxonomy of the organism. Click on the NCBI/Taxonomy tab (located in the Source panel) and search for the genus and species names you recorded above. This information yields the entire taxonomic lineage of the species (phylum, order, family etc.). Under Text View, click on the Lineage (full) link; it will take you to the NCBI website. On this website, under the External Information Resources (NCBI LinkOut) heading, click on the species name corresponding to the “Animal Diversity Web”. The link will take you to the Animal Diversity Web page; answer Question 1.2.1.

Question 1.2.1
What is the common name for this species? __ Dodo birds________________
Where was this species found? _____________Mauritius__________________
What is the estimated body mass of this species? ____ 13000 to 23000 g_____________

1.3 Searching Genbank for further sequences from the same species. Click on the NCBI/Nucleotide tab (located in the Source Panel of Geneious) and search for the genus and species names you recorded earlier in 1.1.1. From the results of your analysis, answer Question 1.3.1.

Question 1.3.1
How many Genbank entries are there for this species? ___84_____
What is another gene present in Genbank (a single gene entry) for this species? ___ _________

Select the cytochrome B sequence and then click on the text view tab in the Document Viewer (this changes the view from the sequence view option).

Under the text view tab, you will notice a publication is listed – this is the original paper that described this Genbank sequence. You can read the paper’s abstract by going to: www.sciencemag.org/cgi/content/full/295/5560/1683 The authors of this paper deposited the sequence on Genbank. When you publish a DNA sequence, it is a requirement to deposit the DNA sequences onto Genbank so that other researchers can access them. Read the first paragraph of the paper – it will give you a little perspective on why the researchers conducted this research. If you can’t download the paper through the given link, you may get it from Bioinformatics Exercise resources on LMS. To check that you have read this paragraph, answer Question 1.3.2.

Question 1.3.2
What is the presumed closest relative of Raphus cucullatus:
What was the name of the “first” author on this paper? _____ _______ (first name) ____ ______________ (surname)

1.4 Genome searching: So far in this exercise you have dealt with only single genes. However, whole genomes exist on Genbank - these are very valuable tools in bioinformatics. A nuclear genome can be very large (the human nuclear genome is approximately 3 billion bases) and would be difficult to visualise. A mitochondrial genome is ca. 16,000 bp in size and is circular (similar to a plasmid). Here you will learn to download sequences based on accession numbers.

Under the NCBI nucleotide search (in the Source panel), type DQ316067 (Genbank accession number of the Mammuthus primigenius mitochondrial genome) and execute the search. Click on the Mammuthus primigenius file; you will see the sequence in the Document Viewer. “Drag and drop” this sequence into your bioinformatics folder. Select your file ‘DQ316067’ and select ‘Sequence View’ in the Document Viewer. Zoom in/out until you can visualise the entire genome. Find the ‘Annotations and Tracks’ tab (on the right-hand side of the Document Viewer) and select only Gene, rRNA and tRNA; this will enable you to see the location of genes on the mitochondrial genome. Information on the various gene names can be found under the ‘Annotations’ tab at the top of the Document Viewer. Answer Question 1.4.1.

Question 1.4.1
The exact length of the mammoth mitochondrial genome: _ 16842_ base pairs
The names of the rRNA genes___16S rRNA, 12S rRNA___
The number of protein encoding genes______13________________________
The number of tRNA genes___22__

EXERCISE 2: IMPORTING, ALIGNING AND BUILDING TREES USING GENEIOUS

This exercise aims to familiarise you with importing DNA sequences, aligning them and then analysing the output. The DNA sequences you will be using in this exercise originate from various species of bears. Before you begin, you should try to become familiar with the names and locations of the bear species that you will be analysing.

List of the Ursidae (bear) family:
GIANT PANDA (Ailuropoda melanoleuca)
MALAYAN SUN BEAR (Helarctos malayanus)
SLOTH BEAR (Melursus ursinus)
ASIATIC BLACK BEAR (Selenarctos thibetanus)
SPECTACLED BEAR (Tremarctos ornatus)
BLACK BEAR (Ursus americanus)
POLAR BEAR (Ursus maritimus)
BROWN BEAR (Ursus arctos)

Log in and open the LMS BIO282 site. Click on ‘cytB genes’ to download the sequences of the 8 bear species and save the file to the desktop. Drag and drop the downloaded file ‘CytB genes’ into your bioinformatics folder in Geneious. Once the data has been imported, you will find 8 files corresponding to the 8 bear species listed above. Because you have the same gene from different bear species, you can directly compare them, first by aligning them and secondly by modelling the DNA sequences (building a phylogenetic tree).

2.1 Aligning Bear DNA sequences. Select all 8 bear sequences simultaneously.

Once the files have been selected, you will note that the sequences are not aligned even though all the sequences are from the same mitochondrial gene. Geneious can do this alignment for you. Click the Align/Assemble tab (in the Toolbar) and select the Multiple Align option. In the popup window, select the Geneious Alignment option. The program will prompt you with many options for the alignment. These options control the algorithm that the software uses to fit the data, i.e. should a gap be inserted at any given position? Use the following alignment options:

Select ‘Automatically determine direction (slower)’
Alignment type: Global alignment with free end gaps
Cost matrix: 93% similarity
Gap open penalty: 100
Gap extension penalty: 1
Refinement iterations: 0
Unselect ‘Build guide---’ and ‘Create an alignment---’

and then select ‘OK’. Once the alignment is complete, you may visualize the alignment in different ways and using different colours. You can do so by altering some of the options on the panel to the right of the alignment. Using the statistics tab (% symbol, the bottom option of the panel to the right of the alignment), answer Question 2.1.1.

Question 2.1.1
What is the total length of the alignment: _ 1,140_ base pairs
What is the %GC content of the alignment? _ 43.9__ %
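The %GC figure on the statistics tab is simply the share of G and C bases among all bases in the alignment. A minimal Python sketch of that calculation is given here, under the assumption that gap characters are ignored (Geneious may treat gaps or ambiguous bases differently); the three short sequences are toy data, not the bear sequences.

```python
def gc_percent(sequences):
    """Percentage of G and C among all non-gap bases in the given sequences."""
    bases = "".join(sequences).upper().replace("-", "")
    if not bases:
        return 0.0
    gc_count = bases.count("G") + bases.count("C")
    return 100.0 * gc_count / len(bases)


# Toy alignment (made up for illustration): three rows, eight columns, one gap.
toy_alignment = ["ATGGC-TA", "ATGGCCTA", "ATGACCTA"]

print(f"alignment length: {len(toy_alignment[0])} columns")
print(f"%GC: {gc_percent(toy_alignment):.1f}")
```

The alignment length is the number of columns, which is the same for every row once gaps have been inserted.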

2.2: Phylogenetic reconstructions:

As you scan the alignment you have constructed, you should notice that the DNA sequences are similar, but not identical. If you look closer at some of the nucleotide differences, you will probably be able to see that some bear sequences are more closely related than others.

Rather than “eyeball” the sequence to guess evolutionary relationships, it is possible to utilize the changes in DNA to model DNA sequences statistically. This modelling is known as phylogenetics. In this exercise, we will build a phylogeny of the Ursidae family (bears). Select the alignment that you generated and click on the tree icon in the Toolbar menu. Geneious will prompt you for some options. These options alter the way that the program models DNA sequences on a tree - today you are building a very simple tree. Under genetic distance model select: HKY and for the tree building method select UPGMA. Leave the other boxes unchecked then click OK.
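To make the idea of building a tree from sequence differences concrete, here is a small Python sketch of UPGMA clustering. It is not how Geneious implements it: for simplicity it swaps in the plain proportion of differing sites (p-distance) in place of the HKY distance model, it returns only the branching order as nested tuples (no branch lengths), and the four taxa and their sequences are invented.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping columns where either row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, seqs):
    """Return the UPGMA branching order of aligned sequences as nested tuples."""
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two clusters with the smallest distance between them.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the new cluster to each other cluster is the
        # average of the old distances, weighted by cluster size.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy aligned sequences (made up for illustration).
names = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]
seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTTCGAAT", "TCGATCGAGG"]
tree = upgma(names, seqs)
print(tree)
```

In the toy data, taxon_A and taxon_B differ at a single site, so they are joined first; taxon_D differs most from the others and joins last.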

Once Geneious has finished constructing the tree, select the Tree View tab (on top of the Document Viewer panel). This is a phylogenetic reconstruction of the bear alignment you made. A few things you should know about phylogenetic trees: 1) A tree summarises the relatedness of all taxa (i.e. a family tree). 2) The branch lengths represent evolutionary distances. Longer branches mean less sequence similarity. From the phylogenetic tree you have constructed, answer Question 2.2.1.

Question 2.2.1
Which two bear species (common names) in your phylogenetic tree are most closely related: ___Ursus arctos____________ and _____Ursus maritimus___
Which bear species (common name) is the most basal (at the base) on the phylogenetic tree? _____Tremarctos ornatus__________________
Given that S. thibetanus has an Asian/Russian distribution and U. americanus an American distribution, what geographical “feature” may have caused the speciation “event” in the common ancestor of these bears? ______oceans_________

2.3 Finding the Closest Living Relatives of an Extinct Bear

You have just constructed a tree of all the extant (living) bears. There are also two bear species that have gone extinct in the past 20,000 years. There is a Genbank record for the mitochondrial cytochrome B gene of the extinct cave bear (Ursus spelaeus). This DNA sequence was isolated from a fossil bone – the retrieval of “old” degraded DNA is known as ancient DNA. It is technically challenging, as the DNA is degraded into small pieces (typically 200-300bp). The goal of this exercise is to find the cytochrome B DNA sequence for Ursus spelaeus and find out what its closest living relatives are by integrating it into your existing bear phylogenetic tree.

Go to the NCBI/Nucleotide search (in the Source panel), input: Ursus spelaeus cytochrome B and click the search button. By doing this, you are searching Genbank for a species and gene name, which is often the easiest way to locate DNA sequences. Select the DNA file that has the cytochrome B gene sequence (Genbank accession number AF264047) and “drag and drop” it into your CytB folder – this will enable you to compare it to the other bear sequences. Check that the file now appears in this folder. One of the columns in the sequence summary window is labelled “name”. At present, the Genbank number AF264047 is shown as the name of this newly added sequence. Right click on this name and select edit names - change it to Ursus spelaeus. This will ensure that the species name appears in your tree.

Select all 9 bear sequences simultaneously: the 8 old sequences and the one new Ursus spelaeus sequence you have just imported. Click the alignment button and perform the alignment (as done previously, with the same parameters as in section 2.1). Once the alignment is complete, build a new tree (as before). View the new tree and, from the results of your analysis, answer Question 2.3.1.

Question 2.3.1
What are the closest living relative(s) of the extinct cave bear? Ursus arctos and Ursus maritimus

2.4: Finding ORFs and translating a DNA sequence.

The DNA sequence of Ursus spelaeus that you just downloaded codes for a protein called cytochrome B that plays a role in mitochondrial function. In this exercise, we are going to translate the DNA sequence and use the protein sequence to query a protein database.

Select the Ursus spelaeus record that you downloaded previously. Click on the ‘Live Annotate and Predict’ tools button (on the right-hand side of the Document Viewer) and select the find ORF option. ORF stands for open reading frame – by conducting an ORF search you are asking Geneious to locate what could be protein-coding regions within this DNA sequence. Under ‘Find ORFs’, make the ORF size 300 and for the genetic code select: vertebrate mitochondrial. Also, check the box to include interior ORFs; click apply to save the ORF search. Geneious will generate a series of annotations with the ORFs shown. From the results of your analysis, answer Question 2.4.1.

Question 2.4.1
How many ORFs longer than 300 bases are in this sequence? Answer: ___35______
Look at the nucleotide codon (3 bases) at the start of each ORF, do they all start with ATG? If not, why? No, because it has other possible start codons.
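The reason not every ORF begins with ATG is that the vertebrate mitochondrial genetic code (NCBI translation table 2) lists ATA, ATT, ATC and GTG as possible start codons alongside ATG, and uses TAA, TAG, AGA and AGG as stop codons. The Python sketch below illustrates a simple ORF scan under those rules. It is a simplification and not the Geneious algorithm: it reads only the three forward frames, reports one ORF per stop codon starting at the first start codon seen, and does not report interior ORFs. The function name `find_orfs` and the toy sequence are made up for illustration.

```python
# Start and stop codons of the vertebrate mitochondrial code (NCBI table 2).
STARTS = {"ATG", "ATA", "ATT", "ATC", "GTG"}
STOPS = {"TAA", "TAG", "AGA", "AGG"}


def find_orfs(seq, min_len=300):
    """Return (start, end, frame) for forward-strand ORFs of at least min_len bases.

    start is 0-based, end is exclusive and includes the stop codon,
    frame is 1, 2 or 3.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon in STARTS:
                start = pos  # first start codon since the last stop
            elif start is not None and codon in STOPS:
                if pos + 3 - start >= min_len:
                    orfs.append((start, pos + 3, frame + 1))
                start = None
    return orfs


# Toy sequence (made up): an ORF that starts with ATA and ends with AGA.
toy = "CC" + "ATA" + "GCT" * 4 + "AGA" + "C"
print(find_orfs(toy, min_len=12))
```

The toy ORF would be missed entirely by a scan that accepts only ATG as a start and only TAA, TAG or TGA as stops, which is why the genetic code setting matters.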

Select the Ursus spelaeus DNA file once again. If you still see the ORFs you generated in the previous section, you can remove them by pressing ‘Delete Applied’ (next to the Document Viewer, under ‘Live Annotate and Predict’). Click on the translate button (located just above the Document Viewer window). The genetic code for this translation is: vertebrate mitochondrial – select this option. Translate the entire sequence, setting the translation frame to “1” – meaning that the translation will begin at the first nucleotide. Click OK, and a new file should appear that contains the cytochrome B protein sequence for Ursus spelaeus. To check that you have completed the translation correctly, answer Question 2.5.2.

Question 2.5.2
What are the last 4 amino acids in the sequence?____Tryptophan, lysine, leucine, Asparagine____
What is the length of the cytochrome B protein? ____1140______
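Translation reads the DNA three bases at a time and looks each codon up in the chosen genetic code. The Python sketch below builds the vertebrate mitochondrial table (NCBI translation table 2, in which TGA codes for tryptophan, ATA for methionine, and AGA/AGG are stops) and translates a toy sequence in frame 1. The function name `translate` and the toy sequence are illustrative only; this is not the Geneious implementation.

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the bases taken in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=1):
    """Translate a DNA string in the given frame (1, 2 or 3).

    Stop codons are shown as '*'; codons with unknown bases become 'X'.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Toy sequence (made up): ATA ACC TGA AAA TAA
protein = translate("ATAACCTGAAAATAA", frame=1)
print(protein, len(protein))
```

Because each amino acid corresponds to three bases, the translated sequence has one third as many positions as the DNA; for instance, 1,140 bases read in frame 1 span 380 codons, counting any stop codon.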

In the same way that it is possible to search Genbank with a DNA sequence, it is also possible to search with a protein. Select your newly translated protein sequence and click on BLAST (you should ensure that the blastp option is selected – which queries the protein database) and click search – this may take a few minutes. Answer Question 2.5.3.

Question 2.5.3
Which bear species (other than Ursus spelaeus) has the closest protein sequence according to the blastp search? ______________________ and ________________________
Does this result agree or disagree with the phylogenetic tree that you constructed earlier? Agree or Disagree (circle one).

EXERCISE 3: DESIGNING A PCR ASSAY.

You have extracted some ancient DNA from a cross section of a Mammoth tusk (pictured) – 14C dating of the ivory demonstrated that the tusk is 18,500 years old and has intact biomolecules (DNA and protein). In this exercise, you will design a polymerase chain reaction (PCR) assay that will amplify a piece of the mammoth mitochondrial genome that you downloaded earlier. If you are not confident about what PCR is, then visit the following website for a quick refresher course: http://en.wikipedia.org/wiki/PCR

3.1: Designing a PCR assay: The design of PCR primers is relatively simple from a conceptual point of view: search along a sequence and find short sub-sequences that fit specific criteria. However, the molecular biology of PCR is very complex, and the design of primers is best accomplished with the aid of co...

