Title Lab1Bioinformatics Lab Report
Course Biology II
Institution University of Ontario Institute of Technology

Summary

prof: annette tavares...


Description

BIOL 1020 Lab 1

LAB 1 LAB REPORT BIOINFORMATICS

NAME: ID #:

CRN: 72821

TO BE SUBMITTED ANY TIME BEFORE THE END OF THE LAB!! YOUR LAB REPORT IS DUE AT THE END OF THE SCHEDULED LAB PERIOD AND AN ELECTRONIC COPY MUST BE SUBMITTED through the assignment submission on Canvas. Submissions after the end of the lab are going to receive a 10% late penalty per day!!!! It is recommended that you work on the lab report well before the due date and submit the report any time before the end of your scheduled lab to ensure that you do not incur any late penalties. You need to upload a pdf, doc and/or docx file as an assignment submission in Canvas. Here are the steps to upload a file as an assignment submission in Canvas. Pay close attention that you are submitting the correct assignment in the correct assignment folder.

1. Open Assignments. In Course Navigation, click the Assignments link.
2. Select Assignment. Click the title of the assignment.
3. Submit Assignment. Click the Submit Assignment button.
4. Add File. ...
5. Add Another File. ...
View Submission

Biological Problem #1: Imagine that you are working in a pathology lab and need to identify the bacterial species contained in a sample from a very sick patient. Once you know what species they are infected with, the doctor will be able to recommend the appropriate antibiotic. You have purified the bacteria from the patient’s samples and extracted bacterial DNA from a single colony. You then performed PCR using primers that anneal to the region containing the 16S rRNA gene. You have sequenced the PCR product and you are now ready to identify the bacterium. As you will recall, you performed a similar exercise in your pre-lab assignment.

Objective: BLAST of an unknown bacterial 16S rRNA gene sequence

Step 1: You will use an unknown bacterial 16S rRNA gene sequence provided to you by your TA during your synchronous lab.
Step 2: Copy your assigned unknown bacterial sequence.
Step 3: Open the BLAST website, http://www.ncbi.nlm.nih.gov/BLAST, select nucleotide blast and paste your sequence into the large empty window labelled Enter Query Sequence. Your sequence is in FASTA format already, but click on the link to find out what FASTA format means.
Step 4: In the pull-down database window called Choose Search Set, select nucleotide collection (nr/nt), since you want to compare your nucleotide sequence with all other nucleotide sequences in the database.
Step 5: Hit the “BLAST” button and wait for your results.

a) Which bacterial species is the patient most likely infected with?
Escherichia coli
b) What features of the BLAST output influenced your decision?
The top hit was annotated with the scientific name Escherichia coli and showed 100% identity with the ribosomal RNA gene of that bacterium. Its E-value was also 0.0. The E-value is the number of matches at least this good that would be expected by random chance in a database of this size, so the lower the E-value, the better the match.
c) Which sequence is the query sequence, and which one is the subject sequence?
The query sequence is the unknown sequence entered into BLAST. The subject sequence is a sequence in the database that the query was compared against, which for the best match is the known sequence of Escherichia coli.

Step 6: Click on the hyperlink associated with your best BLAST match to get to the GenBank record.

d) What is the Accession Number of your best match?
J01859.1
e) Who submitted this sequence?
Ehresmann,C., Stiegler,P., Fellner,P. and Ebel,J.P.

f) Why is there no CDS (sequence coding for amino acids in protein) associated with this record?
The record is a gene for ribosomal RNA. The rRNA transcript is itself the functional product: it becomes part of the ribosome and is never translated, so there are no amino acids to code for and no protein-coding sequence to annotate.
g) If you were to perform the same blastn analysis with your bacterial sequence a year from now using a public database where sequences are constantly being added, would you expect to obtain the same Score? E-value? Explain.
Yes, I would expect the same Score as long as the same query is aligned to the same subject sequence. The Score depends only on the alignment itself, and the subject sequence matched the query perfectly (100% identity). New matches that are equally good or better may be added to the database, but the old match remains just as relevant. The E-value, however, depends on the size of the database: with more sequences to search, a match of a given score becomes more likely to turn up by random chance, so the E-value could increase slightly. For a match this strong the increase would be minuscule, and the E-value would most likely still be reported as 0.0.
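The dependence of the E-value on database size in part g) can be sketched with the Karlin-Altschul relation E = m × n × 2^(−S'), where m is the query length, n is the total database length and S' is the bit score. This is a simplified illustration: BLAST itself uses effective (edge-corrected) lengths, and every number below is a hypothetical value chosen for the example, not output from the lab.

```python
def expected_hits(query_len: int, db_len: int, bit_score: float) -> float:
    """Simplified Karlin-Altschul E-value: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical values, for illustration only.
query_len = 1500            # roughly the length of a 16S rRNA gene, in nucleotides
weak_bit_score = 60.0       # a weak hit, so the change in E is visible
db_today = 10**11           # assumed total database length today
db_next_year = 2 * 10**11   # assumed: the database doubles in a year

e_today = expected_hits(query_len, db_today, weak_bit_score)
e_next_year = expected_hits(query_len, db_next_year, weak_bit_score)

# The bit score is unchanged; only the E-value scales with database size.
print(f"E today:     {e_today:.3e}")
print(f"E next year: {e_next_year:.3e}")
print(f"ratio:       {e_next_year / e_today:.1f}")
```

With the same bit score, doubling the database doubles the E-value. For a near-perfect 16S match the bit score is so large that E stays vanishingly small either way, which is consistent with the match still being displayed as 0.0.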

Biological Problem #2: You and your colleagues in the pathology lab have sequenced the 16S rRNA gene for a total of 5 bacterial species. You are curious to see how similar your 16S rRNA sequence is to the other 4. You have heard that multiple sequence alignments may provide some information about this and would like to give it a try.

Objective: Multiple Sequence Alignment (ClustalW) of bacterial 16S rRNA gene sequences from 5 bacterial species.

Step 1: Open the file posted in the lab folder “Bacterium Unknown Sequences 1 to 5 16S rRNA”. All of the 16S rRNA gene sequences you need have been provided for you. These are the same 5 sequences that you also used in your pre-lab assignment, which are individually posted in the same folder. Copy all of the sequences.
Step 2: Open the Multiple Sequence alignment link (ClustalW) http://www.genome.jp/tools/clustalw/ and paste your bacterial sequences into the window. In the pull-down window, select DNA since that is what you are aligning.
Step 3: Click the Submit button and wait for your results to appear.

a) What do you think it indicates when the nucleotides have a star? What do you think it means when they have a white space and no star?
A star marks a position where all 5 sequences have the same nucleotide. A white space with no star marks a position where the nucleotides are not the same in all 5 sequences.
b) Would you estimate that all 5 sequences are quite similar (>90%) or not?
Yes, I would estimate that all five sequences are quite similar. Comparing the sequences to each other, there are a lot of stars, each indicating a matching nucleotide. At a quick glance there are at most about 6 blank spaces per row, so overall I would conclude they are quite similar.

c) Which bacterial sequence appears to differ the most from the others? Click on “View Tree” at the top of the output page for a different visual representation. (Note that trees generated by ClustalW represent sequence similarity and are not necessarily intended to be interpreted as a tree indicating evolutionary descent.)
Bacterium 1 appears to differ the most from the other bacterial sequences, meaning that it has the most mismatched nucleotide bases relative to the others.
d) Does your bacterial sequence cluster with any other sequence? Briefly describe the relationship between your sequence and those of your colleagues.
Bacterium 3 and bacterium 4 cluster together, and bacterium 2 clusters with the unknown sequence. The sequences within each cluster share more similarities with each other than with bacterium 1, which could mean that the clustered bacteria are part of the same clade or are otherwise closely related.
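The star line and the "which sequence differs most" question can both be illustrated on a toy alignment. The sketch below is not ClustalW and does not use the lab sequences: the four short aligned strings and their labels are made up, and `conservation_line`, `percent_identity` and `most_divergent` are hypothetical helper names.

```python
from itertools import combinations


def conservation_line(aligned):
    """Return '*' under columns where every sequence has the same base, ' ' otherwise."""
    marks = []
    for column in zip(*aligned.values()):
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


def percent_identity(a, b):
    """Percent of aligned columns that match, skipping columns where both are gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def most_divergent(aligned):
    """Name of the sequence with the lowest total identity to all the others."""
    totals = {name: 0.0 for name in aligned}
    for a, b in combinations(aligned, 2):
        identity = percent_identity(aligned[a], aligned[b])
        totals[a] += identity
        totals[b] += identity
    return min(totals, key=totals.get)


# Made-up, already-aligned toy sequences (not real 16S data).
toy_alignment = {
    "bacterium_1": "ATGTCGCAAG",
    "bacterium_2": "ACGTAGCTAG",
    "bacterium_3": "ACGTAGGTAG",
    "unknown":     "ACGTAGCTAG",
}

for name, seq in toy_alignment.items():
    print(f"{name:<12}{seq}")
print(f"{'':<12}{conservation_line(toy_alignment)}")
print("most divergent:", most_divergent(toy_alignment))
```

Counting stars gives a quick lower bound on similarity, because a column loses its star as soon as a single sequence differs; the pairwise identities are what show which sequence is responsible for most of the unstarred columns.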

Biological Problem #3:

The ribulose bisphosphate carboxylase (Rubisco) protein is essential to carbon fixation in photosynthesis and is found in green algae and all land plants. Therefore, it is an ideal choice to establish phylogenetic relationships among green algae and land plants using nucleotide sequences. The gene sequences for the large subunit (rbcL) of the Rubisco protein have been isolated for Arabidopsis, Chara, Equisetum, Lilium, Marchantia, Pinus, Polypodium, Polytrichum and Zamia, and are available in GenBank and in the lab folder on Canvas named “Biological Problem 3 rbcL 9 sequences”. Use your knowledge and laboratory experience to develop a morphological phylogenetic tree that depicts the evolutionary relationships for the organisms listed above. This tree will serve as your hypothesis, which you will test using molecular data for the rbcL nucleotide sequences. Use the file “Biological Problem 3 pictures” to help you with the morphological phylogenetic tree.

Step 1: Examine the organisms selected and determine which are charophytes, bryophytes, pterophytes, gymnosperms or angiosperms.
a) What are the ancestral and derived characteristics of the major phyla of plants? Develop a hypothesis for which organisms will be more closely related to each other (compare their morphological and life cycle characteristics of land plants). Arrange the organisms most closely related to each other into clades, and indicate which clades might share a common ancestor. Include your morphological phylogenetic tree.
Clades and my hypothesis:
Charophytes: algae, freshwater environment (Chara)
Bryophytes: moss, small, flowerless, grow in clumps or mats, non-vascular (Marchantia, Polytrichum)
Pterophytes: reproduce by spores, no seeds or flowers, vascular (Equisetum, Polypodium, Zamia)
Gymnosperms: produce seeds, more precisely unenclosed seeds (Pinus)
Angiosperms: flowering plants (Arabidopsis, Lilium)
Since each clade seems to have evolved from charophyte-like ancestors, with each group building on the derived characteristics of the one before it, this is what my tree looks like.

Step 2: You will be testing your hypothesis that the morphological tree accurately represents land plant phylogeny. You will use molecular data from the nucleotide sequences of rbcL.
b) Do you think the molecular evidence will support (be consistent with) your hypothesis or falsify it? Write your predictions below.
The molecular evidence should be consistent with my hypothesis. However, the hypothesis was based on morphological features, as well as visual assumptions about the basic characteristics of these clades.
Step 3: Create a phylogenetic tree of the 9 plant species mentioned above from the rbcL gene using ClustalW. Use the sequences posted in the lab folder on Canvas in the file named “Biological Problem 3 rbcL 9 sequences”.

a) Compare your molecular phylogenetic tree to your hypothesized morphological tree. Describe any similarities or differences and your thoughts on why there may be differences.
Overall, the trees were similar, but there were some differences between my hypothesis and the genetic information. The differences may reflect how similar the plants look and how closely related the species are, both of which could affect my hypothesis. Most of the differences involved the algae and moss-type plants, along with some confusion between plants that produce spores and plants that produce seeds. These groups have very similar visual features, so they are easy to misplace on appearance alone.

b) The two phylogenetic trees are supported by different types of evidence. What evidence was used to create the phylogenetic tree using bioinformatics in ClustalW?
The evidence used to create the phylogenetic tree was molecular data. The DNA sequences from the different groups are compared to detect similarities and differences in nucleotide bases. The tree then groups the sequences by the percentage of matching bases, along with other factors.

c) What types of evidence support your hypothesized “morphological” tree?
The hypothesized morphological tree is supported by morphological evidence: the possible clades were based on physical characteristics and overall visual similarities.
d) What is rbcL, and why is it a particularly useful molecule for studying evolutionary relationships in plants and green algae?
rbcL is the gene for the large subunit of Rubisco, an enzyme involved in one of the first steps of carbon fixation, the part of photosynthesis that eventually turns carbon dioxide into glucose. Since Rubisco is essential to photosynthesis, and practically every plant and alga performs photosynthesis, the gene is present in all of them and provides a good evolutionary link between them.
e) What are the limitations in using rbcL to construct a phylogenetic tree?
Since practically every organism that performs photosynthesis has this enzyme, the gene is a shared similarity among plants and algae and may not differentiate well between them. It offers more of a starting point from which to narrow down to more detailed comparisons.
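How pairwise sequence differences turn into a tree can be sketched in a few lines. ClustalW's tree output is based on neighbor-joining; the sketch below swaps in the simpler average-linkage (UPGMA) clustering to keep the example short, so it is an illustration of the idea rather than ClustalW's method. The four sequences are made-up toy strings, not rbcL data, and `p_distance` and `upgma` are hypothetical helper names.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, gap-free columns at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters."""
    clusters = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance from the merged cluster is the size-weighted average.
        for other in clusters:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Made-up, already-aligned toy sequences with hypothetical labels.
toy_sequences = {
    "seq_A": "ACGTACGTAC",
    "seq_B": "ACGTACGTAT",
    "seq_C": "ACGAACGTTT",
    "seq_D": "TCGAACCTTT",
}

tree = upgma(toy_sequences)
print(tree)
```

Sequences that differ at the fewest positions are joined first, which is why a tree built this way reflects overall sequence similarity and needs further assumptions before it can be read as a tree of evolutionary descent.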

Applying Your Knowledge
1. Can you suggest reasons why a phylogeny based on molecular evidence and a phylogeny based on morphology and other evidence might not be exactly the same?
Molecular evidence is quite different in kind from other evidence. Physical features can be misleading when identifying evolutionary relationships, and improper conclusions can be drawn from appearance and morphological evidence alone. Molecular evidence provides a more reliable source, since it looks at the DNA sequences and compares the similarities and differences in their bases. However, some physical features are distinctive enough that related clades can be grouped easily: for example, a flower readily identifies a plant as a flowering plant.

2. Zoologists worldwide are sequencing a mitochondrial gene, CO1 (cytochrome c oxidase subunit), which is found in all animals and appears to be distinctive for each species. The sequence of nucleotides can be used as a universal DNA bar code. By comparing the CO1 DNA sequence for an animal to a growing database of DNA sequences, scientists can accurately identify any animal and also discover species not previously known to science. How might DNA bar coding, which uses molecular biology and bioinformatics, be useful in enforcing international laws for banning the import of endangered species? How might these approaches stimulate the study of biodiversity in remote areas?
If imports are monitored by qualified experts who can DNA bar code every species being imported, then the import of endangered species could be detected. If scientists can accurately identify any animal, they can tell which ones are endangered, and the import of those animals can be stopped. Since the database is large, it should contain many animals, and the region where the tests are performed should not matter. Bar coding can also expand knowledge of biodiversity in remote areas, especially when species not previously known to science are discovered. If this testing is done whenever a species is imported, it can at least help identify animals from remote locations.
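The bar code lookup described above amounts to comparing a query sequence against a reference database and reporting the closest entry, or no match when nothing is close enough. Real bar coding relies on alignment tools and curated CO1 databases; the sketch below is a simplified stand-in that compares sets of short subsequences (k-mers) instead. The reference sequences, species labels, watch list and the 0.8 threshold are all hypothetical.

```python
def kmers(seq, k=4):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=4):
    """Shared k-mers divided by all k-mers seen in either sequence."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


def identify(query, reference, min_similarity=0.8):
    """Return (best matching label or None, similarity of the best match)."""
    best_name, best_score = max(
        ((name, jaccard(query, seq)) for name, seq in reference.items()),
        key=lambda item: item[1],
    )
    if best_score < min_similarity:
        return None, best_score  # no close match: possibly a species not in the database
    return best_name, best_score


# Made-up toy "bar codes" and a made-up watch list (not real CO1 data).
reference_db = {
    "species_A": "ACGTACGGTCATTGCA",
    "species_B": "TTGACCGTAAGCTTAG",
}
protected = {"species_A"}

sample = "ACGTACGGTCATTGCA"
name, score = identify(sample, reference_db)
print(name, round(score, 2), "protected" if name in protected else "not listed")

unknown_name, unknown_score = identify("GGGGGGGGCCCCCCCC", reference_db)
print(unknown_name, unknown_score)
```

A sample that returns no match is the interesting case for biodiversity surveys, since it flags an organism that the database has not yet catalogued.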
