Bioinformatics Exercise PDF

Title Bioinformatics Exercise
Author Mariem Salcedo
Course Biochemistry Laboratory
Institution Queens College CUNY

Chem376 Bioinformatic Exercise Mariem Salcedo Prof. Abeyweera 09/29/20

A. Databases for the Storage and “Mining” of Genome Sequences

1. Finding Databases.

BLAST: Basic Local Alignment Search Tool. It finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.

Taxonomy: The branch of science concerned with classification, especially of organisms; systematics.

Gene ontology: A major bioinformatics initiative to unify the representation of gene and gene product attributes across all species.

Phylogenetic tree: A phylogenetic tree or evolutionary tree is a branching diagram, or “tree”, showing the evolutionary relationships among various biological species or other entities, based upon similarities and differences in their physical or genetic characteristics.

Multiple sequence alignment: A sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship, that is, the sequences share a lineage and are descended from a common ancestor.

2. Analyzing a DNA Sequence.

a. What is an open reading frame (ORF)? It is a stretch of a DNA molecule that, when translated into amino acids, contains no stop codons. In practice it runs from a start codon to the first in-frame stop codon.


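b. What is a reading frame and why are there six? A reading frame is a way of dividing the sequence of nucleotides in a nucleic acid molecule into a set of consecutive, non-overlapping triplets. Reading can start at the first, second, or third nucleotide, which gives three frames per strand. Because DNA is double-stranded and either strand may be read, there are six possible reading frames for every region of DNA, three in each direction.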
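The two answers above can be illustrated with a minimal sketch that splits a DNA string into its six reading frames and scans each frame for ORFs. The sequence `toy` and the helper names (`six_frames`, `find_orfs`) are made up for illustration and are not part of the exercise or of any ORF-finding tool. It also assumes an ORF starts at ATG and uses the three standard stop codons.

```python
# Standard-library sketch: six reading frames and a simple ORF scan.
# Assumption: ORFs start at ATG and end at the first in-frame stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement, i.e. the other strand read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def codons(dna, offset):
    """Split dna into complete triplets, starting at the given offset (0, 1 or 2)."""
    return [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]


def six_frames(dna):
    """Map frame labels (+1..+3 forward, -1..-3 reverse) to codon lists."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = codons(dna, offset)
        frames["-%d" % (offset + 1)] = codons(rc, offset)
    return frames


def find_orfs(codon_list, min_codons=2):
    """Return ORFs (start codon through stop codon) found in one frame.

    min_codons is the minimum number of codons before the stop codon.
    """
    orfs = []
    start = None
    for i, codon in enumerate(codon_list):
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            if i - start >= min_codons:
                orfs.append("".join(codon_list[start:i + 1]))
            start = None
    return orfs


toy = "ATGAAATAGC"  # hypothetical 10-base sequence
for label, frame in sorted(six_frames(toy).items()):
    print(label, frame, find_orfs(frame))
```

Only frame +1 of the toy sequence contains an ORF (ATG AAA TAG). The other five frames read the same bases as different triplets and find none.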


MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKAGSELGLKAKEIMDAGKLV
TDELVIALLKERITQEDCRDGFLLDGFPRTIPQADAMKEAGIKVDYVLEFDVPDELIVERI
VGRRVHAASGRVYHVKFNPPKVEDKDDVTGEELTIRKDDQEATVRKRLIEYHQQTAPLV
SYYHKEADAGNTQYFKLDGTRNVAEVSAELATILG

c. What does BLAST stand for? Basic Local Alignment Search Tool.

The protein is adenylate kinase [Yersinia pseudotuberculosis complex]. The source is Sequence ID: WP_002208600.1

3. Sequence Homology.

a. Homolog: a gene or protein related to another by descent from a common ancestral sequence.

Ortholog: Any of two or more homologous gene sequences found in different species and related by linear descent, that is, separated by a speciation event.

Paralog: Either of a pair of genes that derive from the same ancestral gene through duplication within a genome.

Yeast: Adenylate kinase [Beauveria bassiana D1-5]. Score: 384. E-value: 2e-134

H: adenylate kinase 2, mitochondrial isoform b [Homo sapiens]

Are the yeast and human sequences homologous to the Yersinia pestis sequence? Yes, they are; the yeast sequence is very close to the original protein sequence.

b. What is the difference between an identity and a conservative substitution?

An identity scores 1 at each aligned position where the two amino acids or nucleotides are the same, and 0 at all other positions. Conservation is not taken into account. A conservative substitution replaces one amino acid with another that is similar in size and chemical properties. Such conservative amino acid substitutions may have only minor effects on protein structure and can thus be tolerated without compromising function. Identity alone therefore understates similarity, so a better way to score the similarity of sequences is needed. For example, Adenylate kinase [Beauveria bassiana D1-5] has identities of 187/214 (87%) with the original sequence.

c. Substitution matrix: A substitution matrix is a scoring table that assigns a score to every possible pairing of amino acids and is used to rank the similarity of two amino acid sequences. A large positive score corresponds to a substitution that occurs frequently. A large negative score corresponds to a substitution that occurs rarely.

4. Plasmids and Cloning.

a. What is the abbreviation for this enzyme? Rma43812

What is the recognition sequence for this enzyme? PI-Rma43812IP

What is the 5'-3' recognition sequence for this enzyme?

5’-C/TA-G-3’
3’-G-AT/C-5’

b. What is a plasmid? A genetic structure in a cell that can replicate independently of the chromosomes, typically a small circular DNA strand in the cytoplasm of a bacterium or protozoan.

Antibiotic target from pBR322: 2987..3847 /note="Amp resistance"

pBR322 sequence in FASTA format:


c. What is the size of the pBR322 plasmid in number of base pairs? 4361 base pairs. How many cut sites are there for the restriction enzyme HaeIII on pBR322? 22 cut sites.
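Counting cut sites amounts to searching the plasmid sequence for the enzyme's recognition sequence (GGCC for HaeIII). The sketch below shows one way to do that on a circular molecule using only the standard library. The sequence `toy_plasmid` is invented for illustration and is not pBR322, and `find_sites` and `fragment_count` are hypothetical helper names. It assumes a palindromic site, so searching one strand is sufficient.

```python
# Standard-library sketch: locate recognition sites on a circular plasmid.
# Assumption: the site is palindromic, so one strand is enough to search.

def find_sites(plasmid, site, circular=True):
    """Return 0-based start positions of every occurrence of site."""
    seq = plasmid.upper()
    site = site.upper()
    # On a circle, a site can span the arbitrary "origin" of the sequence
    # file, so extend the search string by the first len(site) - 1 bases.
    search = seq + seq[:len(site) - 1] if circular else seq
    positions = []
    start = search.find(site)
    while start != -1:
        positions.append(start)
        start = search.find(site, start + 1)
    return positions


def fragment_count(n_sites, circular=True):
    """Number of fragments produced by n_sites cuts."""
    if n_sites == 0:
        return 1  # uncut molecule
    # A circle cut n times gives n pieces; a linear molecule gives n + 1.
    return n_sites if circular else n_sites + 1


toy_plasmid = "CCAAGGCCTTGGCCAAGG"  # hypothetical sequence, not pBR322
sites = find_sites(toy_plasmid, "GGCC")
print(sites, fragment_count(len(sites)))
```

By the same counting rule, the 22 HaeIII sites reported above would cut circular pBR322 into 22 fragments.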

d. Blunt ends: A straight cut down through the DNA that results in a flat pair of bases on the ends of the DNA.

Sticky ends: staggered ends on a DNA molecule with short, single-stranded overhangs. For example:
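The difference between the two kinds of ends can be worked out from where an enzyme cuts within its palindromic recognition site. The sketch below is a minimal illustration under that assumption. The function name `end_type` is hypothetical, and the cut offsets are the commonly cited ones for EcoRI (G^AATTC), PstI (CTGCA^G) and HaeIII (GG^CC).

```python
# Standard-library sketch: classify the ends left by a restriction enzyme.
# Assumption: the recognition site is palindromic, so the cut on the bottom
# strand mirrors the cut on the top strand.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def end_type(site, top_cut):
    """Return (kind, overhang_length).

    top_cut is the number of bases to the left of the cut on the top strand,
    e.g. 1 for G^AATTC.
    """
    site = site.upper()
    if site.translate(COMPLEMENT)[::-1] != site:
        raise ValueError("site is not palindromic")
    # Mirror image of the top-strand cut, in top-strand coordinates.
    bottom_cut = len(site) - top_cut
    overhang = abs(bottom_cut - top_cut)
    if overhang == 0:
        return ("blunt", 0)
    if top_cut < bottom_cut:
        return ("5' overhang (sticky)", overhang)
    return ("3' overhang (sticky)", overhang)


for name, site, cut in [("EcoRI", "GAATTC", 1),
                        ("PstI", "CTGCAG", 5),
                        ("HaeIII", "GGCC", 2)]:
    print(name, end_type(site, cut))
```

A cut in the exact middle of the site leaves no unpaired bases (blunt), while an off-center cut leaves a single-stranded overhang whose length is the distance between the two strand cuts.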


e. How many pBR322 fragments did “all” the enzymes generate? 3181 fragments.

f. How many fragments are obtained with AvaI? 4 fragments

What is the size of the restriction site for AvaI?

g. How many pBR322 fragments are produced when the three different enzymes (BamHI, AvaI, PstI) are combined? How large are the fragments? 3 fragments for AvaI only. BamHI and PstI weren’t in the enzyme list option.

h. Restriction map and enzymes/sites and fragments.
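Once the cut positions on a restriction map are known, the number and sizes of the fragments from a single or combined digest follow by simple arithmetic. The sketch below shows that calculation for a circular plasmid. The plasmid length, the enzyme names (`EnzA`, `EnzB`, `EnzC`) and the cut positions are placeholders chosen for illustration, not values read from the pBR322 map.

```python
# Standard-library sketch: fragment sizes from cut positions on a plasmid.
# All numbers below are hypothetical and only illustrate the arithmetic.

def fragment_sizes(cut_positions, plasmid_length, circular=True):
    """Return fragment sizes (largest first) for the given cut positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_length]  # uncut molecule
    # Distances between neighbouring cuts.
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        # The last fragment wraps around the origin back to the first cut.
        sizes.append(plasmid_length - cuts[-1] + cuts[0])
    else:
        sizes = [cuts[0]] + sizes + [plasmid_length - cuts[-1]]
    return sorted(sizes, reverse=True)


plasmid_length = 1000
cut_map = {"EnzA": [100], "EnzB": [400], "EnzC": [850]}

# Single digest: one cut only linearizes the circle.
print(fragment_sizes(cut_map["EnzA"], plasmid_length))

# Triple digest: pool the cut positions of all three enzymes.
all_cuts = [pos for positions in cut_map.values() for pos in positions]
print(fragment_sizes(all_cuts, plasmid_length))
```

Under these assumptions, three enzymes that each cut once give three fragments from a circular plasmid, and the fragment sizes always add up to the plasmid length.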


i. Enzyme with 10 fragments from pUC18.

B. Using Databases to Compare and Identify Related Protein Sequences.

1. Obtaining Sequences from BLAST.


Does the NP_000356.1 entry represent a human ortholog of rabbit muscle triose phosphate isomerase? Yes, it represents a human ortholog of rabbit muscle triose phosphate isomerase. What is the percent identity between the two enzymes? 98%

2. Multiple Sequence Alignment.

How many identical residues did you find? 68

Polar: G (14) + N (2) + Q (3) + T (2) + S (4) + H (2) + C (1) + Y (2) = 30
Non-polar: W (2) + P (2) + L (3) + A (6) + M (1) + V (7) + I (2) + F (1) = 24
Basic: R (5) + K (2) = 7
Acidic: D (2) + E (5) = 7

Do most of the “identities” fall into a single class of amino acids? The polar class has the most, with 30 of the 68 identities (about 44%), though that is short of a majority.

Unknown Sequences:

Protein:

Unknown 1:

Unknown 2:


Unknown 3:

Multiple Sequence Alignment:


How many identical residues did you find? 47

Nucleotides:

Unknown 1:


Unknown 2:


Unknown 3:

Multiple Sequence Alignment:


How many identical residues did you find? 565...
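Counting identical residues in a multiple sequence alignment means counting the columns in which every sequence carries the same residue and none has a gap. The sketch below shows that count, plus a pairwise percent identity, on a made-up three-sequence alignment. The sequences and the function names are illustrative only, and percent identity is taken here as identical positions divided by alignment length, which is one of several conventions in use.

```python
# Standard-library sketch: identical columns in an alignment.
# The aligned sequences below are hypothetical; "-" marks a gap.

def identical_columns(aligned):
    """Count columns where all sequences share the same non-gap residue."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    count = 0
    for column in zip(*aligned):
        first = column[0]
        if first != "-" and all(residue == first for residue in column):
            count += 1
    return count


def percent_identity(seq_a, seq_b):
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * identical_columns([seq_a, seq_b]) / len(seq_a)


alignment = [
    "MKT-LLG",
    "MKSALLG",
    "MRT-LIG",
]
print(identical_columns(alignment))
print(round(percent_identity(alignment[0], alignment[1]), 1))
```

The same arithmetic reproduces BLAST-style figures: 187 identical positions over an alignment length of 214 is 187/214, or about 87%.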

