BCMB30002 notes PDF

Title BCMB30002 notes
Course Bachelor of Science
Institution University of Melbourne

Summary

Notes from 2017 ...


Description

Functional Genomics Lectures 1-7

Lecture 1
What is functional genomics?
● Functional genomics is finding the function of genes and proteins, including when they are expressed
● Utilises the data produced by genomic and transcriptomic projects to describe gene functions and interactions.
○ Genomics = finding out what the sequence of a gene is
● Focuses on gene transcription, translation, the regulation of gene expression and protein interaction
● Genome wide approach - generally involves high throughput methods as opposed to the “traditional” gene by gene approach
● Aims to understand biological properties at an organism or cellular level
How do we conduct experiments in functional genomics?
● Can analyse by sequence to find function, or infer function from when, where and at what level the gene is expressed (expression analysis)
○ Conserved sequences; see if the sequence is similar to other genes of known function (primary structure)
○ Infer function based on the similarity it has to a domain in another gene/protein (3D structure - tertiary)
● Gene to function
○ Mutate a gene and see the impact on the function
● Function to gene
○ Observe function of mutated organism; determine which gene has been mutated
● Disease to gene
○ Compare the genetic mutations of a diseased individual to those of a healthy individual
Why is functional genomics often focussed on model organisms?
● Model organisms are chosen to represent types of life. Most model organisms have had their entire genomes sequenced.
● Benefits of model organisms:
○ Too many species to study them all in detail
○ In depth knowledge of one species provides tools and knowledge to gain further knowledge
○ Some organisms provide unique opportunities to study particular phenomena
○ Assumes most organisms share common molecular mechanisms
● Own reading:
○ Disease models: react to disease or treatment in a way that resembles human physiology
■ Isomorphic: shares same symptoms and treatments
■ Homologous: shares same causes, symptoms and treatments
■ Predictive: make predictions about mechanisms of a set of disease features

Lecture 2
What is the history of genome projects?
Why are genomes sequenced?
● Discover the molecular basis of life
○ Organisms are determined by their DNA
○ Facilitates a broader approach to biology
○ Enables powerful new technologies; functional genomics often utilises high throughput sequencing methods
● Discover the molecular basis of disease
● However, sequencing is not necessarily hypothesis driven; it involves interpreting unfamiliar data
How are genomes sequenced?
● Sanger sequencing
○ Low throughput due to poor primer binding
○ Larger sequences have poor sequencing traces
○ Unable to sequence an entire genome in one read; DNA is broken into smaller fragments which are then randomly sequenced. These fragments are then reordered based on the overlap of their sequences.

What are the differences between hierarchical genome assembly and whole genome shotgun assembly?

● Terminology:
○ Coverage: number of reads that include a given nucleotide in a reconstructed sequence
○ Shotgun sequencing: DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a continuous sequence.
○ Assembly = putting the picture back together
○ Longer read length = larger pictures
○ High fold coverage = take many more pictures
● The primary difference between the two is that hierarchical genome assembly clones the randomly, mechanically sheared DNA fragments into a vector, forming a low resolution physical map of the genome.
○ Since the genome has been amplified, and these copies have been sheared at random, the fragments should theoretically all have different ends.
○ With enough coverage this means that we are able to build a scaffold
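The idea of assembling reads by their overlapping ends, and of fold coverage, can be illustrated with a toy sketch. This is an illustrative assumption, not the method used by real assemblers: the function names (`overlap`, `greedy_assemble`, `fold_coverage`), the minimum overlap of 3 bases and the example reads are all hypothetical, and the reads are assumed to be error-free and from one strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


def fold_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of reads covering each base: (reads x read length) / genome length."""
    return n_reads * read_len / genome_len


reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]  # hypothetical overlapping reads
print(greedy_assemble(reads))
print(fold_coverage(100, 500, 10_000))
```

With more reads per position (higher fold coverage) there are more overlaps to choose from, which is why gaps between contigs become less likely.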



How are genomes annotated?

● All chromosomes are manually annotated
● Annotation of the genome is distributed throughout the research community

What are the components of the genome?

Lecture 3
What kind of biodatabases are there?
● Gene and protein databases
● Genome databases
○ Structure databases
● Enzyme databases
● Metabolic databases
● Mutation and polymorphism databases
● Experiment databases
● These databases allow:
○ Collection of data for gene of interest e.g. aa sequence, location on chromosome, exon/intron structure, manual and automated annotation, publications etc
○ Search for similar genes and infer function
○ Post information for other researchers
○ Conduct large-scale analyses of many genes simultaneously
What is gene prediction?
● The input of raw genomic data and output of the expected protein sequence
● Initially done manually, now an automated process through gene finding software
● Utilises open reading frames and the intron splice junctions
● Combines evidence with
○ Similarity to other known genes; allows us to make inferences to common functions, role of protein domains, possible structure and evolutionary relationships
○ Mapping of transcripts (EST/cDNA libraries)
○ Mapping of protein sequence data
○ Promoter sequences
○ DNA secondary structure predictions
What is an orthologue? Paralogue? Homologue?
● Identity - sequence is the same
● Similarity - a quantitative measure of how alike sequences are
● Homologue - share ancestral gene
● Orthologue - separated by speciation event
● Paralogue - separated by gene duplication event
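The distinction between identity and similarity above can be made concrete with a small sketch. It assumes two protein sequences that have already been aligned (equal length, `-` for gaps) and counts only ungapped columns. The grouping of "similar" amino acids in `SIMILAR_GROUPS` is a simplified assumption for illustration, not a published scoring scheme, and the function names are hypothetical.

```python
# Hypothetical, simplified groups of chemically similar amino acids (illustration only)
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("STNQ"), set("AG")]


def aligned_pairs(a: str, b: str) -> list[tuple[str, str]]:
    """Return the residue pairs from alignment columns that contain no gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]


def percent_identity(a: str, b: str) -> float:
    """Percentage of ungapped columns where both residues are identical."""
    pairs = aligned_pairs(a, b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(a: str, b: str) -> float:
    """Percentage of ungapped columns where residues are identical or in the same group."""
    pairs = aligned_pairs(a, b)
    if not pairs:
        return 0.0

    def similar(x: str, y: str) -> bool:
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    return 100.0 * sum(similar(x, y) for x, y in pairs) / len(pairs)


print(percent_identity("MKV-LA", "MRVQLG"))
print(percent_similarity("MKV-LA", "MRVQLG"))
```

Note that neither number says anything about homology by itself; homology is an inference about shared ancestry drawn from such measurements.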

How can sequence similarity be quantitated?
● Used to be quantitated by hand - utilised pair-wise sequence alignments
● Longer amino acid sequences are now aligned using software. The most common software is BLAST (Basic Local Alignment Search Tool)
● Own reading:
○ BLAST reads the sequence in “words”; usually three letter fragments.
○ The algorithm then locates all the common three letter words between the sequence of interest and the hit sequence.
○ After making words of the sequence of interest, the rest of the words are assembled.
○ All these words must have a certain degree of sequence similarity before they are compared to sequences in the database. If a word is not similar enough, it is not included in the sequence that is compared.
How are substitution matrices used?
● Substitution matrices are used to define which amino acids most frequently substitute for one another when comparing proteins

○ In a log-odds matrix a score of 0 means the substitution occurs about as often as expected by chance; the degree of deviation from 0 indicates how favoured (positive) or disfavoured (negative) the substitution is
● BLAST output
○ E value = parameter that describes the number of hits one can expect to see by chance when searching a database of a particular size
○ Score for how good the sequence similarity is - takes into account length of matches, substitution matrix etc
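The "word" step described in the BLAST notes above can be sketched as follows. This is a deliberately simplified illustration: it only finds exact three-letter words shared by a query and a subject sequence, whereas the real program also accepts similar words scored with a substitution matrix and then extends the hits. The function names and example sequences are hypothetical.

```python
def words(seq: str, k: int = 3) -> list[tuple[int, str]]:
    """Break a sequence into overlapping words of length k, with their start positions."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def seed_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int, str]]:
    """Return (query position, subject position, word) for every exactly shared word."""
    subject_index: dict[str, list[int]] = {}
    for j, w in words(subject, k):
        subject_index.setdefault(w, []).append(j)
    return [(i, j, w) for i, w in words(query, k) for j in subject_index.get(w, [])]


# Hypothetical query and database sequence
for hit in seed_hits("MKVLAAG", "TTKVLAQ"):
    print(hit)
```

Each hit is a candidate starting point; neighbouring hits on the same diagonal (here KVL and VLA) suggest a longer local alignment worth scoring.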

Lecture 1-3 questions
1. How do we use sequence analysis to infer function? How is this different from expression analysis?
2. How are model organisms chosen? Why are they important in studying
3. What are the benefits of E. coli as a model organism? What is it modelled after?
4. Describe the process of Sanger sequencing. Why is it considered a “low throughput” method?
5. What is the difference between hierarchical shotgun sequencing and whole genome shotgun sequencing? Why is hierarchical shotgun assembly said to be better for establishing new genomes relative to whole genome shotgun assembly?
6. Why is it important to have a high fold coverage?
7. What percentage of the nuclear genome is extragenic, and what percentage is genes and gene related sequences? Of the gene related sequences, what percentage is coding or non coding?
8. What features are used to look for genes? How can we use this information to predict gene function?
9. What is the difference between an orthologue, paralogue and a homologue?
10. How do BLAST algorithms quantitate sequence similarity? (define “word”)
11. How are substitution matrices used?

Lecture 4
What are multiple alignments?
● It is often assumed that the sequences have an evolutionary relationship and share a common ancestor i.e. are homologues
● Can be depicted visually to show mutation events such as single amino acid or nucleotide changes
● Useful for large scale comparisons of multiple sequences
● Can highlight structurally important residues; however, these can often only be interpreted in the light of known structures
● Assess sequence conservation of protein domains, tertiary and secondary structures
○ E.g. compare all known ATP binding proteins to establish the sequence region they have in common, or align abc1 genes of 50 individuals to see which mutations correlate with diabetic individuals
● Method: progressive alignment e.g. ClustalW
○ Generate pairwise alignments for all possible pairs using gap penalties and either a substitution matrix or simple identity scores
○ Start multiple alignment with most closely related pairs
○ Progressively build up by adding on gradually more distant sequences
○ Quick but may produce suboptimal results
Why catalogue protein domains?
● Terminology
○ Protein domains: Conserved part of a given protein sequence and tertiary structure that can evolve, function and exist independently of the rest of the protein chain.
○ Consensus sequence: Represents the most frequent nucleotide/amino acid residues found at each position in a sequence alignment
● Domains are often catalogued as they are useful for making alignments of domains of known and unknown function
○ Databases e.g. Pfam, InterPro
● When alignments are represented as a consensus sequence we are able to predict the functions of proteins with unknown function
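The consensus sequence definition above (the most frequent residue at each alignment position) can be sketched in a few lines. The sketch assumes the sequences are already aligned to equal length; the function name `consensus` and the example alignment are hypothetical, and ties are resolved arbitrarily by whichever residue is seen first.

```python
from collections import Counter


def consensus(alignment: list[str]) -> str:
    """Most frequent character in each column of an equal-length alignment."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = zip(*alignment)  # iterate over alignment columns
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


alignment = [
    "ATGCA",
    "ATGTA",
    "ACGCA",
]
print(consensus(alignment))
```

Columns where every sequence agrees correspond to fully conserved positions; columns with a weak majority are where a plain consensus string hides the most information.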

When do we make a phylogenetic tree?
● Trees are useful to categorise a gene when domain scanning or sequence BLASTing is insufficient. Possible because:
○ Functional domains that are known in annotated sequences can be used for alignment in non annotated sequences
○ Conserved regions known to be functionally important can be found
○ Therefore multiple sequence alignments can be used to analyse and find evolutionary relationships through homology between sequences.
● 3 different ways of making a tree:
○ Distance: tree with minimum branch length is preferred e.g. sequences paired off based on degree of similarity (simplistic)
○ Maximum parsimony: tree represents the fewest evolutionary changes (difficult to find the “best” tree)
○ Maximum likelihood: uses flexible statistical models of evolution to assign probabilities to various trees (computationally expensive)
● Inference of trees is limited by:
○ Assumptions regarding how a sequence evolves
○ Incomplete sampling of life’s current diversity
○ Sampling of only extant species
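As a rough illustration of the distance approach listed above, the sketch below computes the proportion of differing positions between aligned sequences and picks the closest pair, which is the first pairing step of a simple distance-based clustering. It is an assumption-laden toy: the names `p_distance` and `closest_pair` and the sequences are hypothetical, and no model of sequence evolution is applied.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """Names of the two sequences with the smallest pairwise distance."""
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]),
    )


# Hypothetical aligned sequences
seqs = {
    "human": "ATGCATGC",
    "chimp": "ATGCATGA",
    "mouse": "ATGAATTA",
}
print(closest_pair(seqs))
```

A full distance method would replace the joined pair with a single node, recompute distances to the remaining sequences, and repeat until one tree remains.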

How is bioinformatics used in systems biology “omics”?
● E.g. Microarrays - collection of microscopic DNA spots attached to a solid surface. Measure the expression levels of large numbers of genes simultaneously.
○ mRNA is reverse transcribed into cDNA
○ The cDNA is then labelled with a dye
○ This cDNA is then hybridised with the microscopic spots that have been printed onto the glass slide.
○ The genes currently active in the cell can then be identified by the level of fluorescence and the colour of the spots.
● Since microarrays produce many light signals, bioinformatics is crucial for sorting the signals from the noise and grouping the tissues that have similar expression patterns.
● Can also be used to group together genes that are coordinately regulated; can identify novel genes in a biological process.
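One common way to turn spot fluorescence into an expression call is a log2 ratio between two dye channels. The sketch below assumes a two-colour design (a sample channel and a reference channel), which the notes do not state explicitly; the function names, the intensities and the threshold of 1.0 (a two-fold change) are illustrative assumptions.

```python
import math


def log2_ratio(sample: float, reference: float) -> float:
    """log2 of sample intensity over reference intensity for one spot."""
    return math.log2(sample / reference)


def call_expression(sample: float, reference: float, threshold: float = 1.0) -> str:
    """Label a spot as up, down or unchanged using a log2 ratio threshold."""
    ratio = log2_ratio(sample, reference)
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical background-corrected intensities (sample, reference) per gene
spots = {"geneA": (800.0, 200.0), "geneB": (100.0, 400.0), "geneC": (300.0, 250.0)}
for gene, (sample, reference) in spots.items():
    print(gene, call_expression(sample, reference))
```

Real analyses also normalise between arrays and use replicate spots before making calls; those steps are omitted here.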

Lecture 5
What is comparative genomics?
● The field where genomic features of different organisms are compared
○ Allows us to understand broader biological processes and relationships between species, annotate the human genome, and identify disease gene homologues from model organisms
Comparative genomics and genome regions under selective pressure
● Based on evolutionary principles - there is selective pressure to conserve the function of a sequence e.g. exons, gene control regions, start points of DNA replication, genes encoding RNAs that do not encode proteins
● Conservation of gene order is referred to as synteny - the conservation of an inherited region of genetic material between two different organisms.
○ Usually identified by the observation that two organisms possess a block of genes in conserved order
● Britannica definition of synteny:
○ Genomic sequencing and mapping have enabled comparison of the general structures of genomes of many different species. The general finding is that organisms of relatively recent divergence show similar blocks of genes in the same relative positions in the genome. This situation is called synteny, translated roughly as possessing common chromosome sequences. For example, many of the genes of humans are syntenic with those of other mammals, not only apes but also cows, mice, and so on. Study of synteny can show how the genome is cut and pasted in the course of evolution.
● Comparative genomics as a genome reannotation tool
○ E.g. extensive comparative data is available for fungi; well characterised telomeres, centromeres and repetitive elements. It is obvious when one of these features has been incorrectly annotated.
Effectiveness of comparing organisms is dependent on how closely related they are
● Too distant: hard to recognise conserved functional elements
● Too close: conservation of any feature may be by chance rather than through preservation of a functional element
○ E.g. organisms that are too closely related will share much of the same sequence, including the introns. Organisms that are too distantly related may share few recognisable similarities, even in the exons.
○ “Ideal” spot: conservation in the exons, some variation in the introns
Example 1) Rodent genomes as a model organism for human biology
● 90% of mouse and human genomes correspond to regions of synteny
● 99% of genes have orthologues
● As a consequence, researchers were able to identify millions of SNPs, which serve as markers for disease
● Almost all human genes known to be associated with disease have orthologues in the rat genome
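The description of synteny earlier in this lecture (a block of genes in conserved order in two organisms) can be sketched as a search for runs of genes that are consecutive and in the same order in both genomes. This is a minimal sketch under several assumptions: gene names are unique within each genome, orthologues share the same name, only the forward orientation is considered, and the gene lists are hypothetical.

```python
def syntenic_blocks(genome_a: list[str], genome_b: list[str], min_genes: int = 2) -> list[list[str]]:
    """Runs of genes that are adjacent and in the same order in both genomes."""
    index_b = {gene: i for i, gene in enumerate(genome_b)}
    blocks: list[list[str]] = []
    current: list[str] = []
    prev = None  # position in genome_b of the previous gene from genome_a
    for gene in genome_a:
        pos = index_b.get(gene)
        if pos is not None and prev is not None and pos == prev + 1:
            current.append(gene)  # order is conserved, extend the block
        else:
            if len(current) >= min_genes:
                blocks.append(current)
            current = [gene] if pos is not None else []
        prev = pos
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical gene orders along one chromosome in two species
species_a = ["g1", "g2", "g3", "g7", "g4", "g5"]
species_b = ["g9", "g1", "g2", "g3", "g4", "g5", "g7"]
print(syntenic_blocks(species_a, species_b))
```

Here g7 has moved relative to its neighbours, so it splits what would otherwise be one long conserved block into two.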

Example 2) Using the platypus genome to understand basis for species similarities/differences

● Informative variation - differences and similarities between humans and the platypus can be used to search for human/platypus genes
● Search for the sequences that turn certain genes on and off
● Genome sequence revealed that the platypus has both reptilian and mammalian features even at a genomic level e.g. density and distribution of repetitive sequences; non coding regions are expanded in the Therian mammals

Lecture 6
Why study genetic diversity?
● Many of the differences between individuals are down to genetic differences
● Helps us to understand human origins and prehistoric human movements as well as susceptibilities to diseases
What is human genetic diversity?
● Total number of different genetic characteristics in a species
● Genetic variations between individuals are due to:
○ Insertions and deletions
○ SNPs
○ Epigenetic differences e.g. histone code variation
How can SNPs help us understand genetic diversity?
● SNPs: single nucleotide polymorphisms. Sites in the genome where two or more alternative choices of a nucleotide are common in the population.
● Comparison of genomes reveals differences at some sites, with some sites more variable than others. Many SNPs do not change the phenotype themselves but are often co-inherited with other mutations that may lead to disease
● Therefore SNPs can serve as disease susceptibility markers
What are some of the methods used for detecting genetic differences between individuals?
● Some methods suit only detection of SNPs; some detect only large scale rearrangements, some detect both, some detect heterozygosity, some are more sensitive/industrialised
● Cytogenetics
○ Microscopic examination of chromosomes; only good for large scale rearrangements
● Restriction enzyme based methods e.g. RFLPs (restriction fragment length polymorphisms)
○ Exploits differences in the location of restriction enzyme (RE) sites between samples of homologous DNA molecules
● PCR based methods
○ e.g. AFLP-PCR: REs digest genomic DNA, adaptors are ligated to the sticky ends of the restriction fragments.
○ A subset of restriction fragments is selected to be amplified using primers that are complementary to the adaptor sequence.
○ Amplified fragments are separated and visualised via polyacrylamide gel electrophoresis
● Hybridisation methods (Microarray technologies)
○ Affymetrix GeneChip
○ Illumina BeadChip array
● Direct sequencing
○ Resequencing small target regions (present technology)
○ Resequencing whole genomes (near future)
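The RFLP idea in the list above can be illustrated with an in-silico digest: a single base change that destroys a restriction site changes the fragment lengths seen after digestion. The sketch uses the EcoRI recognition site GAATTC, which is cut after the first G; the function name `digest` and both DNA sequences are hypothetical, and only one strand is searched (sufficient here because the site is palindromic).

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting seq at every occurrence of a restriction site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # EcoRI cuts between G and AATTC
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Hypothetical alleles: the second carries an A to G change inside the second site
allele_1 = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
allele_2 = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TT"

print(digest(allele_1))
print(digest(allele_2))
```

On a gel the two alleles would give different banding patterns, which is the polymorphism being scored.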

Example 1) Hybridisation array approach to detect genetic variants
● “Spots” of DNA, many of which are replicates.
● Each spot contains multiple copies of the same ssDNA.
● Each spot corresponds to one SNP.
● The DNA to be tested from the subject is labelled with a fluorophore; this labelled DNA binds with its reverse complement in the spot. It is then said to be hybridised to the microarray
● Fluorescence intensity is detected at each spot, thereby identifying the SNPs in the subject DNA
● Microarray approaches can be used to detect the presence of a sequence change, or a copy number variant, by measuring decreased hybridisation of spots
● Can also use microarray approaches to detect the exact sequence change by having lots of spots corresponding to all possible or likely sequence variants → requires lots of DNA samples; many spots required on the microarray

What are the opportunities and limitations of next generation sequencing technologies for discovering disease genes?
● Ion Torrent sequencing
○ 2-3 hour run time
○ Current read length of 400bp; 100Mb to 1000Mb per run
○ Not good for long (>7) repeats of a single base
○ Good for resequencing, however some problems with assembling new genomes because of short reads and poor repeat handling
● Pacific Biosciences sequencing
○ Less total data per run than Ion Torrent or Illumina; high error rates in some applications
○ Produces much longer reads; easier and earlier assembly of genomic DNA with a possibility to sequence through entire mRNA molecules.
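The Ion Torrent limitation noted above (runs of more than 7 identical bases) suggests a simple check that could be run on a target region before choosing a platform. This is only a sketch: the function name `long_homopolymers`, the default threshold taken from the notes and the example sequence are assumptions for illustration.

```python
from itertools import groupby


def long_homopolymers(seq: str, max_run: int = 7) -> list[tuple[int, str, int]]:
    """Return (start position, base, length) for every single-base run longer than max_run."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length > max_run:
            runs.append((pos, base, length))
        pos += length
    return runs


target = "ACG" + "T" * 9 + "AC"  # hypothetical target region
print(long_homopolymers(target))
```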

Gene Expression & Chromosomes Lectures 8-13

Lecture 8
Sequence analysis and its relationship to gene function
● Sequence analysis allows us to explore whether a protein is homologous to other proteins, has any functional motifs, or contains an evolutionarily conserved region
● Example 1) BRCA1 gene
○ Tumour suppressor gene found in all humans; its product repairs damaged DNA, or destroys the cell if the DNA cannot be repaired
○ BRCA1 mutation can increase the risk of breast cancer
○ Sequence has been conserved across the mouse and human genomes



Above are the clues that point us to the BRCA1 gene being a tumour suppressor: can associate with other proteins via the RING finger domain, is localised in the nucleus and has a function in DNA repair

Methods and techniques for the subcellular localisation of proteins
● Eukaryotic cells are elaborately subdivided into functionally distinct membrane-bound compartments; finding out which compartment a protein is localised to allows us to gain further knowledge about the function of that protein
● Technique 1: Antibody to the protein
○ Purify the protein and then immunise animals using this protein.
○ Slow and difficult process; protein may not be immunogenic
● Technique 2: Tag recombinant protein
○ Tag the protein either using an antigenic epitope or using a fluorescent protein
○ 2.1) Antigenic epitope
■ Short peptide sequences which are chosen because high affinity antibodies can be reliably produced
■ Usually derived from viral genes
■ Cell dies during the process; immune reaction will kill the cell
○ 2.2) Fluorescent protein
■ GFP (green fluorescent protein) is isolated from jellyfish; fluorescence can be viewed with UV light
■ Tag is inserted the same way the epitope tag is inserted
■ Used to view live cells, in contrast to epitope tagging

Design experiments that uncover protein interactions in vitro and inside cells
● Technique 1: Yeast 2-hybrid analysis

