Prac 1 Writeup - jklkl PDF

Course Human Resources and Organizational Development
Institution Champlain College


Genetics of Human Disease BIOL3204/6204 2021 Practical Class 1

To join the practical class on Thursday 25th February at 2pm via Zoom please click here

Welcome to BIOL3204/6204. In this semester’s practical classes, you will be taken through the case study of a young girl named Naomi. Naomi was adopted by a couple, Sue and Jenny, when she was very young. Since her adoption, Naomi was diagnosed with, and was being treated for, a specific illness. Despite this treatment, Naomi’s illness suddenly got worse and unfortunately, she has recently died.

Sue and Jenny, who are highly educated and naturally curious people, want to understand more about the disease and why Naomi died. Originally, they accepted Naomi’s diagnosis without question but now they are not so sure this diagnosis was correct. In particular, there was no genetic testing performed to confirm the diagnosis. The situation is complicated by the fact that the couple have no knowledge of Naomi’s birth parents. In addition, the doctor and counsellor who had been treating Naomi had been having an affair and were killed in a suspicious car accident while away together over a long weekend. Sue and Jenny are now finding it very difficult to access their daughter's medical records.

The couple plans to approach the hospital where Naomi had been treated to find out more. Before they do, they want to be as well informed as possible so that they can ask the right questions and have the understanding to evaluate the answers they receive. You have been asked to figure out what disease Naomi has and learn as much as you can about the underlying genetic and molecular causes of this disease. In your assignments you will report back to her parents and answer the following questions:

• What is known about the gene that causes Naomi’s disease? [Molecular genetics]
• What kind of genetic variation causes Naomi’s disease? [Mutations and variation]
• What is known about the biological basis of this disease’s pathology? [Functional biology]
• How do we know whether a genetic variant does cause this disease? [Mutations and variation]
• What kind of genetic tests are performed to diagnose this disease? [Genetic Testing]
• Could Naomi's unknown ancestry have had implications for diagnosis and the effectiveness of treatment? [Genetic Testing]
• How do genetic and other factors affect the clinical presentation of this disease? [Factors that modify clinical presentation of the disease]
• What therapies/drugs are available for treatment of this disease? [Therapy and personalised medicine]
• How do these drugs work and could they have saved Naomi? [Therapy and personalised medicine]

In Practical 1 you will:

1. Read over the Doctor’s report (Page 3) from when Naomi first presented at hospital. The Doctor narrowed down Naomi’s diagnosis to three possibilities.
2. Answer questions 1-3 to determine which diagnosis was correct.
3. Use multiple online resources that are frequently used by geneticists and clinicians to learn what you can about Naomi’s disease.
4. Investigate what (if any) genes are involved in Naomi’s disease.

Case Notes:

Page 1 of 14

Name: Naomi Jane Doe

Admitted: 9:36 am 28th February 2010

Age: 12 (redacted, About -> Guidelines'. Use these guidelines, and the Guidelines for Formatting Gene and Protein Names found at the Bioscience Writers website (https://www.biosciencewriters.com/Guidelines-for-Formatting-Gene-and-Protein-Names.aspx) to answer the following questions:

9. What is the difference between a gene name and a gene symbol?

A gene symbol is a short abbreviation (usually an acronym) of the full gene name, used for convenience. Gene symbols are also italicised.

10. Are human and mouse gene symbols written differently? If so, how?

Yes. Human gene symbols usually contain 3-6 italicised characters, all in upper case, using Arabic numerals and no Greek letters, Roman numerals or punctuation. Mouse gene symbols are also italicised, but only the first letter is in upper case.

11. Are human gene and protein symbols written differently? If so, how?

Only in that protein symbols are not italicised.

12. Are human and mouse protein symbols written differently? If so, how?

No. Both are non-italicised and in upper case.
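The capitalisation rules in answers 10-12 can be sketched as a small check. This is only an illustration under simplifying assumptions: italics cannot be represented in plain text, real symbols may contain characters these patterns reject, and the function name `symbol_style` is made up for this example.

```python
import re

# Simplified patterns based on the conventions summarised in the answers above.
HUMAN_STYLE = re.compile(r"[A-Z][A-Z0-9]*")   # all upper case, Arabic digits allowed
MOUSE_STYLE = re.compile(r"[A-Z][a-z0-9]*")   # first letter upper case, rest lower case

def symbol_style(symbol: str) -> str:
    """Guess which gene-symbol convention a string follows (hypothetical helper)."""
    # Mouse style requires at least one lower-case letter after the first character.
    if MOUSE_STYLE.fullmatch(symbol) and any(c.islower() for c in symbol):
        return "mouse-style"
    if HUMAN_STYLE.fullmatch(symbol):
        return "human-style"
    return "unrecognised"

print(symbol_style("CFTR"))  # human-style
print(symbol_style("Cftr"))  # mouse-style
```

Protein symbols are upper case in both species, so capitalisation alone cannot distinguish a human gene symbol from a protein symbol; only the italics do.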

Now we’re going to take a quick detour to learn how to search for medical and scientific terms. This is a very important skill to have and will help immensely with your assignments this semester. Sometimes it can be hard to search for the correct information you’re after, particularly when you are trying to define abbreviations and acronyms. For example, open Google in your internet browser and search for ‘CDS’. CDS is the abbreviation used in genetics to identify the Coding Sequence of a gene i.e. the nucleotides that encode the exons (and thus protein) and not the introns. When you search for ‘CDS’, however, this is not the information that comes up.

13. What are the links that come up when you google ‘CDS’?

CDs, credit default swap

You may find that the search results you’ve received are different to the search results other people in the class have received. This can be due to:

1. The browser you’re using and the past searches you have performed in that browser. Try using Google Chrome instead of Firefox and see if that changes the results you receive.
2. The search engine you’re using. Open Yahoo or Bing and see if you get different search results.
3. The search terms you are using. You need to add in more keywords to narrow down your search.

14. Change your Google search to ‘CDS genetics’. What results do you get now?

Coding region (Wikipedia), UniProt, CDS annotation

The key to finding the correct definition you’re after is to try multiple search terms that include the general discipline you are searching for (e.g. genetics or science), words that you know are part of the term you’re searching for (e.g. CDS DNA), or keywords for what you’re looking for like ‘CDS definition’. You should also click to the second and third page of results, as this is often where more specific results are found (being less popular, they don’t make the first page). It is also important to use multiple resources when trying to define a term or answer a question, as different websites and different scientific fields will often have slightly different definitions of that term. Open the following four links:

1. https://dictionary.cambridge.org/dictionary/english/species
2. https://www.nature.com/scitable/definition/species-312
3. https://www.mammalsociety.org/articles/speciation-mammals-and-genetic-species-concept
4. https://www.ncbi.nlm.nih.gov/books/NBK8406/

15. How does each link define the term ‘Species’? (Max. 1-2 sentences for each definition)


Cambridge - a set of animals or plants in which the members have similar characteristics to each other and can breed with each other.

Nature - A biological species is a group of organisms that can reproduce with one another in nature and produce fertile offspring.

Mammal Society - a group of genetically compatible interbreeding natural populations that is genetically isolated from other such groups.

NCBI - Species, groups of similar organisms within a genus, are designated by biochemical and other phenotypic criteria and by DNA relatedness, which groups strains on the basis of their overall genetic similarity.

16. Does each link define ‘Species’ the same way? Hint: their differences might be subtle but important.

No. Nature and the Mammal Society both define species as interbreeding populations. Cambridge focuses only on shared characteristics (phenotype) and the ability to breed with one another. The Mammal Society adds genetic compatibility and isolation, and NCBI gives the most molecular definition, based on biochemical criteria and DNA relatedness.

In your assignments this semester you will be asked to define genetic concepts. It’s important that you use multiple resources to produce a clear and coherent definition for the term instead of just relying on the first definition you find. The best way to do this is to compare all of the different definitions. Which part of the definition do they all have in common? Which parts of the definitions seem the most logical? Which parts of the definitions best fit with the topic of this class?

17. Use the four links above to create your own definition for ‘Species’.

Species: a group of organisms with phenotypic, biochemical and genotypic similarities that are able to interbreed naturally within their group.

Next, open the ‘National Center for Biotechnology Information’ (NCBI) website, which is housed by the National Institutes of Health in Bethesda, Maryland (https://www.ncbi.nlm.nih.gov/). This is a good website to bookmark for future use! Click the ‘learn’ link, followed by ‘tutorials’.


Note particularly the range of tools, databases and other resources available, which are at the heart of modern biomedical science. Take some time (in or out of class) to explore this hugely valuable and important website, including explanatory videos and documents. These tutorials are very helpful in navigating through NCBI and its associated databases.

Go back to the NCBI home page. Ensure the dropdown box on the left is set to ‘all databases’ and use the search box at the top of the page to search for the gene symbol that you identified earlier in question 7. The search result will give you a broad overview of what is known about this gene. Take some time to explore it. When you consider that what you are seeing is the return from a search for just one gene, you get a sense of how useful and important the NCBI site is.

18. Click the links for ClinVar and OMIM and briefly describe what the databases are telling you about your gene/disease. Note: you may have to click on a few links in the OMIM database before you understand what it is showing you. The ‘About’ page is also very helpful.

OMIM describes genetic disorders and the cytogenetic locations of the genes involved. ClinVar catalogues the variants associated with the disease.

OMIM is curated by physicians and other professionals concerned with genetic disorders, such as genetic researchers. This means that the information it contains is up-to-date and relatively accurate. ClinVar is a resource within the NCBI website and is frequently used by clinical geneticists to investigate variants. Take time after the class to have a closer look through this website. It may come in handy for future tutorials and your assignments.

19. Are all of the variants listed on ClinVar for your gene pathogenic?

No. Some are classed as likely pathogenic and others are of uncertain significance.

Ensure that the data being presented is sorted by location on the ClinVar page for your gene and scroll down to variant 244 (third page), which is listed as CFTR, Tyr109His. This is the typical way in which human mutations/alleles are written, as it tells you the mutation that is occurring. In this case, it is telling you that at amino acid 109, a mutation changed a tyrosine (Tyr) to a histidine (His). The same mutation may also be written as Y109H, depending on which database you are using, and will often have ‘p.’ written in front of it to tell you that it is referring to the protein sequence and not the DNA sequence.

To figure out the exact nucleotide that was mutated and the abbreviations for other amino acids, you can refer to the following amino acid codon tool. By reading from the inside letter to the outside letter, you can see that Tyrosine is encoded by the codons TAT or TAC.


20. What codon(s) encode histidine?

CAT, CAC

21. What nucleotide change(s) in the codon would have to occur to change tyrosine into histidine?

T -> C at the first position of the codon (TAT -> CAT, or TAC -> CAC)

Some mutations are easier to decode than others, depending on the number of different codons that encode each amino acid. The specific nucleotide change is not listed on ClinVar for this mutation, but if you would like to know the exact nucleotide substitution you can follow the links on ClinVar to the paper in which information about the mutation was published.

Other variants have more information. If you look at variant 228 for CFTR, you will see that it is written as NM_000492.4(CFTR): c.292C>T (p.Gln98Ter) [NOTE: if the website produces an error when you try and change to the next page, try it in a different internet browser e.g. Safari, Firefox, etc]. ‘NM_000492.4’ is the accession number for that allele’s sequence. The accession number is a unique ID that ensures it can be used in many different kinds of analysis by researchers all over the world, who all know that it is exactly the same data they are studying. ‘c.’ tells you that the mutation is within the DNA coding sequence of CFTR (instead of in an untranslated region or intron). ‘292C>T’ tells you that at nucleotide 292 of the coding sequence, a C was mutated into a T.

22. The protein sequence mutation is written as p.Gln98Ter. You already know that Gln is glutamine and the mutation occurs at amino acid 98. What does Ter stand for?

Termination
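The codon reasoning behind questions 20-22 can be checked with a short sketch. It assumes the standard genetic code, includes only the codons needed here, and counts coding-sequence positions from 1 at the first base of the start codon; the helper names are made up for this example.

```python
# Codons from the standard genetic code (DNA alphabet); only the entries needed here.
CODONS = {
    "Tyr": {"TAT", "TAC"},
    "His": {"CAT", "CAC"},
    "Gln": {"CAA", "CAG"},
    "Ter": {"TAA", "TAG", "TGA"},
}

def single_base_changes(ref_aa, alt_aa):
    """Return (ref_codon, alt_codon, position, ref_base, alt_base) for every
    pair of codons that differ by exactly one nucleotide."""
    changes = []
    for ref in sorted(CODONS[ref_aa]):
        for alt in sorted(CODONS[alt_aa]):
            diffs = [i for i in range(3) if ref[i] != alt[i]]
            if len(diffs) == 1:
                i = diffs[0]
                changes.append((ref, alt, i + 1, ref[i], alt[i]))
    return changes

def codon_of(cds_position):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1

print(single_base_changes("Tyr", "His"))  # T -> C at codon position 1 in both cases
print(codon_of(292))                      # (98, 1)
print(single_base_changes("Gln", "Ter"))  # C -> T at codon position 1 (CAA->TAA, CAG->TAG)
```

Nucleotide 292 is the first base of codon 98, and a C>T change at the first base of either glutamine codon produces a stop codon, which is consistent with c.292C>T being reported as p.Gln98Ter.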


23. It’s important to know all the different ways databases use to refer to the same amino acid. What are the three additional ways to write Ter? Note: Two are written in something you have already used today. One will be harder to find. Ask a demonstrator if you are unsure.

STOP, X and *. (The stop codons themselves are TAA, TAG and TGA, and a mutation that creates one is called a nonsense mutation.)
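Converting between the three-letter and one-letter notations discussed above (Tyr109His and Y109H) is a simple lookup. The sketch below handles only simple substitutions of the form ‘p.Xxx123Yyy’ and writes a stop as ‘*’; the function name `to_one_letter` is made up for this example.

```python
import re

# Standard three-letter to one-letter amino acid codes, with Ter written as '*'.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Optional 'p.' prefix, reference residue, position, new residue.
SUBSTITUTION = re.compile(r"(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")

def to_one_letter(change: str) -> str:
    """Convert e.g. 'p.Tyr109His' to 'Y109H' (simple substitutions only)."""
    match = SUBSTITUTION.fullmatch(change)
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    ref, position, alt = match.groups()
    return f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"

print(to_one_letter("p.Tyr109His"))  # Y109H
print(to_one_letter("p.Gln98Ter"))   # Q98*
```

Some databases would write the second result as Q98X instead, which is why it helps to recognise every spelling of a stop.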

ClinVar can also tell you a mutation’s type and location within the genome. Click on the link for variant 228 and it will take you to more information.

The cytogenetic location tells you where on the chromosome the variant occurs, and the genomic location tells you the exact nucleotide number that the variant occurs at within the genome.

Go back to the page that displays the search results for your gene on NCBI (just before question 18). Under ‘Genomes’, click on the ‘Nucleotide’ database. Note the first few entries on this new page. They are from different species and have very different lengths (e.g. 90 bp versus 568 bp). Think about why this might be the case. We will come back to this point in a minute.

Under ‘Results by taxon’ in the top right of the page, click on the 668 entries for ‘Homo sapiens’. Make sure that the results page is set to default order sorting. (Note: the NCBI website is continuously being updated with new sequencing data, therefore the number of results for Homo sapiens may be different when you access this page.)

Scroll down and click on entry ‘25’ to open in a new window. Once this page has loaded, have a brief look at the information displayed.

24. What is the accession number of your gene?

AH006034

25. How many base pairs (bp) is the gene sequence you are looking at?

25346 bp

Click on ‘FASTA’ towards the top left, which will display one of the experimentally determined sequences of your gene in a very simple format called FASTA. The FASTA format begins with a line containing a unique ID or ‘accession number’ for this sequence data entry and a brief description of the sequence, followed by the sequence itself.

26. Knowing that four nucleotides make up a DNA sequence, what do you think the letter N signifies in this sequence?

N stands for an unknown nucleotide (it could be any of A, C, G or T). Here the runs of N mark gaps in the sequence where the bases have not been determined.

More detailed information about the gene is encoded in metadata in a GenBank formatted file that can be viewed by clicking the ‘GenBank’ link in the top left of the page. Click this link now.
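To make the FASTA layout described above concrete, here is a minimal sketch that splits a single FASTA record into its header line (which starts with ‘>’) and its sequence, then counts each base including N. The record is made up for illustration and is not the real database entry; the function name `parse_fasta` is also an assumption.

```python
def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record starts with a '>' header line")
    header = lines[0][1:]                    # drop the leading '>'
    sequence = "".join(lines[1:]).upper()    # sequence may wrap over many lines
    return header, sequence

# Made-up record for illustration only.
example = """>EXAMPLE1 hypothetical sequence with an undetermined stretch
ATGCAGAGGTCGNNNNNCCTCTGG
AAAAGGCCAGC
"""

header, seq = parse_fasta(example)
accession = header.split()[0]                # first word of the header is the ID
counts = {base: seq.count(base) for base in "ACGTN"}

print(accession, len(seq), counts["N"])      # EXAMPLE1 35 5
```

Counting N in this way gives a quick sense of how much of an entry is undetermined sequence rather than real bases.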


Researchers who sequence genes or genomes have to submit this information before they can publish their results. This submission requirement has been in place since the mid 1980s, which is why this resource is so extensive and valuable. Take some time to look at the metadata. Scroll down to the bottom of the file. You will see the nucleotide sequence in a different format from the FASTA file, with gaps in the sequence indicated. Above the nucleotide sequence is information about the gene structure including exon boundaries and sites of variation. Above that is the sequence of the translated protein, and above that additional summary information and information about the source of the data.

27. How many exons are listed for your gene? Hint: you will have to count how many times ‘exon’ is written down the left side.

27
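Counting ‘exon’ down the left side can also be done programmatically. The sketch below works on a made-up, heavily shortened feature table (the coordinates are hypothetical, not those of the real entry) and also reads the ‘CDS’ line, whose join(...) location lists the coding segments that are spliced together. It assumes each feature fits on one line, which real GenBank files do not guarantee.

```python
import re

# Hypothetical, shortened feature table for illustration only.
FEATURES = """\
     exon            1..60
     CDS             join(10..60,101..150,201..240)
     exon            101..150
     exon            201..260
"""

def count_exons(feature_text):
    """Count lines whose feature key (first column) is 'exon'."""
    return sum(1 for line in feature_text.splitlines()
               if line.split()[:1] == ["exon"])

def cds_segments(feature_text):
    """Return the (start, end) pairs listed on the first CDS line."""
    for line in feature_text.splitlines():
        parts = line.split()
        if parts[:1] == ["CDS"]:
            return [(int(a), int(b))
                    for a, b in re.findall(r"(\d+)\.\.(\d+)", parts[1])]
    return []

segments = cds_segments(FEATURES)
cds_start, cds_end = segments[0][0], segments[-1][1]
cds_length = sum(end - start + 1 for start, end in segments)

print(count_exons(FEATURES))           # 3
print(cds_start, cds_end, cds_length)  # 10 240 141
```

The first and last numbers of the join give the positions where the coding sequence begins and ends, while the spliced length (here 141 bases, or 47 codons) is much shorter than the span between them because the introns are excluded.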

28. What are the first 10 amino acids in the protein translation for your gene?

MQRSPLEKAS

29. What are the nucleotide locations of the start and stop codons in the protein, and what amino acid letter/symbol is the start codon represented by? Hint: look at the information listed as ‘CDS’. This stands for coding DNA sequence and it tells you the boundaries of all of the coding nucleotides in the gene i.e. exon-intron-exon boundaries.

The CDS starts at nucleotide 829 and ends at nucleotide 23630 (the coding segments in between are spliced together from the exons). The start codon is represented by M (Met); a stop is represented by *.

Once you have had a good look at this page, go back to the list of human sequences for your gene. Keep entry 25 open, but also open entry 1 and entry 8.

30. How many base pairs (bp) is the gene sequence for entry 1?

568

31. How many base pairs (bp) is the gene sequence for entry 8? 6070


32. Look at the metadata of these 2 new entries and compare them to the original entry (25). Why are the lengths of all three sequences so different?

Entry 1 is a partial CDS, entry 8 is the mRNA sequence, and entry 25 covers the complete CDS.

The metadata in these files is only a small part of the wide range of information currently available about the gene. This information is so vast and complicated that it would be impossible for any one person to put it all together from the published literature alone. Fortunately, there are now easy ways to access and understand the data, and it is all down to the metadata. Because the information is represented as structured metadata it can be visualized using tools known as genome browsers.

To see this in action, go back to entry 25 and click on the ‘Graphics’ link towards the top of the page. When it is loaded, you will see a simple visual representation of the gene as a series of tracks. The boundaries of the gene are shown in the top track (green) and the exon positions are indicated in the bottom tracks (purple, red and black). Hover your mouse over one of the grey exons and you will pull up additional information and links. Zoom in as far as you can go and you will see that the grey middle track reveals the nucleotide sequence.

You can add additional tracks containing other kinds of information about the gene, but rather than doing that here, we are going to explore another genome browser, which is hosted by the University of California, Santa Cruz. Using your web browser, find the UCSC Genome Browser (https://genome.ucsc.edu/). This is another enormously useful resource, which you would be well advised to spend time exploring. It is another good website to bookmark as we will use it again in future tutorials. Click the ‘Genome Browser’ tab at the top of the screen:

It will take you to a page that looks similar to this:


Type your gene symbol into the search box and click ‘Go’ (don’t click the suggested link that shows up! This will take you to the wrong link). You will get a page of results for your gene. The list of results is a bit overwhelming, but it gives you a sense of how much is known about this gene. Scroll down until you find the large heading ‘RefSeq Genes’ where two entries are listed. Click on the first entry (NM_000492). We will spend the remainder of this class understanding how data about this gene is represented through this browser and explore some of the data.

First watch either of these videos for a general overview of the browser:

https://www.youtube.com/watch?v=DNXI-M9oQl8
https://www.youtube.com/watch?v=09McdeQYcmU

For the remainder of the class you will use the browser to explore some of the features of your gene. If you are not sure what you are doing, ask one of the tutors and they can help. You will navigate the data through selective use of the available tracks, the zoom and move navigation tools and by viewing the documentation linked to features on the t...

