Title Bioinformatics project
Author Bas Braad
Course Introduction to Bioinformatics
Institution Wageningen University & Research

BIF-20306 project report

Student name       Student ID
1. Bas Braad       990409115020
2. Jorg Roelofs    981003700040

INSTRUCTIONS

For each project task, you have one page (approx. 600 words) to describe the following items:

1. Materials & Methods
What did you do? Which data, databases and tools did you use, and why did you choose these? What important settings did you select?

2. Results
What did you find, what are the main results? Report the relevant data, numbers, tables/figures, and clearly describe your observations.

3. Discussion & Conclusion
Do the results make sense? Are they according to your expectation or do you see something surprising? What do the results mean, how can you interpret them? Do different tools agree or not? What can you conclude? Make sure to describe the expectations and assumptions underlying your interpretation.

Each project task may be supported by one page of figures and tables. You have to be selective on your supporting materials, e.g. sequences do not have to be included; accession numbers are sufficient.

Checklist:
● Is your work (largely) reproducible based on your description of Materials & Methods?
● Are the figures and tables labeled with descriptive captions?
● Did you clearly describe your results and your observations?
● Have you critically discussed your results?
● Are the names and student numbers of all project partners in the file?
● Does the filename of your PDF file include usernames or last names?

TASK 1

Materials & Methods: We searched UniProt with default settings for ARF5 and IAA5 of A. thaliana, using the accessions P93024 (ARF5) and P33078 (IAA5). We used UniProt because it is the biggest database for proteins. For both proteins we looked at the annotated properties and at the other proteins they interact with. We then used the align two sequences option of blastp on the NCBI site to put the two sequences against each other and see whether they have any similarities. We used blastp because it is an easy way to compare proteins. Finally, we searched the Gene database on NCBI for the genes encoding ARF5 and IAA5 and looked at the location and the expression of these genes.

Results: Auxin response factors (ARFs) bind to a specific DNA sequence, 5'-TGTCTC-3', found in auxin-responsive promoter elements. The ARF5 protein is 902 amino acids long. Its interaction partners are BRX, the auxin-responsive proteins IAA1, IAA12 (BODENLOS) and IAA17, and ARF7. The functional region of the protein is the DNA-binding region at amino acid positions 158-260. The molecular functions of the protein are activator, developmental protein and DNA-binding. IAA5 is a short-lived transcription factor that functions as a repressor of genes in the auxin pathway. The IAA5 protein is 163 amino acids long. IAA proteins interact with ARF proteins: they form heterodimers and alter the functionality of the ARF proteins. The IAA5 protein has several important domains. Domain 1, at the N-terminus, includes an EAR motif which seems to be involved in repressing the transcription of different genes in the auxin pathway. Domains 3 and 4 are the regions where the interactions with ARF proteins take place. Two regions of the proteins align (Figure 1): amino acids 760-873 of ARF5 align with 32-156 of IAA5, and amino acids 227-237 of ARF5 align with 143-154 of IAA5. The IAA5 gene is located on chromosome 1 and is expressed in the flower, during the petal differentiation and expansion stage. The ARF5 gene is on the same chromosome and is expressed in 21 different plant structures and during 11 growth stages of the plant.
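The TGTCTC element named in the Results can be located in a promoter sequence by scanning both strands. The following is only a minimal sketch: the function name `find_aux_re` and the example fragment are invented for illustration and are not part of the project data.

```python
AUX_RE = "TGTCTC"  # auxin-responsive element bound by ARFs (from the Results)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_aux_re(seq: str, motif: str = AUX_RE) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every motif hit on either strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical promoter fragment, made up for this example.
example = "AATGTCTCGGGAGACATT"
print(find_aux_re(example))
```

A hit on the "-" strand means the reverse complement (GAGACA) was found in the given sequence, so the element itself lies on the opposite strand.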

Discussion & Conclusion: As our UniProt searches show, both IAA5 and ARF5 interact with other members of the same family. IAA5, however, does not interact with ARF5 itself. The IAA5 protein is a small transcription factor that is involved in the repression of genes in the auxin pathway; it can bind to different ARF proteins and keep them from activating certain genes. ARF5 and IAA5 show similarities in amino acid sequence. We expected to find these similarities because the proteins act in the same pathway and interact with other members of the same family. IAA5 is much smaller than ARF5 because it is only a transcription factor that represses the function of certain proteins. ARF5 acts as an activator for different genes in the auxin pathway; it is bigger than the IAA5 protein because it needs more active sites to bind to DNA. The genes for both proteins lie on chromosome 1. IAA5 is expressed very specifically, only during the petal differentiation and expansion stage, whereas ARF5 is expressed throughout the plant and during many development stages of A. thaliana. Because of the big difference in expression of the genes we can conclude that the Aux/IAA family is very specifically regulated during different stages in development and the ARF proteins are more widespread and

TASK 1 TABLES AND FIGURES

Figure 1: Alignments of ARF5 and IAA5

TASK 2

Materials & Methods: We performed a protein-protein BLAST (blastp) on the NCBI website, searching the Swiss-Prot database for proteins similar to ARF5 and otherwise using default settings. First we searched for similar proteins within A. thaliana; then we searched for similar proteins outside A. thaliana by using the exclude option to exclude A. thaliana. The conserved regions were found in the Graphic Summary tab.

Results: Within A. thaliana, the closest homologs of ARF5 are ARF19 (query cover Q = 60%, E-value E = 5e-161, percent identity P = 71.30%), ARF7 (Q = 56%, E = 6e-158, P = 63.24%) and ARF6 (Q = 42%, E = 3e-156, P = 62.08%). Many other auxin response factors (ARFs) are also homologous to ARF5. Outside A. thaliana we found ARF proteins in O. sativa: ARF19 (Q = 71%, E = 2e-160, P = 63.73%) and ARF12 (Q = 50%, E = 2e-158, P = 63.11%). S. lycopersicum is another species with similar proteins. The conserved regions for ARF5 are the B3 binding domain, the auxin response factor domain and the AUX/IAA domain.

Discussion & Conclusion: We found that many ARF proteins are similar to ARF5. This is what we expected because they have the same functions. From the comparison with proteins outside A. thaliana we found many ARF proteins in O. sativa. The E-value represents the number of alignments with a score of at least S that would be expected by chance alone when searching a complete database of n sequences. The lower the E-value, the more significant the hit. The E-values of the first 10 hits for ARF5 are very low, which means these hits are clearly significant (Figure 2).
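To make the relation between score and E-value concrete, the sketch below uses the standard bit-score form E = m * n * 2^(-S'), where m is the query length and n the total length of the database in residues. The database size in the example is an assumption chosen for illustration, not a value from our BLAST runs, and BLAST itself uses effective (edge-corrected) lengths, so real E-values will differ.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Karlin-Altschul E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Illustrative numbers only: a 902-residue query (the length of ARF5)
# against a database assumed to hold 2e8 residues in total.
for score in (30, 50, 100):
    print(score, expected_hits(score, 902, 200_000_000))
```

Each extra bit of score halves the number of hits expected by chance, which is why high-scoring hits get E-values that are very close to zero.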

TASK 2 TABLES AND FIGURES

Figure 2: Homologs of ARF5 within the A. thaliana species.

Figure 3: Conserved regions for ARF5 homologs on NCBI

TASK 3

Materials & Methods: We downloaded the sequence files from Brightspace and aligned them in MAFFT with default settings, saving the alignments in FASTA format. We opened each file in Mesquite and found the proper reading frame, the one that minimised the stop codons, using the following display settings: Display > Widths > Thin Rows; Display > Bird’s Eye View; Display > Color Matrix Cells > Color nucleotide by amino acid; Display > Show Color Legend. Then we used MEGA7 to test the sequence divergence. We computed pairwise distances with the following settings:
• Variance estimation method = None
• Model/Method = Tamura-Nei model
• Substitutions to include = Transitions + Transversions
• Rates among sites = Uniform rates
• Gaps/Missing Data Treatment = Pairwise deletion and complete deletion

Results: The alignment 86ARF_DNA_unal contains no stop codons but a few broken codons, for example in taxon XM0220467871PREDICTEDCaricapapayaauxinresponsefactor5LOC1108129. The overall sequence divergence for 86ARF_DNA is 0.239 with complete deletion and 0.276 with pairwise deletion. The alignment 207ARF_DNA_unal also contains no stop codons but some broken codons, for example in taxon XM0096226682PREDICTEDNicotianatomentosiformisauxinresponsefactor83. The overall sequence divergence for 207ARF_DNA is 0.232 with complete deletion and 0.278 with partial deletion.
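The visual check in Mesquite can also be approximated in a few lines of code. The sketch below scans one aligned sequence in a given reading frame. It assumes that a "broken codon" is a triplet in which gaps and nucleotides are mixed; that definition, the function name `scan_codons` and the toy sequence in the usage note are our own illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_codons(aligned_seq: str, frame: int = 0) -> dict[str, list[int]]:
    """Report in-frame stop codons and codons partly filled with gaps.

    Positions are 0-based column indices of the first base of each codon.
    """
    seq = aligned_seq.upper()
    report = {"stop": [], "broken": []}
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        gaps = codon.count("-")
        if gaps == 0 and codon in STOP_CODONS:
            report["stop"].append(i)
        elif 0 < gaps < 3:
            # Assumed meaning of "broken codon": gap and bases in one triplet.
            report["broken"].append(i)
    return report
```

For the toy sequence `"ATGAA-TTTTAA"`, this reports one broken codon starting at column 3 and one stop codon starting at column 9. A terminal stop codon is reported like any other, so a hit at the very end of a sequence is not necessarily a problem.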

Discussion & Conclusion: We did not find any stop codons, which is what we expected. We found a few broken codons, which is also expected. The MSAs can be considered good because their divergence is lower than 0.3. We used the Tamura-Nei model because it corrects for multiple hits, taking into account the differences in substitution rate between nucleotides and the inequality of nucleotide frequencies. In this case complete deletion is the better option: the alignments contain gaps and errors, and eliminating the columns that contain them results in a lower sequence divergence. Some gaps and errors in an alignment are normal.
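The difference between the two gap treatments can be shown on a toy alignment. The sketch below is an assumption-laden illustration, not MEGA7's implementation: it uses the uncorrected p-distance instead of the Tamura-Nei model to keep the code short, treats "-", "?" and "N" as missing data, and uses three made-up sequences.

```python
from itertools import combinations

GAP_CHARS = set("-?N")  # assumed symbols for gaps and missing data


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns gapped in a or b."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_distance(alignment: list[str], complete_deletion: bool) -> float:
    """Average pairwise p-distance under complete or pairwise deletion."""
    if complete_deletion:
        # Drop every column in which any sequence has a gap or missing base.
        keep = [i for i in range(len(alignment[0]))
                if all(seq[i] not in GAP_CHARS for seq in alignment)]
        alignment = ["".join(seq[i] for i in keep) for seq in alignment]
    dists = [p_distance(a, b) for a, b in combinations(alignment, 2)]
    return sum(dists) / len(dists)


# Toy alignment, invented for this example.
toy = ["ACGTACGT", "ACTTACGT", "AC-TACGA"]
print("complete deletion:", mean_distance(toy, complete_deletion=True))
print("pairwise deletion:", mean_distance(toy, complete_deletion=False))
```

In this toy case the only gapped column is also a variable one, so complete deletion removes a difference for every pair and gives the lower average. Whether complete deletion lowers or raises the average depends on how variable the removed columns are.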

TASK 3 TABLES AND FIGURES


Figure 3: The Mesquite alignment of 207ARF_DNA

Figure 4: Settings for 207ARF_DNA with complete deletion

TASK 4

Materials & Methods: With the alignments from task 3, we constructed neighbour-joining (NJ) trees in MEGA7 for the 86DNA, 207DNA and 207AA alignments and rooted the trees on the outgroup Nelumbo nucifera. We used the Tamura-Nei model to construct the trees. The trees were then reconstructed in IQ-TREE for the maximum likelihood analysis, with ultrafast bootstrapping. In total we ended up with 5 trees. We analysed the trees in FigTree and highlighted different clades and families. Then we compared the trees with each other to see whether the taxon-poor DNA set (86 sequences) had roughly the same clades and families as the taxon-rich DNA set (207 sequences). We also checked whether the families stayed the same when the tree was based on the protein alignment.

Results:

In every tree we marked the major family clades in different colors to make them easier to find. The DNA trees had clusters very similar to those in the amino acid tree. All five trees contain a characteristic cluster of ARF proteins from Brassicaceae, and these clusters are mostly uniform. Only in the NJ tree with 207 DNA sequences is some contamination found in this cluster. ARF proteins are found in all the eudicot organisms, but these ARFs are not the same for each organism. The cause of this could be a duplication event after which one gene copy evolved further and the other copy was suppressed. Such an event would have given rise to a new kind of ARF. Gene loss is also possible but much less likely than a duplication event, because a gene loss can result in the inability to activate gene expression, and root formation would then likely be inhibited.

Discussion & Conclusion:

In the trees with 207 DNA sequences, two major clades are visible that originated later than the rest of the clades. These clades are also visible in the trees with 86 DNA sequences, but we do not know whether they arose around the same time, because the two sets of trees contain a different number of sequences and are therefore hard to compare. Monophyly can be found in all the trees; in some trees, however, contamination of clades makes it harder to spot.
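Whether a family forms a clean clade, or is "contaminated" by other taxa, can be checked programmatically once a tree is rooted. The sketch below represents a rooted tree as nested tuples of tip labels and tests whether a set of taxa is monophyletic. The toy topology and the helper names are invented for illustration and do not reproduce any of our five trees.

```python
def leaves(tree) -> frozenset:
    """Collect tip labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, taxa) -> bool:
    """True if some subtree contains exactly the given taxa and no others."""
    target = frozenset(taxa)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, target) for child in tree)


# Hypothetical rooted tree with Nelumbo nucifera as outgroup.
toy_tree = (
    "Nelumbo_nucifera",
    (("Arabidopsis_thaliana", "Brassica_napus"),
     ("Vitis_vinifera", "Raphanus_sativus")),
)
brassicaceae = {"Arabidopsis_thaliana", "Brassica_napus", "Raphanus_sativus"}
print(is_monophyletic(toy_tree, brassicaceae))
```

In this made-up topology the three Brassicaceae tips do not form a single clade because Raphanus groups with Vitis, which is the kind of contamination described above.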

TASK 4 TABLES AND FIGURES

You may go over the one-page limit in this task; max 1 page per tree.

Figure 5: NJ tree for the 86 DNA sequences
