Title | Project Part A - Protease |
---|---|
Course | Bioinformàtica Aplicada |
Institution | Universitat de Girona |
Applied Bioinformatics Project Part A University of Girona Year 2020/21 Biology Grade
Gene Name: HIV‐1 Protease Accession number: GQ118492
Martí Junyent Sara Guzmán
Part 1: Provide a brief gene description & structure description. What's the gene length in number of nucleotides and amino acids?
Our sequence is "HIV-1 isolate ABPv3.15 from USA protease (pol) gene, partial cds", with accession number GQ118492. The label (pol) indicates that our gene, protease (PR), is encoded within the larger polymerase (pol) gene of the HIV-1 virus. [1][2] This can be seen in the image below [i1], where the pol sequence lies in the third reading frame. The pol sequence encompasses several other HIV-1 genes in addition to PR. More specifically, the PR gene is located between the p6 gene and the reverse transcriptase (RT) gene, which flank the N-terminus (the transframe region) and the C-terminus of PR, respectively. [i1][3]
Image 1 [i1]: Genome map of the HIV-1 virus.
The protease sequence has a total length of 297 bp, and its product is a 99 aa protease enzyme. [4]
To understand how the protease is expressed, we must first understand how HIV enters the cell. When the viral RNA enters the cell, it is accompanied by a reverse transcriptase, an integrase and, most importantly, a mature HIV-1 protease. The reverse transcriptase converts the viral RNA into DNA, which the integrase then incorporates into the host DNA. The viral DNA then either remains dormant or is transcribed into mRNA, which is translated by the host cell.
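As a quick consistency check on the lengths quoted above, a 297-nucleotide coding region corresponds to 297 / 3 = 99 codons, which matches the 99-residue protease. The snippet below is only a sketch of that arithmetic; the function name is illustrative, and it assumes the partial cds is in frame and includes no stop codon.

```python
def codons_to_residues(n_nucleotides: int) -> int:
    """Number of amino acids encoded by a coding region of the given length.

    Assumes the region is in frame and contains no stop codon
    (an assumption for this partial cds, not stated in the report).
    """
    if n_nucleotides % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    return n_nucleotides // 3


print(codons_to_residues(297))  # 99
```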
Putting this information into context, the expression of the protease gene is embedded in a larger, more complex process [m1]. In the image above we can see that the gag and pol genes lie in different reading frames. During translation, a particular motif in the gag mRNA occasionally causes the ribosome to shift frame. This is where the Gag-Pol precursor comes in: about 5% of the time, the ribosome shifts frame and continues translating into the pol gene. The pol gene is therefore always translated as part of the Gag-Pol precursor, which implies that the PR gene is translated only after the frameshift occurs.
In the human immunodeficiency virus type 1 (the etiologic agent of the acquired immunodeficiency syndrome, AIDS), the precursors that encode the structural proteins and enzymes of the final mature virus are processed by this viral protease during virus assembly. The protease plays several vital roles in viral replication, and without it the final virus particle would not be infectious. This protease (HIVP), as in all retroviruses, belongs to the group of aspartic proteases, and it functions only as a dimer. [5] Aspartic proteases are a group of proteolytic enzymes of the pepsin family that usually function in acidic solutions and are found in many different organisms. [6]
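The effect of the frameshift described above can be illustrated with a minimal sketch: codons are read in the original frame up to a chosen point, then the reading position slips back by one nucleotide and continues in the new frame. The sequence below is a toy mRNA used only for illustration (it is not taken from GQ118492), the function names are hypothetical, and the sketch does not model how often the shift happens.

```python
def split_codons(seq: str, start: int = 0) -> list[str]:
    """Split seq into complete codons, beginning at index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def read_with_minus1_shift(seq: str, codons_before_shift: int) -> list[str]:
    """Read codons in frame 0, then slip back one nucleotide and continue."""
    slip_at = 3 * codons_before_shift
    in_frame = split_codons(seq[:slip_at])
    shifted = split_codons(seq, start=slip_at - 1)
    return in_frame + shifted


# Toy mRNA for illustration only.
toy_mrna = "AUGAAUUUUUUAGGGAAGAUC"

print(split_codons(toy_mrna))
# ['AUG', 'AAU', 'UUU', 'UUA', 'GGG', 'AAG', 'AUC']
print(read_with_minus1_shift(toy_mrna, codons_before_shift=4))
# ['AUG', 'AAU', 'UUU', 'UUA', 'AGG', 'GAA', 'GAU']
```

After the shift, every downstream codon differs from the unshifted reading, which is why the downstream product only appears when the frameshift takes place.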
Part 1.1: Does the gene codify for a protein? What number of exons and introns does it have?
Yes, the gene encodes a protein: the HIV-1 protease, with accession number GQ118492.1. As seen above, this protease takes part in a variety of processes. This particular gene contains no introns, so it consists of a single exon, the sequence itself.
Part 1.2: Choose five sequences that are homologous with HIV‐1 Protease gene. For each sequence, provide its access number and explain the reason for choosing it.
The sequences chosen are as follows:
>HIV-1 isolate M from India protease (pol) gene, partial cds JX256205
>HIV-1 isolate N from India protease (pol) gene, partial cds JX256207
>HIV-1 isolate O from India protease (pol) gene, partial cds JX256206
>HIV-2 isolate 2001 from Mali protease (pol) gene, partial cds AY688934
>HIV-2 isolate V05-06108 from Burkina Faso protease (pol) gene, partial cds EF090175
>Simian immunodeficiency virus STLV-III(AGM) proviral genome protease Y00295
The reasoning behind this choice was to facilitate a phylogenetic reconstruction. Based on sources [7] and [i2], the Human Immunodeficiency Virus first appeared in apes. Throughout history there has been a great variety of Simian Immunodeficiency Viruses, just as different varieties of HIV are currently found in humans, as seen in the image below. The most widespread variant, HIV-1 M, has been traced to its origin: it was transmitted to humans through a zoonotic event from Pan troglodytes troglodytes in a remote area in the southeast corner of Cameroon. [7] Knowing this, we took sequences from the HIV varieties that seemed appropriate (HIV-1 groups M, O and N, as well as HIV-2 and SIV). Since all of these are homologous sequences, they allow us to carry out a phylogenetic analysis.
Image 2 [i2]: Phylogenetic tree of SIV and HIV viruses.
Part 1.3: Define your INGROUP and OUTGROUP in order to describe a phylogeny of the gene.
To define our INGROUP and OUTGROUP we must keep in mind the host genus of each homologous sequence we have chosen. Our INGROUP is made up of the following sequences, whose host belongs to the genus Homo:
>HIV-1 isolate M from India protease (pol) gene, partial cds JX256205
>HIV-1 isolate N from India protease (pol) gene, partial cds JX256207
>HIV-1 isolate O from India protease (pol) gene, partial cds JX256206
>HIV-2 isolate 2001 from Mali protease (pol) gene, partial cds AY688934
>HIV-2 isolate V05-06108 from Burkina Faso protease (pol) gene, partial cds EF090175
Our OUTGROUP, in contrast, is made up of one sequence from a simian host, which is nevertheless closely related to our INGROUP sequences:
>Simian immunodeficiency virus STLV-III(AGM) proviral genome protease Y00295
Part 1.4: Perform a multiple alignment of your sequences.
We performed the multiple alignment with the BioEdit program, specifically its accessory application ClustalW Multiple Alignment, and obtained an alignment of 962 base pairs. The sequence alignment in fasta format is shown below:
Next, we created a consensus sequence, which is also shown below in fasta format:
Regarding the number of variable positions, there are a total of 139 variable positions out of the 962 positions studied (about 14%) among the six homologous sequences, so the six sequences are quite similar.
Image 3. Number of variable positions for the six sequences, obtained with the MEGA program.
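The idea of a variable position can be sketched as follows: a column of the alignment is variable when it contains more than one distinct character. This is a minimal illustration on a hypothetical toy alignment, not the report's data, and it treats a gap like any other character, which may differ from how MEGA handles gaps.

```python
def count_variable_sites(alignment: list[str]) -> int:
    """Count alignment columns that contain more than one distinct character."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


# Hypothetical toy alignment, not the report's data.
toy_alignment = [
    "CCTCAGATCA",
    "CCTCAAATCA",
    "CCTCAGGTCA",
]
print(count_variable_sites(toy_alignment))  # 2

# Fraction reported in the text: 139 variable sites out of 962 positions.
print(f"{139 / 962:.1%}")  # 14.4%
```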
Finally, we created a distance matrix using the Pairwise Distances option of the MEGA program. The distance matrix is shown in the table below:
Image 4. Distance matrix obtained with the MEGA program.
Table 1. Distance matrix obtained using MEGA for the sequences of the multiple sequence alignment (MSA). Distances between sequences are shown below the diagonal and variances above the diagonal.
Seq | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 HIV-1 isolate M from India protease | - | 0.013 | 0.012 | 0.028 | 0.028 | 0.031 |
2 HIV-1 isolate N from India protease | 0.051 | - | 0.012 | 0.028 | 0.028 | 0.031 |
3 HIV-1 isolate O from India protease | 0.044 | 0.047 | - | 0.028 | 0.028 | 0.031 |
4 HIV-2 isolate 2001 from Mali protease | 0.368 | 0.355 | 0.368 | - | 0.018 | 0.025 |
5 HIV-2 isolate V05-06108 from Burkina Faso protease | 0.370 | 0.367 | 0.367 | 0.110 | - | 0.024 |
6 Simian immunodeficiency virus STLV-III | 0.408 | 0.400 | 0.408 | 0.197 | 0.169 | - |
As this is a distance matrix, the most similar sequences are those with the lowest value. Thus, the most similar sequences are HIV-1 isolate M from India protease and HIV-1 isolate O from India protease, with a value of 0.044. On the other hand, the least similar sequences are HIV-1 isolate M from India protease and Simian immunodeficiency virus STLV-III, and HIV-1 isolate O from India protease and Simian immunodeficiency virus STLV-III, both with a value of 0.408. This supports our choice of Outgroup.
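The reading of the distance matrix above can be checked with a short sketch that scans the lower triangle of Table 1 for the smallest and largest pairwise distances. The shortened labels and the function name are illustrative choices, not part of the original analysis.

```python
# Labels are shortened versions of the sequence names in Table 1.
labels = ["HIV-1 M", "HIV-1 N", "HIV-1 O",
          "HIV-2 Mali", "HIV-2 Burkina Faso", "SIV STLV-III"]

# Lower triangle of Table 1: row i lists the distances to sequences 0..i-1.
lower = [
    [],
    [0.051],
    [0.044, 0.047],
    [0.368, 0.355, 0.368],
    [0.370, 0.367, 0.367, 0.110],
    [0.408, 0.400, 0.408, 0.197, 0.169],
]


def extreme_pairs(lower, labels):
    """Return (min distance, pairs) and (max distance, pairs) over all pairs."""
    pairs = {
        (labels[j], labels[i]): lower[i][j]
        for i in range(len(lower))
        for j in range(i)
    }
    lo, hi = min(pairs.values()), max(pairs.values())
    closest = sorted(p for p, d in pairs.items() if d == lo)
    furthest = sorted(p for p, d in pairs.items() if d == hi)
    return (lo, closest), (hi, furthest)


(closest_d, closest), (furthest_d, furthest) = extreme_pairs(lower, labels)
print(closest_d, closest)
# 0.044 [('HIV-1 M', 'HIV-1 O')]
print(furthest_d, furthest)
# 0.408 [('HIV-1 M', 'SIV STLV-III'), ('HIV-1 O', 'SIV STLV-III')]
```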
Part 1.5: Perform a phylogenetic reconstruction with at least two methods.
From the multiple alignment of the chosen sequences, we performed a phylogenetic analysis in order to analyze the kinship relationships between the sequences.
As the first method, we performed a phylogenetic reconstruction with a Neighbor-Joining tree (with the Bootstrap method), which is based on a distance matrix. The principle of this method is to find pairs of operational taxonomic units (OTUs) that minimize the total branch length at each stage of OTU clustering, yielding an unrooted tree. [8] The phylogenetic tree obtained with this method is shown below.
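The pair-selection principle described above can be sketched for a single clustering step. The code uses the commonly quoted Q-criterion formulation of neighbor joining, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), applied to the distances of Table 1. It is a minimal sketch of the first step only, with shortened labels and a hypothetical function name; it does not build the full tree, does not perform bootstrapping, and is not a description of how MEGA implements the method.

```python
labels = ["HIV-1 M", "HIV-1 N", "HIV-1 O",
          "HIV-2 Mali", "HIV-2 Burkina Faso", "SIV STLV-III"]

# Lower triangle of Table 1: row i lists the distances to sequences 0..i-1.
lower = [
    [],
    [0.051],
    [0.044, 0.047],
    [0.368, 0.355, 0.368],
    [0.370, 0.367, 0.367, 0.110],
    [0.408, 0.400, 0.408, 0.197, 0.169],
]

# Expand the lower triangle into a full symmetric matrix.
n = len(labels)
dist = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i):
        dist[i][j] = dist[j][i] = lower[i][j]


def nj_first_pair(dist):
    """Return the pair (i, j) minimizing the Q-criterion, and its Q value."""
    size = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(size):
        for j in range(i + 1, size):
            q = (size - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


pair, q = nj_first_pair(dist)
print(labels[pair[0]], "+", labels[pair[1]], f"(Q = {q:.3f})")
# HIV-2 Mali + HIV-2 Burkina Faso (Q = -2.341)
```

Note that the Q-criterion corrects each distance by how far the two sequences are from all the others, so the first pair joined is not necessarily the pair with the smallest raw distance.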
Image 5. Neighbor-Joining tree obtained for the six sequences of the MSA.
For the second method, we used a Maximum Parsimony tree. For a given topology (the branching pattern of a tree), this method computes the sum of the minimum number of nucleotide substitutions required, which is known as the tree length. The topology (usually unrooted) with the minimum tree length is known as the maximum parsimony tree. [9] The phylogenetic tree obtained with this method is also shown below.
Image 6. Maximum Parsimony tree obtained for the six sequences of the MSA.
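The tree length used by maximum parsimony can be illustrated with Fitch's small-parsimony procedure, which counts the minimum number of substitutions one alignment column requires on a fixed binary topology. The sketch below is an assumption-laden illustration: the leaf names, the two topologies and the three columns are hypothetical toy data, and the code only scores given topologies rather than searching for the best one as a phylogenetics program would.

```python
def fitch_score(tree, states):
    """Return (possible ancestral states, substitutions) for one column.

    tree is a leaf name or a pair (left, right) of subtrees;
    states maps each leaf name to its nucleotide in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint state sets: one substitution is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, columns):
    """Sum the minimum substitution counts over all columns."""
    return sum(fitch_score(tree, column)[1] for column in columns)


# Hypothetical leaf names, topologies and columns (toy data).
leaves = ["M", "O", "N", "H2a", "H2b", "SIV"]
columns = [dict(zip(leaves, col)) for col in ("AAAGGG", "ACAAAA", "TTCCTC")]

topology_a = ((("M", "O"), "N"), (("H2a", "H2b"), "SIV"))
topology_b = ((("M", "H2a"), "N"), (("O", "H2b"), "SIV"))

print(tree_length(topology_a, columns))  # 4
print(tree_length(topology_b, columns))  # 6
```

On this toy data the first topology needs fewer substitutions, so it would be preferred under the parsimony criterion.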
The results of the phylogenetic reconstructions (Images 5 and 6) agree well with those of the distance matrix (Table 1). They confirm that the most similar sequences are HIV-1 isolate M from India protease and HIV-1 isolate O from India protease, and that the most divergent sequence is Simian immunodeficiency virus STLV-III, which can therefore be considered an outgroup.
Part 1.6: Make a biological description of the gene using GO terms (Gene Ontology).
To obtain the GO terms for our gene we first need its UniProt identifier, in our case O90777. We find 6 annotations related to our gene product, all of them concerning the HIV-1 protease:
· 2 of these annotations relate to aspartic-type endopeptidase activity, which is enabled by the protease, in the molecular function category.
· 2 of these annotations are involved in the proteolysis process, in the biological process category.
· The last 2 relate to peptidase activity and hydrolase activity respectively; both are enabled by the protease and belong to the molecular function category.
These GO terms fit the explanation given at the start of the project: since our molecule is an aspartic-type protease, it is well described by the GO terms found. Grouping the findings with the Basket tool gives the following Ancestor chart, which places the less specific terms at the top.
The chart shows how aspartic-type endopeptidase activity (which is part of peptidase activity) plays a role in proteolysis. Proteolysis in turn takes part in many more general protein-related processes, which we can relate to the HIV-1 maturation process. The GO terms relate closely to the role of the HIV-1 protease, a key player in the virus life cycle. During infection, HIV produces long protein chains from its mRNA. The role of the HIV-1 protease is to enable the assembly of a new virus particle: it cuts these long protein chains into individual proteins, and the shorter proteins then join with copies of the viral RNA to form a new virus particle. The long protein chains are mostly needed at the early stages of the virus life cycle, since they form the capsid of the immature virus. For the virus to mature and become infectious, the protein chain must be cut so that this capsid is remodelled into its final shape.
This involves the hydrolase activity of aspartic-type endopeptidases: the dimerized HIV-1 protease performs hydrolysis through its aspartyl group complex. All of these processes and functions must happen in a coordinated manner for the virus life cycle to proceed correctly.
With this knowledge we can determine that the protease acts within the virus particle, around the capsid. When the virus particle is endocytosed by the new host cell, the protein envelopes (including the protease) are removed, leaving only the viral RNA. The protease is next found later in the replication process, where it takes part in assembling the new virus particle.
Bibliography
[i1] By Thomas Splettstoesser (www.scistyle.com) - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=33943759
[m1] Mark Birkhead. (1st of June 2017). HIV genomic structure and function. Youtube. https://www.youtube.com/watch?v=0hg_U3WSqeA
[1] HIV-1 isolate ABPv3.15 from USA protease (pol) gene, partial cds - Nucleotide - NCBI. (n.d.). Retrieved May 27, 2021, from Nih.gov website: https://www.ncbi.nlm.nih.gov/nuccore/285027826
[2] Gene Products, pol - MeSH - NCBI. (n.d.). Retrieved May 27, 2021, from Nih.gov website: https://www.ncbi.nlm.nih.gov/mesh?Db=mesh&Cmd=DetailsSearch&Term=%22Gene+Products,+pol%22%5BMeSH+Terms%5D
[3] Mirambeau, G., Lyonnais, S., Coulaud, D., Hameau, L., Lafosse, S., Jeusset, J., … Le Cam, E. (2007). HIV-1 protease and reverse transcriptase control the architecture of their nucleocapsid partner. PloS One, 2(7), e669.
[4] Protease, partial [Human immunodeficiency virus 1] - Protein - NCBI. (n.d.). Retrieved May 29, 2021, from Nih.gov website: https://www.ncbi.nlm.nih.gov/protein/285027827
[5] Zhang, S., Kaplan, A. H., & Tropsha, A. (2008). HIV-1 protease function and structure studies with the simplicial neighborhood analysis of protein packing method. Proteins, 73(3), 742–753.
[6] Tang, J., & Wong, R. N. (1987). Evolution in the structure and function of aspartic proteases. Journal of Cellular Biochemistry, 33(1), 53–63.
[7] Protease. (n.d.). Retrieved May 29, 2021, from Uniprot.org website: https://www.uniprot.org/uniprot/O90777
[8] Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.
[9] Kannan, L., & Wheeler, W. C. (2012). Maximum Parsimony on Phylogenetic networks. Algorithms for Molecular Biology: AMB, 7, 9. https://almob.biomedcentral.com/articles/10.1186/1748-7188-7-9