Project Part A - Protease PDF

Title	Project Part A - Protease
Course	Bioinformàtica Aplicada
Institution	Universitat de Girona
Pages	15


Description

Applied Bioinformatics Project Part A. University of Girona. Year 2020/21. Biology Degree.

Gene Name: HIV‐1 Protease Accession number: GQ118492

Martí Junyent Sara Guzmán

Part 1: Provide a brief gene description & structure description. What's the gene length in number of nucleotides and amino acids?

Our sequence is HIV-1 isolate ABPv3.15 from USA protease (pol) gene, partial cds, with accession number GQ118492. The (pol) label indicates that our gene, protease (PR), is a section of the larger polymerase (pol) gene of the HIV-1 virus. [1][2] We can see this in the image below [i1], where the pol sequence is situated in the 3rd reading frame. The pol sequence also encompasses other HIV-1 genes, PR being one of them. More specifically, the PR gene is located between the p6 gene (transframe region), at the N-terminus of PR, and the reverse transcriptase (RT) gene, at the C-terminus of PR. [i1][3]

Image 1 [i1]: Genome map of the HIV-1 Virus.

The protease sequence has a total length of 297 bp, and its product is a 99 aa protease enzyme. [4] To understand how the protease is expressed, we must first understand how HIV enters the cell. When the viral RNA enters the cell, it is accompanied by a reverse transcriptase, an integrase and, most importantly, a mature HIV-1 protease. The reverse transcriptase converts the viral RNA into DNA, which the integrase then incorporates into the host DNA. The viral DNA then either remains dormant or is transcribed into mRNA, which the host cell translates.
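The 297 bp and 99 aa lengths quoted above are consistent with each other, since each amino acid is encoded by one codon of three nucleotides. A minimal sketch of that arithmetic follows. The function name is ours, and it assumes the region contains no stop codon, as expected for a partial cds.

```python
def protein_length(nucleotides: int) -> int:
    """Number of amino acids encoded by a coding region of the given length."""
    if nucleotides % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return nucleotides // 3


print(protein_length(297))  # 99
```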

Putting this information into context, the expression of the protease gene is part of a bigger, more complex process [m1]. In Image 1 we can see that the gag gene and the pol gene lie in different reading frames. During translation, a certain motif in the gag mRNA occasionally causes the ribosome to shift frame. About 5% of the time, the ribosome shifts frame and continues translating into the pol gene, producing the Gag-Pol precursor. This means that the pol gene is always translated as part of the Gag-Pol precursor, and therefore that PR is only translated after the frameshift occurs.

In the human immunodeficiency virus type 1 (the etiologic agent of the acquired immunodeficiency syndrome, AIDS), the precursors that encode the structural proteins and enzymes of the mature virus are processed by this viral protease during virus assembly. The protease plays vital roles in viral replication, and without it the final virus particle would not be infectious. This protease (HIVP), like those of all retroviruses, belongs to the group of aspartic proteases, and it only functions as a dimer. [5] Aspartic proteases are a group of proteolytic enzymes of the pepsin family that usually function in acidic solutions and are found in many different organisms. [6]
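The ribosomal frameshift that gives rise to the Gag-Pol precursor can be pictured with a toy example. The sketch below is only an illustration: the mRNA string and the slip position are made up, the function name is ours, and it assumes that the ribosome steps back by one nucleotide (a -1 frameshift), which is the direction reported for HIV-1.

```python
def read_codons(mrna, slip_at=None):
    """Split an mRNA into codons, optionally with one -1 frameshift.

    If slip_at is given, the reader steps back one nucleotide once it
    reaches that index, so all later codons are read in the -1 frame.
    """
    codons = []
    pos = 0
    while pos + 3 <= len(mrna):
        codons.append(mrna[pos:pos + 3])
        pos += 3
        if slip_at is not None and pos == slip_at:
            pos -= 1          # slip back by one nucleotide
            slip_at = None    # the shift happens only once
    return codons


toy_mrna = "AUGAAACCCGGGUUU"  # hypothetical sequence, not from HIV-1
print(read_codons(toy_mrna))             # ['AUG', 'AAA', 'CCC', 'GGG', 'UUU']
print(read_codons(toy_mrna, slip_at=6))  # ['AUG', 'AAA', 'ACC', 'CGG', 'GUU']
```

After the slip every downstream codon changes, which is why pol is only read when the shift takes place.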

Part 1.1: Does the gene code for a protein? What number of exons and introns does it have?

The gene does code for a protein, the HIV-1 protease, with accession number GQ118492.1. As described above, this protease takes part in a variety of processes. This particular gene does not contain any introns, so it consists of a single exon, the sequence itself.

Part 1.2: Choose five sequences that are homologous with HIV‐1 Protease gene. For each sequence, provide its accession number and explain the reason for choosing it.

The sequences chosen are as follows:

>HIV-1 isolate M from India protease (pol) gene, partial cds JX256205
>HIV-1 isolate N from India protease (pol) gene, partial cds JX256207
>HIV-1 isolate O from India protease (pol) gene, partial cds JX256206
>HIV-2 isolate 2001 from Mali protease (pol) gene, partial cds AY688934
>HIV-2 isolate V05-06108 from Burkina Faso protease (pol) gene, partial cds EF090175
>Simian immunodeficiency virus STLV-III(AGM) proviral genome protease Y00295

The reasoning behind this choice was to facilitate a phylogenetic reconstruction. Based on sources [7] and [i2], the Human Immunodeficiency Virus originated in non-human primates. Throughout history there has been a great variety of Simian Immunodeficiency Viruses (SIV), just as we currently find different varieties of HIV in humans, as seen in the image below. The most widespread variant, HIV-1 M, has been traced to its origin in a remote area in the southeast corner of Cameroon, where it was transmitted to humans through a zoonotic event from Pan troglodytes troglodytes. [7]

Knowing this, we decided to take sequences from the different HIV varieties that seemed appropriate (HIV-1 groups M, O and N, as well as HIV-2 and SIV). As all of these are homologous sequences, they allow us to carry out a phylogenetic analysis.

Image 2 [i2]: Phylogenetic tree of SIV and HIV viruses.

Part 1.3: Define your INGROUP and OUTGROUP in order to describe a phylogeny of the gene.

In order to define our INGROUP and OUTGROUP we must keep in mind the host genus of each homologous sequence that we have chosen. Our INGROUP is made up of the following sequences, whose host belongs to the genus Homo:

>HIV-1 isolate M from India protease (pol) gene, partial cds JX256205
>HIV-1 isolate N from India protease (pol) gene, partial cds JX256207
>HIV-1 isolate O from India protease (pol) gene, partial cds JX256206
>HIV-2 isolate 2001 from Mali protease (pol) gene, partial cds AY688934
>HIV-2 isolate V05-06108 from Burkina Faso protease (pol) gene, partial cds EF090175

Our OUTGROUP is made up of one sequence from a non-human primate host (AGM in its name stands for African green monkey), which is nevertheless closely related to our INGROUP sequences:

>Simian immunodeficiency virus STLV-III(AGM) proviral genome protease Y00295

Part 1.4: Perform a multiple alignment of your sequences.

To perform the multiple alignment we used the BioEdit program, specifically its accessory application ClustalW Multiple Alignment. The resulting alignment is 962 base pairs long. The sequence alignment in FASTA format is shown below:

Next, we created a consensus sequence, which is also shown below in FASTA format:

Regarding the number of variable positions, there are a total of 139 variable positions out of the 962 positions studied (about 14.4%) among the 6 homologous sequences. The six sequences are therefore quite similar.

Image 3. Number of variable positions for the six sequences obtained with the MEGA program.
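The consensus sequence and the count of variable positions described above can both be derived column by column from an alignment. The following is a minimal sketch of the idea, not the procedure used by BioEdit or MEGA. The function names and the toy alignment are ours, gaps are treated like any other symbol, and ties in the consensus are resolved in favour of the symbol seen first, all of which are assumptions.

```python
from collections import Counter


def consensus(alignment):
    """Most frequent symbol in each column (ties go to the first one seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def variable_positions(alignment):
    """Indices of the columns that contain more than one distinct symbol."""
    return [i for i, col in enumerate(zip(*alignment)) if len(set(col)) > 1]


# Made-up toy alignment, not the project data.
toy = ["ACGTAC", "ACGTTC", "ACATAC"]
print(consensus(toy))           # ACGTAC
print(variable_positions(toy))  # [2, 4]

# Proportion of variable positions reported in the text.
print(f"{139 / 962:.1%}")       # 14.4%
```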

Finally, we used the MEGA program to compute the pairwise distance matrix, which is shown in the table below:

Image 4. Distance matrix obtained with the MEGA program.

Table 1. Distance matrix obtained using MEGA for the sequences of the multiple sequence alignment (MSA). Below the diagonal are the distances between sequences and above the diagonal the variances.

Seq->                                                  1      2      3      4      5      6
1 HIV-1 isolate M from India protease                  -      0.013  0.012  0.028  0.028  0.031
2 HIV-1 isolate N from India protease                  0.051  -      0.012  0.028  0.028  0.031
3 HIV-1 isolate O from India protease                  0.044  0.047  -      0.028  0.028  0.031
4 HIV-2 isolate 2001 from Mali protease                0.368  0.355  0.368  -      0.018  0.025
5 HIV-2 isolate V05-06108 from Burkina Faso protease   0.370  0.367  0.367  0.110  -      0.024
6 Simian immunodeficiency virus STLV-III               0.408  0.400  0.408  0.197  0.169  -
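For reference, the simplest pairwise distance between two aligned sequences is the p-distance, the proportion of compared sites at which they differ. The report does not state which distance model was selected in MEGA, so the sketch below only illustrates the idea and does not reproduce Table 1. The function name and the toy sequences are ours, and skipping sites with a gap in either sequence is an assumption.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: ignore sites with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Made-up sequences: 2 differences over 10 compared sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```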

As this is a matrix of distances, the most similar sequences are those with the lowest value. Thus, the most similar sequences are HIV-1 isolate M from India protease and HIV-1 isolate O from India protease, with a value of 0.044. On the other hand, the least similar sequences are HIV-1 isolate M from India protease and Simian immunodeficiency virus STLV-III, and HIV-1 isolate O from India protease and Simian immunodeficiency virus STLV-III, both with a value of 0.408. This supports our choice of outgroup.
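The closest and most distant pairs named above can be read off the distances of Table 1 programmatically. This is a small sketch: the shortened sequence labels and the helper name are ours, and the value 0.169 for the pair formed by sequences 5 and 6 follows our reading of the extracted table.

```python
from itertools import combinations

names = ["HIV-1 M", "HIV-1 N", "HIV-1 O",
         "HIV-2 Mali", "HIV-2 Burkina Faso", "SIV STLV-III"]

# Distances below the diagonal of Table 1: row i lists sequences 0..i-1.
lower = [
    [],
    [0.051],
    [0.044, 0.047],
    [0.368, 0.355, 0.368],
    [0.370, 0.367, 0.367, 0.110],
    [0.408, 0.400, 0.408, 0.197, 0.169],
]


def dist(i, j):
    """Distance between sequences i and j, looked up in the lower triangle."""
    return lower[max(i, j)][min(i, j)]


pairs = list(combinations(range(len(names)), 2))
closest = min(pairs, key=lambda p: dist(*p))
largest = max(dist(*p) for p in pairs)
farthest = [p for p in pairs if dist(*p) == largest]

print([names[i] for i in closest], dist(*closest))
for i, j in farthest:
    print(names[i], "/", names[j], largest)
```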

Part 1.5: Perform a phylogenetic reconstruction with at least two methods.

From the multiple alignment of the chosen sequences, we performed a phylogenetic analysis in order to analyze the kinship relationships between the sequences. As the first method we used Neighbor-Joining (with the Bootstrap method as the test of the phylogeny), which is based on a distance matrix. The principle of this method is to find pairs of operational taxonomic units (OTUs) that minimize the total branch length at each stage of the grouping of OTUs, producing an unrooted tree. [8] The phylogenetic tree obtained with this method is shown in Image 5.
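The pair-selection step of Neighbor-Joining can be sketched with the distances of Table 1. For n sequences the method joins the pair (i, j) that minimises Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The code below performs only this first step, not the full tree construction or the bootstrap. The function name is ours, and the value 0.169 for sequences 5 and 6 follows our reading of the extracted table.

```python
def nj_first_pair(d):
    """Return the pair (i, j) with the lowest Neighbor-Joining Q value, and Q."""
    n = len(d)
    totals = [sum(row) for row in d]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Distances below the diagonal of Table 1 (0-based sequence indices).
lower = [
    [],
    [0.051],
    [0.044, 0.047],
    [0.368, 0.355, 0.368],
    [0.370, 0.367, 0.367, 0.110],
    [0.408, 0.400, 0.408, 0.197, 0.169],
]

# Expand the lower triangle into a full symmetric matrix.
n = len(lower)
d = [[0.0] * n for _ in range(n)]
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        d[i][j] = d[j][i] = value

pair, q = nj_first_pair(d)
print(pair, round(q, 3))  # (3, 4) -2.341
```

With these values the first pair joined is the two HIV-2 sequences (indices 3 and 4) rather than the pair with the smallest raw distance, because the criterion also subtracts each sequence's total distance to all the others.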

Image 5. Neighbor-Joining tree obtained for the six sequences of the multiple sequence alignment (MSA).

For the second method we used Maximum Parsimony. For a given topology (branching pattern of a tree), this method computes the sum of the minimum number of nucleotide substitutions needed to explain the sequences. This sum is known as the tree length, and the topology with the minimum length (usually unrooted) is known as the maximum parsimony tree. [9] The phylogenetic tree obtained with this method is shown in Image 6.

Image 6. Maximum Parsimony tree obtained for the six sequences of the multiple sequence alignment (MSA).
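The tree length that Maximum Parsimony minimises can be computed for a single alignment column with Fitch's algorithm. This is a minimal sketch under assumptions: the report does not say which counting or search procedure MEGA applied, and the two toy topologies, the taxon labels a to d and the site pattern are made up for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum substitutions) for one site.

    tree is a leaf name (str) or a pair (left_subtree, right_subtree);
    states maps each leaf name to its nucleotide at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra substitution.
        return shared, left_cost + right_cost
    # The children disagree: keep both options and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


site = {"a": "A", "b": "A", "c": "G", "d": "G"}
print(fitch((("a", "b"), ("c", "d")), site)[1])  # 1
print(fitch((("a", "c"), ("b", "d")), site)[1])  # 2
```

For this site the first topology needs fewer substitutions, so it is the more parsimonious of the two. Summing such counts over all alignment columns gives the tree length.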

The results of the phylogenetic reconstructions (Images 5 and 6) agree well with those of the distance matrix (Table 1). They confirm that the most similar sequences are HIV-1 isolate M from India protease and HIV-1 isolate O from India protease, and that the most divergent sequence is Simian immunodeficiency virus STLV-III, which can therefore be considered an outgroup.

Part 1.6: Make a biological description of the gene using GO terms (Gene Ontology).

To obtain the GO terms for our gene we first need its UniProt identifier, in our case O90777. We find 6 annotations related to our gene product, all of them concerning the HIV-1 protease:

· 2 of these annotations relate to the aspartic-type endopeptidase activity, which is enabled by the protease, in the category of molecular function.
· 2 of these annotations are involved in the proteolysis process, in the category of biological process.
· The last 2 relate to the peptidase and hydrolase activity respectively. Both are enabled by the protease and belong to the category of molecular function.

These GO terms fit the explanation given at the start of the project: as our molecule is an aspartic-type protease, it is well described by the GO terms found. By grouping the findings using the Basket tool we get the following Ancestor chart, which places the less specific terms at the top.

The chart visually represents how the aspartic endopeptidase activity (which forms part of the peptidase activity) plays a role in proteolysis. This proteolysis takes part in many more general processes involving proteins, which we can relate to the HIV-1 maturation process. The GO terms closely match the role played by the HIV-1 protease, which is a key player in the virus life cycle. During infection, HIV produces long protein chains from its mRNA. The role of the HIV-1 protease is to enable the assembly of a new virus particle by cutting these long protein chains into individual proteins. At this point the shorter proteins join with copies of the viral RNA, forming a new virus particle. The long protein chain is mostly needed in the early stages of the virus, given that it forms the immature virus capsid. For the virus to mature, the protein chain must be cut so that this capsid is remodelled into its final shape, allowing infection to happen.

This involves the hydrolase activity of aspartic-type endopeptidases, where the dimerized HIV-1 protease performs hydrolysis through its aspartyl group complex. All of these processes and functions must happen in a coordinated manner for the virus life cycle to proceed correctly.

With this knowledge we can determine that the protease functions within the virus particle, around the virus capsid. When the virus particle is endocytosed by the new host cell, the protein envelopes (including the protease) are removed, leaving only the viral RNA. The protease is next found later in the replication process, where it takes part in assembling the new virus particle.

Bibliography

[i1] By Thomas Splettstoesser (www.scistyle.com) - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=33943759

[m1] Mark Birkhead. (1st of June 2017). HIV genomic structure and function. Youtube. https://www.youtube.com/watch?v=0hg_U3WSqeA

[1] HIV-1 isolate ABPv3.15 from USA protease (pol) gene, partial cds - Nucleotide - NCBI. (n.d.). Retrieved May 27, 2021, from Nih.gov website: https://www.ncbi.nlm.nih.gov/nuccore/285027826

[2] Gene Products, pol - MeSH - NCBI. (n.d.). Retrieved May 27, 2021, from Nih.gov website: https://www.ncbi.nlm.nih.gov/mesh?Db=mesh&Cmd=DetailsSearch&Term=%22Gene+Products,+pol%22%5BMeSH+Terms%5D

[3] Mirambeau, G., Lyonnais, S., Coulaud, D., Hameau, L., Lafosse, S., Jeusset, J., … Le Cam, E. (2007). HIV-1 protease and reverse transcriptase control the architecture of their nucleocapsid partner. PloS One, 2(7), e669.

[4] Protease, partial [Human immunodeficiency virus 1] - Protein - NCBI. (n.d.). Retrieved May 29, 2021, from Nih.gov website: https://www.ncbi.nlm.nih.gov/protein/285027827

[5] Zhang, S., Kaplan, A. H., & Tropsha, A. (2008). HIV-1 protease function and structure studies with the simplicial neighborhood analysis of protein packing method. Proteins, 73(3), 742–753.

[6] Tang, J., & Wong, R. N. (1987). Evolution in the structure and function of aspartic proteases. Journal of Cellular Biochemistry, 33(1), 53–63.

[7] Protease. (n.d.). Retrieved May 29, 2021, from Uniprot.org website: https://www.uniprot.org/uniprot/O90777

[8] Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.

[9] Kannan, L., & Wheeler, W. C. (2012). Maximum Parsimony on Phylogenetic networks. Algorithms for Molecular Biology: AMB, 7, 9. https://almob.biomedcentral.com/articles/10.1186/1748-7188-7-9

...