BMS Lab 2
EXPERIMENT 1
HOMOLOGY MODELLING
Theory: The Protein Data Bank (PDB) is a resource that archives information about the 3D shapes of proteins,
nucleic acids, and complex assemblies. The PDB is a reliable and widely used source of structures for
molecular dynamics simulation. However, one needs to understand the PDB file format
thoroughly to comment on the crystallisation process of the selected biomolecule. The following
points are extremely important when working with a crystal structure from the PDB.
1. Always look for missing residues in the structure file. They should be added back to the structure
before proceeding further with the simulation.
2. Understand the role of the crystal water present in the structure file. If the crystal water does not play any
role in the simulation, remove it to avoid modelling complications.
3. Sometimes, to obtain good-quality crystals, crystallographers mutate the protein at a non-functional
site. This information can be found by reading the corresponding literature.
If the structure is mutated, the mutations should be reverted (back-mutated) before proceeding with the dynamics.
To add missing residues and to back-mutate, we have to perform protein modelling. Here, we use the
MODELLER software to model the protein.
Useful links:
1) [Link]
2) [Link]
3) [Link]
4) [Link]
Note: In the PIR alignment file, the first line of each entry contains the sequence code, in the format
">P1;code". The second line, with ten fields separated by colons, generally contains information about
the structure file, if applicable. Only two of these fields are used for sequences: "sequence" (indicating
that the entry is a sequence without known structure) and "AMYA" (the model file name).
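The two header lines described in the note can be checked with a few lines of Python. This is only a sketch: the helper name parse_pir_header is made up for illustration, and the example field line is an assumed sequence entry with empty structure fields, not a line copied from the lab's alignment file.

```python
def parse_pir_header(code_line, field_line):
    """Split the two header lines of a PIR entry into a code and ten fields."""
    prefix = ">P1;"
    if not code_line.startswith(prefix):
        raise ValueError("first line must look like '>P1;code'")
    code = code_line[len(prefix):].strip()
    fields = [f.strip() for f in field_line.rstrip("\n").split(":")]
    if len(fields) != 10:
        raise ValueError(f"expected 10 colon-separated fields, got {len(fields)}")
    # Field 1 is the entry type, field 2 the model/structure file name.
    return {"code": code, "kind": fields[0], "name": fields[1], "fields": fields}


# Assumed example entry for a target sequence named AMYA.
header = parse_pir_header(">P1;AMYA", "sequence:AMYA:::::::0.00: 0.00")
print(header["code"], header["kind"], header["name"])
```

A header that does not split into exactly ten fields raises an error, which is a quick way to catch a missing or extra colon before running the modelling script.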
Selecting a template:
• Download the template structure and save it in the MODELLER folder.
• Open the PDB flat file and remove the header and heteroatoms.
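The clean-up in the second bullet is usually done by hand in a text editor, but it can also be sketched in Python. The function below is an illustrative assumption, not part of MODELLER: it keeps only ATOM, TER and END records and drops everything else, including crystal waters stored as HETATM records, so it should only be used once the role of those waters has been checked.

```python
def strip_header_and_hetatm(pdb_text):
    """Keep only ATOM, TER and END records from PDB-format text."""
    keep = ("ATOM", "TER", "END")
    kept = [line for line in pdb_text.splitlines() if line.startswith(keep)]
    return "\n".join(kept) + "\n"


# Tiny made-up example, not taken from a real PDB entry.
sample = "\n".join([
    "HEADER    EXAMPLE ENTRY",
    "REMARK   2 RESOLUTION.",
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  MET A   1      12.560  13.300   2.300  1.00 20.00           C",
    "HETATM    3  O   HOH A 101      15.000  10.000   5.000  1.00 30.00           O",
    "TER",
    "END",
])
cleaned = strip_header_and_hetatm(sample)
print(cleaned)
```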
A Python script (.py file) is required to run MODELLER. MODELLER has no graphical
interface; it is run from the command prompt. (A sample script file is available in MODELLER's examples
folder.)
Conclusion:
The protein 2OBD was modelled using the MODELLER tool. Any one of the modelled structure files
can be used as the target for further analysis, such as protein preparation and ligand preparation.
EXPERIMENT 2
PROTEIN PREPARATION AND PROTEIN MODELLING USING MODELLER
Aim: To prepare the protein using modeller and other tools.
Theory: The Protein Data Bank (PDB) is a resource that archives information about the 3D shapes of proteins,
nucleic acids, and complex assemblies. The PDB is a reliable and widely used source of structures for
molecular dynamics simulation. However, one needs to understand the PDB file format
thoroughly to comment on the crystallisation process of the selected biomolecule. The following
points are extremely important when working with a crystal structure from the PDB.
1. Always look for missing residues in the structure file. They should be added back to the structure
before proceeding further with the simulation.
2. Understand the role of the crystal water present in the structure file. If the crystal water does not play any
role in the simulation, remove it to avoid modelling complications.
3. Sometimes, to obtain good-quality crystals, crystallographers mutate the protein at a non-functional
site. This information can be found by reading the corresponding literature.
If the structure is mutated, the mutations should be reverted (back-mutated) before proceeding with the dynamics.
To add missing residues and to back-mutate, we have to perform protein modelling. Here, we use the
MODELLER software to model the protein.
Useful links:
1) [Link]
2) [Link]
3) [Link]
4) [Link]
Procedure
Part 1:
# Target and Template identification:
1) Search for protein in NCBI databases by choosing appropriate search options.
→ 456 amino acids
7) What are the percentage identity and query coverage of the first hit in the search results?
→ Percentage identity of the first hit: 99.78% and Query coverage: 100%
Download FASTA sequence file and PDB structure file to your local desktop.
10) How many chains are present in the Resultant structure file → 1
11) Which chain is required to proceed further for model building? → Chain A
13) Does the crystal water have any catalytic role in the enzyme? Comment on the same. → No
15) List out the missing residues in the selected chain of the protein.
16) Prepare protein structure file by deleting unwanted sections in the PDB file.
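For step 15, missing residues are normally listed in the REMARK 465 records of the PDB file. The sketch below reads those records for one chain; the function name and the sample text are made up for illustration, and the parsing assumes simple whitespace-separated rows (rows with insertion codes are skipped).

```python
def missing_residues(pdb_text, chain):
    """Return (residue name, residue number) pairs listed in REMARK 465 for a chain."""
    found = []
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 465"):
            continue
        tokens = line.split()[2:]
        # Data rows look like "MET A 1", optionally preceded by a model number.
        if len(tokens) == 4 and tokens[0].isdigit():
            tokens = tokens[1:]
        if len(tokens) != 3:
            continue  # header or explanatory text
        res_name, chain_id, seq = tokens
        if chain_id == chain and seq.lstrip("-").isdigit():
            found.append((res_name, int(seq)))
    return found


# Made-up REMARK 465 block for illustration only.
remark_text = "\n".join([
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465   M RES C SSSEQI",
    "REMARK 465     MET A     1",
    "REMARK 465     GLY A     2",
    "REMARK 465     SER B     5",
])
print(missing_residues(remark_text, "A"))
```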
Part 2:
# Download and install modeller in your system
Part 3:
Working with modeller
1. Preparation of structure file.
The structure file should be in .pdb format. The PDB file of the template serves
as the structure file in MODELLER.
The alignment file should have the .ali extension, and the aligned sequences in it
should be in PIR format.
2. Write down the list of newly created files: [Link], [Link], [Link]
3. Understand each generated file and comment on it.
→ [Link]: a GROMACS-readable coordinate file, similar in content to a PDB file.
→ [Link]: a position restraint file that contains the force constants used to restrain the
protein atoms.
→ [Link]: the topology file, which contains bonded parameters (bond lengths, bond angles and dihedral
angles with their force constants) and non-bonded interaction parameters fetched from the .itp files.
Result:
The protein 1BP1 was modelled using the MODELLER tool. One of the modelled structures was used as the
target for further analysis: the protein .pdb file was converted [Link], which produced three output files,
[Link], [Link] and [Link].
EXPERIMENT 3
LIGAND PREPARATION AND GENERATION OF LIGAND FORCE FIELD
Aim: To prepare the ligand for molecular dynamics simulation and to generate its force field.
Theory: Ligands are central to understanding the kinetics of an enzyme. Ligands
can be classified broadly as substrates, inhibitors or enhancers of enzymes, so
it is important to understand the role of a ligand in enzyme action. In molecular
dynamics simulations, the structure of the ligand is obtained from a ligand database;
PubChem is one of the important repositories of chemical structures.
In molecular dynamics simulation, the force field (library of parameters) for proteins is well
standardised and has been developed by many scientific groups around the world. Designing
a force field for proteins is comparatively straightforward, because proteins are built from the same 20
amino acids: once parameters are developed for the 20 amino acids, they can be adopted for all
proteins. This is not the case for ligands. Ligands are structurally very diverse, and there is
no ready-made force field (library of parameters) that covers them. Thus, one has to develop the force
field of the ligand of interest before use. The Automated Topology Builder (ATB) server is one of the
important tools for designing ligand force fields: the SMILES notation from PubChem
can be submitted to the ATB server to develop the force field of the ligand.
Useful links:
1. [Link]
2. [Link]
3. [Link]
Procedure
Part 1:
# Identification of ligand: The protein of interest is the crystal structure of BPI, the human
bactericidal permeability-increasing protein.
1. Identification of the suitable ligand against SARS-CoV-2 (COVID-19) main protease using
literature survey.
Ans: PC1
2. The main point while searching the ligands is to screen the possible leads, which are found to be
effective against the family of proteases.
3. Identification of the ligand using PubChem: Go to PubChem server, enter the ligand name in the
search box.
17. Suitably edit the residue names in both .itp and .pdb files. (Do not proceed on your own: Ask your
instructor about this step)
18. Add hydrogen atoms to the docked ligand file (if needed).
19. Retain the topology of the ATB ligand file and the conformation of the docked file to which hydrogens
were added.
20. Suitably edit the atom numbers of the [Link] (Do not proceed on your own: Ask your
instructor about this step)
21. Arrange the ligand coordinates according to the [Link] file.
22. Run the following command to convert the ligand file from .pdb to .gro:
/usr/local/gromacs/bin/gmx editconf -f [Link] -o [Link]
Part-2
Questions
1) What is ATB?
Ans: ATB stands for Automated Topology Builder. The ATB server is used to generate the force field
parameters (topology) for a specific ligand.
2) Which conformation did you choose in ATB and why?
Ans: I selected the conformation with the lowest RMSD, that is, the one with the smallest deviation
compared with the other available conformations.
3) What does the .itp file contain?
Ans: It is an include topology file, which contains the topology of a particular molecule (protein
or ligand) and is included in the main topology file.
4) What is force field?
Ans: A force field is a library of parameters from which the MD algorithm fetches the required
parameters, depending on the system under study.
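The idea of a force field as a parameter library can be illustrated with a toy lookup. The sketch below computes a harmonic bond energy, 0.5 * kb * (b - b0)^2, from a small table; the atom types and numerical values are hypothetical and are not taken from any real force field or from the ATB output.

```python
# Hypothetical bond parameters keyed by a pair of atom types:
# (equilibrium length in nm, force constant in kJ/mol/nm^2). Illustrative only.
BOND_PARAMS = {
    ("C", "H"): (0.109, 290000.0),
    ("C", "C"): (0.153, 250000.0),
}


def bond_energy(type_a, type_b, length_nm):
    """Look up the bond parameters and return the harmonic bond energy in kJ/mol."""
    key = tuple(sorted((type_a, type_b)))  # order of the two atoms does not matter
    b0, kb = BOND_PARAMS[key]
    return 0.5 * kb * (length_nm - b0) ** 2


print(bond_energy("H", "C", 0.109))  # bond at its equilibrium length
print(bond_energy("C", "C", 0.163))  # bond stretched by 0.01 nm
```

A real force field holds many such tables (bonds, angles, dihedrals, charges, van der Waals terms), which is why a ligand without entries in the library needs its own parameters.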
Results
Ligand preparation was completed. Ligand parameters, in a force field compatible with both the ligand
and the protein, were downloaded from the ATB server. The .gro file of the ligand was generated using the
editconf command.
EXPERIMENT 4
PREPARATION OF THE PROTEIN LIGAND COMPLEX, VACUUM MINIMIZATION,
PERIODIC BOUNDARY CONDITION, SYSTEM SOLVATION, ADDING IONS AND
ENERGY MINIMIZATION
Aim: To prepare protein ligand complex followed by vacuum minimization, periodic boundary
condition, system solvation, adding ions and energy minimization.
Theory: Once the protein and ligand are ready, the protein-ligand complex must be prepared.
The force field of the protein alone or of the ligand alone cannot be used for the simulation of
the complex. Thus, we need the force field of the protein-ligand complex. This force field can
be obtained from the ATB server. Once the topology of the protein-ligand complex is ready, we
perform vacuum minimization.
In computational chemistry, energy minimization (also called energy optimisation,
geometry minimization, or geometry optimisation) is the process of finding an arrangement in space
of a collection of atoms where, according to some computational model of chemical bonding, the net
inter-atomic force on each atom is acceptably close to zero and the position on the potential energy
surface (PES) is a stationary point. In general, the aim is to find the global energy-minimised state
of the protein-ligand complex.
Here, we perform energy minimisation of the complex in vacuum, followed by
minimisation in solvent. To create the solvent condition, we add water molecules within the
defined periodic boundary and neutralise the system with Na+ and Cl- ions. After minimization the
system is ready for equilibration.
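The stopping criterion described above (net force acceptably close to zero) can be seen in a toy example. The sketch below applies a fixed-step steepest descent to two atoms interacting through a Lennard-Jones potential in reduced units. It is a simplified illustration, not the GROMACS minimiser, and the step size and tolerance are arbitrary choices.

```python
def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along the separation r for a Lennard-Jones pair (positive = repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def minimise_distance(r, step=0.01, force_tol=1e-6, max_steps=10000):
    """Move r along the force until the force magnitude drops below force_tol."""
    for _ in range(max_steps):
        force = lj_force(r)
        if abs(force) < force_tol:
            break
        r += step * force  # steepest descent: step in the direction of the force
    return r


r_min = minimise_distance(1.5)
print(r_min, 2 ** (1 / 6))  # converged distance and the analytical minimum
```

For a single pair the minimiser reaches the only minimum, at 2^(1/6) times sigma. A protein-ligand complex has many local minima, so in practice minimisation relaxes the structure to a nearby low-energy state.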
Procedure:
Part 1: # Complex preparation
1. Create a new file in the text editor as [Link].
2. Paste [Link] (complete) and [Link] (exclude first two lines and last line of [Link]) in
[Link] file.
3. Update the total number of atoms in [Link] and retain the box vectors in the
last line.
4. SAVE.
5. How many atoms are present in the [Link] file? → 4668 atoms
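The manual merge in steps 1 to 4 can be sketched in Python. The function below works on the text of two .gro files: it keeps the protein file's title and box line, appends the ligand atom lines (without the ligand's first two lines and last line), and rewrites the atom count. The function name and sample coordinates are made up for illustration, and atom numbers are left as they are.

```python
def merge_gro(protein_gro, ligand_gro):
    """Combine two .gro texts into one complex, updating the atom count."""
    p_lines = protein_gro.strip("\n").split("\n")
    l_lines = ligand_gro.strip("\n").split("\n")
    title = p_lines[0]
    box_line = p_lines[-1]               # box vectors of the protein file
    atoms = p_lines[2:-1] + l_lines[2:-1]  # skip title, count and box lines
    return "\n".join([title, f"{len(atoms):5d}"] + atoms + [box_line]) + "\n"


# Made-up two-atom "protein" and one-atom "ligand" for illustration only.
protein = "\n".join([
    "Protein",
    "    2",
    "    1MET      N    1   1.110   1.321   0.210",
    "    1MET     CA    2   1.256   1.330   0.230",
    "   5.00000   5.00000   5.00000",
])
ligand = "\n".join([
    "Ligand",
    "    1",
    "    1LIG     C1    1   2.000   2.000   2.000",
    "   1.00000   1.00000   1.00000",
])
merged = merge_gro(protein, ligand)
print(merged)
```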
Part-2
#Vacuum Minimization
1. The files required for vacuum minimization are the [Link] file, the topology files and the .gro file.
2. The .mdp file is the molecular dynamics parameter file, which holds the parameters of the dynamics
run.
3. Comment on the various parameters present in the .mdp file.
/usr/local/gromacs/bin/gmx grompp -f [Link] -c [Link] -p [Link] -o protein-EM-[Link]
Figure: Files generated after running the command and vacuum minimization
Part-3
#System solvation, adding ions and energy minimization with solvent.
1. The files required for system solvation are the .mdp file, the .gro file and the [Link] file.
2. To solvate the system, use the following command.
5. Can you see the solvation in the periodic boundary box? → Yes
6. Can you see the protein/DNA of interest along with water? → Yes
7. How many water molecules are added in this step? (To get the answer, refer to the [Link] file.)
10. How many ions are added in this step? (To get the answer, refer to the [Link] file.)
QUESTIONS
1) Expand VMD and PBC.
→ VMD: Visual Molecular Dynamics
PBC: Periodic Boundary Conditions
3) What differences were observed when [Link], [Link], [Link] and solv_ions.gro were
visualised in VMD?
→ [Link] contains only the protein-ligand complex, and it is not fitted in a
PBC box.
[Link] contains the protein-ligand complex placed at the centre of the PBC box.
[Link] contains the protein-ligand complex together with the added water molecules.
solv_ions.gro contains the protein-ligand complex, the water and the ions
added to [Link].
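5) What do -bt, -d, -c, -p, -cp and -cs indicate in the commands?
→ -bt: box type (editconf)
-d: distance between the solute and the box edge (editconf)
-c: centre the molecule in the box (editconf)
-p: topology file, [Link] (solvate)
-cp: coordinate file of the solute to be solvated (solvate)
-cs: coordinate file of the solvent (solvate)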
6) Did you observe any changes in the topology file after the solvation and neutralisation?
→ Yes, the numbers of water molecules and ions added are updated and recorded at the end of the file.
Water molecules added: 178379, NA: 502, CL: 514
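The counts quoted above can be read programmatically from the [ molecules ] section at the end of a GROMACS topology. The sketch below parses topology text; the molecule names Protein_chain_A and PC1 in the sample are assumptions for illustration, and only the SOL, NA and CL counts come from the answer above.

```python
def count_molecules(top_text):
    """Return {molecule name: count} from the [ molecules ] section of a topology."""
    counts = {}
    in_section = False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            in_section = line.strip("[] ").lower() == "molecules"
            continue
        if in_section:
            name, number = line.split()[:2]
            counts[name] = counts.get(name, 0) + int(number)
    return counts


# Sample topology tail; the first two molecule names are hypothetical.
top_tail = "\n".join([
    "[ system ]",
    "Protein in water",
    "",
    "[ molecules ]",
    "; Compound        #mols",
    "Protein_chain_A     1",
    "PC1                 1",
    "SOL            178379",
    "NA                502",
    "CL                514",
])
counts = count_molecules(top_tail)
print(counts["SOL"], counts["NA"], counts["CL"])
```

With these numbers there are 12 more chloride than sodium ions, which would be consistent with a solute net charge of +12 if the final system is neutral.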
Result:
The protein-ligand complex was prepared using the [Link] and [Link] files. The ligand topology was
added to [Link] by including [Link], and a periodic boundary box was built with the system at its centre
using GROMACS commands. The system was subjected to vacuum minimization and then solvated using the
solvate command. Ions were then added using the genion command to neutralise the system. The
system's energy was then minimized successfully.
EXPERIMENT 5
SYSTEM EQUILIBRATION USING NVT AND NPT ENSEMBLE SYSTEM AND
PERFORMING MD RUN
Theory: In MD simulations, atoms of the macromolecules and of the surrounding solvent undergo a
relaxation that usually lasts for tens or hundreds of picoseconds before the system reaches a stationary
state. The initial non-stationary segment of the simulated trajectory is typically discarded in the
calculation of equilibrium properties. This stage of the MD simulation is called the equilibration stage.
Equilibration protocols are still largely a matter of personal preference. Some protocols call for very
elaborate procedures that increase the temperature gradually in a step-wise fashion, while other,
more aggressive approaches simply use a linear temperature gradient to heat the system up to the
desired temperature.
In our example, we follow a two-stage equilibration protocol. In the first stage, we start
the system from a low temperature of 100 K and gradually heat it to 300 K over 10 picoseconds of
simulation time. This stage is performed with the volume held constant and is referred to as
NVT equilibration. In the second stage, we bring the system to the required
atmospheric pressure and keep the pressure constant throughout the equilibration phase. This is
referred to as NPT equilibration. We also apply position restraints to the atoms initially,
and gradually reduce them to zero over multiple NPT equilibration simulations.
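The first-stage heating schedule (100 K to 300 K over 10 ps) is a simple linear ramp, sketched below. The function only computes the target temperature at a given time. It does not run any simulation, and in GROMACS a schedule like this would normally be configured through the simulated-annealing options of the .mdp file rather than in Python.

```python
def ramp_temperature(t_ps, t_start=100.0, t_end=300.0, ramp_ps=10.0):
    """Target temperature (K) at time t_ps for a linear heating ramp."""
    if t_ps <= 0.0:
        return t_start
    if t_ps >= ramp_ps:
        return t_end  # hold the final temperature after the ramp
    return t_start + (t_end - t_start) * t_ps / ramp_ps


for t in (0.0, 2.5, 5.0, 10.0, 20.0):
    print(t, ramp_temperature(t))
```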
Useful links:
1. [Link]
Procedure
Part-1
#System equilibration
1. Understand the concept of ensemble, NPT ensemble and NVT ensemble systems.
2. Download [Link] file and [Link] file from suitable GROMACS tutorial.
3. Glance through both .mdp files.
4. Comment on the parameters of .mdp file and understand its use while running GROMACS.
5. Position restraints are an important aspect of MD simulations.
6. To restrain atom positions, a force constant of 1000 kJ/mol/nm² is used.
7. In MD simulation, we initially keep position restraints on all atoms and then release the
restraints step by step. Finally, all atoms are set free to run without any restraint.
[Link] the position restrain file of ligands using the following command.
Figure: Merging protein and ligand file using the command which generates [Link] file
11. System equilibration in the NVT ensemble (a position restraint of 1000 kJ/mol/nm² is maintained in the
NVT ensemble).
12. Use the following commands to run the NVT ensemble.
➢ /usr/local/gromacs/bin/gmx grompp -f [Link] -c [Link] -p [Link] -n [Link] -o [Link] -r [Link]
➢ /usr/local/gromacs/bin/gmx mdrun -deffnm nvt -v
13. The output of the NVT equilibration is used as the input for the NPT ensemble.
14. To run the NPT ensemble, use the following commands.
15. As mentioned earlier, the restraints should be released slowly.
16. For this, we run the NPT simulation multiple times, reducing the position restraint
force constant gradually.
➢ /usr/local/gromacs/bin/gmx grompp -f [Link] -c [Link] -t [Link] -p [Link] -n [Link] -o [Link]
➢ /usr/local/gromacs/bin/gmx mdrun -deffnm npt-1000 -v
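The effect of releasing the restraints in steps 15 and 16 can be illustrated with the harmonic penalty that a position restraint adds, 0.5 * k * |r - r0|^2. In the sketch below only the end points of the schedule (1000 and 0) come from this experiment; the intermediate force constants and the coordinates are hypothetical.

```python
def restraint_energy(position, reference, k=1000.0):
    """Harmonic penalty 0.5 * k * |r - r0|^2 (kJ/mol for nm and kJ/mol/nm^2)."""
    squared = sum((p - r) ** 2 for p, r in zip(position, reference))
    return 0.5 * k * squared


# Hypothetical release schedule; only 1000 and 0 are taken from the text.
schedule = [1000.0, 500.0, 100.0, 0.0]
reference = (1.00, 2.00, 3.00)   # restrained position, nm
displaced = (1.05, 2.00, 3.00)   # atom moved 0.05 nm away from it
energies = [restraint_energy(displaced, reference, k) for k in schedule]
print(energies)
```

The same displacement costs less energy at each stage, so the atoms gain freedom gradually instead of being released all at once.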
Part-2
#Production run
1. The production run should be performed in the NPT ensemble for as many nanoseconds as
required.
2. For this, download the [Link] file from the GROMACS tutorials.
3. The parameters of the [Link] and [Link] files will be almost the same.
4. Finally use the following command for production run.
5. Use servers for the production run.
4) Are there any variations present in the microscopic entities when the protein is in its static state?
→ Yes, the velocity changes as the bond length varies.
5) By keeping NPT and NVT constant, what other macroscopic parameters can you calculate?
→ By keeping NPT constant we can measure the variation in energy, volume and chemical potential,
whereas by keeping NVT constant we can measure the variation in pressure, energy and chemical
potential.
Result:
The energy-minimised system (from the previous experiment) was subjected to
equilibration using the NVT and NPT ensembles. In the NVT ensemble, the number of atoms, volume and
temperature are kept constant (at 300 K), whereas in the NPT ensemble, the number of atoms, pressure
and temperature are kept constant. The position restraint was released slowly over the NPT equilibration
steps, from 1000 kJ/mol/nm² to 0. Finally, the system is ready for the production run, with the number of
steps set to 1130 in [Link].