DelPhiPKa User Manual
Developed by Dr. Lin Wang
Computational Biophysics & Bioinformatics
Prof. Emil Alexov Lab at Clemson University
References:
The following references should be cited if the use of DelPhiPKa leads to a publication. In particular, the first reference describes the methodology and the second reference describes the web server.
1. Lin Wang, Lin Li, and Emil Alexov. "pKa predictions for proteins, RNAs and DNAs with the Gaussian dielectric function using DelPhiPKa." Proteins. (2015) Sep 26, doi: 10.1002/prot.24935
2. Lin Wang, Min Zhang, and Emil Alexov. "DelPhiPKa Web Server: Predicting pKa of proteins, RNAs and DNAs." Bioinformatics. (2015) Oct 29, doi: 10.1093/bioinformatics/btv607
Table of Contents
1. Introduction
1.1 What is DelPhiPKa
1.2 What is DelPhiPKa Web Server
2. Installation
2.1 The compilation environment
2.2 How to compile the program
3. Basic Tutorial
3.1 What are the files in param folder
3.2 Edit the runtime control file run.prm
3.3 How to run the program
3.4 Results and output files
4. Advanced Tutorial
4.1 Edit the topology file
4.2 Edit HETATM in PQR format
1. Introduction
1.1 What is DelPhiPKa
DelPhiPKa is a DelPhi-based, open-source C++ program that predicts the pKa values of ionizable groups in proteins, RNAs, and DNAs. Some of its unique approaches and features include:
- Uses a Gaussian-based dielectric function to mimic conformational changes associated with ionization changes.
- Calculates the electrostatic energy without defining the molecular surface.
- Offers a choice of force field parameters.
- Supports different hydrogen conformations.
- Protonates the structure at a particular pH using the calculated pKa values of the ionizable residues.
1.2 What is DelPhiPKa Web Server
The DelPhiPKa web server is built on the DelPhiPKa program and is deployed on the Palmetto supercomputer cluster at Clemson University. It allows researchers to use the pKa calculation program without installing the standalone code.
Because DelPhiPKa uses the MPI library, the web server lets users submit jobs for parallel computing on 8 and up to 24 CPUs.
The web server provides downloads of the calculated pKa results, the titration curves, and the protonated structure in PQR format, based on the pKa predictions and the user-specified pH.
2. Installation
2.1 The compilation environment
The DelPhiPKa program is designed to be compiled and run on Linux/Unix and Mac OS X operating systems. To compile the code, a C++ compiler and several libraries are required:
1. C++ Compiler (https://gcc.gnu.org)
We used GNU GCC to compile the code. Compilation has also been tested with the Clang and Intel compilers on OS X. Use version 4.4 or above, which includes C++11 features.
2. Boost Library (http://www.boost.org)
The Boost library is used in the DelPhi C++ code. Because DelPhi C++ is part of the DelPhiPKa program, the Boost library is required for compilation. Use version 1.55.0 or above.
3. OpenMPI (http://www.open-mpi.org)
The DelPhiPKa program uses the MPI library to parallelize the energy calculation module and the titration module. For the best efficiency, compile the code with the Open-MPI library. Use version 1.8.1 or above.
If you want to compile the sequential code instead, do the following:
a. Edit the prime_environment.h file in the src/delphiPKa directory.
b. Delete or comment out these two lines:
#define MPI_PARALLEL
#include <mpi.h>
c. Edit the Makefile and change CC=mpic++ to CC=g++ or CC=c++, depending on your compiler.
4. GSL Library (http://www.gnu.org/software/gsl/)
The GSL library is used for fitting the titration curves and is required for compilation. Use version 1.15 or above.
5. Command Line Tools and Xcode package (for OS X users only)
On OS X 10.8 and above, you need to download and install the Command Line Tools and, optionally, Xcode to compile the program. Clang is the default C++ compiler that comes with the Xcode package and has been fully tested.
2.2 How to compile the program
With the required libraries and C++ compiler installed, run
make
in the directory that contains the Makefile to compile the program.
3. Basic Tutorial
3.1 What are the files in param folder
The param folder contains the force field parameter files and the topology file. It currently includes the AMBER, CHARMM, PARSE, and GROMOS force fields. The parameter files use the same format as the *.crg (atomic charge) and *.siz (atomic radius) files used by DelPhi.
The topology file contains the heavy-atom bond connectivity, hydrogen positions, and residue types. It also contains the reference pKa value for each ionizable residue group.
The force field parameter files and the topology file are required to run the DelPhiPKa program. Individual entries can be edited for specific purposes.
3.2 Edit the runtime control file run.prm
The entries in the control file run.prm set the runtime parameters used by the program. Four entries must be edited before running the program. You can leave the remaining parameters at their defaults or edit them as needed.
Required editing:
PDB file name
Specify the PDB file name. Currently, DelPhiPKa supports only the standard PDB format. If another format is used (for example, PQR), the program reads only the xyz coordinates and skips the charge and radius values.
Charge parameter
Specify the atomic charge parameter file. If the param folder is located in another directory, specify the corresponding path.
Radius parameter
Specify the atomic radius parameter file. Modify the path if needed, as above.
Topology parameter
Specify the topology parameter file. Modify the path if needed, as above.
Other control entries (can be left as default):
Remove HETATM
Remove all HETATM records from the PDB file so that they are not involved in the calculations. Default is T (true).
Remove water molecule
Remove all water molecules from the PDB file. Default is T (true).
HETATM in PQR format
To include ions or ligands (HETATM) in the calculation, set this entry to T (true) and set the Remove HETATM option to F (false). Ions and ligands are then treated as permanent charges. The program does not output pKa values for them, but their presence as permanent charges affects the pKa values of the ionizable residues of the macromolecule. To use this feature, the corresponding HETATM lines in the PDB file must be converted to PQR format. Because the charge and radius information for these atoms is not included in the topology file, users are responsible for entering it in PQR format. For more details, refer to the Advanced Tutorial section.
Do Protonation
To generate the protonated structure in PQR format, set this entry to T (true).
Do Energy Calculation
Run the energy calculation module, which calculates the electrostatic polar energy and the desolvation energy (both written to energy.txt) and the charge-charge pairwise interaction energy (written to pairwise.txt). Default is T (true). If you already have energy.txt and pairwise.txt from a previous run and set this entry to F (false), the program skips this module, reads the energy terms from those two files, and continues with the pKa calculations.
Do pKa's Calculation
Generate the titration curves and calculate the pKa values. Default is T (true).
Output PQR file (with Topology)
Add hydrogens to the PDB structure and assign the corresponding atomic charge and radius to each atom (PQR file). This step does not require the energy and pKa calculations, so it is the fastest way to obtain a protonated structure. Default is F (false).
Output PQR file (with pKa result)
Similar to the previous entry, but each ionizable residue is protonated according to its calculated pKa value at the user-defined pH. At a particular pH, each ionizable residue is either protonated or deprotonated, depending on its pKa value. Default is F (false).
At given pH value
Used together with the previous entry: the user-defined pH value at which the structure is protonated.
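To illustrate how a pKa value and a chosen pH relate to a protonation state, the following minimal sketch uses the standard Henderson-Hasselbalch relation for a single, non-interacting site. It is an illustration only: the function names and the 0.5 cutoff are assumptions, and the rule DelPhiPKa applies internally may differ.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a single titratable site that is protonated at the given pH.

    Henderson-Hasselbalch relation; interactions between sites are ignored.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def is_protonated(pka: float, ph: float, cutoff: float = 0.5) -> bool:
    # Hypothetical decision rule: call the site protonated when it is
    # occupied more than `cutoff` of the time.
    return protonated_fraction(pka, ph) > cutoff


if __name__ == "__main__":
    # pKa values below are arbitrary illustration values, not program output.
    for pka in (3.9, 6.5, 10.4):
        print(pka, round(protonated_fraction(pka, 7.0), 3), is_protonated(pka, 7.0))
```

With this rule, a site whose pKa lies above the chosen pH is written as protonated and a site whose pKa lies below it is written as deprotonated.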
Gaussian surface
Set to "1" to use the smooth Gaussian dielectric model to calculate electrostatic potentials, or to "0" to use the homogeneous dielectric model.
Variance of Gaussian distribution
This is sigma in the Gaussian distribution formula. It determines how the Gaussian function assigns the dielectric constant inside the protein and at the protein-water interface. The assignment depends on how the atoms are packed: where atoms are tightly packed, a low epsilon is assigned; where atoms are loosely packed, a high epsilon is assigned. The assigned value lies between the Internal Dielectric and External Dielectric values set in the next two entries. The default is 0.70, because in our benchmarks against experimental data this value gave the best RMSD for surface residues of native proteins. If your target is a buried residue or a mutant protein with a buried mutation site, set the value to 0.90-0.95. Currently, no single value is optimal for all cases.
Internal Dielectric
The reference dielectric constant for the protein interior in the Gaussian distribution formula. Default is 8.0, according to our benchmark results against experimental data.
External Dielectric
The dielectric constant for water in the Gaussian distribution formula.
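The interplay of sigma, Internal Dielectric, and External Dielectric can be sketched as follows. The formula is an assumed form of a Gaussian-based smooth dielectric function (each atom contributes a density exp(-r^2 / (sigma^2 R^2)), the densities are combined, and epsilon is interpolated between the two reference values). See the methodology reference for the expressions DelPhiPKa actually uses. The function name, the tuple layout, and the external value of 80.0 are assumptions made for this illustration.

```python
import math


def gaussian_dielectric(point, atoms, sigma=0.70, eps_in=8.0, eps_out=80.0):
    """Return a smooth dielectric value at `point` (illustrative sketch).

    atoms: iterable of (x, y, z, radius) tuples, lengths in angstroms.
    eps_out=80.0 is an assumed value for water; the manual gives no default.
    """
    px, py, pz = point
    empty = 1.0  # running product of (1 - rho_i) over all atoms
    for x, y, z, radius in atoms:
        dist_sq = (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2
        rho_i = math.exp(-dist_sq / (sigma ** 2 * radius ** 2))
        empty *= 1.0 - rho_i
    density = 1.0 - empty  # close to 1 where atoms are tightly packed
    # Tight packing -> value near eps_in; loose packing -> value near eps_out.
    return density * eps_in + (1.0 - density) * eps_out


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0, 1.8)]  # one hypothetical atom at the origin
    for x in (0.0, 1.5, 3.0, 6.0):
        print(x, round(gaussian_dielectric((x, 0.0, 0.0), atoms), 2))
```

In this sketch, a larger sigma spreads each atom's density further, so a point at a fixed distance from the atom receives a lower epsilon.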
Cluster Delimitation Threshold (A)
The distance threshold within each network. The recommended value is greater than 10 but less than 15 angstroms, for efficiency. Default is 12 angstroms.
Hydrogen of GLU Attached to Atom
The atom on which the hydrogen is placed: either the OE1 or the OE2 atom of glutamic acid (GLU). Default is OE1.
Hydrogen of ASP Attached to Atom
The atom on which the hydrogen is placed: either the OD1 or the OD2 atom of aspartic acid (ASP). Default is OD1.
pH Initial Value
The pH value at which the titration starts. Default is 0.
pH End Value
The pH value at which the titration ends. Default is 14.
pH Interval
The pH step used during the titration. Default is 1.0.
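As a quick check of how many titration points a given combination of the three pH entries produces, the following sketch lists the pH values. It assumes the end value is included when it falls on the grid. The function name is hypothetical and is not part of DelPhiPKa.

```python
import math


def ph_grid(start=0.0, end=14.0, step=1.0):
    """List the pH values visited for the given start, end, and interval."""
    if step <= 0:
        raise ValueError("step must be positive")
    if end < start:
        raise ValueError("end must not be smaller than start")
    # Small tolerance so that floating-point noise does not drop the last point.
    n_steps = math.floor((end - start) / step + 1e-9)
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


if __name__ == "__main__":
    print(len(ph_grid()), ph_grid()[:3], ph_grid()[-1])
    print(len(ph_grid(step=0.5)))
```

With the defaults (0, 14, 1.0), the grid has 15 points. Halving the interval to 0.5 gives 29 points, with a corresponding increase in titration work.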
3.3 How to run the program
With the required force field parameter and topology files and a properly edited run.prm file, you can run the program.
If compiled with Open-MPI, run (x is the number of CPUs you want to use):
mpirun -np x delphiPKa run.prm
If compiled as the sequential version, run:
delphiPKa run.prm
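When DelPhiPKa is launched from a script, the two command lines above can be assembled as in the following sketch. The helper name is hypothetical. It only builds the argument list and assumes that delphiPKa (and mpirun, for the parallel build) can be found on your PATH.

```python
def build_command(prm_file="run.prm", n_cpus=None, executable="delphiPKa"):
    """Return the argument list for a sequential or an MPI run."""
    if n_cpus is None:
        # Sequential build: delphiPKa run.prm
        return [executable, prm_file]
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    # MPI build: mpirun -np x delphiPKa run.prm
    return ["mpirun", "-np", str(n_cpus), executable, prm_file]


if __name__ == "__main__":
    print(" ".join(build_command(n_cpus=8)))
    print(" ".join(build_command()))
```

The returned list can be passed to subprocess.run(command, check=True) from the directory that contains run.prm.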
3.4 Results and output files
If the job runs successfully, it generates several output files.
pKa.csv
This CSV file gives the pKa value of each ionizable residue together with the associated energy terms (in kcal/mol). The energy terms are the electrostatic polar energy and the desolvation energy of each residue, each in its charged (+/-) state and in its neutral state.
titra.csv
This CSV file contains the titration curves: the probability of each residue being in its ionized state at each pH from 0 to 14.
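Once the pH values and the probabilities for one residue have been read from titra.csv, a rough pKa estimate is the pH at which the probability crosses 0.5. The sketch below uses plain linear interpolation between neighbouring pH points. This is a simplification for quick inspection, not the curve fitting that DelPhiPKa performs with the GSL library. The function name is hypothetical, and the exact column layout of titra.csv is not assumed here.

```python
def pka_from_titration(ph_values, probabilities):
    """Estimate the pH at which the probability crosses 0.5.

    Returns None when the curve never crosses 0.5 in the sampled pH range.
    """
    points = list(zip(ph_values, probabilities))
    for (ph0, p0), (ph1, p1) in zip(points, points[1:]):
        crosses = (p0 - 0.5) * (p1 - 0.5) <= 0.0
        if crosses and p0 != p1:
            # Linear interpolation between the two bracketing points.
            return ph0 + (0.5 - p0) * (ph1 - ph0) / (p1 - p0)
    return None


if __name__ == "__main__":
    # Synthetic curve for a site with pKa 4.2 (illustration only).
    ph = [float(i) for i in range(15)]
    prob = [1.0 / (1.0 + 10.0 ** (4.2 - value)) for value in ph]
    print(pka_from_titration(ph, prob))
```

With a pH interval of 1.0 the interpolated value can differ from the fitted pKa by a few hundredths of a pH unit. A smaller pH interval narrows the gap.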
energies.txt
This file contains the polar energy terms and the desolvation energy terms (in kcal/mol). They are the same as in the pKa.csv file.
pairwise.txt
This file contains the charge-charge pairwise interaction energy terms (in kT).
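Because pairwise.txt uses kT while the other energy terms are in kcal/mol, a conversion is needed before the two can be compared. The sketch below assumes a temperature of 298.15 K, which this manual does not state. Adjust it if your calculation uses a different temperature. The function name is hypothetical.

```python
GAS_CONSTANT_J = 8.314462618  # molar gas constant, J/(mol*K)
JOULES_PER_KCAL = 4184.0


def kt_to_kcal_per_mol(energy_kt, temperature_k=298.15):
    """Convert an energy from kT units to kcal/mol at the given temperature."""
    return energy_kt * GAS_CONSTANT_J * temperature_k / JOULES_PER_KCAL


if __name__ == "__main__":
    print(round(kt_to_kcal_per_mol(1.0), 4))
```

At 298.15 K, 1 kT corresponds to about 0.59 kcal/mol.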
(pdb_name)_1.pqr
This PQR file is the protonated structure based on the topology parameters.
(pdb_name)_2.pqr
This PQR file is the protonated structure based on the calculated pKa value of each ionizable residue.
4. Advanced Tutorial
4.1 Edit the topology file
The topology file is a parameter file that contains information for each residue, such as heavy-atom bond connectivity, hydrogen positions, and residue types. It also contains the reference pKa value for each ionizable group. Users can edit this file for their specific purposes.
A line that starts with "#" is skipped.
A line that starts with "$" is read as atom information. The format is:
res: residue type (e.g. ASP)
atom: atom type (e.g. CB)
obtal: atomic orbital type (e.g. sp3 hybrid orbitals)
conf: structure type (e.g. SD, side-chain)
batm: bonded atom types (e.g. CA-CG-HB1-HB2)
The program uses hydrogen names such as HB1 and HB2. If you prefer names such as 1HB and 2HB, change every HB1 to 1HB (and likewise for the other hydrogens) in the topology file.
If you want a specific atom (for example, atom XX) to be bonded to the CB atom of ASP instead of the CG atom in the example above, replace CG with XX in the CB entry, and also rename the CG entry to XX and update the bonded atoms listed in that entry.
A line that starts with "*" is read as the reference pKa value of an individual ionizable side chain. The default values are set according to our benchmark results against experimental data. Users can edit these values for specific purposes.
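Before editing a topology file, it can help to see how its lines fall into the three kinds described above. The sketch below classifies lines by their first character only. It does not parse the fields inside a line, because the field separators are not described here. The function names and the sample lines are placeholders, not real topology entries.

```python
def classify_topology_line(line):
    """Classify a topology line by its leading character."""
    if not line.strip():
        return "blank"
    if line.startswith("#"):
        return "comment"        # skipped by the program
    if line.startswith("$"):
        return "atom"           # atom information (res, atom, obtal, conf, batm)
    if line.startswith("*"):
        return "reference_pka"  # reference pKa of an ionizable side chain
    return "other"


def count_line_kinds(lines):
    """Count how many lines of each kind are present."""
    counts = {}
    for line in lines:
        kind = classify_topology_line(line)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


if __name__ == "__main__":
    # Placeholder lines; a real file would be read with open(path).
    sample = ["# a comment", "$ (atom record)", "$ (atom record)", "* (reference pKa)"]
    print(count_line_kinds(sample))
```

Comparing the counts before and after an edit is a simple way to confirm that no atom or reference pKa line was turned into a comment by accident.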
4.2 Edit HETATM in PQR format
DelPhiPKa can treat ions and ligands (HETATM records in the PDB file) as permanent charges and calculate their effects on the ionizable residues of the protein. This can be used to model structures that contain HETATM records. To do so, you need to modify your PDB file and convert the HETATM (ion/ligand) lines to PQR format. Here is an example in which the original PDB file contains zinc and calcium ions:
And they are modified into PQR format as:
Here the atomic charge is 2.0000 for ZN and 2.0000 for CA, and the atomic radius is 1.7300 for ZN and 2.3000 for CA.
Be careful with this feature, because entering these lines correctly in PQR format is crucial for the calculation. If you enter them incorrectly or leave them as in the original PDB file, the program will read the wrong values (in this example, 1.00 as the atomic charge and 8.95 as the atomic radius of the zinc ion), which would cause serious errors in the calculations.
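The conversion described above can be scripted. The sketch below keeps the first 54 columns of a standard PDB HETATM line (record name through the z coordinate) and writes the charge and radius where the remaining fields used to be. The field widths, the function name, and the coordinates in the sample line are assumptions made for illustration. Compare the result with a PQR file written by DelPhiPKa itself before relying on it.

```python
def hetatm_to_pqr(pdb_line, charge, radius):
    """Rewrite one PDB HETATM line with charge and radius after the coordinates."""
    if not pdb_line.startswith("HETATM"):
        raise ValueError("expected a HETATM line")
    # Standard PDB columns 1-54 hold everything up to the z coordinate.
    head = pdb_line.rstrip("\n")[:54]
    if len(head) < 54:
        raise ValueError("line is too short to contain x, y, z coordinates")
    # Assumed layout: charge and radius as two fixed-width fields, 4 decimals.
    return f"{head}{charge:8.4f}{radius:8.4f}"


if __name__ == "__main__":
    # Hypothetical zinc record; the coordinates are made up for illustration.
    line = "HETATM 1001 ZN    ZN A 201      10.000  20.000  30.000  1.00  8.95"
    print(hetatm_to_pqr(line, charge=2.0, radius=1.73))
```

After the conversion, the last two fields of the line are the charge (2.0000) and the radius (1.7300) rather than the values left over from the original PDB record.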
______________________________________________________________________________
Last Updated: October, 2015. Dr. Lin Wang,
Computational Biophysics and Bioinformatics, Department of Physics, Clemson
University.