compbio-logo

PKAD-R: curated, expanded and redesigned database of experimental pKa values in proteins

Professor Emil Alexov Group

We encourage experimental investigators to submit their pKa data for inclusion in the database by email to Ana Damjanovic <adamjan1@jhu.edu>.

About PKAD-R

Understanding the pKa values of ionizable protein residues is critical for understanding fundamental protein properties, such as structure, function and interactions. We present a new version of PKAD, named PKAD-R, which is a curated database of experimentally determined protein pKa values. The database builds upon its predecessors, PKAD and PKAD-2, with significant updates and improvements through: (1) careful data curation to remove incorrect entries and consolidate redundant entries by offering alternative structures and pKa values for each unique residue; (2) database redesign to enhance usability by adding information such as protein and species names, detailed notes, and sequence identity; and (3) database expansion through identification of 214 new (128 non-redundant) pKa entries from the literature. The database currently contains 877 unique pKa entries for wild-type structures and 147 for mutant structures, and we aim to keep updating it with new entries. The PKAD-R database is available both as a stand-alone downloadable file and through web servers. It is designed to provide a set of pKa entries for unique residues suitable for machine learning applications, together with modularity: alternative pKa values and structures are provided so that users can decide which entries to include.

A dataset of entries with known pKa values but without suitable PDB structures is also provided for download for interested users.

Please see the publication for further details, and cite it to acknowledge the use of PKAD-R in your work:

Ada Y. Chen(1), Shailesh Kumar Panday(2), Kaoru Ri(3), Emil Alexov(2), Bernard R. Brooks(1), Ana Damjanovic(4,5,1,*). PKAD-R: curated, redesigned and expanded database of experimental pKa values in proteins. Journal of Computational Biophysics and Chemistry, 2025. https://doi.org/10.1142/S2737416525500164

(1) Laboratory of Computational Biology, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892
(2) Department of Physics and Astronomy, Clemson University, Clemson, SC 29634
(3) Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287
(4) Department of Biophysics, Johns Hopkins University, Baltimore, MD 21218
(5) Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218

 pKa Data: 
“Protein Name” and “Species”
These two columns provide protein names and species for each entry, enabling quick and efficient searches. Users can quickly determine whether a pKa value from the literature is already included in the database by searching for the protein name and species.
“pKa Classification”
This column categorizes each entry as “Main,” “Alt. pKa,” “Alt. pKa (mutant),” or “Alt. pKa (state).” Entries labeled as “Main” represent the most recommended pKa value for a given residue, paired with the most appropriate PDB structure. The label “Alt. pKa” indicates another pKa measurement for the same residue in the same protein and for the same species as the “Main” entry. “Alt. pKa (mutant)” refers to a pKa measured for the same residue in a mutated version of the protein. “Alt. pKa (state)” denotes a pKa measured in a different state of the protein, such as deoxyhemoglobin versus oxyhemoglobin. For more details on the selection of “Alt. pKa (mutant)” and “Alt. pKa (state)” entries, see the “Database Curation” section of the paper.
“Alternative PDBs”
This column lists additional PDB structures that are similar to the primary PDB structure listed in the "PDB" column and can also be used.
"Sequence Identity > 30%" and "Sequence Identity > 90%"
These two columns list chains in the format PDB-ID.Chain-ID (e.g., "1EX3.A") included in this database that have sequence identities greater than 30% and 90%, respectively, compared to the sequence of the current entry's chain. Sequence identity is calculated using the PairwiseAligner class from the Bio.Align module within the Biopython library. We include a tag, 'mutation_on_site', for chains with a mutation directly on the measured ionizable site, such as some of the SNase variants. The tag warns users that these entries should not be discarded based solely on sequence identity.
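To make the 30% and 90% thresholds concrete, here is a minimal sketch of a percent-identity calculation in plain Python. It is a stand-in for illustration only and is not the Biopython PairwiseAligner call used to build the database. The scoring scheme (match = 1, mismatch = 0, gap = 0), the choice of the shorter sequence as the denominator, the function names, and the example sequences are all assumptions; the paper should be consulted for the actual settings.

```python
def count_identities(seq_a, seq_b):
    """Largest number of identical aligned positions in a global alignment.

    Assumed scoring: match = 1, mismatch = 0, gap = 0. Under this scheme the
    optimal score equals the longest-common-subsequence length.
    """
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(seq_a, seq_b):
    """Identity as a percentage of the shorter sequence (an assumed convention)."""
    shorter = min(len(seq_a), len(seq_b))
    if shorter == 0:
        return 0.0
    return 100.0 * count_identities(seq_a, seq_b) / shorter


def identity_flags(pct):
    """Map a percent identity onto the two thresholds used by the columns."""
    return {"gt30": pct > 30.0, "gt90": pct > 90.0}


# Hypothetical sequences, not taken from the database.
reference = "ACDEFGHIKL"
variant = "ACDEFGHIKV"  # differs from the reference at the last position
pct = percent_identity(reference, variant)
flags = identity_flags(pct)
```

Because both thresholds are strict ("greater than"), a chain at exactly 90% identity, as in this example, would appear in the 30% column but not in the 90% column.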
“ResID in PDB” and “ResID in pKa paper”
The column "ResID in PDB" lists the residue ID as it appears in the corresponding PDB structure, while "ResID in pKa paper" indicates the residue ID referenced in the original pKa publication when it differs from the PDB.
“Notes”
This column provides, for each residue, details about the selection of the most appropriate pKa value and PDB structure for the "Main" entry, as well as any additional relevant information.
“Warning”
This column labels entries under specific conditions: 1) when the pKa is a range or an approximation (labeled as “pKa: range or ~”); 2) when the residue is the C-terminus or N-terminus (labeled as “C/N-term”); 3) when the residue is present in the protein but missing from the PDB structure (labeled as “ResID NOT exist”), likely because high flexibility and disorder make accurate structural definition difficult; 4) when a mutated structure is used for a wild-type protein, but the mutation is distant from the targeted residue, allowing the structure to be approximately treated as wild type (labeled as “approx. WT”). This column helps users quickly filter out entries that may not meet their needs, such as those unsuitable for direct use in machine learning studies.
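As an illustration of this kind of filtering, the sketch below selects “Main” entries and drops any that carry a warning, optionally tolerating chosen warning labels. It assumes the downloadable file can be read as CSV with the column names shown on this page, and that each “Warning” cell holds at most one label; neither is confirmed here. The function name and the sample rows are made-up placeholders, not real PKAD-R entries.

```python
import csv
import io

# Made-up placeholder rows for illustration; not real PKAD-R entries.
SAMPLE = """Protein Name,Species,PDB,ResID in PDB,pKa Classification,Warning
Protein A,Species X,XXX1,10,Main,
Protein A,Species X,XXX1,10,Alt. pKa,
Protein B,Species Y,XXX2,25,Main,C/N-term
Protein C,Species Z,XXX3,7,Main,approx. WT
"""


def select_entries(rows, classification="Main", allowed_warnings=()):
    """Keep rows with the given classification and no disallowed warning.

    Assumes the "Warning" cell is empty or holds a single label.
    """
    selected = []
    for row in rows:
        if row["pKa Classification"].strip() != classification:
            continue
        warning = (row.get("Warning") or "").strip()
        if warning and warning not in allowed_warnings:
            continue
        selected.append(row)
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Strict set: "Main" entries with no warning at all.
strict = select_entries(rows)

# Relaxed set: also accept structures flagged as approximately wild type.
relaxed = select_entries(rows, allowed_warnings=("approx. WT",))
```

To use the real download, replace `io.StringIO(SAMPLE)` with an opened file handle and adjust the column names if they differ from those assumed above.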
All other columns
The remaining column names are self-explanatory.
 
Copyright © Computational Biophysics and Bioinformatics - Emil Alexov Group.