About SAMPDI-3Dv2
SAMPDI-3Dv2 is an updated version of SAMPDI-3D, trained on a larger dataset of mutations in proteins and DNA within protein-DNA complexes. It incorporates an extended feature set and employs a gradient boosting decision tree machine learning algorithm to predict changes in binding free energy caused by mutations in proteins (single amino acid) or DNA (single base-pair).
This method uses two distinct models:
- A model trained on single amino acid mutations in proteins and their associated free energy changes, utilizing a variety of features derived from protein-DNA complex structures to predict the effects of single amino acid mutations.
- A model trained on single base-pair mutations in DNA and their associated free energy changes, also leveraging features from protein-DNA complex structures to predict the impact of single base-pair changes.
SAMPDI-3Dv2 surpasses all existing state-of-the-art methods in terms of both Pearson correlation coefficient and root-mean-squared error across the cross-validation datasets. The dataset used for model development (training and 5-fold cross-validation) is available for download.
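For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how the Pearson correlation coefficient and the root-mean-squared error between predicted and experimental binding free energy changes can be computed. The function names and the sample values are illustrative assumptions; they are not part of SAMPDI-3Dv2 or its dataset.

```python
import math


def pearson(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Sums of centered products and centered squares.
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_e)


def rmse(predicted, experimental):
    """Root-mean-squared error between two equal-length sequences."""
    n = len(predicted)
    if n != len(experimental) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)


if __name__ == "__main__":
    # Made-up values in kcal/mol, for illustration only.
    predicted = [0.5, 1.2, -0.3, 0.9]
    experimental = [0.7, 1.0, -0.1, 1.4]
    print("PCC: ", round(pearson(predicted, experimental), 3))
    print("RMSE:", round(rmse(predicted, experimental), 3))
```

A higher Pearson correlation coefficient and a lower root-mean-squared error both indicate closer agreement between predictions and measurements.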
Users can also download the standalone SAMPDI-3Dv2 code (available here) for local use. To run it locally, they must obtain, install, and configure the required software (mkdssp-v2.0.1, x3dna-dssr-v2.4.5, Scwrl4 v4.0.2, and psiblast-v2.10.0) and the UniRef50 protein-sequence database from their respective providers.
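Before running the standalone code, it may help to confirm that the required programs can be found on the system. The sketch below does this with Python's standard library. The executable names in it are assumptions about how each tool is typically installed, so adjust them to match the local setup; the sketch does not check tool versions or the UniRef50 database.

```python
import shutil

# Assumed executable names; the actual names on a given system may differ.
ASSUMED_TOOLS = ["mkdssp", "x3dna-dssr", "Scwrl4", "psiblast"]


def find_missing_tools(names, which=shutil.which):
    """Return the names from `names` that `which` cannot locate on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools(ASSUMED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed executables were found on PATH.")
```

The lookup function is passed in as a parameter so that the check can be exercised without the tools being installed.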
Cite the following publication to acknowledge the use of SAMPDI-3Dv2 in your work:
Rimal, P.; Paul, S.K.; Panday, S.K.; Alexov, E. Further Development of SAMPDI-3D: A Machine Learning Method for Predicting Binding Free Energy Changes Caused by Mutations in Either Protein or DNA. Genes 2025, 16, 101.
https://doi.org/10.3390/genes16010101