SAMPDI-3Dv2: Predicting protein-DNA binding free energy change upon mutations

Emil Alexov Group

Question: What is the purpose of the SAMPDI-3Dv2 server?
Answer: SAMPDI-3Dv2 is a new version of our SAMPDI-3D machine-learning-based method for predicting the effect of a mutation in the protein or DNA on the binding affinity of a protein-DNA complex. It is trained on larger datasets (approximately 42% larger for mutations in proteins and approximately 9% larger for mutations in DNA) using an expanded set of features for both protein and DNA. It is very fast and currently outperforms all state-of-the-art methods, which makes it well suited to genome-level investigations of protein-DNA interactions.

Question: What methodology is used in the SAMPDI-3Dv2 server?
Answer: Like SAMPDI-3D, SAMPDI-3Dv2 uses a gradient boosting decision tree machine learning algorithm to predict the change in binding free energy. Its features are physicochemical properties, the structure of the mutation site, protein-DNA interactions, mutation-induced perturbations to protein-DNA interactions, and the evolutionary composition preference and conservation of the mutation site.

Question: What kinds of mutation effects can be predicted using SAMPDI-3Dv2?
Answer: The SAMPDI-3Dv2 webserver is designed to predict the effect of one of the following two types of mutations at a time.
  • Protein mutation: mutation of a single amino acid in the protein
  • DNA mutation: mutation of a base/base-pair of single-stranded/double-stranded DNA.

Question: What inputs are required to predict the effect of protein mutation(s) using SAMPDI-3Dv2?
Answer: To predict the effect of protein mutation(s) using SAMPDI-3Dv2, the user is required to provide the following inputs in single or batch (multiple) mode.
Single mode
  • A protein-DNA complex structure in PDB v3.0 file format.
  • A string generated by concatenating the chain IDs of the protein-DNA complex biological assembly; all other chains are ignored. For example, if the PDB file has chains A, B, C, D, E and F, and chains A, C and D are involved in the protein-DNA complex, then one should type 'ACD' in the input field.
  • A jobname which may be useful in keeping track of calculations while working on a dataset of mutations.
  • Mutation information: the chain containing the residue to be mutated, its residue ID in the structure, the name of the residue, and the mutated residue.
Multiple/batch mode
  • A protein-DNA complex structure in PDB v3.0 file format.
  • A string generated by concatenating the chain IDs of the protein-DNA complex biological assembly; all other chains are ignored. For example, if the PDB file has chains A, B, C, D, E and F, and chains A, C and D are involved in the protein-DNA complex, then one should type 'ACD' in the input field.
  • A jobname which may be useful in keeping track of calculations while working on a dataset of mutations.
  • Mutations information: a text (.txt) file containing one mutation per line. Each line gives the chain containing the residue to be mutated, the wild-type residue (single-letter code), its residue ID in the structure, and the mutated residue (single-letter code).
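
As an illustration of the batch inputs listed above, the following Python sketch builds the chain-ID string and writes a protein mutation file with one mutation per line. The separator the server expects between fields is not stated in this FAQ, so the space-separated layout is an assumption, and the helper names (chain_string, format_protein_mutation, write_mutation_file) are hypothetical.

```python
# Hypothetical helpers for preparing batch input.
# ASSUMPTION: fields on each line are separated by single spaces;
# the FAQ only states the field order, not the separator.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes


def chain_string(chain_ids):
    """Concatenate single-character chain IDs, e.g. ['A', 'C', 'D'] -> 'ACD'."""
    for cid in chain_ids:
        if len(cid) != 1:
            raise ValueError(f"chain ID must be one character: {cid!r}")
    return "".join(chain_ids)


def format_protein_mutation(chain, wild_type, residue_id, mutant):
    """Return one line: chain, wild-type residue, residue ID, mutated residue."""
    if wild_type not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError("residues must be standard one-letter codes")
    if wild_type == mutant:
        raise ValueError("mutated residue must differ from the wild type")
    return f"{chain} {wild_type} {residue_id} {mutant}"


def write_mutation_file(path, mutations):
    """Write one mutation per line to a plain-text file and return the lines."""
    lines = [format_protein_mutation(*m) for m in mutations]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

For example, write_mutation_file("mutations.txt", [("A", "R", 45, "K"), ("A", "E", 102, "A")]) would produce a two-line file; the residue numbers here are made up for illustration.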

Question: What inputs are required to predict the effect of DNA mutation(s) using SAMPDI-3Dv2?
Answer: To predict the effect of DNA mutation(s) using SAMPDI-3Dv2, the user is required to provide the following inputs in single or batch (multiple) mode.
Single mode
  • A protein-DNA complex structure in PDB v3.0 file format.
  • A string generated by concatenating the chain IDs of the protein-DNA complex biological assembly; all other chains are ignored. For example, if the PDB file has chains A, B, C, D, E and F, and chains A, C and D are involved in the protein-DNA complex, then one should type 'ACD' in the input field.
  • A jobname which may be useful in keeping track of calculations while working on a dataset of mutations.
  • Mutation information: the chain containing the forward-strand base to be mutated, the residue IDs of the forward- and backward-strand bases of the pair in the structure, the name of the residue, and the mutated base pair (base pairs are given as a pair of single-letter codes).
Multiple/batch mode
  • A protein-DNA complex structure in PDB v3.0 file format.
  • A string generated by concatenating the chain IDs of the protein-DNA complex biological assembly; all other chains are ignored. For example, if the PDB file has chains A, B, C, D, E and F, and chains A, C and D are involved in the protein-DNA complex, then one should type 'ACD' in the input field.
  • A jobname which may be useful in keeping track of calculations while working on a dataset of mutations.
  • Mutations information: a text (.txt) file containing one mutation per line. Each line gives the chain containing the forward-strand base to be mutated, the wild-type base pair (a pair of single-letter codes), its residue ID in the structure, and the mutated base pair.
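
The DNA batch file can be prepared in a similar way. The sketch below forms each base pair from the forward-strand base and its Watson-Crick complement and emits one line per mutation. It covers double-stranded DNA only; the space-separated layout and the convention of writing the forward base first in the pair are assumptions, and the helper names are hypothetical.

```python
# Hypothetical helpers for a DNA batch mutation file.
# ASSUMPTIONS: space-separated fields, and a base pair written as the
# forward-strand base followed by its complement (e.g. 'AT').
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}  # Watson-Crick pairing


def base_pair(forward_base):
    """Return the two-letter base-pair code for a forward-strand base."""
    if forward_base not in COMPLEMENT:
        raise ValueError(f"not a standard DNA base: {forward_base!r}")
    return forward_base + COMPLEMENT[forward_base]


def format_dna_mutation(chain, wild_base, residue_id, mutant_base):
    """Return one line: chain, wild-type base pair, residue ID, mutated base pair."""
    if wild_base == mutant_base:
        raise ValueError("mutated base must differ from the wild type")
    return f"{chain} {base_pair(wild_base)} {residue_id} {base_pair(mutant_base)}"
```

With these assumptions, format_dna_mutation("C", "A", 12, "G") yields the line "C AT 12 GC"; the chain and residue number are made up for illustration.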

Question: How much time does SAMPDI-3Dv2 take to predict the effect of a single mutation?
Answer: In most cases the SAMPDI-3Dv2 webserver predicts the effect of a single DNA mutation within a few seconds. For mutations in the protein, however, processing can take several minutes per protein-DNA complex. When there are multiple single-amino-acid mutations in the protein of one protein-DNA complex, the multiple-mutation/batch mode is recommended so that they are all predicted in one go, which saves time.

Question: How does SAMPDI-3Dv2 treat non-standard amino acids, small molecules or ions if any are present in the complex structure?
Answer: When predicting the binding free energy change due to a mutation in DNA, SAMPDI-3Dv2 ignores non-standard amino acids and small molecules, so their presence in the structure does not affect the prediction. When predicting the binding free energy change due to a mutation in the protein, however, some commonly occurring non-standard amino acids are replaced by their parent amino acids in a preprocessing step, and the preprocessed structure is used for the prediction.
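
To make the idea of replacing a non-standard amino acid by its parent concrete, here is a minimal Python sketch that renames a few well-known modified residues in PDB-format lines. It is not the server's actual preprocessing: the list of residues the server handles is not given in this FAQ, the mapping below is only a small illustrative subset, and the function name is hypothetical. The sketch renames residues only; it does not adjust atom names (for example the selenium atom in MSE) or remove added groups.

```python
# Illustrative subset of modified residues and their parent amino acids.
# ASSUMPTION: which residues SAMPDI-3Dv2 actually replaces is not documented here.
PARENT_RESIDUE = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
}


def replace_nonstandard(pdb_lines):
    """Rename listed non-standard residues to their parent amino acid.

    Uses the fixed-column PDB layout: record name in columns 1-6 and
    residue name in columns 18-20. Renamed HETATM records become ATOM.
    """
    out = []
    for line in pdb_lines:
        record = line[:6]
        res_name = line[17:20]
        if record in ("ATOM  ", "HETATM") and res_name in PARENT_RESIDUE:
            line = "ATOM  " + line[6:17] + PARENT_RESIDUE[res_name] + line[20:]
        out.append(line)
    return out
```

Lines for other residues, waters and ligands pass through unchanged, so the function can be applied to the full list of lines read from a structure file.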

Question: What should I do if I run into a problem using SAMPDI-3Dv2 or think I have discovered a bug?
Answer: In any such case kindly contact us at delphi@g.clemson.edu. We will do our best to resolve your query as soon as possible.

 
Copyright © Computational Biophysics and Bioinformatics - Emil Alexov Group.