Question: What is the purpose of the SAAFEC-SEQ server?
Answer:
SAAFEC-SEQ predicts the effect of single amino-acid substitutions on protein thermodynamic stability. Specifically, it predicts the mutation-induced change in protein folding free energy, ΔΔG, using only the protein sequence.
Question: What kind of method is SAAFEC-SEQ?
Answer:
SAAFEC-SEQ is a sequence-based machine-learning method. It uses a gradient boosting decision tree algorithm to predict changes in protein folding free energy caused by single-point missense mutations. Unlike structure-based methods, SAAFEC-SEQ does not require a three-dimensional protein structure.
Question: Does SAAFEC-SEQ require a protein 3D structure?
Answer:
No. SAAFEC-SEQ requires only the amino-acid sequence of the protein. This makes it useful for proteins without experimentally solved structures and for large-scale variant analysis.
Question: What does the predicted ΔΔG value mean?
Answer:
In SAAFEC-SEQ, ΔΔG is defined as:
ΔΔG = ΔG_WT - ΔG_MT
where ΔG_WT is the folding free energy of the wild-type protein and ΔG_MT is the folding free energy of the mutant protein.
For a stably folded protein, the folding free energy is negative. A more negative folding free energy indicates a more favorable and more stable folded state, while a less negative value indicates reduced stability.
Question: Why does a negative ΔΔG indicate destabilization in SAAFEC-SEQ?
Answer:
Because SAAFEC-SEQ uses the convention
ΔΔG = ΔG_WT - ΔG_MT.
For example:
ΔG_WT = -10 kcal/mol
ΔG_MT = -6 kcal/mol
ΔΔG = -10 - (-6) = -4 kcal/mol
In this example, the mutant has a less negative folding free energy than the wild type. Therefore, the mutant folded state is less favorable, and the mutation is predicted to be destabilizing. More generally, whenever the mutant is less stable than the wild type, ΔG_MT is greater (less negative) than ΔG_WT, so ΔG_WT - ΔG_MT is negative.
With the SAAFEC-SEQ convention:
- Negative ΔΔG indicates a destabilizing mutation.
- Positive ΔΔG indicates a stabilizing mutation.
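The sign convention can be checked with a few lines of Python. This is a minimal sketch: ddg_saafec and interpret_ddg are hypothetical helper names, not part of the SAAFEC-SEQ server, and labeling a value of exactly 0 as "neutral" is an assumption.

```python
def ddg_saafec(dg_wt: float, dg_mt: float) -> float:
    """Return ΔΔG = ΔG_WT - ΔG_MT in kcal/mol (SAAFEC-SEQ sign convention)."""
    return dg_wt - dg_mt


def interpret_ddg(ddg: float) -> str:
    """Label a ΔΔG value under the SAAFEC-SEQ convention."""
    if ddg < 0:
        return "destabilizing"  # mutant folding free energy is less negative
    if ddg > 0:
        return "stabilizing"  # mutant folding free energy is more negative
    return "neutral"  # hypothetical label for exactly zero


# Worked example from the text: ΔG_WT = -10 kcal/mol, ΔG_MT = -6 kcal/mol
ddg = ddg_saafec(-10.0, -6.0)
print(ddg, interpret_ddg(ddg))  # -4.0 destabilizing
```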
Question: What is the unit of the predicted ΔΔG?
Answer:
The predicted ΔΔG is reported in kcal/mol.
Question: What types of mutations can be analyzed?
Answer:
SAAFEC-SEQ is designed for single-point missense mutations, where one amino acid in the protein sequence is substituted by another amino acid.
Question: What inputs are required for a single-mutation prediction?
Answer:
For a single-mutation prediction, users should provide:
- A protein sequence in FASTA format, either uploaded as a FASTA file or pasted into the input box.
- The wild-type amino acid.
- The mutation position in the submitted sequence.
- The mutant amino acid.
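A minimal sketch of turning pasted or uploaded FASTA text into a plain sequence string before choosing the wild-type residue, position, and mutant residue. parse_fasta is a hypothetical helper; accepting only a single record and only the 20 standard one-letter amino-acid codes are assumptions about what the server expects, not documented rules.

```python
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> str:
    """Return the sequence of a single-record FASTA text (header lines start with '>')."""
    headers = 0
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            headers += 1
            if headers > 1:
                raise ValueError("expected a single FASTA record")
            continue
        chunks.append(line.upper())
    seq = "".join(chunks)
    bad = set(seq) - STANDARD_AA
    if not seq or bad:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(bad)}")
    return seq


seq = parse_fasta(">example\nMKTAYIAK\nQRQISFVK\n")
print(len(seq), seq[3 - 1])  # 16 T  (residue at 1-based position 3)
```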
Question: Can I submit multiple mutations in one job?
Answer:
Yes. In batch mode, users can submit one protein sequence and a mutation-list file containing one single-point mutation per line. Each mutation should be specified relative to the submitted protein sequence.
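The exact mutation-list syntax is not given in this FAQ. The sketch below assumes a hypothetical one-mutation-per-line format such as "A23G" (wild-type residue, 1-based position in the submitted sequence, mutant residue); the server's own input instructions define the real format.

```python
import re
from typing import List, Tuple

# Hypothetical line format "A23G": wild-type residue, 1-based position, mutant residue.
MUTATION_RE = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def parse_mutation_list(text: str) -> List[Tuple[str, int, str]]:
    """Parse one single-point mutation per line into (wild_type, position, mutant)."""
    mutations = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip().upper()
        if not line:
            continue  # skip blank lines
        match = MUTATION_RE.fullmatch(line)
        if match is None:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        mutations.append((match.group(1), int(match.group(2)), match.group(3)))
    return mutations


print(parse_mutation_list("A23G\nL5P\n"))  # [('A', 23, 'G'), ('L', 5, 'P')]
```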
Question: What features does SAAFEC-SEQ use?
Answer:
SAAFEC-SEQ uses sequence-derived and knowledge-based features, including:
- Physicochemical properties of the mutation site.
- Sequence-neighbor features around the mutation position.
- Evolutionary information from position-specific scoring matrix (PSSM)-based features.
- Neighbor mutation conservation scores around the mutation site.
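Sequence-neighbor features start from the residues flanking the mutation site. A minimal sketch of extracting such a window, assuming 1-based positions, a half-width k of 3, and "-" padding beyond the termini; the helper name and all of these settings are hypothetical, not SAAFEC-SEQ's documented choices.

```python
def neighbor_window(seq: str, position: int, k: int = 3, pad: str = "-") -> str:
    """Return residues position-k .. position+k (1-based), padding past either terminus."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside sequence")
    i = position - 1  # convert to 0-based index
    return "".join(seq[j] if 0 <= j < len(seq) else pad for j in range(i - k, i + k + 1))


print(neighbor_window("MKTAYIAK", 2, k=3))  # --MKTAY
```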
Question: What are PsePSSM features?
Answer:
PsePSSM stands for pseudo position-specific scoring matrix. It captures both evolutionary information and sequence-order information from the protein sequence. In SAAFEC-SEQ, PsePSSM-based features are among the most important feature groups for prediction.
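In its generic form, a PsePSSM descriptor is built from an L x 20 PSSM (L residues, one column per amino acid): the 20 column averages summarize evolutionary information, and for each lag the average squared difference between scores that many residues apart adds sequence-order information. The plain-Python sketch below implements that generic construction; the logistic normalization and the max_lag value are assumptions, and the exact PsePSSM variant and parameters used by SAAFEC-SEQ are not specified here.

```python
import math
from typing import List


def pse_pssm(pssm: List[List[float]], max_lag: int = 1) -> List[float]:
    """Generic PsePSSM: column means plus one correlation factor per column and lag.

    pssm: L x 20 matrix (rows = sequence positions, columns = amino acids).
    Scores are first squashed with a logistic function (one common choice).
    Returns 20 + 20 * max_lag features for a 20-column PSSM.
    """
    L = len(pssm)
    if L <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    P = [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]
    n_cols = len(P[0])
    # Evolutionary part: average normalized score of each amino-acid column.
    features = [sum(P[i][j] for i in range(L)) / L for j in range(n_cols)]
    # Sequence-order part: mean squared difference between positions `lag` apart.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            diff_sq = sum((P[i][j] - P[i + lag][j]) ** 2 for i in range(L - lag))
            features.append(diff_sq / (L - lag))
    return features


toy = [[0.0] * 20 for _ in range(5)]
print(len(pse_pssm(toy, max_lag=2)))  # 60
```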
Question: What are neighbor mutation conservation scores?
Answer:
Neighbor mutation conservation scores describe the evolutionary conservation of the mutation site and nearby residues. SAAFEC-SEQ considers residues surrounding the mutation site to capture how conserved the local sequence environment is.
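One simple way to summarize how conserved the local environment is: average per-residue conservation scores over a window around the site. This sketch assumes the per-residue scores are already available (how they are computed, for example from a PSSM or an alignment, is not specified here), uses 1-based positions, and clips the window at the sequence ends; the helper name and window half-width are hypothetical.

```python
from typing import Sequence


def neighbor_conservation(scores: Sequence[float], position: int, k: int = 2) -> float:
    """Mean conservation over positions position-k .. position+k (1-based), clipped at the ends."""
    if not 1 <= position <= len(scores):
        raise ValueError("position outside sequence")
    i = position - 1
    lo, hi = max(0, i - k), min(len(scores), i + k + 1)
    window = list(scores[lo:hi])
    return sum(window) / len(window)


print(neighbor_conservation([1.0, 0.5, 0.0, 0.5, 1.0], 3, k=2))  # 0.6
```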
Question: What physicochemical properties are used by SAAFEC-SEQ?
Answer:
The method uses physicochemical properties related to the mutation site, such as volume, hydrophobicity, flexibility, chemical property, size, polarity, hydrogen-bonding behavior, and mutation type.
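Physicochemical features are typically expressed as the change in a residue property from wild type to mutant. The FAQ does not list the scales SAAFEC-SEQ uses, so this sketch uses the Kyte-Doolittle hydropathy index purely as an illustration of a hydrophobicity-change feature; hydropathy_change is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982); used here only as an example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(wt: str, mt: str) -> float:
    """Hydropathy of the mutant residue minus that of the wild-type residue."""
    return KYTE_DOOLITTLE[mt] - KYTE_DOOLITTLE[wt]


print(round(hydropathy_change("L", "P"), 2))  # -5.4
```

A leucine-to-proline substitution gives a large negative value, meaning the mutant residue is much less hydrophobic on this scale.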
Question: What dataset was used to train SAAFEC-SEQ?
Answer:
SAAFEC-SEQ was trained using the S2648 dataset, which contains 2648 experimentally measured single-point missense mutations from 131 proteins. These data were collected from the ProTherm database.
Question: How was SAAFEC-SEQ tested and validated?
Answer:
SAAFEC-SEQ was evaluated using repeated five-fold cross-validation on the S2648 dataset and was further tested on independent datasets, including S350, S276, p53, PTEN, TPMT, and the CAGI5-related datasets.
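Repeated five-fold cross-validation shuffles the data, splits it into five folds, holds out each fold once for testing, and repeats the whole procedure several times. A minimal sketch of generating such index splits for the 2648 mutations; the number of repeats and the random seed are hypothetical, and whether the original study split by individual mutation or grouped mutations by protein is not stated here.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n: int, k: int = 5, repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for `repeats` rounds of shuffled k-fold CV."""
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(repeats):
        rng.shuffle(indices)  # new random fold assignment each round
        folds = [indices[f::k] for f in range(k)]  # slices are copies
        for f in range(k):
            test_set = set(folds[f])
            train = [i for i in range(n) if i not in test_set]
            yield train, sorted(test_set)


splits = list(repeated_kfold(2648, k=5, repeats=3, seed=1))
print(len(splits))  # 15
```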
Question: How accurate is SAAFEC-SEQ?
Answer:
In the SAAFEC-SEQ study, the method achieved a Pearson correlation coefficient of approximately 0.75 and a root-mean-square error (RMSE) of approximately 0.95 kcal/mol on the S2648 dataset using repeated five-fold cross-validation. It also performed comparably to, or better than, several other sequence-based methods on independent benchmark datasets.
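Pearson correlation and squared-error metrics can be computed directly from paired experimental and predicted ΔΔG values. The mean squared error (MSE) carries units of (kcal/mol)², while its square root (RMSE) is in kcal/mol. A plain-Python sketch; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_and_errors(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (Pearson r, MSE, RMSE) for paired experimental and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("Pearson r is undefined for constant input")
    r = cov / math.sqrt(var_t * var_p)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n  # (kcal/mol)^2
    return r, mse, math.sqrt(mse)  # RMSE in kcal/mol


print(pearson_and_errors([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # (1.0, 1.0, 1.0)
```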
Question: Why are many mutations predicted to be destabilizing?
Answer:
Natural protein sequences have generally evolved to fold into stable functional structures. Because of this, many random amino-acid substitutions are expected to make folding less favorable. Experimental protein-stability datasets, including ProTherm-derived datasets, also contain more destabilizing than stabilizing mutations.
Question: Can SAAFEC-SEQ be used for genome-scale studies?
Answer:
Yes. Since SAAFEC-SEQ requires only protein sequence information and does not require a 3D structure, it is suitable for large-scale or genome-scale studies of missense variants.
Question: What should I check if my mutation is rejected?
Answer:
Please verify that:
- The submitted protein sequence is correct.
- The mutation position corresponds to the submitted sequence numbering.
- The wild-type amino acid in the mutation input matches the amino acid at that sequence position.
- The mutation is a valid single amino-acid substitution.
- The FASTA and mutation-list formats are correct.
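These checks can be run locally before submission. A minimal sketch, assuming 1-based positions and the 20 standard one-letter amino-acid codes; check_mutation is a hypothetical helper, and treating an identical wild-type and mutant residue as invalid is an assumption rather than a documented server rule. Stop codons ("*") and multi-letter entries fail the residue check, consistent with single-point substitutions being the only supported input.

```python
from typing import Optional

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def check_mutation(seq: str, wt: str, position: int, mt: str) -> Optional[str]:
    """Return None if the substitution is consistent with seq, otherwise a reason."""
    if wt not in STANDARD_AA or mt not in STANDARD_AA:
        return "wild-type and mutant must each be one standard amino-acid letter"
    if wt == mt:
        return "wild-type and mutant are identical; not a substitution"
    if not 1 <= position <= len(seq):
        return f"position {position} is outside the sequence (length {len(seq)})"
    if seq[position - 1] != wt:
        return f"sequence has {seq[position - 1]} at position {position}, not {wt}"
    return None


print(check_mutation("MKTAYIAK", "T", 3, "A"))  # None
print(check_mutation("MKTAYIAK", "A", 3, "G"))  # sequence has T at position 3, not A
```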
Question: Can SAAFEC-SEQ predict effects of insertions, deletions, or multiple simultaneous mutations?
Answer:
No. SAAFEC-SEQ is intended for single-point missense mutations. Insertions, deletions, stop-gain mutations, frameshifts, and combined multi-residue mutations are outside the intended scope of the method.
Question: Can SAAFEC-SEQ predict protein function directly?
Answer:
No. SAAFEC-SEQ predicts the effect of a mutation on protein thermodynamic stability. Although stability changes can influence protein function and disease mechanisms, the predicted ΔΔG should not be interpreted as a direct measurement of protein activity, binding, expression, or pathogenicity.
Question: What should I do if I encounter a problem or discover a bug?
Answer:
Please contact us at delphi@g.clemson.edu. To help us reproduce the issue, please include the input sequence, mutation information, job ID, error message, and, if available, a screenshot.