Molecular Bioinformatics Center
  National Chiao Tung University  
(PS)2-v2: Protein Structure Prediction Server
 
Documentation for (PS)2
 
Overview

    (PS)2 is an automatic homology modeling server. The method uses a new substitution matrix, S2A2, that combines sequence and secondary structure information, both to detect homologous proteins with remote similarity and to build the target-template alignment. The final three-dimensional structure is built with the modeling package MODELLER. Once a predicted model has been generated, the programs ProQ and ProQres evaluate its quality based on the LGscore and MaxSub scores. Finally, the predicted model is displayed with AstexViewer and automatically sent to the user.



Figure 1. Overview of the (PS)2 server using the protein sequence of telomere replication protein Est3 in Saccharomyces cerevisiae as the query. (A) Input format of the (PS)2 server. (B) Search results for a query protein, comprising the target name, sequence, predicted secondary structure, a graph of the aligned regions and the hit list of templates for the query. (C) The selected template, target-template alignment and predicted structure of Est3. (D) Visualization of the predicted structure of Est3. (E) The model-quality evaluation.


Template(s) selection

  • Option "Automatic": The server automatically selects the modeling template(s).
  • Option "Manual": Users select the modeling template(s) themselves.
  • Option "Use this template": Users specify a particular PDB entry as the template.


    S2A2 substitution matrix

        S2A2 is a 60x60 substitution matrix based on the secondary structure propensities of the 20 amino acids. It is an effective substitution matrix for detecting remote homologs and for target-template alignment.


    Figure 2. The S2A2 substitution matrix.
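
To make the 60x60 shape concrete, here is a minimal sketch of how a matrix of that size could be indexed. It assumes that each of the 20 amino acids is paired with one of three secondary-structure states (helix H, strand E, coil C), which gives 20 x 3 = 60 symbols. The symbol ordering and the name s2a2_index are illustrative assumptions, not the server's actual layout.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_STATES = "HEC"                     # assumed states: helix, strand, coil


def s2a2_index(aa: str, ss: str) -> int:
    """Map an (amino acid, secondary structure) pair to an index in 0..59.

    The ordering (all helix symbols first, then strand, then coil) is a
    hypothetical choice made only for this sketch.
    """
    return SS_STATES.index(ss) * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)


# Number of distinct symbols, i.e. rows (and columns) of the matrix.
n_symbols = len(AMINO_ACIDS) * len(SS_STATES)
```

With such an index, the substitution score of two residues would be looked up as matrix[s2a2_index(aa1, ss1)][s2a2_index(aa2, ss2)].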


    MODELLER

        MODELLER is used for homology or comparative modeling of protein three-dimensional structures. The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms.

  • Sali A & Blundell TL: Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 1993, 234: 779-815.


    ProQ

        ProQ was proposed by Wallner. It is a neural-network-based predictor that uses a number of structural features to predict the quality of a protein model. ProQ is optimized to find correct models, in contrast to other methods, which are optimized to find native structures. Two quality measures are predicted: LGscore and MaxSub.
    Different ranges of quality: 
    Correct           Good              Very good
    LGscore > 1.5     LGscore > 3       LGscore > 5
    MaxSub  > 0.1     MaxSub  > 0.5     MaxSub  > 0.8
  • Wallner B & Elofsson A: Can correct protein models be identified? Protein Sci. 2003, 12: 1073-1086.
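
The quality ranges in the table above can be applied mechanically. The sketch below labels each measure separately, because this page does not say how the two scores should be combined. The function names and the label used for scores under the lowest threshold are illustrative assumptions.

```python
# Thresholds copied from the table above: (correct, good, very good).
LGSCORE_CUTOFFS = (1.5, 3.0, 5.0)
MAXSUB_CUTOFFS = (0.1, 0.5, 0.8)


def quality_label(value: float, cutoffs: tuple) -> str:
    """Return the highest quality range whose threshold the value exceeds."""
    correct, good, very_good = cutoffs
    if value > very_good:
        return "very good"
    if value > good:
        return "good"
    if value > correct:
        return "correct"
    return "below correct threshold"  # label chosen for this sketch only


def proq_quality(lgscore: float, maxsub: float) -> dict:
    """Label the two ProQ measures independently."""
    return {
        "LGscore": quality_label(lgscore, LGSCORE_CUTOFFS),
        "MaxSub": quality_label(maxsub, MAXSUB_CUTOFFS),
    }
```

For example, proq_quality(4.2, 0.3) labels the LGscore as "good" and the MaxSub score as "correct".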


    ProQres

        ProQres was proposed by Wallner. It is a neural-network-based predictor that uses a number of structural features to predict the quality of different parts of a protein model. The predicted quality ranges from 0 to 1, where 1 corresponds to a perfect prediction. The predicted score for each residue is the S-score = 1/(1+(rmsd/5)^2). The sum of this score is used in MaxSub, LGscore and TM-score.

  • Wallner B & Elofsson A: Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci. 2006, 15: 900-913.
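
As a small worked example of the S-score formula given for ProQres, the sketch below converts per-residue deviations (in angstroms) into scores between 0 and 1. The function names and the sample deviations are hypothetical; only the formula itself comes from the text.

```python
def s_score(rmsd: float, d0: float = 5.0) -> float:
    """S-score = 1 / (1 + (rmsd / d0)^2), with d0 = 5 as stated in the text."""
    return 1.0 / (1.0 + (rmsd / d0) ** 2)


def summed_s_score(deviations: list) -> float:
    """Sum of per-residue S-scores over a model."""
    return sum(s_score(d) for d in deviations)


# Hypothetical per-residue deviations for a four-residue fragment.
example_deviations = [0.0, 5.0, 10.0, 2.5]
example_scores = [s_score(d) for d in example_deviations]
```

A residue with zero deviation scores 1, a deviation of 5 scores 0.5, and the score decays toward 0 as the deviation grows.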


    Glossary

    SW-score
        SW-score is the Smith-Waterman score. It is a raw alignment score, calculated as the sum of the substitution scores (from the S2A2 matrix) and the gap scores.
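
The idea of a raw local-alignment score can be sketched with a compact Smith-Waterman implementation. This is not the server's code: it assumes a simple linear gap penalty and, in place of S2A2, a toy match/mismatch scoring function. Both are placeholders chosen only for illustration.

```python
def smith_waterman_score(a, b, score, gap=-4):
    """Best local alignment score of sequences a and b.

    `score(x, y)` returns the substitution score of two symbols and `gap`
    is a linear per-position gap penalty (an assumption of this sketch).
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(x, y),  # align x with y
                prev[j] + gap,              # gap in b
                cur[j - 1] + gap,           # gap in a
            )
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def toy_score(x, y):
    """Placeholder substitution score standing in for an S2A2 lookup."""
    return 2 if x == y else -1
```

Replacing toy_score with a lookup into a real substitution matrix would give a score of the same kind as the SW-score described above, up to the choice of gap model.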

    Bit-score
        The bit-score is derived from the raw alignment score S (SW-score) by taking into account the statistical properties of the scoring system used. Because bit-scores are normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.

    E-value
        Expectation value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E-value, the more significant the score.
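
One common way to relate the raw score, the bit-score and the E-value is the Karlin-Altschul form used by BLAST: S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'), where m and n are the query and database lengths. The sketch below assumes that form; this page does not state the server's own statistical parameters, so lambda, K and the lengths used in any call are hypothetical.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)
```

Each additional bit halves the E-value, which is why bit-scores from searches with different scoring systems can be compared directly.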

    GDT_TS score
        The Global Distance Test Total Score (GDT_TS) of Cα atoms is used to assess the correctness of the predicted model. GDT_TS is commonly used in modeling studies and in the CASP community. GDT_TS is defined as

                GDT_TS = (GDT_1 + GDT_2 + GDT_4 + GDT_8) / (4N)

    where N is the total number of residues of the target; GDT_d is the number of aligned residues whose Cα-atom distance between the native structure and the predicted model is less than d Å after superposition of the two structures; and d is 1, 2, 4, and 8 Å. The score is reported as a percentage.

  • Zemla A: LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003, 31: 3370-3374.
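
The GDT_TS definition can be turned into a short calculation. The sketch below assumes that the two structures have already been superposed once and that one Cα-Cα distance per target residue is available (with float("inf") for residues that are not aligned). Dedicated tools such as LGA search for the best superposition at each cutoff, which this simplified version does not attempt. The function name is an illustrative choice.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS as a percentage, from per-residue Calpha distances in angstroms.

    `distances` must contain one entry per target residue (N entries in
    total), using float("inf") for residues missing from the alignment.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("at least one residue is required")
    # GDT_d: number of residues closer than d angstroms, summed over all d.
    total = sum(sum(1 for dist in distances if dist < d) for d in cutoffs)
    return 100.0 * total / (len(cutoffs) * n)


# Hypothetical distances for a five-residue target.
example = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
```

In this example the counts are GDT_1 = 1, GDT_2 = 2, GDT_4 = 3 and GDT_8 = 4, so the score is 100 * 10 / 20 = 50%.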

    Model reliability
        Figure 3 shows the correlation between E-values and GDT_TS scores for 121 targets in CASP8; the Pearson correlation coefficient is 0.65. According to the GDT_TS scores, our server often yields reliable predicted structures (i.e. GDT_TS score >= 60%) if the E-value is <= 10^-2.
                
    Figure 3. The correlation between E-values and GDT_TS scores for 121 targets in CASP8. Our server often yields reliable predicted structures if the E-value is less than 10^-2.
    Chen CC, Hwang JK, Yang JM: (PS)2-v2: template-based protein structure prediction server. BMC Bioinformatics 2009, 10:366