Table 3 Ancestral sequence reconstruction accuracy by different programs

From: Reconstruction of ancestral protein sequences and its applications

| Root Seq. | Tree | Leaf Node Num. | ANCESCON α ML | ANCESCON α AB | ANCESCON -α | PAML +α | PAML -α | PHYLIP $ +L +α | PHYLIP $ -L +α | PHYLIP $ +L -α | PHYLIP $ -L -α | PAUP* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1em2 | pii1 | 25 | 0.45 | 0.32 | 0.35 | 0.41 | 0.37 | 0.29 | 0.27 | 0.21 | 0.29 | 0.26 |
| 1g9o | pii1 | 25 | 0.56 | 0.46 | 0.47 | 0.53 | 0.53 | 0.51 | 0.54 | 0.40 | 0.51 | 0.47 |
| 1rgg | pii1 | 25 | 0.60 | 0.42 | 0.47 | 0.60 | 0.62 | 0.47 | 0.58 | 0.32 | 0.56 | 0.47 |
| 1sgt | pii1 | 25 | 0.38 | 0.34 | 0.33 | 0.33 | 0.32 | 0.32 | 0.33 | 0.27 | 0.33 | 0.32 |
| 1zm2 | pii1 | 25 | 0.33 | 0.29 | 0.30 | 0.28 | 0.25 | 0.21 | 0.25 | 0.21 | 0.27 | 0.16 |
| 2a8v | pii1 | 25 | 0.62 | 0.45 | 0.42 | 0.56 | 0.55 | 0.44 | 0.46 | 0.28 | 0.50 | 0.36 |
| 2ctb | pii1 | 25 | 0.53 | 0.40 | 0.39 | 0.41 | 0.38 | 0.24 | 0.24 | 0.21 | 0.29 | 0.22 |
| Average accuracy | | | 0.496 | 0.383 | 0.390 | 0.446 | 0.431 | 0.354 | 0.381 | 0.271 | 0.393 | 0.323 |
| 2ctb | gef | 27 | 0.54 | 0.37 | 0.38 | 0.35 | 0.35 | 0.29 | 0.17 | 0.24 | 0.22 | 0.22 |
| 2ctb | LacI | 54 | 0.66 | 0.64 | 0.57 | 0.44 | 0.37 | 0.49 | 0.35 | 0.42 | 0.33 | 0.34 |
| 2ctb | pdz | 39 | 0.54 | 0.41 | 0.42 | 0.44 | 0.39 | 0.22 | 0.34 | 0.18 | 0.32 | 0.22 |
| 2ctb | ph | 30 | 0.79 | 0.74 | 0.75 | 0.53 | 0.55 | 0.45 | 0.25 | 0.43 | 0.37 | 0.32 |
| 2ctb | pii1 | 25 | 0.53 | 0.40 | 0.39 | 0.41 | 0.38 | 0.24 | 0.24 | 0.21 | 0.29 | 0.22 |
| 2ctb | ptb | 29 | 0.58 | 0.39 | 0.43 | 0.39 | 0.38 | 0.29 | 0.23 | 0.26 | 0.24 | 0.23 |
| 2ctb | sh2 | 34 | 0.61 | 0.42 | 0.40 | 0.43 | 0.40 | 0.30 | 0.22 | 0.20 | 0.27 | 0.22 |
| 2ctb | sh3 | 43 | 0.83 | 0.82 | 0.80 | 0.62 | 0.55 | 0.69 | 0.45 | 0.66 | 0.46 | 0.54 |
| 2ctb | GST | 140 | 0.76 | 0.73 | 0.73 | @ | @ | # | # | 0.47 | 0.38 | 0.33 |
| Average accuracy & | | | 0.635 | 0.524 | 0.518 | 0.451 | 0.421 | 0.371 | 0.281 | 0.325 | 0.313 | 0.289 |
| 1em2 | pdz | 39 | 0.45 | 0.35 | 0.36 | 0.44 | 0.44 | 0.29 | 0.43 | 0.23 | 0.40 | 0.24 |
| 1g9o | pii1 | 25 | 0.56 | 0.46 | 0.47 | 0.53 | 0.53 | 0.51 | 0.54 | 0.40 | 0.51 | 0.47 |
| 1rgg | sh2 | 34 | 0.64 | 0.48 | 0.46 | 0.61 | 0.61 | 0.56 | 0.59 | 0.34 | 0.60 | 0.41 |
| 1sgt | gef | 27 | 0.49 | 0.39 | 0.40 | 0.48 | 0.44 | 0.42 | 0.44 | 0.36 | 0.45 | 0.41 |
| 1zm2 | ptb | 29 | 0.66 | 0.47 | 0.48 | 0.57 | 0.57 | 0.53 | 0.51 | 0.32 | 0.52 | 0.41 |
| 2a8v | ph | 30 | 0.81 | 0.78 | 0.81 | 0.71 | 0.74 | 0.60 | 0.61 | 0.50 | 0.65 | 0.50 |
| 2ctb | LacI | 54 | 0.66 | 0.64 | 0.57 | 0.44 | 0.37 | 0.49 | 0.35 | 0.42 | 0.33 | 0.34 |
| Average accuracy | | | 0.610 | 0.510 | 0.507 | 0.540 | 0.529 | 0.486 | 0.496 | 0.367 | 0.494 | 0.397 |
| Probability Δ | | | | 0.0026 | 0.0023 | 0.0248 | 0.0328 | 0.0007 | 0.0168 | 0.0001 | 0.0143 | 0.0005 |
  1. All root sequences are taken from the PDB database; the names listed in the table are PDB IDs.
  2. Tree topologies for gef (guanine nucleotide exchange factor), LacI (PurR/LacI family of bacterial transcription factors), pdz, ph, pii1 (a signal transduction protein), ptb, sh2, sh3 and GST (glutathione S-transferase) are inferred from multiple sequence alignments chosen from the Pfam database (version 7.3).
  3. All tree topologies are generated from real alignments and the distances are rescaled in order to make the trees comparable.
  4. Each value in this table is the accuracy of reconstruction, i.e. the fraction of correctly reconstructed sites in the root sequence. The best reconstruction accuracy in each test is shown in bold.
  5. α ML means that the site-specific rate factors were estimated by the maximum likelihood method.
  6. α AB means that the site-specific rate factors were estimated by our empirical equation based on the given alignment (for details see Methods).
  7. -α means that the rate factors were not considered in reconstruction.
  8. +α means that the rate factors were considered in reconstruction.
  9. +L means that branch lengths of the input tree were used in reconstruction, while -L means that branch lengths were estimated by the reconstruction program itself.
  10. @: tree topology for GST had 140 leaf nodes, which were too many for PAML to process.
  11. $: rate factors estimated by PAML were used by PHYLIP in ancestral sequence reconstruction.
  12. #: tree topology for GST had 140 leaf nodes, which were too many for PAML to estimate rate factors for GST.
  13. &: GST is excluded from the calculation of the average.
  14. Δ: a paired t-test [40] was used to estimate the one-tailed probability for the comparison between ANCESCON and the other three reconstruction methods.
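
The two quantities defined in notes 4 and 14 can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, the toy sequences are invented for illustration, and the sketch assumes that the Probability row is computed over the seven tests in the last block of the table, which is inferred from the row's position and is not stated in the notes. The one-tailed probability is obtained by numerically integrating the Student t density with the standard library only; a statistics package would normally be used instead.

```python
import math


def reconstruction_accuracy(true_root, reconstructed):
    """Fraction of sites at which the reconstructed root matches the true root."""
    if len(true_root) != len(reconstructed):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(true_root, reconstructed) if a == b)
    return matches / len(true_root)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_one_tailed(a, b, steps=2000):
    """Paired t statistic and one-tailed p-value for H1: mean(a) > mean(b)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    df = n - 1
    # Simpson's rule (steps must be even) for the area under the density on [0, |t|].
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    p = 0.5 - area if t > 0 else 0.5 + area
    return t, p


# Toy sequences (invented): 9 of 10 sites agree.
toy_accuracy = reconstruction_accuracy("ACDEFGHIKL", "ACDEFGHIKV")

# ANCESCON alpha-ML vs. alpha-AB accuracies, last seven tests of the table (assumed grouping).
ancescon_ml = [0.45, 0.56, 0.64, 0.49, 0.66, 0.81, 0.66]
ancescon_ab = [0.35, 0.46, 0.48, 0.39, 0.47, 0.78, 0.64]
t_stat, p_value = paired_t_one_tailed(ancescon_ml, ancescon_ab)
print(f"accuracy={toy_accuracy:.2f}  t={t_stat:.3f}  one-tailed p={p_value:.4f}")
```

Under the stated grouping assumption, the printed probability can be compared with the α AB entry of the Probability row.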