
Table 3 Ancestral sequence reconstruction accuracy by different programs

From: Reconstruction of ancestral protein sequences and its applications

| Root Seq. | Tree | Leaf Node Num. | ANCESCON α ML | ANCESCON α AB | ANCESCON -α | PAML +α | PAML -α | PHYLIP $ +L +α | PHYLIP $ -L +α | PHYLIP $ +L -α | PHYLIP $ -L -α | PAUP* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1em2 | pii1 | 25 | 0.45 | 0.32 | 0.35 | 0.41 | 0.37 | 0.29 | 0.27 | 0.21 | 0.29 | 0.26 |
| 1g9o | pii1 | 25 | 0.56 | 0.46 | 0.47 | 0.53 | 0.53 | 0.51 | 0.54 | 0.40 | 0.51 | 0.47 |
| 1rgg | pii1 | 25 | 0.60 | 0.42 | 0.47 | 0.60 | 0.62 | 0.47 | 0.58 | 0.32 | 0.56 | 0.47 |
| 1sgt | pii1 | 25 | 0.38 | 0.34 | 0.33 | 0.33 | 0.32 | 0.32 | 0.33 | 0.27 | 0.33 | 0.32 |
| 1zm2 | pii1 | 25 | 0.33 | 0.29 | 0.30 | 0.28 | 0.25 | 0.21 | 0.25 | 0.21 | 0.27 | 0.16 |
| 2a8v | pii1 | 25 | 0.62 | 0.45 | 0.42 | 0.56 | 0.55 | 0.44 | 0.46 | 0.28 | 0.50 | 0.36 |
| 2ctb | pii1 | 25 | 0.53 | 0.40 | 0.39 | 0.41 | 0.38 | 0.24 | 0.24 | 0.21 | 0.29 | 0.22 |
| Average accuracy | | | 0.496 | 0.383 | 0.390 | 0.446 | 0.431 | 0.354 | 0.381 | 0.271 | 0.393 | 0.323 |
| 2ctb | gef | 27 | 0.54 | 0.37 | 0.38 | 0.35 | 0.35 | 0.29 | 0.17 | 0.24 | 0.22 | 0.22 |
| 2ctb | LacI | 54 | 0.66 | 0.64 | 0.57 | 0.44 | 0.37 | 0.49 | 0.35 | 0.42 | 0.33 | 0.34 |
| 2ctb | pdz | 39 | 0.54 | 0.41 | 0.42 | 0.44 | 0.39 | 0.22 | 0.34 | 0.18 | 0.32 | 0.22 |
| 2ctb | ph | 30 | 0.79 | 0.74 | 0.75 | 0.53 | 0.55 | 0.45 | 0.25 | 0.43 | 0.37 | 0.32 |
| 2ctb | pii1 | 25 | 0.53 | 0.40 | 0.39 | 0.41 | 0.38 | 0.24 | 0.24 | 0.21 | 0.29 | 0.22 |
| 2ctb | ptb | 29 | 0.58 | 0.39 | 0.43 | 0.39 | 0.38 | 0.29 | 0.23 | 0.26 | 0.24 | 0.23 |
| 2ctb | sh2 | 34 | 0.61 | 0.42 | 0.40 | 0.43 | 0.40 | 0.30 | 0.22 | 0.20 | 0.27 | 0.22 |
| 2ctb | sh3 | 43 | 0.83 | 0.82 | 0.80 | 0.62 | 0.55 | 0.69 | 0.45 | 0.66 | 0.46 | 0.54 |
| 2ctb | GST | 140 | 0.76 | 0.73 | 0.73 | @ | @ | # | # | 0.47 | 0.38 | 0.33 |
| Average accuracy& | | | 0.635 | 0.524 | 0.518 | 0.451 | 0.421 | 0.371 | 0.281 | 0.325 | 0.313 | 0.289 |
| 1em2 | pdz | 39 | 0.45 | 0.35 | 0.36 | 0.44 | 0.44 | 0.29 | 0.43 | 0.23 | 0.40 | 0.24 |
| 1g9o | pii1 | 25 | 0.56 | 0.46 | 0.47 | 0.53 | 0.53 | 0.51 | 0.54 | 0.40 | 0.51 | 0.47 |
| 1rgg | sh2 | 34 | 0.64 | 0.48 | 0.46 | 0.61 | 0.61 | 0.56 | 0.59 | 0.34 | 0.60 | 0.41 |
| 1sgt | gef | 27 | 0.49 | 0.39 | 0.40 | 0.48 | 0.44 | 0.42 | 0.44 | 0.36 | 0.45 | 0.41 |
| 1zm2 | ptb | 29 | 0.66 | 0.47 | 0.48 | 0.57 | 0.57 | 0.53 | 0.51 | 0.32 | 0.52 | 0.41 |
| 2a8v | ph | 30 | 0.81 | 0.78 | 0.81 | 0.71 | 0.74 | 0.60 | 0.61 | 0.50 | 0.65 | 0.50 |
| 2ctb | LacI | 54 | 0.66 | 0.64 | 0.57 | 0.44 | 0.37 | 0.49 | 0.35 | 0.42 | 0.33 | 0.34 |
| Average accuracy | | | 0.610 | 0.510 | 0.507 | 0.540 | 0.529 | 0.486 | 0.496 | 0.367 | 0.494 | 0.397 |
| ProbabilityΔ | | | | 0.0026 | 0.0023 | 0.0248 | 0.0328 | 0.0007 | 0.0168 | 0.0001 | 0.0143 | 0.0005 |

  1. All root sequences are taken from the PDB database, and the names listed in the table are PDB IDs.
  2. Tree topologies for gef (guanine nucleotide exchange factor), LacI (PurR/LacI family of bacterial transcription factors), pdz, ph, pii1 (a signal transduction protein), ptb, sh2, sh3 and GST (glutathione S-transferase) are inferred from multiple sequence alignments chosen from the Pfam database (version 7.3).
  3. All tree topologies are generated from real alignments and the distances are rescaled in order to make the trees comparable.
  4. Each value in this table is the reconstruction accuracy, i.e., the fraction of correctly reconstructed sites for the root sequence. The best reconstruction accuracy in each test is shown in bold.
  5. α ML means that the site-specific rate factors were estimated by the maximum likelihood method.
  6. α AB means that the site-specific rate factors were estimated by our empirical equation based on the given alignment (for details see Methods).
  7. -α means that the rate factors were not considered in reconstruction.
  8. +α means that the rate factors were considered in reconstruction.
  9. +L means that branch lengths of the input tree were used in reconstruction, while -L means that branch lengths were estimated by the reconstruction program itself.
  10. @: the tree topology for GST had 140 leaf nodes, too many for PAML to run.
  11. $: rate factors estimated by PAML were used by PHYLIP in ancestral sequence reconstruction.
  12. #: the tree topology for GST had 140 leaf nodes, too many for PAML to estimate rate factors.
  13. &: GST is excluded from the calculation of the average.
  14. Δ: the paired t-test method [40] was used to estimate the one-tailed probability between ANCESCON and the other three reconstruction methods.
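
Notes 4 and 14 describe two calculations that can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors' code. The function names `reconstruction_accuracy` and `paired_t_one_tailed` are hypothetical, and the choice to pair only the seven pii1-tree rows is an assumption: the table does not state which rows entered the reported Probability row, so the result should not be expected to reproduce those values.

```python
import math


def reconstruction_accuracy(true_root: str, inferred_root: str) -> float:
    """Fraction of sites at which the inferred root matches the true root (note 4)."""
    if not true_root or len(true_root) != len(inferred_root):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(true_root, inferred_root))
    return matches / len(true_root)


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_one_tailed(a, b, steps=2000):
    """Paired t-test of H1: mean(a - b) > 0. Returns (t, df, one-tailed p)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all paired differences are identical")
    t = mean / math.sqrt(var / n)
    df = n - 1
    # Simpson's rule for the integral of the density from 0 to |t| (steps is even).
    h = abs(t) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    upper_tail = 0.5 - area  # P(T > |t|)
    p = upper_tail if t > 0 else 1 - upper_tail
    return t, df, p


# Accuracies copied from the first seven rows of the table (pii1 tree, 25 leaves).
ancescon_alpha_ml = [0.45, 0.56, 0.60, 0.38, 0.33, 0.62, 0.53]
paup = [0.26, 0.47, 0.47, 0.32, 0.16, 0.36, 0.22]

t_stat, dof, p_value = paired_t_one_tailed(ancescon_alpha_ml, paup)
print(f"t = {t_stat:.3f}, df = {dof}, one-tailed p = {p_value:.4f}")
```

The tail probability is integrated numerically so that the sketch needs only the standard library; with SciPy available, `scipy.stats.ttest_rel` offers the same test.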