
Table 1 Validation of our methodology on 10 deep phylogeny problems. Organism abbreviations are shown in Table 3, and the accepted clades are shown with parentheses. The column labeled "# Clades" gives the number of accepted clades to be found. The column labeled "# Genes" gives the number of genes used. The Trees column gives the number of gene trees that find all the accepted clades; results for representative proteins are on the left, and results for randomly picked ubiquitous proteins are on the right. For each gene, the most conserved 300-residue sequence was used, and randomly picked proteins were matched to the representative proteins in overall conservation level. Consensus gives the number of accepted clades found over all gene trees; an asterisk indicates that the consensus tree (computed using CONSENSE from the PHYLIP package [52]) finds all the accepted clades. Concatenation gives the number of clades found in 100 bootstraps from a concatenated alignment of all genes; an asterisk here indicates the success of the consensus over bootstrap trees. In problem 6, for example, there are 5 accepted clades, 8 single-gene trees, and 100 bootstrap trees, so a perfect "Consensus" score would be 40, and a perfect "Concatenation" score would be 500.

From: Automatic selection of representative proteins for bacterial phylogeny

| Organisms | # Clades | # Genes | Trees (representative) | Trees (random) | Consensus (representative) | Consensus (random) | Concatenation (representative) | Concatenation (random) |
|---|---|---|---|---|---|---|---|---|
| 1. (Borr, Trep) (Chlor, Bac) (Campy, Bruc) | 3 | 8 | 8* | 2 | 24* | 12 | 299 | 112 |
| 2. (Neiss, Rals) (Xyl, Haem) (Rick, Meso) | 3 | 8 | 5* | 3 | 21* | 19 | 247 | 207 |
| 3. (Clost, Lacto) (Mycob, Bifid) (Campy, Rick) | 3 | 8 | 6* | 4* | 18* | 18* | 294 | 283 |
| 4. (Buch, Rick) (Mycob, Bifid) (Staph, Mycop) | 3 | 8 | 2 | 1* | 13 | 15* | 235 | 297 |
| 5. (Urea, Mycop) (Strep, Lacto) (Staph, List) | 3 | 8 | 8* | 5* | 24* | 21* | 300 | 300 |
| 6. (Syn, Pro) (Rick, Buch) (Chlor, Bac) (Staph, Strep) (Borr, Trep) | 5 | 8 | 7* | 2* | 37* | 26* | 481 | 472 |
| 7. ((Rick, Bruc) ((Vib, Esch, Haem), Neiss) (Heli, Campy)) (Syn, Pro) (Clost, Staph) (Borr, Trep) | 8 | 17 | 3* | 3 | 129* | 108 | 762 | 741 |
| 8. ((Caul, Meso), Esch) (Chlor, Bac) (Pro, Nos) | 4 | 8 | 7* | 3* | 30* | 27* | 400 | 398 |
| 9. ((Geo, Desulf), (Wol, Campy), (Caul, Rick)) (Borr, Lep) (Chlor, Bac) | 6 | 8 | 1 | 2 | 31* | 32 | 554 | 512 |
| 10. (Chlor, Bac) (Mycop, Strep, Clost) (Mycob, Bifid) | 3 | 8 | 1* | 2* | 15* | 13 | 255 | 245 |
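
The caption's arithmetic for perfect scores can be reproduced with a short sketch. It assumes, as the problem 6 example implies, that a perfect "Consensus" score is the number of accepted clades times the number of genes, and a perfect "Concatenation" score is the number of accepted clades times the 100 bootstrap trees. The function name `perfect_scores` is hypothetical and is not taken from the paper.

```python
# Number of bootstrap trees, as stated in the table caption.
BOOTSTRAPS = 100


def perfect_scores(n_clades, n_genes, n_bootstraps=BOOTSTRAPS):
    """Return (perfect Consensus score, perfect Concatenation score).

    Assumption: every gene tree (or bootstrap tree) can recover
    each accepted clade at most once.
    """
    consensus_max = n_clades * n_genes
    concatenation_max = n_clades * n_bootstraps
    return consensus_max, concatenation_max


# Problem 6 from the caption: 5 accepted clades, 8 single-gene trees.
print(perfect_scores(5, 8))  # (40, 500)
```

Under the same assumption, problem 7 (8 clades, 17 genes) has perfect scores of 136 and 800, against which the table's reported values of 129 and 762 for the representative proteins can be compared.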