Table 1 Sequence and cluster data for each taxon

From: Inferring angiosperm phylogeny from EST data with widespread gene duplication

| Taxon (a) | Release (b) | Original TCs (c) | MaxORFs (d) | Clusters (e) | Final TCs (f) |
|---|---|---|---|---|---|
| Arabidopsis thaliana | 12.1 | 28900 | 23737 | 343 | 729 |
| Glycine max | 12.0 | 31928 | 13930 | 538 | 1065 |
| Lotus japonicus | 3.0 | 12485 | 3116 | 365 | 452 |
| Medicago truncatula | 8.0 | 18612 | 12254 | 528 | 852 |
| Oryza sativa | 16.0 | 36381 | 25842 | 199 | 418 |
| Pinus (g) | 6.0 | 23531 | 13949 | 159 | 315 |
| Solanum tuberosum | 10.0 | 21063 | 12625 | 378 | 705 |
| Total | | 172900 | 105453 | 577 | 4536 |

  1. a Taxon as given by TIGR for the EST collection assembled in the Gene Index Database.
  2. b Versions used in this paper, current as of 18 February 2006.
  3. c The 363,971 sequences in the database for these taxa were screened to include only those sequences assembled by TIGR into Tentative Consensus (TC) sequences.
  4. d TCs were trimmed to the largest sense-direction ORF that was at least 500 nt in length; shorter sequences were discarded.
  5. e Number of clusters in which the taxon is represented, after screening for phylogenetic informativeness (at least three taxa and at least four sequences).
  6. f Total number of sequences from each taxon in the final set of clusters.
  7. g TIGR assembled this library from several species of Pinus.
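
The two screening rules in footnotes d and e can be sketched in Python as follows. This is only an illustrative sketch, not the authors' pipeline: the function names are hypothetical, and it assumes that an ORF runs from an ATG to the next in-frame stop codon (stop included in the length) in one of the three forward reading frames. The original analysis may have defined ORFs differently, for example to allow partial ORFs in incomplete transcripts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
MIN_ORF_NT = 500                     # length threshold from footnote d


def longest_sense_orf(seq):
    """Return the longest ATG-to-stop ORF in the three forward frames ('' if none)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]         # close it at the first in-frame stop
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def trim_to_max_orf(seq, min_len=MIN_ORF_NT):
    """Trim a TC to its longest sense-direction ORF; return None if it is too short."""
    orf = longest_sense_orf(seq)
    return orf if len(orf) >= min_len else None


def is_informative(cluster, min_taxa=3, min_seqs=4):
    """cluster is a list of (taxon, sequence_id) pairs (hypothetical layout).

    Keep the cluster only if it has at least min_taxa distinct taxa
    and at least min_seqs sequences, as in footnote e.
    """
    taxa = {taxon for taxon, _ in cluster}
    return len(taxa) >= min_taxa and len(cluster) >= min_seqs
```

Under these assumptions, a TC whose longest forward ORF is shorter than 500 nt is dropped, and a cluster with four sequences from only two taxa is rejected even though it meets the sequence-count threshold.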