
Table 1 Details of the FlowerPower validation dataset.

From: FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function

| Seed ID | Seed length (aa) | Domain architecture (Pfam domain, residue range) | Total GH | Total NH | Total IND |
|---|---|---|---|---|---|
| ARGA_ECOLI | 443 | AA_kinase (26–269); Acetyltransf_1 (338–414) | 26 | 898420 | 3559 |
| BIR5_HUMAN | 142 | BIR (18–88) | 23 | 901954 | 28 |
| BLK_MOUSE | 498 | SH3_1 (54–109); SH2 (117–198); Pkinase (234–486) | 76 | 893470 | 8459 |
| CRKL_MOUSE | 303 | SH2 (14–88); SH3_1 (126–181); SH3_2 (239–294) | 15 | 900640 | 1350 |
| I1BC_HUMAN | 404 | CARD (2–91); Peptidase_C14 (163–401) | 33 | 901717 | 255 |
| MY88_MOUSE | 296 | Death (31–109); TIR (163–292) | 4 | 897880 | 4121 |
| NARL_ECOLI | 216 | Response_reg (7–128); GerE (153–210) | 1757 | 884810 | 15438 |
| PNP_ECOLI | 711 | RNase_PH (12–144); RNase_PH_C (147–211); PNPase (242–320); RNase_PH (323–456); RNase_PH_C (459–529); KH_1 (555–612); S1 (618–690) | 121 | 885592 | 16292 |
| SPOP_HUMAN | 374 | MATH (38–163); BTB (190–297) | 68 | 901237 | 700 |

SwissProt identifiers are shown in the first column. For each seed, the number of global homologs (GH) is the total number of proteins in the SwissPFAM dataset that share the seed's domain structure. Non-homologs (NH) have an obviously different domain structure. Indeterminate sequences (IND) are those whose global homology to the seed, or lack of it, could not be rigorously determined. See Methods for details.
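To make the table's columns concrete, here is a minimal Python sketch, not the FlowerPower implementation, that transcribes the rows, parses each architecture string into ordered (domain, start, end) triples, and checks two properties of the data. The names `parse_architecture`, `same_architecture`, `Domain`, and `ROWS` are hypothetical, and `same_architecture` assumes that "same domain structure" means an identical ordered list of Pfam domain names; the criterion actually used is described in the paper's Methods.

```python
import re
from typing import NamedTuple

# Matches one entry such as "AA_kinase (26–269)" or "TIR(163–292)".
# The table uses an en dash in ranges; a plain hyphen is accepted as well.
DOMAIN_RE = re.compile(r"\s*([A-Za-z0-9_]+)\s*\((\d+)[–-](\d+)\)\s*")


class Domain(NamedTuple):
    name: str   # Pfam domain name
    start: int  # first residue of the domain in the seed
    end: int    # last residue of the domain in the seed


def parse_architecture(text: str) -> list[Domain]:
    """Parse a semicolon-separated architecture string into ordered domains."""
    domains = []
    for part in text.split(";"):
        m = DOMAIN_RE.fullmatch(part)
        if m is None:
            raise ValueError(f"unparseable domain entry: {part!r}")
        domains.append(Domain(m.group(1), int(m.group(2)), int(m.group(3))))
    return domains


def same_architecture(a: list[Domain], b: list[Domain]) -> bool:
    """Hypothetical GH test: identical ordered domain names, coordinates ignored."""
    return [d.name for d in a] == [d.name for d in b]


# (seed, length, architecture, GH, NH, IND) transcribed from Table 1.
ROWS = [
    ("ARGA_ECOLI", 443, "AA_kinase (26–269); Acetyltransf_1 (338–414)", 26, 898420, 3559),
    ("BIR5_HUMAN", 142, "BIR (18–88)", 23, 901954, 28),
    ("BLK_MOUSE", 498, "SH3_1 (54–109); SH2 (117–198); Pkinase (234–486)", 76, 893470, 8459),
    ("CRKL_MOUSE", 303, "SH2 (14–88); SH3_1 (126–181); SH3_2 (239–294)", 15, 900640, 1350),
    ("I1BC_HUMAN", 404, "CARD (2–91); Peptidase_C14 (163–401)", 33, 901717, 255),
    ("MY88_MOUSE", 296, "Death (31–109); TIR (163–292)", 4, 897880, 4121),
    ("NARL_ECOLI", 216, "Response_reg (7–128); GerE (153–210)", 1757, 884810, 15438),
    ("PNP_ECOLI", 711, "RNase_PH (12–144); RNase_PH_C (147–211); PNPase (242–320); "
     "RNase_PH (323–456); RNase_PH_C (459–529); KH_1 (555–612); S1 (618–690)",
     121, 885592, 16292),
    ("SPOP_HUMAN", 374, "MATH (38–163); BTB (190–297)", 68, 901237, 700),
]

if __name__ == "__main__":
    for seed, length, arch, gh, nh, ind in ROWS:
        doms = parse_architecture(arch)
        # Every domain should lie inside the seed sequence.
        assert all(1 <= d.start < d.end <= length for d in doms), seed
        print(f"{seed}: {len(doms)} domains, GH+NH+IND = {gh + nh + ind}")
```

Running it shows that GH + NH + IND equals 902,005 for every seed, which suggests each seed was classified against the same set of 902,005 SwissPFAM sequences; that is an inference from the arithmetic, not a figure stated in the table.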