Identification of postovulatory coat proteins
Protein sequences from Casey et al. [2] were used to search GenBank databases using the tBLASTn algorithm with low-stringency search parameters. The alignment of sequences from Casey et al. [2] with an expressed sequence tag (EST) [GenBank accession EG617409] derived from the reproductive tract of the brushtail possum is shown in Figure 1. The major sequence from Band 5 (14 kDa) closely matched the translated possum EST, while minor sequences from Bands 3 (22 kDa), 4 (17 kDa) and 5 also showed identity. We named this protein uterine secreted microprotein (USM).
The translated sequence of the possum EST was used to identify exons in the tammar wallaby (Macropus eugenii) whole genome shotgun (WGS) database, which revealed an apparent four-exon structure with an open reading frame spanning Exons 2-4 (Figure 2). A second, more divergent homologue was also identified bioinformatically. We refer to these genes respectively as tammar USM1 and USM2. Conserved exons in the opossum genome were identified as a homologue of USM.
Exon 1 of tammar USM1 was determined by 5' RACE and differed from that of the brushtail possum EST. In the opossum genome, Exons 2-4 of the brushtail possum EST map to Chromosome 1 whereas the "first exon" maps to Chromosome 8, immediately upstream of the third exon of another gene, WASH1. To resolve this discrepancy and to characterise fully the genomic locus of tammar USM1, we isolated and sequenced a tammar genomic BAC clone containing the gene. In a single assembled 89.8-kb contig of BAC sequence [GenBank accession JN251945], no sequence matching the "first exon" of the brushtail possum EST was present in the 17.1 kb upstream of Exon 2; however, Exon 1 as identified by 5' RACE was located upstream of Exon 2, as expected. USM2 was located downstream of USM1 in the BAC sequence and in the same orientation. USM2 also contains exons homologous to Exons 1-4 (Figure 3). We conclude that the "first exon" of the brushtail possum EST represents an anomaly or an artefact of cDNA library construction. Downstream of USM2 and in the same orientation as USM1 and USM2 in the BAC sequence, we identified the first 8 exons of ELP3 (Figure 3), which also flanks USM in the opossum genome. This confirmed that opossum USM is orthologous to the tammar USM1/USM2 cluster. However, unlike in the tammar, no duplicate of USM was found at this locus in the opossum.
A signal peptide cleavage site was predicted at the same position in all three species (Figure 2 and additional file 1: Protein_alignment.pdf), strongly indicative of a secretory protein and consistent with a role in the extracellular postovulatory coats. In the brushtail possum, the predicted cleavage site also immediately precedes the major sequence from band 5 (compare Figure 1 and additional file 1: Protein_alignment.pdf), which was obtained by N-terminal sequencing.
In eutherian sequence databases, the highest translated sequence identity with the USM genes was found in orthologues of MSMB and another related gene, MSMP (also called PSMP). USM is not an orthologue of either of these genes, however, as other genes corresponding respectively to orthologues of MSMB and MSMP were identified in both tammar and opossum genomes. Thus USM is a novel mammalian gene that is absent in the eutherian lineage.
The four-exon structure of marsupial USM genes is similar to that of MSMB, including a predicted translation initiation codon in the three 3'-most nucleotides of Exon 1 (Figure 4). USM and MSMB both differ from MSMP, which comprises only three coding exons, homologous to USM/MSMB Exons 2-4. Thus, for ease of comparison, the exons of MSMP are hereafter referred to according to their homology with USM/MSMB exons. The full coding region and splicing structure of tammar USM1 were confirmed by RT-PCR followed by cloning and sequencing (not shown).
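The placement of the initiation codon at the 3' end of Exon 1 can be illustrated with a minimal sketch. The sequences below are toy strings invented for illustration (they are not USM or MSMB sequence), and the function names are hypothetical; the sketch only shows what it means for ATG to occupy the three 3'-most nucleotides of Exon 1, so that the remainder of the reading frame begins in Exon 2.

```python
def start_codon_at_exon_end(exon1: str) -> bool:
    """Return True if the three 3'-most nucleotides of exon 1 are ATG."""
    return exon1.upper().endswith("ATG")


def first_codons(exon1: str, exon2: str, n: int = 3) -> list[str]:
    """Join the final codon of exon 1 to exon 2 and return the first n codons."""
    if not start_codon_at_exon_end(exon1):
        raise ValueError("exon 1 does not end in ATG")
    spliced = (exon1[-3:] + exon2).upper()
    return [spliced[i:i + 3] for i in range(0, 3 * n, 3)]


# Toy sequences, invented for illustration only (assumption, not real data).
toy_exon1 = "gcttcagtccagcATG"   # 5' UTR in lower case, start codon at the 3' end
toy_exon2 = "AAGCTCCTGGCA"       # reading frame continues in exon 2

print(start_codon_at_exon_end(toy_exon1))
print(first_codons(toy_exon1, toy_exon2))
```

With this arrangement, Exon 1 contributes only the initiating methionine to the open reading frame.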
The USM/MSMB/MSMP gene family
To examine the evolution of the MSMB/MSMP/USM gene family, we performed low-stringency tBLASTn searches of GenBank databases and identified numerous homologues in vertebrate genomes as well as in those of lower deuterostomes, including Ciona spp. (Urochordata), Branchiostoma lanceolatum (Cephalochordata) and Strongylocentrotus purpuratus (Echinodermata), and of protostomes, including those recently reported in the phyla Mollusca and Rotifera [24]. (See additional file 2: Sequence_sources.pdf for sources of all sequences used in this study.) Most of the identified genes were previously unreported. Alignment of a large number of translated sequences (not shown) suggested a complex pattern of evolution with rapid sequence changes and gene duplication events. As previously reported [23], MSMP showed the strongest conservation among vertebrates. Because of the large number of amino acid substitutions, the phylogenetic relationship between family members from distantly related species was not readily resolved by standard bootstrapping methods, with the exception of MSMP-like genes (not shown). Nevertheless, three broad sub-families appeared to be represented among vertebrates: MSMP-like, MSMB-like and USM-like (see additional file 3: Tree.pdf).
Conserved synteny among MSMB/MSMP/USM family members
To clarify the relationships among MSMB/MSMP/USM family members, we examined their conservation of synteny with flanking genes. We focussed particularly on MSMB-like and USM-like genes as they showed the most sequence variability. The most informative syntenic groups are summarised in Figure 5.
In the opossum genome, USM flanks ARID5A while 14 tandem copies of MSMB flank FAM21C and ANUBL1. In the chicken, 3 tandem copies of MSMB also flank FAM21C and ANUBL1, which are located near ARID5B (a paralogue of ARID5A) on chromosome 6, whereas no MSMB/USM-like gene is located near ARID5A on chromosome 22. By contrast, USM-like genes lie close to ARID5A in the lizard genome and arid5a in the zebrafish genome. This suggests that the same duplication event that generated ARID5A and ARID5B also generated USM-like and MSMB-like genes, respectively. This duplication event predates the divergence of the teleost fish lineage (which has also undergone its own genome duplication event [25]) and is associated with the generation of other paralogous pairs (ANTXR1/ANTXRL and others not shown) that variably cluster with ARID5A/ARID5B in vertebrate genomes (Figure 5). Both of these paralogous syntenic groups variably contain a homologue of PPYR1, with some lineages (such as lizard and zebrafish) containing a homologue in both syntenic groups.
USM homologues in other vertebrates
The presence of paralogous syntenic clusters conserved throughout vertebrates allowed orthologues of USM to be clearly identified. Among non-mammalian tetrapods, the most similar sequence to USM found was from the genome of the green anole lizard (Anolis carolinensis). Multiple copies of USM-related genes in the lizard flank ARID5A, as do opossum USM and a cluster of three USM-like genes in the zebrafish genome (Figure 5). These genes appear to be orthologous with respect to their origin; we therefore refer to the lizard genes as USMH1 to -7 (USM homologue 1 to 7) and the zebrafish genes as usmh1 to -3 (USM homologue 1 to 3). They may not have equivalent function to USM, however, as the lizard and zebrafish genes have retained a complete Exon 4 encoding the C-terminal domain, in contrast to marsupial USM genes in which the open reading frame encoding this domain is greatly truncated. They also have four coding exons (Figure 4), which is supported by transcript evidence (see additional file 2: Sequence_sources.pdf). Published cDNA sequences from the Habu snake (Trimeresurus flavoviridis) [26] are similar to the lizard USMH genes. Like the lizard and zebrafish USMH genes, the snake genes are more similar to each other than to homologues from other species (not shown), suggesting that they also represent a lineage-specific expansion in copy number. These genes encode small serum proteins, SSP1-5, which appear to have a role in protection against the snake's own venom rather than in reproduction [13, 14]. USMH-like sequences were also found in cDNAs derived from mixed tissues of the channel catfish (Ictalurus punctatus). According to the NCBI UniGene EST expression profiles, zebrafish usmh transcripts are found largely in the reproductive tract and consist mostly of usmh2 and usmh3 transcripts. Although it is possible that USMH proteins in other vertebrates also contribute to postovulatory coats, this appears unlikely due to their apparent additional expression in non-reproductive tract tissues.
Furthermore, no specifically USM-like sequences were identified in the two sequenced avian genomes, chicken (Gallus gallus) and zebra finch (Taeniopygia guttata), in which conservation of postovulatory coat proteins might be expected. Thus it appears that a USMH gene evolved a novel role in the postovulatory coats of a common mammalian ancestor.
An apparent orthologue of USM is also present in the genome of the platypus (Ornithorhynchus anatinus), based on sequence similarity of Exons 1-3 and proximity to an orthologue of Flj1008, which also flanks Arid5a on mouse chromosome 1 (Figure 5). However, platypus Exon 4 could not be identified either manually or using gene prediction software, so it could not be determined whether it encodes the same C-terminal truncation as marsupial USM. Our failure to detect Exon 4 suggests that it probably has a truncated open reading frame and that platypus USM is therefore likely to be functionally equivalent to marsupial USM rather than to other vertebrate USMH genes.
MSMB paralogues in birds and marsupials
Our phylogenetic analysis of avian MSMB homologues revealed three distinct but previously unrecognised paralogous groups. The three chicken paralogues, which we term avian MSMB1, MSMB2 and MSMB3, flank each other on chromosome 6. Thus unlike in New World monkeys [27], in which multiple copies of MSMB appear to have arisen independently, avian MSMB paralogues are apparently conserved. Furthermore, each avian MSMB paralogue shows high conservation in its translated sequence with its respective orthologues. Previously characterised sequences from chicken and ostrich (Struthio camelus) designated as MSMB [15, 28] correspond to MSMB1, while the gene currently annotated as MSMB by the NCBI "Gene" database http://www.ncbi.nlm.nih.gov/gene?term=msmb%20gallus corresponds to MSMB2. A partial transcript of MSMB3 from chicken [GenBank accession DT655693] is annotated as being derived from reproductive tract ("testis, ovary and oviduct"), while a full transcript from duck (Anas platyrhynchos) [GenBank accession HO188240] was derived from a screen for genes expressed in the epithelium of the magnum (part of the reproductive tract) and correlated with high egg hatchability [29].
Avian MSMB1-3 genes differ markedly from each other in their degree of conservation. Translated sequence identities of the MSMB paralogues of zebra finch (order Passeriformes) were compared with their respective orthologues from Anseriformes (duck) and/or Galliformes (chicken, duck and turkey), the latter two orders forming a monophyletic clade [30]. Conservation is notably higher among MSMB3 orthologues (81-82% amino acid identity) compared with MSMB1 (54-60%) and MSMB2 (53-56%) (see additional file 4: avian_MSMBs.pdf for table of sequence identities and similarities). The pattern was similar when comparing within Galliformes (chicken versus turkey): 83% amino acid identity for MSMB1, 89% for MSMB2 and 98% for MSMB3. Mouse Msmb is more similar to avian MSMB2 (32-36% amino acid identity) than either MSMB1 (25-28%) or MSMB3 (23-25%). These data suggest that avian MSMB3 has acquired a novel, specialised role in birds distinct from that of other vertebrate MSMB homologues. Considering the tissue source of the only two known transcripts, this role is likely to be related to reproduction. Ostrich MSMB1 was originally identified in the pituitary gland [15], supportive of a previously proposed role in the pituitary-gonadal axis [17, 19, 20], although this role was later refuted [31, 32].
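For readers unfamiliar with how such identity percentages are derived, the following is a minimal sketch of a pairwise amino acid identity calculation. It assumes two sequences that have already been aligned to equal length with `-` as the gap character, and it assumes that gapped columns are excluded from the denominator; the source does not state which convention or software was used, so both the convention and the function name are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments, invented for illustration only.
print(percent_identity("MKT-LC", "MRTALC"))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) give somewhat different values, so identities are only comparable when computed the same way.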
In the opossum, we identified fourteen paralogues of MSMB, which we termed MSMB1 to -14, flanking each other on chromosome 1 (Figure 5). Similarly in the tammar, we identified at least ten presumed MSMB paralogues, although not all exons could be identified and their synteny could not be confirmed. One tammar homologue, designated MSMB1, is very similar to opossum MSMB1 and presumably orthologous to it. MSMB1 from tammar and opossum are strongly divergent from the other MSMB homologues of both species and are significantly longer within Exon 3 (not shown). Thus only the MSMB paralogues that flank MSMB1 (presumably in tammar as well as opossum), but not MSMB1 itself, have undergone multiple duplications independently within each lineage.
The above conclusions in birds and marsupials are supported by phylogenetic analysis (Figure 6). Significant (> 70%) bootstrap values were obtained supporting orthology of avian MSMB1, -2 and -3, respectively, between chicken and zebra finch, and of marsupial MSMB1 between opossum and tammar. No marsupial homologues other than MSMB1 showed significant bootstrap values between tammar and opossum, whereas homologues from the same species tended to cluster together within the phylogenetic tree, indicative of lineage-specific duplication events.
MSMB-like genes in protostomes
Protostomal MSMB-like genes were identified mostly from the phylum Mollusca, including bivalves, gastropods and cephalopods, with one sequence from Rotifera. Additional identified transcript sequences were from a cDNA library derived from floral buds of Lewis' monkeyflower (Mimulus lewisii), a flowering plant. These are assumed to have arisen from contamination of the floral buds by a terrestrial gastropod (a slug or a snail) (H.D. Bradshaw, pers. comm.).
Phylogenetic analysis of all the protostomal MSMB-like translated sequences did not entirely reflect the species' taxonomic relationship (Figure 7), suggesting that not all the sequences are orthologous to each other. Most notably, the similarity between a sequence from a cephalopod, Euprymna scolopes, and the "Mimulus lewisii" (presumed terrestrial gastropod) sequence is stronger (100% bootstrap) than between any other sequence pairs, including between congeneric species (92% for Loligo spp.; 25% for Mytilus spp.). This suggests that the former sequences represent a gene that is subject to more evolutionary constraints, similarly to MSMP in vertebrates. Indeed, the sequences from Euprymna scolopes and "Mimulus lewisii" appear to share some features that are highly conserved in vertebrate MSMPs, such as a serine-alanine motif near the C-terminus (not shown). Thus it is possible that these two sequences represent distant orthologues of MSMP.
The California sea hare (Aplysia californica; Gastropoda) is the only mollusc currently with a WGS sequencing project. The sea hare MSMB-like sequence is located on the same genomic scaffold (Scaffold 217 of genome build Broad 2.0/aplCal1) as a member of the KLHL (Kelch-like) gene family. KLHL genes also respectively occupy the syntenic groups that include MSMB or USM/USMH in vertebrates (not shown). Together these data suggest a divergence between the MSMP and USM/MSMB lineages in a bilaterian common ancestor.
Predicted tertiary structure of marsupial USM
Almost all USM/MSMB/MSMP family members share a conserved pattern of ten disulfide-forming cysteine residues, whereas marsupial USM has only eight cysteine residues due to a truncated reading frame in Exon 4 (Figure 8a). The disulfide bond pairings of the cysteine residues were partially determined [15] and later refined [33, 34]. A recent crystallographic analysis of human MSMB [35] showed that the N-terminal domain consists of six β-strands (β1-6) arranged in a Greek key motif, while the C-terminal domain consists of four β-strands (β7-10). Three disulfide bonds (6 cysteine residues) give rigidity within the N-terminal domain and one disulfide bond (2 cysteine residues) gives rigidity within the C-terminal domain. A fifth disulfide bond, between C37 and C73, links the N-terminal and C-terminal domains. In marsupial USM, the cysteine residue homologous to C37 (= C40 in tammar secreted USM1) is conserved, despite the absence of C73. However, an additional cysteine residue (= C59 in the tammar secreted protein) is present within the short, six-residue C-terminal domain of all marsupial USM orthologues. Modelling of the tammar USM1 tertiary structure showed that this cysteine residue would lie very close to the residue homologous to C37 and could substitute for the missing C73 (Figure 8b).
The crystal structure of MSMB also revealed a mechanism for dimerisation, whereby the β1 and β10 strands of one molecule lie end-to-end to form a straight edge which lies antiparallel and in contact with the β1 and β10 strands of a second molecule [35]. The involvement of both β1 (N-terminal domain) and β10 (C-terminal domain) strands suggests that dimerisation cannot occur in USM, which lacks sequence homologous to β10. This might be integral to a divergent role for USM compared with MSMB and USMH. However, the molecular masses of Bands 3-5 in the original protein gel of Casey et al. [2] (Figure 1), which each contained USM sequence, were estimated by the authors as 22, 17 and 14 kDa, respectively. Bands 3 and 5 are thus approximately three- and two-fold, respectively, the predicted molecular mass of monomeric secreted USM (7 kDa). It thus remains possible that USM can form multimers despite its C-terminal truncation. It is noteworthy that the immunoglobulin-binding property of MSMB may depend on dissociation of dimers to monomers in response to reducing conditions or low pH [35, 36].
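The arithmetic behind the multimer suggestion can be laid out as a short sketch. The band masses (22, 17 and 14 kDa) and the predicted monomer mass (7 kDa) are the values given in the text; the variable and function names are illustrative only.

```python
MONOMER_KDA = 7.0                          # predicted mass of monomeric secreted USM
BAND_KDA = {3: 22.0, 4: 17.0, 5: 14.0}     # estimated gel band masses from [2]


def fold_over_monomer(band_kda: dict, monomer_kda: float) -> dict:
    """Return each band's mass as a multiple of the monomer mass."""
    return {band: mass / monomer_kda for band, mass in band_kda.items()}


ratios = fold_over_monomer(BAND_KDA, MONOMER_KDA)
for band, ratio in sorted(ratios.items()):
    print(f"Band {band}: {ratio:.1f}-fold the monomer mass")
```

Band 3 and Band 5 fall close to whole-number multiples of the monomer mass, whereas Band 4 lies between two and three; gel-based mass estimates are approximate, so these ratios are indicative only.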
The precise role of USM in the marsupial postovulatory coats is an intriguing question considering the various roles that have been proposed for MSMB. While MSMB was first identified almost three decades ago as a component of human seminal plasma with FSH-inhibiting activity [16], there has been a recent resurgence in interest due to a demonstrated genetic link with prostate cancer susceptibility [37–40]. Perhaps more relevant to the present context, MSMB has been shown to inhibit sperm binding and the acrosome reaction [11, 41], suggestive of a possible role in blocking polyspermy in marsupials.
Expression of USM, MSMB and MSMP in tammar tissues
To elucidate distinctions in the roles of USM, MSMB and MSMP in the tammar, RT-PCR was performed on a variety of tissues (Figure 9). USM1 expression was detected solely within endometrium and not in other tissues, including oviduct. This is consistent with a highly specific role for USM1 as a component of the postovulatory coats. MSMB expression was detected in both pituitary gland and testis, while MSMP expression was restricted to testis only. Expression of USM2 was not detected in the tissues tested.
Expression of USM1 was examined by quantitative RT-PCR during gestation (Figure 10). Transcript levels were moderate-to-high during pre-diapause (days 0-7 after birth of previous young), diapause, and early post-diapause (until day 17-18 after blastocyst reactivation by removal of pouch young (RPY)). Levels were higher during diapause than pre-diapause and the earliest post-diapause stages, although these differences were not significant. By contrast, the second peak at around d10-15 RPY was significantly higher than d4-d6 RPY and d20-25 RPY. After d15 RPY there was a rapid reduction in USM1 transcript levels, which coincides with shell breakdown at around d18-19 RPY [7].
The dynamic temporal expression pattern of USM1 during gestation suggests that it may be at least partly regulated by progesterone; indeed, we identified a progesterone receptor binding site within the first intron that was conserved in both tammar and opossum (not shown). In the tammar, progesterone receptors are highest at around d5 RPY, together with oestrogen receptors, coinciding with the progesterone and oestrogen pulses that occur at this time [42]. Interestingly, this is exactly when USM1 is at its lowest level before increasing again. USM1 expression is also low after d20 of pregnancy (Figure 10), at the time when progesterone concentrations in the corpus luteum [43], in the peripheral circulation [44], and in the utero-ovarian circulation [45] are highest, but progesterone receptor levels are very low [42]. Thus, USM1 expression appears to follow the profile of progesterone receptor levels rather than of progesterone.
USM as a component of the marsupial postovulatory coats
The mucoid layer is deposited during passage through the oviduct whereas shell coat material is secreted from endometrial glands within the uterus and the uterotubal junction. The association of MSMB expression with mucosal epithelia [46] suggests that USM might contribute to the mucin layer. However, the brushtail possum protein bands isolated by Casey et al. [2] were derived from a mixed pool of coats from early cleavage through to late expansion conceptuses. Coats of the former would be expected to include more mucoid coat while the latter would include more shell coat. The much greater volume of shell coat surrounding late-expansion conceptuses [5] suggests that shell coat material would predominate in any mixed pool of samples. Indeed, expression in the endometrium but not the oviduct revealed by our RT-PCR data is consistent with contribution to the shell coat rather than the mucin layer, which forms first in the oviduct. Nevertheless, there may be some overlap in the components of both layers, with the physical differences between the shell and mucin coat due to a subset of components that are specific to one or the other. Immunolocalisation studies may clarify the relative contribution of USM to each layer.
Many of the known properties of MSMB provide clues as to possible role(s) of USM. In marsupials, intimate contact between conceptus and maternal tissues occurs only late in development, after shell breakdown approximately two-thirds of the way through pregnancy [47, 48]. The binding of MSMB to immunoglobulins [12] suggests that USM, if it shares this property, may interact with the maternal immune system to modulate its action. The apparently synchronised down-regulation of tammar USM1 with shell breakdown suggests that the former may be a necessary step for subsequent successful implantation. Alternatively, USM1 down-regulation might facilitate shell-breakdown itself.
USM could also have an immune role in protection against pathogens within the uterus. We have identified a lysozyme as another component of the postovulatory coats (unpublished data) that may have a similar role in protection against bacteria. Such a role for USM would be consistent with the association of MSMB secretion with mucosal tissues [49]. In eutherians, degradation of mucin on the endometrial surface is associated with a window of receptivity to implantation (reviewed [50]). Thus it is possible that similar events, including down-regulation of USM expression, are associated with placental attachment after shell breakdown in marsupials.
Unlike MSMP, which is relatively well conserved, MSMB is characterised by a rapid rate of evolution. Mäkinen et al. [27] noted that among the multiple copies of MSMB in New World monkeys, the second intron is more highly conserved than the exons and there is no bias towards substitutions in the third nucleotide of codons, which normally preserve amino acid identity. Thus it is possible that rapid change in the primary structure of MSMB, excluding the signal peptide and the cysteine residues, is under positive selection. In another study [51], MSMB was identified in a screen for prostate-expressed genes in primates with a high ratio of nonsynonymous to synonymous substitution rate (dN/dS) - a conservative measure of positive selection. USM is similarly highly divergent among the three marsupial species examined and, like MSMB, its divergence might be due to positive rather than neutral selection. It was previously proposed that MSMB prevents immune attack against allogeneic sperm [12]. A possible extension to this idea may be that the same mechanism also serves to reject heterospecific sperm, as MSMB has been shown to bind sperm and act as an inhibitor of sperm motility and the acrosome reaction [10, 11, 41]. Thus rapid evolution of USM (and MSMB) could be implicated in speciation events by preventing hybridisation with closely related species. A recent report [24] showed that male longfin inshore squid (Loligo pealeii) detect an MSMB-like protein, Loligo β-MSP, in the capsule of eggs laid on the sea floor. Loligo β-MSP triggers hostile behaviour in conspecific males towards other male squid, demonstrating a possible role in species recognition. It is not clear whether this reflects only a secondary role for Loligo β-MSP in the egg capsule, but the parallel with USM as a component of the marsupial conceptus coats is intriguing, although in marsupials internal fertilisation and development rules out any role for USM in mate selection by males. 
However, it remains possible that USM within the reproductive tract helps to ensure fertilisation by only con-specific sperm.
Some eutherians such as rabbit, horse and some carnivores (reviewed [52–54]) also possess various post-ovulatory conceptus coats, such as a mucoid coat, neozona and gloiolemma (rabbit) or a capsule (horse). It is not known whether any components of marsupial and eutherian mucoid coats are homologous, but our thorough searches in both rabbit and horse genome databases suggest that no orthologues of USM are present. However, other components could be homologous. It also cannot be excluded that MSMB or MSMP have acquired an analogous role in the coats of some eutherians by convergent evolution. It is noteworthy that MSMB expression has also been detected in human endometrium [55].
Very few genes have been identified in marsupials that are absent in eutherian genomes [9]. The identification of USM is thus noteworthy and could be highly relevant to understanding the differences in modes of reproduction between these two major mammalian groups. If USM homologues in non-mammalian vertebrates have a different role to marsupial USM, this would suggest that the latter evolved in concert with mammalian viviparity by supporting in utero development. Conversely, the absence of USM in eutherians suggests the evolution of alternative mechanisms supporting in utero development that caused USM to become redundant, presumably for the same reason that the shell coat became redundant.