Globins in the marine annelid Platynereis dumerilii shed new light on hemoglobin evolution in bilaterians

Background How vascular systems and their respiratory pigments evolved is still debated. While many animals present a vascular system, hemoglobin exists as a blood pigment only in a few groups (vertebrates, annelids, a few arthropod and mollusk species). Hemoglobins are formed of globin sub-units, belonging to multigene families, in various multimeric assemblages. It was so far unclear whether hemoglobin families from different bilaterian groups had a common origin. Results To unravel globin evolution in bilaterians, we studied the marine annelid Platynereis dumerilii, a species with a slow evolving genome. Platynereis exhibits a closed vascular system filled with extracellular hemoglobin. Platynereis genome and transcriptomes reveal a family of 19 globins, nine of which are predicted to be extracellular. Extracellular globins are produced by specialized cells lining the vessels of the segmental appendages of the worm, serving as gills, and thus likely participate in the assembly of a previously characterized annelid-specific giant hemoglobin. Extracellular globin mRNAs are absent in smaller juveniles, accumulate considerably in growing and more active worms and peak in swarming adults, as the need for O2 culminates. Next, we conducted a metazoan-wide phylogenetic analysis of globins using data from complete genomes. We establish that five globin genes (stem globins) were present in the last common ancestor of bilaterians. Based on these results, we propose a new nomenclature of globins, with five clades. All five ancestral stem-globin clades are retained in some spiralians, while some clades disappeared early in deuterostome and ecdysozoan evolution. All known bilaterian blood globin families are grouped in a single clade (clade I) together with intracellular globins of bilaterians devoid of red blood. Conclusions We uncover a complex “pre-blood” evolution of globins, with an early gene radiation in ancestral bilaterians. 
Circulating hemoglobins in various bilaterian groups evolved convergently, presumably in correlation with animal size and activity. However, all hemoglobins derive from a clade I globin, or cytoglobin, probably involved in intracellular O2 transit and regulation. The annelid Platynereis is remarkable in having a large family of extracellular blood globins, while retaining all clades of ancestral bilaterian globins.


Background
The exchanges of gas, nutrient and waste relying on diffusion are impaired when body size and tissue thickness increase. Most animals develop at least one type of circulatory system to circumvent this limitation. Vascular systems are diverse, representing different solutions to the same purpose in animals with varied body plans.

Correspondence: guillaume.balavoine@ijm.fr. Institut Jacques Monod, Université de Paris / CNRS, UMR7592, Paris, France.
One major function of the blood vascular system is performing gas exchanges, bringing dioxygen to the tissues and taking back waste products (e.g. CO2). To perform this respiratory function efficiently, many species use respiratory pigments that bind dissolved gases in a cooperative way and considerably increase their solubility in the blood or hemolymph. A general picture of the nature of circulating respiratory pigments used in bilaterians (Fig. 1) suggests a complex evolutionary history.
Among the circulating respiratory pigments, hemoglobins in particular display a complex situation and the evolution of globins is still not understood in depth. Globins are ancient proteins present in all groups of living organisms [1]. They are characterized by a unique molecular structure, the "globin fold", made of eight alpha helices, which shelters a heme prosthetic group, itself responsible for binding diatomic gases such as O2 or NO. The globin domain is sometimes found in composite proteins that have evolved by fusion of pre-existing protein domains, but all respiratory "hemoglobins" derive from proteins with a stand-alone globin motif. The oxygen transport function is well described in specific groups (vertebrates, annelids) but the globin superfamily is prevalent across the animal kingdom and the functions of globins are still poorly known [2]. Many phylogenetic studies have focused on the hemoglobins of mammals and vertebrates in general, starting with the seminal work of Zuckerkandl and Pauling [3]. At a much larger phyletic scale, there have been studies on the evolution of globins at the level of the tree of life [1,4], and at the level of the eukaryotic tree [5]. These studies emphasize the universality of the globin fold motif, present since the last common ancestor of all living cells (LUCA, last universal common ancestor), and the likely functions of these proteins as enzymes or sensors. Within metazoans, many works have analyzed globin evolution in specific metazoan groups such as annelids [6,7], deuterostomes [8], echinoderms [9], pancrustaceans [10], insects [11], cephalochordates [12], chordates [13] or agnathans [14,15].

Fig. 1 Distribution of circulatory systems and respiratory pigments in the metazoan consensus tree. Some species have no known respiratory pigments despite having a circulatory system or at least a fluid-filled cavity (nematodes, most echinoderms, urochordates, cephalochordates). It is assumed that in these species, gases either diffuse freely through thin layers of tissues or are freely dissolved in the hemolymph. Some groups have circulating hemoglobins ("red blood"). The status of these hemoglobins is very diverse. They can be extracellular hemoglobins dissolved in the blood, as in many annelids. They can be contained in red cells as in the vertebrates or in some annelids such as capitellids [25]. Some groups have circulating dissolved hemerythrins ("pink blood"), like priapulids, brachiopods and some annelids (Sipunculidae, Magelonidae) [73][74][75]. Mollusks on one side and many arthropods on the other side have circulating dissolved hemocyanins ("blue blood") of different molecular origins [76]. Circulating respiratory pigments of different types are generally not present together in the blood, suggesting that the recruitment of each type of pigment for the respiratory function occurred multiple times independently in the evolution of bilaterians [76]. HG hemoglobin, HC hemocyanin, mHC molluscan "hemocyanin", HE hemerythrin, BVS blood vascular system, CCS coelomic circulatory system. Synthetic information on respiratory pigments can be found in [77].
However, a smaller number of studies have focused on the evolution of globins at the metazoan scale [16][17][18][19]. These studies have shown that vertebrate hemoglobins and the related myoglobins are only a small branch in a vast tree of animal globins, some of which are used for respiratory functions but many others have still unknown functions.
One of these sub-families, the neuroglobins (Ngb) are hexacoordinate intracellular globins first discovered in the mammalian nervous system and whose function(s) remain poorly understood [20][21][22]. Neuroglobin orthologues have been discovered in a number of animal groups other than vertebrates as well as in choanoflagellates. Expressions of neuroglobins in mammals, cnidarians and the acoel Symsagittifera [18] all occur in neural or sensory cells.
Neuroglobins are proposed to be the most ancient clade of globins occurring in animals and hemoglobins have been suggested to be derived from these ancestral neuroglobins by independent, convergent functional divergence [18].
Another study [23] has identified globin X. Globin X proteins are widespread among metazoans, present in teleost fishes and amphibians but not in amniotes. Another panmetazoan clade of globins but missing in all vertebrates, called X-like, is also identified [16]. Globin X proteins are expressed in neural cells in vertebrates [24]. Globins X and X-like often present one or two acylation sites at the N-terminus, either myristoylation or palmitoylation, suggesting that these proteins are linked to cell membranes. The functions of globin-X proteins are not known, although a role in protecting the cell from reactive oxygen species (ROS) is suggested. Blank and Burmester [16] suggest that the blood globins of vertebrates may derive from ancient membrane-bound globins, via the loss of acylation sites, opening, among other things, the possibility to form multimers.
Last, a broad study on a large number of marine species transcriptomes, including many annelids, focused on the question of extracellular globins [19]. Extracellular globins form soluble hemoglobin multimers. Extracellular globins had been known in a small number of bilaterian groups and were thought to have evolved independently in these lineages. The authors claim that extracellular globins similar and orthologous to the ones already well known in many annelids can be identified in Echinodermata, Hemichordata, Brachiopoda, Mollusca, Nemertea, Bryozoa, Phoronida, Platyhelminthes, and Priapulida. They conclude that extracellular hemoglobin may exist in many more bilaterian groups than previously thought and that the last common ancestor of bilaterians may have possessed this hemoglobin.
Despite bringing forward interesting ideas, these previous works on metazoan globins all suffer from biased sampling. Gene sequences were identified from animals whose complete genome was not available at the time of the study and therefore represent only a subset of the diversity that must exist in metazoans. The sampling of globins of Blank and Burmester [16] was made by searching NCBI GenBank with terms related to globins, while the same database was screened with an Ngb probe in Lechauve et al. [18]. Belato et al.'s study [19] focuses only on extracellular globins and uses only transcriptomes. A larger sampling of complete metazoan genome sequences is now available and allows an update.
In this work, we explored the globin content of the genome of the marine annelid Platynereis dumerilii. Many annelids, including Platynereis, are remarkable for the similarities of their blood vascular system (BVS) with the vertebrate BVS. Annelid BVS are usually closed. Annelid blood vessels, devoid of endothelium, are located in between spacious pairs of coelomic cavities in each segment and have a metameric organization along the trunk of the animal. The blood of many annelids is red, containing a high concentration of respiratory hemoglobin. In contrast to vertebrates, this circulating hemoglobin is often extracellular instead of enclosed in red blood cells. Only a few annelid species have circulating red blood cells with intracellular hemoglobin [25]. The extracellular globin structure is very different from the hemoglobin of vertebrates. Instead of a heterotetramer of globin sub-units, many annelids possess a giant hexagonal bilayer hemoglobin molecule (HBL-Hb, originally referred to as "erythrocruorin"), organized in an assemblage of no less than 144 individual globin peptides with the help of linker proteins [26][27][28]. Platynereis possesses red blood indicating that it is composed of dissolved hemoglobin. Despite its status of emergent model animal for evolution and development studies [29][30][31], the composition of the blood of Platynereis has not been yet characterized, although it can be inferred that they possess the annelid extracellular giant hemoglobin. We identified the multigenic family encoding for the globins in Platynereis publicly available sequenced genomes and transcriptomes. We combined in situ hybridizations, electron microscopy and 3D reconstructions to identify and characterize the cells that produce these extracellular globins.
We analyze the evolutionary relationships of Platynereis globins at the metazoan-wide phylogenetic scale, using a sampling of metazoans for which sequenced genomes are available. We wanted in particular to put to the test the idea that animal hemoglobins are derived convergently from "neuroglobin-like" proteins, involved in a neuronal function [18].

Identification of globin genes in the Platynereis genome and cluster mapping
We identified in the transcriptomes and genome of Platynereis 19 different genes coding for proteins with a single globin motif (Fig. 2a). Nine of these genes code for a peptide with a secretion signal at its N-terminus (Pdu -B2), making them likely to represent extracellular globins (Additional file 1). Six of these genes, distinct from the predicted extracellular globins, code for peptides that are likely to be membrane-bound, thanks to N-terminal acylation sites. All 19 sequences were identified both in the genome assemblies and in the transcriptomes, indicating that they are not cross-species contaminations. In addition, all sequences were mapped to different loci in the latest assembly of the Platynereis genome (Fig. 2b and Additional file 2). They therefore represent genuine paralogues rather than divergent alleles. Six of the nine potential extracellular globins (Pdu Egb-A1d-α, -A1d-β, -A1d-γ, -A2, -B1, -B2) lie in a chromosomal cluster (Fig. 2b) and two other closely related genes (Pdu Egb-A1a, -A1c) form a tandem pair in a distinct contig. All 10 other genes, coding for putative intracellular globins, as well as the extracellular Pdu Egb-A1b, are found as isolated genes on rather large contigs.

Phylogenetic analysis of the Platynereis globins at the metazoan scale
To understand better the origins of the multiple globin genes present in Platynereis genome, we carried out a phylogenetic analysis of metazoan globins. Except for extracellular globins in a couple of annelid species, we used exclusively species for which a complete genome sequence is available. We chose species representative of all major metazoan clades (phyla). We designed a screening technique with a concatenated globin probe and a low E-value cut-off in order to detect even the most divergent globins. We decided however not to treat in this study the globin domains that are part of multidomain proteins, such as the androglobin of vertebrates. We also did not take into account the globins with multiple serially arranged globin folds as occurring for example in echinoderms [9,32]. We included the pancrustacean Daphnia pulex. In Daphnia, 11 proteins have a double globin domain organization, suggesting multiple tandem duplications of an ancestral didomain globin. We chose to separate the N-and C-terminal globin domains in each of these peptides and to treat them as individual OTUs in our trees. Our complete genome surveys uncovered a highly variable number of globin sequences in the different metazoan species, ranging from none in the ctenophore Pleurobrachia bachei and the rotifer Adineta vaga, to 32 in the annelid Capitella teleta (Additional file 3). We also searched all metazoan genomes for the specific linker proteins that are associated with the assembly of the HBL-Hb. These proteins are only present in the four annelid species that have also extracellular globins (Additional file 3).
We conducted a maximum-likelihood phylogenetic analysis of this sampling of 272 protein sequences (sequence alignment in Additional file 4; complete tree in Additional file 5; simplified version in Fig. 3). A major issue in phylogenetic analyses is to determine the root of the tree, as this directly impacts all the evolutionary interpretations that can be deduced from the tree. In this particular analysis, there are no obvious molecules that could serve as outgroups for globins inside the metazoans. We also decided not to use globins from other eukaryotic groups, as we were concerned that using very divergent outgroups with a small number of alignable positions could result in an artifactual rooting of the tree. Instead of determining where the root of the tree could possibly be, we decided to deduce where it could not be. To this end, we took into account the solid groupings found in the unrooted tree, as indicated by their aLRT scores, and their composition in species to determine which groupings are natural clades, which we describe in detail below. We also excluded a root position within well-supported mono-species clades. The root of the tree that is represented in Fig. 3 (and Additional file 5) is thus chosen arbitrarily among the positions that we consider possible based on that criterion. We looked for groupings that would indicate pan-metazoan or pan-bilaterian genes, because they contain globins that are derived from a large sampling of metazoan species and do not contain sub-trees that are themselves composed of a large panel of metazoan species. These solid clades are likely to derive from a single ancestral gene. We found four well-supported clades (aLRT > 0.95), which we named clades I-IV, that contain a broad sampling of bilaterian species, but no non-bilaterian species. These are likely descendants of single pan-bilaterian globin genes. In addition, one species-rich clade containing both bilaterian and non-bilaterian genes (Porifera, Placozoa, Cnidaria) was found but with only a marginal aLRT support. We named this clade number V as we consider that it likely contains descendants of a single pan-metazoan gene. Last, distinct from these five pan-bilaterian or pan-metazoan clades, we found a well-supported clade containing only spiralian genes (Mollusca, Annelida), which we termed Sp-globins (Sp-gb). In addition to these multi-species clades, two other smaller clades are visible. One of them contains mostly globins of the cnidarian Nematostella vectensis as well as a very derived globin of Drosophila melanogaster that might be there because of long branch attraction. The other clade contains only globins from the nematode Caenorhabditis elegans. These mono-species clades presumably represent relatively recent gene radiations of a single divergent globin and remain unclassified. These eight well-supported clades are connected by nodes with low support. Note that our arbitrary root chosen for representation in Fig. 3 (and Additional file 5) is outside of these eight clades. We conducted Bayesian phylogenetic analyses of the same sampling of genes and found good support for most clades (clades I, II, III, IV, Sp-gb), but not for clade V (Additional file 5).
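The clade-naming logic described above (node support plus species composition) can be sketched as a small decision rule. This is only an illustration under stated assumptions: the `Clade` record, the `min_bilaterian_phyla` cut-off and the example support values are hypothetical and do not come from the study, which gives only the aLRT > 0.95 threshold and a qualitative notion of "broad sampling".

```python
from dataclasses import dataclass

# Non-bilaterian groups named in the text (Ctenophora added as an assumption).
NON_BILATERIAN = frozenset({"Porifera", "Placozoa", "Cnidaria", "Ctenophora"})


@dataclass(frozen=True)
class Clade:
    name: str
    alrt: float        # aLRT support of the node defining the clade
    phyla: frozenset   # phyla represented among the clade's sequences


def classify(clade, strong=0.95, min_bilaterian_phyla=3):
    """Return (scope, well_supported) for one candidate clade.

    `min_bilaterian_phyla` is a hypothetical stand-in for "a broad
    sampling of bilaterian species"; the text gives no numeric cut-off.
    """
    bilaterian = clade.phyla - NON_BILATERIAN
    broad = len(bilaterian) >= min_bilaterian_phyla
    if broad and clade.phyla & NON_BILATERIAN:
        scope = "pan-metazoan candidate"
    elif broad:
        scope = "pan-bilaterian candidate"
    else:
        scope = "lineage-restricted"
    return scope, clade.alrt > strong


# Illustrative inputs only: support values and phylum sets are made up.
examples = [
    Clade("clade I", 0.97, frozenset({"Chordata", "Annelida", "Arthropoda", "Mollusca"})),
    Clade("clade V", 0.80, frozenset({"Chordata", "Annelida", "Mollusca", "Cnidaria", "Porifera"})),
    Clade("Sp-gb", 0.98, frozenset({"Annelida", "Mollusca"})),
]
for c in examples:
    print(c.name, classify(c))
```

Keeping support and taxonomic scope as two separate outputs mirrors the text, where clade V is named on the basis of its composition even though its aLRT support is only marginal.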
We mapped on our ML tree the globins that are thought to be membrane-bound because they harbor predicted dual N-terminal acylation signals (Additional file 6; Additional file 5, magenta rectangles) [16,33]. None of the clade I globins is predicted to be membrane-bound. This is in contrast with clades II, III, IV and Sp-gb, which all contain numerous predicted acylated globins. We retained as likely membrane-associated only those peptides that show strong predictions of both N-terminal myristoylation and palmitoylation. It is likely however that more globins of these clades are actually membrane-bound, for example by combining myristoylation with a basic motif. Based on the large proportion of predicted acylated peptides, there is a strong probability that the ancestral globins of clades II and IV, at least, were themselves linked to the membranes.
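The "both signals required" filter can be illustrated with a deliberately crude sequence screen. This is a sketch and not the prediction method used in the study: it assumes only the general rules that N-myristoylation targets a glycine directly after the initiator methionine and that palmitoylation targets a nearby cysteine. The function name, the `cys_window` parameter and the test sequences are hypothetical.

```python
def likely_dual_acylated(protein, cys_window=10):
    """Crude screen for N-terminal myristoylation plus palmitoylation.

    Assumptions (not the authors' predictors): myristoylation needs a
    glycine right after the initiator methionine, and palmitoylation
    needs a cysteine within the first `cys_window` residues.
    """
    seq = protein.upper()
    if len(seq) < 2 or seq[0] != "M":
        return False
    has_myristoyl_gly = seq[1] == "G"
    has_palmitoyl_cys = "C" in seq[2:cys_window]
    # Retain a peptide only when both signals are present.
    return has_myristoyl_gly and has_palmitoyl_cys


# Hypothetical N-terminal fragments, for illustration only.
print(likely_dual_acylated("MGCSFSKLEE"))  # Gly2 and Cys3 present
print(likely_dual_acylated("MGSSFSKLEE"))  # Gly2 but no cysteine
print(likely_dual_acylated("MASCFSKLEE"))  # cysteine but no Gly2
```

A real analysis would rely on dedicated predictors with score thresholds; the point here is only the conjunction of the two criteria, which makes the retained set conservative.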
This analysis thus suggests that all bilaterian globins derive from five ancestral genes that were present in Urbilateria, thus defining five clades of bilaterian globins. Interestingly, Platynereis globins are distributed in all five clades, plus the spiralian-specific clade. In the genomes of non-bilaterian metazoans, a very variable number of globin genes are found. To analyse further the pre-bilaterian evolution of globins, we performed a second ML analysis with an extended sampling of non-bilaterian metazoans (sequences: Additional file 7; tree: Additional file 8), two choanoflagellates and one acoel. This extended analysis supports the presence of a single clade V globin in the last common ancestor of metazoans (Additional files 6 and 8). The situation remains unclear in cnidarians as hydrozoans and one ctenophore appear to have clade III-related sequences. Additional complete genomes of non-bilaterians and resolving the controversy of their phylogenetic relationships will be needed to understand fully the complement of globins in the last common ancestor of metazoans.

Fig. 3 ... LG model of amino acid substitution was used and a SH-like test of a data set of 272 sequences from 24 metazoan species. To improve readability and facilitate annotations, a simplified tree version is used (extensive version of the tree in Additional file 5). Gene names do not appear and branches have been color-coded according to the clade of the metazoan tree in which they occur (colour code in the box on the right). The tree is arbitrarily rooted outside of the natural clades. Green and red diamonds indicate a number of nodes that are supported by aLRT values superior to 0.75 and 0.95, respectively. The tree shows 5 groups (clades I-V), including 4 with well-supported nodes (clades I-IV), encompassing sequences of most bilaterian phyla. An additional well-supported node groups together only spiralian species sequences. Platynereis sequences are indicated by their names and thicker red branches.
All blood respiratory globins, and those suspected to have a respiratory function, whether intracellular (human) or extracellular (annelids, Daphnia), are found in clade I in the ML tree (Fig. 3). They also form species-specific clades inside clade I. All human red cell hemoglobin genes are found in a strongly supported clade with the human myoglobin and the human cytoglobin. All the Daphnia globin domains of their large family of didomain proteins are found in two separate clades, the N-domains and the C-domains, again suggesting a radiation from a single ancestral didomain-coding gene. All extracellular annelid globins are found in one moderately supported clade, containing two globin "A" and "B" clades. These extracellular globins are present in four of the six annelid species selected (P. dumerilii, Arenicola marina, Lumbricus terrestris and Alvinella pompejana), representing groups far apart in the annelid tree. This grouping reinforces the interpretation that P. dumerilii putative extracellular globins indeed have a respiratory function, as this function has been demonstrated by a number of studies in the three other annelid species [27,28,34]. It also shows that the acquisition of a secretion signal and the gene A/B duplication that gave birth to these respiratory proteins must have taken place early in annelid evolution, possibly before the last common ancestor of all living annelids. To strengthen this interpretation, we also searched two other annelid genomes that do not present any putative extracellular globins. One is the leech Helobdella robusta, in which no clade I globins are detected. The other is Capitella teleta, which presents a large family of strongly related clade I intracellular globins that are suspected to have a respiratory function [35]. All the families of circulating respiratory globins present in distant groups in our analyses (vertebrate red cell globins, Daphnia extracellular globins, annelid extracellular globins, Capitella red cell globins) thus correspond to independent events of gene recruitment and duplication, seemingly from a single ancestral clade I globin.

Expression profiling of Platynereis extracellular globins
While our phylogenetic analysis suggests a circulating respiratory function for Platynereis extracellular globins, we wanted to obtain strong evidence of the linkage between extracellular globins and the development of vascular system of Platynereis. We particularly wanted to establish which cells are responsible for the production of Platynereis putative hemoglobin. To this end, we compared and combined two types of analyses: an expression analysis of globin genes by in situ hybridization in toto (WMISH) on a series of Platynereis developmental stages and a transmission electron microscopy study of juvenile stages to explore the vascular system development and characterize the globin producing cells. Platynereis shows a succession of larval and juvenile stages, as well as a spectacular sexual metamorphosis, called epitoky (Additional file 9). As the larvae, juvenile and mature worms have transparent tissues, the accumulation of hemoglobin in the blood is readily visible at all life cycle stages. Larvae and small juveniles, that are colorless show no sign of expressing hemoglobin. The blood becomes visibly red only when the worm reaches a certain size, around 10-15 segments. Then, the quantity of blood increases progressively as the worm grows longer and thicker, peaking in mature swarming worms.
Specific RNA probes were produced for each globin gene. It must be noted however that the Pdu-Egb-A1dα, -A1dβ and -A1dγ genes have very similar nucleotide sequences, as well as being close chromosomal neighbors. The signal obtained with the Pdu-Egb-A1dα probe could be a mix of the expressions of all three genes. None of the globin probes gave a significant signal on early larval stages (24 hpf trochophore, 48 hpf metatrochophore, 72 hpf nectochaete). The 10 intracellular globins gave no signal or non-specific signal in silk glands after long staining periods (not shown). Only six of the seven extracellular globin probes showed specific patterns. They all display expressions in the same cells, but not necessarily at the same time. Typical expression patterns for Egb-A2 are shown in Fig. 4. In particular, we saw no expression of Egb-A1c in juvenile worms. The expression for this gene starts only in worms that are approaching sexual maturity and are about to enter epitoke metamorphosis. For all the most precocious extracellular globins, the expression starts when the worm is about 5 segments (not shown) in the posterior-most segment. In older juveniles, the expression is located along presumptive lateral vessels in the trunk (Fig. 4aʹ, red arrowheads). The expression shows a graded pattern along the anterior-posterior axis: stronger in the mid-body segments and absent in the rostral most and caudal most segments. This graded expression continues as the worms continue to develop. In bigger worms, the expression starts in the appendages of the mid-body along putative vessels that irrigate the part of the appendage that serves as a respiratory gill. The central most segments display expression in the parapodia when more rostral and caudal segments still display expression around lateral vessels in the trunk (Fig. 4cʹ, red arrowheads). 
There is a more intense expression in the mid-body segments and seemingly a progressive spatial shift of the hemoglobin-producing activity from the trunk to the parapodia ( Fig. 4c; Additional file 10). In worms engaged in sexual metamorphosis, we find a decreasing intensity of expression of all extracellular globins in parapodia (Fig. 4e, f ). In fully mature swarming worms, hemoglobin content is peaking but there is no mRNA of any blood globin left (Fig. 4g).
In addition to the spatial pattern evolution, our WMISH analyses suggest that different extracellular globin isotypes might be expressed at different times. To establish further evidence for globin isotype switching during the life cycle, we turned to quantitative PCR analysis. The stages for qPCR were divided as follows: 48 hpf larvae, 6 weeks larvae, 50 segments larvae, sexual metamorphosis stage I, sexual metamorphosis stage II, mature swimming worms. We designed specific primer pairs for each Egb gene, but again it is expected that primers for Pdu-Egb-A1dα are going to capture expressions of the two closely related genes -A1dβ and -A1dγ as well. The relative expression levels of the extracellular genes with respect to reference genes show important variations from one biological triplicate to the other (Additional file 11). Different individuals of the same apparent stage can display substantially different levels of globin mRNA. We have no clear explanation for this important variability but globin expression at the mRNA level is known to be highly responsive to physiological conditions, in particular the amount of O 2 available in the environment [36]. Notwithstanding, the general trend is that blood production picks up at 6 weeks, culminates between 50 segments and the beginning of the sexual metamorphosis (75-80 segments) and collapses rapidly after, ending by the time the worm starts swimming, which will be followed by reproduction and death (Fig. 4i). Individual gene expressions, despite variability, show trends that confirm our WMISH observations for some genes. The globins A2, B1 and B2 ( Fig. 4i) are expressed at levels that follow the general tendency described above and at higher levels than the four A1 globins. Four paralogous A1 globins (Fig. 4i), meanwhile, are expressed at different levels. 
While A1a and "A1d-α" follow the general tendency, A1c is expressed significantly only in sexually maturing animals maybe representing an adult-specific HBL-Hb sub-unit. In contradiction with at least some of our WMISH results, A1b was detected only at very low levels in all biological replicates.
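The qPCR summary used here (expression relative to reference genes, then the median of biological triplicates) can be sketched as follows. The exact quantification formula is not given in this section, so the sketch assumes the common delta-Ct approach with perfect amplification efficiency and two reference genes averaged on the Ct scale. Function names and Ct values are hypothetical.

```python
from statistics import mean, median


def relative_expression(ct_target, ct_refs):
    """Delta-Ct estimate of target abundance relative to reference genes.

    Assumes an amplification efficiency of 2 (one doubling per cycle).
    Averaging the reference Ct values is equivalent to using the
    geometric mean of the reference quantities.
    """
    return 2.0 ** (mean(ct_refs) - ct_target)


def stage_summary(replicates):
    """Median relative expression across biological replicates of a stage."""
    return median(relative_expression(r["target"], r["refs"]) for r in replicates)


# Hypothetical Ct values for one globin gene at one stage; the two
# reference values stand for the two reference genes (rps9 and sams).
triplicate = [
    {"target": 20.0, "refs": [20.0, 20.0]},
    {"target": 18.0, "refs": [19.0, 21.0]},
    {"target": 22.0, "refs": [20.0, 20.0]},
]
print(stage_summary(triplicate))
```

Using the median rather than the mean limits the influence of a single outlying individual, which matters given the large variability between replicates reported above.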

Characterization of Platynereis hemoglobin-producing cells
Next, to understand how the Platynereis BVS develops and to localize the extracellular globin-producing cells and analyse their cytological properties, we observed stained semi-thin sections of worms at various developmental stages (example in Fig. 5k, l) and made 3D reconstructions of their blood vessels (Fig. 5a-j). We also observed ultra-thin sections of worms by transmission electron microscopy (Fig. 5m-r). The Platynereis BVS is built progressively during development. The first signs of a BVS appear when a feeding juvenile with 3-4 segments has settled on the substrate around 15 days after fertilization, in the form of a pulsatile dorsal vessel and a non-contractile ventral vessel (Fig. 5a, b). Then the juvenile starts to add segments at the posterior tip of the body and this sequential addition lasts during most of the benthic life of the animal (Fig. 5c-f). New metameric BVS units are put in place in each new segment added. Each metameric unit communicates with the contiguous segments by the pulsatile dorsal vessel, the quiescent ventral vessels and derivations of the lateral vessels. New vessels appear in the growing trunk, especially at the level of the segmental appendages, which serve both as legs and branchiae (gills).

Fig. 4 Expression profiling of Platynereis extracellular globins. a-h Expression patterns of Pdu-Egb-A2. The expression patterns of other extracellular P. dumerilii globins are spatially very similar and located in the same hemoglobin-producing cells of transverse trunk vessels and parapodial vessels (Additional file 10). a 18-segment juvenile shows expression in transverse vessels in the trunk in a series of adjacent segments (red arrowheads). There is no expression in the rostral-most and caudal-most segments. aʹ Magnified view of three central segments. b, c 22-segment juvenile and 35-segment juvenile show expression in the parapodia (segmental appendages) for central segments and expression in lateral vessels in a few segments more rostrally and caudally located. cʹ Magnified view of a few anterior segments of worm (c) showing some residual expression in transverse vessels (red arrowheads). d Drawing of a pre-mature worm explaining the dissection and mounting of parapodia. e-g Decreasing expression of Egb-A2 in parapodia of worms undergoing sexual metamorphosis. Notice the delineation of vessels by HPC in the dorsal part of parapodium (e). i Relative abundance of mRNA of P. dumerilii extracellular globin genes, detected by real-time qPCR, at different life cycle stages. The sub-adult is a 12-week-old worm. Metamorphosis stage 1 is defined according to the following criteria: eyes starting to bulge, parapodia starting to transform. Metamorphosis stage 2: eyes grown to maximal volume, parapodia fully transformed, but the animal is not displaying active swimming behaviour yet. Reference genes are rps9 and sams. For stage 50 segments, stage I pre-matures, stage II pre-matures and swarming worms, the data points correspond to the median value of biological triplicates.
The number of putative hemoglobin-producing cells (HPC) increases progressively as the worm ages, in correlation with the overall quantity of blood observed. We reconstructed two stages in detail. The first stage corresponds to juvenile worms with around 25-30 segments (Fig. 5g, h). In these worms, the BVS metameric unit is still fairly simple with dorsal and ventral vessels and a couple of lateral vessels (Fig. 5g, h, red tracks). The lateral appendage vasculature is still poorly developed. At this stage, sheaths of characteristic putative HPC (hpc, in blue) are already present around a pair of transverse blood vessels (tbv, Fig. 5g, h). The second stage corresponds to pre-mature worms with around fifty segments. These worms have a much more developed lateral appendage network with lateral vessels colonizing the gill-like parapodia (Fig. 5i, j, red; movie in Additional file 12). In the mid-body segments of these worms, the putative HPC sheaths are still visible around the transverse vessels (tbv) but new HPC appear around parapodial vessels, more densely on the dorsal side.
Semi-thin and TEM images of putative HPC are shown in Fig. 5k-r. On semi-thin sections, they appear as thick sheaths of cells surrounding the lateral vessel lumen (Fig. 5k, l). The HPC are organized in a simple epithelium with basal sides facing the vessel lumen and apical sides facing the coelomic cavity (Fig. 5m). A clear basal lamina delineates the vessel lumen on the basal side (Fig. 5o). HPC are therefore meso-epithelial cells in clear continuity with the meso-epithelium that surrounds the coelomic pockets. They show all the characteristics of an intense protein synthesis and secretion activity (Fig. 5n-q). The cytoplasm is mainly filled with a dense rough endoplasmic reticulum (Fig. 5n, o). Electron-dense bodies are visible in the cytoplasm of HPC and are interpreted as hematin vesicles (Fig. 5p). Last, secretion vesicles are clearly identified on the basal side (Fig. 5q), indicating intense exocytosis processes. The vesicles contain a uniformly granulated matrix identical in aspect to blood. These small granules are thought to represent the giant particles of HBL-Hb, characteristic of many annelid species. These HPC are remarkably similar to the previous descriptions of hemoglobin-secreting cells in the heart-body (an organ inside the dorsal vessel) of several sedentarian annelids [37]. Small juvenile worms with 4-6 segments already have HPC, visible in the posterior-most segments as small groups of cells (blue) (Fig. 5a-f).
In juveniles of various stages, we also identify secretory cells with clear signs of apoptotic degeneration (Fig. 5r). We believe they represent dying HPC. Their location is correlated with the globin-production shift we detected in the WMISH analysis. We find degenerating cells around the transverse vessels in the mid-body segments at the time when globin production shifts from the trunk to the appendages. We find these degenerating cells as early as in small juveniles with 6 segments (Fig. 5e, f, in the penultimate segment, green).

Discussion
We report the first exhaustive screening for globin genes in an annelid and more generally in a protostome animal. We performed a broad phylogenetic study of the metazoan globins based on complete genome sequences. This study helps in understanding the evolution of the globin superfamily in metazoans. Combined with the existing knowledge, it gives some hints on the evolution of red blood itself. Bilaterian hemoglobins evolved several times independently, but from a single intracellular clade I stem globin, or cytoglobin [38] as it is termed in mammals. This stem cytoglobin appears to be still present in most bilaterians. Our study strongly contradicts earlier suggestions that annelid-like extracellular globins existed in bilaterian ancestors [19] or that bilaterian hemoglobins evolved by functional shift of neuroglobin-like proteins [18].

A phylogenetic nomenclature of metazoan globins
In this work, we have screened only whole genome sequences of a representative sampling of metazoans, using a concatemer sequence probe representing the diversity of human and Platynereis sequences. We are thus confident that our sampling of globins represents the most complete description to date of globin diversity in animals. In trees based on maximum-likelihood and Bayesian algorithms, a number of robust nodes emerged that correspond to globins present in a large sampling of bilaterian species, thus likely to represent ancestral bilaterian genes.
The existing classification of globins is mostly mammalian and function oriented, which is of course relevant in certain research contexts. In this article, we want to give a different viewpoint by providing a classification that is based on gene phylogeny at the metazoan level, exactly as it exists for other conserved gene families. Previous publications have tended to develop additional nomenclature while keeping strongly mammal- and hemoglobin-centered points of view (hence centered on an evolutionarily derived function). Globins evolved in functional contexts that were entirely different from the eventual evolution of a blood function in a few animal groups and this is what our phylogenetic nomenclature reflects.
The five ancestral groups of globin molecules we identified (Fig. 6a) are the following:
-Clade I: this includes the totality of circulating blood globins, either extracellular or carried by red cells, found in metazoans. In addition, this group also includes many non-circulating globins. These additional non-blood clade I globins are present in animals with hemoglobin blood, with a non-hemoglobin blood or with no blood at all. The large majority of bilaterian taxa are represented in this group but no globin of non-bilaterian taxa (cnidarians or sponges) is present, indicating that this clade originated in an ancestor of bilaterians. Globin proteins can display two different chemistries, depending on whether the heme is attached to the globin by one or two histidine side chains, respectively referred to as "pentacoordinate" or "hexacoordinate" [39]. The vertebrate hemoglobins and myoglobins, responsible for conveying and storing oxygen, are pentacoordinate. Cytoglobins and Drosophila Glob1, however, are hexacoordinate, suggesting that the vertebrate blood globins could descend from a hexacoordinate globin.
-Clade II: this clade contains multiple globins from all bilaterian superphyla (deuterostomes, ecdysozoans, spiralians). It has been called formerly "globin X-like" [16]. Clade II globins are not identified in non-bilaterian taxa, indicating that this clade originated in an ancestor of bilaterians. No clade II globin is present in vertebrates (although it is present in cephalochordates), indicating that a clade II globin was present in the common ancestor of chordates and lost secondarily in vertebrates.
-Clade III: this clade contains globins from all bilaterian superphyla. It corresponds to "globin X" [16]. Clade III globins are not identified in non-bilaterian taxa in our main sampling, but the presence of Clade III-like genes in hydrozoans and a ctenophore indicates that this clade may have originated in an ancestor of eumetazoans (Fig. 6b).
-Clade IV: [...] and also from all ecdysozoans except Priapulida. This clade has gone undetected in previous studies. Clade IV globins are not identified in non-bilaterian taxa, indicating that this clade originated in an ancestor of bilaterians.
-Clade V: this clade is the least statistically supported of all. It contains however globins from a broad variety of metazoans, including sponges and cnidarians, as well as choanoflagellates, one of the closest eukaryotic relatives of metazoans (Additional files 7 and 8). This is the only globin type whose existence is strongly suggested in the last common ancestor of metazoans. Paradoxically, it is also the most "dispensable" globin. It has disappeared from the genomes of the majority of the species in our sampling (especially all ecdysozoans). Clade V is represented in human by the neuroglobin protein and corresponds to some of the "neuroglobins" identified in previous studies. This is however a much narrower group than the "neuroglobin-like" proteins previously described [18]. These authors included many globins in a broad "neuroglobin-like" class that does not appear to be a clade in our study because its members are split between clades II, III and V.
-In addition, a strongly supported clade contains only globins from spiralian species (Sp-gb). It may have appeared by a gene duplication of any of the other globin clades.
Fig. 6 (legend, partial) Question marks indicate missing data because the whole genome is not available for the relevant species. On the right, the animal tree shows the inferences of gain and loss of globin gene clades in key metazoan ancestors. Colored disks indicate the gain of a new gene clade by duplication from existing ancestral genes while crossed disks illustrate the numerous secondary losses of globin clades. The green circles with a question mark illustrate the uncertainty on the appearance of a clade III gene in metazoan ancestors, as some cnidarians and a ctenophore possess genes with possible clade III affinities (Additional file 8). In the deuterostome branch, the extinction of clade III (green circle with a cross) occurred independently in Branchiostoma, Ciona and Homo lineages because clade III genes (globin X) have been identified in several vertebrate species [40].

The animal globin superfamily has an ancient history independent from the history of blood
The existence of these clades clearly indicates that at least five globin genes existed in Urbilateria. We call these five hypothetical molecules the "bilaterian stem-globins". Few globin sequences fall outside of these five well-defined clades (Fig. 6b). These additional globins fall into two cases. In one case, these sequences may derive from the stem globins but could have evolved rapidly so as to blur sequence similarities. This is what we would suppose for the large clade of C. elegans globins found alongside a single clade I globin. The second case concerns the clade of sea anemone globins, which includes almost all the sea anemone globins in our data set (except for a single clade V globin). Most of the members of this group are membrane-bound, suggesting that it could be related to any of the bilaterian stem globins that were potentially membrane-bound (clades II, III and IV). Alternatively, the sea anemone membrane-bound globins may derive from an ancestral eumetazoan membrane-bound globin that would also have given rise to bilaterian clades II, III and IV. Reconstituting metazoan globin history prior to bilaterian diversification is thus at this stage still tentative because of the small sampling of species. The deep relationship between bilaterian stem globins is not resolved by our trees (which are arbitrarily rooted in any case). The idea that bilaterian blood globins (and all other clade I globins as defined by our trees) may derive from molecules that were initially membrane-bound [16] is thus not corroborated by our study. It is in our view equally possible that clade I derives from an ancestral duplication of the cytoplasmic clade V, while clades II, III and IV would have acquired membrane tethering independently.
What were the functions of these numerous non-blood globins derived from the Urbilaterian stem-globins? Here again, very few functional studies have been performed and we can only speculate. Burmester and Hankeln [21] have reviewed the possibilities for the special case of neuroglobins (clade V), but the same possibilities seem to exist for the other clades of non-blood globins (clades I-IV) [40]. These intracellular globins, cytoplasmic or membrane-bound, or in some cases located in the nucleus [41], can display a "classic" function, similar to the vertebrate myoglobin (Mb), in storing O2 in hypoxic conditions or facilitating the delivery of O2 to the mitochondria. New functions have also emerged in the literature, such as the regulation of reactive oxygen species (ROS) [42] or reactive nitrogen species (RNS) that can be deleterious to the cell. It has also been speculated that they could function as O2 or redox sensors by interacting with other proteins such as G proteins [43] or cytochromes to transmit a signal.
The five Urbilaterian stem globins have undergone remarkably dissimilar fates in different bilaterian groups (Fig. 6b). They have all been conserved in the annelid Platynereis dumerilii and the mollusks Pinctada (bivalve) and Lottia (snail). In other bilaterian groups, one or more of the ancestral clades have been lost. Amniotes have kept only clade I and clade V globins. The urochordate Ciona has only clade I globins left but six clade I paralogues are present, maybe compensating for the loss of the other types. The most extreme case was however the rotifer Adineta, in which we could not identify any globin-related gene. This species lineage has thus lost all five ancestral globins.
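The retention pattern described in this paragraph can be summarized as a small presence table. The Python sketch below is only an illustrative restatement of the cases named in the text (the full data are in Fig. 6b); the variable and function names are hypothetical, and the spiralian-specific Sp-gb clade is deliberately left out because it is not one of the five stem globins.

```python
# Illustrative restatement of the retention pattern described in the text.
# Only the taxa explicitly discussed in this paragraph are encoded.
ANCESTRAL_CLADES = ("I", "II", "III", "IV", "V")

RETAINED = {
    "Platynereis dumerilii": {"I", "II", "III", "IV", "V"},
    "Pinctada": {"I", "II", "III", "IV", "V"},
    "Lottia": {"I", "II", "III", "IV", "V"},
    "Amniotes": {"I", "V"},
    "Ciona": {"I"},
    "Adineta": set(),
}


def lost_clades(taxon: str) -> list:
    """Return the ancestral stem-globin clades absent from a taxon, in clade order."""
    kept = RETAINED[taxon]
    return [clade for clade in ANCESTRAL_CLADES if clade not in kept]


if __name__ == "__main__":
    for taxon in RETAINED:
        lost = lost_clades(taxon)
        print(f"{taxon}: lost {', '.join(lost) if lost else 'none'}")
```

Encoding the pattern this way makes the contrast explicit: the three spiralians lose nothing, whereas Adineta loses all five clades.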

A novel scenario for the evolution of blood globins
Oxyphoric circulatory globins have evolved several times during the course of metazoan evolution. All circulatory multigenic families in our dataset derive from a likely unique clade I globin, possibly after recruitment for the circulatory function and several rounds of gene duplications. These multigenic families encompass for example the human hemoglobin family as well as the extracellular globin families of annelids and Daphnia.
In the case of the hemoglobins of vertebrate erythrocytes (RBC), heterotetramers comprising alternate globin isotypes circulate depending on the developmental stage. Adult human hemoglobins form heterotetramers comprising sub-units of two isotypes, α and β, coded by paralogous genes [44]. During the course of human embryogenesis and foetal development, different isotypes of hemoglobins are produced (δ, γ, ε, ζ), coded by other paralogous genes located in two clusters of α and β-related paralogues. The crucial emerging property of the hemoglobin tetramer is cooperativity: the binding of an O2 molecule to one of the sub-units induces a conformational change in the tetramer that makes the binding of O2 to the three remaining sub-units easier.
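The functional benefit of cooperativity can be illustrated numerically with the Hill equation, a standard empirical description of ligand binding that is not part of this study. The Python sketch below compares a cooperative carrier with a hypothetical non-cooperative carrier of identical half-saturation pressure. All numbers (P50 of 26 mmHg, Hill coefficient of 2.8, O2 partial pressures of 100 and 30 mmHg) are approximate textbook values for human hemoglobin, assumed here for illustration only, and the function names are hypothetical.

```python
def hill_saturation(p_o2: float, p50: float, n: float) -> float:
    """Fractional O2 saturation Y = pO2^n / (P50^n + pO2^n) (Hill equation)."""
    return p_o2 ** n / (p50 ** n + p_o2 ** n)


def delivered_fraction(p_load: float, p_unload: float, p50: float, n: float) -> float:
    """Fraction of binding sites that release O2 between loading and unloading sites."""
    return hill_saturation(p_load, p50, n) - hill_saturation(p_unload, p50, n)


# Assumed illustrative values, not measurements from this work.
P50_MMHG = 26.0        # half-saturation pressure
N_COOPERATIVE = 2.8    # approximate Hill coefficient of a cooperative tetramer
N_INDEPENDENT = 1.0    # independent binding sites, no cooperativity
P_LOAD, P_UNLOAD = 100.0, 30.0  # respiratory surface vs. working tissue

if __name__ == "__main__":
    coop = delivered_fraction(P_LOAD, P_UNLOAD, P50_MMHG, N_COOPERATIVE)
    indep = delivered_fraction(P_LOAD, P_UNLOAD, P50_MMHG, N_INDEPENDENT)
    print(f"cooperative carrier delivers     {coop:.2f} of its O2 capacity")
    print(f"non-cooperative carrier delivers {indep:.2f} of its O2 capacity")
```

Under these assumptions the cooperative carrier unloads a larger share of its bound O2 over the same pressure drop, which is the property the text refers to as storing O2 yet delivering it on demand.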
Elsewhere among metazoans, other gene radiations correlated with the evolution of hemoglobin-based blood have occurred. Three of the groups concerned are sampled in our tree: four species representing a wide diversity of annelids, the special case of the annelid Capitella teleta, and the pancrustacean Daphnia.
In annelids, our phylogenetic analysis is in accordance with the scenario that early gene duplications, probably in the last common ancestor of modern annelids, produced the initial two families (A, B) encompassing the four extracellular globin sub-families A1, A2, B1, B2 that form the HBL-Hb. A recent work based on a massive sampling of annelid species transcriptomes recovered a variable number of extracellular globins, ranging from 1 to 12 depending on the species [7,19]. Gene trees made in this latter work suggest that, while the early gene duplication A/B indeed occurred prior to annelid diversification, the sub-families A1, A2, B1 and B2 are not recognizable at the scale of the annelid tree. This is in accordance with what we obtain at the much smaller scale of our annelid sampling. This fact might be interpreted as evidence that the annelid last common ancestor had only two clade I paralogous globin genes, A and B, and that additional gene duplications occurred independently in many annelid lineages. Alternatively, four paralogous genes (A1, A2, B1, B2) may have existed early on in annelid history, accounting for the HBL-Hb found in divergent annelid groups, but in this case the gene sequences are not informative enough to reconstitute early duplication steps. Our work differs however in a striking way from previous results [19]. These authors identify a number of extracellular globins related to the annelid A and B extracellular globins from a large and diversified subset of bilaterian species including echinoderms, hemichordates, brachiopods, nemertines, mollusks and priapulids. We find no evidence for the existence of globins related to the annelid extracellular globins in complete bilaterian genomes other than those of annelids.
We argue for a thorough reconsideration of transcriptome screens [19], as many protein globin sequences these authors found in non-annelid bilaterians are strikingly similar if not identical to individual annelid sequences (as illustrated in their Fig. 4). This suggests multiple tissue or DNA contaminations between species transcriptomes. Although Platynereis blood hemoglobin has never been directly characterized by spectrometric tools, our EM observations and sequence similarities indicate that Platynereis must exhibit HBL-Hb. In Platynereis, additional duplications have affected the A1 gene, producing several paralogous sub-units. These sub-units, according to our in situ hybridization and qPCR data, may be used to produce alternative compositions of circulating hemoglobin that would serve physiological adaptations as the worm develops, grows and metamorphoses.
The clade I stem-globin evolved into a family of intracellular globins in some annelid species. In the case of the sedentarian worm Capitella, which does not have blood vessels, the hemocoel contains many red cells with intracellular globins with oxygen binding properties [25]. We have identified 17 intracellular globin paralogues, all closely related, in the genome of Capitella SpI and no extracellular globins. It appears as a case where the ancestral annelid extracellular globins were lost secondarily and replaced by RBC globins.
Although hemocyanin is the main respiratory pigment in pancrustaceans [21], branchiopods such as Daphnia, a few malacostracans and a few insects (such as Chironomus larvae) rely on hemoglobin for respiration. Daphnia hemoglobins have been thoroughly studied [45], as a model of adaptation to the environment by the extensive use of alternative globin isotypes. Daphnia possesses a large family of extracellular two-domain globin-coding genes. It is proposed that the hemolymph of Daphnia contains a crustacean hemoglobin made of 16 identical globin peptides with two hemes each [46]. For our phylogenetic analysis, we treated the two globin fold sequences independently, resulting in trees indicating that the multigenic family of Daphnia globins has emerged from the tandem duplications of an ancestral didomain globin gene.
However, globin gene radiations are not necessarily linked to recruitment for a function in the blood. This is exemplified by urochordates and cephalochordates. The ascidian Ciona intestinalis possesses 6 clade I globins, all closely related. The cephalochordate Branchiostoma has eight clade I globins found in two clusters in this analysis. Both species have nevertheless a colorless blood and to date, no functional study has shown either oxygen binding properties or their presence in the circulatory system.
All clade I globins likely derive from a single ancestral clade I sequence. This ancestral clade I globin was duplicated several times in different clades independently to give small families of blood respiratory proteins. This raises two questions: What was the ancestral function(s) of this clade I globin that made it so well pre-adapted for a role in blood function and is there a gene, in current bilaterian species, that still carries this ancestral function(s)?
We can only speculate at this point because little is known about the function of the non-circulating clade I globins in the vast majority of bilaterian species. But the existing knowledge can at least help define the direction of future research. Part of the answer may come from vertebrates, which possess both circulatory and tissue-expressed clade I globins. The "tissue" clade I globins in vertebrates are myoglobins and cytoglobins. Myoglobins are monomeric clade I globins present in all vertebrates. Their function is well known and very specific: they are expressed in muscles and are capable of storing large quantities of O2 necessary for sustained muscular effort. They also facilitate intracellular transport of O2 [47]. Cytoglobins were first identified as distinct intracellular globins in amniotes [38,48]. They differ structurally from hemoglobins (and myoglobin) in having a hexacoordinated heme iron, rather than a pentacoordinated one. They form homodimers, also in contrast to the former. Phylogenetically, they are firmly identified as clade I globins in our study. Contrary to myoglobins, they are expressed in a large range of tissues and organs, with possibly higher expression in cells producing large amounts of extracellular matrix such as fibroblasts and chondroblasts [49]. Information on the function (or functions) of cytoglobins remains scarce to this date. It is known that they bind dioxygen and carbon monoxide, but also nitric oxide, with high affinity [50]. It has thus been proposed that cytoglobins could play multiple roles in intracellular homeostasis. The first suggestion is of course related to dioxygen binding, including roles in oxygen buffering, sensing, transport and storage. The activity of cytoglobin as a nitric oxide dioxygenase, a very ancient function of globin proteins in eukaryotes, is also proposed to play an important role in the biology of the cell. Last, oxidative stress is another circumstance in which cytoglobins may play a role [50].
Remarkably, molecular phylogenies have demonstrated that large families of oxyphoric circulatory hemoglobins have evolved by gene duplications two times in the vertebrates: once in the gnathostomes and once in agnathans [14,51]. Interestingly, in these molecular trees, the cytoglobins of gnathostomes and agnathans are found as sister groups to both their large radiations of circulatory globins, suggesting that cytoglobin existed prior to the origin of red cell globins in vertebrates. Cytoglobins evolve conservatively [13], suggesting that they keep important functions. There are good reasons to propose that cytoglobins (rather than the more specialized myoglobin) may be close to a clade I globin with ancestral functions, as was suggested before [16].
The fruitfly Drosophila, like other insects, has a tracheal respiration and no circulatory globins. Yet, genome screens have revealed three globin genes in the Drosophila genome [52]. Two of these globins possess derived sequences. But the third one (Dme_524369; "Glob1" in Burmester et al. 2006) is found solidly clustered with clade I globins in our tree (Fig. 3, Additional file 5). This globin is hexacoordinated and may represent a derived cytoglobin-like molecule. It is found expressed in many tissues in the developing embryo and larva [53] but prominently in the tracheal cells. Gene knock-downs [53,54] lead to reduced survival of flies under hypoxic conditions. This suggests a role in intracellular O2 homeostasis.
Both cytoglobins and Drosophila Glob1 may thus be close to the functions of a clade I stem globin. It has to be noted from the ML tree (Fig. 3) that several species (arthropods and mollusks) have a single clade I globin that may represent the unduplicated descendants of the stem globin. Few species have completely lost the clade I (only the annelid Helobdella and the cephalopod Octopus). This may reflect the initial functional importance of clade I stem globin.
Why have multiple gene duplications affected the clade I stem globin each time it has been recruited for a blood function? A first factor is cooperativity. As we have mentioned earlier, cooperativity is an important functional mechanism for a multimeric globin whose role is to store large quantities of O2 in the blood, yet be able to deliver it on demand to all tissues in need. It may be best developed in a heteromultimeric assemblage and thus requires gene duplication and rapid divergence of the O2-storing protein. This idea has received crucial support from recent biochemical studies using synthetic ancestral proteins inferred from the diversity of extant vertebrate globins [55]. The second factor is the developmental switch of globins. This is well illustrated by the lamprey and the gnathostome cases. The lamprey Petromyzon possesses 18 hemoglobin genes that are expressed differentially at three different developmental stages: embryo, larva and adult [14]. These developmental switches are similar to those found in gnathostomes. Yet, all the agnathan circulatory globins are found in a single monophyletic group completely distinct from the radiation of gnathostome hemoglobins [51]. The lamprey globin family was convergently recruited for the same kind of developmental switch as seen in gnathostomes. We may see, to a more limited extent, the same developmental switch in Platynereis, with different A1 isotypes specialized for the juvenile and the adult blood functions.

The hemoglobin-producing cells: a specialized type of blood-making cells
Little was known about the nature of the cells that produce and secrete respiratory pigments outside of vertebrates. We demonstrate in this work the existence of a specialized category of vessel-lining cells in Platynereis that produce the blood globins, in other words the hemoglobin-producing cells (HPC). We analyzed the development of these cells during the life cycle of Platynereis. They first appear in the posterior-most segments in a five-segment worm. The youngest feeding juvenile worms with 3 or 4 segments probably do not have HPC and therefore no hemoglobin in their blood. These are minute worms (less than 500 μm long) that probably perform gas exchanges with the seawater by simple diffusion through their tissue. In intermediate size juvenile worms (25-40 segments, less than 1 cm long), HPC form sheaths around transverse trunk vessels, in most segments of the worm. As the worm grows, the circulatory system structure becomes more complex and a rich network of vessels and capillaries develops inside the parapodia, the worm segmental appendages. These parapodia take on the role of gills. HPC develop around most vessels inside the parapodia. We currently do not know whether these are the cells of the wall of existing vessels that are differentiating in situ to become HPC, or whether HPC colonize the parapodia by emigrating from another location. We also show that, at this stage (worms larger than 40 segments), HPC within the trunk degenerate and are presumably digested by phagocytic coelomocytes. Worms accumulate more red blood as they get close to sexual metamorphosis. When they have a maximal quantity of blood and start metamorphosing into swarming epitokes, the HPC progressively stop their production. It is possible that these cells degenerate but we have not demonstrated this phenomenon in metamorphosing worms.
To our knowledge, this is the first time the full cycle of these HPC is described in any annelid species and this is also the first time they are found associated with the parapodia serving as gills. HPC with very similar cytological properties have been described in other annelid species. They have been called perivasal cells because of their position around the lumen of vessels or extravasal cells, when they are located in the mesodermal peritoneum covering the gut in earthworms [56], siboglinids [57] or Arenicola [58]. In some sedentarian annelids such as Amphitrite [59], a specialized organ, the heart-body, produces the hemoglobins. The heart-body is a solid mass of tissue that grows inside the lumen of the anterior dorsal vessel [60]. There is thus a remarkable plasticity in the location of this hemoglobin-producing tissue in annelids that will be worth studying further.
How do these HPC relate to the hematopoiesis processes? It should be noted that, in Platynereis and in annelids in general, HPC are not circulating, contrary to vertebrate RBC. They are part of a mesodermal epithelium that is in contact with the coelomic cavity on the apical side and with the blood vessel lumen on the basal side. They are functionally polarized cells that secrete hemoglobin only on the basal side toward the vessel lumen. Besides HPC, Platynereis (and annelids in general) have several classes of coelomocytes/hemocytes that are freely floating in the coelomic cavity and can cross the mesoepithelium to populate the blood. Can HPC be considered as cells produced by an extended hematopoietic process? This is not only a question of convention, when considering the diversity of HPC in metazoans.
Is there a phylogenetic connection between the different metazoan HPC or more generally speaking between the different metazoan respiratory pigment-producing cells (hemoglobin, hemocyanin, hemerythrin)? In the various bilaterian groups that possess either a hemocoel or coelom/blood compartments, HPC can be either free floating or static. In vertebrates, intracellular globins are massively produced and stored in red blood cells. Accumulating respiratory pigments in circulating cells is also the solution retained in a number of other groups. Within annelids, the bloodworm Glycera has red cells filled with monomeric and polymeric hemoglobins, circulating in the coelom as the BVS is much reduced in these worms [61]. Capitellids have also a reduced BVS and red cells. Also in annelids, Sipunculidae have pink cells filled with hemerythrin, illustrating the evolutionary swapping of pigments which has occurred in several groups. Data on the existence and location of pigment producing cells in other protostome invertebrates are still patchy. In the marine chelicerate Limulus, whose remarkable blue blood has been exploited for preparing bacterial endotoxin tests, the hemocyanin is secreted by cells called cyanoblasts. Cyanoblasts are found floating in the hemocoel and burst to liberate their hemocyanin content [62]. The tissue of origin of these cyanoblasts is unknown. In gastropod mollusks, cells responsible for hemocyanin production have been identified as the pore cells or rhogocytes [63], abundant in the connective tissue. In pulmonate snails that have hemoglobin instead of hemocyanin, it has been shown that these rhogocytes have switched to hemoglobin production [64]. One can hope that in the future single-cell transcriptomics as well as more powerful cell lineage tracing methods will help to solve the interesting question of the homology or convergence of respiratory pigment producing cells.

Conclusion
The annelid Platynereis has retained the complete set (clade I to V) of ancestral bilaterian globin genes plus a spiralian-specific globin. Platynereis may be an excellent model in the future to determine the initial functions of these stem globins, for instance by selective CRISPR-Cas9 inactivation. It possesses nine clade I globins which are all extracellular. This is consistent with the known presence in several annelids of HBL-Hb with respiratory function. Although the presence of HBL-Hb in Platynereis has not been formally proven by crystallography, our results strongly suggest HBL-Hb existence through phylogenetic, expression and morphological evidence. We identified the cells that secrete these extracellular globins. Their location and activity depend on the worm size and life stages. These cells are part of the meso-epithelial walls of some particular vessels, first in the trunk in juveniles and later in the appendages in the pre-mature worms. The blood production level peaks at the onset of the maturation process, likely in preparation for the adult locomotory activity peak. The HPC morphology established with TEM is similar to that of pigment-producing tissues described in other annelids but their location in the appendage vessels serving as gills is original.
The finding of HPC within the gill organs in a marine animal is an important step forward. Platynereis is easily tractable to molecular biology experiments and can be used for obtaining transgenic strains. The molecular characterization of these cells will be very useful for comparative studies and exploring the diversity of HPC in bilaterians. One possibility would be to use single-cell transcriptomics, as HPC should be easily singled out in the data because of their massive production of extracellular globins. However, scRNAseq remains relatively expensive and shallow. Another possibility would be to develop a CRISPR-Cas9 protocol for tagging an extracellular globin gene with a GFP coding sequence. This would allow sorting HPC with flow cytometry and the bulk sequencing of their transcriptomes.
Another goal will be to understand the embryonic origin not only of the HPC but also of the other mesoepithelial cells that are involved in forming the blood vessels. Is there a developmental and lineage connection between these mesoepithelial cells of the vessels, the HPC and also the coelomocytes of Platynereis? In other words, do we have the equivalent of a hemangioblast lineage in the annelid? This question is all the more important if we consider the hypothesis that the BVS evolved in the first place to distribute nutrients and was later on recruited for a gas exchange function.

Survey of Platynereis dumerilii globin genes
Platynereis dumerilii globin genes were identified through consecutive steps of sequence similarity searches. We first used a concatenated probe of all 12 distinct human globin sequences (Ngb + Mb + Cygb + Hb-α2, -θ, -μ, -ζ, -β, -δ, -γ1, -γ2, -ε) as a query against expressed sequence tags (ESTs) from Platynereis Resources (4dx.embl.de/platy/) and the Jékely lab transcriptome (https://jekely-lab.tuebingen.mpg.de/blast/). Complete coding sequences were assembled from EST fragments using CodonCode Aligner (CodonCode Corporation, USA), and consensus sequences were determined because of the high level of polymorphism in this species. The sequences were reciprocally blasted against the GenBank non-redundant protein database. This produced a first list of sequences that were recognized as bona fide globins. To obtain an exhaustive repertoire of Platynereis globin genes, including highly divergent sequences and sequences that might be missing from the transcriptomes, we performed a tblastn search against the most recent Platynereis genome assembly, using as a query a concatenation of all Platynereis transcriptome globin sequences. Exons recovered with this method were systematically reblasted against the transcriptome sequences (Additional file 2). No additional gene sequence was detected at this stage. Double or multiple scaffold hits for some gene sequences likely correspond to divergent haplotypes of the same genomic region, since the Platynereis strain used for genome sequencing shows some highly polymorphic chromosomal regions. Chromosomal globin clusters were annotated using Artemis (https://sanger-pathogens.github.io/Artemis/Artemis/) and, for each Platynereis globin, intron positions were mapped on genomic DNA using CodonCode Aligner.

Survey of globin genes in animal genomes
Gene searches were carried out using the tblastn or blastp algorithms implemented in ngKlast (Korilog V 4.0, Questembert, France). We used a concatenated probe of all 17 Platynereis globins with distinct sequences as a query (Pdu Egb-A1a, -A1b, -A1c, -A1d-α, -A2, -B1, -B2 + Pdu-gb-IIA, -IIB, -IIC, -IID, -IIIA, -IIIB, -IVA, -IVB + Pdu-Sp-gb + Pdu-Ngb). This concatenated query sequence covers as much of the molecular diversity of globins as possible, and even very derived sequences were recovered in the hit lists. To ensure that all sequences with actual globin similarity were recovered, we reciprocally blasted the hits against the GenBank non-redundant protein database in order of decreasing significance (increasing E-value) until at least ten consecutive non-globin hits turned up in the list. We used publicly available files of peptide predictions from 22 genome datasets widely distributed among the metazoan phyla: Porifera, Cnidaria, Placozoa, Ecdysozoa, Lophotrochozoa and Deuterostomia (Additional file 13). When available, publicly available transcriptomes were used to complement the screen, which allowed a few annotation problems to be corrected. In addition, the globin gene search was performed on the transcriptomes of Lumbricus terrestris, Arenicola marina and Alvinella pompejana.
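The stopping rule used for the reciprocal searches can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the (identifier, E-value) hit format and the `is_globin` callback (which stands in for the reciprocal BLAST check) are assumptions introduced here, and hits are assumed to be examined from the most to the least significant.

```python
from typing import Callable, Iterable, List, Tuple


def collect_globin_hits(
    hits: Iterable[Tuple[str, float]],
    is_globin: Callable[[str], bool],
    max_consecutive_misses: int = 10,
) -> List[str]:
    """Keep hits confirmed as globins, walking from best to worst E-value.

    The walk stops once `max_consecutive_misses` non-globin hits
    have been seen in a row.
    """
    # Lowest E-value first, i.e. most significant hit first.
    ordered = sorted(hits, key=lambda hit: hit[1])
    confirmed: List[str] = []
    misses = 0
    for hit_id, _evalue in ordered:
        if is_globin(hit_id):  # hypothetical stand-in for the reciprocal BLAST
            confirmed.append(hit_id)
            misses = 0  # a confirmed globin resets the counter
        else:
            misses += 1
            if misses >= max_consecutive_misses:
                break
    return confirmed
```

Resetting the counter after each confirmed globin means that a divergent globin ranked below a few unrelated hits is still recovered, which is the purpose of requiring consecutive rather than cumulative non-globin hits.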
The giant hemoglobin linker proteins were screened in all genomes in the same fashion as globins, this time using as a probe a concatenation of all linker proteins from the earthworm Glossoscolex found in GenBank. Predicted sequences of Pdu-L1, -L2a and -L2b were deposited in GenBank (MW075674-MW075676).

Phylogenetic analyses
The amino-acid sequences of the identified globin genes in 25 species were aligned with MUSCLE 3.7 [67] as implemented on the LIRMM web server (https://phylogeny.lirmm.fr/phylo_cgi/one_task.cgi?task_type=muscle) under default parameters and adjusted manually in BioEdit. A selection of aligned positions was produced to eliminate unaligned or ambiguously aligned regions (Additional file 4). This resulted in an alignment of 275 globin sequences displaying 231 phylogenetically informative positions. The phylogenetic trees were constructed using two different approaches: maximum likelihood (ML) and Bayesian analysis. Maximum likelihood trees were generated using PhyML 3.0 (https://phylogeny.lirmm.fr/phylo_cgi/one_task.cgi?task_type=phyml) [68], using the LG model of amino-acid substitutions [69]. This model has been shown to perform better than the more widely used WAG model. Because the bootstrap test is inappropriate for short sequences such as globins, statistical support for nodes was assessed using the SH-like test (aLRT score) [70]. Bayesian analysis was performed with MrBayes 3.2.6 [71] using either the LG or the WAG fixed model, run for 32,174,500 and 24,972,000 generations respectively, using an all-compatible consensus and a burn-in value of 0.25. The calculation was performed with 4 chains, including one heated chain at temperature 0.5. The average standard deviations of split frequencies were 0.031 and 0.017, respectively.

Cloning of extracellular globin genes and in situ probes design
Large cDNA fragments, encompassing at least the whole coding sequences, were cloned by nested PCR using sequence-specific primers on cDNA from mixed larval stages and posterior regenerating segments. PCR products were cloned into the pCR2.1 vector following the manufacturer's instructions (Invitrogen, France) and sequenced. The full list of primers used is provided in Additional file 14. Sequences were deposited in GenBank (MT701024-MT701042). These plasmids were used as templates to produce DIG-labeled antisense RNA probes for whole-mount in situ hybridization (WMISH) using Roche reagents.

Visualization of Platynereis extracellular globin genes expression patterns by whole mount in situ hybridization
The animals were fixed in 4% paraformaldehyde (PFA), 1 × PBS, 0.1% Tween20 and stored at −20 °C in 100% methanol.

Quantitative PCR
Total RNA was extracted using the RNeasy kit (Qiagen). The extractions were done from a batch of 48 hpf larvae, a batch of 6-week-old worms, whole 50-segment juvenile worms and, for maturing stages, a stretch of 20 segments starting 24 segments after the head. The tissue was disrupted and homogenized in Qiagen RLT buffer using Eppendorf micropestles. First-strand cDNAs were synthesized using 100 ng of total RNA, random primers and the SuperScript II Reverse Transcriptase (Invitrogen, Life Technology). Specific primer pairs were designed with Applied Biosystems Primer Express software to amplify specific fragments between 50 and 60 bp (Additional file 15). The composition of the PCR mix was: 3 μl of cDNA (representing cDNA synthesized from 15 ng initial total RNA), 5 μl of SYBR GREEN master mix, 1 μl each of forward and reverse primers (500 nM final concentration), in a 10 μl final volume. qPCR reactions were run in 96-well plates in a real-time Applied Biosystems StepOne thermocycler. The PCR FAST thermal cycling program begins with polymerase activation and DNA denaturation at 95 °C for 20 s, followed by 40 cycles with denaturation for 3 s at 95 °C and annealing/extension for 30 s at 60 °C. After amplification, melting curve analyses were performed between 60 °C and 95 °C in increments of 0.3 °C to determine amplification product specificity. The slopes of the standard curves were calculated and the amplification efficiencies (E) were estimated as E = 10^(−1/slope). qPCR reactions were run with biological triplicates for 50-segment juvenile worms and maturing worms; each sample was run in technical duplicate, with the ribosomal protein small subunit 9 (rps9) and S-adenosylmethionine synthetase (sams) housekeeping genes as references [72]. The relative expression level of each target gene was obtained as 2^(−∆Ct sample), where ∆Ct sample = Ct sample − Ct reference gene (average of rps9 and sams).
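The two qPCR formulas given above can be written out as a short sketch. The function names and the Ct values are hypothetical and serve only to illustrate the arithmetic; they are not data from this study. Note that the relative expression formula uses a base of 2, which corresponds to assuming an amplification efficiency close to 2.

```python
from statistics import mean
from typing import Sequence


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope), where slope is that of the standard curve.

    A slope near -3.32 gives E close to 2, i.e. a doubling per cycle.
    """
    return 10 ** (-1.0 / slope)


def relative_expression(ct_target: float, ct_references: Sequence[float]) -> float:
    """2^(-dCt), with dCt = Ct(target) - mean Ct of the reference genes."""
    delta_ct = ct_target - mean(ct_references)
    return 2 ** (-delta_ct)


# Hypothetical Ct values, for illustration only.
ct_globin = 22.0
ct_rps9, ct_sams = 20.0, 22.0
print(relative_expression(ct_globin, [ct_rps9, ct_sams]))  # dCt = 1.0, prints 0.5
```

Averaging the two reference Ct values before subtraction follows the description above (average of rps9 and sams) and dampens the effect of stage-dependent variation in either housekeeping gene.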

Transmission electron microscopy and 3D reconstruction
For electron microscopy, animals were relaxed in 7.5% MgCl2, fixed in 2.5% glutaraldehyde buffered in 0.1 M phosphate buffer and 0.3 M NaCl, rinsed 3 times in the same buffer, and postfixed in 1% OsO4 in the same buffer for 1 h. The specimens were dehydrated in an ascending acetone series, transferred to propylene oxide and embedded in Epon-Araldite resin (EMS). Ultrathin sections (60-80 nm) were cut with a Reichert Ultracut E or Leica Ultracut UCT and counterstained with 2% uranyl acetate and Reynolds lead citrate. Images were acquired using Zeiss Libra 120, FEI Tecnai or Jeol JEM-1400 transmission electron microscopes and processed with Fiji and Adobe Photoshop software. For 3D reconstructions, series of semithin (700 nm) sections were cut with a Diatome Histo-Jumbo diamond knife (Blumer et al. 2002), stained with methylene blue/basic fuchsine (D'amico 2009), and digitized at 40 × magnification using a Leica DM2500 microscope with camera. The images were aligned with IMOD and the ImodAlign tool. The reconstructions were made with the Fiji TrakEM2 plugin.