Restriction endonuclease analysis of Bacillus atticus yielded restriction fragments for XbaI and PstI. XbaI produced a ladder-like pattern, with the shortest band at about 350 bp (XbaI350), which indicates the presence of tandemly repeated DNA. PstI produced two main fragments, one of about 530 bp (PstI530) and the other of about 5.5 kb (PstI5500).
Southern blot analysis (Fig. 1) and sequencing unequivocally revealed that XbaI350 is part of the PstI530 sequence, henceforth named Bag530.
Southern blot analysis also showed that the PstI5500 fragment is part of the ribosomal DNA, containing sequences of both the 18S gene and Bag530 (i.e. both XbaI350/PstI530 fragments) (Figure 1). Direct sequencing showed that the PstI5500 fragment of B. atticus from Israel includes part of the last Bag530 monomer at its 5' end, while at the 3' end it terminates downstream of the 18S gene; it therefore includes the full ETS sequence (Figure 2). Moreover, in situ hybridization using PstI530 as a probe showed that it localizes to the NOR of B. atticus (data not shown). Bag530 and 3'IGS-ETS fragments were then sequenced in B. atticus and B. grandii subspecies, as reported in Table 1.
Structural organization of the IGS-ETS in Bacillus
In B. atticus and B. grandii, the average nucleotide content of 3'IGS-ETS sequences is 52.9% A+T (range 52.5%–53.8%), while the 374 bp belonging to the 5' end of the structural 18S gene show an A+T content of 53.1%. Overall, the average length of the 3'IGS-ETS is 2583 bp, including 347 bp of the 18S gene (see Additional file 1 for the 3'IGS-ETS alignment and annotation).
Figure 2 summarizes the structure of the IGS-ETS region in Bacillus, as determined by sequencing the PstI5500 and 3'IGS-ETS fragments. The exact position of the 5' end of the 18S gene was established by aligning the Bacillus sequence to the 18S sequence of Blattella germanica [GenBank:AF005243]. The IGS-ETS region showed the typical basic structure, with both repetitive arrays and non-repetitive sequences. Repetitive regions are of two main types: a large cluster of head-to-tail repeats, corresponding to the Bag530 array, and a twofold 388 bp direct repeat (named Bag388a and Bag388b, respectively) interrupted by a 191 bp unique sequence. From the 3'IGS-ETS alignment (see Additional file 2) it is evident that the last monomer of the Bag530 array is truncated at different positions, namely at position 280 in B. atticus and at position 260 in B. grandii. The first unique sequence (590 bp long) lies at the 5' end of the IGS-ETS, immediately downstream of the Bag530 cluster. The Bag388a repeat spans positions 871–1258, and the Bag388b repeat spans positions 1446–1835. The two repeats are separated by the 191 bp sequence and followed by a unique sequence (428 bp long) upstream of the 5' end of the 18S gene.
Although the boundary between the IGS and the ETS has not actually been determined, we found a motif similar to the transcription start point (tsp) sequence of other arthropods downstream of the Bag530 cluster. The observed motif (5'-TATATTAGAGGGA-3') closely matches the promoter consensus sequence (5'-TATA>TANGRRRR-3') of several arthropods (see Additional file 3; [23]). It was also confirmed with the Neural Network Promoter Prediction tool, which predicted a sequence containing the same motif (5'-TTTTGGGTATATTAGAGGGA-3') with a score of 0.93. Assuming that this is the real gene promoter, the ETS is 1736 bp long in B. atticus and 1727 bp long in B. grandii. This is considerably longer than what is typically found in other arthropods, where ETS regions are usually 500–1000 bp long [2], with a few exceptions such as Daphnia pulex (1280 bp) [23].
The Bag530 repeat
As shown by restriction analysis and sequencing, the IGS region of Bacillus is characterized by an array of head-to-tail, tandemly repeated Bag530 units. In this study we sequenced 64 Bag530 repeats obtained by restriction analysis from B. grandii and B. atticus (see Table 1 for details). A Bag530 consensus sequence of 531 bp was obtained by aligning the 64 sequences; we observed a total of 42 indels, 11 of which represent a large deletion in all B. grandii grandii clones (see Additional file 3). The average sequence length is 516 bp, with variants ranging from 505 bp (GG/Cag137-c2) to 520 bp (GB/Tbe4-c4 and c5).
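As an illustration of how a consensus can be longer than any single clone (531 bp versus a maximum of 520 bp), the following minimal Python sketch builds a consensus by taking the most common non-gap base in every alignment column. This rule, the function name and the toy alignment are assumptions made for illustration; the text does not state how the consensus was actually derived.

```python
from collections import Counter

def consensus(aligned):
    """Most common non-gap base per column; all-gap columns are dropped.

    Assumed rule for illustration only, not the study's documented method.
    """
    out = []
    for column in zip(*aligned):
        bases = [b for b in column if b != "-"]
        if bases:
            out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)

# Toy alignment (hypothetical, not data from the study)
toy = ["ACG--TA", "AC-TTTA", "ACGT-TA"]

cons = consensus(toy)
longest_clone = max(len(s.replace("-", "")) for s in toy)
print(cons, len(cons), longest_clone)
```

Because every column with at least one base contributes a consensus position, indels scattered across different clones make the consensus span more positions than the longest individual sequence.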
Closer examination of the sequence revealed that Bag530 repeats can be further divided into three shorter subunits of 119 bp (A), 276 bp (B) and 120 bp (A'), respectively; the A and A' subunits show 73.2% sequence similarity. Because Bag530 is tandemly arranged, and because repeat boundaries are usually defined arbitrarily by the restriction site, we argue that the functional order of the Bag530 subrepeats should actually be B-A'-A, since the last Bag530 monomer of the array appears to be truncated just downstream of the A subrepeat (see above). Moreover, a putative promoter sequence (5'-TATATAGGGGGT-3') occurs in each of the Bag530 repeats, suggesting that these sequences are actually spacer promoters. This hypothesis is also supported by the presence in each repeat of a 28 bp palindromic motif (5'-CCCGGCGATCGAGGCCTCGATCGCCGGG-3'), 88 bp upstream of the putative spacer promoter sequence. The local twofold symmetry created by a palindrome is thought to provide a binding site for DNA-binding proteins that are often dimeric, such as the UBF factor involved in the RNA-PolI machinery [24].
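The palindromic nature of the 28 bp motif can be checked directly: a DNA palindrome reads the same as its own reverse complement. The short Python sketch below applies this check to the motif exactly as printed above; the function names are illustrative.

```python
# Reverse-complement check for the 28 bp motif quoted in the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(seq: str) -> bool:
    """True when the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)

motif = "CCCGGCGATCGAGGCCTCGATCGCCGGG"  # motif as printed in the text
print(len(motif), is_palindromic(motif))  # prints: 28 True
```

The motif is a perfect palindrome over its full length, with the axis of symmetry between positions 14 and 15.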
Variability of 3'IGS-ETS and Bag530 sequences
In the automictic parthenogen B. atticus, the overall mean distance for the 3'IGS-ETS is 0.010 ± 0.002. The distances among the distinct allozymic and karyological races (see Background) are of the same order of magnitude: 0.016 ± 0.003 for B. atticus cyprius vs. B. atticus carius and 0.012 ± 0.002 for B. atticus cyprius vs. B. atticus atticus, while the value between B. atticus atticus and B. atticus carius is lower (0.007 ± 0.001). In contrast, the bisexual B. grandii shows a nearly 6-fold higher overall 3'IGS-ETS distance (0.058 ± 0.005) than the unisexual B. atticus. Among the B. grandii subspecies, distances range from 0.039 ± 0.004 (B. grandii maretimi vs. B. grandii benazzii) to 0.083 ± 0.005 (B. grandii benazzii vs. B. grandii grandii).
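To make the notion of an overall mean distance concrete, the following Python sketch averages a pairwise distance over all sequence pairs in an alignment. It assumes the uncorrected p-distance (proportion of differing sites, with gapped positions skipped); the substitution model actually used for the values above is not stated in this section, and the toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping positions gapped in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def mean_pairwise_distance(seqs: list[str]) -> float:
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Toy alignment (hypothetical, not data from the study)
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACG-AC"]
print(round(mean_pairwise_distance(toy), 4))
```

With real data, the same averaging restricted to pairs drawn from two different subspecies would give between-group values of the kind reported above.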
Variability is not evenly distributed along the 3'IGS-ETS sequence: sliding window analysis showed that nucleotide diversity drops progressively toward the 18S gene, which is likely due to a selective sweep (Figure 3A). Moreover, at the site of the putative tsp gene promoter, the only observed variation is a single substitution (G > A) in the B. grandii benazzii sample (5'-TATATTAGAAGG-3'). The high level of sequence conservation in this region further supports its structural and functional role in transcription.
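A sliding window analysis of this kind can be sketched in Python as follows: the alignment is cut into windows, and the mean pairwise difference among sequences is computed within each one. The window size, step, gap handling and toy alignment are all assumptions for illustration; the settings used for Figure 3A are not given in this section.

```python
from itertools import combinations

def pairwise_diff(a, b):
    """Fraction of differing sites between two aligned strings, ignoring gapped columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)

def sliding_window_diversity(seqs, window, step):
    """Return (window_start, mean pairwise difference) for each window."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(range(len(seqs)), 2))
    profile = []
    for start in range(0, length - window + 1, step):
        end = start + window
        total = sum(pairwise_diff(seqs[i][start:end], seqs[j][start:end])
                    for i, j in pairs)
        profile.append((start, total / len(pairs)))
    return profile

# Toy alignment (hypothetical): variable at the 5' end, conserved toward the 3' end
toy = [
    "ACGTACGTACGT",
    "ATGAACGTACGT",
    "ACCTACGTACGT",
]
profile = sliding_window_diversity(toy, window=4, step=4)
print(profile)  # prints: [(0, 0.5), (4, 0.0), (8, 0.0)]
```

In the toy example the diversity falls to zero in the 3' windows, mimicking the drop toward the 18S gene described above.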
The overall variability of Bag530 in the unisexual B. atticus is considerably lower than that in the sexual B. grandii (0.018 ± 0.003 and 0.045 ± 0.006, respectively). The distances between B. atticus subspecies are: B. atticus cyprius vs. B. atticus carius (0.022 ± 0.005), B. atticus cyprius vs. B. atticus atticus (0.018 ± 0.003), and B. atticus atticus vs. B. atticus carius (0.019 ± 0.003). Among the B. grandii subspecies, the distances are: B. grandii maretimi vs. B. grandii benazzii (0.021 ± 0.005), B. grandii benazzii vs. B. grandii grandii (0.072 ± 0.011), and B. grandii grandii vs. B. grandii maretimi (0.078 ± 0.012).
A homogeneous range of variability was found within the populations of B. atticus atticus (Paleochora, 0.013 ± 0.004; Castel di Tusa, 0.012 ± 0.003; and Israel, 0.015 ± 0.003), B. atticus carius (Neraida, 0.009 ± 0.003) and B. atticus cyprius (Episkopi, 0.009 ± 0.002). Within B. grandii grandii and B. grandii benazzii, Bag530 also shows similar levels of variability (0.010 ± 0.003 and 0.007 ± 0.002, respectively), while B. grandii maretimi showed a distance of 0.015 ± 0.003, so that this subspecies seems at first glance to be more variable than the others. However, this higher value is mainly due to a single Bag530 clone, GM/Mar1-c3 (see Discussion): when it is excluded from the analysis, the variability falls to 0.010 ± 0.003, comparable to the values observed for B. grandii grandii and B. grandii benazzii.
It is interesting to note that sequence variability is not uniformly distributed in Bag530: two minima fall within the B subunit, one in the region including the 28 bp palindromic motif (see above), and the other in the region including the RNA-PolI promoter (Figure 3B). In fact, the promoter-like sequence shows no variation among all clones, as has been observed in other organisms. It has been proposed that these sequences have the potential to form strong secondary structures, suggesting that the region may be under functional constraint.
Phylogenetic analysis
Neighbor Joining, Maximum Parsimony and Maximum Likelihood trees based either on Bag530 clones or 3'IGS-ETS sequences showed the same basic topology, with clones/sequences of B. atticus and B. grandii falling into two major distinct clades, well supported by bootstrap values. Here, for brevity, we report only Maximum Likelihood trees.
Trees based on Bag530 sequences (Figure 4A) showed two major distinct clades, one comprising the B. grandii sequences and the other the B. atticus sequences. Within B. atticus, clones grouped into two well-supported clusters: the first includes Bag530 clones of specimens collected in the western part of the species range (B. atticus carius and B. atticus atticus from Italy and Greece), while the second includes clones from specimens living in the Eastern Mediterranean (B. atticus cyprius from Cyprus and B. atticus atticus from Israel). Within each of the two B. atticus clusters, clones are completely intermingled, with no evident geographical or population pattern.
The B. grandii clones fall into three distinct clades, one for each subspecies, with B. grandii benazzii and B. grandii maretimi being more closely related. A single clone obtained from B. grandii maretimi (GM/Mar1-c3) clusters peculiarly, being more similar to the B. grandii benazzii variants (Fig. 4A). Comparison of clones from B. grandii benazzii and B. grandii maretimi identified 12 diagnostic positions: GM/Mar1-c3 carries the B. grandii benazzii state at the first nine diagnostic sites and the B. grandii maretimi state at the last three; it may therefore be the result of a gene conversion event.
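The reasoning behind the diagnostic positions can be sketched in Python: a column is treated as diagnostic when each group is fixed for one base and the two groups differ, and a query clone is then labeled at each such column by the group whose base it carries. This definition of "diagnostic", the function names and the toy sequences are assumptions for illustration, not the study's data or procedure.

```python
def diagnostic_positions(group_a, group_b):
    """Columns where each group is fixed for a single base and the groups differ."""
    positions = []
    for i in range(len(group_a[0])):
        col_a = {s[i] for s in group_a}
        col_b = {s[i] for s in group_b}
        if len(col_a) == 1 and len(col_b) == 1 and col_a != col_b:
            positions.append(i)
    return positions

def assign_sites(query, group_a, group_b, label_a="A", label_b="B"):
    """Label the query at each diagnostic column by the group whose fixed base it carries."""
    calls = []
    for i in diagnostic_positions(group_a, group_b):
        if query[i] == group_a[0][i]:
            calls.append((i, label_a))
        elif query[i] == group_b[0][i]:
            calls.append((i, label_b))
        else:
            calls.append((i, "?"))
    return calls

# Toy aligned clones (hypothetical); column 7 is polymorphic within the first group
benazzii_like = ["AAGTCCGA", "AAGTCCGC"]
maretimi_like = ["TAGACCTA", "TAGACCTA"]
query = "AAGTCCTA"

calls = assign_sites(query, benazzii_like, maretimi_like, "benazzii", "maretimi")
print(calls)  # prints: [(0, 'benazzii'), (3, 'benazzii'), (6, 'maretimi')]
```

A run of labels from one group followed by a run from the other, as in the toy output, is the kind of mosaic pattern that a conversion event could produce.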
The tree based on 3'IGS-ETS sequences (Fig. 4B) shows the same basic topology as that based on Bag530 clones.