Overview of collected data
For the phylogenetic analysis we retained 179 CYTB sequences of at least 700 bp (133 new sequences and 46 sequences from GenBank) representing as complete a geographical distribution of each clade as possible (Additional file 1). The remaining 478 sequences (usually shorter and/or from the same or closely neighbouring localities), including 16 sequences obtained by 454 pyrosequencing of old museum samples, were unambiguously assigned to particular MOTUs by neighbour-joining analysis in MEGA 5.05 (bootstrap values higher than 95%), and these data were used for mapping the geographical distribution of phylogenetic clades.
We also selected 1–2 individuals from each of the main significantly supported CYTB clades (if the tissues were available) and sequenced them at the IRBP gene. The final phylogenetic analyses included 42 sequences of IRBP (32 new sequences and 10 sequences from GenBank; see Additional file 1) from all main species groups except the baoulei group (see below). ML analyses were performed separately for both genes, and because the topology of the trees was very similar (although the resolution of IRBP was much lower; Additional file 3), we finally performed both ML and BI reconstructions using only a concatenated CYTB and IRBP dataset produced in SEAVIEW [49].
Phylogeny of African Nannomys
Phylogenetic trees based on the concatenated dataset were well resolved, and the topology of the 179 ingroup sequences was very similar in the ML and BI analyses (Figure 2). Subgenus Nannomys (including “Muriculus” imberbis; see [8]) was strongly supported. There are three long branches representing three ancient mountain species with unresolved relationships to other groups (M. sp. “Nyika” = MOTU 1, M. imberbis = MOTU 2, and M. sp. “Harenna” = MOTU 3) and five well supported species groups. We hereafter call them the triton, setulosus, baoulei, sorella, and minutoides groups, following the previous use of these names, which refer to the best known species within the particular clades. Each group contains several distinct lineages that may represent separate species; the most diversified is the minutoides group. The relationships among species groups are not well resolved, but in most topologies the triton group clusters, without significant support, with the three ancient species, while all other species groups cluster together. Within the latter, the setulosus group separates first, and the baoulei group is the sister of the sorella group (Figure 2).
Number of potential species and their distribution
The application of the GMYC model provided the delimitation of 49 maximum likelihood entities (hereafter GMYC-species; 95% CI = 42–62 entities) based on the ML estimate of speciation-coalescence threshold at 0.46 (0.27–0.86) Mya. Figure 3a depicts support for the “intraspecific” basal splits as coalescences as well as support for “interspecific” splits as speciation events. In both cases white circles indicate support < 0.95 and black circles > 0.95. Low “intraspecific” support suggests there may be more species present, whereas low “interspecific” support suggests the two sister clades may be in fact conspecific populations. Where two neighbouring “interspecific” and “intraspecific” supports are low, the speciation-coalescence transition is blurred.
K2P distances among the GMYC-species (3.16-20.77%) did not overlap with “intraspecific” distances (0.12-2.38%) (Additional file 4). The detailed analysis of the geographical distribution of GMYC-species showed that many sister groups among them are parapatric, i.e. they most probably represent the results of allopatric differentiation and secondary contacts. For example, in the clade corresponding to M. minutoides in previous studies (e.g. [9]), the GMYC method delimited 12 GMYC-species with a prevailing parapatric distribution pattern and with “interspecific” K2P distances of 3.27-6.96%. Using the threshold value of 7.3%, we grouped these lineages and considered them as phylogeographical differentiation within the single species M. minutoides (see Figure 1f for the distribution of phylogeographical lineages that roughly correspond to “species” identified by the GMYC method). Using this combined approach (i.e. analysis of the geographic distribution of GMYC-species and a threshold of K2P distances), we reduced 49 GMYC-species to 27 highly supported molecular operational taxonomic units (MOTUs, Figure 3a), which are further discussed below. Genetic distances among the 27 MOTUs were always significantly higher than, and did not overlap with, those within MOTUs (Additional file 4).
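The K2P (Kimura 2-parameter) distances reported above can be illustrated with a minimal sketch. It uses the standard formula d = -½ ln[(1 - 2P - Q)·√(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name `k2p_distance` and the pairwise-deletion handling of gaps and ambiguous bases are assumptions made for illustration; the distances in this study were computed with dedicated software, not with this code.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with a gap or an ambiguous base in either sequence are skipped
    (pairwise deletion; an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the result by 100 gives a percentage that can be compared with the 7.3% threshold mentioned in the text, for example `k2p_distance(a, b) * 100 < 7.3`.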
There were 17 MOTUs that exactly matched a single GMYC-species, 11 of them represented by more than one sequence. Seven MOTUs comprised two GMYC-species, two MOTUs were composed of three GMYC-species, and a single MOTU, MOTU 27 = M. minutoides, comprised 12 GMYC-species (Figure 3a). In 12 cases, however, there was strong support for the presence of multiple species within a single MOTU (marked by black circles left of the GMYC threshold in Figure 3a).
Below we follow the nomenclature of [10], which recognizes 18 valid species. Possible names for newly recognized MOTUs are discussed in the text.
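As a quick consistency check on the counts above, the tally of MOTUs by the number of GMYC-species they contain can be summed. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# Number of MOTUs (values) containing a given number of GMYC-species (keys),
# as stated in the text.
motus_by_size = {1: 17, 2: 7, 3: 2, 12: 1}

total_motus = sum(motus_by_size.values())
total_gmyc_species = sum(size * count for size, count in motus_by_size.items())

print(total_motus, total_gmyc_species)  # 27 49
```

Both totals agree with the 27 MOTUs and 49 GMYC-species reported in the text.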
(1) Ancient mountain lineages (Figure 1b):
These three lineages do not form a monophyletic group and have very restricted distribution ranges. They are known from only a few individuals captured in the highest East African mountains. They were not included in previous phylogenetic studies of Nannomys, and on the phylogenetic tree they form very long branches. In most topologies they are related to the triton group, but not always with significant nodal support.
(MOTU 1) Mus sp. “Nyika”
It is a very distinct ancient lineage of Nannomys, known from a single, relatively large individual (14 g) captured on the high plateau of the Nyika Mts. in Malawi (ca. 2100 m a.s.l.). Although partially broken, the cranium of this specimen clearly shows features that are typical for insectivorous rodents, namely proodont (forward oriented) incisors and slender mandibles. This lineage is sympatric with MOTU 17 (M. neavei) and even syntopic with MOTU 6 (M. cf. callewaerti).
(MOTU 2) Mus imberbis Rüppell, 1842
It is an easily distinguished taxon, very large (the sequenced individual weighed 25 g) and with a black dorsal stripe. It has been considered a separate genus, Muriculus, but genetic analysis of a recently captured individual clearly shows that it is an internal lineage of Mus [8]. It is an endemic species of the high plateaux of Ethiopia, known from only a few individuals (reviewed in [8]).
(MOTU 3) Mus sp. “Harenna”
It is a large species (ca. 16 g), very probably endemic to the moist Harenna forest in the Bale Mts. in Ethiopia, a region with very pronounced endemicity [50],[51]. Based on morphometry this taxon was previously reported as M. triton [50], and in most topologies it is also the sister taxon to the triton group. Genetically it is a very distinct lineage (13.5-14.4% K2P distance to taxa of the triton group) with a karyotype remarkably different from that of M. triton [52]. Earlier studies have already suggested that this taxon represents a valid species [51]. It can be sympatric with M. mahomet, but differs in habitat preferences; M. sp. “Harenna” lives mostly in forests, while M. mahomet inhabits more open grassy habitats [[53]; L. Lavrenchenko, pers. obs.].
(2) The triton group (Figure 1b):
This is a group of MOTUs of relatively large body size, distributed mostly south of the equator (largely parapatric with the setulosus group; see Figure 1b vs. 1c). Genetic data suggest substantial cryptic variability (K2P distance among the three MOTUs = 8.80-11.05%). Only the nominotypical MOTU has a clear valid name; the remaining lineages require further taxonomic study.
(MOTU 4) Mus triton (Thomas, 1909)
This species was described from Mt. Elgon in Kenya and we provide the sequence from the type locality. It is distributed in the Kenyan highlands and the northern part of the Albertine Rift. The same species probably also occurs in southern Sudan (described as M. imatongensis [54]), but this should be confirmed by barcoding Sudanese specimens.
(MOTU 5) Mus sp. “Kikwit”
This distinct genetic lineage within the triton group was detected in two localities in south-western Democratic Republic of Congo (DRC). It may represent a new species, but more material and analyses are necessary to substantiate this claim. This MOTU supports important biogeographical distinctiveness of the Kikwit region in DRC (see also MOTU 21 from the minutoides group). The type locality of Mus callewaerti (Thomas, 1925) (Kananga, Kasaï occidental, DRC) is relatively near, so it is possible that they are conspecific, but a comparison with the type material will be necessary before a final conclusion can be reached (see also MOTU 6).
(MOTU 6) Mus cf. callewaerti
This taxon forms a well-supported separate lineage within the triton group. Its distribution range comprises a fairly large area extending from the Tanzanian Eastern Arc Mountains through the Southern Rift Mountains and northern Zambia to the Angolan highlands. In the miombo woodlands of north-western Tanzania, its distribution range may overlap with that of M. triton, but no locality with sympatric occurrence was found in our study. The Angolan specimens were recently reported as M. callewaerti (Thomas, 1925) [14]. It is therefore possible that the whole clade should belong to M. callewaerti, but a comparison with type material will be necessary. The taxon prefers miombo woodland or montane forest edges. There is substantial genetic variability within this taxon, with animals from the Eastern Arc Mountains forming a distinct clade supported as a separate GMYC-species (Figure 3a).
(3) The setulosus group (Figure 1c):
We recognized five MOTUs within this highly supported monophyletic lineage. It includes relatively large-bodied species, with distribution ranges mostly north of the equator, i.e. largely parapatric with the triton group. Two of these MOTUs were only recorded in Ethiopia.
(MOTU 7) Mus cf. proconodon
It represents a lineage probably endemic to Ethiopia, where it mainly occurs in lowlands of the Rift Valley. We suggest assigning this MOTU to the species M. proconodon Rhoads, 1896, i.e. the Ethiopian taxon that was synonymised with M. setulosus [10] even if genetically it represents the most distinct lineage of the whole setulosus group.
(MOTU 8) Mus setulosus Peters, 1876
This highly supported MOTU from western-central Africa (north-west of the Congo River) represents the true M. setulosus (type locality is Victoria, Cameroon). The western border of its distribution likely lies in the dry region of the Dahomey gap. In the north-east (i.e. southern Central African Republic (CAR)), it is probably in contact with M. bufo (MOTU 10); the possible contact zone and reproductive barriers between these two taxa in CAR merit further study.
(MOTU 9) Mus cf. setulosus “West”
MOTUs 8–11 form a monophyletic group of four strongly supported lineages with roughly parapatric distribution (Figure 1c). Two of them (MOTUs 8 and 9) have been previously named M. setulosus (e.g. [9]). MOTU 8 is distributed in central African forests, while MOTU 9 in western Africa (west of the Dahomey gap). MOTUs 10 and 11 represent valid species M. bufo (Thomas, 1906) and M. mahomet Rhoads, 1896, respectively. The topology and genetic distances (K2P distance = 8.1%) suggest that MOTUs 8 and 9 should be given different names. Because M. setulosus was described from Cameroon (i.e. distribution area of MOTU 8), we suggest that the West African populations of M. cf. setulosus, i.e. MOTU 9, may represent a separate new species, but this claim needs to be substantiated by further taxonomic work.
(MOTU 10) Mus bufo (Thomas, 1906)
The species was described from the Ruwenzori Mts. in Uganda and it was considered endemic to the Albertine Rift. There are a few sequences identified as M. bufo in GenBank. The first (Acc. no. DQ789905), from Bujumbura in Burundi, was reported by [9] as an incorrectly assigned species. Recently, new sequences of M. bufo from Kahuzi-Biega (DRC) were published [14], and all clearly cluster with the new sequences from CAR, DRC and Kenya reported in our study. Furthermore, we obtained a short sequence from the paratype of M. bufo from DRC (locality Idjwi) that also grouped with this clade. Although a morphological comparison with additional type material is necessary, we suggest that M. bufo has a much larger distribution range than previously assumed. This taxon may also involve additional populations of the setulosus group from Eastern Africa, especially those assigned to M. emesi Heller, 1911 (described from Uganda; morphologically similar to M. mahomet, with which it was synonymised [10]), and M. pasha Thomas, 1910 (an East African taxon that was synonymised first with M. proconodon and later with M. setulosus [10]).
(MOTU 11) Mus mahomet Rhoads, 1896
It is an abundant species with a distribution range restricted to the Ethiopian Plateau. We provide the first sequences of this taxon, confirming its position within the setulosus group as a strongly supported monophyletic lineage. We therefore support the view of [55], who considered M. mahomet an Ethiopian endemic, contrary to previous opinions merging it with Kenyan and Ugandan populations (i.e. most probably with MOTU 10, which is a significantly supported sister group to M. mahomet; Figure 2).
(4) The baoulei group (Figure 1d):
This is a West African clade, until now known as a single species, but with very pronounced divergences between two subclades (mean K2P distance on CYTB = 9.46%) that have partially overlapping distribution ranges in Ghana and Ivory Coast. Only very limited genetic data are available, because the species of the baoulei group are probably rare or difficult to capture [12],[13],[23]. The species of this group occur in the forest-savannah ecotone and are generally larger than other West African species (except M. setulosus) [12]. The baoulei group is a sister lineage to the sorella group (Figure 2), which is also reflected in morphology [56].
(MOTU 12) Mus baoulei (Vermeiren & Verheyen, 1980)
The species M. baoulei was described from Lamto in the Ivory Coast [56]. Two individuals sequenced from the type locality [12] belong to the genetic clade that is distributed mainly in Ghana, Benin and western Nigeria (i.e. the type locality represents the westernmost record of this lineage).
(MOTU 13) Mus cf. baoulei “West”
Specimens from this lineage were found in Guinea, and single individuals were sequenced from the eastern Ivory Coast [12] and Ghana [23]. More detailed future studies (using more samples, morphology and nuclear markers) are required to resolve whether MOTUs 12 and 13 represent separate species.
(5) The sorella group (Figure 1d):
It is a lineage of relatively large animals living in the forest-savannah transition zones of the Congo Basin, but also reported from south-eastern Africa (Mozambique and Zimbabwe) [57]. Although very limited genetic data are available, our sampling shows very divergent sequences that may represent up to four species; more data are required for a taxonomic revision of this group.
(MOTU 14) Mus sp. “Dakawa”
Two sequences from Dakawa (Tanzania) belong to the M. sorella group, but they are very distinct from other lineages of the group (K2P distance = 8.74-9.75%). It is possible that they represent a new species, but more taxonomic research is necessary. There is an existing name, M. wamae, that may be valid for this MOTU. This taxon was described as a member of the sorella group from the Kapiti Plains in southern Kenya [57].
(MOTU 15) Mus sp. “Koi River”
A single specimen from the moist savannah area near the Koi River in south-western Ethiopia clearly belongs to the sorella group, but is very divergent at CYTB (K2P distances between MOTU 15 and other lineages of the sorella group are 9.72-9.83%). Further taxonomic work is necessary to resolve the taxonomic rank of this lineage. This is the first record of the sorella group in Ethiopia.
(MOTU 16) Mus sorella (Thomas, 1909)
The first sequence of this MOTU was published under the name M. sorella by [58] from Sangba (CAR). The species M. sorella was described from hills around Mt. Elgon, an area which has clear biogeographical connections to CAR (see e.g. MOTU 10 or clade C of MOTU 27; Figure 1c and f). We obtained one additional short sequence from this lineage by 454 pyrosequencing of a museum specimen from the Garamba National Park in north-eastern DRC, thus connecting Sangba with the type locality. However, it is also possible that these sequences represent another currently valid species described from CAR, i.e. M. oubanguii Petter & Genest, 1970 or M. goundae Petter & Genest, 1970. More samples and detailed analyses are required to resolve this taxonomic problem.
(MOTU 17) Mus neavei (Thomas, 1910)
Although more morphological comparisons are necessary, we hereafter call this south-east African clade M. neavei and we report the first sequences of this species. The type locality of M. neavei (also morphologically belonging to the sorella group; [57]) is Petauke, Zambia. In our material, this taxon is distributed in hilly areas of southern Tanzania, Malawi and one locality in Zambia (not far from the type locality). It occurs in sympatry with MOTU 6 from the triton group [57] and, in the Nyika Mountains in Malawi, also with MOTU 1. The records from the South African Republic (SAR) are not yet confirmed genetically; the specimen mentioned by [14] was finally identified as M. minutoides and no other sequences of M. neavei were obtained despite intensive recent sampling efforts in SAR (F. Veyrunes, pers. comm.).
(6) The minutoides group (Figures 1e-f):
This is the most diversified group within Nannomys, inhabiting various, mostly open habitats of sub-Saharan Africa. It harbours the real “pygmy” mice, i.e. the rodents with the smallest body size (some of them with body mass < 5 g). Most previously published genetic studies of Nannomys targeted representatives of this group. Our phylogenetic analysis reveals three clear subgroups: subgroup 1 (MOTUs 18 to 20), subgroup 2 (MOTUs 21 and 22), and subgroup 3 (MOTUs 23 to 27).
(MOTU 18) Mus sp. “Zakouma”
A single specimen of this taxon was captured in the Zakouma National Park in south-eastern Chad [11]. It is genetically very distinct from its sister species, M. mattheyi F. Petter, 1969 and M. haussa (Thomas & Hinton, 1920), and further taxonomic work on more material from southern Chad may confirm it as a new distinct species. Together with M. mattheyi and M. haussa, this species forms a monophyletic group that diverged in West African savannahs.
(MOTU 19) Mus haussa (Thomas & Hinton, 1920)
It is a Sahelian taxon, recorded in the belt from Senegal to western Chad [9]. As in M. mattheyi and other West African savannah species of rodents [59]-[61], there is also an indication of longitudinal genetic structure in M. haussa, but more detailed data are needed for more conclusive phylogeographical inferences.
(MOTU 20) Mus mattheyi F. Petter, 1969
M. mattheyi is a typical species of the Guinean savannah-forest mosaic from westernmost Africa (Senegal) to the Dahomey gap, the relatively dry region separating the Guinean and Congolese forest blocks [9]. It is divided into western and eastern phylogeographic subclades with a presumed contact zone in the Ivory Coast (not shown). It is often the most abundant Nannomys in rodent assemblages [13],[23].
(MOTU 21) Mus cf. kasaicus
Two sequenced individuals from the Kikwit region (DRC) formed this genetically very distinct MOTU. There are also indications from other rodent groups that the Kikwit area is a local centre of endemism (see e.g. MOTU 5 or [62]). An existing name, M. kasaicus (Cabrera, 1924), described from Kananga, Kasaï Occidental Province, DRC, for a taxon belonging morphologically to the M. minutoides group [10], may apply to this MOTU.
(MOTU 22) Mus indutus (Thomas, 1910)
M. indutus is a southern African species, found in a relatively large area from northern Botswana to southern SAR [11],[14],[63],[64]. Records from Zambia and Malawi are based on genotyping of old museum material [64] and should be taken with caution. It is probably sympatric with M. minutoides Smith, 1834 (= MOTU 27) in most of its distribution range.
(MOTU 23) Mus cf. gratus
Specimens from this taxon were typically captured in forest clearings and the ecotone between forest and open habitats in equatorial Africa. There are three distinct clades with a clear west–east geographical structure: (i) a single specimen from lowland tropical forest in Congo (K2P distance to the two remaining clades is ca. 7%); (ii) the Kisangani region in DRC; and (iii) both montane and lowland tropical forests in southern Kenya and northern Tanzania. More taxonomic work is necessary to link this clade to an existing species, possibly M. gratus (Thomas & Wroughton, 1910), a taxon from the minutoides group described from eastern Ruwenzori, “upper Congo” and the Virunga mountains. Again, a comparison with the types will be required to verify this hypothesis.
(MOTU 24) Mus cf. gerbillus
This taxon is distributed in dry Somali-Maasai savannah in Kenya and Tanzania. In all phylogenetic analyses, it is a sister clade to the Ethiopian MOTU 25 (mean K2P distance between these two clades is 8.87%). Further taxonomic work is necessary, but M. gerbillus (G.M. Allen & Loveridge, 1933) (currently the synonym for Tanzanian populations of M. tenellus) is an available name that may apply to this lineage.
(MOTU 25) Mus cf. tenellus
This lineage was found at two nearby localities in northern Ethiopia: Hagere Selam and the Mekelle University campus. It may represent true M. tenellus (Thomas, 1903), described from the Blue Nile in Sudan, but a comparison with the type material is necessary. In contrast, morphological studies of museum material suggested that most published Ethiopian records of M. tenellus were actually M. minutoides [10].
(MOTU 26) Mus musculoides Temminck, 1853
It is a typical species of the Sudanian savannah belt. It was previously reported from western Africa [11],[12],[17] and northern Cameroon [65]. We provide a new very distant record from western Ethiopia, representing the easternmost genetically confirmed locality of the species. Very probably it is also present in poorly sampled countries such as Chad, northern CAR and South Sudan.
(MOTU 27) Mus minutoides Smith, 1834
M. minutoides is a widely distributed species in most of sub-Saharan Africa (probably except continuously forested areas in the Congo Basin and deserts; Figure 1f). This MOTU also includes specimens from southern Ethiopia; some of them were previously called M. tenellus [9],[10]. This species has a very strong intraspecific phylogeographical structure. Median-joining network analysis of 131 sequences from this MOTU resulted in 84 haplotypes that form 11 strongly delimited haplogroups (Figure 4). The mean K2P-corrected distances among haplogroups ranged from 1.21% (TZw vs. KE) to 3.65% (ZA vs. Chin). All haplogroups are connected in the form of a star, suggesting multiple synchronous vicariance events. Allopatric divergences with subsequent expansions are further supported by current parapatric distribution of most clades and frequent, but narrow, secondary contacts among them (Figure 1f). The geographic structure within individual haplogroups is relatively weak, except clade SE, where it is possible to distinguish the separate sublineages from South Africa (h79-h81), Mozambique (h15-h17) and Tanzania (remaining haplotypes). Two haplogroups are only represented by animals from single localities (Minziro and Chingombe in Tanzania), but it is possible that they are more widespread in neighbouring regions in eastern DRC, where the relevant samples are missing.
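The first step behind a haplotype count such as the one above, collapsing identical sequences into haplotypes, can be sketched as follows. This is an illustration only: the function name `collapse_haplotypes` and the `h1`, `h2`, ... labelling by decreasing frequency are hypothetical, and the sketch requires exact matches between equal-length aligned sequences, whereas dedicated network software also handles missing data and ambiguous sites.

```python
from collections import defaultdict


def collapse_haplotypes(sequences):
    """Group sample names that share an identical aligned sequence.

    `sequences` maps sample name -> aligned sequence string.
    Returns a dict mapping a haplotype label to the list of sample names.
    """
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups[seq.upper()].append(name)
    # Most frequent haplotype first; ties broken by sequence for determinism.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return {f"h{i}": names for i, (_, names) in enumerate(ordered, start=1)}


example = {"s1": "ACGT", "s2": "acgt", "s3": "ACGA"}
haplotypes = collapse_haplotypes(example)
print(len(haplotypes))  # 2
```

The resulting haplotypes, together with the number of differences between them, are the input that a median-joining analysis connects into a network.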
Divergence dating
The basal split of the extant Nannomys was dated at 5.24 Mya, with a 95% highest posterior density (HPD) interval of 4.58–5.96 Mya. Successive divergence of the extant major species groups then took place throughout the Pliocene, with median estimates of divergence times ranging from 4.9 Mya (split-off of MOTU 1 “Nyika”) to 2.44 Mya, i.e. the divergence of MOTU 23 (cf. gratus) and MOTUs 24–27 (i.e. four other species of the minutoides group) (Additional file 2). Posterior estimates of divergence dates at the calibration points are shifted towards the past in the case of Apodemus (prior median 5.89, posterior median 7.38) and Arvicanthis-Otomys (6.81 vs. 8.13) but towards the present in the case of Mus (8.00 vs. 7.44). Two other divergence dates are also worth noting: the split-off of Myomyscus yemeni, estimated at 6.21 (5.12–7.33) Mya, which is consistent with its migration to the Arabian Peninsula across the land bridge during the Messinian crisis, and the origin of modern Otomys at 3.77 (2.83–4.81) Mya, which first appears in the fossil record around 3 Mya [[66], p.290]. Complete results of the divergence dating analysis are reported in Additional file 2.
The full set of branching times between the 27 MOTUs is given in Figure 3a. It is based on the secondary dating of the ultrametric tree for GMYC, but the posterior estimates of divergence dates are consistent with the previous analysis (compare Figure 3a and Additional file 2). The main species lineages diverged in the lower Pliocene (5.2-4.5 Mya), and an intensive period of speciation is also visible in the lower Pleistocene (2.1-1.6 Mya), when many extant lineages within the main species groups appeared.
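For readers unfamiliar with HPD intervals, the sketch below shows one common way to estimate a 95% HPD interval from a set of posterior samples: it is the narrowest window that contains 95% of the sorted samples. The function name `hpd_interval` is hypothetical, and the code assumes a unimodal posterior; the intervals reported here come from the dating software, not from this sketch.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("no samples")
    # Number of samples the interval must cover; the small epsilon guards
    # against floating-point error pushing an exact product just above
    # an integer.
    window = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of that size over the sorted samples, keep the narrowest.
    best = min(range(n - window + 1),
               key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[best], ordered[best + window - 1]
```

Unlike an equal-tailed interval, the HPD interval is not necessarily symmetric around the median, which matters for skewed posteriors of divergence times.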
Biogeographical analysis
Bayesian analysis of discrete traits in BEAST revealed that the most ancestral distribution (98% support) of Nannomys included the mountains of Eastern Africa (Figure 3b). This type of distribution is currently present in all three ancient monotypic lineages (MOTUs 1–3), as well as in numerous lineages of the triton and setulosus groups. There are two major habitat shifts in Nannomys evolution. (1) The lineage leading to the baoulei group colonized the forests (and forest-savannah mosaic) of western Africa ca. 4 Mya, where it later split into western and eastern sublineages; (2) the minutoides group descended from the mountains, adapted to more arid open habitats, and started to radiate across the whole of sub-Saharan Africa ca. 3.5 Mya. In the first radiation phase, MOTUs 18–27 speciated in savannah-like habitats over all of Africa (approx. 3.5-1.6 Mya). A geographically similar, but much more recent (ca. 1 Mya), radiation occurred inside MOTU 27, i.e. M. minutoides (Figures 1f, 3a, and 4).
Very similar results were obtained by the maximum likelihood approach in Lagrange (Additional file 5). Most basal splits occurred with the highest probability in the mountains of East Africa, where most of the MOTUs from the triton group also diverged. The first clear shifts to other habitats are visible in the ancestors of the baoulei group (to the forests or forest edges, where both MOTUs from this group still occur today) and in the ancestors of the minutoides group (to the savannah). The most intensive radiation in the latter took place in savannahs, with one shift to the forest habitat detected in MOTU 23 (M. cf. gratus). The estimates of ancestral ranges are less clear in the setulosus and sorella groups. While the former most probably started to diverge in the mountains (with subsequent spreading of two “setulosus” MOTUs to the forests of central and eastern Africa), the latter had ancestors occurring with similar support either in savannahs or in the hills of Eastern Africa.