Skip to main content

Complete mitochondrial genomes and updated divergence time of the two freshwater clupeids endemic to Lake Tanganyika (Africa) suggest intralacustrine speciation



The hydrogeological history of Lake Tanganyika paints a complex image of several colonization and adaptive radiation events. The initial basin was formed around 9–12 million years ago (MYA) from the predecessor of the Malagarasi–Congo River and only 5–6 MYA, its sub-basins fused to produce the clear, deep waters of today. Next to the well-known radiations of cichlid fishes, the lake also harbours a modest clade of only two clupeid species, Stolothrissa tanganicae and Limnothrissa miodon. They are members of Pellonulini, a tribe of clupeid fishes that mostly occur in freshwater and that colonized West and Central-Africa during a period of high sea levels during the Cenozoic. There is no consensus on the phylogenetic relationships between members of Pellonulini and the timing of the colonization of Lake Tanganyika by clupeids.


We use short-read next generation sequencing of 10X Chromium libraries to sequence and assemble the full mitochondrial genomes of S. tanganicae and L. miodon. We then use Maximum likelihood and Bayesian inference to place them into the phylogeny of Pellonulini and other clupeiforms, taking advantage of all available full mitochondrial clupeiform genomes. We identify Potamothrissa obtusirostris as the closest living relative of the Tanganyika sardines and confirm paraphyly for Microthrissa. We estimate the divergence of the Tanganyika sardines around 3.64 MYA [95% CI: 0.99, 6.29], and from P. obtusirostris around 10.92 MYA [95% CI: 6.37–15.48].


These estimates imply that the ancestor of the Tanganyika sardines diverged from a riverine ancestor and entered the proto-lake Tanganyika around the time of its formation from the Malagarasi–Congo River, and diverged into the two extant species at the onset of deep clearwater conditions. Our results prompt a more thorough examination of the relationships within Pellonulini, and the new mitochondrial genomes provide an important resource for the future study of this tribe, e.g. as a reference for species identification, genetic diversity, and macroevolutionary studies.

Peer Review reports


Lake Tanganyika has experienced a turbulent geological history of lake level fluctuations, shifting shorelines and transient hydrological connections, paving the way for a complex sequence of colonisations that gave rise to a diverse freshwater fauna with a high degree of endemism [1]. The lake was originally formed by lateral expansion of the western branch of the East African rift, crossing the predecessor of the Malagarasi–Congo River around 9–12 million years ago (MYA). Only 5–6 MYA its water levels rose high enough for the sub-basins and the swampy areas in between to fuse into the deep clearwater lake of today [2,3,4]. The lake has experienced large water level fluctuations since then [5, 6]. These events are reflected in the evolutionary history of the organisms inhabiting the lake. For example, the adaptive radiations of the Tanganyika cichlid tribes happened over several stages, with some ancestral species colonizing the lake in the early stages of its formation, while others diversified later when the historical sub-basin lakes fused [7, 8] or following the depression of the northernmost sub-basin around 7–8 MYA [9]. Yet other lineages were initially thought to have colonized the lake at an even later stage, and thus established themselves in an already present adaptive radiation [10, 11]. Recent work, however, suggests that the cichlid radiation unfolded completely within the temporal and spatial confines of Tanganyika [9, 12,13,14].

Next to this textbook example of adaptive radiation, Lake Tanganyika also harbours a small clade of two endemic clupeid species, Stolothrissa tanganicae and Limnothrissa miodon. These clupeids are members of the African clupeid tribe Pellonulini, one of the most diverse freshwater radiations of Clupeiformes with 22 species in 11 genera, most occurring either on the West coast of Africa (distribution from Senegal down to Congo/Angola), or in the Congo River system and its tributaries and lakes [15, 16]. The members of Pellonulini are thought to be derived from a group of sardine-like species whose ancestors originated from the Atlantic West coast of Africa during a period of high sea levels between 30 and 50 MYA [16,17,18]. The exact route this radiation took through the Congo Basin is unknown, and the relationships between pellonuline taxa remain inconsistent in published clupeid phylogenies [17, 19,20,21].

The Tanganyika sardines are the fully pelagic, planktivorous, endemic S. tanganicae and the semi-pelagic, more opportunistic L. miodon, which is originally endemic to Lake Tanganyika but has also established in other lakes in Central Africa after anthropogenic introductions. Both species are important fisheries targets in Lake Tanganyika and provide food and livelihood for millions of people [22, 23]. The colonization and subsequent speciation of the Tanganyika sardines has only been explicitly addressed once [17], and estimated as part of larger phylogenies twice more [19, 20]. In these studies, estimates of their divergence time are based on minimum one and maximum three mitochondrial genes, and show substantial variation, the youngest being at 3.91 MYA and the oldest at 8 MYA with a large credibility interval (CI). Lake Tanganyika was formed 9–12 MYA, with the northern and southern sub-basins forming at 7–8 MYA and 2–4 MYA, respectively. The fusion of the sub-basins and onset of clearwater conditions is estimated at 5–6 MYA. Keeping these estimates in mind, a divergence time of the two sardine species of 8–10 MYA would mean that the lineage leading to S. tanganicae and L. miodon started to undergo speciation soon after entering the not yet connected sub-basins of the proto-lake. An older divergence time would indicate riverine speciation and subsequent colonization of the proto-lake. In contrast, a more recent divergence time would agree with intralacustrine speciation i.e. after the sub-basins of the lake connected to form the deep rift lake we see today.

Robust phylogenies and estimates of divergence time between lineages are crucial to understanding the relationship between geological or hydrological events, speciation and realised biodiversity. Mitochondrial genes are routinely used for this purpose [24], but single-gene datasets have limited ability to recover true phylogenetic relationships, especially in more closely related species [13, 25, 26] and tend to overestimate divergence time [27]. Whole mitochondrial genomes can contain phylogenetic information that is lost when targeting a single gene, and have yielded higher resolution and better supported phylogenies in recent studies of fish [28, 29] and other vertebrates [27, 30, 31], especially when investigating recently diverged or taxonomically diverse taxa [27].

In this study, we use short-read next generation sequencing (NGS) to sequence and assemble the complete mitochondrial genomes of S. tanganicae and L. miodon. We then use the new sequences, together with all available full mitochondrial clupeiform genomes, to build the first phylogeny of members of Pellonulinae to include all mitochondrial protein-coding genes (PCGs), rRNA-genes and the D-loop (control region). We revisit the phylogenetic relationships within Pellonulinae and estimate the divergence time of the Lake Tanganyika sardines with improved resolution. We discuss the results in the light of the geological history of Lake Tanganyika.


New mitochondrial genomes, diversity and divergence

The mitogenome assemblies of S. tanganicae and L. miodon were 16,737 bp and 16,739 bp long, respectively. We annotated all 13 PCGs, 22 transfer RNA (tRNA) genes, 2 rRNA genes (total = 37 genes) as well as the control region (D-loop) in both assemblies in the typical fish and vertebrate mitochondrial gene order [32] (Fig. 1). Gene order analysis in CREx confirmed that all included clupeiforms follow the same gene order, except Ilisha elongata, where tRNA-Pro and tRNA-Thr appeared transposed.

Fig. 1
figure 1

Mitochondrial genomes of Stolothrissa tanganicae (upper) and Limnothrissa miodon (lower). Inner circle shading indicates GC-content. Outer circle: black regions are protein coding genes, red regions are tRNA genes, beige regions are rRNA genes, and brown is the D-loop. Regions closer to the centre are located on the minus strand

Nucleotide diversity (π), calculated based on alignments of mitochondrial genes between 107 clupeiform and 6 non-clupeiform fish species, showed peaks in the beginning and end of 16S rDNA, as well as in the genes coding for ND1, ND2, ND3, ND5 and ATP synthase membrane subunit 6 (ATP6) (Fig. 2). The genes coding for COI, COII, COIII and CYTB, along with some regions of 12S and 16S rDNA, were relatively less diverse. The alignments for the D-loop, ND4L, ND4 and ND6 genes contained too many gaps to accurately calculate nucleotide diversity and were excluded from this analysis.

Fig. 2
figure 2

Nucleotide diversity (π) at mitochondrial PCGs (green) and rRNA genes (blue). π was calculated for 107 clupeiform and 6 non-clupeiform fish species using a sliding window with a window size of 150 bp and steps of 35 bp. Values could not be calculated for the gene coding for ND4, ND6, ND4L and for the D-loop (red)

Analysis of pairwise genetic distance of PCGs revealed that S. tanganicae and L. miodon were among the 0.3% (PCGs) or 3% (non-coding regions) most similar species of Clupeiformes (17th or 206th most similar out of 5778 pairwise comparisons, respectively), and were the two most similar species of Pellonulini (1st out of 28 pairwise comparisons). When considering non-coding regions, the Tanganyika species pair was only the 22nd most similar, with L. miodon being more similar to most other pellonulines than to S. tanganicae. Among-group comparisons showed that L. miodon and S. tanganicae were almost equally differentiated from the remaining pellonulines when considering PCGs only (distance ± SE for S. tanganicae = 0.208 ± 0.006, L. miodon = 0.210 ± 0.006), but that S. tanganicae was more than twice as differentiated when considering non-coding regions only (distance ± SE for S. tanganicae = 0.221 ± 0.011, L. miodon = 0.106 ± 0.005).

Phylogenetic analysis

Maximum likelihood (ML) and Bayesian inference (Figs. 3, 4) placed S. tanganicae and L. miodon together with the other members of Pellonulini with high statistical support. Within Pellonulini, several genera appeared non-monophyletic. The position of Microthrissa royauxi was unresolved in these phylogenies, but a Bayesian analysis using only Dorosomatinae placed it paraphyletically with M. congica, Pellonula and Odaxothrissa losera with high confidence (Additional file 2: Fig. S2). The Lake Tanganyika sardines formed a well-supported clade nested within Potamothrissa, with P. obtusirostris more closely related to the Tanganyika sardines than to P. acutirostris. In phylogenies resulting from taxon-reduced datasets focusing on Dorosomatinae (Additional file 2), we did not observe any major topological changes compared to those based on the complete dataset. However, some deeper nodes within this subfamily were resolved, such as the placement of Sardinella lemuru with other species of Sardinella (Additional file 2: Figs. S1, S2). In addition, Bayesian inference including all Dorosomatinae and 21 other clupeiforms placed Gudusia chapra with species of Tenualosa, while Anodontostoma chacunda was placed with Nematalosa, Konosirus punctatus and Clupanodon thrissa. Hilsa kelee, Dorosoma, Sardinella and Harengula jaguana also clustered together in this Bayesian phylogeny, and Escualosa thoracata and Amblygaster sirm formed a well-supported clade (Additional file 2: Fig. S2).

Fig. 3
figure 3

Outgroup-rooted maximum likelihood phylogeny of Clupeiformes. Topology and branch lengths were estimated based on mitochondrial protein-coding genes, rRNA genes and D-loop sequence of 107 clupeiform and 6 non-clupeiform fishes. Node support was assessed by Shimodaira-Hasegawa-like approximate likelihood ratio tests (SH-aLRT%) and ultrafast bootstrap (UFBoot%). Nodes with SH-aLRT% < 75 and UFBoot% < 90 were polytomized and their support values are not shown. The scale bar indicates model-corrected evolutionary distance (expected number of nucleotide substitutions per site). Subfamilies are indicated on the right side in black, families in colour. Pellonulini is highlighted in green

Fig. 4
figure 4

Outgroup-rooted Bayesian phylogeny of Clupeiformes. Topology and branch lengths were estimated based on mitochondrial protein-coding genes, rRNA genes and D-loop sequence of 107 clupeiform and 6 non-clupeiform fishes. Node support was assessed by Bayesian posterior probabilities (BPP). Nodes with BPP < 0.85 were polytomized and their support values are not shown. Probabilities were rounded to the nearest 0.01. The scale bar indicates model-corrected evolutionary distance (expected number of nucleotide substitutions per site). Subfamilies are indicated on the right side in black, families in colour. Pellonulini is highlighted in green

Outside of Dorosomatinae, all subfamilies of Clupeiformes, except Clupeinae, and most of their genera were retrieved with high support. Several deeper node placements in both trees, including some of the traditional clupeiform families with low taxonomic coverage, such as Pristigasteridae, Dussumieridae and Clupeidae II, had low support. Overall, ML and Bayesian analyses were in agreement, with the exception of those deeper, poorly supported nodes. We also found some genera split up or ambiguously placed. For example, there was a closer relationship between Lycothrissa crocodilus and Setipinna melanochir than the latter with other species of Setipinna. Thryssa baleama also did not cluster with other representatives of its own genus. Ilisha elongata clustered with Pellona ditchela and Opisthopterus tardoore, while I. africana and I. sirishai branched off earlier. Only two of the three species of Sprattus clustered together. The third, S. sprattus, was sister to the clade consisting of the two species of Clupea.

Dating of divergence time

Bayesian analysis estimated the divergence time of the Lake Tanganyika sardines at 3.64 MYA [95% CI: 0.99, 6.29] and the divergence between the most recent common ancestor (MRCA) of the Tanganyika sardines and their closest living relative in the tree, P. obtusirostris, at 10.92 MYA [95% CI: 6.37, 15.48]. The split between Pellonulini and the other clupeids and thus the timing of a large marine incursion into north-western Africa, was estimated at 43.71 MYA [95% CI: 31.79, 55.63] (Table 1, Fig. 5).

Table 1 Comparison of key divergence times, taxa and markers in the pellonuline phylogeny between studies
Fig. 5
figure 5

Outgroup-rooted time-calibrated phylogeny of Clupeiformes. Divergence times were estimated using BEAST, based on the first and second codon positions of 13 mitochondrial protein coding genes of 107 clupeiform and 6 non-clupeiform fishes. Blue bars represent Bayesian 95% credibility intervals. Calibration points (C1–C3) are indicated on the corresponding nodes. Pellonulini is highlighted in green


We used NGS to sequence and assemble the complete mitochondrial genomes of the Tanganyika sardines, S. tanganicae and L. miodon, and built a phylogeny of Clupeiformes using full mitochondrial sequences with a focus on the West and Central-African tribe Pellonulini. Based on these complete mitogenomes, we estimated the divergence time of the Tanganyika sardines to investigate the timing of their speciation in relation to the geology of Lake Tanganyika.

Conserved gene order in Clupeiformes

Generally, mitochondrial gene arrangements have remained stable for long evolutionary times, but rearrangements do occur in many lineages of both invertebrates and vertebrates. Small rearrangements of neighbouring genes, for example clusters of tRNA-genes, and non-coding regions are especially common [33,34,35]. Several lineages of Actinopterygii are characterized by such rearrangements, but Clupeiformes is not one of them [32]. Our gene order analysis confirmed the conserved arrangement of mitochondrial genes in this order, aside from one transposition of two tRNA genes (tRNA-Pro and tRNA-Thr) in Ilisha elongata.

Inconsistent tree topologies within and outside Pellonulini

Both of our phylogenies (ML and Bayesian) support a single common ancestor for all included pellonuline species and recover Ethmalosa fimbriata as their sister species with high statistical support, consistent with Lavoué et al. [21]. This is in contrast with the results of Egan et al. [20] and Bloom and Lovejoy [19], neither of whom found good support for this sister-species relationship. Microthrissa appeared non-monophyletic in our study, and the Tanganyika sardines rendered Potamothrissa paraphyletic. The position of Microthrissa differs in almost every study that has addressed it. According to Egan et al. [20], M. royauxi is more closely related to the Tanganyika sardines and Potamothrissa, while M. congica clustered with Pellonula and Odaxothrissa. In the study of Wilson et al. [17], two specimens of M. royauxi did not even cluster together, but this may be a taxonomic artefact. Regardless of this possibility, M. congica was found more closely related to Pellonula than to M. royauxi. Bloom and Lovejoy [19], on the other hand, found both Microthrissa species more closely related to members of Pellonula and Odaxothrissa than to the clade including the Tanganyika sardines and Potamothrissa. In accordance, our analysis could not consistently recover the exact position of M. royauxi with high statistical support, but it did confirm that M. congica is more closely related to members of Pellonula and Odaxothrissa than to M. royauxi. The Lake Tanganyika sardines formed a well-supported clade nested within Potamothrissa. Contrarily, our study is the first to place P. obtusirostris and P. acutirostris in the same clade, albeit paraphyletically. With the improved resolution resulting from our whole mitogenome approach and the inclusion of a large number of clupeiform taxa, our study also confirms E. fimbriata as sister species of Pellonulini.

The source of these inconsistent topologies is unclear, but could be related to smaller, more variable and partly incomplete gene datasets in previous studies. Egan et al. [20] included only the gene coding for CYT-B for most members of Pellonulini, including the Tanganyika sardines, and three nuclear genes and/or the 16S rRNA gene for others. Bloom and Lovejoy [19] did not include O. losera and P. acutirostris and used CYT-B and 16S rRNA genes for most, and two nuclear genes for two species, while Wilson et al. [17] used only mitochondrial genes (CYT-B, 16S rRNA, 12S rRNA). None of the previous studies included more than around 5 kbp of alignment data, except Lavoué et al. [21], who included all PCG, rRNA and tRNA sequences which amounted to slightly over 10kbp. Taxonomic coverage was also highly variable, ranging from 49 to 190 clupeoid species, the lowest number belonging to Wilson et al. [17], which also had the largest credibility interval but had the highest taxonomic coverage of Pellonulini. The Tanganyika sardines and other pellonulines were also missing from several of these studies. Although the inclusion of taxa with incomplete datasets can help to resolve phylogenies [36], there is a trade-off with increased risk of phylogenetic artefacts, and difficulty detecting multiple substitutions [37, 38]. In contrast, our study had the first nearly complete dataset for all taxa, thanks to the readily available mitochondrial genomes from many pellonuline and other clupeid species.

Morphological diversity is relatively low in representatives of Pellonulini compared to for example cichlids [15, 39]. In the FAO species catalogue of clupeoid fishes, several ambiguous identifications and uncertain species descriptions are mentioned. For instance, the distinction between O. losera and O. vittata is based solely on the number of gill rakers, which also varies with the age of the specimen, a common occurrence among clupeid fishes. In P. leonensis, there is also evidence for undescribed subspecies exhibiting characteristics of both P. leonensis and P. vorax, or even specimens belonging to Cynothrissa [39]. It is thus not inconceivable that misidentifications have confounded past taxonomic studies, and that a taxonomic revision of these genera may be needed.

Outside of Pellonulini, we recovered most of the traditional families and subfamilies of Clupeiformes with high statistical support. The positions of the families with lower taxonomic coverage, including Pristigasteridae, Chirocentridae, Dussumieridae, remained unresolved, Clupeidae was not monophyletic and several species were also placed away from their congeners, for example in the genera Setipinna, Thryssa, Ilisha and Sprattus. These latter two findings are consistent with previous studies of Clupeiformes with taxonomic coverage comparable to ours [19, 20, 40], underlining the need for revision of these taxa.

Inconsistent divergence time estimates in Clupeiformes

Bayesian analysis estimated the divergence time of the Lake Tanganyika sardines at around 3.64 MYA [95% CI: 0.99, 6.29]. This estimate is younger than the previous estimate by Wilson et al. [17] at 7.6 MYA, but within its credibility interval [95% CI: 2.1, 15.9]. Conversely, their estimate fell just outside our credibility interval. Our estimate is also younger compared to the one by Bloom and Lovejoy [19] at 6.61 MYA [95% CI: 2.20, 11.01], but in accordance with Egan et al. [20] at 3.91 MYA [95% CI: 1.19, 6.64]. Our credibility intervals were smaller than those of Wilson et al. [17], but comparable to Bloom and Lovejoy [19] and Egan et al. [20]. Other, deeper nodes of interest also differed between the studies. Overall, our estimates most closely agreed with those of Egan et al. [20], except for the divergence time of the pellonulines from other clupeids, which was around 10 MYA older in our study. Bloom and Lovejoy [19] found consistently older estimates, while Wilson et al. [17] estimated the more recent nodes as older, and the deeper nodes as younger than the other three studies.

The differences in estimated divergence times between the studies can be partially attributed to the different estimation procedures. Specifically, methodological choices for Bayesian dating of nodes can strongly influence the accuracy and precision of the divergence time estimates, for example the choice of priors to account for uncertainty surrounding the age of a fossil, and the choice of clock model [31, 41]. Almost all the studies we compared here dated divergence using a fossil-calibrated uncorrelated (relaxed) clock model implemented in BEAST, accounting for substitution rate heterogeneity among branches. Six to eight fossil calibrations were specified as exponential priors with soft maximum ages. Only Wilson et al. [17] used an autocorrelated clock approach with seven fossil calibrations specified as uniform priors. In accordance with the three more recent studies on the divergence times of Clupeiformes [19,20,21], but in contrast with Wilson et al. [17], we chose a relaxed clock model, in accordance with the varying speeds of diversification in different clupeid lineages [40]. A possible caveat of our divergence time dating analysis is the exclusion of highly variable sequences of the D-loop region, rRNAs and third codon positions, which was necessary to achieve convergence of the model. These sites may be useful to resolve ambiguous recent divergences [42] and produce different, most likely slightly older, divergence time estimates.

Ideally, time-calibration of the diversification of the pellonulines would be based on fossils within this clade. Unfortunately, there are no known pellonuline fossils, and the fossil record of fishes of Central Africa in general is sparse [16]. In contrast to Wilson et al. [17], Bloom and Lovejoy [19] and Egan et al. [20], we decided to use pre-calibrated time scaling points (“secondary calibration”) from a previously published study [43] instead of direct fossil calibration (“primary calibration”). Secondary calibrations can result in younger or older, depending on their placement, and falsely narrowed estimates of node ages [38, 44], but were necessary in our case to achieve convergence. Some caution is warranted when interpreting the width of our confidence intervals, but for the reasons described in the methods section ‘Dating of divergence time’, we are confident that our methodological choices have produced node age estimates with the best possible accuracy. The robustness of our approach is further supported by correspondence of our estimates to fossil ages from the literature. The ancestor of Clupeoidei was estimated at 127.48 [95% CI: 95.70, 159.27] in our study, older than the minimum age of 125 MYA attributed to the fossil of Cynoclupea nelsoni [45]. The MRCA of Engraulidae was estimated at 60.10 MYA [95% CI: 44.70, 75.50], corresponding to the minimum age of the fossil of Eoengraulis fasoloi of 50 MYA [46]. Dorosoma petenense was estimated as relatively old at 17.22 MYA [95% CI: 5.46, 28.97], but again within the boundaries of the minimum age of 2.5 MYA of the oldest fossil of Dorosoma petenense [47].

Utility of whole mitochondrial genomes for phylogenomic analysis and divergence time dating

Mitochondrial protein coding genes vary in their ability to recover known phylogenetic topologies. The sequences of ND4, ND5, COI and CYTB genes are generally useful for phylogenetic questions, while fast evolving genes such as ND4L and ATP8 are regarded as poor phylogenetic performers, although this differs per study and taxon [25, 26]. Indeed, we found relatively high nucleotide diversity in some genes or regions compared to others, including parts of the genes coding for the ATP6, ND1, ND2, ND3 and parts of ND5. However, whole mitochondrial genomes can recover accurate phylogenies with high resolution, despite containing “poor” phylogenetic performers [27, 48]. A smaller subset of “good” mitochondrial genes may be able to recover the same topology as the entire mitochondrial genome, but this is highly taxon-specific [25,26,27, 48]. Thus, utilizing more markers that provide complementary information is preferable if previous taxon-specific information on the utility of single markers is not available [27].

Phylogenies based on mitochondrial DNA (mtDNA) alone come with limitations [24, 49]. First, mtDNA is subject to frequent introgression, horizontal gene transfer, incomplete lineage sorting, and mitochondrial capture. As a result, past hybridization can go undetected in the absence of nuclear or morphological data [24, 50,51,52,53,54]. Second, sequencing or assembling nuclear pseudogenes of mitochondrial origin into mitogenomes can introduce false polyphyly within species or closely related taxa [55]. Third, the fast substitution rates of mtDNA make accurate estimation of deep divergences difficult due to problems with saturation and ensuing homoplasy [26, 27]. Overall, nuclear and mtDNA data can contrast or complement each other both in terms of tree topology and branch lengths [7, 8, 40]. Inconsistencies between them should be considered informative, and future studies should strive to include both data sources to reconstruct more complete evolutionary scenarios [49].

There are several examples of ongoing hybridization between clupeid species [56,57,58,59]. With their similar habitat, nursery areas, and modes of reproduction, the Tanganyika sardines may well have a history of introgression that has remained concealed here. At the start of the Eocene (around 50 MYA), global sea levels were more than 100 m higher than today, and steadily decreased over the next 20 million years, with smaller maxima in between [16, 60, 61]. These fluctuations may have allowed frequent isolations and reconnections in the Congo Basin, favouring hybridization between other newly formed pellonuline species as well. To completely resolve the species tree of Pellonulini, phylogenomic analyses using nuclear genomic markers and multiple individuals per species are needed (but see Bloom & Egan [40], who found similar divergence time estimates with mtDNA and nuclear DNA datasets).

Updated divergence time suggests intralacustrine speciation of the Tanganyika sardines

Present-day distributions of several Afrotropical freshwater fish lineages show striking overlap, including members of Pellonulini, Kneriidae and Phractolaemidae, providing evidence for a single marine-freshwater transition across West- and Central Africa around 50 MYA during a period of high sea levels [16]. Despite the high sea levels, Lake Tanganyika was likely never in direct contact with the ocean and has not experienced much higher water levels than at present [3, 6]. Furthermore, due to uplift of the borders of the Congo Basin from the Cenozoic onwards, the possibility of an additional marine incursion close to the lake is faint [18]. It is therefore more likely the Lake Tanganyika sardines evolved from riverine clupeids. Indeed, the presence of a large body of water covering a large area of the Congo Basin (“paleo-lake Congo”) until the Pliocene or early Pleistocene (2–12 MYA, [62, 63]), may have increased the connectivity between the Congo tributaries and its surrounding lakes, and may have facilitated the entry of riverine species into the predecessor of Lake Tanganyika at this time.

Our improved divergence time estimates of the Tanganyika sardines (3.64 MYA) and their MRCA from other pellonulines (10.92 MYA) help us to better understand their origin and colonisation time in connection to the geological history of the lake. Our estimates are compatible with (1) the entrance of the MRCA of the Tanganyika sardines into the newly formed Tanganyika basin (around 12 MYA) via the tributaries of the proto-Malagarasi-Congo River; and (2) intralacustrine speciation at the onset of deep- and clearwater conditions after the sub-basins fused (5–6 MYA). However, based on the 95% credibility intervals of our estimates, we cannot exclude the possibility that the MRCA of the Tanganyika sardines diverged from P. obtusirostris outside of the proto-lake and entered it sometime between the time of its formation and the fusion of its sub-basins.

Which environmental conditions triggered the divergence between S. tanganicae and L. miodon remains uncertain. Sexual selection, such as in cichlids [64], is unlikely to have played a large role due to the mode of reproduction of the clupeids. Ecological differences can be powerful drivers of speciation, even in (partial) sympatry [65, 66]. The newly fused basin, adding ecological heterogeneity to the ancestral sardine’s environment, may have favoured dietary specialization through divergent selection on polymorphic trophic traits. Niche separation and divergence can then prompt genetic reproductive isolation if reinforced by spatial or temporal separation of spawning or lower hybrid fitness [65,66,67]. Indeed, contemporary populations of L. miodon seem to spawn all year round and mostly in the littoral, while populations of S. tanganicae exhibit clear peak spawning times in the pelagic [68]. This suggests that at some point during their divergence, spawning became more common in their respective preferred habitats. An alternative explanation is that their speciation was triggered by periods of allopatry [65, 67]. Given our credibility intervals and the frequent water-level fluctuations potentially separating and reconnecting the southern and central sub-basins of Lake Tanganyika several times, it is likely that ancestral sardine populations frequently occurred in partial or complete isolation.

A Limnothrissa-like ancestor of the Tanganyika sardines?

According to our ML and Bayesian phylogenies, the Tanganyika sardines are most closely related to P. obtusirostris, P. acutirostris, and M. royauxi, but not M. congica, the closest living relative being P. obtusirostris. Ecological studies of these species are sparse, hindering systematic comparison. Much of their distribution overlaps and, aside from M. royauxi, stretches across most of the Congo Basin all the way down to the Lukuga River, which was connected to the Malagarasi River east of Tanganyika around the time of the lake’s formation. While M. congica and P. acutirostris occur in both rivers and lakes, P. obtusirostris and M. royauxi seem to be more strictly riverine [39]. The diet of P. obtusirostris and M. congica consists mostly of aquatic and terrestrial insects [39, 69], with occasional piscivory in M. congica [69]. In M. congica, strong seasonal effects of water level fluctuations on both diet and reproduction have been observed [69, 70]. Ecologically, L. miodon, with its generalist diet including insects and small fishes, is more similar to the riverine pellonulines than S. tanganicae, which is a strict planktivore. In addition, individuals of L. miodon and species of Potamothrissa share a morphological feature that is otherwise rare in clupeid fishes: a row of saw-like teeth at the side of the lower jaw [39]. We thus suggest that the ancestral Tanganyika sardine shared more ecological traits with L. miodon than with S. tanganicae. This is also reflected in the more shorebound and generalist lifestyle of L. miodon, and its ability to invade the Cahora Bassa reservoir though dispersal via the riverine environment of the Zambezi [71], suggesting a relatively high ecological flexibility compared to S. tanganicae [72], and thus a higher ability to colonize a new environment. Nevertheless, the presence of established contemporary populations of S. tanganicae in one of the Congo’s tributaries, the Lukuga, attest its ability to inhabit, or at least cross, non-pelagic environments, provided the water composition is sufficiently similar [73]. We also found larger genetic differentiation of S. tanganicae than L. miodon from the remaining pellonulines in non-coding regions. This could further support our hypothesis of a higher relatedness between L. miodon and the ancestral sardine, but may also indicate different demographic histories [74]. Kmentová et al. [75] found signatures of recent population expansion in both L. miodon and S. tanganicae, but these were more pronounced in the latter. The population expansion in S. tanganicae might be linked to the fusion of sub-basins, or any other major lake-level fluctuation that increased the amount of pelagic habitat. Similarly, species of the pelagic cichlid tribe Bathybatini showed recent demographic expansions, probably also linked to lake-level fluctuations [76].


Using NGS data, we assembled and annotated the full mitochondrial genomes of the Tanganyika sardines S. tanganicae and L. miodon. Putting them into phylogenetic context with full mitochondrial genomes of 107 other clupeiform species, we estimate their divergence time at 3.64 MYA, and divergence from their riverine ancestor at 10.92 MYA. This estimate implies that the MRCA of the Tanganyika sardines entered Lake Tanganyika shortly after its formation during a period of high connectivity of the Congo Basin’s water bodies. We suggest that the speciation event is likely to have been brought on by the fusion of Lake Tanganyika’s sub-basins and the subsequent clearwater conditions.

The mitochondrial genomes of S. tanganicae and L. miodon are valuable resources for future studies of the evolutionary history of these species at the population level, for example as a reference for barcoding, studies of their mitochondrial diversity and evolutionary history, as well as macroevolutionary study of relationships within Pellonulini and Clupeiformes. Future work should focus on the divergence time of different regions of the Tanganyika sardines’ genomes and compare them to a dataset of nuclear genes or genome-wide data. This, in combination with formal tests for hybridization, could help to gauge the role of introgression in the timing and the scenario of speciation. Nuclear genomic sequences from several individuals of all members of Pellonulini would allow a more precise reconstruction of their colonization of West-Africa and clarify the ambiguous classifications in this group.


DNA extraction, library preparation and sequencing

One female individual of the two species was collected from liftnet fishing catches on the night of 15th of December 2018 off the shore of Uvira, Democratic Republic of Congo. Fish were dissected to extract liver tissue, which was directly frozen on dry ice and subsequently stored at − 20 °C until extraction. High molecular weight genomic DNA (gDNA) was extracted using a Blood and cell culture DNA Midi Kit (Qiagen). Libraries were prepared for each species separately using Chromium Genome Library & Gel Bead Kit v.2 (10X Genomics, cat. 120258), Chromium Genome Chip Kit v.2 (10X Genomics, cat. 120257), Chromium i7 Multiplex Kit (10X Genomics, cat. 120262) and Chromium controller according to the manufacturer’s instructions with one modification (added shearing step before Illumina library preparation). Briefly, gDNA diluted to 1.02 ng/μl was combined with Master Mix, a library of Genome Gel Beads, and partitioning oil to create Gel Bead-in-Emulsions (GEMs) on a Chromium Genome Chip. The GEMs were isothermally amplified with primers containing an Illumina Read 1 sequencing primer, a unique 16 bp 10X barcode and a 6 bp random primer sequence. Barcoded DNA fragments were recovered for Illumina library construction. The amount and fragment size of post-GEM DNA was quantified using a Bioanalyzer 2100 with an Agilent High sensitivity DNA kit (Agilent, cat. 5067-4626). Prior to Illumina library construction, the GEM amplification product was sheared on an E220 Focused Ultrasonicator (Covaris, Woburn, MA) to approximately 350 bp (55 s at peak power = 175, duty factor = 10, and cycle/burst = 200). Then, the sheared GEMs were converted to a sequencing library following the 10X standard operating procedure. The library was quantified by qPCR with a Kapa Library Quant kit (Kapa Biosystems-Roche) and sequenced on a partial lane of the NovaSeq6000 sequencer (Illumina, San Diego, CA) with paired-end 150 bp reads.

Mitochondrial genome assembly

For mitogenome assembly, raw 10X Chromium reads were processed using the proc10xG package [77]. was run using default settings to remove GEM and individual sample barcodes. The resulting reads passed read assessment by FastQC v0.11.7 [78] without any quality problems, residual adapters or overrepresented sequences, the latter also commonly indicating adapter contamination. Mitogenomes of the two sardines were assembled from these barcode trimmed reads using MitoZ v.2.4-alpha [79] with default settings. Mitochondrial genes were annotated using the Mitofish annotator web service [80].

Taxonomic sampling and alignment

Taxonomic sampling for phylogenetic analysis included all members of Clupeiformes for which a complete mitochondrial genome is published (accessed 21st of October 2020, Additional file 1: Table S1). The sequences of Odaxothrissa vittata (NC_009590.1) and Etrumeus teres (NC_009583.1) were identical to Pellonula vorax and E. micropus, respectively, and were omitted from analysis. The outgroup was selected based on Lavoué et al. [21] and includes the denticle herring (Denticeps clupeoides), two alepocephaliforms, four ostariophysians and two euteleosts (Additional file 1: Table S1). We extracted mitogenomes and their annotations from NCBI using a combination of efetch [81] and custom Bash, Perl and Python scripts. We manually verified the new annotations of S. tanganicae and L. miodon by comparing the nucleotide and amino acid sequences, translated using vertebrate mitochondrial code, to the already published pellonuline mitogenomes, and checking for the presence of start and stop-codons at the appropriate positions in MEGA-11 [82].

We separately aligned each PCG using a codon-based MAFFT algorithm in the TranslatorX server [83]. We selected options for less stringent selection which allowed smaller final blocks with gap positions and less strict flanking positions. D-loop sequences were aligned using MAFFT v.7.470 with default parameters, and sequences of the rRNA coding regions using MAFFT with the -qinsi option [84]. Incomplete or missing regions were coded as missing data (N). We performed all alignments with and without alignment cleaning by Gblocks, further referred to as ‘complete’ and ‘trimmed’ alignments. We used AMAS v.0.98 [85] to separately concatenate the trimmed and complete alignments, producing one trimmed and one complete dataset. The phylogenetic content of these datasets was compared using likelihood mapping [86] implemented in TREE-PUZZLE v.5.3.rc16 [87]. We determined the optimal model for each dataset using jModelTest 2 [88] and specified these as input models for TREE-PUZZLE. Since there was no difference in phylogenetic content (86.6% fully resolved quartets, 2.3% partly resolved, 11.1% unresolved), we performed all subsequent analyses on the complete dataset of 18,279 bp.

Genetic diversity, divergence, and gene order

We calculated nucleotide diversity (π) of the final alignment using a sliding window analysis implemented in DnaSP v.6 [89] with a window size of 300 bp and steps of 15 bp. We used MEGA to quantify divergence between species and clades. We calculated pairwise genetic distances between all species, and mean between-group genetic distances between S. tanganicae, L. miodon and the remaining members of Pellonulini and Clupeiformes (in each of these comparisons excluding the other Tanganyika clupeid). The distances were calculated separately for PCGs (vertebrate mitochondrial code) and non-coding regions using a Tamura-Nei model including transitions and transversions, gamma-distributed rate variation among sites and heterogenous rate patterns among lineages. Gaps and missing data were deleted in a pairwise manner. The gamma parameter was estimated separately for the PCG and non-coding dataset using jModelTest 2. To estimate the relative similarity of S. tanganicae and L. miodon compared to similarities among other clupeids, we ranked all pairwise genetic distances of (1) Clupeiformes and (2) Pellonulini and calculated in which percentile the Tanganyika sardines fell using R v.4.0.4 [90]. Finally, we compared the gene order of all species included in our study using the CREx web application [91]. Two species, Alosa fallax and Hilsa kelee, were missing several markers (genes), and were thus excluded from this analysis.

Phylogenomic tree building

We ran IQ-TREE v.1.6.12 [92] twice on the complete dataset. The first run determined the best partition scheme (option -m MF + MERGE), allowing different models of molecular evolution in different genes/regions and, for PCGs, at the different codon positions [93]. The second run first implemented ModelFinder [94] to find the optimal model of evolution for each partition found by the previous round (options -spp and -m MFP), then constructed a ML tree, and finally assessed nodal support by 10,000 ultrafast bootstraps (generating support value UFBoot%) and 1,000 Shimodaira-Hasegawa-like approximate likelihood ratio test replicates (generating support value SH-aLRT%). A clade can be considered well supported if UFBoot% ≥ 95 (corresponding to a ~ 95% chance that the clade is true) and SH-aLRT% ≥ 80 [95, 96].

Using the IQ-TREE-derived partition and models, we constructed a Bayesian phylogeny in MrBayes v.3.2.7a [97], allowing estimation of the model parameters for each partition separately (unlinked character state frequencies, substitution rates of the GTR + I + G model). Two independent runs with 4 MCMC chains ran for 60 million generations, sampling every 500 generations and discarding the first 25% as burn in. The remaining samples were used to calculate Bayesian posterior probabilities (BPP) for each node in order to assess nodal support. A clade is considered well supported if BBP ≥ 0.9. The models converged, as indicated by the average standard deviation of split frequencies approaching zero, the absence of a trend in log likelihood of the runs, an Effective Sample Size (ESS) > 200, and the Potential Scale Reduction Factor approaching 1 [98].

Dating of divergence time

We estimated branch lengths and divergence times between S. tanganicae and L. miodon and five other nodes of interest (Table 1) using Bayesian relaxed molecular clock analysis implemented in BEAST v.2.6.3 [99] with the tree topology from MrBayes as a starting tree. We conducted four independent BEAST runs of 50 million generations. Convergence was ensured by checking if ESS was higher than 200 for all parameters using Tracer v.1.7.1 [100]. Trees were summarized and annotated using the TreeAnnotator module in BEAST. All final trees were visualized using ggtree v. in R [101].

For the time calibration of our tree, we first attempted a “primary calibration”, which relies solely on fossils, using the complete dataset. However, when this model failed to converge, even after hundreds of millions of generations, we modified our analysis in two ways. First, instead of primary calibration, we chose to apply “secondary calibration”, which uses estimates from an already existing phylogeny. We used three calibration points from a recent phylogeny of the teleosts using more than 30 fossils [43]. We included the MRCA of members of Clupeiformes and our outgroup (including Danio rerio, Cyprinus carpio and Chanos chanos) at 194 MYA (C1), MRCA of members of Clupeinae and Alosinae at 73 MYA (C2), and MRCA of Engraulidae at 61 MYA (C3). The calibration points were implemented as normal prior distributions for the node ages in BEAST. Secondary calibration inevitably incorporates geological and fossil uncertainty along with uncertainties associated with the primary dataset. They tend to push node age estimates into the more recent direction and falsely narrow the credibility intervals, especially when using a single old secondary calibration [38, 44], but see Powell et al. [102]. In our case, we presume the associated problems to be minimal for four reasons. (1) We used secondary calibrations for both old and younger nodes, which should diminish the tendency to estimate other nodes as younger [44]. (2) In Bayesian analysis implemented in BEAST, lognormally or exponentially distributed prior distributions are most commonly used for primary calibrations. These produce younger node age estimates and narrower credibility intervals than uniform (and presumably normally distributed) priors [44], which are more commonly used for secondary calibrations. (3) We validated the robustness of our secondary calibration by comparing the results of the BEAST dating to primary fossil calibration points from the literature, see discussion section ‘Inconsistent divergence time estimates in Clupeiformes’. (4) The true uncertainty associated with secondary calibrations from Hughes et al. [43] based on more than 30 well-characterized fossils is likely smaller than that produced by using only three or four primary fossils with lognormal distributions with large variance.

Second, we omitted the hypervariable D-loop sequence, regions coding for rRNA, and the third codon positions, leaving a site-reduced dataset with only first and second codon positions of all PCGs. We also merged the separate partitions found by IQ-TREE into two partitions containing first and second codon positions. Since variable regions can be informative for estimating recent divergences, we repeated the analysis with two taxon-reduced datasets focusing on Dorosomatinae, in which we kept these variable regions (Additional file 2).

Finally, we compared our divergence time estimates for six nodes of interest with published estimates [17, 19,20,21] (Table 1). Estimates and 95% CI that were not directly reported in these publications were extracted from time-calibrated trees using WebPlotDigitizer v.4.5 [103].

Availability of data and materials

The datasets supporting the conclusions of this article are available in the NCBI Sequence Read Archive [] and GenBank [Accessions: OP022425, OP021863] repositories. Custom scripts are available on Github:



Million years ago


Cytochrome c oxidase subunit I


Cytochrome B


Ribosomal RNA


NADH-ubiquinone oxidoreductase


NADH-ubiquinone oxidoreductase chain 4L


ATP synthase membrane subunit 6


Mitochondrial DNA


Genomic DNA


Gel bead-in-emulsions


Transfer RNA


Next generation sequencing


Protein-coding genes


Maximum likelihood


Most recent common ancestor


Credibility interval


Bayesian posterior probabilities


Effective sample size


Centre de Recherche en Hydrobiologie-Uvira


  1. Salzburger W, Van Bocxlaer B, Cohen AS. Ecology and evolution of the African Great Lakes and their faunas. Annu Rev Ecol Evol Syst. 2014;45(1):519–45.

    Article  Google Scholar 

  2. Cohen AS, Soreghan MJ, Scholz CA. Estimating the age of formation of lakes: an example from Lake Tanganyika, East African Rift system. Geology. 1993;21(6):511–4.

    Article  CAS  Google Scholar 

  3. Tiercelin J, Mondeguer A. The geology of the Tanganyika Trough. In: Coulter GW, editor. Lake Tanganyika and its life. Oxford: Oxford University Press; 1991. p. 7–48.

    Google Scholar 

  4. Tiercelin J-J, Lezzar K-E. A 300 million years history of rift lakes in central and east Africa: an updated broad review. In: Odada EO, Olago DO, editors. The East African great lakes: limnology, palaeolimnology and biodiversity advances in global change research, vol. 12. Springer: Dordrecht; 2002. p. 3–60.

    Chapter  Google Scholar 

  5. Cohen AS, Stone JR, Beuning KRM, Park LE, Reinthal PN, Dettman D, et al. Ecological consequences of early Late Pleistocene megadroughts in tropical Africa. Proc Natl Acad Sci U S A. 2007;104(42):16422–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Cohen AS, Lezzar KE, Tiercelin AJJ, Soreghan M. New palaeogeographic and lake-level reconstructions of Lake Tanganyika: implications for tectonic, climatic and biological evolution in a rift lake. Basin Res. 1997;9(2):107–32.

    Article  Google Scholar 

  7. Salzburger W, Meyer A, Baric S, Verheyen E, Sturmbauer C. Phylogeny of the Lake Tanganyika cichlid species flock and its relationship to the Central and East African haplochromine cichlid fish faunas. Syst Biol. 2002;51(1):113–35.

    Article  PubMed  Google Scholar 

  8. Sturmbauer C, Salzburger W, Duftner N, Schelly R, Koblmüller S. Evolutionary history of the Lake Tanganyika cichlid tribe Lamprologini (Teleostei: Perciformes) derived from mitochondrial and nuclear DNA data. Mol Phylogenet Evol. 2010;57(1):266–84.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Meyer BS, Matschiner M, Salzburger W. Disentangling Incomplete lineage sorting and introgression to refine species-tree estimates for Lake Tanganyika cichlid fishes. Syst Biol. 2017;66(4):531–50.

    CAS  PubMed  Google Scholar 

  10. Klett V, Meyer A. What, if anything, is a Tilapia?—mitochondrial ND2 phylogeny of tilapiines and the evolution of parental care systems in the African cichlid fishes. Mol Biol Evol. 2002;19(6):865–83.

    Article  CAS  PubMed  Google Scholar 

  11. Koch M, Koblmüller S, Sefc KM, Duftner N, Katongo C, Sturmbauer C. Evolutionary history of the endemic Lake Tanganyika cichlid fish Tylochromis polylepis: a recent intruder to a mature adaptive radiation. J Zool Syst Evol Res. 2007;45(1):64–71.

    Article  Google Scholar 

  12. Ronco F, Matschiner M, Böhne A, Boila A, Büscher HH, El Taher A, et al. Drivers and dynamics of a massive adaptive radiation in cichlid fishes. Nature. 2021;589(7840):76–81.

    Article  CAS  PubMed  Google Scholar 

  13. Meyer BS, Matschiner M, Salzburger W. A tribal level phylogeny of Lake Tanganyika cichlid fishes based on a genomic multi-marker approach. Mol Phylogenet Evol. 2015;83:56–71.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Irisarri I, Singh P, Koblmüller S, Torres-Dowdall J, Henning F, Franchini P, et al. Phylogenomics uncovers early hybridization and adaptive loci shaping the radiation of Lake Tanganyika cichlid fishes. Nat Commun. 2018;9(1):3159.

  15. Gourène G, Teugels GG. Synopsis de la classification et phylogénie des Pellonulinae de l’Afrique Occidentale et Centrale (Teleostei; Clupeidae). J Afr Zool. 1994;108(1):77–91.

    Google Scholar 

  16. Lavoué S. Origins of Afrotropical freshwater fishes. Zool J Linn Soc. 2020;188(2):345–411.

    Google Scholar 

  17. Wilson AB, Teugels GG, Meyer A. Marine incursion: The freshwater herring of Lake Tanganyika are the product of a marine invasion into West Africa. PLoS One. 2008;3(4):e1979.

  18. Giresse P. Mesozoic-Cenozoic history of the Congo Basin. J Afr Earth Sci. 2005;43(1–3):301–15.

    Article  Google Scholar 

  19. Bloom DD, Lovejoy NR. The evolutionary origins of diadromy inferred from a time-calibrated phylogeny for Clupeiformes (herring and allies). Proc R Soc B Biol Sci. 2014;281(1778):20132081.

  20. Egan JP, Bloom DD, Kuo CH, Hammer MP, Tongnunui P, Iglésias SP, et al. Phylogenetic analysis of trophic niche evolution reveals a latitudinal herbivory gradient in Clupeoidei (herrings, anchovies, and allies). Mol Phylogenet Evol. 2018;124(March):151–61.

    Article  PubMed  Google Scholar 

  21. Lavoué S, Miya M, Musikasinthorn P, Chen WJ, Nishida M. Mitogenomic evidence for an Indo-West Pacific origin of the Clupeoidei (Teleostei: Clupeiformes). PLoS One. 2013;8(2):e56485.

  22. Coulter GW. Biomass, production, and potential yield of the Lake Tanganyika pelagic fish community. Trans Am Fish Soc. 1981;110(3):325–35.

    Article  Google Scholar 

  23. Mölsä H, Reynolds JE, Coenen EJ, Lindqvist OV. Fisheries research towards resource management on Lake Tanganyika. Hydrobiologia. 1999;407:1–24.

    Article  Google Scholar 

  24. Ballard JWO, Whitlock MC. The incomplete natural history of mitochondria. Mol Ecol. 2004;13(4):729–44.

    Article  PubMed  Google Scholar 

  25. Cummings MP, Otto SP, Wakeley J. Sampling properties of DNA sequence data in phylogenetic analysis. Mol Biol Evol. 1995;12(5):814–22.

    CAS  PubMed  Google Scholar 

  26. Zardoya R, Meyer A. Phylogenetic performance of mitochondrial protein-coding genes in resolving relationships among vertebrates. Mol Biol Evol. 1996;13(7):933–42.

    Article  CAS  PubMed  Google Scholar 

  27. Duchêne S, Archer FI, Vilstrup J, Caballero S, Morin PA. Mitogenome phylogenetics: the impact of using single regions and partitioning schemes on topology, substitution rate and divergence time estimation. PLoS One. 2011;6(11):e27138.

  28. Lv Y, Li Y, Ruan Z, Bian C, You X, Yang J, et al. The complete mitochondrial genome of Glyptothorax macromaculatus provides a well-resolved molecular phylogeny of the Chinese sisorid catfishes. Genes. 2018;9(6):1–13.

    Article  Google Scholar 

  29. Yamanoue Y, Miya M, Matsuura K, Katoh M, Sakai H, Nishida M. A new perspective on phylogeny and evolution of tetraodontiform fishes (Pisces: Acanthopterygii) based on whole mitochondrial genome sequences: basal ecological diversification? BMC Evol Biol. 2008;8(1):1–14.

    Article  Google Scholar 

  30. Sullivan KAM, Platt RN, Bradley RD, Ray DA. Whole mitochondrial genomes provide increased resolution and indicate paraphyly in deer mice. BMC Zool. 2017;2(1):1–6.

    Article  Google Scholar 

  31. Zhang P, Papenfuss TJ, Wake MH, Qu L, Wake DB. Phylogeny and biogeography of the family Salamandridae (Amphibia: Caudata) inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 2008;49(2):586–97.

    Article  CAS  PubMed  Google Scholar 

  32. Satoh TP, Miya M, Mabuchi K, Nishida M. Structure and variation of the mitochondrial genome of fishes. BMC Genomics. 2016;17(1):1–20.

    Article  Google Scholar 

  33. Boore JL. Animal mitochondrial genomes. Nucleic Acids Res. 1999;27(8):1767–80.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Dowton M, Castro LR, Austin AD. Mitochondrial gene rearrangements as phylogenetic characters in the invertebrates: the examination of genome “morphology.” Invertebr Syst. 2002;16(3):345–56.

    Article  Google Scholar 

  35. Satoh TP, Sato Y, Masuyama N, Miya M, Nishida M. Transfer RNA gene arrangement and codon usage in vertebrate mitochondrial genomes: a new insight into gene order conservation. BMC Genomics. 2010;11(1):479.

  36. Wiens JJ. Missing data and the design of phylogenetic analyses. J Biomed Inform. 2006;39(1):34–42.

  37. Roure B, Baurain D, Philippe H. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 2013;30(1):197–214.

    Article  CAS  PubMed  Google Scholar 

  38. Schenk JJ. Consequences of secondary calibrations on divergence time estimates. PLoS One. 2016;11(1):e0148228.

  39. Whitehead PJP. Clupeoid fishes of the world (Suborder Clupeoidei): an annotated and illustrated catalogue of the herrings, sardines, pilchards, sprats, shads, anchovies and wolf-herrings. Rome, Italy: FAO; 1985.

  40. Bloom DD, Egan JP. Systematics of clupeiformes and testing for ecological limits on species richness in a trans-marine/freshwater clade. Neotrop Ichthyol. 2018;16(3):e180095.

  41. Battistuzzi FU, Filipski A, Hedges SB, Kumar S. Performance of relaxed-clock methods in estimating evolutionary divergence times and their credibility intervals. Mol Biol Evol. 2010;27(6):1289–300.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Simmons MP, Zhang LIB, Webb CT, Reeves A. How can third codon positions outperform first and second codon positions in phylogenetic inference? An empirical example from the seed plants. Syst Biol. 2006;55(2):245–58.

    Article  PubMed  Google Scholar 

  43. Hughes LC, Ortí G, Huang Y, Sun Y, Baldwin CC, Thompson AW, et al. Comprehensive phylogeny of ray-finned fishes (Actinopterygii) based on transcriptomic and genomic data. Proc Natl Acad Sci U S A. 2018;115(24):6249–54.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Sauquet H, Ho SYW, Gandolfo MA, Jordan GJ, Wilf P, Cantrill DJ, et al. Testing the impact of calibration on molecular divergence times using a fossil-rich group: the case of Nothofagus (Fagales). Syst Biol. 2012;61(2):289–313.

    Article  PubMed  Google Scholar 

  45. Malabarba MC, Di Dario F. A new predatory herring-like fish (Teleostei: Clupeiformes) from the early Cretaceous of Brazil, and implications for relationships in the Clupeoidei. Zool J Linn Soc. 2017;180(1):175–94.

    Google Scholar 

  46. Marramà G, Carnevale G. An Eocene anchovy from Monte Bolca, Italy: the earliest known record for the family Engraulidae. Geol Mag. 2016;153(1):84–94.

    Article  Google Scholar 

  47. Miller RR. First fossil record (Plio-Pleistocene) of Threadfin Shad, Dorosoma petenense, from the Gatuña formation of Southeastern New Mexico. J Paleontel. 1982;56(2):423–5.

    Google Scholar 

  48. Williams SM, McDowell JR, Bennett M, Graves JE, Ovenden JR. Analysis of whole mitochondrial genome sequences increases phylogenetic resolution of istiophorid billfishes. Bull Mar Sci. 2018;94(1):73–84.

    Article  Google Scholar 

  49. Rubinoff D, Holland BS. Between two extremes: mitochondrial DNA is neither the panacea nor the nemesis of phylogenetic and taxonomic inference. Syst Biol. 2005;54(6):952–61.

    Article  PubMed  Google Scholar 

  50. Chan KMA, Levin SA. Leaky prezygotic isolation and porous genomes: rapid introgression of maternally inherited DNA. Evolution (N Y). 2005;59(4):720–9.

    CAS  Google Scholar 

  51. Funk DJ, Omland KE. Species-level paraphyly and polyphyly: frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annu Rev Ecol Evol Syst. 2003;34:397–423.

    Article  Google Scholar 

  52. Toews DPL, Brelsford A. The biogeography of mitochondrial and nuclear discordance in animals. Mol Ecol. 2012;21(16):3907–30.

    Article  CAS  PubMed  Google Scholar 

  53. Ballard JWO, Rand DM. The population biology of mitochondrial DNA and its phylogenetic implications. Annu Rev Ecol Evol Syst. 2005;36:621–42.

    Article  Google Scholar 

  54. Perea S, Vukić J, Šanda R, Doadrio I. Ancient mitochondrial capture as factor promoting mitonuclear discordance in freshwater fishes: a case study in the genus Squalius (Actinopterygii, Cyprinidae) in Greece. PLoS One. 2016;11(12):e0166292.

  55. Funk DJ, Omland KE. Species-level paraphyly and polyphyly: frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annu Rev Ecol Evol Syst. 2014;2003(34):397–423.

    Google Scholar 

  56. Anderson JD, Karel WJ. Genetic evidence for asymmetric hybridization between menhadens (Brevoortia spp.) from peninsular Florida. J Fish Biol. 2007;71(SUPPL B):235–49.

    Article  CAS  Google Scholar 

  57. Jolly MT, Maitland PS, Genner MJ. Genetic monitoring of two decades of hybridization between allis shad (Alosa alosa) and twaite shad (Alosa fallax). Conserv Genet. 2011;12(4):1087–100.

    Article  Google Scholar 

  58. Hasselman DJ, Argo EE, McBride MC, Bentzen P, Schultz TF, Perez-Umphrey AA, et al. Human disturbance causes the formation of a hybrid swarm between two naturally sympatric fish species. Mol Ecol. 2014;23(5):1137–52.

    Article  PubMed  Google Scholar 

  59. Alexandrino P, Faria R, Linhares D, Castro F, Le Corre M, Sabatié R, et al. Interspecific differentiation and intraspecific substructure in two closely related clupeids with extensive hybridization, Alosa alosa and Alosa fallax. J Fish Biol. 2006;69(SUPPL B):242–59.

    Article  CAS  Google Scholar 

  60. Van Sickel WA, Kominz MA, Miller KG, Browning JV. Late Cretaceous and Cenozoic sea-level estimates: backstripping analysis of borehole data, onshore New Jersey. Basin Res. 2004;16(4):451–65.

    Article  Google Scholar 

  61. Haq BU, Hardenbohl J, Vail PR. Chronology of fluctuating sea levels since the Triassic (250 million years ago to present). Science. 1987;235(4):1156–67.

    Article  CAS  PubMed  Google Scholar 

  62. Beadle LC. The inland waters of Africa. London: Longman; 1974.

    Google Scholar 

  63. Peters C, O’Brien EM. Palaeo-lake Congo: implications for Africa’s late Cenozoic climate—some unanswered questions. In: Palaeoecology of Africa and the Surrounding Islands, Vol 27; 2001.

  64. Seehausen O, Terai Y, Magalhaes IS, Carleton KL, Mrosso HDJ, Miyagi R, et al. Speciation through sensory drive in cichlid fish. Nature. 2008;455(7213):620–6.

    Article  CAS  PubMed  Google Scholar 

  65. Rundle HD, Nosil P. Ecological speciation. Ecol Lett. 2005;8(3):336–52.

    Article  Google Scholar 

  66. Seehausen O, Wagner CE. Speciation in freshwater fishes. Annu Rev Ecol Evol Syst. 2014;45:621–51.

    Article  Google Scholar 

  67. Bernatchez L, Renaut S, Whiteley AR, Derome N, Jeukens J, Landry L, et al. On the origin of species: Insights from the ecological genomics of lake whitefish. Philos Trans R Soc B Biol Sci. 2010;365(1547):1783–800.

    Article  CAS  Google Scholar 

  68. Mulimbwa NT, Milec LJM, Raeymaekers JAM, Sarvala J, Plisnier P, Marwa B, et al. Spatial and seasonal variation in reproductive indices of the clupeids Limnothrissa miodon and Stolothrissa tanganicae in the Congolese waters of northern Lake Tanganyika. Belgian J Zool. 2022;15:13–31.

    Google Scholar 

  69. Mokoma GM, Devos L, Cardona L. Alimentación de Microthrissa congica (Osteichthyes: Clupeidae) en la cuenca alta del río Congo (África). Rev Biol Trop. 1999;47(4):1081–6.

    Google Scholar 

  70. Mokoma M. Contribution à la connaissance de la Biologie de reproduction et du régime alimentaire de Microthrissa congica (Regan, 1917, Pisces, Clupeidae) du Bassin du Zaire. 1989.

  71. Bernacsek GM, Lopes S. Cahora Bassa (Mozambique). CIFA Tech Pap. 1984;10:21–4.

    Google Scholar 

  72. Marshall BE. Why is Limnothrissa miodon such a successful introduced species and is there anywhere else we should put it? In: Pitcher TJ, Hart PJB, editors. The impact of species changes in African Lakes. 1995. pp. 527–45.

  73. Kullander SO, Roberts TR. Out of Lake Tanganyika: endemic lake fishes inhabit rapids of the Lukuga River. Ichthyol Explor Freshw. 2011;22(4):355–76.

    Google Scholar 

  74. Tajima F, Nei M. Genetic drift and estimation of effective population size. Genetics. 1981;98:625–40.

    Article  PubMed  PubMed Central  Google Scholar 

  75. Kmentová N, Koblmüller S, Van Steenberge M, Raeymaekers JAM, Artois T, De Keyzer ELR, et al. Weak population structure and recent demographic expansion of the monogenean parasite Kapentagyrus spp. infecting clupeid fishes of Lake Tanganyika, East Africa. Int J Parasitol. 2020;50(6–7):471–86.

    Article  PubMed  Google Scholar 

  76. Koblmüller S, Zangl L, Börger C, Daill D, Vanhove MPM, Sturmbauer C, et al. Only true pelagics mix: comparative phylogeography of deepwater bathybatine cichlids from Lake Tanganyika. Hydrobiologia. 2018;832(1):93–103.

    Article  PubMed  PubMed Central  Google Scholar 

  77. Settle ML. Proc10xG. 2017. p.

  78. Andrews S et al. FastQC: a quality control tool for high throughput sequence data. 2010. https://www.Bioinformatics.Babraham.Ac.Uk/Projects/Fastqc/. 2010.

  79. Meng G, Li Y, Yang C, Liu S. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic Acids Res. 2019;29:1–7.

    Google Scholar 

  80. Iwasaki W, Fukunaga T, Isagozawa R, Yamada K, Maeda Y, Satoh TP, et al. Mitofish and mitoannotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline. Mol Biol Evol. 2013;30(11):2531–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Kans J. E-utilities on the Unix Command Line. In: Entrez programming utilities help. Bethesda (MD): National Center for Biotechnology Information (US); 2013.

  82. Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Mol Biol Evol. 2021;38(7):3022–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Abascal F, Zardoya R, Telford MJ. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 2010;38(SUPPL. 2):7–13.

    Article  Google Scholar 

  84. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Borowiec ML. AMAS: a fast tool for alignment manipulation and computing of summary statistics. PeerJ. 2016;2016(1):e1660.

  86. Strimmer K, Von Haeseler A. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci U S A. 1997;94(13):6815–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Schmidt HA, Strimmer K, Vingron M, Von Haeseler A. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002;18(3):502–4.

    Article  CAS  PubMed  Google Scholar 

  88. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and high-performance computing Europe PMC Funders Group. Nat Methods. 2012;9(8):772.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Rozas J, Ferrer-Mata A, Sanchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE, et al. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol Biol Evol. 2017;34(12):3299–302.

    Article  CAS  PubMed  Google Scholar 

  90. R core team. R: a language and environment for statistical computing. Vienna, Austria; 2020.

  91. Bernt M, Merkle D, Ramsch K, Fritzsch G, Perseke M, Bernhard D, et al. CREx: inferring genomic rearrangements based on common intervals. Bioinformatics. 2007;23(21):2957–8.

    Article  CAS  PubMed  Google Scholar 

  92. Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74.

    Article  CAS  PubMed  Google Scholar 

  93. Chernomor O, Von Haeseler A, Minh BQ. Terrace aware data structure for phylogenomic inference from supermatrices. Syst Biol. 2016;65(6):997–1008.

    Article  PubMed  PubMed Central  Google Scholar 

  94. Kalyaanamoorthy S, Minh BQ, Wong TKF, Von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Minh BQ, Nguyen MAT, Von Haeseler A. Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol. 2013;30(5):1188–95.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59(3):307–21.

    Article  CAS  PubMed  Google Scholar 

  97. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19(12):1572–4.

    Article  CAS  PubMed  Google Scholar 

  98. Gelman A, Rubin D. Inference from iterative simulation using multiple sequences. Stat Sci. 1992;7(4):457–72.

    Article  Google Scholar 

  99. Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, et al. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol. 2019;15(4):1–28.

    Article  Google Scholar 

  100. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol. 2018;67(5):901–4.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Yu G. Using ggtree to visualize data on tree-like structures. Curr Protoc Bioinf. 2020;69(1):1–18.

    Article  Google Scholar 

  102. Powell CLE, Waskin S, Battistuzzi FU. Quantifying the error of secondary vs distant primary calibrations in a simulated environment. Front Genet. 2020;11(March):1–9.

    Google Scholar 

  103. Rohatgi A. WebPlotDigitizer. Pacifica, California, USA; 2021.

Download references


We thank Dovetail genomics for sequencing of the NGS data, Mathieu Tachon, Amalia Mailli, Armando J. Cruz-Laufer for their help with custom scripts, and Michael Matschiner for advice on divergence time dating.


LJMM is supported by her Nord University PhD grant 224000-131. LJMM and JAMR are further supported by JAMR’s start-up grant at Nord University, covering sequencing costs. The Special Research Fund of Hasselt University finances LJMM (BOF19DOC31) and MPMV (BOF20TT06). MPMV is further supported by Czech Science Foundation standard project GA19-13573S and Hasselt University Global Minds project GM2O18INITO7. ELRDK was funded through VLIR-UOS (VLADOC scholarship NDOC2016PR006 and South Initiative project CD2018SIN218A101) and BOF Seal of Excellence of University of Antwerp.

Author information

Authors and Affiliations



LJMM, MPMV and JAMR conceptualized the study. LJMM collected and analysed the sequencing data and wrote the draft with input and interpretation by MPMV and JAMR. FMB, ELRDK, VLK, PMM and NM coordinated and executed the sampling of the Tanganyika sardines. All authors read and revised the draft and approved the final manuscript.

Corresponding author

Correspondence to Leona J. M. Milec.

Ethics declarations

Ethics approval and consent to participate

Collection of specimens used in the study complied with institutional, national, and international guidelines. Fieldwork in the Democratic Republic of Congo was carried out with the approval of the Centre de Recherche en Hydrobiologie – Uvira (CRH-U), which falls under the Congolese Ministry for science and technology (“Ministère National de la Recherche Scientifique et Technologie”) under mission statement 002/MINRST/CRH-U/2018. Samples were exported with an export permit from the CRH-U. Samples were imported into the USA under import permit 70881B by the Department of the Interior U.S. Fish and Wildlife Service, Office of Law Enforcement to Dovetail Genomics. No animals were killed for this study, dead specimens were obtained from fishermen on Lake Tanganyika.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Taxonomic information, accession numbers and references of mitochondrial genomes used for phylogenetic analyses.

Additional file 2.

Phylogenetic analysis and divergence time dating of taxon-reduced datasets focusing on Dorosomatinae.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Milec, L.J.M., Vanhove, M.P.M., Bukinga, F.M. et al. Complete mitochondrial genomes and updated divergence time of the two freshwater clupeids endemic to Lake Tanganyika (Africa) suggest intralacustrine speciation. BMC Ecol Evo 22, 127 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: