East African lake cichlids are one of the most impressive examples of an adaptive radiation. Independently in Lake Victoria, Tanganyika, and Malawi, several hundreds of species arose within the last 10 million to 100,000 years. Whereas most analyses in cichlids focused on nucleotide substitutions across species to investigate the genetic bases of this explosive radiation, to date, no study has investigated the contribution of structural variants (SVs) in the evolution of adaptive traits across the three Great Lakes of East Africa.
Here, we annotate and characterize the repertoires and evolutionary potential of different SV classes (deletion, duplication, inversion, insertions and translocations) in four cichlid species: Haplochromis burtoni, Metriaclima zebra, Neolamprologus brichardi and Pundamilia nyererei. We investigate the patterns of gain and loss evolution for each SV type, enabling the identification of lineage specific events. Both deletions and inversions show a significant overlap with SINE elements, while inversions additionally show a limited, but significant association with DNA transposons. Inverted regions are enriched for genes regulating behaviour, or involved in skeletal and visual system development. We also find that duplicated regions show enrichment for genes associated with “antigen processing and presentation” and other immune related categories. Our pipeline and results were further tested by PCR validation of selected deletions and inversions, which confirmed respectively 7 out of 10 and 6 out of 9 events.
Altogether, we provide the first comprehensive overview of rearrangement evolution in East African cichlids, and some important insights into their likely contribution to adaptation.
African cichlids represent one of the best examples of rapid adaptive radiation [1,2,3,4]. The adaptation to different ecological niches in Lakes Malawi, Tanganyika and Victoria has given rise to several hundreds of species in a period of just a few million years [5,6,7]. The radiation is associated with great phenotypic variation, including jaw morphology, body shape, coloration, adaptation of the visual system to different water depths, and behavior [8,9,10,11,12,13,14,15,16]. Variation in ecological niches and behaviour appears to be associated with different brain development , with differences appearing already at early developmental stages . A great example of adaptation is represented by the evolution of the cichlid visual system, involving eight different opsin genes [19,20,21].
To gain insights on the molecular mechanisms underlying this rapid radiation, Brawand et al.  generated genome references for five species: the Nile tilapia (Oreochromis niloticus), representing an ancestral lineage; Neolamprologus brichardi (Lake Tanganyika), Metriaclima zebra (Lake Malawi), Pundamilia nyererei (Lake Victoria), and Haplochromis burtoni (riverine species around Lake Tanganyika). The study highlighted several mechanisms underlying species diversification, including selection acting on existing standing variation, high rates of gene duplication, novel microRNAs and rapid sequence divergence in otherwise conserved non-coding elements. Following this study, Malinksy et al. described an example of early stage divergence between two cichlid ecomorphs in Tanzania . They identified genomic islands of speciation between them, containing potentially adaptive genes associated with mate choice. Theis et al.  focused on the early phases of adaptive divergence of H. burtoni, which is found in both Lake Tanganyika and inflowing river. Their results highlighted the presence of multiple divergent lake-stream populations, representing different stages of the speciation process. More recently, the sequencing of 134 individuals covering 73 species provided a great characterisation of genomic diversity in lake Malawi . The authors observed very low levels of inter-species divergence (0.1–0.25%), overlapping the diversity found within species. Phylogenetic analyses showed that no single species tree can efficiently represent all species relationships, suggesting high levels of repeatedly occurring gene flow.
In 2014, Fan and Meyer  used the five genome references generated by Brawand et al.  to annotate SNPs, indels and SVs in four of these species, representative of the adaptive radiations. However, this study applies one method (Pindel v0.2.5a1) of detecting SVs based on a less complete and contiguous Nile tilapia assembly (Orenil1.1) than the available PacBio reference genome [26, 27], and does not focus on the adaptive potential of large-scale variation.
Recently, Conte et al. generated an improved reference assembly for the cichlids M. zebra and O. niloticus . The authors compared the genome structure of the two species at the chromosome scale, taking advantage of the high quality of these references. They observed a high number of ~ 2-28 Mb, intra-chromosomal SVs, but a limited number of inter-chromosomal rearrangements. They also identified structural changes associated with lower recombination rates, suggesting inversion events between different species in Lake Malawi. This study, however, did not investigate the patterns of SV evolution across representative cichlid genomes of the three East African Great Lakes, or consider their possible implication in speciation and adaptive phenotypes.
All other studies so far focused on single variation within and between species and to a lesser extent on the evolution of gene regulatory patterns . Structural variants (SVs, including deletions, duplications, inversions, insertions and translocations) are the source of increased genomic variability and in some cases adaptive potential. Gene or exon duplication events might lead to neo- or sub-functionalisation [28,29,30,31,32]. An evolutionary study in East African cichlids focusing on agrp2 (a locus controlling horizontal stripe patterns) revealed several recent duplications, insertions, and deletions, including a tandem duplication of the last exon . This event is not fixed in any of the radiations, and is polymorphic even within some species. This pattern of copy number variation can facilitate neofunctionalization or even loss-of-function of agrp2.
Gene loss events, on the other hand, can reflect relaxed selective pressure or be possibly adaptive in other cases . For example, the loss of ampd3 in sperm whales likely represents an adaptation to their extreme diving ability [34, 35].
Inversions result in suppressed recombination when heterozygous, and might act as a protection against gene flow for specific haplotypes . Inversions might raise in frequency, up to fixation, possibly leading to isolation and even speciation events [37, 38]. Studies in Drosophila melanogaster provided strong evidence for their involvement in adaptation. For example, the inversion 3RP is associated with adaptation to different climates . Its frequency exhibits a parallel latitudinal cline across several continents, being higher close to the equator and decreasing towards higher latitudes [38, 39]. Translocations can result in a heavy restructuration of chromosome organisation , with potential gene loss or changes in regulatory control of expression.
Identifying structural changes across species representative of all three great lakes can provide exciting insights into their explosive radiation. In this study, we use the newly released O. niloticus (riverine species living in shallow waters) reference based on long read PacBio sequencing  and paired-end sequencing data generated by  to identify SVs in four representative cichlid species with varying ecological adaptations: Neolamprologus brichardi (Lake Tanganyika, reef dwelling planktivore, 3–30 m of water depth), Metriaclima zebra (Lake Malawi, rock dwelling algae scraper, 6–28 m of water depth), Pundamilia nyererei (Lake Victoria, reef dwelling planktivore, 4–7 m of water depth), and Haplochromis burtoni (insectivorous riverine species around Lake Tanganyika). Through this analysis we aim to: characterise the evolutionary patterns associated with different rearrangement classes; investigate functional enrichment within those rearranged genomic regions; identify the genes affected by these structural changes and how these can relate to the phenotypes found across the three lakes.
We show that genes lying inside inverted regions are enriched for genes regulating behaviour, or involved in skeletal and visual system development, which are directly relevant to the African radiation. Altogether, we describe the repertoires of structural variations across four species of the East African cichlids, their evolutionary dynamics, and novel insights into their possible contribution to adaptation.
Annotation of SVs across 4 cichlid species
We mapped all previously generated paired-end libraries  to the high quality O. niloticus assembly  to annotate five different classes of rearrangements (deletion, tandem duplication, inversion, insertion and translocation) in the four available species of the East African radiation (Supplementary Fig. 1). We used a combination of three different tools: Breakdancer , Delly  and Pindel  and identified 6694 deletions, 1550 duplications, 1471 inversions, 34,875 insertions and 1354 translocations (Table 1, Additioal file 2: Supplementary File 1).
Our initial predictions showed a bias towards small (< 1 kb) deletions (240229). This number might be inflated as a result of our SV detection pipeline, where deletions are identified using read pairs mapped in a concordant way (as opposed to duplications and inversions). This represents an issue particularly when considering small events. Therefore, we decided to retain only deletions with a minimum size of 1 kb (Table 1). In the resulting dataset, 5483 deletions fall in the 1–10 kb size range, while 1207 represent larger, > 10 kb events (Fig. 1). We investigated whether the size of a SV correlates with the age of the event. While the size distributions of deletions did not seem to be affected by the number of species sharing the SV, we noticed a tendency for duplications and inversions towards larger sizes as the number of species increased. Species specific events are significantly smaller than those common to 2 species (MW test, p = 0.005), which in turn are smaller than the events found in 3 species (MW test; duplications: p = 0.008; inversions: p < 0.0001; see Additional file 3: Supplementary File 2). Moreover, conserved inversions are significantly larger than both conserved deletions (MW test, p < 0.0001) and duplications (MW test, p < 0.0001).
We investigated the patterns of gain and loss evolution for each SV class, using a Dollo Parsimony approach (see Methods). We identified a high proportion of events predicted to be lineage specific (Fig. 2). Additionally, comparison across species allowed us to identify the events common to a single lineage or to all species involved in the African radiation. We will refer to the latter as “conserved SVs”. However, a “conserved SV” could also represent a structural change that occurred in the O. niloticus lineage, and this ambiguity cannot be resolved without the addition of an outgroup species.
We noticed a surprisingly high loss rate of deletions in the M. zebra lineage (Fig. 2a). In order to evaluate the reliability of our approach and the accuracy of our annotations, we compared our results to those obtained through the pairwise, whole genome alignments between the latest M. zebra and O. niloticus assemblies, using Satsuma2 (https://github.com/bioinfologics/satsuma2, see Methods). Out of 2263 deletions annotated in M. zebra, only 54 (2%) were discordant with Satsuma2 alignments. Thus, we show that our annotation in M. zebra has a very high concordance with the high quality genome assemblies of M. zebra and O. niloticus.
With the exception of M. zebra deletions, we observed high proportions of lineage specific events, consistent across all SV classes (Fig. 2). However, in the case of deletions we also observed a high number (1711) of events which are ancestral to the radiation. Overall, these results point at a reduction in genome size associated to the African radiation. While this is in concordance with the observation that the M. zebra assembly is 48 Mb shorter than the O. niloticus reference [26, 41], conserved SVs might also reflect a rearrangement event specific to O. niloticus, as stated previously.
We investigated the extent of interval overlap between our predicted SVs and different genomic features. We considered different subsets of our SV annotations, categorising our predictions based on size range (< 1 kb, 1 kb–10 kb,> 10 kb) and number of species sharing the SV event (whole dataset vs conserved SVs). We observed a strong association between > 10 kb conserved deletions and immunoglobulin chain regions. The association is highly significant for both the constant (14.8 fold change, p = 0.01) and variable (10.8 fold change, p = 5e-03) gene segment annotation, which suggests a possible involvement of copy number variants in immune response mechanisms. It must be pointed out, nevertheless, that these loci are present in multiple, tandemly repeated copies, and the observed association could possibly reflect assembly issues in repetitive regions.
We also hypothesised that repeats throughout the genome facilitate the evolution of structural changes. In order to test this hypothesis, we looked at the genomic association (interval overlap analysis, see Methods) between our SV dataset and African cichlid specific repetitive elements. These analyses highlighted a significant overlap between 1 kb–10 kb long conserved inversions and SINE2 elements (10.2 fold change, p = 8.7e-03). The association with SINE2 is not significant, however, when we consider all conserved inversions, irrespective of their size (1 fold change, p = 0.3).
Conserved duplications are significantly under-represented with African cichlids (AFC) SINE2–1 (0.64 fold change, p = 2.7e-03) and REX1–2 AFC elements (0.3 fold change, p = 3.14e-02). Conversely, they appear to be enriched for several simple repeats, including (AAGTCTC) n (54.7 fold change, p = 1e-04).
Large deletions appear to be negatively associated with AFC RTE-2 elements (0.57 fold change, p = 3.7e-02) but positively associated with AFC L1–1 elements (2.17 fold change, p = 8.7e-03), as well as several simple repeats. When we considered all conserved deletions, irrespective of their size, we observed a significant association with AFC SINE2–1 (1.42 fold change, p = 1e-04) and SINE3 (2.98 fold change, p = 1e-04) elements. Similar conclusions were reached in previous studies on the pig genome [45, 46]. Taken together, these results suggest a correlation between repetitive elements and structural evolution in African cichlids.
We next asked the question whether we observe differences in the repeat landscape inside and outside SV regions. In order to identify homologous regions between the reference and each of the remaining species, we converted Satsuma2 whole genome alignments to chain format, and performed a liftover of all SV coordinates from the reference to each of the other species. This allowed us to compare the repeat landscape in a pairwise fashion, considering different SV class separately.
When heterozygous, an inversion can favor the accumulation of mutations and novel transposable elements, as a result of reduced excisions rates [47, 48]. We tested this possibility by comparing the repeat content inside and outside inverted regions. We focused our analysis on the latest O. niloticus genome reference. Previous studies highlighted very high proportions of DNA transposable elements in African cichlids , an observation which was confirmed by our data (Figs. 3, 4 and 5; SVs annotated in M. zebra). Overlap analyses based on the O. niloticus reference suggested a limited, but significant enrichment in DNA transposons inside inversions (size range: 500 nt-5 Mb; fold change=1.07, p= 1e-04). While for other repeat classes the proportions are very similar inside and outside inverted regions (Fig. 3), the LTR representation is higher in the former across all divergence bins (Fig. 4). This reflected in a significant enrichment in LTR elements inside inversions (size range: 500 nt-5 Mb; fold change= 1.21, p= 1e-04), as opposed to the LTR content outside inversions (fold change=0.92, p= 1e-04).
Next, we repeated the analyses considering duplication events. We observed no difference in repeat content inside and outside duplicated regions (Fig. 5). The association (based on the O. niloticus reference sequence) between duplications and LTR is weaker than expected (fold change = 0.94, p = 1e-04), while no significant deviation was found when considering DNA transposons (fold change = 0.99, p = 0.31). As for regions outside duplications, we observed a significant, although very limited, enrichment for both DNA (fold change = 1.001, p = 0.04) and LTR (fold change = 1.05, p = 8e-04) elements. It must be noted, however, that during the liftover conversion of the genomic coordinates, many inverted and duplicated regions were lost, limiting the sequence space considered in M. zebra.
SV regions are enriched for developmental and immune related genes
Structural variation can provide important evolutionary novelty for speciation and the evolution of adaptive traits [28,29,30, 37]. For instance, gene duplication can lead to dosage effects, neofunctionalisation or subfunctionalisation events (Lynch 2002; Katju and Lynch 2006), while inverted regions can experience drastically reduced recombination rates . We took advantage of our SV dataset across 4 species, to investigate which genes are affected by duplication or inversion events. We first considered different subsets of inversions, separating species specific events from the ones annotated in multiple species. When looking at species specific events, we considered each species separately (Additional file 4: Supplementary File 3). We identified 559 genes in H. burtoni, 109 in M. zebra, 580 in N. brichardi and 814 in P. nyererei. Results for H. burtoni highlighted GO:0006955 (“immune response”, significant: 13; expected: 5.01; padj= 0.0015) and GO:0007600 (“sensory perception”, significant: 10; expected: 5.04, padj= 0.03).
Inverted genes in N. brichardi, are enriched for GO:0065007 (“biological regulation”, significant: 193; expected: 154.35, padj = 0.0411) and GO:0009416 (“response to light stimulus”, significant: 6; expected: 2.45, padj = 0.0351). Interestingly, we also found one gene (gja3, coding for an intercellular channel) annotated to GO:0048050, (“post-embryonic eye morphogenesis”). P. nyererei genes are enriched for GO:0007602 (“phototransduction”, significant: 6, expected: 1.44, padj = 0.003). This set also includes 2 genes annotated to GO:0002089 (“lens morphogenesis in camera-type eye”, significant: 2, expected: 0.23, padj = 0.018), fn1a and foxe3, as well as 3 genes annotated to GO:0061035 (“regulation of cartilage development”, significant: 3, expected: 0.54, padj =0.015): sox32, s1pr2 and pthlha. Members of the sox gene family encode for transcription factors, and play a crucial role in morphological and behavioural variation in teleosts . pthlha is an oral jaw specific gene  coding for the parathyroid hormone. In the case, of M. zebra, we could only identify one accession represented by 5 or more genes: GO:0006468 (“protein phosphorylation”; significant: 6; expected: 2.52;; padj = 0.0374).
We also considered inversions which are common to at least 2 species,not exceeding 5 Mb in size (Fig. 6). A total of 854 GO annotated genes (Additional file 4: Supplementary File 3) could be identified inside these SV regions. GO:term enrichment on this gene set highlighted accessions GO:0007610 ("behavior", significant: 9; expected: 4.66; padj=0.042), GO:0060041 (“retina development in camera-type eye”, significant: 14; expected: 5.7; padj = 0.001), GO:0060042 (“retina morphogenesis in camera-type eye”, significant: 7; expected: 2.76, padj = 0.0018), GO:0048706 (“embryonic skeletal system development”, significant 10; expected: 5.12; padj = 0.03). Among the genes annotated to GO:0060041, we found: vax2 (Ventral Anterior Homeobox 2), a gene known to regulate cone opsin expression ; fgf8a (fibroblast growth factor 8a), part of a key pathway in animal evolution , and ift172 (intraflagellar transport 172).
We repeated the same procedure for inversions up to 10 Mb in length, which increased the number of genes considered to 1404 (Additional file 4: Supplementary File 3). While accession GO:0060041 was still significantly over-represented, we observed additional, immune related processes: GO:0019882 (“antigen processing and presentation”, significant: 8; expected: 2.7; padj = 0.0044), GO:0006955 (“immune response”, significant: 19; expected: 12; padj = 0.03) and GO:0042445 (“hormone metabolic process”, significant: 6; expected: 2.14; padj = 0.01). As part of GO:0006955, we found the gene nfil3 (Nuclear Factor, Interleukin 3 Regulated), coding for a transcriptional regulator.
Next, we compared the GO terms across different subtrees of our five-species phylogeny (Supplementary Fig. 2, Additioal file 4: Supplementary File 3). We first selected inversions common to M. zebra and P. nyererei but absent in the other species, for which we could identify 210 inverted genes. We found enrichment for protein modification and processing, including GO:0006508 (“proteolysis”, significant: 14; expected: 7.3; padj = 0.014). When considering the branch leading to these two species as well as H. burtoni (134 genes, events absent in N. brichardi), we identified genes involved in developmental processes, including 4 annotated to GO:0048598 (“embryonic morphogenesis”) and 2 genes for accession GO:0033339 (“pectoral fin development”): cyp26c1 (cytochrome P450, family 26, subfamily C, polypeptide 1) and sall4 (Spalt Like Transcription Factor 4). The former lies in a sex associated region in H. burtoni . Together with the fact that the inversion is lineage specific, it makes the gene particularly interesting. We can speculate that the inversion event might have helped the maintenance of specific haplotypes (including gene cyp26c1) through the suppression of recombination in the affected region, possibly contributing to the divergence of sex-associated traits.
The enrichment for developmental processes was also observed for genes in conserved inversions (up to 5 Mb in size, n = 90), among which GO:0060042 (“retina morphogenesis in camera-type eye”, < 5 genes) and GO:0048706 (“neuron development”, significant 5; expected: 1.3; padj = 0.04) are particularly interesting.
We also looked for genes contained inside tandem duplications, and filtered the resulting set based on evidence of tandem repeat of at least 3 consecutive exons in the target genome assembly (see Methods). When considering species-specific events (Additioal file 4: Supplementary File 3), we identified 204 genes in H. burtoni, 197 in M. zebra, 143 in N. brichardi and 224 in P. nyererei. For the H. burtoni gene set we identified, among others, GO:0006508 (“proteolysis”, significant: 19; expected: 7.3; padj = 0.021) and GO:0060078 (“regulation of postsynaptic membrane potential”, significant: 6; expected: 0.87; padj = 0.0002). Duplicated genes in P. nyererei are enriched for immune related processes, including GO:0006955 (“immune response”, significant: 10, expected: 1.83; padj = 1.4e− 5) and GO:0019882 (“antigen processing and presentation”, significant: 6; expected: 0.41, padj < 0.0001). Additionally, GO:0055085 is represented by 18 genes (“transmembrane transport”, significant: 18; expected: 11.18, padj = 0.028).
By requiring the duplication event to be shared by at least 2 species (Fig. 6), we could identify 152 genes (Additional file 4: Supplementary File 3). Results highlighted the presence of GO:0019882 (“antigen processing and presentation”, padj = 1e− 6), GO:0007229 (“integrin-mediated signalling pathway”, padj = 1.6e− 5), and GO:0006955 (“immune response”, significant: 8; expected: 1.54; padj = 1.5e− 4). This dataset contains two genes encoding for an H-2 class II histocompatibility antigen chain: ENSONIG00000019943 and ENSONIG00000003904. Accession GO:0048854 (“brain morphogenesis”, significant: 2; expected: 0.15; padj = 0.001) was also significant enriched, however it is represented by only 2 genes: atp1a1 (ENSONIG00000012456), encoding for ATPase Na+/K+ transporting subunit alpha 1a, and shank3 (SH3 and multiple ankyrin repeat domains 3). Similar to inversions, we looked at GO enrichment across the phylogenetic tree (Additioal file 4: Supplementary File 3). While only one gene (lyz) was found in conserved duplications (after filtering for evidence of tandem repeats and presence in the Ensembl annotation), we had 41 genes for the M. zebra-P. nyererei subtree and 58 for the lineage including H. burtoni as well (Supplementary Fig. 3). However, in all of these cases the significantly enriched terms were represented by very low (< 4) numbers of genes.
Altogether, our analyses provide the first insights into the possible contribution of SVs to the evolution of adaptive traits in African cichlids, including circadian rhythm, developmental processes and immune response mechanisms.
Validation of selected deletion and inversion events
In order to better understand the reliability of our computational analyses, we decided to validate selected deletions and inversions by PCR amplification of the rearranged genomic region (Figs. 7 and 8, Supplementary Fig. 4, Methods). We first focused on 10, medium sized (1-5 kb) deletion events annotated in M. zebra (Table 2). For the validation, we compared experimental results obtained using tissue samples for M. zebra (liver and brain) and O. niloticus (testis and fin). In this comparison, O. niloticus represents the SV-free reference sequence, while M. zebra is predicted to carry the deletion event (and hence show a smaller amplification product). Figure 7 provides an overview of the results of the second PCR run. We could confidently confirm the deletion event in 7 out of 10 cases. For deletions 1 and 2, we were not able to detect the expected products. As for deletion 6, we had discordant results between run 1 (Supplementary Fig. 4) supporting the SV, and run 2 showing the expected product in both M. zebra and O. niloticus. Even excluding deletion 6, we obtained a 70% concordance between our computational predictions and the PCR validation, providing evidence for the reliability of our bioinformatics pipeline.
Next, we adapted the primer design strategy for the validation of 9 selected inversions (Table 3), ranging from 1 kb to 10 Mb in size. We chose 7 events containing genes involved in either retina development (GO:0060041) or innate immune response (GO:0045087), plus 2 additional, smaller (< 4 kb) inversions. Different primer sets were designed to match sequences flanking either of the two breakpoints (Table 3, Methods). PCR results (Table 3, Fig. 8) confirmed the majority of these inversions, providing strong support for 6 events, partial support for 1 event (inv_4) and poor support for the remaining 2 (inv_5 and inv_7). Among the genes inside validated inversions (Additional file 5: Supplementary File 4), we find many genes involved in retina development: sf1 (splicing factor 1) as part of inv_2; vax2 (Ventral Anterior Homeobox 2) as part of inv_3; genes id2 (Inhibitor of Differentiation 2), exoc5 (Exocyst Complex Component 5) and ppm1a (Protein Phosphatase 1A) located inside inv_8. No gene annotated to GO:0045087 (“innate immune response”), however, was found inside these validated inversions. Altogether, these results demonstrate the reliability of our bioinformatics analyses, and provide additional, experimental support to our inferences.
Our work uncovers a new, important aspect of the adaptive radiation of East African cichlids. We demonstrate the presence of extensive structural rearrangements across representative species of the three Great Lakes, and strikingly, we show that these large-scale variants are likely implicated in the evolution of important adaptive traits.
We inferred the gain and loss patterns of all annotated SVs across the phylogenetic tree, thus identifying high proportions of lineage specific gains. While the size distributions are generally comparable across different conservation levels, we see a shift towards larger sizes in the case of deeply conserved inversions.
High proportions of lineage specific gains may provide novel opportunities for the adaptive evolution of these species. Therefore, we investigated the repertoires of genes affected by inversions and duplications, considering species specific and more conserved events separately. Among the most interesting biological processes associated with inversions, we find “behavior” (GO:007610), “retina development in camera-type eye” (GO:0060041), “pectoral fin development” (GO:0033339) and “embryonic skeletal system development” (GO:0048706). Moreover, we found enrichment for “neuron development” (GO:0048666) associated with events in the Haplochromine lineage (M. zebra, P. nyererei, H. burtoni is We can speculate that the enrichment for developmental processes reflects the implication of structural variation in shaping the great morphological and behavioural variation observed in East African cichlids [17,18,19, 52]. The enrichment for “retina development in camera-type eye” is however particularly striking, given that the adaptive radiation of East African cichlids is associated with the evolution of the visual system. This has been implicated in the adaptation to different water depths and turbidity conditions, as well as in female mating preferences [14, 16].
Accession GO:0033339 includes gene Cyp26c1, lying in a sex associated region identified in H. burtoni . This result opens to speculations on the link between inversion events, suppressed recombination, and the divergence of sex-associated traits across lineages.
When we consider duplicated regions, we find an enrichment for “antigen processing and presentation” (GO:0019882) and additional immune related categories. This “theme” is common to both species-specific and multi-species events. However, when we consider different subtrees separately, the number of significant genes drops dramatically, making us less confident about the biological relevance of the enrichment results. Our set of genes inside duplicated regions include an “H-2 class II histocompatibility antigen” locus, as well as ilf2 (interleukin enhancer binding factor). The observed association between immune genes and duplication events is not surprising, being in line with previous studies on the fast, adaptive evolution of the vertebrate immune system [54,55,56]. In cichlids, differences in parasite communities across foraging habitats can determine strong selective pressure, favoring adaptive phenotypes and ecological speciation. In particular, several studies have highlighted extensive variation in MHC pools which suggests immunogenetic adaptation [57,58,59,60,61,62,63]. Host-parasite coevolution in Pseudotropheus fainzilberi and P. emmiltos (a pair of sympatric Lake Malawi species) appear to have driven adaptive divergence in MHC alleles, affecting odor-mediated mate choice and leading to reproductive isolation . A large scale analysis of MHC diversity across the major tribes of Lake Tanganyika cichlid fishes  showed how different cichlid tribes partially differ in both parasite communities and MHC diversity. The distinct MHC profile of the Limnochromini, for example, suggests that distinct immunogenetic properties are selected in deep water.
In the threespine stickleback (Gasterosteus aculeatus), it has been shown that Major Histocompitability Complex (MHC) genes are linked with female mating preference, suggesting that divergent selection acting on MHC genes might influence speciation [64, 65].
While our results strongly suggest that structural variation has been implicated in the adaptive evolution of African cichlids (especially for retina development and immune response), their interpretation in the light of morphological and ecological variation remains both challenging and speculative. Moreover, investigating the gene enrichment results in the light of ecological or morphological variation does not seem to highlight any clear pattern. Additional analyses and experiments would clearly be required in order to draw a link between gene enrichment analyses and the evolution of specific adaptive traits.
In order to gain more confidence on the results of our SV detection pipeline, we decided to validate selected events by PCR. These experiments provided strong support for 6 out of 9 inversions. The genomic regions of these 6 validated events include sf1 (splicing factor 1), vax2 (Ventral Anterior Homeobox 2) and other genes which are involved in retina development. Thus, we were able to provide additional, experimental support for selected SV events, potentially involved in the adaptive evolution of the visual system.
Additional PCR experiments also confirmed 7 deletion events out of 10 tested. This suggests similar levels of accuracy (~ 70%) in the SV identification method for deletions and inversions.
It is important to mention that we did not sequence the PCR products to confirm the amplification of the target regions. We are aware that this represents a limitation of our study. However, we would like to stress some important points: 1) our primer design was based on the high quality PACBIO reference of O. niloticus; 2) all validated SVs show a clear and strong correlation between the strongest PCR band and the expected product size; 3) in all validated SVs, the pattern is perfectly and uniquely consistent with the hypothesis of a deletion (or inversion) occurring in M. zebra as compared to the O. niloticus reference; 4) in the case of deletions, our validation is supported by technical replicates that lead to the same result.
We also investigated the possibility of differential evolutionary patterns between inverted and non-inverted regions by comparing their repetitive element landscapes in M. zebra. Despite the observation of a significant enrichment in both DNA transposons and LTR elements, we observed little difference in repeat content. This holds true when comparing duplicated regions to the rest of the genome.
In this study, we provide a comprehensive overview of rearrangement evolution in East African cichlids, and highlight their likely contribution to the evolution of adaptive traits.
The results presented here are likely to inspire further studies, focusing on several aspects of rearrangement evolution. These might include: the evolution of genome size in East African cichlids; the contribution of inversions to speciation events, as highlighted by previous studies [36,37,38,39]; the role of SVs in shaping the expression landscape by altering gene sequences, gene copy number, or regulatory elements [66,67,68,69]; further studies on SV identification, evolution and biological role, considering a different (and possibly greater) set of species. The inclusion of data from additional species, and the resolution of intra and inter specific variability would result in a much greater power in reconstructing the evolutionary dynamics of each SV event [33, 70]. Moreover, it would facilitate the identification of any association between SVs and traits under selection.
The evolution of cichlids in African lakes represents an impressive example of how a relatively low degree of genetic variation can provide the substrate for an explosive and rapid species radiation, allowing for the adaption to many different ecological niches. Single nucleotide variants, large scale rearrangements, transposable elements and several regulatory mechanisms can all contribute to the evolution of diverse genetic traits with high adaptive potential. We are only starting to understand the evolutionary dynamics and molecular mechanisms underlying this impressive radiation, and much work is still needed to shed light on all the different aspects and key players involved.
Paired-end libraries available for Neolamprologus brichardi, Metriaclima zebra, Pundamilia nyererei and Haplochromis burtoni  were downloaded using fastq-dump from the sra-toolkit (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/). Due to lower base quality issues, the last 30 nt at the 3′ end of the longest, 100 nt reads were trimmed. All libraries were mapped against the O. niloticus genome assembly (Supplementary Fig. 1) using gmap . The resulting bam files were sorted and indexed using samtools , then used as input for 3 algorithms: Breakdancer , Delly  and Pindel .
SV predictions were first filtered by: a minimum of 2 libraries and 5 discordantly mapping read pairs supporting the call (Breakdancer and Pindel); a Breakdancer score of 99; both a PASSED and PRECISE flag provided by Delly’s output files. For each tool and rearrangement class separately (with the exception of translocations), we merged predictions with a reciprocal coordinate intersection of at least 90% into a single SV call.
The sets of merged filtered calls of each algorithm were then compared in a pairwise manner. Specifically, we used Bedtools intersect (Quinlan 2014) to identify SVs independently called by two different algorithms, with a reciprocal intersection at least 90% of the SV region. This gave us three sets of SVs supported by at least 2 algorithms: Breakdancer+Delly, Breakdancer+Pindel and Delly+Pindel (Supplementary Fig. 5). The annotations of each SV class across all species was then combined into a single BED file. For each combined set, we then carried out a conditional merging of the SV genomic coordinates. For events up to 0.5 kb in size, we required a minimum of 50% reciprocal intersection for multiple calls to be merged together. For size ranges of 0.5-1 kb, 1–10 kb and all events greater than 10 kb, we used a threshold 80, 90 and 95%, respectively (in each range, we included the bottom value while excluding the top one).
Overlap analyses and GO:term enrichment
Analyses of overlap between SVs and genome annotation were performed using GAT . The O_niloticus UMD1 gene annotation was downloaded from the NCBI database and converted into BED format. Genes inside SV regions were identified by comparing the gene annotation with the genomic coordinates of our SV dataset, using Bedtools intersect (Quinlan 2014). We selected genes fully contained inside an SV event by using the options “-a genes.bed -b SV.bed -f 1”.
GO:term enrichment was performed on the set of genes mapping to an Ensembl gene ids. We used the elim algorithm from the R package TopGO . The gene background was defined as the set of all genes in the NCBI annotation mapping to an Ensembl gene id.
Whole genome alignments and repeat content analyses
In order to generate whole genome alignments between the latest M. zebra  and O. niloticus  assemblies, we ran Satsuma2 (https://github.com/bioinfologics/satsuma2) using the following parameters: -slaves 10 -threads 16 -km_mem 120 -sl_mem 120 -prob_table true -min_prob 0.99999 -min_seed_length 20 -max_seed_kmer_freq 1 -min_matches 10 -dump_cycle_matches 1.
In order to compare Satsuma2 results with our dataset of M. zebra deletions (O. niloticus genomic coordinates), we converted the satsuma_summary.chained.out output file into a 6 columns BED file. We then used the command “bedtools intersect” from Bedtools  to identify alignments of sequences overlapping a deletion event. A deletion was considered to be discordant with Satsuma2 if at least one alignment spanning 50% or more of the predicted deleted region could be identified.
For the analysis of the repeat content inside and outside SV regions, the software RepeatMasker was run on the O.niloticus reference to identify repetitive elements genome-wide. The .out result file of RepeatMasker was reformatted to generate a 6 column BED file. For each SV separately, we then used Bedtools intersect  to identify repeat elements fully contained inside the SV regions. The analyses were restricted to the SVs annotated in M. zebra. Overlapping repeat elements were separated based on the percentage of divergence from the consensus sequence, provided in the .out result file. An equivalent approach was used to identify repeat elements outside SV events. The repeat content was then calculated as the proportion of repeat nucleotide positions over the total length of the genomic space considered (total size of SV space or the genomic space outside SVs).
The M. zebra individuals were maintained in the cichlid fish facility at University of Hull managed by Alan M. Smith and Domino Joyce. In order to maintain a healthy colony and stimulate breeding and good quality egg production throughout the year, M. zebra individuals were kept under optimal conditions. The O. niloticus individuals are lab-acclimated Egyptian strains (Lake Manzala stock originally maintained at Swansea and Stirling University) kept in the Tilapia fish facility at ARO and managed by Avner Cnaani. In order to maintain a healthy colony and stimulate breeding and good quality egg production throughout the year, O. niloticus individuals were kept under optimal conditions which was, in this case, a temperature of 25C, pH 7.9 and salinity of 0.02%.
M. zebra individuals were sacrificed according to Home Office license schedule 1, killing using overdose of MS-222 (tricaine) at the lab of Dr. Domino Joyce, The University of Hull, UK. O. niloticus individuals were sacrificed according to IACUC certification by the Israeli Ministry of Health’s Council for Experimentation on Animals, licensed schedule 1, killing using overdose of MS-222 (tricaine) at the lab of Dr. Avner Cnaani, Institute of Animal Science, Agricultural Research Organization (ARO), Bet Dagan, Israel. Upon sacrifice, relevant tissues were dissected and preserved in either RNAlater or laboratory grade absolute ethanol (EtOH). A completed ARRIVE guidelines checklist is included as Additional file 6: Supplementary File 5.
PCR validation of structural variants
Oligonucleotide primers were designed against the latest O. niloticus assembly using Primer3  and all primers were synthesised by Integrated DNA Technologies, Iowa.
DNA was extracted from samples of frozen tissue or tissues preserved in ethanol (25 mg) from 1 M. zebra individual (fin, testis) and 1 O. niloticus individual (brain, liver), using the MagAttract HMW DNA Kit (Qiagen, CA) according to the manufacturer’s protocol for “Manual Purification of High Molecular Weight Genomic DNA from Fresh or Frozen Tissue”. Final DNA concentrations were determined using Qubit fluorometer™ 2.0 (Invitrogen, Life Technologies) and purity was assessed using A260:280 ratio (≥1.8) by measurement on a Nanodrop™ spectrophotometer (Thermofisher Scientific). PCR products were amplified according to manufacturer’s protocol for NEBNext® High-Fidelity 2X PCR Master Mix in 25 μl reactions using 50 ng of DNA template. PCR cycle conditions were followed as stated in manufacturer’s protocol and extension times were adjusted according to length of expected product size. PCR products were visualised on 1.5% (w/v) agarose gels stained with SYBR™ Safe DNA Gel Stain and imaged using the Alliance 2.7 gel documentation system (UVITEC, Cambridge).
Availability of data and materials
The data set supporting the results of this article is available in the Short Read Archive (SRA) repository (https://www.ncbi.nlm.nih.gov/sra) under the following accessions: PRJNA59571 (SRP004171) for O. niloticus; PRJNA60365 (SRP004799) for N. brichardi; PRJNA60367 (SRP004869) for P. nyererei; PRJNA60369 (SRP004788) for M. zebra; and PRJNA60363 (SRP004787) for H. burtoni .
Long Terminal Repeat
Short Interspersed Nuclear Element
Polymerase Chain Reaction
Animal Research: Reporting of In Vivo Experiments
Kocher TD. Adaptive evolution and explosive speciation: the cichlid fish model. Nat Rev Genet. 2004;5(4):288–98.
Svardal H, Quah FX, Malinsky M, Ngatunga BP, Miska EA, Salzburger W, Genner MJ, Turner GF, Durbin R. Ancestral hybridisation facilitated species diversification in the Lake Malawi cichlid fish adaptive radiation. Mol Biol Evol. 2019;37:1100.
Ivory SJ, Blome MW, King JW, McGlue MM, Cole JE, Cohen AS. Environmental change explains cichlid adaptive radiation at Lake Malawi over the past 1.2 million years. Proc Natl Acad Sci U S A. 2016;113(42):11895–900.
Malinsky M, Challis RJ, Tyers AM, Schiffels S, Terai Y, Ngatunga BP, Miska EA, Durbin R, Genner MJ, Turner GF. Genomic islands of speciation separate cichlid ecomorphs in an east African crater lake. Science. 2015;350(6267):1493–8.
Meyer BS, Hablutzel PI, Roose AK, Hofmann MJ, Salzburger W, Raeymaekers JAM. An exploration of the links between parasites, trophic ecology, morphology, and immunogenetics in the Lake Tanganyika cichlid radiation. Hydrobiologia. 2019;832(1):215–33.
Maan ME, Seehausen O, Soderberg L, Johnson L, Ripmeester EA, Mrosso HD, Taylor MI, van Dooren TJ, van Alphen JJ. Intraspecific sexual selection on a speciation trait, male coloration, in the Lake Victoria cichlid Pundamilia nyererei. Proc Biol Sci. 2004;271(1556):2445–52.
Seehausen O, Terai Y, Magalhaes IS, Carleton KL, Mrosso HD, Miyagi R, van der Sluijs I, Schneider MV, Maan ME, Tachida H, et al. Speciation through sensory drive in cichlid fish. Nature. 2008;455(7213):620–6.
Hofmann CM, O'Quin KE, Marshall NJ, Cronin TW, Seehausen O, Carleton KL. The eyes have it: regulatory and structural changes both underlie cichlid visual pigment diversity. PLoS Biol. 2009;7(12):e1000266.
Muschick M, Barluenga M, Salzburger W, Meyer A. Adaptive phenotypic plasticity in the Midas cichlid fish pharyngeal jaw and its relevance in adaptive radiation. BMC Evol Biol. 2011;11.
Brawand D, Wagner CE, Li YI, Malinsky M, Keller I, Fan S, Simakov O, Ng AY, Lim ZW, Bezault E, et al. The genomic substrate for adaptive radiation in African cichlid fish. Nature. 2014;513(7518):375–81.
Kmentova N, Gelnar M, Mendlova M, Van Steenberge M, Koblmuller S, Vanhove MP. Reduced host-specificity in a parasite infecting non-littoral Lake Tanganyika cichlids evidenced by intraspecific morphological and genetic diversity. Sci Rep. 2016;6:39605.
Kratochwil CF, Liang Y, Urban S, Torres-Dowdall J, Meyer A. Evolutionary dynamics of structural variation at a key locus for color pattern diversification in cichlid fishes. Genome Biol Evol. 2019;11(12):3452–65.
Catacchio CR, Maggiolini FAM, D'Addabbo P, Bitonto M, Capozzi O, Signorile ML, Miroballo M, Archidiacono N, Eichler EE, Ventura M, et al. Inversion variants in human and primate genomes. Genome Res. 2018.
Kirkpatrick M. How and why chromosome inversions evolve. PLoS Biol. 2010;8:9.
Conte MA, Gammerdinger WJ, Bartie KL, Penman DJ, Kocher TD. A high quality assembly of the Nile Tilapia (Oreochromis niloticus) genome reveals the structure of two sex determination regions. BMC Genomics. 2017;18(1):341.
Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25(21):2865–71.
Ai H, Fang X, Yang B, Huang Z, Chen H, Mao L, Zhang F, Zhang L, Cui L, He W, et al. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat Genet. 2015;47(3):217–25.
Zhao P, Li J, Kang H, Wang H, Fan Z, Yin Z, Wang J, Zhang Q, Wang Z, Liu JF. Structural variant detection by large-scale sequencing reveals new evolutionary evidence on breed divergence between Chinese and European pigs. Sci Rep. 2016;6:18501.
Alfano G, Conte I, Caramico T, Avellino R, Arno B, Pizzo MT, Tanimoto N, Beck SC, Huber G, Dolle P, et al. Vax2 regulates retinoic acid distribution and cone opsin expression in the vertebrate eye. Development. 2011;138(2):261–71.
Lighten J, Papadopulos AST, Mohammed RS, Ward BJ, GP I, Baillie L, Bradbury IR, Hendry AP, Bentzen P, van Oosterhout C. Evolutionary genetics of immunological supertypes reveals two faces of the red queen. Nat Commun. 2017;8(1):1294.
Hablutzel PI, Gregoir AF, Vanhove MP, Volckaert FA, Raeymaekers JA. Weak link between dispersal and parasite community differentiation or immunogenetic divergence in two sympatric cichlid fishes. Mol Ecol. 2016;25(21):5451–66.
Malaga-Trillo E, Zaleska-Rutczynska Z, McAndrew B, Vincek V, Figueroa F, Sultmann H, Klein J. Linkage relationships and haplotype polymorphism among cichlid Mhc class II B loci. Genetics. 1998;149(3):1527–37.
Milinski M, Griffiths S, Wegner KM, Reusch TB, Haas-Assenbaum A, Boehm T. Mate choice decisions of stickleback females predictably modified by MHC peptide ligands. Proc Natl Acad Sci U S A. 2005;102(12):4414–8.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. Genome project data processing S: the sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
We thank the funding bodies for supporting this research project.
AM, LPD, TM, WH and FDP were supported by the BBSRC Institute Strategic Programme Grant [BB/J004669/1]; WH and FDP are supported by the BBSRC Core Strategic Programme Grant [BB/P016774/1]. The Data Infrastructure group at EI is funded in part by EI’s BBSRC Core Strategic Programme (BBS/E/T/000PR9817). The funding bodies had no role in the design of the study, in the collection, the analysis or the interpretation of data and in writing the manuscript.
Authors and Affiliations
Earlham Institute, Norwich Research Park, Colney Lane, Norwich, NR47UZ, UK
Experimentations on M. zebra individuals were carried out in accordance with the UK Home Office, license schedule 1, at the lab of Dr. Domino Joyce, The University of Hull, UK.
Experimentations on O. niloticus individuals were carried out according to the IACUC certification by the Israeli Ministry of Health’s Council for Experimentation on Animals, licensed schedule 1, at the lab of Dr. Avner Cnaani, Institute of Animal Science, Agricultural Research Organization (ARO), Bet Dagan, Israel.
Consent to participate is not applicable, since our study does not involve human subjects.
Consent for publication
The authors declare that they have no competing interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Schematic of the SV detection pipeline. Supplementary Figure S2. Association of different enriched GO terms across the phylogenetic tree, considering the genes found inside inverted regions (up to 5 Mb). For each node, selected GO terms are shown for the inversion events specific to (and conserved across) the M.zebra + P.nyererei lineage (top right), M.zebra + P.nyererei + H.burtoni lineage (top-centre) and conserved across all species (bottom left). Numbers on the top of each bar indicate the number of observed genes. Supplementary Figure S3. Association of different enriched GO terms across the phylogenetic tree, considering the genes found inside duplicated regions. For each node, selected GO terms are shown for the duplication events specific to (and conserved across) the M.zebra + P.nyererei lineage (top right), the M.zebra + P.nyererei + H.burtoni lineage (top-centre) and conserved across all four species (bottom left). Numbers on the top of each bar indicate the number of observed genes. Supplementary Figure S4. A) Experimental design for the PCR validation of deletion events. Arrows represent primer sequences mapped to the genomic sequence (in blue and red). Primer couple AF1 + AR1 is used to test for the presence or absence of the deletion event (expected to differ by about N bp in the amplification product). Primer couples BF1 + BR1 and CF1 + CR1 are used as a control (expected product:300-400 bp). B-E gel images of PCR run 1, used for the validation of 10 deletion events. See Fig. 7 for a detailed explanation of the figure labels. Supplementary Figure S5. Venn diagram depicting the intersection between filtered SV calls of all three tools (Breakdancer, Delly and Pindel). In the case of insertions, no intersection was found between Breakdancer and the other two tools.
Results of MW test to compare SV size distribution across different conservation categories. For each comparison, the p-value is indicated. In the case of a significant difference, the directionality of the change is indicated. For example, “1 < 2” indicates that the ranks of the 1-species dataset are significantly lower than those for the 2-species dataset.
Genes found inside PCR validated inversions. Each row corresponds to one gene. Genomic coordinates of the associated inversion, and the number of species carrying the inversion are indicated, along with the gene name and ncbi id.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Penso-Dolfin, L., Man, A., Mehta, T. et al. Analysis of structural variants in four African cichlids highlights an association with developmental and immune related genes.
BMC Evol Biol20, 69 (2020). https://doi.org/10.1186/s12862-020-01629-0