High genetic diversity and strong genetic structure of Strongyllodes variegatus (Coleoptera: Nitidulidae) demonstrate the population history of its distribution in oilseed rape production areas in China

Background: Strongyllodes variegatus (Fairmaire) is a major insect pest of oilseed rape in China. Despite its economic importance, the contribution of its population genetics to the development of suitable control strategies for oilseed rape crops is poorly studied. Using sequences of the mitochondrial DNA cytochrome c oxidase subunit I (COI) and cytochrome b (Cytb) genes as genetic markers, we analyzed the population genetic diversity and structure of 437 individuals collected from 15 S. variegatus populations located in different oilseed rape production areas in China. In addition, we estimated the demographic history using neutrality tests and mismatch distribution analysis.
Results: A high level of genetic diversity was detected among the COI and Cytb sequences of S. variegatus. The population structure analysis strongly suggested three distinct genetic and geographical regions in China with limited gene flow. The Mantel test showed that the genetic distance was strongly influenced by the geographical distance. The demographic analyses showed that S. variegatus experienced population fluctuation during the Pleistocene Epoch, which was likely related to climatic changes.
Conclusion: Overall, these results demonstrate that the strong population genetic structure of S. variegatus in China may be attributed to isolation by geographical distance among populations, their weak flight capacity and subsequent adaptation to regional ecological conditions.

Tajima's D and Fu's Fs values are sensitive to demographic expansion, which usually leads to large negative values. Pairwise mismatch distributions were used to test whether a population had experienced expansion events. A goodness-of-fit test was used to determine the smoothness of the observed mismatch distribution (using Harpending's raggedness index, Rag) and the degree of fit between the observed and simulated data (using the sum of squares deviation, SSD) [54, 55].
The expansion signal for a population was indicated by a smooth, unimodal distribution pattern with non-significant p-values for the SSD. The time of expansion was estimated with the formula τ = 2μkt [53], where τ is the crest of the mismatch distribution, μ is the nucleotide substitution rate, k is the number of nucleotides, and t is the time since expansion.
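As a worked illustration of the relation τ = 2μkt, the following minimal sketch rearranges it to t = τ / (2μk), where t is the time since expansion. The function name and the input values are hypothetical and are not taken from this study. The sketch assumes μ is expressed per site per year, so t is returned in years; whether a published divergence rate must first be converted to a per-lineage rate is left to the reader.

```python
def expansion_time_years(tau, mu_per_site_per_year, n_sites):
    """Solve tau = 2 * mu * k * t for t, the time since expansion in years."""
    if tau < 0 or mu_per_site_per_year <= 0 or n_sites <= 0:
        raise ValueError("tau must be >= 0; mu and n_sites must be > 0")
    return tau / (2.0 * mu_per_site_per_year * n_sites)


# Hypothetical inputs for illustration only (not values from the study).
tau = 3.0      # crest of the mismatch distribution
mu = 1.0e-8    # substitutions per site per year (assumed)
k = 700        # sequence length in nucleotides (assumed)

t = expansion_time_years(tau, mu, k)
print(f"Estimated time since expansion: {t:,.0f} years")
```

Because t scales inversely with μ, halving the assumed substitution rate doubles the estimated expansion time, so the choice of rate should be reported alongside any date.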

For the haplotype network of the COI gene, only one haplotype (H1) was shared among the three haplogroups. Haplotype 2 (H2) was abundant and detected only in the CC haplogroup. Haplotype 3 (H3) was found only in the CE haplogroup. Six haplotypes (H4-H9) were shared between the NW and CC haplogroups. A total of five missing haplotypes were inferred across all populations (Fig. 2a). Similarly, for the haplotype network of the Cytb gene, two haplotypes (H1, H4) were shared among the three haplogroups. Haplotype 2 (H2), the most abundant, was detected only in the CC haplogroup. Haplotype 3 (H3) was found only in the CE haplogroup. Haplotypes H5-H6, H7 and H8-H9 were shared between the NW and CC haplogroups, the NW and CE haplogroups, and the CC and CE haplogroups, respectively. A total of four missing haplotypes were inferred in the CC haplogroup (Fig. 2b).

Population genetic differentiation
To further assess whether the three inferred clusters of S. variegatus populations are genetically distinct, a Bayesian clustering analysis was performed using STRUCTURE. The most likely value of K chosen with Evanno's ΔK method was 3, again indicating a division of genetic variation into three clusters. The proportion of each population assigned to each of the three clusters is shown in Figure 3. Clusters 1 (red) and 2 (yellow) were composed mainly of the NW and CC populations, respectively. The CE populations were mainly assigned to cluster 3 (green).
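The ΔK criterion used to choose K can be sketched as follows. This is a minimal illustration under one common reading of the statistic: the absolute second-order difference of the mean ln P(D) across K, divided by the standard deviation of ln P(D) among replicate runs at that K. The function name and the log-likelihood values are hypothetical and are not output from this study's STRUCTURE runs.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each tested K to a list of ln P(D) values, one per
    replicate STRUCTURE run (at least two replicates per K).
    """
    means = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the tested range
        sd = statistics.stdev(lnp_by_k[k])
        if sd == 0:
            continue  # skip K when replicates are identical (division by zero)
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_diff / sd
    return delta_k


# Hypothetical replicate log-likelihoods for K = 1..5 (illustration only).
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4503.0, -4497.0],
    3: [-4100.0, -4101.0, -4099.0],
    4: [-4090.0, -4095.0, -4085.0],
    5: [-4085.0, -4075.0, -4095.0],
}
dk = evanno_delta_k(runs)
best_k = max(dk, key=dk.get)
print("delta K per K:", dk)
print("K with the largest delta K:", best_k)
```

Note that ΔK cannot be evaluated at the smallest or largest K tested, so the range of K should extend beyond the value expected to be optimal.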
A strong genetic divergence was observed across populations (FST = 0.425, P < 0.0001, Table 2). The FCT value among the three regions (NW, CC and CE) was highly significant (FCT = 0.470, P < 0.0001, Table 2), further demonstrating that S. variegatus populations in China are divided into three regions. Significant genetic differentiation was observed among populations within regions (FSC = 0.072, P < 0.0001, Table 2) and within populations (FST = 0.508, P < 0.0001, Table 2) based on the combined data of the COI and Cytb genes. The percentages of genetic variation within populations (60.16% for the comparison between the NW and CC regions, and 56.00% for the comparison between the NW and CE regions) were significantly higher than those between regions (33.89% between the NW and CC regions, and 33.88% between the NW and CE regions) (Table 2). However, the percentage of genetic variation between the CC and CE regions (54.95%) was higher than that within populations (42.82%) (Table 2), indicating limited gene flow between the CC and CE regions.
The pairwise FST values among populations, based on the combined data of the COI and Cytb genes, ranged from -0.015 to 0.811 (Table 3). Of the 105 comparisons, 88 showed significantly high genetic differentiation. The pairwise FST values among populations within the CC and CE regions were less than 0.159, while the pairwise FST values between populations from the CC and CE regions were above 0.409. In addition, the pairwise FST values among regions were high and significant (FST > 0.25, P < 0.001, Table 4), and the estimated gene flow among regions was extremely low (Nm < 1, Table 4), suggesting limited gene flow among regions. These results were highly consistent with those obtained by the analysis of molecular variance (AMOVA) described above.
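The estimator used to obtain Nm from FST is not stated in this section. A commonly used island-model approximation is Nm ≈ (1/FST - 1)/4 for diploid nuclear markers, with a divisor of 2 often applied to haploid, maternally inherited markers such as mtDNA. The sketch below illustrates that approximation only; the function name and the choice of divisor are assumptions, not the authors' stated method.

```python
def nm_from_fst(fst, divisor=4.0):
    """Island-model approximation: Nm = (1 / FST - 1) / divisor.

    divisor=4 is the usual diploid form; divisor=2 is often used for
    haploid mtDNA. Which form applies here is an assumption.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must be in the interval (0, 1]")
    return (1.0 / fst - 1.0) / divisor


# Illustrative FST values, including the 0.25 threshold quoted in the text.
for fst in (0.05, 0.25, 0.45):
    print(fst, nm_from_fst(fst), nm_from_fst(fst, divisor=2.0))
```

Negative pairwise FST estimates, such as the minimum of -0.015 reported above, are usually treated as zero differentiation and have no meaningful Nm under this approximation, which is why the function rejects them.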
The Mantel test based on the combined data of the COI and Cytb genes revealed a significant correlation between the genetic distance (FST/(1-FST)) and the geographical distance among all populations (r = 0.500, P < 0.0001, Fig. 4).
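The logic of a Mantel test can be sketched in a few lines of standard-library Python. This is a simplified, one-sided permutation version and not the software used in the study; the function names and the small distance matrices are hypothetical. Rows and columns of the genetic matrix are permuted together, and the P value is the share of permutations whose correlation is at least as large as the observed one.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mantel_test(gen, geo, permutations=10000, seed=0):
    """One-sided Mantel test on two symmetric distance matrices."""
    n = len(gen)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel populations in the genetic matrix
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical pairwise FST and distances (km) for four populations.
fst = [[0.00, 0.05, 0.30, 0.45],
       [0.05, 0.00, 0.28, 0.40],
       [0.30, 0.28, 0.00, 0.10],
       [0.45, 0.40, 0.10, 0.00]]
km = [[0, 120, 900, 1500],
      [120, 0, 800, 1400],
      [900, 800, 0, 600],
      [1500, 1400, 600, 0]]
# Transform as in the text: FST/(1-FST) against ln(geographic distance).
gen = [[f / (1 - f) for f in row] for row in fst]
ln_geo = [[math.log(d) if d > 0 else 0.0 for d in row] for row in km]
r, p = mantel_test(gen, ln_geo, permutations=999)
print(f"r = {r:.3f}, one-sided P = {p:.3f}")
```

With only four populations there are just 24 distinct relabelings, so the P value cannot become very small; the 15 populations analyzed in the study give the permutation test far more resolution.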

Demographic analyses
Tajima's D values obtained with the single and combined gene data in the NW region were negative but not significant (P > 0.05, Table 1). The Tajima's D and Fu's Fs values in the CC and CE regions were negative and highly significant (P < 0.05, Table 1), whereas the CE region showed significant sum of squares deviation (SSD) values (P < 0.05, Fig. 5, S2). Thus, for the NW and CE regions, the sudden expansion hypothesis was rejected. However, the distributions of the pairwise differences obtained with the single and combined gene data in the CC region were unimodal with non-significant SSD and Harpending's raggedness index (Rag) values (

Discussion
Using two mitochondrial genes, we investigated the genetic diversity and structure of 437 individuals collected from 15 S. variegatus populations located in different oilseed rape production areas in China. The results revealed high genetic diversity and a clear genetic structure of S. variegatus in the sampled areas.
Based on the analyses of the mtDNA sequences, haplotype distribution, haplotype networks, Bayesian clustering and AMOVA, three genetically diverse and geographically distinct regions of S. variegatus distribution in China were identified, namely the NW, CC and CE regions. A high proportion of the total genetic variance was attributed to variation within populations (49.18%) and among regions (47.01%). This suggests that the largest source of variation may be variation among individuals within populations rather than geographical barriers among regions. Variation among individuals within populations was previously reported to have a significant effect on the genetic structure of Chilo suppressalis [19]. This contrasts with studies of Myotis myotis and Plecotus austriacus [20,21], which showed that geographical barriers had the most important effect. Other factors could also play a significant role in shaping genetic structure. Chen and Dorn analyzed the genetic variation of Cydia pomonella populations in Switzerland and found that host specificity, geographic isolation, intrinsic flight capacity and anthropogenic measures could shape the population structure [22].
The current study revealed limited gene flow (Nm < 1) among regions. Once populations have become genetically differentiated, their divergence can be maintained if they have adapted differentially to regional ecological conditions, since geographic variation in selection can act as a strong barrier to gene flow [23]. Our analysis also suggested substantial gene flow among populations within the CC and CE regions. This may be due to the geographical isolation. The Mantel test results showed that gene flow between populations was strongly influenced by geographical distance. This strong isolation-by-distance relationship may also be due to the limited flight capacity of S. variegatus. S. variegatus has been reported to fly 30-40 m in 2 min [2]. However, its flight range is less than tens of kilometres, which is not enough to weaken the isolation-by-distance relationship and may increase the potential for allopatric or parapatric speciation [24,25]. On the other hand, the three regions shared common haplotypes, suggesting small amounts of gene flow among regions. This may be because some adults are mixed into the harvested rapeseed over summer [4,6]. Human intervention, in the form of breeding oilseed rape seed alternately at different locations, could also play an important role in mixing populations from distant geographic regions and provide the conditions for gene flow among regions [6].
Gene flow in insects has been reported to increase with mobility, which is more pronounced on herbaceous plants, and this feature is especially strong in agricultural pests [26]. Large genetic variation within populations was also found for the pollen beetle, Meligethes aeneus, another oilseed rape pest [9,27-29]. However, no population structure of the pollen beetle could be found in five provinces of Sweden [28]. M. aeneus makes high-altitude flights (up to ca 200 m) at specific points during the year and low-altitude flights at multiple periods [29], which could help it to disperse over large distances with the assistance of prevailing wind currents [30], resulting in high gene flow similar to that of the diamondback moth, Plutella xylostella [31].
Both the neutrality test and the mismatch distribution analysis indicated a population expansion in the CC region. Furthermore, the phylogeographic patterns of the COI and Cytb haplotype networks were roughly composed of three "star-like" clusters. Based on a rate of 2.3% per site per million years [32], the expansion time of the CC region for COI and Cytb was estimated to be 104 and 128 ka ago, respectively, within the interglacial time of the Pleistocene. Vast glaciers developed at that time on the Tibetan Plateau, in the Qinling Mountains and even in the Yangtze River valley [33,34], which could have triggered episodes of range contraction and expansion in many plant and animal species [35-37].
In China, management practices against S. variegatus have primarily focused on chemical control. Investigating the genetic diversity of S. variegatus populations can provide a useful guide for controlling this pest. Localized populations with similar genetic structure should be treated as a single management unit for the most effective control [38]. For isolated populations, a range of management methods should be used, in particular a variety of chemical pesticides with different properties. Additional research will be carried out using other molecular markers, such as nuclear genes, or faster-evolving markers, such as microsatellites, to obtain a better understanding of the population genetic structure and evolutionary history of S. variegatus in China, and in the rest of the world should the pest occur there in future.

Conclusions
The current study provides the first population genetic analysis of S. variegatus, a serious pest of oilseed rape crops. The high variability observed in the COI and Cytb molecular markers indicates that these markers are useful for measuring genetic patterns in S. variegatus populations. We confirmed the strong genetic structure of S. variegatus populations in China, which could be divided into three genetic haplogroups and geographical regions with limited gene flow among them. The distribution of this species in oilseed rape production areas in China is mainly structured by isolation through geographical distance among populations and their weak flight capacity. We also found a population expansion signature in the CC region, which might be related to climatic changes during the Pleistocene. These results suggest that phylogenetic information could help to guide the development of suitable control strategies for oilseed rape crops.

Sampling
A total of 437 S. variegatus individuals were collected from 15 populations in China (Fig. 1). Sample sizes ranged from 24 to 37 individuals per population, except for the ESHB population, which had eight individuals (Table S2). All S. variegatus individuals were freshly collected from the fields and immediately stored in absolute ethanol at -20 °C before molecular analysis.
The PCR products were subjected to electrophoresis on a 1.5% agarose gel (UltraPure Agarose, Invitrogen) containing 10,000× stock GelRed (Biotium) diluted at 1:10,000, visualized on a BioDoc-it imaging system (UVP) and purified using ExoSAP-IT (USB, USA). The PCR products were bidirectionally sequenced (using the above primers) on an ABI 3730XL Automated Sequencer using the BigDye Terminator Cycle Sequencing 3.1 Ready Reaction Kit (Applied Biosystems, USA).

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

a Regions as defined in Fig. 1.

Additional Files
Additional file 1: Table S1 Geographical

Figure 1
The populations in each of the three regions are indicated by circles (NW), squares (CC) and triangles (CE). Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.

Figure 2
Haplotype networks estimated from the sequences of (a) COI and (b) Cytb. Circles represent haplotypes, numbers in the circles are haplotype names, and small black circles represent missing haplotypes that were not observed. Circle size denotes the total haplotype frequency, each slice represents the haplotype frequency in different populations, and each line between linked haplotypes corresponds to one mutation. The three haplotype regions are indicated by different colors: NW region (red), CC region (yellow) and CE region (green).

Figure 4
Scatter plots of genetic divergence vs. geographical distance. The genetic divergence FST/(1-FST) and the geographic distance (ln) were compared using the Mantel test with 10,000 permutations. There is a strong correlation between the genetic divergence and the geographical distance in the pairwise comparisons of all populations (r = 0.500, P < 0.0001).

Figure 5
Pairwise mismatch distributions based on the combined data of the COI and Cytb genes for the three derived regions. The x coordinate represents the number of pairwise differences among sequences, and the y coordinate represents the frequencies of pairwise differences in each region. The significance values (p) of the parameters were evaluated with 1,000 simulations. PSSD: P value for SSD (sum of squared deviations); PR: P value for Rag (Harpending's raggedness index); τ: the index of population expansion.
