Immigration of Sogatella Furcifera from the Greater Mekong Subregion into Northern China Revealed by 2b-RAD Sequencing

Background The white-backed planthopper (WBPH), Sogatella furcifera (Horváth) (Hemiptera, Delphacidae), is a migratory pest of rice in Asia. Shandong Province, in northern China, is located on the migration pathway of WBPH between southern and northeast China. The potential sources of WBPH in northern China are poorly understood. We studied the sources of WBPH in Shandong Province by determining the population genetic structure of WBPH in 18 sites distributed in Shandong and in six regions of the Greater Mekong Subregion (GMS). We used mitochondrial gene and single-nucleotide polymorphism (SNP) markers (2b-RAD sequencing) for analysis. Results All of the WBPH populations studied in the seven regions had low genetic diversity. Pairwise F ST values ranged from − 0.061 to 0.285, while F ST based on SNP data ranged from − 0.007 to 0.009. These two molecular markers revealed that 4.40% (mtDNA) and 0.19% (SNP) genetic variation could be explained by the interpopulation variation, while the rest came from intrapopulation variation. The populations in the seven geographic regions comprised four hypothetical genetic clusters (K = 4) not associated with geographic location. Eighty-four of 129 individuals distributed across the given area were designated as recent migrants or of admixed ancestry. Although the substantial migration presented, a weak but signicant correlation between genetic and geographic distances was found (r = 0.083, P = 0.004). results northern

as well as between CN_YN and TH (P = 0.045). The haplotype network obviously displayed a two-star pattern with the common haplotype (H1 and H2) in the center of the two stars (Fig. 1). The F ST values among populations represented in the Principal coordinate analysis (PCoA) showed that the Thailand population (TH) was separated from the other populations (Fig. 2a).
Pairwise FST values computed over SNP loci were quite low, ranging from -0.007 to 0.009, with an average FST of 0.002. Pairwise FST value (0.009) between CN_SD and CN_YN was signi cant (P < 0.05).
The PCoA result showed that 100% of the variation was explained by the rst two axes. The rst axis of the PCoA separated CN_SD and CN_YN (Fig. 2b).
The results of the AMOVA test on mtDNA and SNP markers in different populations are shown in Table 4. The global AMOVA of the data for the two molecular markers revealed that 4.40% (mtDNA) and 0.19% (SNP) genetic variation could be explained by the variation among populations, whereas the remainder came from variation within populations. Because of the low sample size in the Thailand population (TH), we reanalyzed global AMOVA excluding TH, and found no signi cant variation among the remaining populations. Based on the results of pairwise F ST , we set the group = 2, which considered CN_SD as one group and the other populations as another group. Signi cant variability was found among SNP between these two groups (0.96%, P < 0.05) ( Table 4). .009* We also analyzed the population genetic structure based on SNP data using STRUCTURE software. The STRUCTURE analyses suggested that WBPH most likely forms four genetic clusters (Fig. 3). Indeed, for K = 4, the log-likelihood of the multilocus genotypic data was maximal and had low variance (Additional le 1: Figure S1). These clusters were not dependent on geographic regions because each population had the four genetic clusters, indicating a high level of gene ow.

Population assignment and isolation by distance
We identi ed 84 individuals as migrants, representing almost 65.12% of the 129 individuals analyzed. There were connections among the Shandong population and other populations, all of which were expected to be a possible source of the Shandong population (Table 5). There were also frequent migrations among the Yunnan population and Southeast Asian populations. It is plausible that a high dispersal rate exists in Yunnan and Southeast Asia areas. However, migrants mainly moved from Yunnan population to the Southeast Asia areas populations.

Genetic diversity of WBPH
The analysis based on the mitochondrial gene (COI) showed that all of the WBPH populations have relatively low genetic diversity. Only two dominant COI haplotypes widely exist in all populations suggesting a high-level gene ow among them. Among the 11 COI haplotypes identi ed, four haplotypes, including the two dominant haplotypes, were also found by previous study [8]. Our study results demonstrate that the haplotype diversity was lower in Shandong Province than that in the GMS, which may be a consequence of founder events during migration. This result is consistent with that of [39] who demonstrated that range expansion can reduce genetic diversity in a long-distance migratory species.
The genome SNP markers (2b-RAD) analysis showed that a heterozygosity de cit existed in all populations. This result may be explained by demographic expansion, and it is consistent with [9] who also found that WBPH had a heterozygosity de cit during expansion. These ndings suggest that WBPH may have non-random mating and intense migration in the sampled populations. With regard to spatial genetic variations, geographic factors play a weak role in WBPH populations. The AMOVA result indicated that there is only 4.40% (mtDNA) and 0.19% (SNP) genetic variation when all of the samples were grouped based on the geographic criteria. These results con rmed previous ndings that WBPH migrates between the counties in GMS and China [9].
Population genetic structure of WBPH The mtDNA and SNP data revealed different patterns of genetic connectivity among the WBPH populations (Table 4, Fig. 2). The main differences between markers concerned the genetic structure Table 5 Assignment test for Sogatella furcifera individuals in the seven geographic regions. Individuals are presented in rows according to their sampling locations as individuals assigned to their own population (self) and those assigned to other putative source population. PopulationSelfPutative source population -In our mtDNA sequence data set, no isolation by distance (IBD) effects were detected with the standardized pairwise F ST (r = -0.028, P = 0.085; Fig. 4a). In contrast, there was a weak but signi cant IBD effect across the seven geographic regions in the SNP data (r = 0.083, P = 0.004; Fig. 4b).
within the populations in Thailand and Shandong, where mtDNA clearly differentiated the Thailand population from the other populations, while SNP data separated the Shandong population from Yunnan population (Fig. 2). Because mtDNA is sensitive to founder effects and small population size, the probable loss or gain of a mtDNA haplotype will be greater for small populations, and it is often used to indicate migration among populations [40]. Based on the results of mtDNA, Shandong had close connectivity with the GMS. Therefore, it seems that the populations in Shandong may have come from the GMS.
SNP data provided information about the genetic structure of WPBH populations. Genetic connectivity among the GMS populations and Shandong population was close. Most of migrants were from Yunnan population as showed by results of population assignment. Compared with the mtDNA data, the SNP data were more consistent with the IBD pattern, perhaps as a result of high information of SNPs. Besides, mtDNA is often unsuitable for detecting isolation by distance [41]. This nding is consistent with the results of [42] who determined the effects of geographic isolation on the genetic structure of WBPH populations in Asia using microsatellite markers. A possible explanation for this might be that WBPH appears to have a stepwise migration. For example, it migrates from Southeast Asian areas into southern China, and then the second or later generations continue to move northward. Both genetic drift and local adaptation may in uence the genetic variations of WBPH. As a consequence, geographic barriers and migration probably acted together to shape the genetic structure of WBPH.
In this study, the SNP markers showed that all of the populations exhibit high levels of admixture between the clusters identi ed with STRUCTURE. This indicates the occurrence of long distance migration events within geographical regions. Long-distance migration of WBPH allowed genetic mixing between populations from remote geographical origins [43]. This pattern may be common in other migratory insects, such as Helicoverpa spp. [44]. A population assignment test using the rst-generation migrant detection method revealed Yunnan as the main source of WBPH in Shandong following by other areas in the GMS ( Table 5). The major migration routes of WBPH in East Asia were illustrated by [1] who reviewed previous studies of trajectory analyses. From mid-June to July, WBPH migrate from southern China to paddy elds in the middle and lower reaches of the Yangtze River, western Japan, and Korea. Because information on the migration of WBPH in Shandong is limited, many trajectory analysis and migration simulation usually neglect this area. Our results provide useful data for the migration route and source of WBPH in Shandong, which provide a better understanding of its migration routes. We inferred that WBPH in Shandong represents an important migration station. In this area, WBPH can establish connections with populations in Liaoning (North China), Korea, and Japan.

Dispersal of WBPH in China
Weather conditions are thought to expedite long-distance immigration of planthoppers [45]. In southern China, early season rice is planted in late March or early April. WBPH migrates into this area from Southeast Asian countries, such as Vietnam, Laos, and Thailand [5]. From May to June, WBPHs continually migrate into the Yangtzi River Valley. However, during this period, the emigration of the WBPH in southern China is often hindered by heavy precipitation in southern China [46]. Because the migration of WBPH mainly depends on seasonal weather systems, the WBPHs cannot migrate further north before mid-June [46]. Based on these data, we infer that the northward migration of WBPH during June and July largely contributes to the populations in the Shandong area.

Conclusions
This study demonstrated that WBPH populations have a low level of genetic diversity and a mixed genetic structure. We arranged the samples in chronological order, which depended on the occurrence of WBPH. Rice planting in Shandong mainly begins in May, and the WBPHs often have population outbreaks in July and August. Although the GMS were revealed as the main source of WBPH in Shandong, WBPH probably expand their range in a stepwise manner. Populations reproducing in other areas of China, such as Yangtze River Valley, Guangxi Zhuang Autonomous region, and Guizhou Province, may also be important sources. Future study is needed to examine more geographic populations and understand the temporal and spatial genetic structure of WBPH in China.

Methods
Insect samples WBPH individuals were sampled in various geographic regions (China and Southeast Asia) ( Table 1, Fig. 1). These comprised six sites in Yunnan Province; four sites in Shandong Province; two sites in Laos, Cambodia, and Vietnam, respectively; and one site in Myanmar and Thailand, respectively (Table 1). Samples were put in 95% ethanol and stored at -20 °C until DNA extraction.

Mitochondrial Coi Sequencing
Insect DNA was extracted using the TIAMamp Micro DNA Kit (Tiangen, Beijing, China) according to the manufacturer protocol. The mtDNA COI gene was ampli ed using primers 2195-MF(5'-CTGGTTYTTTGGTCATCCRGARGT-3') [16] and a newly designed reverse primer 2830-R(5'-CAATCAGCATAATCTGAATATCG-3') (Sangon Biotech, Shanghai), which ampli ed a 635-bp fragment. The PCR reactions were performed in 20 µl solutions containing 1 × buffer, 0.32 mM of each dNTP, 1.0 mM of each primer, 1.0 unit of Taq DNA polymerase, and 2 µl of template DNA. PCR was performed under the following conditions: initial denaturation at 95 °C for 5 min, followed by 35 cycles of 1 min at 94 °C for denaturation, 1 min at 54 °C, for annealing and 1 min at 72 °C for elongation, and nal extension at 72 °C for 5 min. The PCR products were electrophoresed in a 1.0% agarose gel in TAE and were sequenced bi-directionally. Sequencing quality was evaluated, and sequencing results were manually corrected using BioEdit 7.2.6 software [17], followed by BLAST for homology comparison in NCBI. The alignment of sequences was performed using multiple sequences of the Clustal W algorithm in MEGA7.0 [18].

2b-rad Sequencing And Genotyping
The 2b-RAD sequencing and genotyping were outsourced to Shanghai OE Biotech Ltd. (Shanghai, China). Libraries were constructed following the 2b-RAD protocol [13]. Brie y, library preparation began with digestion of DNA samples. The BsaXI restriction enzyme (New England BioLabs, Ipswich, MA, USA) was used to prepare RAD libraries. Next, library-speci c adaptors and the digestion products were linked with T4 DNA ligase. Ligation products were ampli ed by PCR, and the target band was excised from a 2% agarose gel. Finally, the paired-end RAD tags were sequenced on the Illumina Hiseq Xten platform (Illumina, San Diego, CA, USA). Quality ltering was conducted as follows: raw reads were trimmed to remove adaptors, and the terminal 2-bp positions were discarded to eliminate artifacts that might have arisen by ligation. Ambiguous bases (N) or reads of low quality (> 10 bp with quality less than Q20) were removed. SNPs were determined, and genotypes were called using a maximum-likelihood statistical model implemented in the software Stacks v1.32 [19].

Genetic Variation Analysis Based On Mitochondrial Data
Numbers and distribution of haplotypes, composition of haplotypes in each population, numbers of unique haplotypes, within-population mean number of pairwise differences, and nucleotide diversity were assessed using DnaSP v.5.10 [20]. The statistical parsimony network (also known as the TCS network) of haplotypes was analyzed using Popart ver. 1.7 [21,22].

Genetic Variation Analysis Based On Snp Data
The genotype data contained information for each locus and each individual. The primary SNP loci number was 13,565, which could genotype all 133 individuals. We used Plink version 1.07 [23] to lter SNPs for genetic analysis. SNPs were ltered to meet the following criteria: (a) SNPs that were included in at least 80% samples of a population, (b) SNPs with a minor allele frequency (MAF) higher than 0.05, and (c) loci with strong deviations from the Hardy-Weinberg equilibrium (HWE, P < 0.0001) were removed. We excluded samples from populations that had too many missing data from further analyses reducing our sample size to 129 individuals. The nal ltered SNP dataset had 1,108 SNP loci and was used for all downstream analyses. The parameters for population genetic analyses, that is, percentage of polymorphic loci (%poly), Shannon's information index (I), observed heterozygosity (H O ), unbiased expected heterozygosity (uH E ), and xation index (F), were estimated by using GenALEx 6.5 [24,25].

Population structure
We evaluated population genetic structure using ve different approaches: (i) measuring genetic differentiation (F ST ) among populations, (ii) Principle Coordinate Analysis (PCoA) (iii) hierarchical analyses of molecular variance (AMOVA), (iv) Bayesian model-based clustering, and (v) Isolation by distance (IBD).
For mtDNA data, the pairwise Fst were calculated using Arlequin v.3.5.1.2 [27] and using the Tamura-Nei model [28]. For SNP data, the pairwise Fst were calculated using GenALEx. Principal coordinates analysis (PCoA) was used to nd and plot the major pattern within a genetic distance matrix dataset. The PCoA using GenALEx, performed on genetic distance, was used to display genetic divergence among the populations. To determine the proportion of genetic variation that could be attributed to differences between sampling sites, hierarchal analyses of molecular variance (AMOVAs) were performed. A hierarchical AMOVA was performed using Arlequin, with 1,000 permutations. Populations were grouped corresponding to two major criteria, i.e., geographical area, and population genetic structure, to test genetic homogeneity in different hierarchies.
The Bayesian approach was used to determine genetically distinct groups (or clusters) using the program STRUCTURE v.2.3.1 [29][30][31][32]. We set the length of the Burnin period at 10,000 and number of MCMC Reps after Burniin was 20,000. We set the K value from 1 to 7 and for each K the number of iterations was 10.
To estimate the group number, we used the online calculation developed by [33]. We examined the change in Ln P(D) using the deltaK approach [34]. Because the STRUCTURE software showed results of each 10 replications in the case of K = n, we used CLUMPP to average these results [35]. All of the data were visualized through DISTRUCT v.1.1 [36].
To estimate the admixture between geographic populations, we used a Bayesian assignment method as implemented in Geneclass2 [37]. This analysis identi es putative rst-generation migrants among populations. To calculate individual probabilities of assignment to each population, we used the Monte-Carlo resampling method [38] with 1,000 simulated individuals at probability thresholds of α = 0.05. Isolation by distance (IBD) analysis was performed using Mantel tests (1,000 permutations) in GenAlex to nd the correlation between genetic and geographic distances.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets of SNP genotype and summary statistics le can be accessed via Dryad. https://doi.org/10.5061/dryad.kwh70rz1c

Competing interests
The authors declare that they have no competing interests.