Seascape genomics reveals population isolation in the reef-building honeycomb worm, Sabellaria alveolata (L.)

Background Under the threat of climate change populations can disperse, acclimatise or evolve in order to avoid fitness loss. In light of this, it is important to understand neutral gene flow patterns as a measure of dispersal potential, but also adaptive genetic variation as a measure of evolutionary potential. In order to assess genetic variation and how this relates to environment in the honeycomb worm (Sabellaria alveolata (L.)), a reef-building polychaete that supports high biodiversity, we carried out RAD sequencing using individuals from along its complete latitudinal range. Patterns of neutral population genetic structure were compared to larval dispersal as predicted by ocean circulation modelling, and outlier analyses and genotype-environment association tests were used to attempt to identify loci under selection in relation to local temperature data. Results We genotyped 482 filtered SNPs, from 68 individuals across nine sites, 27 of which were identified as outliers using BAYESCAN and ARLEQUIN. All outlier loci were potentially under balancing selection, despite previous evidence of local adaptation in the system. Limited gene flow was observed among reef-sites (FST = 0.28 ± 0.10), in line with the low dispersal potential identified by the larval dispersal models. The North Atlantic reef emerged as a distinct population and this was linked to high local larval retention and the effect of the North Atlantic Current on dispersal. Conclusions As an isolated population, with limited potential for natural genetic or demographic augmentation from other reefs, the North Atlantic site warrants conservation attention in order to preserve not only this species, but above all the crucial functional ecological roles that are associated with their bioconstructions. Our study highlights the utility of using seascape genomics to identify populations of conservation concern.


Background
Conservation strategies in the context of climate change rest on the premise that populations can show three responses to avoid fitness loss: evade, evolve or acclimatise [1]. Characterisation of genetic diversity can shed light on these responses [2], by indicating dispersal and gene flow using neutral genetic markers (evasion), or adaptation using adaptive genetic variation, which inform us about the potential for an organism to evolve in altered conditions [3][4][5]. Separating neutral and adaptive causes of phenotypic differentiation, or lack thereof in a heterogeneous environment (sensu countergradient variation [6];), can require logistically difficult experiments [7]. However, advances in genomic technologies and analyses have opened up new avenues for exploring genome-wide (neutral) versus locus-specific (adaptive) genetic variation between populations, allowing us to compare patterns of adaptive genetic diversity with environmental variation to infer agents of selection [8,9]. These methods have already been successful in contrasting neutral and nonneutral genetic variation in a range of taxa including fish [5,8,10,11], insects [4], and marine invertebrates [12,13] and represent an important step in interpreting genetic diversity in relation to environment.
Despite genomic advances allowing investigations of genetic diversity in non-model species, which is particularly important for species of conservation concern, understanding of gene flow and local adaptation in marine environments had, until recently, lagged behind that for terrestrial species [14][15][16]. Recent advances in seascape genomics have facilitated consideration of the effect of landscape on neutral and adaptive genetic variation in marine environments, taking into account ocean circulation patterns and environmental variation to explain observed population structuring [8,17]. For instance, Benestan et al. (2016) [17] found that ocean currents as well as geographic distance were key to explaining observed patterns of neutral genetic variation in the American lobster (Homarus americanus H. Milne Edwards, 1837), whereas sea surface temperature acted as an agent for natural selection. Ocean circulation modelling for understanding dispersing larval stages has emerged as an important tool to explain neutral population structure where isolation-by-distance measures alone are uninformative [18][19][20].
Intertidal communities have been proposed as particularly suitable for studying the impacts of climate change, as many species already exist at the limit of their thermal tolerance ranges [21,22]. Ecosystem engineers provide and modify habitat for other species and are thus disproportionately important to diversity, ecological functioning, and conservation in a changing climate [1,14,[23][24][25]. Therefore, understanding how littoral ecosystem engineers have adapted to the local environment they experience is key to understanding not only their future survival in a changing climate, but also the potential survival of intertidal ecosystems as a whole.
The honeycomb worm (Sabellaria alveolata (Linneaus, 1767)), is an ecosystem engineer that constructs biogenic reefs along temperate coastlines of North Africa and Europe [26]. They are gonochoric sexually reproducing broadcast spawners. Asexual reproduction is not known to occur in the species, in line with the observation that clonality occurs in less than 1% of polychaetes [27]. Larvae are dispersive but preferentially settle on pre-existing reef or remains, although they do not discriminate based on reef or origin [28,29]. Each individual constructs a tube from sand, which is glued together with biomineralised cement [29,30]. Individuals are gregarious and tubes aggregate to form large bioconstructions or reefs, which can stretch over several kilometres [31][32][33]. These reefs provide a large diversity of microhabitats for other species and alter wave action, water circulation, erosion and sedimentation patterns [34], and phytoplankton concentration [35] and thus modify the availability of resources, leading to a high associated biodiversity compared to surrounding areas [36,37]. Biogenic reefs, including those of Sabellaria, are listed under the EC Habitats Directive (92/43/EEC) Annex 1 and are thus of conservation interest [38]. Despite this, little is known about neutral genetic structure (but see [39,40]) and to our knowledge nothing is known about adaptive genetic structure of S. alveolata. Neutral population structuring is expected because (i) adults are sessile, and (ii) during the dispersive planktotrophic larval stage of this species, individuals have been shown to demonstrate positive taxis towards the cement of sand grains and bioclasts of S. alveolata tubes [41] leading to site philopatry [42]. However, modelling of larval dispersal is needed to fully understanding how the environment influences gene flow within marine systems [17].
We have previously demonstrated that S. alveolata show local adaptation to temperature in terms of their membrane lipid composition [43] but the genomic basis of this adaptation is not known. Therefore, as their distributional range extends along a latitudinal thermal gradient from Scotland to Morocco, S. alveolata provide an excellent opportunity to investigate if/how adaptive population structure is related to temperature, and assess the role of neutral processes in gene flow.
In this study, we explore the extent of neutral and adaptive genetic variation among reef sites of the honeycomb worm, S. alveolata, and the relationship with the environment. We test: 1) if gene flow is restricted among S. alveolata reef sites and whether population structure can be predicted by ocean circulation modelling; and 2) whether S. alveolata show evidence of local adaptation and if this relates to local thermal conditions.

SNP detection and summary statistics
Due to high variability in sequence coverage between samples (37,000 -1,400,000 reads per sample; Additional file 1), only 68 samples across the nine sites were retained for analysis (Table 1). Of these, 50 samples were from sites where temperature data was also available (Table 1; Fig. 1). In total, 439,598 SNPs were identified across 273,397 RADtags with an average coverage depth of 14.85x per sample. Due to low read coverage overall, only 506 variable SNPs were identified, 482 of which were from independent RADtags and could be used for population genetic analyses. Marker independence across sites was confirmed with no loci flagged as showing linkage disequilibrium. Although some loci showed significant deviation from Hardy-Weinberg, the loci involved differed across sites and were thus included in further analyses [45].
All sites showed reasonable numbers of alleles per locus, nucleotide diversity and expected heterozygosity (N alleles = 2.44 ± 0.13, θ = 0.26 ± 0.05, H e = 0.52 ± 0.04). Observed heterozygosity was low across all sites (H o = 0.11 ± 0.02) but was significantly lower than H e only in the Balearic Sea site, which had a low sample size (Table 1). Most molecular variance was found within rather than between sites (within = 71.42%, between = 28.58%, P < 0.01) and within rather than between the genetic clusters identified by STRUCTURE (see below) (within = 84.08%, between = 15.92%, P < 0.01).
From the total number of SNPs, ARLEQUIN identified 191 as outliers (39.63%) and BAYESCAN identified 28 as outliers ( Fig. 2; 5.81%). Twenty seven of the 28 outlier SNPs identified by BAYESCAN as outliers were also identified by ARLEQUIN. Therefore, the 27 outlier SNPs (5.60%) were removed from the data set and 455 (94.40%) were used in the neutral genetic analyses.

Population structure
Pairwise F ST values for neutral loci had an average of 0.28 ± 0.10 across sites, with 25 out of the 36 comparisons showing significance following Bonferroni correction, strongly demonstrating a high degree of population substructure within the system as a whole (Table 2). In particular, the North Atlantic, Tyrrhenian Sea and Bay of Biscay sites were genetically differentiated from all other sites (Average F ST = 0.42 ± 0.10; 0.32 ± 0.12 and 0.25 ± 0.05, respectively; Table 2), suggesting very low gene flow to and from these sites. The Discriminant Analysis of Principal Components (DAPC) results also showed the Tyrrhenian Sea to be highly differentiated from the other sites (Fig. 3a). When the Tyrrhenian Sea was excluded from the analysis, the English Channel population emerged as being differentiated from the remaining populations (Fig. 3b). Using the Bayesian cluster analysis in STRUCTURE, ΔK showed a clear peak at K = 2, suggesting two genetic clusters within the nine sites studied (Fig. 4a). Likelihood probability profiles also showed that K = 2 had the highest likelihood with low variance. The barplot in STRUCTURE for K = 2 identified that individuals from the North Atlantic site formed a distinct population from all the other sites combined (Fig. 4b). This result did not vary regardless of whether sampling location was included as a prior and there was no evidence of hierarchical clustering (K = 1 when North Atlantic individuals were removed). F ST based isolation (Fig. 5a) by straight-line geographic distance ( Table 2) and shortest ocean distance ( Table 2; Fig. 5b) were not significant (r = 0.14, P = 0.27 and r = 0.08, P = 0.27, respectively).
Maps of larval densities from dispersal simulations show the range of potential dispersal pathways for a single generation released from each sample location: from the most conservative (1 week of 6 hourly releases, 5 weeks maximum planktonic larval duration (PLD), Fig. 6a), to the most optimistic (daily releases all year round, 12 weeks PLD, Fig. 6b). The most optimistic dispersal scenarios are suggestive of isolation among sites (Additional file 1). Model simulations suggest isolation among the North Atlantic, northern Irish Sea, and southern Irish Sea sites (Additional file 1), which is reinforced by the F ST distances ( Table 2). Under the conservative scenario estimates, no larvae were exchanged among any sites in a single generation (Fig. 6a). Under the most optimistic scenario, high densities of larvae were retained close to natal release sites (average within 133 km, maximum 230 km) and very few larvae exchanged among release locations (Additional file 1). Potential connectivity was only identified between the northern Irish Sea and southern Irish Sea sites (as per [47]) but < 0.1% of the larval population made this transition. Dispersal distances away from source were shortest from the North Atlantic release site on the west coast of Ireland; this site exhibited the highest proportion of larval retention (79% self-recruited) and very few larvae reached nearby reefs, and none reached other release sites (Additional file 1). Dispersal resistance, as identified using ocean circulation modelling, did not show significant correlation with genetic distance, in terms of F ST , for any of the PLD scenarios (5 weeks: r = − 0.02, P = 0.59; 8 weeks: r = − 0.01, P = 0.59; 10 weeks: r = − 0.006, P = 0.57; 12 weeks: r = 0.00, P = 0.57).

Loci under selection
Both ARLEQUIN and BAYESCAN analyses suggested that the 27 loci identified as outliers were under balancing selection (Fig. 2). Temperature data could not be collected from three of the study sites due to loss of equipment during the data collection period (Balearic Sea, Tyrrhenian Sea and North Africa). Temperature data was thus available for six of the study sites ( Fig. 1) and only these study sites were included in the  Fig. 1 Sampling sites spanning the latitudinal and longitudinal range of S. alveolata, which were also used as the release sites for the Lagangrian model. Sites shown as triangles are those from which temperature was available. Created in R version 3.1.2 [44] environmental association analyses. Of the temperature parameters collected from six of the sites (Table 3), no environmental associations were found with any loci.

Population structure
Population genetic analysis using genome-wide SNPs identified from nine S. alveolata reef sites spanning the latitudinal and longitudinal range of the species revealed low gene flow between sites, equating to low effective dispersal, with 25 [49], which is in line with the estimated larval duration of S. alveolata at 5-12 weeks, dependant on temperature and food availability [42,51]. Indeed, a meta-analysis revealed that mean pelagic larval duration was not a good predictor of gene flow [52] and our findings support this interpretation. Three sites were differentiated from all other sites: North Atlantic (average pairwise F ST = 0.42 ± 0.10 and STRUCT URE), Tyrrhenian Sea (average pairwise F ST = 0.32 ± 0.12 and DAPC) and Bay of Biscay (average pairwise F ST = 0.25 ± 0.05). The observed differences in the results produced by the two cluster assignment methods can be explained by how they use data: STRUCTURE assigns cluster groupings in order to minimise Hardy-Weinberg and linkage disequilibria within clusters by looking at allele frequencies [53,54]. In contrast, DAPC maximizes the variance in linear combinations of allele frequencies (principal components) between groups while minimizing variance within groups [53]. Differences in genetic structure estimated by STRUCTURE and DAPC therefore likely result from how genetic variation is considered in either approach. The northern Irish Sea, southern Irish Sea (in terms Fig. 2 Posterior odds of the selection model and locus-specific F ST for each SNP as calculated in BAYESCAN [46]. The vertical line represents the threshold for significance for a locus to be under selection of P ≥ 0.99. Loci to the right of the threshold line are suggested to be outlier loci of F ST ) and English Channel (in terms of F ST and the DAPC) sites were also all differentiated from each other ( Table 2; Fig. 3b). The ocean circulation modelling predicted high isolation (zero to low larval density; Additional file 1) among all release sites, even in the most optimistic (dispersive) scenario. Therefore, the F ST results are supported, at least in part, by patterns of ocean circulation influencing passive larval dispersal along the Atlantic and Mediterranean coastlines of Europe and North Africa. High population sub-structuring could be partially explained by two behavioural barriers to connectivity: first, spawning in polychaetes is triggered by the action of waves, especially during spring tides, which tends to push larvae toward the coast rather than offshore [55] and second, S. alveolata larvae act to reduce dispersal out of the bay where they were spawned by moving higher in the water column with an incoming-and lower with an outgoing-tide [42]. This 'tidal stream transport', has been suggested as a possible mechanism for facilitating/restricting advective transport in a number of taxa (e.g. [56,57]). Yet despite this, relatively high levels of genetic diversity were observed across the sites (H e = 0.52 ± 0.04). Observed heterozygosity was very low at all sites and could be due to inbreeding [58], as a result of the site isolation predicted by the ocean circulation modelling and due to the reproductive strategy of gregarious colonial species, as juveniles settle together in patches and spawning of one individual triggers the spawning of those immediately adjacent [41]. Heterozygote deficiency has also been recorded in marine invertebrates as a consequence of the Wahlund effect, due to the coexistence of genetically distinct cohorts within a  sampling site [18]. However, low heterozygosity estimations could also be a consequence of poor genome coverage and low sample size at some sites [59,60]. Further research is needed to assess whether presence of null alleles is genome-wide, as evidence of inbreeding, or restricted to particular loci, as evidence of allelic dropout due to low coverage, in order to assess whether the species is at inbreeding risk.
Identifying the relative contribution of gene flow, genetic drift and natural selection to population structure is  difficult in marine invertebrates due to their fluctuating population sizes [9]. This is particularly difficult in S. alveolata, as in-depth local ecological knowledge, such as population size and breeding behaviour, is lacking in the majority of sites (but see [33,[61][62][63]). That said, the identification by STRUCTURE of only the North Atlantic site as a separate, but not completely isolated, population, with all other sites forming a single population, supports the idea that low levels of gene flow are still maintaining genetic diversity within the system as a whole. It could be that asymmetric dispersal, a common feature of seascapes due to unidirectional currents [64] has led to some sharing of genotypes between the North Atlantic and other sites, as observed in the STRUCT URE plot (Fig. 4b). Such wide-ranging genetic connectivity has also been observed in other polychaete species [65]. This is potentially a positive sign for survival of the species in a changing climate, as the maintenance of genetic diversity is key to facilitating rapid evolution when environmental conditions change [66]. However,  S. alveolata reduce their larval duration with increasing temperatures [42] and our models show that shorter larval duration leads to reduced larval dispersal. Therefore, lower connectivity, and thus gene flow, is predicted for S. alveolata in a changing climate. Despite some evidence of low levels of gene flow between the majority of the study sites within the system, both F ST values and STRUCTURE analysis revealed a level of isolation of the North Atlantic site and identified this reef as a separate population to all others. There are two possible processes that could, separately or in synergy, be causing the observed pattern of population structure: historical isolation and contemporary patterns of gene flow. During the Last Glacial Maximum, geomorphology and fossil evidence suggests that southwest Ireland was partially unglaciated [67] and genetic data supports the presence of a glacial refugium in this area for both terrestrial [68,69] and marine species [70][71][72]. In particular, Jolly et al. (2006) [70] found that two coastal polychaete worms (Lagis koreni Malmgren, 1866, formerly Pectinaria koreni, and Owenia fusiformis Delle Chiaje, 1844) showed a private Irish Sea haplotype linking two ancestral haplotypes and they suggest this could have evolved in a small ice-free area along the southwest coast of Ireland. Therefore, one potential explanation is that this population was isolated in a different glacial refugium to the rest of the sites, with subsequent admixture, but further study is needed to test this hypothesis.
The second hypothesis, that contemporary gene flow is very low between the North Atlantic and other S. alveolata sites, is supported by predictions of larval dispersal as seen in the ocean circulation modelling. In even the most optimistic scenario for larval dispersal, the North Atlantic site was not predicted to have interchange of individuals with any of the other sampled sites and had by far the highest predicted proportion of larval retention (79% of released individuals were retained compared to an average of 20 ± 13% for all other sites). This can be explained by the hydrodynamic modelling in terms of current patterns within Galway Bay limiting dispersal. The presence of the North Atlantic Current, which moves eastwardly towards Ireland and then continues northwards, is also an isolating factor for the North Atlantic reef. Any larvae that do move beyond their spawning reef at the North Atlantic site are drawn northwards, beyond the current northern range limit for the species (the Solway Firth, Scotland [43]). This oceanic barrier is also a likely cause of the observed genetic divergence between the northern/southern Irish Sea sites and the English Channel, and the reason that the dispersing larvae from more southerly sites do not reach the North Atlantic site. Although larval dispersal of many other marine species has been found to be mediated by the North Atlantic Current (e.g. [73,74]), isolation due to the North Atlantic Current has not previously been reported, likely because it depends upon species-specific occurrence range and life history traits. Therefore, reduced contemporary gene flow is a likely cause of the observed population isolation of the North Atlantic S. alveolata reef. These findings highlight the North Atlantic reef site as at risk if population size reductions occur as recruitment and genetic augmentation is unlikely from elsewhere within the species' range. Therefore, conservation management is needed to ensure that population size does not decrease at this site, and potentially throughout Ireland.
Despite the importance of ocean circulation to larval dispersal in general [18,19,75] and to S. alveolata in particular, as identified by our data in regards to the isolated North Atlantic site, our ocean circulation models of larval dispersal did not show a significant correlation with observed genetic distance between sites. This is surprising as both F ST values and larval dispersal models suggest low connectivity between sites. This lack of correlation between our larval dispersal modelling and genetic distance values is in contrast to studies on a number of species with a dispersing larval stage including the Mediterranean shore crab (Carcinus aestuarii Nardo, 1847) [19], the bat star (P. miniata) [49] and the American lobster (H. americanus) [17]; these studies found genetic structure was directly related to ocean currents or to estimates of potential larval connectivity obtained with coupled physical-biological models. However, these studies were conducted over a smaller geographical area than our study. As per our study, Jorde et al. (2015) [76] found that when looking at large-scale differentiation patterns in the north Atlantic, geographic distance and larval drift alone explained only a minor portion (2.5-4.7%) of genetic isolation in the northern shrimp (Pandalus borealis Krøyer, 1838). Galindo et al. (2010) [48] modified a biophysical model for Monterey Bay in California to simulate dispersal of the acorn barnacle (Balanus glandula Darwin, 1854) but it also did not match an observed genetic cline in the species. They discovered that their model fit was improved by including natural selection, larval retention, and input values from an additional source population [48].
There are several factors likely to reduce the similarity between modelled larval dispersal and observed gene flow. Gene flow is representative of multigenerational mixing, while the model used in this study is representative of the dispersal patterns of only one generation. It is possible to simulate multigenerational gene flow based on dispersal models [49,77,78] but this would be confounded by the lack of known sites that could be used as "stepping stone" sites, which may facilitate dispersive spread over multiple generations. Such sites are likely to exist but are currently undocumented. These undiscovered sites represent missing source populations within our model and may well be a cause of the observed mismatch between predicted larval dispersal and observed gene flow. The distribution and occurrence of S. alveolata reefs are not well documented, particularly in the southern range of the species (Firth, unpublished), and the prediction of our model that genetic differentiation between our sample sites would be high could be due to the absence of intermediate reef sites within the model that would create higher levels of admixture within the system as a whole [9,79]. This is further complicated by the fact that reefs can vary temporally in their presence at a site [62]. Therefore, increased knowledge of the location of S. alveolata reefs and ongoing monitoring of reef sites is required to generate a clearer picture of the role of connectivity in meta-population maintenance in this species and to further inform the ocean circulation model. Our study highlights the importance of comparing bio-physical models with observed population structure of a species in order to create accurate dispersal predictions.

Loci under selection
Twenty-seven SNPs were identified as outlier loci, all of which were putatively under balancing selection. Identification of a higher number of loci potentially under balancing rather than divergent selection is common in genome scan studies, including those on marine invertebrates [12,50,[80][81][82]. Our finding that 5.6% of SNPs were potentially under balancing selection is reflective of a growing body of evidence that suggests that balancing selection, which acts to preserve polymorphism, is more important in the genome than previously considered [12,50,83]. In their study of the sea anemone, Nematostella vectensis Stephenson, 1935, covering a broad geographical range, Reitzel et al. (2013) [12] identified 37 polymorphic sites inferred to be under balancing selection, but none under divergent selection, as with our study. However, further evidence in the form of elevated polymorphism, reduced differentiation, and shifts towards intermediate allele frequencies are needed to confirm that these loci are indeed under balancing selection [58].
The lack of identification of loci under divergent selection using either outlier analyses or environmental association tests is surprising given that local adaptation to temperature is known to be present in this system [43]. We previously reported that S. alveolata individuals from sites along the latitudinal range of this species (all of which are included in this study) showed different responses to thermal regime changes in terms of their membrane lipid composition dependant on their site of origin [43]. This is likely a reflection of poor genome coverage leading to a low number of genotyped loci (as per [12]) and whole genome sequencing would be beneficial to search for adaptive loci and their functions in this system [84].

Conclusions
Modelling of larval dispersal predicted low gene flow between sites for S. alveolata across their range, which was in part supported by the genomic data. In particular, population genetic analyses identified the North Atlantic reef is an isolated population, due to the effect on dispersal of the North Atlantic Current. These findings have implications for the management of S. alveolata engineered habitats in a changing climate and have highlighted the North Atlantic reef as at increased risk and in need of direct conservation action in order to preserve not only this species, but above all the crucial functional ecological roles (biodiversity hotspots, coastal protection, carbonate traps, etc.) that are associated with these bioconstructions. Our study highlights the utility of using seascape genomics to identify populations of conservation concern.

Sampling and molecular methods
In total, 180 mature individual worms were haphazardly collected from nine sites (20 per site) across the global geographic range between August 2013 and March 2014, from North Africa at the species' southern boundary, along the Atlantic Coast, to the Northern Irish Sea in the north, as well as from two sites on the Mediterranean coast (Fig. 1). Individuals were placed immediately in RNA-later (Qiagen Inc., Crawley), maintained at room temperature for between 48 h and 1 week and then stored at − 20°C until DNA extraction.

SNP detection and summary statistics
All analyses were carried out on the Analyses and Bioinformatics for Marine Science (ABiMS) Galaxy Platform at the Station Biologique de Roscoff (galaxy.sb-roscoff.fr [87];). Samples with mean Phred scores of < 25 or read coverage of < 150,000 were removed from the dataset and sequences were trimmed to 60 bp length using Trimmomatic version 0.36.4 [88]. STACKS version 1.46.0 [89] was used to de novo map the sequences and call SNPs by sequentially running "ustacks", "cstacks" and "sstacks" using the parameters suggested to be optimal for population genetic inference by [90] namely, allowing a minimum depth of coverage of three (ustacks: m; stack coverage), a maximum of two mismatches between reads for a single individual (ustacks: M), and four mismatches between primary and secondary reads (ustacks: N). These settings have been found to increase the number of loci and reduce the SNP and allele error rates within a dataset [90]. Furthermore, the low value of M used here reduces the risk of paralogous loci being merged into the same SNP [91]. The number of mismatches allowed between loci was set at three (cstacks: n [91];).The removal or break up of highly repetitive RAD tags was permitted within the program (ustacks: deleverage).
The programme populations from the STACKS suite, was used to process the SNP data [89]. RADtags with at least 10x depth of read coverage were retained to ensure accuracy of heterozygous SNP calls (populations: p [50]; ). Lumped paralogs can be a concern in high coverage data sets [92], but 10x coverage is considered low [93] and thus reduces this to a low risk factor. A locus had to be present in 70% of the populations to be included in the analysis, which allows for mutations in restriction sites that may cause loci to dropout in certain lineages [50,94] and the minor allele frequency within populations (populations: min_maf) was set at > 0.01 [91]. In order to maximise SNP discovery, the percentage of individuals required within a population to process a locus was set at 50% (populations: r). Only a single SNP per RADtag was retained for subsequent analyses to avoid problems of non-independence between markers [10]. The STRUCTURE file output by populations was transformed in PGD Spider version 2.1.0.3 [95] for use in all further analyses. Each population and locus was tested for deviation from Hardy-Weinberg equilibrium and linkage disequilibrium in ARLEQUIN version 3.5.2.2 [96] and significance was assessed following Bonferroni correction for multiple tests. Loci that showed null alleles in multiple populations, or with significant deviation from Hardy-Weinberg equilibrium, were removed from further analyses to reduce the risk of inclusion of paralogous sequence variants [97].
Genomic summary statistics were calculated in ARLE QUIN version 3.5.2.2 [96] as mean number of alleles per locus (N alleles), nucleotide diversity (θ and π), expected heterozygosity (H e ) and observed heterozygosity (H o ). Separate analyses of molecular variance (AMOVA) were carried out based on site of origin and genetic clusters identified in STRUCTURE (see below) using 16,000 permutations.
An outlier approach was used to identify loci that had a higher or lower F ST than expected under a neutral model of selection (under positive or balancing selection, respectively). Outliers were estimated in two ways: firstly, using BAYESCAN version 2.1 [46], which uses a Bayesian approach to estimate the posterior probability that a locus is affected by selection, and was run using 20 pilot runs of 5000 iterations each, a total of 1,050,000 iterations (sample size of 100,000 and a thinning interval of 10), and a burn-in of 50,000. Only loci with a posterior probability (P) ≥ 0.95 with a prior odd of 10 were considered as outliers. BAYESCAN has consistently outperformed other outlier detection methods in terms of lack of false positives [4,17,98]. Outliers were also calculated in ARLEQUIN using 100,000 simulations, 500 demes, and a maximum expected heterozygosity of 0.5 [99]. Outlier SNPs were identified using a P-value of ≤0.05. Only SNPs identified as outliers by both BAYES-CAN and ARLEQUIN were conservatively selected as candidate loci under balancing or directional selection. Subsequent analyses of population structure and seascape genomics were conducted separately for neutral and outlier loci.

Population structure
Pairwise F ST were calculated in ARLEQUIN version 3.5.2.2 [96] and significance in certainty of the estimator assessed following Bonferroni correction for multiple tests. Straight-line distances between sites were calculated using ArcGIS Version 10.1 (ESRI, 2011) and shortest ocean distances between sites were calculated in R version 3.1.2 [44] using the package marmap [100], where distance was calculated excluding positive elevation [99]. Isolation-by-distance (straight line and shortest ocean) was tested in ARLEQUIN (Version 3.5.2.2) using a Mantel test [101] with 10,000 permutations. To visualise these relationships, neighbor-joining (nj) tree estimation plots were constructed in R package ape 5.3 [102], following the methods outlined in [103], and were constructed using F ST and shortest ocean distances for each pairwise (site) combination. Divergence between sites was also assessed using a Discriminant Analysis of Principal Components (DAPC) with the R package adegenet 2.1.2 [104], with 30 components and two discriminant functions retained in the analysis. Hierarchical structure was also considered by carrying out a second DAPC excluding the most divergent site, as identified in the first DAPC.
Genetic clusters (populations) were inferred using Bayesian analyses carried out in STRUCTURE version 2.3.4 [54]. The analyses assumed admixture, correlated allele frequencies, and were run with 100,000 burn-in cycles and 100,000 Markov Chain Monte Carlo runs. The number of populations (K) was considered between 1 and 10, with 10 replicates per K. This was repeated with and without location priors, and using a hierarchical approach to resolve fine-scale genetic structure by removing highly differentiated populations [45]. The most likely value of K was inferred by estimating ΔK using the Evanno method [105], as implemented in Structure Harvester version 0.6.94 [106].
Hydrodynamic models are increasingly used as predictors of larval dispersal (gene flow) in marine systems (e.g. [20,107,108] and many others). Probabilistic dispersal simulations of potential larval connectivity were run using the Connectivity Modeling System (CMS, [109]) paired with 3-hrly Hybrid Coordinate Ocean Model (HYCOM) and Navy coupled ocean data assimilation (NCODA) global 1/12°daily outputs from 2004, 2010, and 2012 [110]. These years were selected to represent neutral, negative and positive phases of the North Atlantic Oscillation (NAO) respectively, and are thus likely to reflect average dispersal patterns over the last decade [109,111]. Vertical velocities were calculated using the Eulerian continuity equation, and a random diffusive kick was added at each hourly dispersal time-step to better capture the potential effects of sub-gridscale turbulent diffusion (horizontal = 15 m 2 s − 1 , vertical = 0.05 m 2 s − 1 ; after Kough et al. 2016 [112]). HYCOM is a global model appropriate for large landscape continental-scale dispersal simulations, but does not include tides. Larvae were considered as passive particles even though diel vertical migrations may occur [42]. Simulations can therefore be over-dispersive, as the inclusion of tides and behaviour is liable to be retentive [113].
In the simulations, 100 propagules, each representing an S. alveolata larva, were released in a Lagrangian framework (so that each theoretical larvae was tracked individually), from each of nine locations (Fig. 1), and their dispersal under two different release scenarios predicted: (1) a conservative scenario with propagule releases every 6-h over 1 week in April for each year, centred on the spring tide, and (2) an optimistic scenario which modelled a daily release on every day of the year. The optimistic scenario was designed to simulate all the potential pathways of dispersal if reproduction occurs throughout the year, which has been suggested in S. alveolata [42], thus capturing the full variability of potential larval fates. Larval competency to settle was assumed after 4 weeks, and a maximum PLD of 12 weeks applied based on field and laboratory estimates [42,51]. Larvae were permitted to settle when located within 5 km of a known reef site (Additional file 1). Additional (non-release) sites were also included as reported in The Global Biodiversity Information Facility (GBIF.org) in which larvae could also settle if encountered. Larval exchange counts (i.e. the source-sink contribution) were generated after 5, 8, 10, and 12 weeks PLD. Model outputs were processed in Matlab (Mathworks, Version R2016a). Pairwise matrices of larval exchange counts, standardised using proportions, were generated from larval fates averaged across the three focus years to compare against genetic pairwise F ST matrices. Daily positions of larvae throughout the simulation were used to generate cumulative 2-D maps of larval density to highlight the predicted pathways of dispersal and areas of potential settlement.
The proportion of settlers reaching each study site from each release site under the four larval duration scenarios (5, 8, 10 and 12-weeks) in the ocean circulation modelling simulations were inverted and used to form a matrix of pairwise dispersal resistances (i.e. the proportion of larvae that did not reach each site). Mantel tests [101] were carried out in ARLEQUIN (Version 3.5.2.2) [96] using 16,000 permutations to assess the correlation between pairwise F ST (using putatively neutral loci) and dispersal. A Bonferroni correction was applied to assess the significance of the Mantel tests [114].

Loci under selection
Locally collected temperature data provides insight into the conditions that S. alveolata experience at a spatial and temporal scale that cannot be achieved using satellite data [115] and can be used for comparison with genomic data, to test genotype-environment associations [116]. Temperature data were obtained from Seabra et al. (2015) [117] who used biomimetic temperature loggers placed in the intertidal zone of exposed shores [115,118]. At five sites, temperature data were collected with a resolution of 0.5°C at mid-intertidal level every hour between July 2010 and July 2014. High-and low-tide temperatures were also identified to the nearest hour. At the remaining four sites (i.e. English Channel, Balearic Sea, Tyrrhenian Sea and North Africa), an iButton (Maxim, Munich) was placed in the mid-littoral zone, collecting temperature data with a resolution of 0.5°C every 3 h between April 2014 and April 2015. Temperature readings closest to high-and low-tide were identified.
S. alveolata experience tidally-driven fluctuations in water and air temperatures. Sixteen metrics were calculated to describe variation in temperature at each site as follows: mean temperature, maximum temperature, minimum temperature, temperature range (maximum-minimum), standard deviation of the mean, 95th -5th percentile (to exclude outlier temperatures), average daily temperature range (maximum-minimum per day), and average daily standard distribution (standard deviation of the mean), per site. High-tide (water) and lowtide (air) temperature values were also used separately to calculate mean temperature, maximum temperature, minimum temperature, and temperature range (maximum-minimum) for water and air temperatures, respectively ( Table 3). Larvae of S. alveolata can be found in the water column all year round [42], suggesting that they are reproductively active throughout the year, thus temperature data from all months were included in the analyses.
We utilised a gene-environment association software, BAYENV2 [119], to test for covariance between SNP allele frequencies and the 16 thermal variables. BAYENV2 controls for neutral population structure by first estimating a covariance matrix among populations for all loci and then accounting for that covariance in the gene-environment association test [120,121]. The programme was run with 100,000 iterations for both neutral parameterisation and association testing [122]. Outlier loci were classed as those with a Bayes Factor > 3 [8,123,124].