Study sites
Two field sites were selected near the center of the hybrid zone: a coastal marsh at Popham Beach State Park and an inland marsh at Wharton Point on Maquoit Bay, located on the northeastern coast of the United States, between Brunswick, Maine and Phippsburg, Maine (Fig. 1). We chose these sites with expectations of relatively similar species abundances based on recent abundance estimates [55] and a relatively high number of first-generation hybrids based on a peak in interspecific heterozygosity across the hybrid zone [40]. Leveraging prior work in the southern end of the Saltmarsh-Nelson’s Sparrow hybrid zone [38], we were able to compare levels of hybridization and patterns of introgression on a large spatial scale between the center and southern range margins of the hybrid zone (~ 100 km from the center); including three previously sampled marsh locations: Chapman’s Landing in Stratham, New Hampshire (latitude 43.041; longitude − 70.924); Lubberland Creek in Newmarket, New Hampshire (latitude 43.073; longitude − 70.903); and Eldridge Marsh in Wells, Maine (latitude 43.292; longitude − 70.572).
The two study locations differ in fine-scale habitat (vegetation) characteristics and amount of tidal inundation [39], allowing us to assess patterns of introgression on a small spatial scale between a coastal and inland tidal marsh (~ 20 km apart). The coastal marsh (15-ha plot) at Popham Beach State Park is located at the tip of a peninsula, behind a sand dune system on the Gulf of Maine. The inland marsh is located 7 km inland from the mouth of Maquoit Bay where it meets Casco Bay and 20 km inland from the Gulf of Maine. It is a much smaller marsh (5-ha plot) than the coastal site. Further, the coastal marsh is part of an expansive coastal marsh network, while the inland marsh is located in a small cove that is surrounded by mostly forest and field. Although both sites experience daily and monthly tidal inundation, tide heights tend to be dampened in inland marshes relative to coastal [45]. In addition, through water-level monitoring on the marsh over the two-year study period, we know flooding rates were lower at the inland site compared to the coastal site [56].
Field data collection
To determine the extent of hybridization and patterns of introgression, we sampled the population at both sites during the 2016 & 2017 breeding seasons (Additional file 2 describes all sampled individuals). We followed standardized protocols established by the Saltmarsh Habitat and Avian Research Program (SHARP; www.tidalmarshbirds.org). We performed systematic and opportunistic netting, using 2–6 12-m mist-nets throughout the breeding season to sample the resident adult population. To test predictions of Haldane’s rule and assortative mating, we sampled as many offspring as possible. We conducted intensive nest monitoring at both sites during May—August, encompassing approximately 3 nesting cycles (following SHARP nest monitoring protocols; www.tidalmarshbirds.org). From each nest, nestlings were banded with a USGS aluminum leg band and a single site-specific color band when they were 6 days old. A blood sample (a few drops on a filter card) was also collected from the medial metatarsal vein of each nestling for genotyping, hybrid identification, and molecular sex determination. We also collected any deceased, unbanded chicks (most chick death occurred due to flooding, which allows for carcass recovery) or eggs that had failed to hatch to use in genetic analyses. To determine the identity of females associated with each nest, we conducted targeted mist-netting to capture females off of their nests during incubation or brooding. Once caught, each female was banded with a USGS aluminum band, a site-specific color band, and a PIT tag that was attached to a color band for non-invasive detection of re-nesting attempts. Males were sampled systematically and opportunistically across each study site and throughout the breeding season and banded with a USGS aluminum band and a site-specific color band. We collected standard morphological measurements from all adults and recorded presence/absence of brood patch for females. Blood samples were drawn from the cutaneous ulnar vein and stored on blood filter strips at room temperature for genetic analyses.
ddRAD library preparation
Samples from adult females, nestlings, and salvaged chicks or eggs from the two field seasons were used to prepare double digest restriction site associated DNA (ddRAD) sequencing libraries. In addition, we also used 30 samples each from allopatric Nelson’s Sparrow (Upper Naraguagus, Maine; Wolfville and Yarmouth, Nova Scotia; Hobart Stream, Maine) and allopatric Saltmarsh Sparrow populations (Sawmill Creek, Idlewild, Marine Nature Center and Shirley, New York; Sachuset, Rhode Island; Barn Island, Connecticut) from previous sampling of the hybrid zone for developing a hybrid index (Additional file 1; Fig. 1). DNA was extracted from blood samples using either Qiagen DNeasy Blood and Tissue kit (Qiagen, Valencia, CA) or Zymo Quick DNA kit (Zymo, Irvine, CA) following the manufacturer’s protocol. We determined the concentration of resulting DNA samples using Qubit fluorometer Broad Range double-stranded DNA assay kit (Life Technologies, NY, USA). We targeted a DNA concentration of 5–25 ng/ul. Samples below 10 ng/ul after initial extraction were vacuum centrifuged to concentrate to within the target range. Samples that were above 25 ng/ul were diluted down to 25 ng/ul. A small number of samples below 5 ng/ul were included and grouped into one index group to ensure the best results. ddRAD libraries were created using the protocol described in Peterson et al. [57, 58]. DNA was digested with SbfI and MspI and ligated to P1 and P2 adapters using T4 DNA ligase (30 min at 37 °C and 60 min at 20 °C, held at 10 °C) [58]. Samples were pooled into index groups by their unique P1 adapter and cleaned using 1.5 × Agencourt AMPure XP beads. Using BluePippin (Sage Science, MA, USA), fragments were size selected between 400–700 bp in length. Low cycle PCR reactions were then performed to incorporate the Illumina TruSeq primer sequences into the library, as well as a final clean up using AMPure XP beads. Libraries were visualized on a fragment Bioanalyzer to ensure desired fragment size/distribution and index groups pooled. Resulting libraries were sequenced across three Illumina HiSeq 2500 lanes and one HiSeq 2500 rapid run lane (read length 100 bp) at the Cornell University Institute for Biotechnology (Genomics Facility Research Center).
Bioinformatic data processing and SNP detection
Sequences were initially evaluated for overall quality using FastQC, then trimmed and filtered using FASTX-Toolkit. Specifically, reads were trimmed on the 3′ end to 97 bp and eliminated if the Phred quality scores were below 10 or if 95% of the bases had Phred quality scores below 20. Using STACKS (version 1.48), we demultiplexed the remaining sequences. We used the process_radtags command with the following conditions: any reads not meeting Illumina’s chastity/purity filter and of low quality were discarded, data were cleaned such that any read with an uncalled base was removed, reads with mismatches in the adapter sequence > 1 were removed, and reads were only processed if the sequence had an intact SbfI RAD site and one of the unique barcodes. Subsequently, fastx_trimmer was used to trim all sequences to the length of the shortest sequences. Reads were aligned to the Saltmarsh Sparrow reference genome [59] using Bowtie 2 end-to end option (version 2.2.9). STACKS (version 1.48) was used to identify SNPs. Minimum stack depth for a read to be assembled into a catalog was 6. The number of mismatches allowed between sample loci was set at 5. We filtered catalog loci based on the mean log likelihood of the catalog locus in the population, with the minimum log likelihood set at -300. These filtering steps resulted in the recovery of 5,391 SNPs.
We used the program Populations to subset a panel of SNPs for use in calculating a hybrid index. We chose only one SNP per locus and required that a SNP be present in a minimum of 50% of all individuals, with a minimum stack depth of 6, for it to be included. Subsequently, VCFtools [60] was used to group individuals into 3 populations: (1) all individuals sampled in this study from the center of the hybrid zone, (2) allopatric Nelson’s Sparrows, and (3) allopatric Saltmarsh Sparrows. We then calculated the fixation index (Fst) for each SNP using VCFtools and subsetted the panel further to include only fixed SNPs (Fst = 1) between allopatric Nelson’s and Saltmarsh Sparrows. This resulted in a panel of 135 fixed SNPs that we used for the development of a hybrid index to classify pure and hybrid individual sparrows (Additional file 3).
We also created a separate panel of SNPs to be used in paternity analysis to address questions about assortative mating using only sympatric birds from the inland and coastal study sites (i.e., excluding allopatric samples; Additional file 4). For the paternity panel we again chose only one SNP per locus and required that a SNP be present in a minimum of 95% of the individuals with a minimum stack depth of 6. This resulted in a 589-SNP paternity panel.
Determining genotypic classes
Sparrows were assigned to genotypic classes using methods of Milne and Abbot (2008) [61], as in Walsh et al. [38]. Using this method, which combines hybrid index and interspecific heterozygosity, we placed each individual into one of five genotypic classes consisting of: pure Nelson’s Sparrow, backcrossed Nelson’s, F1/F2 (recent generation hybrids), backcrossed Saltmarsh, or pure Saltmarsh Sparrow. Hybrid index was defined as the proportion of alleles inherited from the Saltmarsh Sparrow (0 = pure Nelson’s Sparrow and 1 = pure Saltmarsh Sparrow), based on the 30 allopatric Saltmarsh and Nelson’s sparrows. Interspecific heterozygosity was defined as the proportion of genotypes that were heterozygous across the species for the parental alleles (0 = all homozygous genotypes, found only in one parental species, and 1 = all heterozygous genotypes across species). Individuals with intermediate hybrid index (0.25–0.75) and high heterozygosity (> 0.3) were considered recent generation hybrids (F1 or F2), and individuals with very low or high hybrid index (0.05–0.25 or 0.75–0.95) and low heterozygosity (< 0.3) were considered backcrossed. Pure individuals were defined by a hybrid index of 0–0.05 (Nelson’s Sparrow) or 0.95–1 (Saltmarsh Sparrow). The Introgress package in R was used for calculating the hybrid index and interspecific heterozygosity [62]. Analyses did not distinguish between F1 and F2 individuals, which were grouped together into an overall recent-generation hybrid category, used throughout. Genetic composition of the coastal and inland populations were compared to allopatric parental populations (Saltmarsh and Nelson’s) using structure, version 2.3.4 [63] and visualized using CLUMPAK [64].
Paternity analyses
We conducted paternity analyses of nestlings using genotype data from the SNP paternity panel and reconstructed mating pairs. Candidate fathers were assigned using the approaches implemented in cervus [65] and colony v2.0 [66]. The maximum likelihood approach of CERVUS uses simulated genotypes from provided data to create a log-likelihood confidence level in true parentage assignments but does not account for unsampled males in the population. To address this problem, we used the full likelihood approach in COLONY, which can assign paternity to a sampled male even if the true father was not among the sampled males. For both methods, we used a genotyping error rate of 1, 95% of loci typed, and candidate father sampling of 70%. We assumed the proportion of sampled mothers to be 95% given the targeted netting identification of females off of their nests. For each site and year, a list of candidate fathers was developed. For 2016, all sampled adult males were included, and for 2017, all males that were sampled in that year, as well as any males from 2016 (adults and offspring as determined from molecular sexing) were included to account for any hatch years that may have returned to their natal site, as well as any returning adult males that may have evaded capture in 2017. For each offspring, we determined the most likely father as assigned by CERVUS (delta trio value ≥ 95%). This was then compared to the paternity assignment made in COLONY. For any discrepancies on confident paternity assignments (> 95%) between the two programs, we compared the number of loci mismatches, delta pair confidence, and overall loci typed to identify the best male assignment. The sex of each offspring was identified by PCR amplification of the CDH1 gene [67, 68] and visualized using gel electrophoresis.
Hypothesis testing
Demographic processes
To test for the influence of relative population density on patterns of introgression, we compared the distribution of genotypes for all individuals (nestlings and adults) between the center and the south of the hybrid zone and between inland and coastal sites within the center of the hybrid zone. Using the observed genotypic distribution of adult birds, a predicted distribution of offspring genotypes was calculated using a contingency table, assuming random mating dependent on the relative abundance of each observed genotypic class for the center and south of the zone, as well as for each site within the center of the hybrid zone, separately. In this contingency table, pure Nelson’s mating with pure Nelson’s Sparrow resulted in another pure Nelson’s Sparrow; and similarly, pure Saltmarsh Sparrow mating with pure Saltmarsh Sparrow also produced a pure species designation. Backcrossed Nelson’s Sparrows within the contingency table were produced from the following three crosses: backcrossed Nelson’s with pure Nelson’s, backcrossed Nelson’s with F1/F2, and backcrossed Nelson’s with backcrossed Nelson’s. Similarly, backcrossed Saltmarsh Sparrows were the result of any of the following three crosses: backcrossed Saltmarsh with pure Saltmarsh, backcrossed Saltmarsh with F1/F2, and backcrossed Saltmarsh with backcrossed Saltmarsh. A mating between a Saltmarsh and Nelson’s Sparrow, a backcrossed Saltmarsh Sparrow and backcrossed Nelson’s Sparrow, as well as a pure Nelson’s with a backcrossed Saltmarsh, or a pure Saltmarsh and a backcrossed Nelson’s were considered F1/F2 designation. Subsequently the observed offspring distribution for each site and hybrid zone location were compared to the predicted offspring genotypic composition using a Goodness of fit Exact Multinomial Test with Monte Carlo approach (ntrial = 100,000) using the EMT package in R (Version 1.1). If patterns of gene flow were controlled by neutral demographic processes alone, we would expect that the observed offspring distribution would be proportional to the one predicted assuming random mating dependent on observed relative population densities of each genotypic class. To determine if there were higher levels of hybridization at the inland or coastal site, we compared the number of recent-generation hybrids (F1/F2 class) between the coastal and inland site using a two-tailed Student’s T-test. We also compared the mean interspecific heterozygosity and hybrid index scores between the study site locations.
Exogenous environmental factors
To test for the influence of habitat on patterns of introgression, we compared the distribution of the genotypic classes between coastal and inland site using a chi-squared test. In addition, two-tailed Student’s t-tests were performed to compare the proportion of backcrossed individuals, mean hybrid index, and mean interspecific heterozygosity levels between the two sites to determine if there was more backcrossing towards Nelson’s Sparrow at inland site and more backcrossing towards Saltmarsh Sparrow at the coastal, as predicted based on recorded directions of local adaptation from previous study [38]. Finally, we compared the observed offspring genotypic class at each site to the predicted distribution based on demographic processes (described above). If habitat were acting on patterns of gene flow, we could expect that the observed offspring genotypic class would differ from that predicted by demographic processes alone, and, specifically, that the backcrossed Nelson’s and pure Nelson’s categories would be higher than expected by random mating at the inland site and the backcrossed Saltmarsh and pure Saltmarsh categories would be higher than expected randomly at the coastal site. The observed offspring distribution for each site was compared to the predicted offspring genotypic composition using a Goodness of fit Exact Multinomial Test with a Monte Carlo approach (ntrial = 100,000). Multinomial confidence intervals were calculated for a post-hoc test to determine which categories differed, such that estimates with confidence intervals that did not contain the theoretical proportion were identified as different from the predicted.
Additionally, if habitat were influencing patterns of introgression, we would expect nesting females to be more site selective than males as a result of the fitness consequences of settlement patterns. We predict that there will be more Nelson-like females nesting at the inland marsh and more Saltmarsh-like females nesting at the coastal marsh, due to known relationships between habitat and nesting success for the two species [41]. To test this, we compared the female genotypic distribution between sites using a Fisher Exact Test for count data. To compare this to observed patterns for males, we then tested the distribution of male genotypic classes between sites, also using a Fisher Exact Test for count data.
Endogenous factors
To test Haldane’s Rule about the sex-biased effects of genetic incompatibilities in hybrids, we determined: (1) if production of recent-generation hybrids was male-biased due to offspring sex ratio manipulation or reduced viability of female eggs or offspring; or (2) if there was reduced occurrence of hybrid females from the nestling to adult stage, suggesting a reduction in survival. To assess reduced survival of females, we compared the proportion of recent generation hybrids among nestling females, adult females, nestling males, and adult males using a 2-sample test for equality of proportions. We performed two-tailed Student’s t-tests to compare the hybrid index of male and female offspring across both sites and the proportion of male offspring produced from interspecific and intraspecific mating events.
Sexual selection
To test for the influence of sexual selection on patterns of introgression, we sought evidence of assortative mating using the results of the paternity analyses. Each mating event was classified into two categories: within or between species. The within species mating included: Nelson’s Sparrow with Nelson’s Sparrow (including backcrossed), and Saltmarsh Sparrow with Saltmarsh Sparrow (including backcrossed). Between species category included: F1/F2 with Nelson’s Sparrow (pure or backcrossed), F1/F2 with Saltmarsh Sparrow (pure or backcrossed), and Nelson’s Sparrows (pure or backcrossed) with Saltmarsh Sparrow (pure or backcrossed). The number of unique mating events resulting from each group was compared. To account for mate availability, an expected distribution of between and within species pairings was determined based upon observed frequencies of the genotypic classes. The proportion of observed between and within species mating pairs was compared to the expected distribution using a 2-sample test for equality of proportions. Under assortative mating, we would expect the observed proportion of between species pairings would be lower than expected levels of interspecies mating assuming random mating.
Assortative mating was also compared between coastal and inland study sites. An expected proportion of within and between species matings were determined for both coastal and inland marshes and compared to observed levels of inter and intra species pairings using a 2-sample test for equality of proportions. Additionally, we compared mating patterns between coastal and inland, testing for differences in the proportion of between species and within species mating across the two sites using a two-tailed Student’s t-test. Finally, we tested for a correlation between the parental hybrid index scores for each offspring using a Pearson product-moment correlation coefficient across all individuals and between our two hybrid-range-center study sites.