Skip to main content

Aquaporins in the wild: natural genetic diversity and selective pressure in the PIP gene family in five Neotropical tree species



Tropical trees undergo severe stress through seasonal drought and flooding, and the ability of these species to respond may be a major factor in their survival in tropical ecosystems, particularly in relation to global climate change. Aquaporins are involved in the regulation of water flow and have been shown to be involved in drought response; they may therefore play a major adaptive role in these species. We describe genetic diversity in the PIP sub-family of the widespread gene family of Aquaporins in five Neotropical tree species covering four botanical families.


PIP Aquaporin subfamily genes were isolated, and their DNA sequence polymorphisms characterised in natural populations. Sequence data were analysed with statistical tests of standard neutral equilibrium and demographic scenarios simulated to compare with the observed results. Chloroplast SSRs were also used to test demographic transitions. Most gene fragments are highly polymorphic and display signatures of balancing selection or bottlenecks; chloroplast SSR markers have significant statistics that do not conform to expectations for population bottlenecks. Although not incompatible with a purely demographic scenario, the combination of all tests tends to favour a selective interpretation of extant gene diversity.


Tropical tree PIP genes may generally undergo balancing selection, which may maintain high levels of genetic diversity at these loci. Genetic variation at PIP genes may represent a response to variable environmental conditions.


Within the tropics water availability, with soil fertility, is one of the most important environmental factors determining tree species richness [1] and distribution [2]. Although wet tropical regions are characterized by high annual rainfall, seasonality makes it unevenly distributed across the year such that even tropical humid forest can experience seasonal soil drought [3]. Regions currently occupied by luxuriant rainforest have also undergone decade- to century-long drier spells in both recent and geological past [4]. Tree species' natural range may have changed during those periods, but selective pressure may also have acted on extant populations. Genetic mechanisms of drought tolerance are therefore expected to have evolved in tropical tree species, and variation for these mechanisms is expected as different species, and populations within species, are adapted to different soil water availability optima. Study of potentially adaptive natural genetic diversity is needed to understand ecological mechanisms underlying the composition of such diverse ecosystems as tropical forests and to predict species responses to climate change (e.g. Amazonian forest ecosystems have been shown to be sensitive to damage induced by severe droughts [5]). Research on mechanisms and molecular bases of drought stress tolerance have been conducted for decades and several reviews exist [69], but only a few studies have focused on tropical trees as keystone species of tropical forests. There is a need therefore to explore the adaptive potential of forest tree populations [10] in natural tropical ecosystems.

The molecular basis of drought tolerance is extremely complex and a wide variety of expressional candidate genes has been suggested for example in A. thaliana [11] and in trees [12]. Among protein classes involved in response to drought, and the regulation of water balance in general, aquaporins are a good candidate starting point for the exploration of genetic diversity in natural populations of non-model species such as Neotropical rainforest trees, as they are ubiquitous, well known and the focus of in-depth functional studies in plants in general [13] and trees in particular [14]. In prokaryotes and eukaryotes, aquaporins play a channel role in water transport [14]. In plants, they form a large family divided into four subfamilies [15]. We chose for this study plasma membrane intrinsic proteins (PIPs), which are grouped in two subfamilies (PIP1 and PIP2). PIPs are well characterised and share a recent evolutionary history which permits quick isolation of multiple members of the gene family by homology-based methods. Moreover Alexandersson et al. [11] have shown that most PIP transcripts are down-regulated upon gradual drought stress, indicating that they are involved in, or affected by, response mechanisms to drought stress. Therefore, these genes may be under selection in natural populations. We tested the hypothesis that the genes coding for these proteins undergo natural selection in a set of Neotropical species.

The extent and selective/demographic meaning of diversity at drought-response candidate genes in natural populations of forest trees has been analysed in several recent studies (e.g. [1625]). Some of these studies [17, 21, 23] present results on one or a few aquaporin genes, but no signature of selection or demography was detected at these loci when tested.

In this study we developed universal primers, based on plant sequences available in public databases, to sequence PIP genes in five tropical tree species: Pachira quinata (Bombacaceae), Virola sebifera (Myristicaceae), Carapa guianensis (Meliaceae), and two congeneric species Eperua falcata and Eperua grandiflora (Caesalpiniaceae). We describe their nucleotide diversity in natural populations and apply tests for departures from the standard neutral equilibrium to detect patterns that potentially indicate the action of natural selection. Our results point to the action of a combination of balancing selection and demographic events at these loci in populations of Neotropical forest trees.


Amplifications with universal primers allowed the cloning of 1-5 different genes per species (Table 1). For eleven contigs, homology in TAIR databases [26] was found and sequence information was used for the development of gene-specific and species-specific primer pairs for the amplification of seven amplicons (Table 2). Successful amplification conditions were found for six primer pairs, while for one of the three E. falcata-specific primers transfer was possible to the congeneric species E. grandiflora. Using specific primers we sequenced a total of 4036 bp both in coding and non-coding regions (Figure 1). The identity of each amplicon as a different locus was proven at the population level by the detection of both homozygote and heterozygote individuals in the sampled populations. If there had been co-amplification of closely related isoforms, all individuals would have been expected to show heterozygosity for the sites differentiating the isoforms. As further evidence that single genes are amplified by each primer pair, non-synonymous polymorphisms, when occurring, mostly caused replacements between amino acids of similar structure and chemical properties, and in no case were stop-codon mutations detected. This led us to conclude that the amplicons correspond to separate, functional gene loci.

Table 1 Detailed results of isolation of PIPs gene fragments
Table 2 Description of the PIPs fragments amplification conditions
Figure 1
figure 1

Representation of the localisation of the different fragments compared to Arabidopsis thaliana PIP structure. Blue boxes, located on the diagram of the Arabidopsis gene structure, represent the six transmembrane helices.

Polymorphism distribution in each gene fragment and each species is shown in table 3. Polymorphism was identified in all species, although the amount varied according to gene and species. A total of 79 SNPs (including indels) was found, distributed across coding and non-coding regions and between synonymous and non-synonymous sites. Details of the distribution of these polymorphisms in coding and non-coding regions, and between synonymous and non-synonymous sites within the latter, are provided in Table 3 (note that the total number of SNPs in Table 3 sums to more than 79 because they are reported at the population level, not the species level, for the species for which two populations have been analysed).

Table 3 Genetic diversity and results of mutation-drift equilibrium tests for each gene

Haplotype (i.e. gametic phase) reconstruction was generally robust, with P-values for most genotypes higher than 0.90. The number of haplotypes varies greatly between species and populations with only one for C. guianensis population NW (PIP1.1) and 19 in E. falcata population NW (PIP1.1) in French Guiana. Overall, haplotypic diversity is large with values higher than 0.70 except in C. guianensis (H d equal to zero and 0.53 for PIP1.1 in NW and SE populations respectively) and E. grandiflora (H d = 0.23 for PIP2.1).

Two gene amplicons (VsePIP2.1 and EfaPIP1.1 population NW) have very high ρ values (Table 3), in spite of having levels of haplotypic diversity similar to those displayed by other amplicons, perhaps indicating large effective population sizes and/or past recombination with very divergent populations.

Tests for departures from the standard neutral equilibrium (Table 3) were performed taking into account recombination rates. The calculation of neutral confidence intervals for mutation-drift equilibrium statistics was performed at the most-likely values of rho, as a robustness test showed that the confidence interval limits of the statistics did not change substantially when values of rho at least 1000 times less likely than the most-likely value (ΔLOD ≤ 3) were used (see Additional File 1: Supplementary Table S1). In particular, as all significant tests are positive (see below), we were mostly interested in changes that shift upward the upper limit of neutral confidence intervals, which is the critical threshold for significance of positive values of the statistics. The only case for which a dramatic threshold change (> 10%) was observed is Tajima's D for E. falcata population SE at gene PIP1.2 (the upper limit of some values of F S is also modified, but this statistic is not biologically defined when positive). However, as shown in table 3, this statistic is not significant, and therefore changing rho estimates has no consequences for our results. For six amplicons/populations, mutation-drift equilibrium tests gave significant departures from neutral intervals: CguPIP1.1 (population SE), PquPIP1.1, EfaPIP1.1 (population NW), EfaPIP1.2 (population SE), EfaPIP2.1 (population NW) and EgrPIP2.1. Four amplicons/populations out of eleven did not reveal any departure from mutation-drift equilibrium (VsePIP2.1, EfaPIP1.1 (population SE), EfaPIP1.2 (population NW) and EfaPIP2.1 (population SE)). One amplicon/population combination (CguPIP1.1 population NW) could not be tested due to lack of polymorphism.

Tests of selection on gene sequence data are summarised in Table 3 and Figure 2 (a-d). All significant tests show a departure from neutral allele frequency spectrum toward an excess of haplotypes at intermediate frequencies and/or deficit of rare variants (n.b. some values of F S are numerically negative but lie above the neutral confidence interval of the statistic after correction for historical recombination (see Additional File 1: Supplementary Table S1); thus they are effectively positive in terms of their statistical meaning). Although not all tests are significant, the global trend is towards positive and significant values for the test statistics. This indicates either the occurrence of a past bottleneck for most species or the ongoing action of balancing selection. No departure from (demographic) equilibrium towards population expansion or background selection was detected.

Figure 2
figure 2

Graphical representation of observed statistics and their neutral confidence intervals for each of the analysed genes. (a) Tajima's D statistic; (b) Fu and Li's F* statistic; (c) Fu and Li's D* statistic; (d) Fu's F S statistic; (e) Nei's genetic diversity H. A, Carapa guianensis population SE; B, Carapa guianensis population NW; C, Eperua falcata, population NW; D, Eperua falcata, population SW; E, Eperua grandiflora; F, Virola sebifera; G, Pachira quinata. For each species and population, genes are the same as in Table 3. For C and D, genes appear in the order: EfaPIP1.1, EfaPIP1.2, EfaPIP2.1. Filled squares: observed values for each test (red, green = significant, P < 0.05; orange = marginally significant, 0.10 <P < 0.05). Dashed lines: neutral confidence intervals. Values exceeding the upper limit of the confidence interval (red) indicate balanced selection or bottleneck; values lower than the lower limit of the confidence interval (green) indicate negative selection or population expansion (in (e), green squares indicate significant values after Bonferroni correction). "No polymorphism": this population did not show any sequence variation for this gene.

For one gene (EfaPIP1.1), a sliding-window analysis of the mutation-drift equilibrium tests (Figure 3) shows that most positive and significant values are detected in windows centred at nucleotides 230-240, at or just downstream from the predicted intron splicing site. Two more sites display significant statistics within the exon.

Figure 3
figure 3

Sliding window of Fu and Li's F * and D * tests and Tajima's D , along EfaPIP1.1. Stars indicate the centre of windows showing a significant statistic. A horizontal bar indicates that several contiguous windows were significant.

To check whether the observed statistics were compatible with purely demographic events (i.e. bottlenecks in the case of positive tests), all significant mutation-drift equilibrium statistics were re-computed on simulated data sets having the same descriptive statistics as the observed data and having undergone bottlenecks of varying size, duration, and timing. Observed statistics were considered as compatible with a given demographic scenario if they fell within the 95% upper quantile of the values obtained on simulated data. The results of demographic simulations are illustrated in Figure 4 for Tajima's D and for Fu and Li's F*. The observed values of Tajima's D for C. guianensis gene CguPIP1.1 (population SE) (Figure 4 upper left pane) are only compatible with relatively recent and strong bottlenecks (in particular when the lower estimate of mutation rate is used). There is widespread compatibility with several demographic scenarios for the E. falcata gene EfaPIP1.1 (population NW) (Figure 4 upper right pane) when a short bottleneck is simulated, but not for simulations including a longer bottleneck. Simulations for Fu and Li's F* show that the observed statistic is entirely incompatible with any demographic scenario for C. guianensis (Figure 4 lower left pane), while most demographic scenarios are compatible with the observed value for EfaPIP1.1 population NW (Figure 4 lower right pane). Finally, all scenarios are compatible with the observed values for Fu and Li's D* (not shown) and with the significant Tajima's D test observed for EfaPIP2.1 (not shown). The significant departure from standard equilibrium observed for P. quinata (Fu and Li's D*) is also entirely compatible with demographic scenarios.

Figure 4
figure 4

Graphical representation of comparisons between observed test statistics and the 95% quantile of simulated test statistics. Demographic scenarios were simulated with bottlenecks of variable age and strength and with low (μ = 10-9) and high (μ = 10-8) mutation rates. For all plots, y-axis: effective population size during the bottleneck, expressed as fraction of current population size; x-axis: approximate number of generations elapsed since the bottleneck. Lower left cells thus correspond to recent and strong bottlenecks, upper right cells to old and shallow bottlenecks. Dark cells correspond to scenarios compatible with the observed statistic (i.e. observed statistic is lower than the simulated 95% upper quantile). Upper half: tests for Tajima are D; lower half: tests for Fu and Li's F*. Left half: tests for CguPIP1.1 population SE; right half: tests for EfaPIP1.1 population NW.

Results on demographic transitions, as detected by chloroplast SSRs, are shown in table 4 and visually summarised in Figure 2(e), and show global trends that tend to exclude bottlenecks: no species or population displayed signatures of population contraction. C. guianensis, P. quinata and V. sebifera did not display departures from mutation/drift equilibrium after Bonferroni correction, but the tests could only be performed on one locus for V. sebifera due to lack of polymorphism. However, the majority of tests tend to be negative, thus excluding a bottleneck signature. For E. falcata population NW and for E. grandiflora, values of test statistics against population bottlenecks were found at SSR loci concurrently with signatures of population contraction and/or balancing selection at PIP loci. C. guianensis population NW showed signatures contrary to bottlenecks at cpSSRs but was monomorphic for PIP sequences. E. falcata population SE also showed test statistics tending to exclude population bottlenecks, but no mutation-drift equilibrium test on gene amplicons was significant.

Table 4 Results of tests for demographic changes based on chloroplast SSR loci


The present study shows indications of the action of balancing selection on drought response-related genes in Neotropical tree species, although demographic changes in the populations, causing the same kind of departure from the standard neutral model, cannot be excluded, at least for some species (i.e. P. quinata, E. grandiflora and at least one population of E. falcata).

Extensive nucleotide diversity was found for six PIP gene fragments in five tropical tree species; the sequences obtained are useful for the detection of SNP polymorphisms, for genetic diversity analyses and for testing departures from expectations derived from the neutral theory of molecular evolution [27]. In one case (PIP2.1) the same primer pair was used to amplify two congeneric species (Table 2.b), providing a direct comparison between orthologs. For these two genes (EfaPIP2.1 and EgrPIP2.1, table 1) the level of polymorphism in E. falcata was lower than in E. grandiflora, but departures from mutation-drift equilibrium were detected in both species (Table 3), although not for the same statistics.

Standard neutral model tests on approximately half of the gene amplicons gave significant results. In genes and populations with significant departures from mutation-drift equilibrium the general trend is towards population size bottlenecks and/or balancing selection. However, sliding window analyses show that this trend does not apply uniformly to the entire sequence and that different regions of the genes may be subject to different processes. In one case (EfaPIP1.1; Figure 3), sites responsible for the positive and significant test statistics seem to be mostly restricted at the exon-intron boundary, possibly indicating selection on intron splicing sites or other regulatory functions by intronic sequences. Interestingly, when matched against small RNA databases, the intron fragment contained in EfPIP1.1 sequences is similar to immature Arabidopsis thaliana miRNA ath-MIR863 and to Oryza sativa miRNA osa-MIR420 (E-values of 0.048 and 0.064, respectively), indicating a putative functional role for the intron.

Two pieces of evidence favour balancing selection, rather than demography, as an explanation for the diversity patterns observed in this study, at least for a subset of the gene amplicons.

First, simulations of bottlenecks of variable sizes and ages, resulting in the same amounts of genetic diversity as observed in current populations, cannot produce statistics (Tajima's D, Fu and Li's F*) as high as those observed on real data for C. guianensis (Figure 4 left pane), unless moderate to strong and recent bottlenecks are assumed. D* values are however compatible with all demographic scenarios. For E. falcata (Figure 4 right pane), short bottlenecks are generally compatible with the observed Tajima's D, but not long bottlenecks (F* values are compatible with most demographic scenario in E. falcata, indicated by the fact that this statistic is not significant; table 3 and figure 2). As the species studied here were sampled in largely undisturbed portions of Neotropical forests, and are not known to have historically been exploited for timber, strong, short and very recent bottlenecks seem unlikely. However, bottlenecks hundreds of generations ago would be compatible with processes dating to the Late Pleistocene-Early Holocene, which would correspond to large-scale floristic changes in Neotropical forests [28]. The contrasting behaviour of D*, and F* and D, can be explained by what the statistics actually test and by the structure of our data. Both F* and D rely on estimation of the difference between an estimate of θ based on average pairwise sequence divergence and another estimate based on the number of segregating sites (singletons for F*, global estimate for D), whereas D* compares two estimates of numbers of segregating sites (common versus singleton). The former account for the depth of sequence divergence, whereas the latter does not. It appears that, at least in some cases, purely demographic scenarios cannot account for the excess sequence divergence observed in our data, while they can account for the observed number of segregating sites. This is an indication of exceedingly long genealogical branches in our data, which again favours balancing selection as opposed to bottlenecks. Moreover, all the tests applied have been shown to have similar power in the detection of recent bottlenecks [29], so that they are expected to give similar results under a purely demographic scenario.

Second, the tests on chloroplast SSRs generally refuted past bottlenecked populations, even if not all loci detect signatures of population expansion. Variability in the results from different loci is due to the fact that, if all loci describe the same evolutionary process, their statistics are multiple stochastic outcomes from the same probability function. As demography should influence all portions of the nuclear genome, as well as organellar genomes, in the same direction although with different sensitivity for the intensity and timing of demographic events [30], it is unlikely that events in opposing directions would be detected at different loci (recombination tends to smooth diversity estimates across genes, by reducing the variance of θ estimators [31]). Thus we can tentatively conclude that the departures from equilibrium at PIP loci are due to balancing selection. A caveat should be added that, as SSRs generally have different mutation rates than coding sequences, comparison of results from these two types of data may be misleading. In particular, chloroplast SSRs tend to mutate more quickly than nuclear, non-repeated sequences, although more slowly than nuclear SSRs [32], and the demographic events suggested by these markers may be more recent than the results of forces acting on sequence diversity. A further concern is that effective population size and demographic dynamics are not the same for chloroplast and nuclear loci, making the former, being haploid, more sensitive to demographic change. However, a recent demographic event detected in SSRs should also be detectable in sequences. Significant values for tests of departures from standard neutral equilibrium of opposite sign seem to support balancing selection on the expressed loci, while cases where demographic signatures are detected at chloroplast markers but not at PIP loci can be conservatively attributed to differences in marker sensitivity. Note, however, that this argument does not hold if the populations have undergone a bottleneck relatively far in the past (more than approximately 50 generations, the time beyond which the SSR-based methods used here cannot detect a bottleneck). In this case, fast-evolving SSRs may display the effects of the expansion following the bottleneck, while slow-evolving sequences may still display the effect of the bottleneck itself.

The observed departures from hypothetical selective neutrality in spite of a relative paucity of non-synonymous mutations is not surprising, given that regulatory sequences, potentially undergoing selection, may lie anywhere along gene sequences and that introns can have a function besides gene regulation. Among the gene fragments shown by mutation-drift equilibrium tests to putatively experience selection, VsePIP2.1 and EfaPIP1.1 do not contain non-synonymous mutations, and so, for these loci, selection may not be acting directly on the portion of protein-coding sequence analysed here. Selection may however affect other properties of the transcribed sequence, such as codon composition or intron functions. As PIP sequences are conserved among and within species [15] selection could rather act on regulatory regions. Alternatively, these polymorphism patterns may reflect selection on neighbouring sites, although the estimated population parameters suggest that recombination would quickly break down associations between selected sites and associated neutral sites, such that selection signatures may not extend beyond a few hundred base pairs from the site under selection. For genes that did not display any departure from expectations of the standard neutral model, full-length sequencing, including the promoter region, is advisable. In general, evolutionary patterns may diverge among different parts of a gene [33] and there is evidence of balancing selection in the promoter region of the TFL1 gene in Arabidopsis thaliana [34]. On the other hand, demographic events could be responsible for significant departures from the standard neutral model in the whole genome, including gene regions. Moreover, loci outside the sequenced region, but in linkage disequilibrium (LD) with it, may also affect the results of mutation-drift equilibrium tests. This possibility cannot be ruled out for the current data set, as the fragments are too short for testing the decay of LD with distance, although, in some species, population recombination rates are relatively high (Table 3), indicating that LD should rapidly fall to zero. This is generally the case for forest trees, where LD falls below 0.2 within 200-300 base pairs [10]). Therefore the main issue when observing significant mutation-drift equilibrium tests in genes is to discover whether selection or past demography underlie the observed levels of diversity.

For C. guianensis population SE, all tests for departure from mutation-drift equilibrium on locus PIP1.1 gave strongly significant results, while no chloroplast SSR marker did. This pattern is consistent with the maintenance of polymorphism by long-term balancing selection. It is interesting to note that the samples of population SE were collected from a hybridisation zone between C. guianensis and C. procera. Bayesian assignation methods, applied to independent loci (SSRs), show however that the sampled trees belong with a probability of 1 to C. guianensis populations [35], perhaps pointing to a stable and selectively advantageous introgression of C. procera genes into a C. guianensis genetic background. Actually, sequences of CguPIP1.1 obtained in pure C. procera stands show the same haplotype found in population SE of C. guianensis (Casalis et al. in preparation). If this hypothesis holds true, it may also explain the pattern of polymorphism observed for the two populations of this species: the NW population would display the "typical" status of the species, while the SE population, which would have undergone historical introgression, would carry an extra allele derived from the sister species C. procera (n.b. the taxonomic status of the latter species is under revision; P.M. Forget, personal communication). Further analyses of the diversity of PIP genes in the two species will test this hypothesis (M. Casalis et al. in prep.).

The results for PIP genes of E. falcata (the species for which the largest number of different genes was obtained) were divergent for the two populations. In summary (Table 3), population NW displays significant results for D and F* for genes PIP1.1 and PIP2.1, while population SE displays a significant test for gene PIP1.2 and the statistic D*. The results are almost perfectly complementary for the two populations: they show significant results for different genes and different statistics. Moreover, test statistics for population NW largely show incompatibility with demographic scenarios. As stated above, D and F* are more sensitive than D* to the extent of sequence divergence, and thus to balancing selection. Therefore, the simplest explanation for the observed pattern is that balancing selection is detected in population NW for two genes, while demography (i.e. a past population bottleneck) drives diversity patterns in population SE. The fact that chloroplast SSRs tend to support population expansion, rather than bottleneck, for population NW provides a further hint that the positive values of the mutation-drift equilibrium tests are due to selection rather than demography. Selection on the PIP1.1 gene would overcome the baseline demographic signal, which would provide negative statistics which are detected by SSRs (n.b. SSRs can only detect recent bottlenecks, while older ones may still affect sequence diversity). A similar argument can be applied to the only interspecific comparison, for gene PIP2.1 in the two Eperua species. Here, estimates of θ from pairwise sequence differences (θπ) are higher in E. falcata than in E. grandiflora, while estimates from haplotype diversity (θW) are higher in E. grandiflora (Table 3). This apparent contradiction may be explained by the fact that E. falcata has a smaller number of more divergent haplotypes than E. grandiflora (A; table 3). Such results tend to favour different explanations for the diversity patterns in the two species: the historical preservation of highly divergent haplotypes in E. falcata, which is compatible with balancing selection (see above) and the presence of larger numbers of less differentiated haplotypes for E. grandiflora, which may be compatible with recovery from an old bottleneck or population expansion.

Finally, for P. quinata, departure from mutation-drift equilibrium was found for Fu and Li's D*, but the species displayed very little polymorphism at cpSSR loci, and therefore it was difficult to evaluate alternative hypotheses. The observed value of D* was, however, entirely compatible with demographic scenarios. Moreover, sample sizes per sampling site were relatively small for this species (see Methods) and, although tests for population differentiation were all non-significant, statistical power was low. Consequently, the departures from mutation-drift equilibrium may in this case be an artefact caused by hidden population structure. Large year to year variation (up to 100%) in overall rainfall and length of dry season may provide balancing selection, while extensive gene flow (bat pollination, seed dispersal by convection currents) may reduce population differentiation.

Among the cases described in this study, balancing selection may be the most common trend, but it is currently hard to outline why this should be the case. One possibility is that the sampled populations may be further structured along local ecological gradients at a smaller geographic scale, and that different environments favour alternative alleles, thus maintaining genetic diversity within populations. In this case, variation of ecological conditions at a spatial scale smaller than the size of populations may lead to the coexistence of different genetic optima in sub-structured populations, providing patterns that mimic balancing selection. As stated above, hidden population subdivision at loci responding to selective gradients can bias tests such as those applied here [36]. However, since we used PIP genes themselves to group individuals into non differentiated populations, this is likely to be a relatively minor problem in our data set. Alternatively, the wide variation in environmental conditions over time, both between seasons and over longer climatic cycles, may prevent selection from fixing a particular variant, explaining the balancing selection patterns. In general, although selective explanations of observed patterns of diversity should not be taken for granted unless solid evidence is provided, there is no reason either to assume that the most likely explanation should be a demographic one [37].

To test for the presence of selection, further studies could look for association between haplotype and adaptive traits related to water stress, as well as for selection in other candidate genes. Indeed, a large number of genomic or proteomic studies have identified candidate genes for water stress tolerance. For example dehydrin genes are up-regulated by drought stress (BjDHN2 and BjDHN3 in Brassica juncea [38]; PgDhn1 in Picea glauca [39]) and alcohol dehydrogenase (Adh) transcripts are induced by anoxia and hypoxia [40, 41]. Contrasting the results for these genes with analyses of neutral loci to detect demographic events will also be fundamental to disentangle demographic versus adaptive interpretations of tests for departure from the standard neutral equilibrium model. As this is the first genomic study of gene sequences in non-plantation tropical trees, genomic resources that would allow such comparisons among different classes of genomic regions are not yet available.

Another limitation of our study is the length of the fragments analysed for each gene. Statistical tests, such as those we applied, gain more from increases in sequence length than from increases in sample size. Again, the complete lack of genomic resources for non-plantation tropical trees slows down the accumulation of sequence information, although isolation of larger sequences (Vedel et al. in prep.), as well as large-scale genome sequencing (Duret et al. in prep.) are under way. On the other hand, analysing larger sample sizes than generally suggested for this kind of studies was necessary, as the underlying distribution of genetic diversity and delimitation of populations was also unknown for these species. This information was necessary prior to the application of statistical tests. In spite of these limitations, the current data set allowed us to detect informative trends in sequence diversity. The results reported here should actually prove the feasibility of such research programmes, and are expected to prime more extensive investigations into the genomics of non-model tropical trees.


The fragments of PIP (aquaporin) genes analysed here have revealed large variability and potentially strong signatures of past population-genetic events, in some cases with patterns that vary along the gene. Besides being the first report on molecular diversity in genes of ecologically relevant species of one of the most diverse biomes on Earth and on non-plantation tropical trees, the results presented here, if confirmed by further studies, may convey some interesting messages. First, high levels of diversity at PIP genes may be maintained by selective mechanisms - either sensu strictu balancing selection or divergent selection at a local scale, which mimics the effects of balancing selection. Second, this trend appears to be common to a diverse array of species. Extended characterisation of these loci is likely to reveal more details on the processes shaping their diversity and to provide information on the link between genetic diversity and ecological conditions. More generally, systematic characterisation of candidate genes may lead to a more complete picture of the way genotypes interact with their environment in tropical forests and other ecosystems. We are convinced that this is a necessary step towards the consolidation of ecological genetic understanding of tropical ecology. At the same time, the suggested presence of balancing selection seems to fit well in the greater picture of tropical ecosystems: the maintenance of gene diversity by selection at the species level may be another facet of the many mechanisms proposed to explain the high levels of biological diversity observed in the tropics. The extant genetic diversity, and its maintenance by selection, may indicate that these species harbour the adaptive potential to cope with future, expected climate change. This will be particularly crucial if tropical rainforests undergo increased droughts that threaten [5] to severely affect their productivity and therefore their survival.



Five Neotropical tree species from four different families were selected for covering varying geographic and environmental ranges. Carapa guianensis, (Meliaceae), Pachira quinata (Bombacaceae) and Virola sebifera (Myristicaceae) display a continental distribution whereas Eperua falcata and E. grandiflora are endemic to the Guiana shield. All species except Pachira quinata, display preferences for wet, flooded environments (see table 5). Pachira quinata was sampled in three countries (Colombia, Costa Rica and Honduras) across a range of soil (loam to seasonally flooded vertisols) and rainfall (1200 to 3500 mm/year, 3-7 month long dry season) conditions, whereas the other four species were sampled in French Guiana (see Table 5). French Guiana is characterised by annual rainfall varying between 5000 and 2500 mm/year, and by strong seasonality in rainfall (June to December dry season characterised by long and severe dry spells). In French Guiana 5-10 samples per species were collected at sites separated by at least 10 km along a North-West to South-East transect, that spans the rainfall gradient. The total number of samples per species depends on the frequency of occurrence of each species at each site. At each site, samples were collected so that no tree was farther apart from the next sampled tree than the average gene dispersal distance for the species (or congeneric species when information on the species was not available) [42]. This ensures sampling of panmictic populations within sites.

Table 5 Sampling sites and species characteristics

DNA extraction

For all species except Pachira quinata, total genomic DNA was extracted from cambium or leaf tissues following a CTAB method adapted from [43] and [44], starting from approximately 1 cm² of silica gel-dried tissue. DNA quality was analysed by spectrophotometry at 260 nm and 280 nm or by agarose gel electrophoresis. For P. quinata, Qiagen DNeasy 96 Plant Kit (69181) was used to extract genomic DNA from embryos.

Universal primer design, PCR amplification and DNA sequencing

To isolate PIP genes from tropical tree species, all nucleic and amino-acids sequences of Dicotyledonous PIP1 and PIP2 subfamilies available in GenBank were aligned with CLUSTALW. Degenerated, universal primers were designed in the most conserved regions (Figure 5, table 2). To isolate PCR fragments corresponding to the selected loci, two alternative strategies were used, in both cases using degenerate primers for the amplification of fragments corresponding to genes in the subfamilies PIP1 and PIP2. (a) For Eperua falcata, RNA was isolated using the protocol described in [45] and cDNA was obtained using Lambda-ZAP-cDNA Synthesis Kit (Stratagene); (b) for Carapa guianensis, Pachira quinata, Virola sebifera, degenerate PCRs were carried out on genomic DNA. As a control, the latter protocol was applied to E. falcata, to check that the two methods provided similar results. Details of PCR and sequencing conditions are reported in Additional file 2: Supplementary Methods 1(a).

Figure 5
figure 5

PIP1 - PIP2 conserved regions chosen based on CLUSTALW alignments. Blue rectangles represent transmembrane helices two and six, in which the universal primers were chosen (amino-acid sequences in red) (only a subsample of aligned sequences is shown).

Sequence analysis and specific primer design

Base calling and contig assembly were done using CodonCode Aligner v2.0.1 (Codoncode Corporation, Dedham, MA). For each species, contigs were named according to their closest match (either PIP1 or PIP2) in the TAIR database [26], followed by contig number. The partition of genomic fragments in exons and introns (Figure 1) was obtained by alignment of the genomic fragments with publicly available mRNA sequences. To infer whether introns contained any conserved motif, that may undergo natural selection, homology between intron sequences and small RNA families was checked by matching the former to the latter on the Sanger Centre's microRNA database [46].

Sequence information was used for the development of gene-specific and species-specific primer pairs (Table 2, Figure 1). For the identification of priming regions 96 clones were sequenced on both strands for E. falcata, 46 for C. guianensis and P. quinata and 22 for V. sebifera. The sequences retrieved in E. falcata from cDNA and genomic DNA PCRs closely matched (not shown), thus confirming the equivalence between the two strategies for the isolation of fragments of coding regions. The PCR products obtained for each species were expected to contain a mix of fragments from several members of the PIP subfamily, and therefore clones corresponding to different genes had to be sorted prior to the design of specific primers. To do this, sequences were aligned and grouped into clusters using ClustalW [47]. Clusters contained closely related sequencing runs (with less than 5% sequence divergence), thus allowing for sequencing errors; each cluster was considered as representing a different gene. Cluster consensus sequences were aligned and the most divergent regions were identified and used for the subsequent step. Species- and locus-specific primers (tables 1 and 2) were designed in the outermost regions that granted sufficient divergence among contigs within each species to obtain the largest possible fragments with the best specificity. Primer pairs were used to amplify genomic DNA from 8-16 samples per species. Primers pairs that produced multiple PCR products were discarded. Those that produced a single product were sequenced. The sequence traces of a subset of primer pairs showed at least two overlapping sequences, possibly indicating the co-amplification of more than one gene; these primer pairs were also discarded. Details on protocols for gene- and species- specific PCR are provided in Additional file 2: Supplementary Methods 1(b).

DNA polymorphism, population structure and demographic processes

Since the DNA samples were diploid, the identification of haplotypes (i.e. sequence variants) was ambiguous where more than one SNP was present and heterozygote individuals were observed. Diploid sequences were treated using Haplotyper [48] to produce two haploid sequences per individual. Insertions-deletions ("indels") were coded like SNPs: each gap, irrespective of its length, was replaced by a nucleotide producing a SNP to treat indels in subsequent analyses. Indel inference in heterozygote samples was performed based on the comparison of sequences obtained from the two strands and by applying the "Split heterozygote indels" function in CodonCode Aligner.

Population sub-structuring introduces biases in the outcome of mutation-drift equilibrium tests, and therefore the latter must be applied exclusively to panmictic, Wright-Fisher populations (although some tests are robust to population structure). Therefore, the groups of samples obtained for each sampling site were tested for genetic differentiation - based on haplotypes at PIP genes - and lumped together when differentiation could not be detected. For E. falcata and C. guianensis, sampling sites turned out to belong to two clusters (identified as NW and SE populations in tables 3 and 4) and therefore standard neutral equilibrium model tests were performed separately for each population.

Analyses of sequence data were performed using DnaSP v. 4.10.9 [49]. Nucleotide diversity was estimated by Watterson's θw [50] and π, the average number of pairwise nucleotide differences among sequences in a sample [51].

Coalescent simulations with DnaSP were performed with recombination, because the accumulation of historical recombination events influences patterns of sequence diversity even on very short genetic distances. Within-amplicon recombination rate of gene fragments was estimated by LDhat v2.1 [52]; most-likely values of rho and their confidence interval (i.e. the interval of values with a ΔLOD no larger than 3 from the most likely value) were used to test the robustness of standard neutral equilibrium statistics (see Results) to variations of rho. That is, neutral confidence intervals for each statistic were obtained and compared for the most likely value of rho and for two estimates, one on each side of the most likely estimate, the probability of which is 1000-fold lower than the probability of the most likely estimate.

Tajima's D [53], Fu and Li's D* and F* [54] and Fu's F s [55] tests were computed to identify departures from the standard neutral model of evolution. All these tests are based on the comparison of observed levels of DNA sequence diversity obtained from different estimators, which, under neutral conditions of a population with stable effective size and in the absence of selection, estimate the population diversity parameter theta. Departures from the standard neutral equilibrium model affect the various estimators differently, causing their (standardised) difference to be non-zero. Tajima's D-statistic was computed for each locus and reflects the difference between π and θW. Fu and Li's tests (D* and F*) are based on the distribution of mutations in the genealogy and compare the number of "old" and "new" mutations. The Fs test, based on the haplotype (gene) frequency distribution, was also calculated. These tests were preferred over those requiring comparisons to an out-group, due to lack of genomic information on closely related species and the difficulty of correctly identifying orthologs in multiple-gene families such as PIPs. All these statistics were also estimated within each sequence by a sliding-window method. Test statistics were recomputed on windows of one-hundred base pairs length with a step size of two using the function implemented in DnaSP [49].

Since both selection and variations in effective population size can affect the statistics in similar ways, there is no way to deduce which evolutionary force is acting on the populations based on the simple observation of departure from mutation-drift equilibrium in a given direction. We applied two independent strategies to split the effects of different evolutionary forces on the statistics. (1) To test whether purely demographic events were sufficient to explain the observed values of the statistics, simulations were performed using the MS program [56]. Samples of the same size and diversity parameters (theta, population recombination rate, number of segregating sites) as the observed populations were simulated. Simulations were performed assuming a mutation rate per generation per site of μ = 10-8 and μ = 10-9; approximate estimations of effective population sizes were carried out as in [16]. Bottleneck events were simulated [20] for a reduction to a population size between 1/10 and 1/1000 of current estimated population sizes, having occurred between one and ten thousand generations before present, and having lasted for 10 or 100 generations (after which pre-bottleneck population size is instantly restored). These parameters were chosen to provide scenarios with expected positive values for the estimated statistics, as the only significant observed values of D, D* and F* (see results) were positive. Simulations were not performed for F S as this statistic has a clear meaning only when negative [55], whereas only positive significant values were obtained. One-hundred samples were simulated for each combination of bottleneck size, duration and age, and the 95% upper quantile was computed for all statistics for each combination of parameters. Tajima's D was estimated directly with the MS program, whereas D* and F* were computed on the simulated data sets with a suite of R [57] routines specifically designed for this purpose, and available from the corresponding author. The observed values were then compared to the 95% upper quantile of the distribution obtained for each combination of parameters. When the observed value fell below the simulated 95% threshold, it was considered as compatible with the demographic scenario defined by the parameters. In this way, we have devised a strictly conservative test for the detection of balancing selection: the presence of selection is assumed only when no past change in effective population size can be claimed to be responsible for the observed departure from standard neutral equilibrium. (2) The same samples analysed for sequence diversity were also genotyped at eight chloroplast loci using universal primer pairs (ccmp1, ccmp2, ccmp3, ccmp4, ccmp5, ccmp6, ccmp7, ccmp10 [58]). The patterns of diversity obtained on these eight loci were analysed with the "sign test" method included in the BOTTLENECK software package [59] to test for (recent) demographic events in the populations, that may be detected by tests of selection on sequences and thus confound the results. The Stepwise Mutation Model and 1000 replications were used for these tests. The results from chloroplast microsatellites were used as the neutral reference, for which departures from the equilibrium can only be caused by demographic events. Departures from neutral equilibrium, observed on PIP sequences, were compared to results obtained on cpSSR markers to make inferences on the processes that shaped diversity patterns at gene sequences. For this analysis the loci, although fully linked, were tested independently because combining multiple sites, each mutating independently, into a synthetic genotype may introduce a bias on significance thresholds, which are based on independent loci, and not a linear combination of repeat lengths of multiple loci. We are unaware of mutational models that actually take into account linear combinations of SSR loci. On the other hand, no multi-locus test was performed, as they also assume independence. We instead applied a Bonferroni correction to significance thresholds for each of the SSR loci, considered as multiple realisations of the same expected statistical distribution (i.e. the expected frequency spectrum of SSR loci at mutation-drift equilibrium, conditioned on observed number of alleles). Since the method is not affected by departure from Hardy-Weinberg equilibrium [60] it can be applied to haploid data, provided they are from a panmictic population.


  1. ter Steege H, Pitman NCA, Phillips OL, Chave J, Sabatier D, Duque A, Molino JF, Prevost MF, Spichiger R, Castellanos H, von Hildebrand P, Vasquez R: Continental-scale patterns of canopy tree composition and function across Amazonia. Nature. 2006, 443: 444-447. 10.1038/nature05134.

    Article  CAS  PubMed  Google Scholar 

  2. Swaine MD: Rainfall and soil fertility as factors limiting forest species distributions in Ghana. J Ecol. 1996, 84: 419-428. 10.2307/2261203.

    Article  Google Scholar 

  3. Wright SJ: Seasonal drought, soil fertility and the species density of tropical forest plant-communities. Trends Ecol Evol. 1992, 7: 260-263. 10.1016/0169-5347(92)90171-7.

    Article  CAS  PubMed  Google Scholar 

  4. Hammond DS: Tropical forests of the Guiana shield: ancient forests in a modern world. 2005, Wallingford: CABI publishing

    Chapter  Google Scholar 

  5. Phillips OL, Aragao LEOC, Lewis SL, Fisher JB, Lloyd J, Lopez-Gonzalez G, Malhi Y, Monteagdo A, Peacock J, Quesada CA, et al: Drought Sensitivity of the Amazon Rainforest. Science. 2009, 323: 1344-1347. 10.1126/science.1164033.

    Article  CAS  PubMed  Google Scholar 

  6. Newton RJ, Funkhouser EA, Fong F, Tauer CG: Molecular and physiological genetics of drought tolerance in forest species. For Ecol Manage. 1991, 43: 225-250. 10.1016/0378-1127(91)90129-J.

    Article  Google Scholar 

  7. Ingram J, Bartels D: The molecular basis of dehydration tolerance in plants. Annu Rev Plant Physiol Plant Mol Biol. 1996, 47: 377-403. 10.1146/annurev.arplant.47.1.377.

    Article  CAS  PubMed  Google Scholar 

  8. Riera M, Valon C, Fenzi F, Giraudat J, Leung J: The genetics of adaptive responses to drought stress: Abscisic acid-dependent and abscisic acid-independent signaling components. Physiol Plant. 2005, 123: 111-119. 10.1111/j.1399-3054.2005.00469.x.

    Article  CAS  Google Scholar 

  9. Street NR, Skogström O, Sjödin A, Tucker J, Rodríguez-Acosta M, Nilsson P, Jansson S, Taylor G: The genetics and genomics of the drought response in Populus. The Plant Journal. 2006, 48: 321-341. 10.1111/j.1365-313X.2006.02864.x.

    Article  CAS  PubMed  Google Scholar 

  10. González-Martínez SC, Krutovsky KV, Neale DB: Forest-tree population genomics and adaptive evolution. New Phytol. 2006, 170: 227-238. 10.1111/j.1469-8137.2006.01686.x.

    Article  PubMed  Google Scholar 

  11. Alexandersson E, Fraysse L, Sjövall-Larsen S, Gustavsson S, Fellert M, Karlsson M, Johanson U, Kjellbom P: Whole gene family expression and drought stress regulation of aquaporins. Plant Mol Biol. 2005, 59: 469-484. 10.1007/s11103-005-0352-1.

    Article  CAS  PubMed  Google Scholar 

  12. Dubos C, Plomion C: Identification of water-deficit responsive genes in maritime pine (Pinus pinaster Ait.) roots. Plant Mol Biol. 2005, 51: 249-262. 10.1023/A:1021168811590.

    Article  Google Scholar 

  13. Kaldenhoff R, Ribas-Carbo M, Sans JF, Lovisolo C, Heckwolf M, Uehlein N: Aquaporins and plant water balance. Plant Cell Environ. 2008, 31: 658-666. 10.1111/j.1365-3040.2008.01792.x.

    Article  CAS  PubMed  Google Scholar 

  14. Cochard H, Venisse J-S, Barigah TS, Brunel N, Herbette S, Guilliot A, Tyree MT, Sakr S: Putative role of aquaporins in variable hydraulic conductance of leaves in response to light. Plant Physiol. 2007, 143: 122-133. 10.1104/pp.106.090092.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  15. Johanson U, Karlsson M, Johansson I, Gustavsson S, Sjovall S, Fraysse L, Weig AR, Kjellbom P: The complete set of genes encoding major intrinsic proteins in Arabidopsis provides a framework for a new nomenclature for major intrinsic proteins in plants. Plant Physiol. 2001, 126: 1358-1369. 10.1104/pp.126.4.1358.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  16. Brown GR, Gill GP, Kuntz RJ, Langley CH, Neale DB: Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc Natl Acad Sci USA. 2004, 101: 15155-15260.

    Google Scholar 

  17. Gonzalez-Martinez SC, Ersoz E, Brown GR, Wheeler NC, Neale DB: DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda l. Genetics. 2006, 172: 1915-1926. 10.1534/genetics.105.047126.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  18. Heuertz M, De Paoli E, Kallman T, Larsson H, Jurman I, Morgante M, Lascoux M, Gyllenstrand N: Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce [Picea abies (L.) Karst]. Genetics. 2006, 174: 2095-2105. 10.1534/genetics.106.065102.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  19. Eveno E, Collada C, Guevara MA, Leger V, Soto A, Diaz L, Leger P, Gonzalez-Martinez SC, Cervera MT, Plomion C, Garnier-Gere PH: Contrasting patterns of selection at Pinus pinaster Ait. drought stress candidate genes as revealed by genetic differentiation analyses. Mol Biol Evol. 2008, 25: 417-437. 10.1093/molbev/msm272.

    Article  CAS  PubMed  Google Scholar 

  20. Pyhäjärvi T, Garcia-Gil MR, Knurr T, Mikkonen M, Wachowiak W, Savolainen O: Demographic history has influenced nucleotide diversity in European Pinus sylvestris populations. Genetics. 2007, 177: 1713-1724. 10.1534/genetics.107.077099.

    Article  PubMed Central  PubMed  Google Scholar 

  21. Fujimoto A, Kado T, Yoshimaru H, Tsumura Y, Tachida H: Adaptive and slightly deleterious evolution in a conifer, Cryptomeria japonica. J Mol Evol. 2008, 67: 201-210. 10.1007/s00239-008-9140-2.

    Article  CAS  PubMed  Google Scholar 

  22. Ingvarsson PK: Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics. 2008, 180: 329-340. 10.1534/genetics.108.090431.

    Article  PubMed Central  PubMed  Google Scholar 

  23. Joseph JA, Lexer C: A set of novel DNA polymorphisms within candidate genes potentially involved in ecological divergence between Populus alba and P. Tremula, two hybridizing european forest trees. Mol Ecol Resour. 2008, 8: 188-192. 10.1111/j.1471-8286.2007.01919.x.

    Article  CAS  PubMed  Google Scholar 

  24. Grivet D, Sebastiani F, González-Martinez S, Vendramin GG: Patterns of polymorphism resulting from long-range colonization in the Mediterranean conifer Aleppo pine. New Phytol. 2009, 184: 1016-1028. 10.1111/j.1469-8137.2009.03015.x.

    Article  CAS  PubMed  Google Scholar 

  25. Wachowiak W, Balk P, Savolainen O: Search for nucleotide diversity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L.). Tree Genet Genomes. 2009, 5: 117-132. 10.1007/s11295-008-0188-3.

    Article  Google Scholar 

  26. The Arabidopsis Information Resource: []

  27. Kimura M: The neutral theory of molecular evolution. 1983, Cambridge: Cambridge University Press

    Chapter  Google Scholar 

  28. Mayle FE, Power MJ: Impact of a drier early-mid-Holocene climate upon Amazonian forests. Philos Trans R Soc Lond B Biol Sci. 2008, 363: 1829-1838. 10.1098/rstb.2007.0019.

    Article  PubMed Central  PubMed  Google Scholar 

  29. Ramirez-Soriano A, Ramos-Onsins SE, Rozas J, Calafell F, Navarro A: Statistical power analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. Genetics. 2008, 179: 555-567. 10.1534/genetics.107.083006.

    Article  PubMed Central  PubMed  Google Scholar 

  30. Jorde LB, Rogers AR, Bamshad M, Watkins WS, Krakowiak P, Sung S, Kere J, Harpending HC: Microsatellite diversity and the demographic history of modern humans. Proceedings of the National Academy of Sciences of the United States of America. 1997, 94: 3100-3103. 10.1073/pnas.94.7.3100.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  31. Hudson RR: Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary biology. Edited by: Futuyma D, Antonovics J. 1991, Oxford, UK: Oxford University Press, 7: 1-44.

    Google Scholar 

  32. Provan J, Soranzo N, Wilson NJ, Goldstein DB, Powell W: A low mutation rate for chloroplast microsatellites. Genetics. 1999, 153: 943-947.

    PubMed Central  CAS  PubMed  Google Scholar 

  33. Wang RL, Stec A, Hey J, Lukens L, Doebley J: The limits of selection during maize domestication. Nature. 1999, 398: 236-239. 10.1038/18435.

    Article  CAS  PubMed  Google Scholar 

  34. Olsen KM, Womack A, Garrett AR, Suddith JI, Purugganan MD: Contrasting evolutionary forces in the Arabidopsis thaliana floral developmental pathway. Genetics. 2002, 160: 1641-1650.

    PubMed Central  CAS  PubMed  Google Scholar 

  35. Duminil J, Caron H, Scotti I, Cazal S-O, Petit RJ: Blind population genetics survey of tropical rainforest trees. Mol Ecol. 2006, 15: 3505-3513. 10.1111/j.1365-294X.2006.03040.x.

    Article  CAS  PubMed  Google Scholar 

  36. Stadler T, Haubold B, Merino C, Stephan W, Pfaffelhuber P: The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations. Genetics. 2009, 182: 205-216. 10.1534/genetics.108.094904.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  37. Hahn MW: Toward a selection theory of molecular evolution. Evolution. 2008, 62: 255-265. 10.1111/j.1558-5646.2007.00308.x.

    Article  CAS  PubMed  Google Scholar 

  38. Xu J, Zhang YX, Guan ZQ, Wei W, Han L, Chai TY: Expression and function of two dehydrins under environmental stresses in Brassica Juncea L. Mol Breed. 2008, 21: 431-438. 10.1007/s11032-007-9143-5.

    Article  CAS  Google Scholar 

  39. Richard S, Morency M-J, Drevet C, Jouanin L, Séguin A: Isolation and characterization of a dehydrin gene from White spruce induced upon wounding, drought and cold stresses. Plant Mol Biol. 2000, 43: 1-10. 10.1023/A:1006453811911.

    Article  CAS  PubMed  Google Scholar 

  40. Sachs MM, Freeling M, Okimoto R: The Anaerobic Proteins of Maize. Cell. 1980, 20: 761-767. 10.1016/0092-8674(80)90322-0.

    Article  CAS  PubMed  Google Scholar 

  41. Gregerson RG, Cameron L, McLean M, Dennis P, Strommer J: Structure, expression, chromosomal location and product of the gene encoding Adh2 in Petunia. Genetics. 1993, 133: 999-1007.

    PubMed Central  CAS  PubMed  Google Scholar 

  42. Hardy OJ, Maggia L, Bandou E, Breyne P, Caron H, Chevallier M-H, Doligez A, Dutech C, Kremer A, Latouche-Halle C, et al: Fine-scale genetic structure and gene dispersal inferences in 10 Neotropical tree species. Mol Ecol. 2006, 15: 559-571. 10.1111/j.1365-294X.2005.02785.x.

    Article  CAS  PubMed  Google Scholar 

  43. Doyle JJ, Doyle JL: A Rapid DNA Isolation Procedure from Small Quantities of Fresh Leaf Tissues. Phytochemistry Bulletin. 1987, 9: 11-15.

    Google Scholar 

  44. Colpaert N, Cavers S, Bandou E, Caron H, Gheysen G, Lowe AJ: Sampling Tissue for DNA Analysis of Trees: Trunk Cambium as an Alternative to Canopy Leaves. Silvae Genet. 2005, 54: 265-269.

    Google Scholar 

  45. Kiefer E, Heller W, Ernst D: A simple and efficient protocol for isolation of functional rna from plant tissues rich in secondary metabolites. Plant Mol Biol Rep. 2000, 18: 33-39. 10.1007/BF02825291.

    Article  CAS  Google Scholar 

  46. The microRNA database. []

  47. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, and Higgins DG: ClustalW and ClustalX version 2. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.

    Article  CAS  PubMed  Google Scholar 

  48. Niu TH, Qin ZHS, Xu XP, Liu JS: Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet. 2002, 70: 157-169. 10.1086/338446.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  49. Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R: Dnasp, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003, 19: 2496-2497. 10.1093/bioinformatics/btg359.

    Article  CAS  PubMed  Google Scholar 

  50. Watterson GA: On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975, 7: 256-276. 10.1016/0040-5809(75)90020-9.

    Article  CAS  PubMed  Google Scholar 

  51. Nei M: Molecular Evolutionary Genetics. 1987, New York: Columbia University Press

    Google Scholar 

  52. webpage [], []

  53. Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.

    PubMed Central  CAS  PubMed  Google Scholar 

  54. Fu YX, Li WH: Statistical tests of neutrality of mutations. Genetics. 1993, 133: 693-709.

    PubMed Central  CAS  PubMed  Google Scholar 

  55. Fu YX: Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics. 1997, 147: 915-925.

    PubMed Central  CAS  PubMed  Google Scholar 

  56. Hudson RR: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002, 18: 337-338. 10.1093/bioinformatics/18.2.337.

    Article  CAS  PubMed  Google Scholar 

  57. The R Project for Statistical Computing. []

  58. Weising K, Gardner RC: A set of conserved PCR primers for the analysis of simple sequence repeat polymorphisms in chloroplast genomes of dicotyledonous angiosperms. Genome. 1999, 42: 9-19. 10.1139/gen-42-1-9.

    Article  CAS  PubMed  Google Scholar 

  59. Luikart G, Allendorf F, Cornuet J-M, Sherwin W: Distortion of allele frequency distributions provides a test for recent population bottlenecks. J Hered. 1998, 89: 238-247. 10.1093/jhered/89.3.238.

    Article  CAS  PubMed  Google Scholar 

  60. Piry S, Luikart G, Cornuet J-M: BOTTLENECK: a computer program for detecting recent reductions in the effective population size using allele frequency data. Journal of Heredity. 1999, 90: 502-503. 10.1093/jhered/90.4.502.

    Article  Google Scholar 

Download references


This work was funded by the EU-funded INCO "SEEDSOURCE" project, by the French Ministry of Ecology and Sustainable Development "ECOFOR - ECOSYSTEMES TROPICAUX" program, and by the EU-funded PO-FEDER "ENERGIRAVI" program. The authors wish to thank Pauline Garnier-Géré (INRA, Cestas, France) for methodological discussions and help, Saint-Omer Cazal and Jean Weigel for sampling, and Jared Strasburg (Indiana University, Bloomington IN), Santiago Gonzalez-Martinez (INIA, Madrid, Spain) and five anonymous referees for critically reading the manuscript.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Ivan Scotti.

Additional information

Authors' contributions

DB, IS, GGV contributed to experimental conception and setup; DA, CSS, IS, DB, PR contributed to sampling strategy choice and sampling; DA, AB, PR, GGV contributed to sequence and marker data collection; DA, AB, LB, IS, GGV contributed to gene isolation and choice of resequencing strategies; DA, IS, CSS contributed to data analyses. All authors read and approved the final manuscript.

Electronic supplementary material


Additional File 1: Supplementary table S1. Neutral confidence intervals for mutation-drift equilibrium statistics as a function of rho for each Gene/population (DOC 70 KB)


Additional file 2: Supplementary methods 1. (a) PCR and sequencing conditions for the isolation of gene sequences and (b) Conditions for Specific PCR amplifications. (DOC 52 KB)

Authors’ original submitted files for images

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Audigeos, D., Buonamici, A., Belkadi, L. et al. Aquaporins in the wild: natural genetic diversity and selective pressure in the PIP gene family in five Neotropical tree species. BMC Evol Biol 10, 202 (2010).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: