Skip to main content

Short-wavelength sensitive opsin (SWS1) as a new marker for vertebrate phylogenetics



Vertebrate SWS1 visual pigments mediate visual transduction in response to light at short wavelengths. Due to their importance in vision, SWS1 genes have been isolated from a surprisingly wide range of vertebrates, including lampreys, teleosts, amphibians, reptiles, birds, and mammals. The SWS1 genes exhibit many of the characteristics of genes typically targeted for phylogenetic analyses. This study investigates both the utility of SWS1 as a marker for inferring vertebrate phylogenetic relationships, and the characteristics of the gene that contribute to its phylogenetic utility.


Phylogenetic analyses of vertebrate SWS1 genes produced topologies that were remarkably congruent with generally accepted hypotheses of vertebrate evolution at both higher and lower taxonomic levels. The few exceptions were generally associated with areas of poor taxonomic sampling, or relationships that have been difficult to resolve using other molecular markers. The SWS1 data set was characterized by a substantial amount of among-site rate variation, and a relatively unskewed substitution rate matrix, even when the data were partitioned into different codon sites and individual taxonomic groups. Although there were nucleotide biases in some groups at third positions, these biases were not convergent across different taxonomic groups.


Our results suggest that SWS1 may be a good marker for vertebrate phylogenetics due to the variable yet consistent patterns of sequence evolution exhibited across fairly wide taxonomic groups. This may result from constraints imposed by the functional role of SWS1 pigments in visual transduction.


Opsins, or visual pigments, form the first step in the visual transduction cascade in the photoreceptor cells of the retina. By means of a covalently-bound retinal chromophore, opsins are able to respond to light by changing conformation, which activates a second messenger G-protein, and triggers a biochemical cascade that eventually results in a neural signal to the brain that light has been perceived [1]. Opsins are a member of the extremely large superfamily of integral membrane G-protein coupled receptors (GPCR's), with thousands of genes present in the human genome alone [2]. This family is involved in a diverse array of physiological functions in vertebrates, including neurotransmission, learning, memory, and various endocrine and hormonal pathways. All of its members are thought to share the same tertiary structure, mechanisms of activation, and activation of G proteins, even if the downstream effectors of the G proteins may differ. Despite the vast array of functions mediated by this family of receptors, the highly conserved seven helical transmembrane structure of GPCR's as a whole (particularly the Class A type, of which opsins are a member) has ensured that insertions and deletions remain rare, particularly in transmembrane regions.

Visual pigments can vary widely in their wavelength of maximal absorption, ranging from the ultraviolet to the red. The molecular basis of spectral sensitivity depends on interactions between amino acids within the binding pocket of an opsin protein and its associated light-sensitive chromophore. Any variation in the amino acid sequence of a given opsin can, therefore, directly influence the spectral wavelengths an organism can detect. Phylogenetically, visual pigments are divided into 5 groups, roughly reflecting their function in vision, such as whether they are active during the day (cone opsins) or at night (rod opsins), and the spectral tuning of the wavelengths at which they are maximally sensitive [35]: red/green or long-wavelength sensitive cone opsins (LWS; approx. 500–570 nm), rod-like or medium-wavelength sensitive cone opsins (RH2; approx. 465–520 nm), ultraviolet/violet or short-wavelength sensitive type 1 cone opsins (SWS1; approx. 360–430 nm), blue or short-wavelength sensitive type 2 cone opsins (SWS2; approx. 430–460 nm), and the rod opsins active at low light levels (RH1; approx. 500 nm). The SWS1 opsins are the shortest wavelength sensitive opsins, and are generally expressed in a particular type of cone photoreceptor found throughout vertebrates that is characterized by an extremely short outer segment [6], though exceptions do exist [7]. Only a few types of vertebrates, such as those living in extreme low light environments (subterranean or deep sea habitats) are thought to lack this type of cone. For example, pseudogenes have been identified in the blind Ehrenberg's mole rat [8], as well in the bottle-nosed dolphin [9] and a number of whales [10]. Similarly, organisms with primarily nocturnal behaviours may also lack the SWS1 opsin; such as the owl monkey and the bushbaby [11].

Perhaps due to the highly conserved nature of its role in vertebrate vision, the SWS1 opsin (hereafter SWS1) occurs as a single copy nuclear gene in almost all animals investigated thus far. However, despite the fundamental importance of this gene for vision at short wavelengths, SWS1 exhibits considerable sequence variation across the diversity of vertebrates that have been investigated. This variation may be a product of SWS1 functional diversity, as measured by absorption sensitivities [1214], which in visual pigments have often been found to be optimized to specific visual environments [5, 15, 16]. To investigate the molecular evolution of SWS1, we conducted phylogenetic analyses of the gene using available vertebrate sequences. SWS1 genes have been cloned from a considerable variety of vertebrates, ranging from the lamprey to mammals. Surprisingly, we found that this single gene appears to reconstruct many of the commonly accepted relationships among vertebrates (Figure 1), for both deeper and more recent divergences. Indeed, SWS1 results were comparable to those obtained from more exhaustive analyses using multi-gene data sets [e.g. [1720]]. Here, we present a comprehensive phylogenetic analysis of vertebrate SWS1 sequences. We then investigate the characteristics of this gene that contribute to its evident success as a phylogenetic marker across a broad taxonomic range.

Figure 1

Summary of vertebrate evolutionary relationships, based on morphological and molecular data [18, 32-39]. Colours indicate taxonomic groups represented in the SWS1 data set.


Phylogenetic analyses

Sixty-two vertebrate SWS1 opsin nucleotide coding sequences were obtained from GenBank, aligned using ClustalX [21], and analyzed using a variety of phylogenetic methods including maximum parsimony [22, 23], maximum likelihood [24, 25], and Bayesian methods [26]. A series of nested likelihood ratio tests were performed using ModelTest [27] in order to determine which nucleotide model of those tested best fit the data. Of the nucleotide models commonly implemented for phylogenetic analysis, the general time-reversible model incorporating parameters for invariant sites, as well as among-site rate heterogeneity (GTR+I+Γ) [2830] was found to fit the data significantly better than any simpler model. This model was therefore used in subsequent likelihood and Bayesian phylogenetic analyses. Assessing confidence in nodes of the phylogeny was accomplished by bootstrap analysis [31] or Bayesian posterior probabilities [26]. The results of the phylogenetic analyses are shown in Figure 2, with the bootstrap values of the maximum parsimony (MP), maximum likelihood (ML) and posterior probabilities of the Bayesian analyses mapped onto the MP tree.

Figure 2

Maximum parsimony phylogeny. Strict consensus of 432 equally most parsimonious trees (length = 3965, CI = 0.35, RI = 0.75) found in a heuristic search with 10,000 replicates. Bootstrap percentages above 50% for MP analyses (1000 replicates), followed by those for ML analyses under the GTR+I+Γ model (100 replicates) are indicated above the nodes. Dashes represent less than 50% bootstrap support. An asterix denotes a posterior probability of ≥0.95 in the Bayesian analysis. Colours correspond to vertebrate groups as indicated in Figure 1.

In all analyses, the reconstructed clades were remarkably similar to currently accepted vertebrate relationships based on morphological and molecular analyses (Figure 1). Among the available sequences, however, there are quite a few groups not represented in our dataset, such as cartilaginous fish, monotremes, turtles, crocodiles and snakes. The lack of adequate sampling is particularly evident in the non-tetrapod vertebrates, with the only sequences available being some of the more recently derived ray-finned fish lineages. However, on the basis of the taxa available, the vertebrate clade is divided into two major groups: actinopterygians (ray-finned fishes), and tetrapods. The latter clade is further divided into modern amphibians (frogs and salamanders), mammals, and reptiles (birds+squamates). This is in surprisingly good agreement with the generally accepted relationships among the major vertebrate lineages according to molecular and morphological data as summarized in Figure 1[18, 3239].

Within the ray-finned fishes, our trees support the basal position of ostariophysans (carp, goldfish, and zebrafish), followed by the neoteleosts, salmonids, and smelt, a situation congruent with morphological [40], mitochondrial [35] and nuclear data [41]. Between the latter three clades, however, the relationships remain debated: in some morphological studies salmonids and smelts form a clade [42], whereas in other investigations salmonids group with the neoteleosts [40]. A monophyly of smelts and neoteleosts has also been proposed [43, 44], and has since received support from molecular analyses [20, 35]. Our analyses are in agreement with the foremost hypothesis, having salmonids and smelts as a monophyletic sister group to neoteleosts.

In our analyses, there is only weak support for the monophyletic grouping of modern amphibians (30% MP bootstrap, 24% ML bootstrap, 0.66 Bayesian posterior probability). The paraphyly of amphibians has been suggested by Carroll [45] using morphological and paleontological data; by contrast, most research, including the remaining paleontological studies and all molecular analyses, disagree with this hypothesis and maintain the monophyly of modern amphibians [46, 47]. It should be mentioned, however, that only four amphibian sequences were available for this study, which might be the reason for the lack of resolution in our results.

Within birds, our results confirm the chicken as basal and sister to the Neoaves (all other birds), with the passerines (Passerida) as monophyletic and most derived, a result congruent with recent molecular studies [37, 48]. Previous studies by Sibley and Ahlquist [48] using DNA-DNA hybridization, which since has been supported by nuclear and mitochondrial sequence data [38, 49], divided Passerida into three major clades: Muscicapoidea (represented in our phylogeny by the bluethroat and Siberian rubythroat), Sylvoidea (tits), and Passeroidea (finch and bishops); patterns of these relationships were identified in all our reconstructed phylogenies.

Among mammals, marsupials are found to be the basal lineage within Mammalia, followed by the monophyletic groupings of rodents (minus the guinea pig), laurasiatherians, afrotherians, and higher primates; these relationships are all well supported by recent research [see [50] for review]. Moreover, the more recent divergences within these mammalian clades also resemble the results found by other investigations. For example, all three phylogenies show apes and Old World monkeys to be monophyletic, forming the catarrhines, with the New World monkeys, or platyrrhines, as a sister group, which together form the higher primates. Although their positions are unresolved in the MP consensus tree, there is some support for the prosimians (the lemur and tarsier) as the most basally positioned primates in the bootstrap (57% ML, 48% MP) and Bayesian analyses (0.9 posterior probability). Our results correspond not only with molecular phylogenies constructed using nuclear [51] as well as mitochondrial [52] datasets, but also with morphological data [53].

Patterns of sequence variation in the SWS1 data set

Given the utility of SWS1 for elucidating vertebrate evolutionary relationships across a range of divergences, we explored patterns of sequence variation in our data set by estimating parameters such as base composition, among-site rate heterogeneity, and informative sites using parsimony and likelihood phylogenetic methods, and then compared them to those published for other molecular phylogenetic data sets.

The aligned SWS1 nucleotide dataset consisted of 1083 characters, of which 686 (63%) were parsimony informative. The proportion of invariant sites in our data set was estimated in two ways: (1) by calculating the observed number of invariant sites in our alignment, and (2) by estimating the number of sites likely to be invariant under a particular model of evolution (Table 1). Within vertebrates, there were a relatively small proportion of sites estimated as invariant (I = 0.17). Within the data partitions corresponding to the major vertebrate groups, ML estimates do not exceed 0.34. SWS1 tended to have similar proportions of invariant sites as other nuclear markers commonly used for phylogenetic purposes, as estimated using ML methods, for example RAG1 (0.34, squamates [54], 0.36, amphibians [55], 0.40, rodents [56]) or RAG2 (0.24, frogs [55]). While it is generally assumed that a lower proportion of invariable sites might be indicative of greater phylogenetic information in the data set, this parameter is often difficult to estimate accurately [57, 58].

Table 1 Substitution frequencies and rate heterogeneity parameters

The parameter (α) describing the shape of the Γ-distribution used to account for among-site rate heterogeneity was estimated for the SWS1 data set using likelihood methods to be approximately 1.0 (Table 1). This suggests a fairly even distribution of different substitution rates across sites. Other nuclear genes widely used in phylogenetic analyses tend to have α estimates of at least 1. For example, RAG1 ranges from 1.0 in amphibians [55] to 1.7 in squamates [54]. When α is equal to 1, substitution rates are exponentially distributed, which is intermediate between the bell-shaped curves at higher values (α > 1) and 'L' shaped functions at lower values (α < 1). This exponential shape suggests that there is a more evenly distributed range of substitution rates across sites than would be the case for higher or lower α values. It has been suggested that higher values of α might be better in aiding phylogenetic resolution [59]. But, a more even range of slow to fast evolving sites may be best for phylogenetic analyses because it incorporates both slow sites to carry a signal from deeper divergences, as well as fast sites for more recent divergences. Past studies, based on both empirical data [60] and simulations [59, 61] have suggested that large amounts of among-site rate variation (as indicated by low values of α) such as those often found in some mitochondrial data sets, might tend to adversely affect phylogenetic signal.

Furthermore, the variability observed in SWS1 does not appear to be concentrated on third base positions only, as is often the case in many genes; relative to other molecular markers a great deal of variation is found at first and second positions as well. First, although about 92% of third codon positions were parsimony informative, the proportion of informative sites present in first and second codon positions was 55% and 41%, respectively. Second, estimates of invariant sites (I) were generally low across all three codon positions (Table 1), though the proportion of invariant sites was lowest at third positions, as expected. Third, relatively high α values were also found for first (0.84) and second codon positions (0.65), though highest at third positions (4.7).

Most protein-coding data sets show higher α values (and lower I values) at third codon positions compared to first and second positions [62]. However, the values of α at first and second codon positions in the SWS1 data set were comparatively high. For example, under a similar model of evolution (GTR+I+Γ), Dettai and Lecointre [63] estimated α values of 0.29 for both codon positions in a portion of the MLL gene in fish, as compared with 0.42 and 0.41 for first and second positions in fish SWS1 genes. This would suggest that variation in substitution rates tends to be more evenly distributed across codon positions in the SWS1 data set, and that more sites in the gene are phylogenetically informative.

Maximum likelihood estimates of substitution rate parameters in the SWS1 data set under the GTR+I+Γ model did not exhibit substantial skew when estimated across vertebrates, with values ranging only from about 1.0 to 3.3 (Table 1). This range of values is smaller than many other data sets, including cyt b in birds (0.5 to 8.8 [37]), RAG1 in squamates (1.0 to 7.6 [64]), and RAG1 in amphibians (1.0 to 5.7 [55]). A skewed rate matrix may decrease the number of states a given site can have, resulting in increased homoplasy and potential loss of phylogenetic information. Although this has yet to be investigated in detail in many data sets, a recent comparison of mitochondrial and nuclear genes in insects [65] found that nuclear genes tended to have more homogeneous patterns of among-site rate variation (i.e., larger α values), as well as more symmetrical transformation rate matrices, and that these patterns appeared to be associated with phylogenetic utility in their data sets.

Furthermore, substitution rate matrices were estimated in different data partitions of the SWS1 data set corresponding to the different vertebrate groups and/or codon position (Table 1). Overall, the rate matrices remain relatively homogeneous and unskewed across different partitions of the data set, although there were certain partitions exhibiting differences in particular rate parameters. For example, there is some variation in substitution rates between C and G in reptiles, as compared with fish and mammals (3.3, 0.1, and 0.73 respectively), and across vertebrates at second codon positions, as compared with first and third positions (7.3, 0.7, and 0.6 respectively). Also, fish generally tend to have higher substitution rates relative to other vertebrate groups at second codon positions (Table 1).

Base composition was estimated using likelihood methods under the GTR+I+Γ model for the complete data set, as well as within partitions of the data corresponding to individual codon positions, and the major vertebrate groups (Table 2). Base compositional biases can be problematic in phylogenetic analyses, particularly if there is convergence in bias across unrelated groups [6668]. The SWS1 data set does not appear to be affected in this manner. Despite a significantly heterogeneous base compositional bias overall (chi-square test of homogeneity p < 0.001, df = 183; Table 2), in the different data partitions, corresponding to the major vertebrate groups and/or codon position, the base frequencies are generally found to be homogeneously distributed throughout the data set, except in tetrapods and in third positions of many vertebrate groups (Table 2). Second positions showed a slightly increased frequency of T, but this was also found to be homogeneous across the data set (Table 2). The high frequency of T at second codon positions is also seen in genes such as rod opsin and cyt b, and presumably reflects a strong functional constraint in genes coding for transmembrane proteins, which contain many hydrophobic amino acids such as Ile (ATY), Phe (TYY) and, Leu (YTR) [69, 70]

Table 2 Base composition and χ2 tests of homogeneity

In summary, molecular patterns in the vertebrate SWS1 data suggest a substantial amount of variation across the three codon positions, as well as high among-site rate variation throughout the gene. As well, the substitution rates tend to be fairly homogeneous among the different classes. Although there is some bias in nucleotide composition across different codon positions, this bias appears to be non-convergent.


Despite the ever increasing number of gene sequences available in the databases, it remains surprisingly difficult to select genes that will be useful for phylogenetic analyses, particularly across a variety of taxonomic ranges. Known issues in molecular phylogenetics such as model mis-specification, gene paralogs and alignment ambiguities often contribute to misleading results [71], and accounting for effects such as among-site rate heterogeneity can substantially alter results. For example, early analyses of mtDNA data tended to place the root of the avian phylogeny within passerines [7275], a result in strong conflict with prior phylogenetic studies [48]. Further analysis of mitochondrial data that accounted for unequal evolutionary rates among sites, however, recovered the traditional division of birds between palaeognathans and neognathans, with passerines being a phylogenetically derived neognath lineage [76]. Similarly, correcting for base compositional bias in a mammalian data set of full mitochondrial genomes increased support for marsupials as the sister group of placentals [34], as opposed to the original analysis, which supported a marsupial-monotreme grouping [77]. Many characteristics have been identified as problematic; however, we know little about exactly what characteristics make a strong molecular marker. For example, in insects, comparisons of nucleotide substitution patterns and phylogenetic utility between nuclear ribosomal and protein coding genes [78] as well as between nuclear and mitochondrial genes [65] identified a number of features as useful for phylogenetic analyses, including larger values of α (parameter for among-site rate heterogeneity), and relatively unskewed substitution rate parameters. However, such studies remain relatively rare in the literature.

Mitochondrial genes have been widely used in molecular systematics due to the relative ease with which mitochondrial genes, or even whole genomes can be amplified and sequenced, as well as the absence of problematic features often associated with nuclear markers such as introns, heterozygosity, and paralogy. Mitochondrial genes can, however, suffer from some marked disadvantages. In most cases mitochondrial genes are thought to evolve much faster than nuclear genes [[79], but see [80]], and they may be subject to significant rate accelerations and decelerations in evolutionary history [81], possibly due to changes in factors such as metabolic rate [82]. Such rate differences can easily lead to positively misleading topological effects [83], and may be particularly problematic in resolving deeper relationships or rapid radiations such as those found within mammals [84]. Furthermore, mitochondrial data sets can often be biased in terms of base composition, which has been found to contribute to misleading signal in a number of data sets including basal vertebrates [66, 85], birds [76] and mammals [34, 86, 87]. However, their faster rate of evolution in comparison to nuclear genes can be useful for resolving more recent relationships [65].

Recent years have shown a dramatic increase in the number of nuclear genes that have been developed for vertebrate phylogenetics in order to complement and expand the set of useful phylogenetic markers [8890]. Some nuclear genes have been used with success in multiple vertebrate groups ranging from fish to mammals, for example RAG1 and 2 [9193], c-mos [37, 64], c-myc [38, 56], MLL [41, 63], and 18S [94, 95]. A host of other nuclear genes have been used primarily in particular vertebrate groups, such as rod opsin in ray-finned fish [63, 69], and more recently amphibians [55, 96]; β-fibrinogen introns in birds [97], and a variety of genes aimed at resolving higher level mammalian phylogenetics: IRBP, α-2B adrenergic receptor, aquaporin, β-basein, γ-fibrinogen, κ-casein, protamine, and von Willebrand Factor [84].

Despite the success of nuclear markers in resolving some of the longstanding problems in vertebrate phylogenetics [32, 50], several issues continue to plague many molecular data sets. First, alignment issues, though long recognized as problematic, often tend to be overlooked in many data sets. Some sequences, particularly ribosomal genes such as 18S, are known to be difficult to align properly [98], and these alignment ambiguities can significantly affect phylogeny reconstruction [95, 99]. However, these problems can easily be avoided by careful selection of molecular markers that are unambiguously alignable. Second, and even more importantly, some of the important assumptions currently incorporated into commonly used phylogenetic methods may not be reasonable for many molecular data sets. For instance, most molecular models of evolution assume that state frequencies, and even more importantly, substitution rate frequencies do not change over evolutionary time, assumptions incorporated in likelihood/Bayesian methods which tend to model molecular evolution as stationary, homogeneous Markov processes [100].

Even though nuclear data sets tend to suffer fewer problems than mitochondrial genes with respect to base compositional changes across a phylogeny, there are examples of nuclear data sets for which nonstationarity can yield positively misleading results, if there is convergence in base compositional bias [101, 102], or worse yet, codon bias [66, 103] among lineages. The use of phylogenetic methods that have been developed to take into account nonstationarity in base frequencies using either distance [67, 104] or likelihood approaches [105] can at least somewhat ameliorate these effects, though not for codon bias issues [66, 103].

More recently, the use of genome-based approaches has enabled more extensive investigations of sources of systematic bias, or inconsistency in phylogenetic analyses [102, 106108] and identified new effects difficult to detect in smaller data sets, such as site-specific changes in evolutionary rates among lineages, or heterotachy [109111]. However, these issues are only just being addressed, and the robustness of current phylogenetic models to such violations explored [112116]. Similarly, the issue of changes in substitution rate frequencies across a phylogeny, or nonhomogeneity, has received relatively little attention, though it has been recognized as a potential problem [67, 104, 117, 118]. Accounting for such effects in more complex models of evolution may be useful for genomic scale analyses, but it is not clear how much power such parameter-rich models may have for relatively small data sets. Selecting genes less susceptible to these effects would tend to obviate the necessity of implementing more complex models, and therefore tend to increase the statistical power of likelihood and Bayesian phylogenetic methods.

With regard to some of the issues plaguing many molecular data sets, SWS1 visual pigment genes offer several clear advantages. First, this gene has very few indels in its evolution across vertebrates, making for a largely unambiguous alignment, and it is a single copy nuclear gene with no paralogs of high sequence similarity. Second, for the vertebrate SWS1 data set, base frequencies were found to be fairly constant across the phylogeny. There was little evidence of base compositional heterogeneity; aside from third codon positions in fish, reptiles, and tetrapods. Third, the SWS1 data set exhibits a relatively unskewed distribution of substitution rate frequencies among the different types of substitutions, and a substantial amount of among site rate variation, both of which are characteristics that previous studies suggest might be important for phylogenetic utility [59, 65, 78]. Fourth, the substitution rate frequencies are not only unskewed, they are also relatively constant across the phylogeny, indicative of a homogeneous substitution process, which may be important in not attenuating phylogenetic signal across fairly large divergences.

Why does the SWS1 visual pigment gene exhibit useful phylogenetic characteristics across such a wide range of divergences in vertebrates? Factors important in contributing to its utility as a phylogenetic marker may be due, at least in part, to the highly conserved, yet somewhat variable nature of its functional role in visual transduction. The fundamental role of SWS1 genes in mediating visual sensitivities at the shortest wavelengths of the spectrum is highly conserved throughout vertebrates, along with its expression in a specific photoreceptor cell in the retina characterized by extremely short outer segments [119121]. Unlike many other genes used for systematic purposes, which are often housekeeping genes which tend to be expressed ubiquitously in many different cell types, or developmental genes that may be expressed in a different tissues mediating a variety of functions, SWS1 genes are generally only expressed in a particular photoreceptor cell type, though they have been found in rare instances to be co-expressed in other types of photoreceptors with longer wavelength-sensitive opsin genes, for example in the mouse [122], guinea pig [7], and tiger salamander [123]. Its overall tertiary 3D structure has remained unchanged, most likely due to constraints imposed by its role as an integral membrane protein, and the mechanisms of activation which require specific structural rotations of the helices which are thought to be conserved in many G-protein coupled receptors [124]. This conserved role in evolution may be important for maintaining homogeneous molecular evolutionary processes such as substitution rate frequencies across vertebrate evolution.

Along with its highly conserved role as the first step in visual transduction, vertebrate SWS1 visual pigments are well-understood examples of functional variation in spectral sensitivity: SWS1 pigments can range in maximal absorption from the ultraviolet to violet [see [12, 125] for reviews]. However, these shifts in function are thought to be mediated via a few specific residues in the protein [see [12]]. Despite the obvious constraints on protein structure, and adaptive changes in function such as spectral sensitivities, these factors may have only limited influence in restricting protein sequence evolution, as SWS1 clearly shows a substantial amount of evolutionary variation capable of carrying phylogenetic information. In fact, the constraints imposed by SWS1 structure and function may provide a framework in which the protein can vary in a more homogeneous fashion that allows for the retention of a strong phylogenetic signal.


The various phylogenetic methods used to analyse SWS1 produced strongly supported topologies showing remarkable congruence with most traditionally accepted hypotheses of vertebrate evolution from the consensus of morphological and molecular studies. This nuclear, protein coding gene recovers not only deep relationships, usually requiring combinations of genes, but also recent relationships that typically require markers with high rates of evolution, such as mitochondrial DNA. The few exceptions include the monophyly of the primates, the relationships among the major groups of placental mammals, and the position of the guinea pig, which remain inconclusive in many data sets.

The phylogenetic utility of the SWS1 gene may result from a number of features of the SWS1 gene found to be important in previous studies, including substantial among site rate variation. Its ability to carry a phylogenetic signal across a broad range of divergences in vertebrates may also be due to a number of features, such as fairly homogeneous substitution rate matrix parameters, which are potentially important but largely unexplored for other phylogenetic markers. Future studies will explore these characteristics in data sets of other nuclear markers, in order to assess how well they correlate with phylogenetic utility.


Sequence alignment

Sixty two vertebrate SWS1 opsin nucleotide and amino acid sequences were retrieved from GenBank, with accession numbers for all sequences used in the analyses presented here provided in Table 3. SWS1 coding sequences range in length from 1005 (salmonids) to 1056 (pig) nucleotides, with very few indels (only 6 indels in complete coding sequences in the entire alignment; see Table 3 and Additional file 1). All SWS1 opsin genes identified so far have four introns at highly conserved homologous positions (located at amino acid positions 120, 176, 231, and 311 in the macaque sequence [126]). The first two introns are generally short, ranging in length from 70–76 bp in fish (Dimidiochromis compressiceps), to 283–324 bp in mammals (Macaca fascicularis); whereas the second two introns tend to be longer (120–143 bp in D. compressiceps, 627–979 bp in M. fascicularis) [126, 127]. Only one copy of SWS1 has been found in all taxa investigated so far, with the exception of the smelt (Plecoglossus altivelis), which may be due to a unique duplication specific to this lineage of fish [128]. Only one smelt sequence was included in our analyses, as investigations including the second sequence showed it to be strongly monophyletic with the first, and had no other effect on the phylogeny (results not shown).

Table 3 Accession numbers and species identification for taxa in SWS1 data set

Sampling within the vertebrate groups was as follows: one lamprey (Geotria australis), 17 actinopterygians (all of which were teleosts); four lissamphibians (referred to in the text as amphibians); 13 birds; three squamates; and 23 mammals (Table 3). The amino acid sequences were aligned using ClustalX [21], Additional file 1). This amino acid alignment was then used to produce an equivalently aligned nucleotide sequence alignment.

Phylogenetic analyses

Phylogenetic analyses were performed using PAUP*v4b10, [129] for the maximum parsimony (MP) and likelihood (ML) methods, and MrBayes version 3.1 [130] for the Bayesian analyses. For the MP analysis all characters were assigned equal weight. Heuristic searches, with random addition of taxa and TBR branch swapping, were performed with 10000 random-addition sequences. A strict consensus tree was calculated from the equally most parsimonious trees found. To assess support for internal branches, bootstrap analyses [31] of 1000 replicates with 10 random-addition sequences for each replicate, were performed.

ModelTest [27] was used to perform a series of nested likelihood ratio tests in order to determine which nucleotide model of those tested best fit the data. This model was then used in subsequent model-based phylogenetic analyses such as likelihood and Bayesian analyses. Heuristic ML analyses were conducted with TBR branch swapping (10 random addition replicates), as well as bootstrap analyses with 100 replicates in order to assess the robustness of the clades recovered [31]. The Bayesian analyses were run for two million generations with default priors, sampling the chains every 100 generations. To ensure that our analyses were not trapped in local optima, four independent Markov Chain Monte Carlo (MCMC) runs were performed (with default heating values). Stationarity was assumed when the cumulative posterior probabilities of all clades stabilized. The first 5000 trees were considered 'burn-in' and discarded, and the remaining trees were saved. The associated Bayesian posterior probabilities were calculated from the sample points after the MCMC algorithm started to converge.

Nucleotide composition and substitution frequencies

Parameters such as base frequencies, substitution rate frequencies, among site rate variation (α), and invariant sites (I) were all estimated on the ML phylogeny using maximum likelihood methods under the GTR+I+Γ model [2830] as implemented in PAUP*. Chi-squared tests of base compositional homogeneity were also implemented in PAUP* [129]. Since estimates of invariant sites (I) can be problematic, particularly in reduced data partitions due to insufficient data [58], the number of invariant sites was therefore also calculated by simple counts of the observed number of constant sites in our data set, as implemented in MEGA3 [131].



short-wavelength sensitive 1 opsin gene


short-wavelength sensitive 2 opsin gene


rod opsin gene


rod opsin-like gene


G-protein coupled receptor


maximum parsimony


maximum likelihood.


  1. 1.

    Menon ST, Han M, Sakmar TP: Rhodopsin: Structural basis of molecular physiology. Physiological Reviews. 2001, 81 (4): 1659-1688.

    CAS  PubMed  Google Scholar 

  2. 2.

    Filipek S, Teller DC, Palczewski K, Stenkamp R: The crystallographic model of rhodopsin and its use in studies of other G protein-coupled receptors. Annu Rev Biophys Biomol Struct. 2003, 32: 375-397. 10.1146/annurev.biophys.32.110601.142520.

    PubMed Central  CAS  PubMed  Google Scholar 

  3. 3.

    Okano T, Kojima D, Fukada Y, Shichida Y, Yoshizawa T: Primary structures of chicken cone visual pigments: vertebrate rhodopsins have evolved out of cone visual pigments. Proc Natl Acad Sci USA. 1992, 89: 5932-5936. 10.1073/pnas.89.13.5932.

    PubMed Central  CAS  PubMed  Google Scholar 

  4. 4.

    Chang BSW, Crandall KS, Carulli JP, Hartl DL: Opsin phylogeny and evolution: a model for blue shifts in wavelength regulation. Mol Phylogenet Evol. 1995, 4 (1): 31-43. 10.1006/mpev.1995.1004.

    CAS  PubMed  Google Scholar 

  5. 5.

    Yokoyama S, Yokoyama R: Adaptive evolution of photoreceptors and visual pigments in vertebrates. Annu Rev Ecol Syst. 1996, 27: 543-567. 10.1146/annurev.ecolsys.27.1.543.

    Google Scholar 

  6. 6.

    Walls GL: The vertebrate eye and its adaptive radiation. 1942, Bloomfield Hills, Mich. , Cranbrook Institute of Science, 785 p-

    Google Scholar 

  7. 7.

    Lukats A, Dkhissi-Benyahya O, Szepessy Z, Rohlich P, Vigh B, Bennett NC, Cooper HM, Szel A: Visual pigment coexpression in all cones of two rodents, the Siberian hamster, and the pouched mouse. Invest Ophthalmol Vis Sci. 2002, 43 (7): 2468-2473.

    PubMed  Google Scholar 

  8. 8.

    David-Gray ZK, Bellingham J, Munoz M, Avivi A, Nevo E, Foster RG: Adaptive loss of ultraviolet-sensitive/violet-sensitive (UVS/VS) cone opsin in the blind mole rat (Spalax ehrenbergi). Eur J Neurosci. 2002, 16 (7): 1186-1194. 10.1046/j.1460-9568.2002.02161.x.

    CAS  PubMed  Google Scholar 

  9. 9.

    Fasick JI, Cronin TW, Hunt DM, Robinson PR: The visual pigments of the bottlenose dolphin (Tursiops truncatus). Vis Neurosci. 1998, 15 (4): 643-651. 10.1017/S0952523898154056.

    CAS  PubMed  Google Scholar 

  10. 10.

    Levenson DH, Dizon A: Genetic evidence for the ancestral loss of short-wavelength-sensitive cone pigments in mysticete and odontocete cetaceans. Proc Natl Acad Sci U S A. 2002, 270: 673-679.

    Google Scholar 

  11. 11.

    Jacobs GH, Neitz M, Neitz J: Mutations in S-cone pigment genes and the absence of colour vision in two species of nocturnal primate. Proc Biol Sci. 1996, 263 (1371): 705-710.

    CAS  PubMed  Google Scholar 

  12. 12.

    Hunt DM, Cowing JA, Wilkie SE, Parry JW, Poopalasundaram S, Bowmaker JK: Divergent mechanisms for the tuning of shortwave sensitive visual pigments in vertebrates. Photochem Photobiol Sci. 2004, 3 (8): 713-720. 10.1039/b314693f.

    CAS  PubMed  Google Scholar 

  13. 13.

    Yokoyama S: Molecular evolution of vertebrate visual pigments. Prog Retin Eye Res. 2000, 19 (4): 385-419. 10.1016/S1350-9462(00)00002-1.

    CAS  PubMed  Google Scholar 

  14. 14.

    Vorobyev M, Osorio D, Bennett AT, Marshall NJ, Cuthill IC: Tetrachromacy, oil droplets and bird plumage colours. J Comp Physiol [A]. 1998, 183 (5): 621-633. 10.1007/s003590050286.

    CAS  Google Scholar 

  15. 15.

    Spady TC, Seehausen O, Loew ER, Jordan RC, Kocher TD, Carleton KL: Adaptive molecular evolution in the opsin genes of rapidly speciating cichlid species. Mol Biol Evol. 2005, 22 (6): 1412-1422. 10.1093/molbev/msi137.

    CAS  PubMed  Google Scholar 

  16. 16.

    Bowmaker JK, Govardovskii VI, Shukolyukov SA, Zueva LV, Hunt DM, Sideleva VG, Smirnova OG: Visual pigments and the photic environment: the cottoid fish of Lake Baikal. Vision Res. 1994, 34 (5): 591-605. 10.1016/0042-6989(94)90015-9.

    CAS  PubMed  Google Scholar 

  17. 17.

    Hedges SB: Molecular evidence for the origin of birds. Proc Natl Acad Sci U S A. 1994, 91 (7): 2621-2624. 10.1073/pnas.91.7.2621.

    PubMed Central  CAS  PubMed  Google Scholar 

  18. 18.

    Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS: Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001, 294 (5550): 2348-2351. 10.1126/science.1067179.

    CAS  PubMed  Google Scholar 

  19. 19.

    Waddell PJ, Shelley S: Evaluating placental inter-ordinal phylogenies with novel sequences including RAG1, gamma-fibrinogen, ND6, and mt-tRNA, plus MCMC-driven nucleotide, amino acid, and codon models. Mol Phylogenet Evol. 2003, 28 (2): 197-224. 10.1016/S1055-7903(03)00115-5.

    CAS  PubMed  Google Scholar 

  20. 20.

    Ishiguro NB, Miya M, Nishida M: Basal euteleostean relationships: a mitogenomic perspective on the phylogenetic reality of the "Protacanthopterygii". Mol Phylogenet Evol. 2003, 27 (3): 476-488. 10.1016/S1055-7903(02)00418-9.

    CAS  PubMed  Google Scholar 

  21. 21.

    Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25 (24): 4876-4882. 10.1093/nar/25.24.4876.

    PubMed Central  CAS  PubMed  Google Scholar 

  22. 22.

    Lake JA: A Rate-Independent Technique for Analysis of Nucleic-Acid Sequences - Evolutionary Parsimony. Mol Biol Evol. 1987, 4 (2): 167-191.

    CAS  PubMed  Google Scholar 

  23. 23.

    Fitch WM: Toward Defining the Course of Evolution - Minimum Change for a Specific Tree Topology. Syst Zool. 1971, 20 (4): 406-&. 10.2307/2412116.

    Google Scholar 

  24. 24.

    Felsenstein J: Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981, 17 (6): 368-376. 10.1007/BF01734359.

    CAS  PubMed  Google Scholar 

  25. 25.

    Kishino H, Miyata T, Hasegawa M: Maximum-Likelihood Inference of Protein Phylogeny and the Origin of Chloroplasts. J Mol Evol. 1990, 31 (2): 151-160. 10.1007/BF02109483.

    CAS  Google Scholar 

  26. 26.

    Yang Z, Rannala B: Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method. Mol Biol Evol. 1997, 14 (7): 717-724.

    CAS  PubMed  Google Scholar 

  27. 27.

    Posada D, Crandall KA: MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998, 14 (9): 817-818. 10.1093/bioinformatics/14.9.817.

    CAS  PubMed  Google Scholar 

  28. 28.

    Yang Z: Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol. 1994, 39 (3): 306-314. 10.1007/BF00160154.

    CAS  PubMed  Google Scholar 

  29. 29.

    Yang Z: Estimating the pattern of nucleotide substitution. J Mol Evol. 1994, 39 (1): 105-111. 10.1007/BF00178256.

    PubMed  Google Scholar 

  30. 30.

    Rodriguez F, Oliver JL, Marin A, Medina JR: The general stochastic model of nucleotide substitution. J Theor Biol. 1990, 142 (4): 485-501.

    CAS  PubMed  Google Scholar 

  31. 31.

    Felsenstein J: Confidence-Limits on Phylogenies - an Approach Using the Bootstrap. Evolution. 1985, 39 (4): 783-791. 10.2307/2408678.

    Google Scholar 

  32. 32.

    Meyer A, Zardoya R: Recent advances in the (molecular) phylogeny of vertebrates. Annu Rev Ecol Evol S. 2003, 34: 311-338. 10.1146/annurev.ecolsys.34.011802.132351.

    Google Scholar 

  33. 33.

    Cloutier R, Arratia G: Early diversification of actinopterygians. Recent advances in the Origin and Early Radiation of Vertebrates. Edited by: Arratia G, Wilson MVH, Cloutier R. 2004, Munich, Germany , Verlag Dr. Friedrich Pfeil, 217-270.

    Google Scholar 

  34. 34.

    Phillips MJ, Penny D: The root of the mammalian tree inferred from whole mitochondrial genomes. Mol Phylogenet Evol. 2003, 28 (2): 171-185. 10.1016/S1055-7903(03)00057-5.

    CAS  PubMed  Google Scholar 

  35. 35.

    Inoue JG, Miya M, Tsukamoto K, Nishida M: Basal actinopterygian relationships: a mitogenomic perspective on the phylogeny of the "ancient fish". Mol Phylogenet Evol. 2003, 26 (1): 110-120. 10.1016/S1055-7903(02)00331-7.

    CAS  PubMed  Google Scholar 

  36. 36.

    Takezaki N, Figueroa F, Zaleska-Rutczynska Z, Takahata N, Klein J: The phylogenetic relationship of tetrapod, coelacanth, and lungfish revealed by the sequences of forty-four nuclear genes. Mol Biol Evol. 2004, 21 (8): 1512-1524. 10.1093/molbev/msh150.

    CAS  PubMed  Google Scholar 

  37. 37.

    Garcia-Moreno J, Sorenson MD, Mindell DP: Congruent Avian Phylogenies Inferred from Mitochondrial and Nuclear DNA Sequences. J Mol Evol. 2002, 57 (1): 27-37. 10.1007/s00239-002-2443-9.

    Google Scholar 

  38. 38.

    Ericson GP, Johansson US: Phylogeny of Passerida (Aves: Passeriformes) based on nuclear and mitochondrial sequence data. Mol Phylogenet Evol. 2003, 29: 126-138. 10.1016/S1055-7903(03)00067-8.

    CAS  PubMed  Google Scholar 

  39. 39.

    Müller J, Reisz RR: Four well-constrained calibration points from the vertebrate fossil record for molecular clock estimates. Bioessays. 2005, 27 (10): 1069-1075. 10.1002/bies.20286.

    PubMed  Google Scholar 

  40. 40.

    Lauder GV, Liem KF: The evolution and interrelationships of the actinopterygian fishes. Bull Mus Comp Zool Harvard. 1983, 150: 95-197.

    Google Scholar 

  41. 41.

    Zaragueta-Bagils R, Lavoue S, Tillier A, Bonillo C, Lecointre G: Assessment of otocephalan and protacanthopterygian concepts in the light of multiple molecular phylogenies. Cr Biol. 2002, 325 (12): 1191-1207.

    Google Scholar 

  42. 42.

    Johnson GD, Patterson C: Relationships of lower euteleostean fishes. Interrelationships of Fishes. Edited by: Stiassny MLJ, Parenti LR, Johnson GD. 1996, San Diego , Academic Press, 251-332.

    Google Scholar 

  43. 43.

    Begle DP: Relationships of the osmeroid fishes and the use of reductive characters in phylogenetic analysis. Syst Zool. 1991, 40: 33-53. 10.2307/2992220.

    Google Scholar 

  44. 44.

    Begle DP: Monophyly and relationships of the argentinoid fishes. Copeia. 1992, 1992: 350-366. 10.2307/1446196.

    Google Scholar 

  45. 45.

    Carroll RL: Problems on the phylogenetic analysis of paleozoic choanates. Bull Mus Natl Hist Nat Paris 4eme ser. 1995, 17: 389-445.

    Google Scholar 

  46. 46.

    Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH: Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 2005, 54 (3): 391-400. 10.1080/10635150590945278.

    PubMed  Google Scholar 

  47. 47.

    Schoch RR, Milner AM: Structure and implications of theories on the origin of lissamphibians. Recent advances in the origin and early radiation of vertebrates. Edited by: Arratia G, Wilson MVH, Cloutier R. 2004, Munich , Verlag Dr. Friedrich Pfeil, 345-377.

    Google Scholar 

  48. 48.

    Sibley CG, Ahlquist JE: Phylogeny and Classification of Birds: a study in molecular evolution. 1990, New Haven, CT , Yale University Press

    Google Scholar 

  49. 49.

    Alstrom P, Ericson PGP, Olsson U, Sundberg P: Phylogeny and classification of the avian superfamily Sylvioidea. Mol Phylogenet Evol. 2006, 38 (2): 381-397. 10.1016/j.ympev.2005.05.015.

    PubMed  Google Scholar 

  50. 50.

    Springer MS, Stanhope MJ, Madsen O, de Jong WW: Molecules consolidate the placental mammal tree. Trends Ecol Evol. 2004, 19 (8): 430-438. 10.1016/j.tree.2004.05.006.

    PubMed  Google Scholar 

  51. 51.

    Goodman M, Porter CA, Czelusniak J, Page SL, Schneider H, Shoshani J, Gunnell G, Groves CP: Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence. Mol Phylogenet Evol. 1998, 9 (3): 585-598. 10.1006/mpev.1998.0495.

    CAS  PubMed  Google Scholar 

  52. 52.

    Raaum RL, Sterner KN, Noviello CM, Stewart CB, Disotell TR: Catarrhine primate divergence dates estimated from complete mitochondrial genomes: concordance with fossil and nuclear DNA evidence. J Hum Evol. 2005, 48 (3): 237-257. 10.1016/j.jhevol.2004.11.007.

    PubMed  Google Scholar 

  53. 53.

    Benton MJ: Vertebrate palaeontology. 2005, Malden, MA , Blackwell Science, xi, 455 p-3rd

    Google Scholar 

  54. 54.

    Townsend TM, Larson A, Louis E, Macey JR: Molecular phylogenetics of Squamata: The position of snakes, Amphisbaenians, and Dibamids, and the root of the Squamate tree. Syst Biol. 2004, 53 (5): 735-757. 10.1080/10635150490522340.

    PubMed  Google Scholar 

  55. 55.

    van der Meijden A, Vences M, Hoegg S, Meyer A: A previously unrecognized radiation of ranid frogs in Southern Africa revealed by nuclear and mitochondrial DNA sequences. Mol Phylogenet Evol. 2005, 37 (3): 674-685. 10.1016/j.ympev.2005.05.001.

    CAS  PubMed  Google Scholar 

  56. 56.

    Steppan S, Adkins R, Anderson J: Phylogeny and divergence-date estimates of rapid radiations in muroid rodents based on multiple nuclear genes. Syst Biol. 2004, 53 (4): 533-553. 10.1080/10635150490468701.

    PubMed  Google Scholar 

  57. 57.

    Steel M, Penny D: Parsimony, likelihood, and the role of models in molecular phylogenetics. Mol Biol Evol. 2000, 17 (6): 839-850.

    CAS  PubMed  Google Scholar 

  58. 58.

    Sullivan J, Swofford DL, Naylor GJP: The effect of taxon sampling on estimating rate heterogeneity parameters of maximum-likelihood models. Mol Biol Evol. 1999, 16 (10): 1347-1356.

    CAS  Google Scholar 

  59. 59.

    Yang ZH: On the best evolutionary rate for phylogenetic analysis. Syst Biol. 1998, 47 (1): 125-133. 10.1080/106351598261067.

    CAS  PubMed  Google Scholar 

  60. 60.

    Sullivan J, Holsinger KE, Simon C: Among-site rate variation and phylogenetic analysis of 12S rRNA in sigmodontine rodents. Mol Biol Evol. 1995, 12 (6): 988-1001.

    CAS  PubMed  Google Scholar 

  61. 61.

    Bininda-Emonds ORP, Brady SG, Kim J, Sanderson MJ: Scaling of accuracy in extremely large phylogenetic trees. Pac Symp Biocomput. 2001, 6: 547-558.

    Google Scholar 

  62. 62.

    Page RDM, Holmes EC: Molecular evolution : a phylogenetic approach. 1998, Malden, MA , Blackwell Science, v, 346 p-

    Google Scholar 

  63. 63.

    Dettai A, Lecointre G: Further support for the clades obtained by multiple molecular phylogenies in the acanthomorph bush. Cr Biol. 2005, 328 (7): 674-689.

    CAS  Google Scholar 

  64. 64.

    Vidal N, Hedges SB: The phylogeny of squamate reptiles (lizards, snakes, and amphisbaenians) inferred from nine nuclear protein-coding genes. Cr Biol. 2005, 328 (10-11): 1000-1008.

    CAS  Google Scholar 

  65. 65.

    Lin CP, Danforth BN: How do insect nuclear and mitochondrial gene substitution patterns differ? Insights from Bayesian analyses of combined datasets. Mol Phylogenet Evol. 2004, 30 (3): 686-702. 10.1016/S1055-7903(03)00241-0.

    CAS  PubMed  Google Scholar 

  66. 66.

    Chang BS, Campbell DL: Bias in phylogenetic reconstruction of vertebrate rhodopsin sequences. Mol Biol Evol. 2000, 17 (8): 1220-1231.

    CAS  PubMed  Google Scholar 

  67. 67.

    Steel MA, Lockhart PJ, Penny D: Confidence in evolutionary trees from biological sequence data. Nature. 1993, 364: 440-442. 10.1038/364440a0.

    CAS  PubMed  Google Scholar 

  68. 68.

    Mooers AO, Holmes EC: The evolution of base composition and phylogenetic inference. Trends Ecol Evol. 2000, 15 (9): 365-369. 10.1016/S0169-5347(00)01934-0.

    PubMed  Google Scholar 

  69. 69.

    Chen WJ, Bonillo C, Lecointre G: Repeatability of clades as a criterion of reliability: a case study for molecular phylogeny of Acanthomorpha (Teleostei) with larger number of taxa. Mol Phylogenet Evol. 2003, 26 (2): 262-288. 10.1016/S1055-7903(02)00371-8.

    CAS  PubMed  Google Scholar 

  70. 70.

    Naylor GJ, Collins TM, Brown WM: Hydrophobicity and phylogeny. Nature. 1995, 373 (6515): 565-566. 10.1038/373565b0.

    CAS  PubMed  Google Scholar 

  71. 71.

    Sanderson MJ, Shaffer HB: Troubleshooting molecular phylogenetic analyses. Annu Rev Ecol Syst. 2002, 33: 49-72. 10.1146/annurev.ecolsys.33.010802.150509.

    Google Scholar 

  72. 72.

    Haring E, Kruckehauser L, Gamauf A, Reising MJ, Pinsker W: The Complete Sequence of the Mitochondrial Genome of Buteo buteo (Aves, Accipitridae) Indicates an Early Split in the Phylogeny of Raptors. Mol Biol Evol. 2001, 18 (10): 1892-1904.

    CAS  PubMed  Google Scholar 

  73. 73.

    Harlid A, Arnason U: Analyses of mitochondrial DNA nest ratite birds within the Neognathae: supporting a neotenous origin of ratite morphological characters. Proc Biol Sci. 1999, 266 (1416): 305-10.1098/rspb.1999.0638.

    Google Scholar 

  74. 74.

    Harlid A, Janke A, Arnason U: The complete mitochondrial genome of Rhea americana and early avian divergences. J Mol Evol. 1998, 46 (6): 669-679. 10.1007/PL00006347.

    CAS  PubMed  Google Scholar 

  75. 75.

    Mindell DP, Sorenson MD, Dimcheff DE, Hasegawa M, Ast JCU: Interordinal relationship of birds and other reptiles based on whole mitochondrial genomes. Syst Biol. 1999, 48: 138-152. 10.1080/106351599260490.

    CAS  PubMed  Google Scholar 

  76. 76.

    Paton T, Haddrath O, J. BA: Complete mitochondrial DNA genome sequences show that modern birds are not descended from transitional shorebirds. Proc Biol Sci. 2002, 269: 839-826. 10.1098/rspb.2002.1961.

    PubMed Central  CAS  PubMed  Google Scholar 

  77. 77.

    Janke A, Magnell O, Wieczorek G, Westerman M, Arnason U: Phylogenetic analysis of 18S rRNA and the mitochondrial genomes of the wombat, Vombatus ursinus, and the spiny anteater, Tachyglossus aculeatus: increased support for the Marsupionta hypothesis. J Mol Evol. 2002, 54 (1): 71-80. 10.1007/s00239-001-0019-8.

    CAS  PubMed  Google Scholar 

  78. 78.

    Danforth BN, Lin CP, Fang J: How do insect nuclear ribosomal genes compare to protein-coding genes in phylogenetic utility and nucleotide substitution patterns?. Systematic Entomology. 2005, 30 (4): 549-562. 10.1111/j.1365-3113.2005.00305.x.

    Google Scholar 

  79. 79.

    Brown WM, George M, Wilson AC: Rapid Evolution of Animal Mitochondrial-DNA. P Natl Acad Sci USA. 1979, 76 (4): 1967-1971. 10.1073/pnas.76.4.1967.

    CAS  Google Scholar 

  80. 80.

    Monteiro A, Pierce NE: Phylogeny of Bicyclus (Lepidoptera: Nymphalidae) inferred from COI, COII, and EF-1alpha gene sequences. Mol Phylogenet Evol. 2001, 18 (2): 264-281. 10.1006/mpev.2000.0872.

    CAS  PubMed  Google Scholar 

  81. 81.

    Vawter L, Brown WM: Nuclear and Mitochondrial-DNA Comparisons Reveal Extreme Rate Variation in the Molecular Clock. Science. 1986, 234 (4773): 194-196. 10.1126/science.3018931.

    CAS  PubMed  Google Scholar 

  82. 82.

    Martin AP, Palumbi SR: Body size, metabolic rate, generation time, and the molecular clock. Proc Natl Acad Sci U S A. 1993, 90 (9): 4087-4091. 10.1073/pnas.90.9.4087.

    PubMed Central  CAS  PubMed  Google Scholar 

  83. 83.

    Felsenstein J: Cases in which parsimony and compatibility will be positively misleading. Syst Zool. 1978, 27: 401-410. 10.2307/2412923.

    Google Scholar 

  84. 84.

    Springer MS, DeBry RW, Douady C, Amrine HM, Madsen O, de Jong WW, Stanhope MJ: Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol Biol Evol. 2001, 18 (2): 132-143.

    CAS  PubMed  Google Scholar 

  85. 85.

    Naylor GJP, Brown WM: Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst Biol. 1998, 47: 61-76. 10.1080/106351598261030.

    CAS  PubMed  Google Scholar 

  86. 86.

    Schmitz J, Ohme M, Suryobroto B, Zischler H: The colugo (Cynocephalus variegatus, Dermoptera): The primates' gliding sister?. Mol Biol Evol. 2002, 19 (12): 2308-2312.

    CAS  PubMed  Google Scholar 

  87. 87.

    Lin YH, McLenachan PA, Gore AR, Phillips MJ, Ota R, Hendy MD, Penny D: Four new mitochondrial genomes and the increased stability of evolutionary trees of mammals from improved taxon sampling. Mol Biol Evol. 2002, 19 (12): 2060-2070.

    CAS  PubMed  Google Scholar 

  88. 88.

    Zardoya R, Meyer A: Vertebrate phylogeny: limits of inference of mitochondrial genome and nuclear rDNA sequence data due to an adverse phylogenetic signal/noise ratio. Major events in early vertebrate evolution. Edited by: Ahlberg PE. 2001, London , Taylor and Francis, 135–156-

    Google Scholar 

  89. 89.

    Cotton JA, Page RDM: Going nuclear: gene family evolution and vertebrate phylogeny reconciled. Proc Biol Sci. 2002, 269: 1555–1561-10.1098/rspb.2002.2074.

    PubMed Central  PubMed  Google Scholar 

  90. 90.

    Takezaki N, Gojobori T: Correct and incorrect vertebrate phylogenies obtained by the entire mitochondrial DNA sequences. Mol Biol Evol. 1999, 16 (5): 590-601.

    CAS  PubMed  Google Scholar 

  91. 91.

    Groth JG, Barrowclough GF: Basal Divergences in Birds and the Phylogenetic Utility of the Nuclear RAG-1 Gene. Mol Phylogenet Evol. 1999, 12 (2): 115-123. 10.1006/mpev.1998.0603.

    CAS  PubMed  Google Scholar 

  92. 92.

    Lovejoy NR, Collette BB: Phylogenetic relationships of New World needlefishes (Teleostei: Belonidae) and the biogeography of transitions between marine and freshwater habitats. Copeia. 2001, 2001: 324-338. 10.1643/0045-8511(2001)001[0324:PRONWN]2.0.CO;2.

    Google Scholar 

  93. 93.

    Brinkmann H, Venkatesh B, Brenner S, Meyer A: Nuclear protein-coding genes support lungfish and not the coelacanth as the closest living relatives of land vertebrates. P Natl Acad Sci USA. 2004, 101 (14): 4900-4905. 10.1073/pnas.0400609101.

    CAS  Google Scholar 

  94. 94.

    Hedges SB, Moberg KD, Maxson LR: Tetrapod phylogeny inferred from 18S and 28S ribosomal RNA sequences and a review of the evidence for amniote relationships. Mol Biol Evol. 1990, 7 (6): 607-633.

    CAS  PubMed  Google Scholar 

  95. 95.

    Xia X, Xie Z, Kjer KM: 18S ribosomal RNA and tetrapod phylogeny. Syst Biol. 2003, 52 (3): 283-295. 10.1080/10635150390196948.

    PubMed  Google Scholar 

  96. 96.

    Hoegg S, Vences M, Brinkmann H, Meyer A: Phylogeny and comparative substitution rates of frogs inferred from sequences of three nuclear genes. Mol Biol Evol. 2004, 21 (7): 1188-1200. 10.1093/molbev/msh081.

    CAS  PubMed  Google Scholar 

  97. 97.

    Edwards SV, Jennings WB, Shedlock AM: Phylogenetics of modern birds in the era of genomics. Proc Biol Sci. 2005, 272 (1567): 979-992. 10.1098/rspb.2004.3035.

    PubMed Central  CAS  PubMed  Google Scholar 

  98. 98.

    Hickson RE, Simon C, Perrey SW: The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence. Mol Biol Evol. 2000, 17 (4): 530-539.

    CAS  PubMed  Google Scholar 

  99. 99.

    Lee MSY: Unalignable sequences and molecular evolution. Trends Ecol Evol. 2001, 16 (12): 681-685. 10.1016/S0169-5347(01)02313-8.

    Google Scholar 

  100. 100.

    Felsenstein J: Inferring phylogenies. 2004, Sunderland, Mass. , Sinauer Associates, xx, 664 p-

    Google Scholar 

  101. 101.

    Tarrio R, Rodriguez-Trelles F, Ayala FJ: Shared nucleotide composition biases among species and their impact on phylogenetic reconstructions of the Drosophilidae. Mol Biol Evol. 2001, 18 (8): 1464-1473.

    CAS  PubMed  Google Scholar 

  102. 102.

    Phillips MJ, Delsuc F, Penny D: Genome-scale phylogeny and the detection of systematic biases. Mol Biol Evol. 2004, 21 (7): 1455-1458. 10.1093/molbev/msh137.

    CAS  PubMed  Google Scholar 

  103. 103.

    Harris DJ: Codon bias variation in C-mos between squamate families might distort phylogenetic inferences. Mol Phylogenet Evol. 2003, 27 (3): 540-544. 10.1016/S1055-7903(03)00012-5.

    CAS  PubMed  Google Scholar 

  104. 104.

    Lake JA: Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci U S A. 1994, 91 (4): 1455-1459. 10.1073/pnas.91.4.1455.

    PubMed Central  CAS  PubMed  Google Scholar 

  105. 105.

    Galtier N, Gouy M: Inferring Phylogenies from DNA-Sequences of Unequal Base Compositions. Proc Natl Acad Sci U S A. 1995, 92 (24): 11317-11321. 10.1073/pnas.92.24.11317.

    PubMed Central  CAS  PubMed  Google Scholar 

  106. 106.

    Jeffroy O, Brinkmann H, Delsuc F, Philippe H: Phylogenomics: the beginning of incongruence?. Trends Genet. 2006, 22 (4): 225-231. 10.1016/j.tig.2006.02.003.

    CAS  PubMed  Google Scholar 

  107. 107.

    Delsuc F, Brinkmann H, Philippe H: Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005, 6 (5): 361-375. 10.1038/nrg1603.

    CAS  PubMed  Google Scholar 

  108. 108.

    Philippe H, Delsuc F, Brinkmann H, Lartillot N: Phylogenomics. Annu Rev Ecol Evol S. 2005, 36: 541-562. 10.1146/annurev.ecolsys.35.112202.130205.

    Google Scholar 

  109. 109.

    Lopez P, Casane D, Philippe H: Heterotachy, an important process of protein evolution. Mol Biol Evol. 2002, 19 (1): 1-7.

    CAS  PubMed  Google Scholar 

  110. 110.

    Baele G, Raes J, Van de Peer Y, Vansteelandt S: An Improved Statistical Method for Detecting Heterotachy in Nucleotide Sequences. Mol Biol Evol. 2006, 23 (7): 1397-1405. 10.1093/molbev/msl006.

    CAS  PubMed  Google Scholar 

  111. 111.

    Misof B, Anderson CL, Buckley TR, Erpenbeck D, Rickert A, Misof K: An empirical analysis of mt 16S rRNA covarion-like evolution in insects: site-specific rate variation is clustered and frequently detected. J Mol Evol. 2002, 55 (4): 460-469. 10.1007/s00239-002-2341-1.

    CAS  PubMed  Google Scholar 

  112. 112.

    Kolaczkowski B, Thornton JW: Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature. 2004, 431 (7011): 980-984. 10.1038/nature02917.

    CAS  PubMed  Google Scholar 

  113. 113.

    Steel M: Should phylogenetic models be trying to "fit an elephant"?. Trends Genet. 2005, 21 (6): 307-309. 10.1016/j.tig.2005.04.001.

    CAS  PubMed  Google Scholar 

  114. 114.

    Thornton JW, Kolaczkowski B: No magic pill for phylogenetic error. Trends Genet. 2005, 21 (6): 310-311. 10.1016/j.tig.2005.04.002.

    CAS  PubMed  Google Scholar 

  115. 115.

    Lockhart P, Novis P, Milligan BG, Riden J, Rambaut A, Larkum T: Heterotachy and tree building: a case study with plastids and eubacteria. Mol Biol Evol. 2006, 23 (1): 40-45. 10.1093/molbev/msj005.

    CAS  PubMed  Google Scholar 

  116. 116.

    Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F: Heterotachy and long-branch attraction in phylogenetics. BMC Evol Biol. 2005, 5: 50-10.1186/1471-2148-5-50.

    PubMed Central  PubMed  Google Scholar 

  117. 117.

    Galtier N, Gouy M: Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol Biol Evol. 1998, 15 (7): 871-879.

    CAS  PubMed  Google Scholar 

  118. 118.

    Herbeck JT, Degnan PH, Wernegreen JJ: Nonhomogeneous model of sequence evolution indicates independent origins of primary endosymbionts within the enterobacteriales (gamma-Proteobacteria). Mol Biol Evol. 2005, 22 (3): 520-532. 10.1093/molbev/msi036.

    CAS  PubMed  Google Scholar 

  119. 119.

    Hisatomi O, Kayada S, Taniguchi Y, Kobayashi Y, Satoh T, Tokunaga F: Primary structure and characterization of a bullfrog visual pigment contained in small single cones. Comp Biochem Physiol B Biochem Mol Biol. 1998, 119 (3): 585-591. 10.1016/S0305-0491(98)00032-7.

    CAS  PubMed  Google Scholar 

  120. 120.

    Loew ER, Lythgoe JN: The ecology of cone pigments in teleost fishes. Vision Res. 1978, 18 (6): 715-722. 10.1016/0042-6989(78)90150-5.

    CAS  PubMed  Google Scholar 

  121. 121.

    Sillman AJ, Govardovskii VI, Rohlich P, Southard JA, Loew ER: The photoreceptors and visual pigments of the garter snake (Thamnophis sirtalis): a microspectrophotometric, scanning electron microscopic and immunocytochemical study. J Comp Physiol [A]. 1997, 181 (2): 89-101. 10.1007/s003590050096.

    CAS  Google Scholar 

  122. 122.

    Rohlich P, van Veen T, Szel A: Two different visual pigments in one retinal cone cell. Neuron. 1994, 13 (5): 1159-1166. 10.1016/0896-6273(94)90053-1.

    CAS  PubMed  Google Scholar 

  123. 123.

    Makino CL, Dodd RL: Multiple visual pigments in a photoreceptor of the salamander retina. J Gen Physiol. 1996, 108 (1): 27-34. 10.1085/jgp.108.1.27.

    CAS  PubMed  Google Scholar 

  124. 124.

    Strader CD, Fong TM, Tota MR, Underwood D, Dixon RA: Structure and function of G protein-coupled receptors. Annu Rev Biochem. 1994, 63: 101-132. 10.1146/

    CAS  PubMed  Google Scholar 

  125. 125.

    Hunt DM, Wilkie SE, Bowmaker JK, Poopalasundaram S: Vision in the ultraviolet. Cell Mol Life Sci. 2001, 58: 1583-1598. 10.1007/PL00000798.

    CAS  PubMed  Google Scholar 

  126. 126.

    Onishi A, Koike S, Ida-Hosonuma M, Imai H, Shichida Y, Takenaka O, Hanazawa A, Komatsu H, Mikami A, Goto S, Suryobroto B, Farajallah A, Varavudhi P, Eakavhibata C, Kitahara K, Yamamori T: Variations in long- and middle-wavelength-sensitive opsin gene loci in crab-eating monkeys. Vision Res. 2002, 42 (3): 281-292. 10.1016/S0042-6989(01)00293-0.

    CAS  PubMed  Google Scholar 

  127. 127.

    Carleton KL, Harosi FI, Kocher TD: Visual pigments of African cichlid fishes: evidence for ultraviolet vision from microspectrophotometry and DNA sequences. Vision Res. 2000, 40 (8): 879-890. 10.1016/S0042-6989(99)00238-2.

    CAS  PubMed  Google Scholar 

  128. 128.

    Minamoto T, Shimizu I: Molecular cloning of cone opsin genes and their expression in the retina of a smelt, Ayu (Plecoglossus altivelis, Teleostei). Comp Biochem Physiol B Biochem Mol Biol. 2005, 140 (2): 197-205. 10.1016/j.cbpc.2004.09.028.

    PubMed  Google Scholar 

  129. 129.

    Swofford DL: PAUP* - Phylogenetic analyses using parsimony. 1999, Sunderland, Massachussets , Sinauer Associates

    Google Scholar 

  130. 130.

    Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17 (8): 754-755. 10.1093/bioinformatics/17.8.754.

    CAS  PubMed  Google Scholar 

  131. 131.

    Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004, 5 (2): 150-163. 10.1093/bib/5.2.150.

    CAS  PubMed  Google Scholar 

Download references


We would like to thank N. Lovejoy for many valuable comments and discussion; C. Weadick for helpful suggestions on the manuscript. This study was funded by an NSERC Grant (BC), a CIHR fellowship (IVH), and the Deutsche Forschungsgemeinschaft (JM).

Author information



Corresponding author

Correspondence to Belinda SW Chang.

Additional information

Authors' contributions

IVH, JM and BC conceived of the study, assembled and refined the data set. IVH and FS carried out phylogenetic analyses. IVH carried out statistical analyses of sequence data. BC and IVH drafted the manuscript, with help from JM and FS. BC guided the analyses, and was involved in all aspects of the work. All authors were responsible for interpretation of results in an evolutionary context as well as read and approved the final version of the manuscript.

Electronic supplementary material

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and Permissions

About this article

Cite this article

van Hazel, I., Santini, F., Müller, J. et al. Short-wavelength sensitive opsin (SWS1) as a new marker for vertebrate phylogenetics. BMC Evol Biol 6, 97 (2006).

Download citation


  • Codon Position
  • Visual Pigment
  • Opsin Gene
  • Cone Opsin
  • Phylogenetic Utility