ITS2 secondary structure improves phylogeny estimation in a radiation of blue butterflies of the subgenus Agrodiaetus (Lepidoptera: Lycaenidae: Polyommatus )

Background: Current molecular phylogenetic studies of Lepidoptera and most other arthropods are predominantly based on mitochondrial genes and a limited number of nuclear genes. The nuclear genes, however, generally do not provide sufficient information for young radiations. ITS2, which has proven to be an excellent nuclear marker for similarly aged radiations in other organisms like fungi and plants, is only rarely used for phylogeny estimation in arthropods, although universal primers exist. This is partly due to difficulties in the alignment of ITS2 sequences in more distant taxa. The present study uses ITS2 secondary structure information to elucidate the phylogeny of a species-rich young radiation of arthropods, the butterfly subgenus Agrodiaetus. One aim is to evaluate the efficiency of ITS2 to resolve the phylogeny of the subgenus in comparison with COI, the most important mitochondrial marker in arthropods. Furthermore, we assess the use of compensatory base changes in ITS2 for the delimitation of species and discuss the prospects of ITS2 as a nuclear marker for barcoding studies.
Results: In the butterfly family Lycaenidae, ITS2 secondary structure enabled us to successfully align sequences of different subtribes in Polyommatini and produce a Profile Neighbour Joining tree of this tribe, the resolution of which is comparable to phylogenetic trees obtained with COI+COII. The subgenus Agrodiaetus comprises 6 major clades which are in agreement with COI analyses. A dispersal-vicariance analysis (DIVA) traced the origin of most Agrodiaetus clades to separate biogeographical areas in the region encompassing Eastern Anatolia, Transcaucasia and Iran.
Conclusions: With the inclusion of secondary structure information, ITS2 appears to be a suitable nuclear marker to infer the phylogeny of young radiations, as well as more distantly related genera within a diverse arthropod family. Its phylogenetic signal is comparable to the mitochondrial marker COI. Compensatory base changes are very rare within Polyommatini and cannot be used for species delimitation. The implementation of secondary structure information into character-based phylogenetic methods is suggested to further improve the versatility of this marker in phylogenetic studies.


Background
Molecular phylogenetic studies aim to reconstruct species trees, e.g. to infer the evolution of morphological characters or life history traits. While in the early days of genetic analyses the data sets were often confined to single gene fragments, it is now generally acknowledged that analyses should include several genes [1][2][3]. The use of multiple genes not only provides greater resolution over different time scales but also yields a more accurate estimate of the species tree, which may not correspond to a single gene tree, especially in radiations of closely related species [4,5]. Unfortunately, the number of genes which are routinely used for phylogenetic analysis, especially in species-rich arthropod assemblages, has remained limited [6]. In the mitochondrial genome, the cytochrome c oxidase subunit I (COI) has become the most commonly used marker in molecular phylogenetic studies of arthropods, in part because it is the focal genetic marker for DNA barcoding studies [7]. This marker is now routinely supplemented by the nuclear marker elongation factor 1 alpha (ef1) and sometimes wingless (wg) [3,6]. These nuclear markers, however, continue to be of limited use in resolving the phylogeny of young radiations because of their slow evolutionary rate. Recently, novel nuclear genes have been tested in species of Lepidoptera, four of which (Tektin, CAD, DDC, IDH) appear promising for such radiations [6,8]. However, experience with these remains limited or lacking.
The internal transcribed spacer 2 (ITS2), which separates the nuclear ribosomal genes 5.8S and 28S, constitutes a rapidly evolving nuclear DNA fragment and has proved very useful when inferring phylogenetic relationships of closely related species in groups of organisms such as plants and fungi [9]. The highly conserved flanking regions can be used as an anchor for universal primers. However, ITS2 studies on the phylogeny of metazoans are relatively rare. In arthropods, only 11,927 ITS2 sequences from 2720 species had been deposited in GenBank [10] as of 02 Feb 2009, compared to 13,347 ef1 sequences from 7353 species and 375,287 COI sequences from 46,385 species in BOLD [11]. This may, in part, be explained by alignment problems which have limited the use of ITS2 in phylogenetic studies of more distantly related taxa. Advances in predicting the secondary structure of ITS2 enable the alignment of ITS2 data from more distantly related taxa and increase its utility above the genus level [12,13]. In this paper we show that the inclusion of secondary structure information improves phylogeny estimation with ITS2 in a large radiation of blue butterflies and renders ITS2 a useful nuclear marker in phylogenetic studies. Furthermore, we suggest that ITS2 is a promising nuclear candidate for barcode studies, in addition to the mitochondrial marker COI.
The Lycaenidae are the second largest family of butterflies with about 6000 species worldwide. Among them is a large radiation of ca. 130 Palaearctic species, i.e., the subgenus Agrodiaetus. It is extraordinary in Metazoa for its extreme interspecific variation of chromosome numbers, which is present even among closely related species that are often very similar or identical in phenotype [14][15][16][17]. Recently, the radiation has become the focus of several molecular phylogenetic studies in order to unravel the evolution of morphological and karyological characters [18][19][20][21] and to evaluate the barcoding approach [22]. All these studies employed COI as the main genetic marker. Wiemers [18] additionally used ITS2 as a secondary marker, but phylogenetic resolution without the inclusion of COI remained unsatisfactory, and the alignment had to be confined to the subtribe Polyommatina due to alignment problems. Kandul et al. [19] included ef1 as an additional nuclear marker in a small subset of taxa, but the marker hardly provided any phylogenetic signal and was therefore abandoned in subsequent studies [20,21]. Our aim is to compare and evaluate the phylogenetic trees based on COI with independent evidence from the nuclear ITS2, incorporating sequence as well as secondary structure information.
Without doubt, DNA sequence data are an extremely valuable source of information to infer phylogenetic relationships. Another usage of these data has recently come into the focus of both biological scientists and stakeholder groups and attracted much controversy among them: their usage to delimit and identify species [22][23][24][25][26][27][28][29][30][31][32][33]. Although COI has been the marker of choice for the barcoding campaign, ITS2 is a successful alternative. This is especially true in groups where COI fails to work well, e.g. in fungi [34], where it was used in combination with ITS1, and, most recently, in diatoms [35]. Furthermore, it has been recently claimed that structural differences in ITS2 are predictive of species limits. In this view, pairings of CBCs (= compensatory base changes) provide an indication for sexual incompatibility [36], while their absence indicates intercrossing ability [37]. As the investigated taxonomic group provides an interesting and opportune example, a further aim of this study is to test whether these claims also apply to the large and very recent radiation of the subgenus Agrodiaetus with an origin about 2.51-3.85 million years ago [19,21].
[...] sequences were obtained, probably caused by polymerase slippage at positions with highly repetitive motifs. Usually, it was still possible to obtain a complete sequence by sequencing from 5' and 3' ends such that the sequences only rarely remained incomplete after extended sequencing efforts. Incomplete sequences were excluded from the analysis as they may result from co-amplified pseudogenes or non-homogenized ITS2 copies. No obvious problems with intragenomic sequence variation were encountered in the remaining sequences: all electropherograms obtained were readable over their entire length. Thus, we assume that there are no problems associated with non-homogenized ITS2 copies, which have been reported in other ITS studies [38][39][40][41] and are discussed in several reviews [42,43].
Sequence length varied between 450 bp (in Tarucus theophrastus ) and 602 bp (in Allotinus portunus and Lysandra corydonius ). Sequence length variation in Agrodiaetus was between 530 bp (in A. kurdistanicus ) and 563 bp (in A. dama ). Nucleotide composition was typical for RNA with a slight overrepresentation of guanine (U : C : A : G = 0.234 : 0.261 : 0.203 : 0.302).
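The nucleotide composition reported above is a simple relative-frequency count. The following is a minimal sketch of such a count, not the procedure used in this study; the function name and the toy sequence are hypothetical, and gaps and ambiguity codes are assumed to be ignored.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return relative frequencies of U, C, A and G in an RNA or DNA sequence."""
    # Read DNA thymine as RNA uracil; skip gaps and ambiguity codes (assumption).
    rna = seq.upper().replace("T", "U")
    counts = Counter(base for base in rna if base in "UCAG")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "UCAG"}


# Made-up toy sequence, not taken from the data set.
toy = "GGCAUGCGAU-N"
print(base_composition(toy))
```

Applied to every sequence in an alignment and averaged, a count of this kind would give a ratio in the same U : C : A : G form as the one quoted in the text.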
Alignment was successful for all sequences of the tribe Polyommatini (including six subtribes), as well as for the outgroup (Miletini: Allotinus portunus ). Alignment difficulties were encountered with sequences of three other tribes (Theclini, Eumaeini and Lycaenini) which were therefore excluded from the analysis.
The alignment had 1024 positions of which 419 were variable and 235 were parsimony-informative (with gaps treated as missing data). Within Agrodiaetus , 131 positions were variable and 58 were parsimony informative.
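For readers unfamiliar with these site categories, the sketch below shows one common way to count them, assuming the usual definitions: a variable site shows at least two states, and a parsimony-informative site shows at least two states that each occur in at least two sequences. Gaps are treated as missing data, as in the text. The function name and the toy alignment are hypothetical and the code is not the software used in this study.

```python
from collections import Counter

# Symbols treated as missing data (assumption: gaps, '?' and 'N').
MISSING = set("-?N")


def classify_sites(alignment: list[str]) -> tuple[int, int]:
    """Return (variable, parsimony_informative) site counts for an alignment."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    variable = informative = 0
    for column in zip(*alignment):
        # Count each observed state in this column, skipping missing data.
        counts = Counter(c.upper() for c in column if c.upper() not in MISSING)
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states, each seen at least twice.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Made-up toy alignment of four sequences and six columns.
toy = ["ACGU-A", "ACGUAA", "AUGCAG", "AUAC-G"]
print(classify_sites(toy))  # (4, 3)
```

In the toy alignment the third column is variable but not informative, because the deviating state occurs in only one sequence, and the fifth column is constant once gaps are disregarded.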

Phylogeny of Polyommatus
According to the Profile Neighbour Joining (= PNJ) tree (fig. 1), the genus Polyommatus represents a monophyletic unit with the exception of its subgenus Lysandra. The subgenus Lysandra is clearly monophyletic but its placement within Plebejus s.l. is unsupported. Some systematic treatments have united Lysandra with Meleageria, but the two subgenera appear distinctly distant from each other in our analysis.
The remaining subgenera (Agrodiaetus, Meleageria, Polyommatus s.str., Neolysandra ) together form a monophyletic group with a bootstrap support of 88%. Regarding these subgenera, the monophyly of the subgenus Agrodiaetus is supported with a bootstrap value of 74%. The sister group to Agrodiaetus appears to be either the subgenus Meleageria or Polyommatus s.str. The latter subgenus includes taxa which have sometimes been placed in subgenera Sublysandra and Plebicula . While the taxa attributed to Sublysandra (P. cornelia, P. aedon and P. myrrhinus ) appear to form a monophyletic cluster at the base of the remaining species of Polyommatus , the subgenus Plebicula (in which P. dorylas, P. escheri, P. amandus and P. thersites have sometimes been included) does not appear as a monophyletic entity. The taxa of the subgenus Neolysandra appear at a basal position relative to the other Polyommatus subgenera. The relationships of the remaining Polyommatina genera with each other and with Polyommatus are not well supported, except for the monophyly of Aricia . Nonetheless, the subtribe Polyommatina received high bootstrap support (95%) and the members of all other Lycaenidae tribes are positioned outside this cluster.

Phylogeny of Agrodiaetus
Agrodiaetus damon (the two sequences from France and Turkey are identical) appears to be the sister taxon to all other Agrodiaetus. Unfortunately, the bootstrap support for this position is low. However, a single base-pair substitution at position 918 in the alignment provides further, although weak, support for the basal position of A. damon. At this position, all other Agrodiaetus sequences bear a guanine while A. damon and the remaining species of the genus Polyommatus bear an adenine base. The following major clades are supported by bootstrap values ≥ 50% among the remaining Agrodiaetus species as indicated in fig. 1 (bootstrap values in brackets): admetus clade (54%), dolus clade (81%), carmon clade (50%), actinides clade (62%), iphigenia clade (59%), glaucias clade (56%), poseidon clade (79%).
The remaining three species cluster with low bootstrap support: A. valiabadi as sister to the admetus and dolus clades (40%), A. pierceae as sister to the carmon clade (37%), and A. klausschuriani as sister to the poseidon clade (52%).
The phylogenetic relationships among the clades are usually poorly supported by bootstrap values with the exception of the admetus and dolus clades which form a clade together with A. valiabadi with a bootstrap support of 64%.
A classification based on Agrodiaetus clades with bootstrap support ≥ 50% is presented in fig. 1, together with classifications based on previous publications. A comparison of molecular based classifications reveals that 7 major clades are repeatedly found. Their support values are given in table 1.
[Fig. 1 caption] Bootstrap support values and profile identities > 95% are indicated on branches above nodes. Upperside wing colouration of males is indicated by branch colouration, using 6 different classes following Lukhtanov et al. (2005) [20]. Modal chromosome numbers are indicated in brackets after the species name (bold = gene sequence and karyotype data obtained from the same specimen; italics = sequence and karyotype data of a different individual from the same population [18][19][20][21]). Classification schemes of the present and other studies are coded by coloured rings around the tree. References to the corresponding studies are given in square brackets.

Biogeographical patterns in Agrodiaetus
According to the dispersal-vicariance model implemented in DIVA, the origin of Agrodiaetus remains uncertain, but the ancestral biogeographical areas of most major clades are quite precisely inferred (fig. 2, tables 2 & 3). An exception is the admetus clade whose ancestral area appears to encompass almost the entire range of the subgenus, with the exception of the Central Eurosiberian and Lebanese regions. This result, however, might be due to the poor taxonomy of this clade. It consists only of monomorphic species which hardly differ in phenotype and possess high chromosome numbers. The precise count of such high chromosome numbers is very difficult with standard karyological techniques [18]. Molecular results (of ITS2 as well as COI [18]) indicate that A. ripartii, the most widespread member of this clade, is not monophyletic and consists of several distinct species. The ancestral area of the closely related dolus clade also remains ambiguous but is confined to either the Mediterranean, the Central Anatolian, the Armenian, or the Kurdistanian region. Most members of the dolus clade are also monomorphic or have high chromosome numbers. Therefore its taxonomy is contentious as well, and this might have influenced the results. An illustrative example is given in the following section. The ancestral areas of the remaining clades appear to be restricted to four biogeographical regions. The Kurdistanian region is home to the carmon clade (as well as to the small Iranian shahrami clade) while the iphigenia and poseidon clades seem to have originated in the neighbouring Armenian region. (The latter clade might also have originated in both regions.) With the exception of the Turkestanian actinides clade, the remaining smaller clades (erschoffii, posthumus, glaucias) appear to have originated in the Central Iranian region.

Compensatory base changes (CBCs) in Agrodiaetus
A maximum of only 3 CBCs are found among the 140 investigated species-level taxa of Lycaenidae. One of them occurs between members of the Agrodiaetus +Polyommatus +Meleageria clade and the remaining Lycaenidae species (with the exception of Neolysandra fatima ). In 64% of pairwise species comparisons (and even 99.8% of congeneric comparisons) no CBCs are found. Within Agrodiaetus hardly any species is distinguished by a CBC, but some major clades can be delimited by hemi-CBCs such as the iphigenia and dolus clade. Due to the low number of CBCs and hemi-CBCs, the NJ trees created from CBC or hemi-CBC distance matrices provide little resolution (data not shown).
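The distinction between CBCs and hemi-CBCs can be made concrete with a short sketch. It assumes a simplified setting that is not necessarily the one used in this study: two aligned sequences share a single consensus structure in dot-bracket notation, and a position pair is only compared when both sequences form a canonical or G-U wobble pair there. A CBC is counted when both paired nucleotides differ, a hemi-CBC when only one differs. All names and the toy data are hypothetical.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Return index pairs (i, j) of matching brackets in a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def count_cbcs(seq_a: str, seq_b: str, structure: str) -> tuple[int, int]:
    """Return (CBCs, hemi-CBCs) between two aligned sequences."""
    if not (len(seq_a) == len(seq_b) == len(structure)):
        raise ValueError("sequences and structure must have equal length")
    a = seq_a.upper().replace("T", "U")
    b = seq_b.upper().replace("T", "U")
    cbc = hemi = 0
    for i, j in parse_pairs(structure):
        pair_a, pair_b = (a[i], a[j]), (b[i], b[j])
        # Skip positions where either sequence has a gap or no valid pairing.
        if pair_a not in VALID_PAIRS or pair_b not in VALID_PAIRS:
            continue
        changed = (pair_a[0] != pair_b[0]) + (pair_a[1] != pair_b[1])
        if changed == 2:
            cbc += 1
        elif changed == 1:
            hemi += 1
    return cbc, hemi


# Made-up toy helix: G-C becomes A-U (one CBC), G-C becomes G-U (one hemi-CBC).
print(count_cbcs("GCGAAAUCGC", "ACGAAAUUGU", "((((..))))"))  # (1, 1)
```

Counting over all pairwise comparisons in this way yields the kind of CBC or hemi-CBC distance matrix referred to in the text.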
Although CBCs are uncommon within Polyommatini, most species differ in their ITS2 sequence. Identical haplotypes were only found in very few sets of taxa (table 4). Most of them concern taxa with questionable species status [18,44]. For example, A. karacetinae differs only in karyotype and COI sequence from A. alcestis, but not in any morphological characters ("karyospecies"). Its position in fig. 1 (as sister to A. ainsae) is an artefact caused by a single missing nucleotide at position 628 in the alignment, which causes a change in secondary structure making it similar to A. ainsae. The sequence of the latter taxon is most similar to that of A. fulgens, and its distant position to this species in fig. 1 can also be explained by several missing nucleotides. According to recent karyological data, A. ainsae appears to be conspecific with A. fulgens and the name A. ainsae was therefore synonymised with A. fulgens [45].
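A check for identical haplotypes of the kind summarised above can be sketched as a grouping of taxa by their ungapped sequence. This is an illustrative assumption about how such sets could be found, not the procedure of this study; the function name and taxon labels are hypothetical, and sequences with missing nucleotides (such as those discussed above) would need separate handling.

```python
from collections import defaultdict


def shared_haplotypes(sequences: dict[str, str]) -> list[list[str]]:
    """Return groups of taxon names whose ungapped sequences are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, seq in sequences.items():
        # Alignment gaps are removed so only the underlying sequence is compared.
        groups[seq.upper().replace("-", "")].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up toy input with hypothetical taxon labels.
toy = {"taxon_a": "ACGU", "taxon_b": "AC-GU", "taxon_c": "ACGA"}
print(shared_haplotypes(toy))  # [['taxon_a', 'taxon_b']]
```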

Secondary structure information improves phylogenetic signal in ITS2
Wiemers [18] used a mostly comparable set of taxa for phylogenetic inference from ITS2 but did not include secondary structure information. Although most major clades recovered in our analysis were also found in the Bayesian analysis by Wiemers [18], none of our major clades were recovered with bootstrap support values ≥ 50% in the Maximum Parsimony (MP) analysis of Wiemers [18]. The poseidon clade was also not recovered in the Bayesian 80% consensus tree presented. (This clade, with the exclusion of A. putnami, only received a Bayesian support of 0.65, Wiemers unpubl., table 1). In a Neighbour Joining (NJ) analysis calculated without secondary structure information only two of the major clades recovered in the PNJ analysis received bootstrap values ≥ 50% while two clades received lower bootstrap values and the remaining two were not recovered at all (table 1). Thus, in a direct comparison of two NJ algorithms (with vs. without secondary structure, table 1), secondary structure information apparently amplifies the phylogenetic information in the data set. Further improvement in phylogeny estimation is to be expected if secondary structure information can be incorporated in Maximum Likelihood (ML) or Bayesian inference (BI) methods, because these character-based methods can be superior compared to distance based methods which discard character-state information.
One disadvantage of using secondary structure information appears to be its sensitivity to missing data in stem regions. Even small amounts of missing data can cause artefacts in phylogeny estimation of closely related taxa with very similar sequences (viz. A. alcestis and A. karacetinae).
[Figure: PNJ tree of ITS2 and biogeographical regions]

Phylogenetic signal of ITS2 is comparable to COI in Agrodiaetus
In agreement with COI analyses [18], ITS2 data support the monophyly of Polyommatina which includes the genera Chilades, Plebejus and Polyommatus . The monophyly of the genera Plebejus and Polyommatus , however, is not fully supported. This is due to the placement of the subgenus Lysandra within Plebejus , which however has no bootstrap support and is probably caused by long-branch attraction. Such a placement is also in conflict with the Bayesian analysis of COI which places Lysandra within the genus Polyommatus [18]. The ITS2 sequences of subgenus Lysandra are peculiar in having several longer inserts with repetitive motifs, e.g. in position 70-133 in the alignment. It is noteworthy, on the one hand, that none of the analyses supports a sister-relationship between Lysandra and Meleageria , even though members of these genera can hybridize with each other [46-48] and therefore were considered to be very closely related [15]. On the other hand, Cyaniris is found within Plebejus in the COI tree but basal within Polyommatus in the ITS2 tree, both times with low support values. Here, the COI analysis appears to be more affected by long-branch attraction.
Within Agrodiaetus, the phylogenetic analysis of ITS2 recovers clades which are mostly congruent to those obtained from an analysis of COI + COII (= cytochrome c oxidase II). Of particular interest is the confirmation of the sister relationship between A. damon and the remaining Agrodiaetus species that was not or only very weakly supported in the COI analyses. ITS2 and COI also agree in the monophyly and sister relationship of the admetus and dolus clades; only the position of A. valiabadi differs (within the dolus clade in COI, but sister to admetus + dolus in ITS2). The carmon clade is also recovered in the COI + COII analyses but includes the iphidamon clade in the analyses by Lukhtanov et al. [20] and Kandul et al. (2007) [21]. Kandul et al. (2004) [19] split this group into three clades although one of them (clade VII) only appears in the MP analysis and has no bootstrap support. In the COI analyses by Wiemers [18] and Wiemers & Fiedler [22], which are based on shorter sequences, the carmon group receives no bootstrap support. Similarly, the iphigenia clade is only recovered in the mtDNA analyses based on the long 1969 bp section of COI + COII. The poseidon clade is recovered in the COI analyses, as well. Kandul et al. [19] split this clade into three subclades but the addition of further taxa revealed that they are not monophyletic and thus should be combined [20,21]. Most interesting is the actinides clade in the ITS2 tree which suggests a close relationship between A. actinides, A. poseidonides and A. iphigenides. Although previous analyses have also suggested a close relationship among these taxa, it was never well supported. The relationships of the remaining clades (glaucias, erschoffii, posthumus, shahrami, phyllis) are not well supported in the ITS2 tree.
Previous analyses using COI [18][19][20] have suggested a close relationship of these clades, but their combination into an inclusive erschoffii clade was only very weakly supported by the latest COI analysis [21], probably due to the inclusion of additional taxa (such as A. eckweileri ). The only major discrepancy is the placement of A. klausschuriani in the ITS2 analyses (sister to the poseidon clade) compared to the COI analyses (within the erschoffii clade), but both placements are only very weakly supported. The missing support for the relationships between the major clades also applies to the COI analyses. Most analyses, however, agree in the basal position of the admetus +dolus clade and all of them recover the poseidon clade at the tip of the tree.
We conclude that the phylogenetic signal of ITS2 is comparable to the signal of a much longer fragment of COI/COII. This is surprising since the rate of parsimony-informative characters is lower in ITS2 than in COI [18]. Apparently these characters are, however, less "noisy" than those of COI, which are almost completely confined to 3rd codon positions.
Fig. 1 reveals little congruence between previous classifications based on morphological characters [14,15,49] and those based on molecular data (COI or ITS2). The main reason for this is the small number of available morphological characters (mostly slight differences in wing colouration). Discolouration of males is coupled with an expansion of the androconial patches, apparently due to a switch from a visual to a scent-based mate recognition system [18]. Although the molecular analyses also recover a clade containing exclusively discoloured males (the clade formed by the admetus and dolus sister-clades), the molecular data reveal that single discoloured species or small groups of them are also found in most other clades. Discoloured species also appear in many other subgenera of Polyommatus and related genera which usually have bluish males. In the sister species pair M. daphnis/M. marcida, the discolouration of the latter taxon (which possibly represents only a conspecific population of the former) is probably an adaptation to the specific climatic conditions (low solar radiation) on the north side of the Elburs mountains [50]. Such sister species pairs with differing male upperside colouration are also found in Agrodiaetus, e.g.
In some butterfly groups with similar wing patterns, genitalia provide important features for identification and classification. Unfortunately, they are very similar in all Agrodiaetus species, possess only few usable characters and therefore have only rarely been evaluated. The little available evidence, however, appears to be more congruent with molecular data than with wing pattern characters. Coutsis [51] analyzed the genitalia of several Agrodiaetus taxa which had previously been regarded as subspecies of Agrodiaetus iphigenia due to their similar wing colouration, among them A. iphidamon and A. iphigenides. He concluded that genitalia differences rule out conspecificity. According to the molecular results these taxa belong to different clades. A. iphidamon and A. dizinensis have been placed in different groups according to wing pattern characters [49], but they share a synapomorphic character in their genitalia: the shape of the labides is short, pointed and "dagger-like" (Coutsis, pers. comm.). Molecular results also clearly show that they are closely related. The monomorphic Agrodiaetus species of the admetus and dolus clades differ in karyotype but are difficult or impossible to identify based on wing pattern characters. Members of these two clades, however, differ in the length of their valves relative to their body size, those in the admetus clade (with the possible exception of A. admetus) being shorter than those in the dolus clade [52][53][54]. A comprehensive treatment of the genitalia of Polyommatina is currently in preparation (Coutsis, pers. comm.).

Historical biogeography
The results of our DIVA analysis confirm earlier assumptions (e.g. [18]) that Eastern Anatolia, Transcaucasia and Iran are the main centres of Agrodiaetus radiation. Although the origin of the subgenus could not be inferred with this method, the ancestral biogeographical areas of most major clades are placed in this region. Most interestingly, the origin of each of these clades seems to be confined to a single region (or possibly two neighbouring regions in one case). These results support the evolutionary significance of the clades obtained from the molecular analyses (ITS2 as well as COI/COII ).

CBCs as predictors of sexual incompatibility and the utility of ITS2 to delimit species
Due to the low number of CBCs (and hemi-CBCs) in Lycaenidae, these structural markers cannot be used to predict species limits in the family. Although this does not preclude the possibility that a CBC is a sufficient condition to distinguish species [36], an absence of CBCs cannot be used to predict intercrossing ability as suggested by Coleman [37].
This deficiency does not mean that ITS2 sequences cannot be used to delimit species. Even in the young radiation of Agrodiaetus, scarcely any two species have identical ITS2 haplotypes, while the same haplotype may be found in distant populations of the same species, e.g. Agrodiaetus damon from France and Turkey. On the other hand, sequence differences among populations and among individuals in a single population do exist [18], and we currently lack sufficient intraspecific ITS2 sequence data to check for the existence of a barcode gap or diagnostic DNA characters [22,25]. Available intraspecific ITS2 sequences usually cluster together in the PNJ tree. Exceptions occur in species complexes with disputable species borders (A. ripartii and A. altivagans) and in Polyommatus icarus: the Iranian P. icarus sequence does not cluster with conspecific sequences but with the almost identical sequence of P. forsteri, and is even identical with that of an Iranian specimen (voucher code ILL071) of Polyommatus icadius [44]. The latter is a Central Asian species, whose phenotype is very similar to P. icarus, but which is well differentiated in ITS2 and was only recently discovered in Iran [44]. The phenotype of the Iranian P. icarus specimen, however, is typical for P. icarus and its COI sequence is almost identical to those of P. icarus from Greece and Anatolia, where P. icadius does not occur [22]. Therefore it is possible that the specimen (MW00412) actually represents a hybrid between P. icarus and P. icadius. Some evidence for introgressive hybridization between these two taxa comes from the Altai where P. icarus and P. icadius share identical COI haplotypes [55]. Although this complex needs further research, it is an example of the importance of analysing a fast nuclear locus in addition to the mitochondrial COI.
[Figure: Male wing vouchers of sister species pairs with different upperside colouration]

Conclusions
Our analyses show that ITS2 can be a suitable phylogenetic marker not only for closely related groups of species, but also for higher taxa. In the family Lycaenidae, secondary structure information enabled the alignment of sequences from different subtribes of the tribe Polyommatini.
In Agrodiaetus, six major clades were obtained which are corroborated by independent evidence from mitochondrial DNA, genitalia structure, as well as our biogeographical analysis. These clades, however, do not correspond with traditional classifications, which were mainly based on the very limited set of wing pattern characters.
The use of secondary structure information with Profile Neighbour Joining also increased resolution and bootstrap support in the subgenus Agrodiaetus to the extent that ITS2 phylogenetic trees provide a resolution comparable to COI .
In insects, ITS2 currently appears to be the only available and well tested nuclear DNA marker which is informative enough to resolve the phylogeny of young radiations such as Agrodiaetus . Therefore we recommend the use of this marker as an addition to mitochondrial markers (like COI ) in order to prevent erroneous estimation of species trees caused by introgressive hybridization, incomplete lineage sorting or horizontal gene transfer. Although introgression of mitochondrial DNA (mtDNA) appears to be less common in Lepidoptera than in most other Metazoa due to their female-heterogametic sex chromosome system [56] and Haldane's rule [57], recent work shows that such cases exist (Wiemers unpublished; [58]) and therefore should not be ignored.
We cannot, however, corroborate the use of CBCs to delimit species, because CBCs are very rare even among distantly related species in Lycaenidae and, at least for this group, their absence is not a useful predictor of sexual compatibility as claimed by Coleman et al. [37].

Material
A total of 156 Lycaenidae ITS2 sequences were included in our analysis. Of these, 17 were determined exclusively for this study. The remainder were selected from the phylogenetic analysis in the PhD thesis of the first author [18]. Five of these sequences were improved in quality by repeating the sequencing procedure.
Generally, only one sequence per species was retained, except for taxa with a large range or with considerable geographic variation. In the latter case, two sequences representing this variation were retained. Sequences were selected by quality in order to minimize ambiguities. For three species, the only available sequence was of insufficient quality and therefore these taxa were excluded from the analysis (Agrodiaetus surakovi, Aricia eumedon, Plebejides pylaon).
Most sequences belong to Agrodiaetus (97), the others to closely related genera of the same subtribe Polyommatina (54) or other subtribes within the tribe Polyommatini (5 sequences). Allotinus portunus (Miletinae) was chosen as outgroup because it was the only non-Polyommatini sequence available within Lycaenidae which could successfully be aligned. Alignment of sequences from the tribes Lycaenini, Theclini and Eumaeini failed, despite the fact that they are held to be more closely related to Polyommatini according to the morphology-based classification by Eliot [59].
All sequences have been deposited in GenBank [10] with LinkOuts provided to images of the voucher specimens deposited with MorphBank [60] (table 5). Annotation changes of existing entries after HMM-Annotation were also submitted to this database. No further complete ITS2 sequences of Lycaenidae are currently available from GenBank. The voucher specimens and DNA extractions are currently stored by the first author at the Department of Animal Biodiversity, Vienna University, but will eventually be deposited at the Alexander Koenig Research Institute and Museum of Zoology in Bonn (Germany).
In many Agrodiaetus species groups, especially among the monomorphic, i.e., "brown" species, karyotypes are important for species identification. Therefore in most of the specimens included in molecular analysis, the karyotypes were studied [18] using squash techniques [61,62].
Upperside wing colouration of males was classified according to the method of Lukhtanov et al. (2005) [20]. One additional colour class ("golden" for golden brown) was added for Agrodiaetus peilei, a species which was not assessed in their study.
The status of many taxa in the genus Polyommatus is questionable, especially in the subgenus Agrodiaetus which includes many recently described species, some based on disputable evidence. Taxonomic revisions and further research are needed to clarify the status of these taxa. At present, we have retained most species in order to facilitate comparisons with published studies, although some have been synonymised recently. For example, Agrodiaetus ainsae has been synonymised with A. fulgens [45] and Vodolazsky et al. [44] treat several Polyommatus taxa as subspecies or synonyms of P. eros (P. kamtshadalis, P. eroides and P. menelaos) and P. icarus (P. andronicus and P. juno).
PCR was conducted on thermal cyclers from BIOMETRA ® (models UNO II or T-GRADIENT) or ABI BIOSYSTEMS ® (model GENEAMP ® PCR-System 2700) using the following profiles: initial 4 minutes denaturation at 94°C and 35 cycles of 30 seconds denaturation at 94°C, 30 seconds annealing at 55°C and 1 minute extension at 72°C. PCR products were purified using purification kits from PROMEGA ® or SIGMA ® and checked with agarose gel electrophoresis before and after purification.
Cycle sequencing was carried out on BIOMETRA ® T-GRADIENT or ABI BIOSYSTEMS ® GENEAMP ® PCR-System 2700 thermal cyclers using sequencing kits of MWG BIOTECH ® (for LI-COR ® automated sequencer) or ABI BIOSYSTEMS ® (for ABI ® 377 automated sequencer) according to the manufacturers' protocols and with the following cycling times: initial 2 minutes denaturation at 95°C and 35 cycles of 15 seconds denaturation at 95°C, 15 seconds annealing at 49°C and 15 seconds extension at 70°C. Primers used were the same as for the PCR reactions for the ABI (primer 1 for forward and primer 2 for independent reverse sequencing). Electrophoresis of sequencing reaction products was carried out on LI-COR ® or ABI ® 377 automated sequencers using the manufacturer's protocols. Electropherograms were edited and aligned using the LaserGene ® Software SeqMan Pro Version 7.1.0 by DNASTAR ® .

Data analysis

Secondary Structure Prediction
Data analysis followed the method described in Schultz & Wolf [64] for secondary structure phylogenetics. All retained ITS2 sequences were delimited and cropped with the HMM-based annotation tool present at the ITS2 database ([65]; E-value < 0.001, metazoan HMMs). This tool furthermore integrates a visual check for the 5.8S/28S hybridization as the ITS2 proximal stem. Incorrect folding of this region is a good indication of pseudogenes [66]. All sequences of this study passed this test with a correct folding, so we are confident that pseudogenes can be excluded from this study. Furthermore, according to Álvarez & Wendel [42], ITS pseudogenes have lowered secondary structure stability and an increase in AT content via deaminations. This was not the case for our complete ITS2 sequences, since their secondary structures were stable and the GC content of each sequence was clearly above 50%. The proximal stem (25 nucleotides of 5.8S as well as 28S rDNA) was included to preserve a conserved margin of the alignment. For several sequences, nucleotides near the 3' end of the proximal stem were ambiguous. For these, nucleotides with more than 95% consensus within the remaining aligned sequences were adopted by the majority rule to preserve the marginal secondary structure of the RNA. The secondary structure of the ITS2 of Neolysandra coelestina (MW99013) was predicted with RNAstructure 4.6 [67] and ported to Vienna format with CBCanalyzer 1.0.3 [68] (fig. 4). The structures of the remaining sequences were predicted by custom homology modelling at the ITS2 database [69][70][71][72] with the aforementioned structure as a template and at least 70% helix transfer (identity matrix, gap costs: gap open 15, gap extension 2). We further applied a Nussinov algorithm (Perl script) to each sequence to close additional base pairs within helices, which were left open by homology modelling.
For this procedure, no existing base pairs were removed, no pseudo-knots were allowed and exclusively Watson-Crick pairs were added (see fig. 5 for examples).
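The pair-closing step can be illustrated with a minimal Python sketch. It is not the Perl script used in the study; it only assumes the three constraints stated above (existing pairs are kept, no pseudoknots, only Watson-Crick pairs are added) and applies a Nussinov-style maximisation of added pairs separately within each loop of a dot-bracket structure. The function names, the minimum hairpin size of three nucleotides and the choice to treat every loop (not only bulges within helices) are assumptions made for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def unpaired_by_loop(struct):
    """Group unpaired positions by the base pair that directly encloses them."""
    groups, stack = {}, [-1]              # -1 marks the exterior loop
    for i, ch in enumerate(struct):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            stack.pop()
        else:
            groups.setdefault(stack[-1], []).append(i)
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    return list(groups.values())

def max_pairs(seq, pos, min_loop=3):
    """Nussinov-style DP over the unpaired positions of a single loop."""
    n = len(pos)
    if n < 2:
        return []

    def ok(a, b):                         # Watson-Crick and far enough apart
        return ((seq[pos[a]], seq[pos[b]]) in WATSON_CRICK
                and pos[b] - pos[a] > min_loop)

    best = [[0] * n for _ in range(n)]

    def cell(a, b):                       # empty or single-base intervals score 0
        return best[a][b] if a < b else 0

    for span in range(1, n):
        for a in range(n - span):
            b = a + span
            score = cell(a, b - 1)        # case: position b stays unpaired
            for k in range(a, b):         # case: position k pairs with b
                if ok(k, b):
                    score = max(score, cell(a, k - 1) + 1 + cell(k + 1, b - 1))
            best[a][b] = score

    pairs, todo = [], [(0, n - 1)]        # traceback without recursion
    while todo:
        a, b = todo.pop()
        if a >= b:
            continue
        if best[a][b] == cell(a, b - 1):
            todo.append((a, b - 1))
            continue
        for k in range(a, b):
            if ok(k, b) and cell(a, k - 1) + 1 + cell(k + 1, b - 1) == best[a][b]:
                pairs.append((pos[k], pos[b]))
                todo.extend([(a, k - 1), (k + 1, b - 1)])
                break
    return pairs

def close_pairs(seq, struct):
    """Add Watson-Crick pairs to a dot-bracket structure; never remove any."""
    seq = seq.upper().replace("T", "U")
    out = list(struct)
    for pos in unpaired_by_loop(struct):
        for i, j in max_pairs(seq, pos):
            out[i], out[j] = "(", ")"
    return "".join(out)
```

Because new pairs are only formed between unpaired positions that share the same enclosing pair, they cannot cross an existing helix, which is how the sketch avoids pseudoknots. For a toy input, `close_pairs("GGACGAAAGUCC", "((........))")` extends the two-pair stem to `"((((....))))"`.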

Alignment and Phylogenetic Analyses
Sequences and secondary structures were automatically and synchronously aligned with 4SALE 1.5 [73,74]. 4SALE translates sequence-structure tuple information prior to alignment into pseudo-proteins. Pseudo-proteins were coded such that each of the four nucleotides may be present in three different states: unpaired, opening base pair and closing base pair. Thus, an ITS2-specific 12 × 12 scoring matrix was used for calculation of the alignment [73,74]. Sequence-structure alignment is available at the ITS2 database supplements page [75].
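The pseudo-protein coding can be sketched as follows. This is an illustrative encoding only: the 12 one-letter symbols below are arbitrary and hypothetical, and the alphabet actually used internally by 4SALE may differ. The sketch assumes a sequence and a dot-bracket structure of equal length without ambiguity codes.

```python
NUCLEOTIDES = "ACGU"
STATES = {".": 0, "(": 1, ")": 2}     # unpaired, opening pair, closing pair
ALPHABET = "ABCDEFGHIJKL"             # 12 arbitrary symbols (assumption)

def encode(seq, struct):
    """Translate a sequence-structure tuple into a 12-letter pseudo-protein."""
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    # str.index raises ValueError for anything other than A, C, G, U
    return "".join(ALPHABET[NUCLEOTIDES.index(n) * 3 + STATES[s]]
                   for n, s in zip(seq, struct))

def decode(pseudo):
    """Recover the sequence and dot-bracket structure from a pseudo-protein."""
    state_chars = ".()"
    codes = [ALPHABET.index(ch) for ch in pseudo]
    seq = "".join(NUCLEOTIDES[c // 3] for c in codes)
    struct = "".join(state_chars[c % 3] for c in codes)
    return seq, struct
```

With this coding, `encode("GCAUGC", "((..))")` gives `"HEAJIF"`, and `decode` restores the original tuple. Each symbol carries both the nucleotide and its pairing state, so a single 12 × 12 matrix can score sequence and structure at once.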
To determine evolutionary distances between organisms simultaneously on sequences and secondary structures we used Profile Neighbour Joining (PNJ) [76] as implemented in ProfDistS 0.98 [77,78]. The tree reconstruction algorithm works similarly to the alignment method, on a 12-letter alphabet comprised of the 4 nucleotides in three structural states (unpaired, paired left, paired right). We applied an ITS2-specific general time reversible substitution model [73]. Profiles were automatically built for nodes with bootstrap support values (1000 replicates) above 70% or with at least 95% nucleotide identities. A profile is regarded as a sequence; however, it is composed of probability distribution vectors instead of characters. PNJ is iterated until no more profiles can be defined according to our settings. The resulting tree was displayed with iTOL v1.3.1 [79] and further refined with CorelDRAW X3 (Corel Corporation, Ottawa, Canada). We utilized CBCanalyzer 1.1 [73,74] to detect CBCs and hemi-CBCs between sequence-structure pairs and to calculate a CBC tree. We used MEGA 4.0.1 [80] to calculate a matrix of p-distances and TCS 1.21 [81] to detect identical haplotypes. MEGA was also used to calculate the bootstrap support values (1000 replicates) of the NJ tree without secondary structure information using the Tamura-Nei model of nucleotide substitution with heterogeneous pattern among lineages and gamma distributed rates among sites. The appropriate model and the gamma parameter (0.8365) were calculated with MODELTEST 3.7 [82].
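To make the notion of CBCs and hemi-CBCs concrete, the following Python sketch counts them between two aligned sequence-structure pairs. It is not the CBCanalyzer implementation. It assumes that both entries come from the same alignment (equal length, gaps as "-"), that only column pairs paired in both structures are compared, and that G-U wobble pairs count as valid pairings; the function names are hypothetical.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}   # includes G-U wobble pairs

def pair_columns(struct):
    """Return the set of (opening, closing) alignment columns that are paired."""
    stack, pairs = [], set()
    for i, ch in enumerate(struct):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs

def count_cbcs(seq1, struct1, seq2, struct2):
    """Count (CBCs, hemi-CBCs) between two aligned sequence-structure pairs."""
    seq1 = seq1.upper().replace("T", "U")
    seq2 = seq2.upper().replace("T", "U")
    cbc = hemi = 0
    # Only pairs that occupy the same two columns in both structures are compared.
    for i, j in pair_columns(struct1) & pair_columns(struct2):
        p1, p2 = seq1[i] + seq1[j], seq2[i] + seq2[j]
        if p1 not in CANONICAL or p2 not in CANONICAL:
            continue                                  # pairing not preserved
        changed = (p1[0] != p2[0]) + (p1[1] != p2[1])
        if changed == 2:
            cbc += 1                                  # both sides substituted
        elif changed == 1:
            hemi += 1                                 # one side substituted
    return cbc, hemi
```

For example, `count_cbcs("GCGAAACGC", "(((...)))", "ACGAAAUGU", "(((...)))")` returns `(1, 1)`: the outer pair changes from G-C to A-U (a CBC), the inner pair from G-C to G-U (a hemi-CBC), and the middle pair is unchanged.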

Classification procedures
To evaluate the results of our approach we constructed a classification of Agrodiaetus based on major clusters with bootstrap values ≥ 50% and compared this classification with those constructed in similar ways from published studies which either used the same marker but without secondary structure information or the mitochondrial marker COI or both. The clusters were named after the taxonomically most senior taxon. Classifications from published studies were constructed in the following way:
• A classification for ITS2 without secondary structure information was constructed using major clusters from the Bayesian analysis conducted by Wiemers [18]   from 105 Agrodiaetus taxa but did not provide a classification. We inferred one using major clades with support values MP ≥ 50%, ML ≥ 50% or BI ≥ 0.80.
• Wiemers & Fiedler [22] carried out a NJ analysis using a combination of COI sequences taken from Wiemers [18] and Lukhtanov et al. [20] which included a total of 116 Agrodiaetus species. Major clusters with bootstrap values ≥ 50% were used for the classification.
• A combined analysis of ITS2 and COI sequences of similar length (690 bp) from 88 Agrodiaetus species was carried out by Wiemers [18]. He proposed a classification based on clusters obtained with Bayesian inference using a support threshold for posterior probabilities of 0.95.

Biogeographical analysis
A dispersal-vicariance analysis was conducted with the programme DIVA 1.2 [83] to infer the ancestral distributions in the phylogeny of Agrodiaetus. Since outgroup relationships of Agrodiaetus were not well resolved in previous studies, A. damon was used as the outgroup to the remaining Agrodiaetus species according to our complete PNJ analysis (Fig. 1). The distribution area of Agrodiaetus was divided into 11 biogeographical regions which are based on floral biogeographical regions [84]:
• C Eurosiberian: the Central European region (incl. the Central Siberian subregion) and the Pontic-South Siberian region
• Mediterranean: the Submediterranean and Mediterranean regions excl. the South Anatolian and Palestinian-Lebanese provinces

Figure 4. Conserved ITS2 secondary structure of the Polyommatina. The proximal stem of hybridized 5.8S (blue) and 28S (red) rDNA is included. Helices are numbered in Roman numerals. Two small helices are found near the beginning, which are referred to as helices I.a and I.b. The first (basal) internal bulge of helix II with two nucleotides mismatching one nucleotide is the typical U-U mismatch found in the second helix of ITS2 structures throughout the Eukaryota. Degree of conservation is displayed in colour grades from green (conserved) to red (unconserved). The complete structure represents the 51% consensus of aligned structures without gaps.
Figure 5. ITS2 secondary structure of Lysandra syriaca. In the distal loop of helix I.b an insertion of nucleotides is present in the genus Lysandra. Based on homology modelling with a template in which these nucleotides are absent (Neolysandra), the nucleotide insertions remain unpaired. This is a distinctive feature for the genus.

Information on the occurrence of Agrodiaetus species in these regions was gathered from published distribution maps and regional faunistic monographs [15,85-95].