Phylogenetic and Spatial Distribution of Evolutionary Isolation and Threat in Turtles and Crocodilians (Non-Avian Archosauromorphs)

The origin of turtles and crocodiles and their easily recognized body forms dates to the Triassic. Despite their long-term success, extant species diversity is low, and endangerment is extremely high compared to other terrestrial vertebrate groups, with ~ 65% of ~25 crocodilian and ~360 turtle species now threatened by exploitation and habitat loss. Here, we combine available molecular and morphological evidence with machine learning algorithms to present a phylogenetically-informed, comprehensive assessment of diversification, threat status, and evolutionary distinctiveness of all extant species. In contrast to other terrestrial vertebrates and their own diversity in the fossil record, extant turtles and crocodilians have not experienced any mass extinctions or shifts in diversification rate, or any significant jumps in rates of body-size evolution over time. We predict threat for 114 as-yet unassessed or data-deficient species and identify a concentration of threatened crocodile and turtle species in South and Southeast Asia, western Africa, and the eastern Amazon. We find that unlike other terrestrial vertebrate groups, extinction risk increases with evolutionary distinctiveness: a disproportionate amount of phylogenetic diversity is concentrated in evolutionarily isolated, at-risk taxa, particularly those with small geographic ranges. Our findings highlight the important role of geographic determinants of extinction risk, particularly those resulting from anthropogenic habitat-disturbance, which affect species across body sizes and ecologies.


Introduction 31
Turtles and crocodilians (non-avian Archosauromorpha) are among the most charismatic 32 and widely recognized of all living things (1). Both groups date to the Triassic, and exhibit highly 33 derived, yet highly conserved body forms. Crocodilians are famous for their extraordinary size 34 (up to 6m and 1,000kg), long snouts and tails, and bony armor under the skin; and turtles for 35 their bony or cartilaginous shell, and the size of some marine and terrestrial species (1.4m and 36 400kg on land, 2m and 1,000kg in the sea). Both groups are well-represented in the fossil 37 record and previously attained even more massive sizes, up to 12m and 8,000kg for the extinct 38 To evaluate temporal variation in diversification rate, we used algorithms implemented in the 154 TESS package (54), which estimate speciation and extinction rates through time across a clade 155 while allowing for shifts, as well as testing for mass extinctions. We do not specify the 156 hypothesized location of any rate shifts or mass extinctions beforehand, and the model is thus 157 agnostic to their temporal location. However, we generally hypothesize that if significant shifts 158 or extinction periods are observed, they would be localized near the K-Pg boundary. 159 Specifically, we inferred rate estimates for speciation and extinction with a Bayesian variable 160 'birth-death' model (55), conditioned on turtles and crocodilians separately. Jointly with this 161 analysis, we also searched for potential evidence of past mass extinctions using the 'compound 162 Poisson process on mass-extinction analysis' or 'CoMet' model (56). 163 We sampled ten trees from the posterior of each of the two clades and modelled a 164 variable 'birth-death processes' with explicit mass-extinction events following the guidelines in 165 (54), with the sampling fraction set to 1 and the expected survival of a mass extinction equal to 166 0.05 (95% extinction). We generated empirical hyperpriors through an initial Bayesian Markov 167 chain Monte Carlo analysis under a constant-rate birth-death-process model to determine 168 reasonable hyperparameter values for the diversification priors. We allowed up to two mass-169 extinction events and two rate change events and set the shape parameter for the mass-170 extinction prior to 100; varying these parameters yielded qualitatively similar results. After 171 burnin, we ran final analyses for 100,000 generations and diagnosed models using effective 172 sample size (>200) and the Geweke statistic. We evaluated significance of rate shifts and mass 173 extinctions using Bayes factors, mentioning instances >6 and considering >10 strong support. 174

Body-Size Evolution 176
We used the program levolution, which models the rate, strength, and phylogenetic position of 177 evolutionary jumps using a Lévy process (57) to identify saltational changes in Brownian Motion 178 regimes in a trait across a tree (58). This approach has a variety of statistical, theoretical, and 179 computational advantages over existing methods (59). These include computational simplicity 180 and reduced computing time, greater identifiability of parameters, and better theoretical fit to 181 instances of rapid evolutionary bursts punctuating apparent stasis in phenotype. We only 182 evaluated the length data (including the single imputed value for Osteolaemus sp.), as analyzing 183 the phylogenetically-imputed masses on the same phylogeny would be too circular for comfort, 184 while the observed data alone would be too sparse for confidence in the results. For details 185 about how these data were gathered see Trait Data, below and the supplement. 186 We first estimated the value of the hyperparameter a using the included dynamic peak-187 finder algorithm, which permutes a until the highest likelihood is found. We used the 188 recommended settings of 5000 sampled jump-vectors, 1000 burnin samples, and 2nd-189 generation thinning for the MCMC chains, and a starting value and step value of 0.5 (on a log 190 scale), optimized 5 times. We calculated ED of all 384 species using the 'evol.distinct (type = "fair.proportion")' function in 228 the R package 'picante' (69) taking the median value per species across the posterior 229 distribution of 10,000 trees. We then imputed the 114 DD or UA threat statuses using both 230 traditional statistical methods (26) and novel machine-learning approaches (36). First, we 231 generated linear models (17) linking the trait, range, climatic, and ecological data to status in 232 the 270 assessed species using a GLM approach, while including terms for phylogenetic and 233 spatial covariance. Continuous variables were log-transformed, centered, and scaled, while 234 categorical predictors were converted using binary one-hot (1-of-k) encoding as presence or 235 absence of each feature. Preliminary variable selection highlighted 15 significant predictors (see 236 supplement). We then used PLGM estimators to impute the missing statuses as a continuous 237 estimate from 1 to 6, corresponding to the six statuses (17, 26) . We then estimated the 238 sensitivity and specificity of this model using leave-one-out cross-validation, removing each of 239 the 270 known statuses one-by-one, re-estimating them from the remaining data, and 240 generating a confusion matrix of the known versus predicted statuses.
Second, we used the machine-learning (ML) techniques 'random forests' (RF) and 242 'artificial neural networks' (ANN), which have become widespread in ecology (70, 71) and 243 conservation biology (70, 72), as implemented in the R package 'caret' (73). Advantages over 244 approaches such as GLMs include higher classification accuracy, greater flexibility for 245 addressing problems such as regression and classification in a single modeling framework, and 246 the intrinsic ability to detect and incorporate complex non-linear interactions between 247 variables without a priori specification (70, 71). 248 For the RF models, we optimized the 'mtry' variable over 500 trees including all 28 249 categorical and continuous predictors, plus 2-dimensional PCoA estimates of phylogenetic and 250 spatial covariance, for a total of 32 predictors. Both ML techniques integrate out uninformative 251 variables and are thus not particularly susceptible to overparameterization compared to 252 classical statistical approaches such as PGLM. The RF methods are also insensitive to the 253 relative scale of input features, which are assessed sequentially and thus do not interact, so the 254 32 predictors were not transformed prior to analysis. We assessed model accuracy using 255 repeated k-fold cross-validation and assessment of relative variable-importance. 256 For the ANN models, we used feed-forward multi-layer perceptrons with a single hidden 257 layer, trained using back-propagation under a root-mean-squared-error cost function from the 258 270 species with known status and the 32 categorical variables. Predictor variables were 259 standardized on the interval [0-1] using min-max scaling. We tuned the model using repeated k-260 fold cross-validation to select the number of neurons in the hidden layer and the weight-decay 261 parameter to regularize the weights across neurons. These weights were then used to calculate 262 variable importance for input features.
Estimates from all three methods were then pooled across species, and the cross-264 correlation and pairwise identity were assessed across methods. For the final estimate of each 265 species, we took the "mean" estimate on the interval [1-6], though most methods agreed for 266 most species (see Results). Pooling the assessed and imputed statuses for all 384 species, we 267 then tested for a relationship between ED and extinction risk across categories using the notch 268 test on the associated bar-plot, and between threatened and non-threatened species using a 269 two-sample t-test. Despite the fact that threat status (or at least, many of the underlying 270 predictors) may exhibit strong phylogenetic signal, this is explicitly a non-phylogenetic test, 271 because we are solely interested in whether the correlation exists at time zero given current 272 conditions, for each species independently. 273 274

Phylogeny and Diversification 276
All data and results, including the tree topology with support values (Appendix S1), are available 277 in the Supplemental Material and DataDryad repository (to be deposited upon publication; 278 contact TJC tim@maddreptiles.com or RAP rpyron@colubroid.org for data). Our phylogeny is 279 robust overall, with monophyly of all families, subfamilies, and genera strongly supported. Of 280 particular note is the continued weak support for relationships among tortoise genera 281 (Testudinidae), as well as broader uncertainty in higher-level relationships among and 282 placement of Cheloniidae, Chelydridae, Emydidae, and Platysternidae (44, 74-76). 283 We recover a unique topology from these previous four analyses, with successive 284 divergences of Chelydridae+Dermatemydidae+Kinosternidae, Cheloniidae+Dermochelyidae, Platysternidae, Emydidae, and Testudinidae+Geoemydidae (77). This is based on the same or 286 similar underlying datasets and may reflect issues such as mito-nuclear discordance and 287 phylogenetic signal in the available loci. However, our increase in both taxon and characters 288 sampling may indicate convergence on a more robust topology. Future analyses sampling more 289 loci may be able to resolve these relationships with greater support. 290 Several turtle clades have been identified by previous authors (15,16,74) as 291 representing extraordinary instances of evolutionary diversification, including Galapagos 292 tortoises (Chelonoidis), and New World emydids (Deirochelyinae). Similar results were 293 recovered in our preliminary analyses. However, we believe strongly that these are artifactual, 294 and represent an inconsistent application of species concepts and delimitation in turtles. At the 295 current juncture, we are constrained by existing taxonomic frameworks, and these issues must 296 thus be left to future studies. However, interpretation of our downstream results (see below) 297 should be colored by knowledge of these issues. We attempt, when possible, to use analyses 298 and interpret models in a way that is resilient to the possibility of low-level taxonomic biases. 299 Results for turtles and crocodilians showed roughly constant rates of speciation and 300 extinction, with no support for any shifts therein, or any mass-extinction events (Fig. 1a). 301 Turtles exhibit speciation rates of ~0.07 lineages per million years and extinction rates of ~0.03-302 0.04, for net diversification of ~0.03-0.04 and turnover probabilities of ~43-57% from the 303 Jurassic to the present. For crocodilians, rates are ~0.05 and ~0.02 respectively, for net 304 diversification of ~0.03 and turnover of ~40% from the Cretaceous to the present. For body size, 305 few strongly-supported jumps are estimated by the model (Fig. 1b). Thus, most variation can be 306 attributed to steady drift over time. One jump occurs along the branch leading to extant 307 crocodilians, reflecting the difference in body size between the two groups, crocodilians being 308 longer and heavier on average. A second occurs in the relict species Carettochelys insculpta, a 309 large freshwater species, which is the sister lineage of the radiation of softshell turtles 310 (Cyclanorbinae and Trionychinae) that contains both large and medium sized species. Three 311 terminal species are also estimated as jumps, each of which is significantly smaller than its 312 congeners. These are Pelodiscus parviformis, Pseudemys gorzugi, and Trachemys adiutrix. As 313 noted above, these radiations and Pelodiscus have questionable species boundaries (78). Thus, 314 we refrain from interpreting these results further. 315

ED and Threat Status 317
We find a bi-modal distribution of ED values for extant turtles (Fig. 2). The primary mode is 318 17Ma, older than amphibians at 16.5Ma (28), squamate reptiles at 11Ma (43), birds at 6.2Ma 319 (23, 25), or mammals at 4.8Ma (79). A secondary mode at 5Ma is dominated by the recent 320 "radiations" in Deirochelyinae and Testudininae (see Discussion), for which multi-locus nuclear 321 datasets generally do not support the higher number of morphologically-delimited lineages. 322 Thus, we refrain from examining among-lineage rate variation for speciation or extinction. predictors. The second-ranked variable for the PGLM model was occurrence in the Oriental 328 ecoregion; for the RF model there is a tie between AET (ecology) and HEI (anthropogenic disturbance); and for the ANN model it was spatial dissimilarity, indicating the geographic 330 clustering of threat status (see below). Body size (length or mass) were not particularly 331 important, unlike for amphibians (17), birds (18), mammals (19,20), and some squamates (21, 332 82). Training accuracy for the three models was higher (PGLM = 38%, RF= 49%, ANN = 48%) than 333 random (17%) and confusion matrices showed high sensitivity and specificity (up to 86%), with 334 errors usually involving only a single step in either direction. 335 Overall, 21 species were predicted identically by all three approaches, and 69 agreed for 336 two out of the three, for ~79% concordance overall. On a pairwise basis for 342 comparisons 337 (114 species times three models), 141 were identical (41%) and an additional 144 (42%) were 338 adjacent (one category different), for a total of 83% identical or adjacent predictions. Given the 339 high level of agreement and lack of apparent bias towards higher or lower categories for any 340 model, we simply used the mean of the three predictions for our final estimate of threat status. 341 Our final estimate included 17 species classified as LC, 27 NT, 47 VU, 15 EN, and 8 CR. 342 Imputed statuses showed a similar distribution of ED to known species (Fig. S14). Qualitatively, 343 the imputed species were heavily concentrated in tropical pleurodiran lineages, particularly 344 those with many recently-described or resurrected species such as Pelusios (83) and 345 Pelomedusa (84). Nearly all of the primarily sub-Saharan African pelomedusines, and ~2/3 of the 346 primarily Australasian chelodinines were imputed. No other subfamily with more than 2 species 347 had fewer than 50% assessments. The final breakdown of assessed plus estimated threat 348 statuses used for subsequent analyses was 71 LC, 61 NT, 119 VU, 63 EN, 61 CR, and 9 EX. Thus, 349 66% of modern turtle and crocodilian species are threatened or extinct. 350 Comparing ED across threat statuses (Fig. 3), non-threatened species (LC, NT) have 351 significantly lower median ED (18Ma vs. 21Ma) than threatened species (VU, EN, CR), using a 352 two-sample t-test (t=-2.2, P=0.03). This is driven primarily by the higher median ED of VU and 353 CR taxa, while EN covers a broad range of high-and low-ED species (Fig. 2). Thus, imminent 354 extinction of threatened species would represent a disproportionate loss of total evolutionary 355 history across crocodilians and turtles, preferentially removing unique and derived lineages 356 from the Tree of Life, indicating their more-precarious stature in the conservation landscape 357 compared to other terrestrial vertebrates such as amphibians 358 Combining the ED and threat statuses to calculate EDGE scores indicating the 359 intersection of distinctiveness and risk, we highlight the top-10 at-risk turtles and crocodilians 360 ( Fig. 4; Table S1). These data also indicate that the highest-ranked turtles exceed crocodilians in 361 their combination of ED and threat. We find that our estimated ED scores are similar to those 362 from (4) when estimated directly from their phylogenetic dataset, but that a large cluster of 363 their imputed values appear to be inflated (Fig. S15). Estimated EDGE scores are more similar 364 between the two datasets, but variation in ED led previous authors to overemphasize some 365 taxa (such as sea turtles; Cheloninae) and underemphasize others (such as flapshell turtles; 366 Cyclanorbinae). Our dataset thus provides a more robust and presumably more accurate 367 picture of ED and EDGE in the group (see Table S1).  . 5a). Secondary hotspots occur in tropical western Africa, and the eastern Amazon River 374 basin. In contrast, "arks" of non-threatened species are observed in the eastern Nearctic, 375 Australia-New Guinea, eastern Africa, and the western Amazon River basin (Fig. 5b). We note 376 that the prominence of the eastern Nearctic is reduced when species richness is weighted by 377 EDGE scores (Fig. S4) (Table S1)  Consistent with their evolutionary durability and morphological conservatism in the fossil 391 record (7-12), both turtles and crocodilians exhibit roughly constant rates of speciation and 392 extinction across time, with no evidence for any significant mass-extinction events (Fig. 1a). This 393 is unlike birds (25), mammals (79), amphibians (28), and squamate reptiles (85), which show strong evidence for shifts in diversification rate or mass-extinction events coincident with the K-395 Pg boundary. Similarly, turtles and crocodilians are defined by a single-rate background 396 Brownian motion process of body-size evolution, with little evidence for significant among-397 lineage variation in rates, a few singleton species excepted (Fig. 1b). 398 Our estimate of turtle and crocodilian phylogeny is the most complete to date, with 95% 399 of extant species sampled for up to ~21kb (Fig. 2). However, it is far from the last word on the 400 subject, and we suspect that our (and previous) results are beset by confounding factors such 401 as voucher identification (75), poorly defined species boundaries (86, 87), proper identification 402 of targeted sequencing regions (88, 89), and mito-nuclear discordance (90, 91). We attempted 403 to alleviate these issues by selecting well-characterized voucher specimens, but errors may still 404 remain. We are generally confident in strongly supported relationships to the genus level, but 405 caution that some species-level relationships may reflect the problems listed above. Similarly, 406 as only 17 species required imputation, strong support for the monophyly of most genera 407 provides at least preliminary confidence for the placement of imputed species. Overall, we 408 consider the general concordance of our results with previous estimates to provide a sufficient 409 basis for broad-scale analysis of global patterns, particularly with respect to biogeography, 410 diversification, macroecology, and spatial distributions of extinction risk. 411 However, most turtle genera have not yet been subject to modern systematic revisions 412 in an integrative taxonomic framework that combines morphological and molecular data with 413 explicit criteria for delimiting species (92, 93). For instance, (87) showed that the eight 414 traditionally-recognized species of Pseudemys reduced to only three species-level lineages in 415 their molecular analyses. Similarly, (94) showed that the 14 currently-recognized species of eight or nine species-level lineages at most. These findings were confirmed (96) by authors who 418 nevertheless continued to recognize 14 species. Similar instances of mismatches in the level of 419 mitochondrial, nuclear, and morphological divergence have been noted in Trachemys (97) and 420 Cuora (86,89). In the Galapagos, nearly every island has a "species" of Chelonoidis that can 421 putatively be diagnosed morphologically, but these have exceptionally low genetic divergence, 422 and often interbreed and produce viable hybrids (98,99). Many other apparently valid species 423 of turtle (e.g., Cuora) also appear to hybridize naturally, in the wild, with great frequency ( amphibians and squamate reptiles with small ranges also tend to be threatened, but primarily 448 due to ecological and demographic effects (17, 21). For squamates, human persecution or 449 exploitation for food or commerce is generally limited to a few larger species (e.g., South 450 American tegus and African bullfrogs) and not a common threat for the majority of taxa. 451 An additional critical factor facing turtles and crocodilians is the spatial distribution of 452 their diversity; perhaps more so than for any other group of terrestrial vertebrates, "geography 453 is destiny." Birds, mammals, amphibians, and squamate reptiles exhibit massive species-454 richness in sparsely-populated tropical areas such as the Amazon and Congo river basins, the 455 Andes mountains, and the Greater Sunda Islands (61). In contrast, while there are some turtle 456 and crocodilian species in those areas, their peak species-richness (Fig. 5) is coincident with the 457 highest population-density and strongest exploitation pressures on the planet, in South and 458 Southeast Asia (37). Concomitantly, our PGLM model identifies occurrence in the Oriental 459 ecoregion (39) as the second-strongest predictor of threat status behind area. Similar effects 460 are seen in other heavily deforested and densely populated tropical regions such as western 461 Africa and the eastern Amazon river basin (108). In contrast, areas such as Australo-Papua, 462 eastern Africa, the western Amazon, and the Nearctic have lower population densities and less 463 deforestation (46) and exhibit higher richness of non-threatened species (Fig. 5b). 464 A final pattern of great importance is that, despite the high concentration of threat in 465 particular geographic regions such as the Indo-Gangetic plain (37), the distribution of species 466 with the greatest combination of ED and threat (so-called 'EDGE' species (4)) is far more 467 dispersed and idiosyncratic (Fig. S4, Table S1). This reflects the ancient history of both groups, 468 across numerous tropical and temperate landmasses and remote oceanic islands (44). The top-