### Phylogenies

We searched the literature for published species level relationships. The best available phylogenies on the basis of species coverage, inclusion of mtDNA and where possible nuclear DNA, and inclusion of branch length information, are as follows: Anseriformes (mtDNA only), 118 spp. (73 %) - [44]; Galliformes, 170 spp. (59 %) - [10]. These phylogenies are derived from Maximum Likelihood and/or Bayesian inference and together cover all families and 63 % of extant species across the two orders [45]. Each analysis was conducted in two ways to maximise taxonomic coverage and account for phylogenetic uncertainty: first, we used the consensus tree representing all species per phylogeny. Second, to examine the robustness of our results, we collapsed branches with low Bayesian probability (<=0.95) into twigs (Additional file 1: Figure S1-S2). In Anseriformes, this resulted in 33 species being collapsed into four twigs (*see* Additional file 1 for further detail). In Galliformes, collapsing low probability branches resulted in 78 species being collapsed into twelve twigs.

We collected plumage pattern information from each species (nominate subspecies where applicable) represented in these phylogenies from museum skins at the Natural History Museum at Tring and the University Museum of Zoology, Cambridge.

### Data collection and coding

Current developmental evidence suggests that the default plumage phenotype in males and females in Anseriformes and Galliformes is the male plumage, in the sense that estrogen is required to produce female plumage but that testosterone is not required to make male plumage [38]. Therefore, we collected plumage pattern information for the seven patches of plumage over the body for the males of each sample species (Fig. 2). We assigned the character state of each of the seven feather patches as scales, bars, spots, or an absence of patterns, following the description by Prum and Williamson [13]. Variation in scoring plumage patterns between ten ornithologists is modest [43]. In the study reported here all plumage patterns were scored by T-LG. Some species exhibit what appear to be longitudinal stripes along feathers, but on closer inspection are an angular version of scales with a central pigment patch, and were scored as scales, e.g. the breast and nape plumage of the vulturine guineafowl (*Acryllium vulturinum*) (Fig.1a
*in* [13]). A small number of species in this study have chevron patterns – Anseriformes – 2 *spp.*, Galliformes – 5 *spp.*. Given that chevrons are rare in these orders and that they are similar to patterns made of bars, in that the borders do not meet to create a central pigment patch, we scored chevrons as bars (Fig. 1e and 6e *in* [13]).

For most species sampled, the type of within-feather patterning across the vane of each feather, as well as between individuals of the same species, was the same for each patch considered. However, in a rare number of cases there was variation between individuals. To focus on the most developmentally relevant patterns in these rare cases we recorded the pattern that covered the majority of the feathers, in the patch under consideration, and where relevant, the predominant pattern in the majority of individuals sampled. For example, in the Natal francolin (*Pternistes francolinus*), the feathers in the flanks can have both bars and scales (Fig. 2). In the example depicted, bars cover the majority of the feathers in the flanks, and this individual would have been assigned as having bars. However, in most individuals of the sample population of the Natal francolin, scales predominantly covered most of the feathers in the flanks, and were considered representative of this species.

An additional type of pattern, mottled plumage, is present in many birds. It is currently unknown whether all mottled patterns can be considered homologous, or whether they may be classified into discrete types based on the size, shape and distribution of pigmentation across the vane of the feather. Therefore, mottled plumage was scored as unknown. In the tail of Anseriformes, only one type of plumage pattern has evolved which is only exhibited in 7 species that are in derived clades (Additional file 1: Figure S1). Therefore, we removed the tail patch from the analysis.

In the analyses where branch lengths with a low probability were removed, the twigs were coded as having all of the pattern states of each species represented in the removed branch. For example, if the species in the branch that was collapsed into Twig X had spots and bars, Twig X was scored as having both pattern types.

To investigate whether within-feather pattern evolution in one patch of plumage may precede and/or promote evolution of patterning in other patches, we conducted further analyses over the whole body (Fig. 2). Where a species has multiple types of plumage patterns over the body, the pattern that had evolved most recently across all patches (as indicated by the summary model of local evolution) was considered representative. For example, if from an absence of patterns evolved bars followed by spots, species that contained both bars and spots were coded as having spots for analyses of the whole body model. In Anseriformes, in the summary model of local evolution within patches there is no conflict in the order of transitions and the order of evolution is clear in models derived from the all species phylogeny. However, in the robust analysis scales or bars could have evolved first. In Galliformes, bars evolve from an absence of patterns and the next pattern to evolve from bars could be either scales or spots in both the unmodified and modified phylogenies (*see* Results).

We took this uncertainty into account by examining each possible trajectory for comparison. For example, males of the satyr tragopan (*Tragopan satyra*) have scales on the flanks and vent, and spots on the breast. In the analysis of the whole body in Galliformes, where plumage patterns over the whole body is collapsed into one character, we compared whether assigning either scales or spots as the character that had evolved last created conflict in the analysis. Similarly, in the robust analysis for Anseriformes, we compared whether assigning either scales or bars as the character to evolve first created conflict in the analysis.

### Modeling of plumage pattern evolution

We modeled plumage pattern evolution over the phylogenies of each order to estimate the evolutionary transitions between patterns, allowing us to derive a model of the probable evolutionary pathways between plumage pattern phenotypes. Anseriformes and Galliformes live in different habitats, which may alter the evolutionary trajectory of each order [25, 26]. Therefore, we examined each order separately to assess for similarity and differences in their evolutionary history. To estimate plumage pattern evolution in each order, we used the Reversible Jump Markov Chain Monte Carlo Multistate option in BayesTraits v.2 [36, 46].

Markov Chain Monte Carlo (MCMC) is based on the proposition that traits can repeatedly evolve between any possible state on any branch of the tree. To estimate the rate of change between states, the Markov chain samples the plumage patterns at the internal nodes of the tree, in proportion to their probability, which is conditioned on the values at the tips. The rate of change between states was allowed to vary over each transition. New rate parameter values are proposed in successive steps in the Markov chain resulting in a posterior sample distribution of rate coefficients and ancestral states. The rate coefficients of each model of pattern evolution is visited in direct proportion to its posterior probability in the sample distribution [36]. Given that there are four pattern states, which in turn offer many parameters that describe evolution between plumage patterns, we used Reversible Jump MCMC (RJMCMC).

RJMCMC integrates rate restrictions by searching the posterior distributions of model parameters to avoid over parameterization. As such, we allowed BayesTraits to propose transition rates of plumage pattern evolution without restriction (e.g. we did not constrain any rate parameters to equal 0 based on *a priori* predictions) thereby making the analysis conditional on the data rather than our hypothesis [42]. For example, we hypothesized that the greatest number of steps are required to evolve spots as a consequence of having the strictest developmental parameters and therefore do not evolve directly from an absence of patterns (Fig. 1). In transition rate models that support this hypothesis, a rate parameter between an absence of patterns and spots equals 0, and therefore does not occur. This allows both incremental and non-sequential changes to occur in any direction and avoids imposing potentially false hypothesis based predictions.

Potential models of plumage pattern evolution visited by the Markov chain are distinct from the most probable model of plumage pattern evolution. The former describes the proposed models of plumage pattern evolution that make up the posterior sample distribution, whereas the latter is derived from statistically evaluating the posterior sample distribution. Each model of plumage pattern evolution is composed of a unique combination of transition rate parameters with values fixed to zero or are sampled as free parameters with positive values. Rate parameters fixed to zero were interpreted as an evolutionary transition that does not occur, and free rate parameters with a positive value were interpreted as evidence for an evolutionary transition that does occur. Therefore, qualitatively, each unique model of plumage pattern evolution is composed of transitions that do not occur, and transitions that do occur.

Null model testing and model comparisons were conducted by assessing the posterior distribution of unconstrained models. If there were no developmental constraint, such as where natural selection drives plumage patterns to evolve in any direction, forward and backward evolutionary transitions between all pattern states would occur – the full (null) model (Fig. 1). Therefore, if plumage pattern evolution is random, the full model would be visited more frequently than expected by chance. Conversely, if sequential or non-sequential evolutionary transitions were more probable, then models with these transitions would be most probable. In assessing the models of evolution without constraining any transitions, each unique model of pattern evolution is compared with every other possible model of pattern evolution (statistical methods are described in the next section).

In BayesTraits we modeled the rates of plumage pattern evolution using a hyperprior with a gamma distribution defined by an empirical Bayes estimator [36]. For each analysis, we discarded the burn-in. The Markov chain was allowed to run for an infinite number of iterations and was terminated when convergence was reach across four independent runs (<1 lnHM). The number of chains required to reach convergence varied between patches of plumage (Additional file 1: Table S1). After convergence was reached, we checked the posterior sample distribution for autocorrelation. Where autocorrelation was present, the posterior sample distribution was reduced. The average rate of transition for each patch of plumage and over the whole body, are presented in the online appendices.

### Model priors and modeling parameters

The prior density on the free transition rate parameters were estimated using an empirical Bayes estimator (where the interval of the hyperprior is defined by the average and standard deviation of the maximum likelihood of all rate parameters) to reduce bias and uncertainty in choice of priors [36]. We used a hyperprior approach with a gamma distribution as our empirical Bayes estimator values had an intermediate range. The intervals were estimated for each analysis, for each patch of plumage or the whole body, in each group separately. For the analysis of independent evolution within patches of plumage, in each phylogeny, we sampled every 100,000th generation (7 × 2 = 14 individual analyses). The first 500,000 generations of RJMCMC (burn-in) were discarded to ensure parameter space was sufficiently explored.

Each analysis of separate plumage patches and the global model, per order, was run four times to ensure convergence had been reached within analyses as indicated by a stable harmonic mean that varied by <1 lnHM across all four runs. We checked for autocorrelation using the Box-Ljung test statistic in SPSS v22 at lag 1 (IBM Corp.). A Ljung-box *P* > 0.05 was interpreted as indicating no autocorrelation (Additional file 1: Table S1). There was autocorrelation in four models (all species - Anseriformes: wing and rump, Gallifromes: belly; robust – Anseriformes: rump) and we thinned the posterior sample distribution of these models of plumage pattern evolution to every 10th iteration (1,000,000th model), preserving the order in which the models were visited. This resulted in a posterior sample distribution of ~20,000-53,000 per model.

### Statistical analysis

The most probable models of plumage pattern evolution, each with their own most probable ancestral state of patterning, are visited in proportion to their Bayesian posterior probability. Given that there are four states of patterning in this analysis, each model has twelve possible evolutionary transitions. Each transition between each type of pattern can have a transition rate that is above zero (occurs) or a rate of zero (does not occur). To account for the effect of varying numbers of zero and non-zero transitions on the probability of each model of pattern evolution we calculated the prior odds of each model using binomial numbers for transitions that do *not* occur, and bell numbers for transitions that occur, combined (Additional file 1: Table S1 & S2**;**
*see* [47] for a detailed explanation of calculations used). The posterior odds were derived from the posterior sample distribution, and compared with the prior odds using Bayes Factors.

There can be multiple probable models of evolution in the top model set. As a consequence there is uncertainty in the most probable model of plumage pattern evolution. Therefore, we treated the analysis of the posterior sample distribution of models in a multiple-model framework using multimodel inference [37, 48, 49]. Similar approaches are used in multimodel inference using AIC. However, AIC is not philosophically equivalent to Bayesian modelling. Instead we used BayesFactors to rank our competing models. To derive a top model set we used a threshold of a BayesFactor of > =2, which is considered positive evidence [48, 50].

The ancestral state of plumage as well as rate parameters of each unique model of pattern evolution can vary widely in whether they are fixed to zero, or sampled as free parameters with positive values. Therefore, we calculated the marginal probability (MP) per ancestral pattern and of each transition parameter occurring, or not occurring. To account for uncertainty, we calculated MP from the entire posterior sample distribution. For example, the MP of each type of pattern being the ancestral state in the top model set = the number of models in which this pattern is ancestral/n total models. For each individual transition parameter in each unique model of plumage pattern evolution in the top model set MP = the number of models in which this transition parameter occurs/n total models, and MP = the number of models in which this transition parameter does not occur/n total models.

The final marginal probability was calculated by cumulatively adding the MP of each model in the top model set for each ancestral state of patterning and for each evolutionary transition where it does *not* occur, as well as where it occurs, for comparison [46]. For example, in the breast of the galliform birds, the marginal probability (MP) of an absence of patterning *not* being the most probable ancestral state is 0.00 in the top model set whereas the MP of an absence of patterning *being* the most probable ancestral state is 0.89 (e.g. Table 1). In addition, the MP of scales, bars and spots *not* being the most probable ancestral state is 0.89 versus 0.00 of *being* the ancestral state. Assessing a transition from an absence of patterns to spots, the MP of the transition rate parameter describing it as *not* occurring is 0.87 and its MP of being non-zero is 0.01 (Fig. 5). Together this shows that an absence of patterns is most probably the ancestral state in the breast of galliform birds, and a transition from an absence of patterns to spots most probably does *not* occur. The MP in the top model set accounts for variation in the entire posterior sample distribution, therefore the sum of the MP of a transition *not* occurring and occurring rarely equals 1 as this requires every model in the posterior sample distribution to have the same result for that transition.