A (hopefully) constructive critique of the paper.
Darren Patrick Martin, UCT
20 October 2009

Second paragraph of the background: Reference 15 is not rock solid. The Greenland ice-core sequences that are occasionally used to demonstrate slow rates of tobamovirus evolution are potentially/probably contamination artifacts; see Michael Worobey 2008, Phylogenetic Evidence against Evolutionary Stasis and Natural Abiotic Reservoirs of Influenza A Virus. J Virol 82:3769-3774. http://jvi.asm.org/cgi/content/full/82/7/3769?view=long&pmid=18234791

Paragraphs 3 & 4 of the background are very misleading. Although they have DNA genomes, it is clear that geminiviruses have basal mutation rates that are very RNA-virus-like (this is actually mentioned later in the text). It is therefore unclear why it is implied that codivergence hypotheses are more credible for geminiviruses than they are for RNA viruses.

Paragraph 5 of the background, line 1: But these viruses probably don’t have basal mutation rates consistent with high-fidelity replication.

Paragraph 5, line 5: The fairly broad host ranges of mastrevirus species (including WDV) need to be reconciled with the codivergence hypothesis. Although I think it unlikely, I concede that host-virus codivergence could have occurred despite the very high basal mutation rates and short-term nucleotide substitution rates that have been measured for various geminiviruses. However, I think that the main problem with the hypothesis as it currently stands is that these viruses have fairly broad host ranges. The main question is how these host ranges can be reconciled with host-virus codivergence. Are the viruses codiverging only with a single preferred host?

Results

Phylogenetic analysis of viruses, paragraph 2: It is of some concern that evolutionary distances seem to have been confused here with genetic/Hamming/p-distances, i.e. evolutionary distances appear to have been subtracted from 1 and presented as similarity/identity percentages.
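To make the distinction concrete, here is a minimal Python sketch of the two quantities. It is only an illustration: the function names and toy sequences are hypothetical, it is not the MEGA 4 implementation, and the Jukes-Cantor (JC69) correction is used purely as a stand-in for "a model-based evolutionary distance" (I do not know which model was used in the paper). The point is simply that a corrected distance is always larger than the p-distance it is derived from, so subtracting it from 1 understates sequence identity.

```python
import math

# Characters treated as missing data (an assumption of this sketch).
MISSING = {"-", "?", "N"}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Uses pairwise deletion: a site is skipped only if one of the
    two sequences being compared has a gap or missing character there.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be rows of the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # pairwise deletion of this site
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def percent_identity(seq_a, seq_b):
    """Sequence identity as a percentage: 100 * (1 - p-distance)."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Toy alignment (hypothetical data): 22 columns, 2 of which contain a gap,
# leaving 20 comparable sites of which 4 differ.
TOY_A = "ACGTACGTAC" + "-" + "GTACGTACGT" + "A"
TOY_B = "TCGTAGGTAC" + "A" + "ATACGTACCT" + "-"

if __name__ == "__main__":
    p = p_distance(TOY_A, TOY_B)
    d = jc69_distance(p)
    print("p-distance:", p)
    print("identity (%):", percent_identity(TOY_A, TOY_B))
    print("JC69 distance:", d)
    print("100 * (1 - JC69 distance), NOT an identity:", 100.0 * (1.0 - d))
```

Only `percent_identity` gives a number that can legitimately be reported as "% nucleotide sequence identity"; the last printed value is the kind of figure one gets by mistakenly subtracting an evolutionary distance from 1.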
For example, whereas it is reported that “Nucleotide sequence identities were < 69% between wheat and barley isolates”, the actual degree of similarity between the WDV and BDV sequences is ~84% (calculated with MEGA 4 using 1-(p-distance) with pairwise deletion of gaps). Given this calculation error, I’m not sure how reliable the regression analysis of host-virus genetic distances is.

Table 1

It is also potentially problematic that the MSV-WDV genetic distance is used in the regression. It is extremely difficult to accurately align MSV and WDV sequences. This problem is perhaps compounded by the fact that the MSV-WDV alignment used in this paper was done by eye (I may be wrong here). The problem is that one cannot tell how credible the regression analysis is without some clarity on how the numbers used in the analysis were determined.

Having said this, I must admit that I am also a bit staggered by how perfect the recorded correlations are. High, medium and low estimates of divergence times (how were these bounds estimated?) are regressed against one another (the viral estimates assume that the MSV and WDV split was 100 time units ago and that a molecular clock has operated across all the mastreviruses; the diversity covered here spans the entire genus). It is not just that the correlation of the medians/means is perfect; the correlations of the confidence intervals are also nearly perfect. I’m not implying that I think there has been intentional data manipulation. I’m simply concerned that there may have been a degree of cherry picking involved in getting this result. For instance, it would be interesting to know how many other viruses besides MSV were tested as an outgroup. Also, what would the results have been if MSV hosts other than maize had been used? I bet the correlation would have been pretty rotten if wheat or some other non-PACCAD group species had been used as the MSV host.
MSV can happily survive on wheat, and some MSV strains seem particularly well suited to infecting rye, wheat, barley and oats grown in Africa. For example, the MSV-B strain is especially prevalent in rye and wheat. Was the PACCAD split chosen because it gave the best result? Also see my comments below on paragraph 4 of the discussion.

Dating divergences

In a recent paper evaluating a 32-year-old experimental mastrevirus (Sugarcane streak Reunion virus, SSRV) population (a paper I coauthored: Experimental evidence indicating that mastreviruses probably did not co-diverge with their hosts. Harkins GW, Delport W, Duffy S, Wood N, Monjane AL, Owor BE, Donaldson L, Saumtally S, Triton G, Briddon RW, Shepherd DN, Rybicki EP, Martin DP, Varsani A. Virol J. 2009 Jul 16;6:104.) we showed that the entire depth of Chinese WDV diversity is similar to that expected to be generated during ~30-40 years of SSRV evolution. It therefore seems very unlikely that, even if mutations are only tolerated at a small subset of genome sites, the Chinese WDV isolates have a 1.5-million-year-old most recent common ancestor. The only way to reconcile this 1.5 MY age with the experimental data is if the two lineages have either vastly different basal mutation rates (something which actually might be possible) or vastly different nucleotide substitution rates (i.e. the rates at which mutations become fixed, which also might be possible). Invoking either of these possibilities, however, damages the credibility of the divergence time linear regression analysis, as it would imply that the molecular clock assumption of the analysis is seriously violated (MSV is a sister species of SSRV and also has a very high experimentally determined basal mutation rate/short-term substitution rate).

Mastrevirus sequence alignment, paragraph 3: It is not at all surprising that all of the genes display some degree of negative selection; this is the case for >95% of genes you might care to perform such analyses on.
Importantly, the fact that there is evidence of an imbalance towards synonymous substitutions implies that synonymous substitutions are tolerable. That a substantial number of sites may be free to evolve is, ironically, something which the amino-acid selection data is used to strongly dispute later in the discussion.

Discussion

Paragraph 2: It is unclear to me how the part about recombination supports the hypothesis that mastreviruses have codiverged with their hosts.

Paragraph 4: It is not only the “PACCAD infecting” viruses that have been named after the hosts from which they were isolated. WDV, BDV and ODV have all been named (and tested here) based on the cultivated hosts that they have been isolated from. While these may be the natural hosts of WDV, ODV and BDV, I suggest that this is far from proven. This is an important point that the authors seem to think is irrelevant to their main analyses. Also, what would the results have indicated if wheat, rye, oats or barley had been used as the host of MSV?

The last sentence of paragraph 4 (“In contrast, the BEP clade viruses, WDV, BDV and ODV specialize in infecting their respective host plants and thus likely have evolved entirely in the lineage for which they are named”) is seriously problematic: the reference given in support (to Schubert et al., 2008) provides absolutely no supporting evidence for the statement. There is in fact almost no available evidence to either support or reject the hypothesis that WDV, BDV and ODV are specialists at respectively infecting wheat, barley and oats. I know of no one other than people in the Kvarnheden group who are even looking for WDV-like viruses in weed species. See Ramsel et al 2008 for evidence that WDV can in fact be found infecting non-cultivated species.
The truth is that nobody is even close to knowing what the preferred hosts of these “BEP clade viruses” are.

The last paragraph of the discussion: There is now substantial evidence that over tens of years MSV and SSRV are evolving at rates similar to those observed for begomoviruses. The argument about only a small fraction of sites evolving at such high rates may or may not be sound, but it is certainly not supported by the observation that there is low genetic diversity amongst the WDV isolates: there could be low degrees of genetic diversity amongst WDV isolates because they all share a most recent common ancestor that is only ~100 years old.

I do not completely understand why selection on AMINO ACIDS (what is measured by dN/dS ratios) is offered up as evidence that only a very small number of NUCLEOTIDE sites are free to evolve. Such selection is simply indicative of there being enough neutral synonymous substitutions that the non-synonymous ones seem a bit unusual. The key thing, though, is that selection at the amino acid level would still permit loads of synonymous mutations, and it neither supports nor contradicts the codivergence hypothesis.

The parting reference to the tobamoviruses is also unfortunate: it is far from certain whether the ice-core data (upon which this statement is based) was sound. Even if it was sound, though, the ice-core data would imply that there are absolutely no sites within tobamoviruses that are free to evolve, i.e. some of the sequences that are supposedly over 100,000 years old are 100% identical to contemporary sequences (which, strangely, are not themselves all also 100% identical).

Finally, my main reservation about the codivergence hypothesis is not simply that observable mutation and substitution rates imply that codivergence is improbable/impossible; I sincerely think that there may be discernible signals of host-virus or virus-vector codivergence in amongst the most conserved nucleotides of mastrevirus genomes.
My main reservation is that I cannot imagine a situation in which a virus that is capable of infecting, and happy to infect, multiple host species would codiverge with only one of these species.

Competing interests

I coauthored a paper in Virol J. suggesting that mastreviruses are probably not codiverging with their hosts: Experimental evidence indicating that mastreviruses probably did not co-diverge with their hosts. Harkins GW, Delport W, Duffy S, Wood N, Monjane AL, Owor BE, Donaldson L, Saumtally S, Triton G, Briddon RW, Shepherd DN, Rybicki EP, Martin DP, Varsani A. Virol J. 2009 Jul 16;6:104.