Cospeciation of coronavirus and paramyxovirus with their bat hosts in the same geographical areas

Bat-borne viruses are relatively host specific. We hypothesize that this host specificity is due to coevolution of the viruses with their hosts. To test this hypothesis, we investigated the coevolution of coronavirus and paramyxovirus with their bat hosts. Published nucleotide sequences of the RNA-dependent RNA polymerase (RdRp) gene of 60 coronavirus strains identified from 37 bat species, the RNA polymerase large (L) gene of 36 paramyxovirus strains from 29 bat species, and the cytochrome B (cytB) gene of 35 bat species were analyzed for coevolution signals. Each coevolution signal detected was tested and verified by global-fit cophylogenic analysis using software ParaFit, PACo, and eMPRess. Significant coevolution signals were detected in coronaviruses and paramyxoviruses and their bat hosts, and closely related bat hosts were found to carry closely related viruses. Our results suggest that paramyxovirus and coronavirus coevolve with their hosts.


Background
Bats are reservoirs of many zoonotic viruses, such as members of Filoviridae (e.g., Ebola and Marburg viruses), Paramyxoviridae (e.g., Hendra and Nipah viruses), and Coronaviridae (e.g., severe acute respiratory syndrome coronavirus, Middle East respiratory syndrome coronavirus, severe acute respiratory syndrome coronavirus-2) [1,2]. Bats live in a wide variety of environments with various feeding habits. They are flying mammals and are effective vehicles for spreading viruses [3].
Coronaviruses are taxonomically placed in the subfamily Coronavirinae under the family Coronaviridae (International Committee on Taxonomy of Viruses). Bats coronaviruses have been shown to be responsible for the outbreaks of severe acute respiratory syndrome (SARS) in 2002-2003, Middle East respiratory syndrome (MERS) in 2012 [4,5], and probably the current COVID-19 [2]. They cause serious respiratory and instetinal symptoms with substantial mortality rates [6].
Some paramyxoviruses such as Hendra virus (HeV) and Nipah virus (NiV) are highly pathogenic zoonoses. Both viruses belong to the genus Henipavirus of the family Paramyxoviridae (International Committee on Taxonomy of Viruses) and have been detected in flying fox bats (Pteropus spp.) [7,8]. Henipavirus causes severe symptoms associated with high mortality rates in humans and livestocks. HeV was first detected in Queensland, Australia in 1994, causing acute respiratory disease and febrile illness in horses and humans who have close contact with sick horses [9]. NiV was first detected in Malaysia in 1999 during the outbreak of encephalitis and respiratory illness in pig farmers. In the past few years, sporadic outbreaks of HeV and NiV have occurred in Oceania and Southeast Asia [10][11][12].
Both coronaviruses and paramyxoviruses have a certain degree of host specificity. Host-parasite specificity has also been observed in malaria parasites [13], bat flies [14], and bacteria [15]. We hypothesized that the host specificity of coronaviruses and paramyxoviruses is a result of co-speciation with their hosts. To test this hypothesis, we compared the nucleotide sequences of the cytochrome B (cytB) gene of 61 bat species with those of the RNA dependent RNA polymerase (RdRp) gene of 60 coronavirus strains and the RNA polymerase large (L) gene of 36 paramyxovirus strains.

Phylogenetic relationship of bat coronaviruses with their bat hosts
Phylogenetic analyses revealed that the bat coronavirus RaTG13/CN/MN996532 from Rhinolophus affinis found in Yunnan Province, China is on the same branch of the phylogenetic tree as the 18 human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains examined, suggesting that it is a relative of SARS-CoV-2 ( Fig. 1). The following 18 bat coronavirus strains are clustered on the same branch of the phylogenetic tree as the six human severe acute respiratory syndrome coronavirus (SARS-CoV) analyzed, suggesting that they are phylogenetic relatives: Rf1 (Fig. 1).

Coevolution of bat coronaviruses with their bat hosts
Analyses of all bat cytB gene sequences and all bat coronavirus RdRp gene sequences as a whole by the Global test function of software ParaFit and PACo showed evidence of coevolution between bat coronaviruses and their bat hosts (ParaFitGlobal = 390.8896, P = 0.001; m 2 global value = 57.136, P ≤ 0.001). When each individual sequence was analyzed by the Individual host-parasite (H-P) link test function of ParaFit, 51 of the 60 bat coronavirus strains were found to have a significant coevolution relationship (link) with their bat hosts with a Par-aFit1 or ParaFit2 P value ≤ 0.05.
Bat coronavirus strains examined in this study are divided into alpha and beta groups (  (Fig. 2). This result suggests that some coronaviruses have a less stringent host specificity than others.
In this study, bat coroanviruses that are related to human SARS-CoVs are referred to as SARSr-CoVs, and those that are related to human MERS-CoVs are referred to as MERSr-CoVs. These viruses are clustered on the betacoronavirus branch of the phylogenetic tree ( Fig. 1). Most bat SARSr-CoVs are found in members of the Rhinolophidae bat family, and most bat MERSr-CoVs are derived from members of the Vespertilionidae bat family. Strains CMR66/CMR/MG693170, HKU9-1/CHN/ EF065513, and GCCDC1-356/CHN/KU762338 with 78.03-96.24% RdRp gene sequence identity are clustered together on the same branch of the phylogenetic tree and are all from members of the Pteropodidae bat family (Fig. 2).

Coevolution of bat paramyxoviruses with their bat hosts
Analyses of all bat cytB gene sequences and all paramyxovirus RNA polymerase large (L) gene sequences as a whole by the Global test function of ParaFit and PACo showed evidence of coevolution between bat paramyxoviruses and their bat hosts (ParaFitGlobal = 874.11, P = 0.049; m 2 global value = 15.49537, P = 0.015). When each individual sequence was analyzed by the Individual H-P link test function of ParaFit, 7 of the 36 bat paramyxovirus strains were found to have a significant hostparasite coevolution relationship (link) with a ParaFit1 or Parafit2 P value ≤ 0.05.
In this study, we classified unidentified bat paramyxoviruses into four groups PG1-PG4 according to their host speficity.  (Fig. 3).
The bat hosts of PG1 paramyxoviruses are distributed mainly in Africa and Asia, and those of PG2 paramyxoviruses are mostly living in Africa. PG3 paramyxoviruses are mostly found in bats in South and North America. The bat hosts of PG4 paramyxoviruses are distributed in Asia, Africa, and Europe. The bat hosts of Pararubulavirus and Hennipahvirus are found in areas from Asia to Oceania (Fig. 3).

Event-based cophylogeny
To confirm the coevolution of coronaviruses and paramyxoviruses with their bat hosts, analyses with the software eMPRess were performed. Results of such analyses refuted the null hypothesis that the host and virus trees and tip associations are formed due to chance at 0.01 level in both host-coronavirus and host-paramyxovirus relationships (P-value = 0.0099). Therefore, we, concluded that these host-virus pairs have coevolved (Fig. 4).

Disccusion
In this study, we investigated the coevolution relationship between bats and their viral parasites: coronaviruses and paramyxoviruses. These two groups of viruses were chosen because they have been shown to be zoonotic [17]. The sequences of the RNA-dependent RNA polymerase (RdRp) gene of 60 bat coronavirus strains, the RNA polymerase large (L) gene of 36 paramyxovirus strains, and the cytochrome B (cytB) gene of 61 bat species were used to build phylogenetic trees. Cophylo analyses were then performed to determine the relationship between bat genetic trees and those of coronaviruses and paramyxoviruses. Event-based eMPRess test and ParaFit and PACo Global and ParaFit Individual H-P tests were also performed. In eMPRess and Global ParaFit and PACo tests, both groups of the viruses were found to have a significant coevolution relationship with their bat hosts. In Par-aFit Individual H-P test, 51 (85%) of the 60 coronavirus (See figure on next page.) Fig. 2 Tanglegram of cophylogenetic relationship between bat hosts and coronaviruses. Black lines denote significant coevolution links between coronaviruses and their hosts (ParaFit tests P ≤ 0.05), and gray lines denote non-significant links. Different groups of coronaviruses and bat species with significant coevolution links are marked with boxes in different colors. Information on host geographical distribution was derived from Simmons (2005)  The construction of cophylogenetic trees revealed that closely related bat coronaviruses are found in closely related bat species (Fig. 2) (Fig. 3).
Results of ParaFit Individual H-P test showed that 85% (51/60) of coronavirus strains but only 19% (7/36) of paramyxovirus strains have a significant coevolution relationship with their bat hosts. Since significant coevolution was found in both groups of the viruses by eMPRess and ParaFit and PACo Global tests, this low positive individual H-P link rate in paramyxoviruses may be due to small sample size.
We postulate that that evolutionary relationship and close habitat of bat species contribute to inter-species transmission of viruses. One observation supporting this hypothesis is that coronavirus strains 206645-40/IT/ MG596802 (bat host: Hypsugo savii) and 206645-63/IT/ MG596803 (bat host: Pipistrellus kuhlii) share 99.46% nucleotide sequence identity in their RdRp gene and are found in bats living in the same geographical area, Italy. In addition, coronavirus strains 16BO133/ROK/ KY938558, JL2012/KJ473811, and JTMC15/KU182964 are highly related with > 99.6% nucleotide sequence identity in the RdRp gene and are found in bats distributing in areas near each other, including Jilin Province, China (for isolates JL2012/KJ473811 and JTMC15/KU182964) and South Korea (for isolate 16BO133/ROK/KY938558) that is close to China.
Most bat SARSr-CoVs are from Rhinolophus bats, but strains As6526/CHN/KY417142 and Yunnan2011/CHN/ JX993988 are found in Aselliscus stoliczkanu and Chaerephon plicatus, respectively. Although these two bat species are phylogenetically far apart, they live in the same geographical area, Yunnan Province, China. This observation suggests that distantly related bats in the same geographical location may harbor the same viruses, leading to inter-species transmission of the viruses.
Horseshoe bats are widely distributed in Europe, Asia, Australia, and Africa [18][19][20][21] and are potential reservoirs of epidemic coronaviruses [22]. The strain BM48-31/BGR/GU190215 was found in Rhinolophus blasii and clustered with other SARSr-CoVs in the coronavirus coevolution tree (Fig. 2). Its relationship with human and civet SARS-CoVs is farther than that of other SARSr-CoVs (Fig. 1), suggesting that they diverged from the same ancestor. Results of previous studies suggest that Rhinolophus bats are the natural hosts of human SARS-CoVs [23][24][25]. Many bat SARSr-CoVs are detected in Rhinolophus sinicus and Rhinolophus ferrumequinum that distributed in Asia, Africa, and Europe. However, SARSr-CoVs that are highly related to human SARS-CoVs have not been found in Rhinolophus ferrumequinum that live in Africa and Europe, probably due to geopgraphical isolation from SARSr-CoVs that are mainly found in Yunnan Province, China [26].
As Rhinolophus sinicus bats are found only in China, Nepal, Vietnam, and North India [36] and SARSr-CoVs are mainly found in Yunnan Province, China, SARS-CoV outbreaks have not occurred in other places such as Europe, Africa, Oceania, and America. SARS-CoV-2 is responsible for the COVID-19 pandemic (International Committee on Taxonomy of Viruses). In this study, the strain RaTG13/CHN/MN996532 is found to be very close to SARS-CoV-2 with > 97% nucleotide sequence identity in the RdRp gene and 96% identity at the whole genome level. There are approximately 1100 bases that are different between the genomes of RaTG13/MN996532 and various strains of SARS-CoV-2 suggesting that RaTG13/MN996532 requires at least one intermediate host to transmit to humans [2]. As the host of RaTG13/MN996532 is Rhinolophus affinis residing in Yunnan Province, China, it has been speculated that SARS-CoV-2 is derived from Rhinolophus bats roosting in areas near Yunnan Province, China, such as Southwest China, Myanmar, Laos, Vietnam, and other Southeast Asian countries [37].
Divergence of bats can be traced back to tens of million years ago [38,39]. It has been estimated that coronaviruses diverged tens of thousand years ago [40]. This difference may be due to the fact that the genome of coronaviruses is RNA that is more prone to mutations than DNA. It has also been estimated that coronaviruses have been infecting birds and bats for tens of million years, thus providing the opportunity for coevolution with their hosts [41].

Conclusion
We have found evidence suggesting that both coronavirus and paramyxovirus coevolve with their bat hosts. Understanding the coevolutionary patterns of these viruses with their hosts will allow a better prediction of transmission between bats and humans.

Phylogenetic analysis
The database of Bat-associated Viruses (DBatVir, http:// www. mgc. ac. cn/ DBatV ir/) [42] contains information on various bat-associated viruses, including genome size, lengths of identified genes, date and place (city and country) of isolation, names of the viruses, and Gen-Bank accession numbers of nucleotide sequences of the entire genome or individual genes. With the "browse by virus" function, 60 coronavirus and 36 paramyxovirus strains with all aforementioned information available were found. As this database does not contain actual The optimal reconciliation cost of the coevolution trees is indicated with a red line, and the optimal cost of the same trees constructed with tip associations permuted at random is shown in blue columns sequences, nucleotide sequences of selected viral strains were downloaded from the GenBank. These sequences included those of the RNA dependent RNA polymerase (RdRp) gene (2734 bp) of the 60 coronaviruse strains and the RNA polymerase large (L) gene (559 bp) of the 36 paramyxovirus strains. The 60 coronaviruses were derived from 37 different bat species, and the 36 paramyxoviruses were derived from 29 different bat species. Among them, 5 bat species including Myotis daubentoniid, Eidolon helvum, Hipposideros abae, Hipposideros ruber, and Hipposideros pomona were found to harbor both coronavirus and paramyxovirus. Altogether, 61 different species of bats were identified to be the hosts of the coronavirus and paramyxovirus strains examined in this study. As the cytochrome B (cytB) gene is one of the most conserved gene in bats, its sequence was used for coevolution analyses. The cytB sequences of 59 of the 61 bat species were downloaded from the GenBank. Since the cytB gene sequences of Miniopterus pusillus and Tylonycteris robustula, that were the hosts of coronavirus strains 1B/ CHN/EU420137 and HKU33/CHN/MK720944, respectively, were not available, they were determined in this study. These two bat species were captured from Menghai, Yunnan and Kau O Bat Cave, Macau, respectively. Anal swabs of Miniopterus pusillus and a small portion of the patagiums of Tylonycteris robustula were obtained. The captured bats were released back to their roosts after sampling. DNA was isolated from these samples and used as the template for amplication by polymerase chain reaction (PCR) of a portion (1140 bp) of the cytB gene with primers L14727ag (5′-ATG ATA TGA AAA ACC ATC GTTG) and H15915ag (5′-TTTCCNTTT CTG GTT TAC AAGAC) [43]. PCR conditions were as follows: 94 °C for 3 min, followed by 20 cycles of 94 °C for 20 s, 46 °C to 52 °C (+ 0.3 °C/cycle) for 30 s, and 72 °C for 90 s and 30 cycles of 94 °C for 20 s, 60 °C for 30 s, 72 °C for 90 s and then maintained at 72 °C for 10 min. The PCR products were sequenced, and the sequences thus obtained have been deposited in the Genbank with accessing numbers MN366287 and MN366288.

Comparison of host and virus phylogenies
To determine the degree of congruence between bat and virus topologies on phylogenetic trees and identify individual associations contributing to the cophylogenetic relationship the software ParaFit was used [47]. ParaFit tests the null hypothesis that the evolutions of a clade of hosts and a clade of parasites are independent. ParaFit has two types of tests: a global test of coevolution and a test on each host-parasite (H-P) link. The matrices of patristic distances used in global-fit analysis were calculated from the maximum likelihood trees of host and virus phylogenies using the "cophenetic" function of the software package ape in R 3.6.0 [48,49]. ParaFit analyses were also performed in R with 999 permutations for both global and individual H-P link tests. Each individual host-virus link was considered as significant when its ParaFit 1 or Parafit 2 P-value was ≤ 0.05 [15].
To obtain comparable global goodness-of-fit statistics with Parafit global values, the software package Procrustean Approach to Cophylogeny (PACo) [50] in R was used in conjunction with packages ape and vegan [51]. PACo determines the dependence of one phylogeny upon the other and produces a Procrustes superimposition plot for a graphical assessment of the fit of the parasite phylogeny onto the host phylogeny and a goodness-of-fit statistic. As ParaFit values may be variable, virus matrix in PACo was rotated and scaled to fit the host matrix in order to evaluate the dependence of parasite phylogeny upon host phylogeny. A goodness-of-fit test based on 1000 randomizations was used to assess the significance of such dependence. The associated squared residuals were used to assess the significance of coevolution of each host-virus link [52]. Cophylogenetic trees were generated using the "cophylo" function of the R package phytools [53].

Even-based cophylogeny
The event-based program eMPRess [54] was used to determine whether the pairs of coronavirus, paramyxovirus, and their hosts coevolve with each other. eMPRess is a tool for reconciling pairs of phylogenetic trees based on the Duplication-Transfer-Loss (DTL) model [55], which is performed using a maximum parsimony formulation to determine the associated cost of each coevolution event. eMPRess computes and displays the distances between every pair of maximum parsimony reconciliation (MPR). The distance between two MPRs is the number of events in one MPR. The maximum likelihood (ML) trees of hosts and viruses were used as inputs, and the analysis was conducted using the following eMPRess parameters: duplication cost = 1, transfer cost = 2, and loss cost = 1. The optimal reconciliation cost for each dataset was compared with that of the same tree with tip associations permuted at random.