Genome resequencing and genetic variation
We generated whole-genome sequences of 91 domestic yaks from 31 different locations and 1 wild yak from the QTP (Additional file 1: Fig. S1). Sequence data from a further 17 wild yaks were downloaded from NCBI for analysis (Additional file 1: Table S1). The samples of domestic yak were widely distributed and include most of the nationally recognized varieties. In total, ~ 21 billion raw reads and ~ 2078 Gb of aligned high-quality data with an average depth of 7.2× were generated using Illumina sequencing technology (Additional file 1: Table S2). After SNP calling and subsequent stringent quality control, we obtained 44,296,018 high-quality SNPs for all 108 individuals, with a range of 6,019,569 to 14,518,382 SNPs per individual. Most of the SNPs were located in intergenic regions: 144,673 in exonic regions and 27,947 nonsynonymous SNPs. The SNP distribution characteristics were similar to those of other livestock like pigs [15] sheep [16].
The mutations or genotypes specific to the wild yak genomes will provide an important resource for breeding. We identified 330,962 wild group-specific SNPs; 2733 of these are located in coding regions and involve 1009 genes that were enriched for olfactory-related functions. The living environment of wild yaks is worse than domestic yaks. Wild yaks living on the higher altitude and don’t have stable pastures, they need not only to prowl for food, but also to avoid predators (wolf). The unusual olfactory might helped them to adapt to their environment and avoid predators. We also identified variations in gene regions specific to each group of domestic yaks. Although the differences among groups were not obvious, the group-specific variation at the genomic level was significative and will help to provide an insight into the relevant unique traits. In our samples, the most obvious phenotypic character divergence is the pure white hair specific to the Tianzhu group [17]. We identified 10 specific SNPs shared in all three TZ individuals, but no nonsynonymous SNPs were found.
Population genetic phylogeny
To identify the genetic relationships between domestic yaks, we constructed a phylogenetic tree of yak SNPs using the neighbor-joining method with wild yaks as an outgroup (Fig. 1). The results showed a relatively close genetic distance and indicated that the domestic yaks were divided into three main branches.
The branch containing BQ, JC, JD, CT, ZD, and DQ separated first from wild yak. These areas are on the southeastern edge of the QTP, and, except for ZD, they are located at very similar latitudes. The Changdu area (N31°, E97°), at the center of these regions, was historically the populated area of the QTP. In conclusion, we suggest that the Changdu area was most likely the origin of the domesticated yaks and that they first dispersed to the east and west to BaQing and JingChuan, respectively.
The second branch in the phylogenetic tree contained NR, GD, RD, SR, LZ, CN, ZB, SZ, SS, (KB, PL) and showed a small genetic distance from the first branch. These areas represent the core region of Tibet, including the central, west, and south of Tibet, suggesting that following successful domestication, yaks were gradually brought to the hinterland of the QTP. We defined this as the second dispersal of domestic yaks through the east of Tibet to the south and west.
The third branch of phylogenetic tree contained XJ, JL, SB, LWQ, HH, GN, TZ, MW, DT, NY, GY, and BZ. Most of these are located at the edge of QTP in Qinghai, Gansu, Sichuang, Shanxi, Yunnan, and XinJiang. This suggested that the third dispersal of domestic yaks was from the center to the periphery of the plateau. Moreover, we found that the diversity of the third branch was higher than that of the other two branches. This is not only related to the hybridization of yaks in trade, but also to interspecies hybridization with wild yaks or cattle [18, 19], for example the DaTong yak is a new breed created by hybridization with wild yaks. In comparison, most breeds in the second branch showed a lower diversity that was consistent with a more closed trading environment, harsher living environment, and smaller population (Additional file 1: Table S3). In addition, we constructed a phylogenetic tree including all the domestic yak data of Qiu et al. [14]. The topology was similar for the samples from the present study, but the downloaded data formed narrow disordered clusters in the second branch and the third branch. This may reflect our more extensive sampling and more accurately defined breeds in the present study (Additional file 1: Fig. S2).
Population component, diversity, and linkage disequilibrium
Principal component analysis (PCA) and Bayesian model-based clustering analysis were employed to examine the phylogenetic groupings and provide any additional evidence. The PCA did not show significant separation among the samples of domestic yak, consistent with a single origin of domestication and the close genetic distance shown in the phylogenetic tree (Additional file 1: Fig. S3). However, when the wild yaks and ambiguous breeds (KB, PL, XJ) were excluded from the analysis, the three separate groupings of domestic yaks were evident (Fig. 1). Furthermore, the first principal component (PC1) separated group 1 and group 2 from group 3; PC2 separated group 1 and group 3 from group 2; and PC3 seemed to separate group 1 from group 2 and group 3, but this was not clear (Additional file 1: Fig. S4). In the clustering analysis performed using ADMIXTURE, with K = 2, the yaks were genetically divided into wild and domestic samples. As K increased (K = 3–5), the domestic samples were not separated into distinct breeds, suggesting extensive genetic admixture among living domesticated yaks (Fig. 1). The genome-wide average θΠ value for domestic groups was 0.93–1.2 × 10–3 was similar to that for the wild yaks (1.1 × 10–3), which is consistent with previous research [14] (Additional file 1: Table S3). In addition, the higher θΠ of group 3 and the lower θΠ of group 2 support the inferences related to trade and geographical enclosure. Linkage disequilibrium analysis suggested that the wild group exhibited a rapid decay rate and a low level of LD, whereas the group 3 yaks showed an overall slow decay rate and a high level of LD (Fig. 1).
Selective analysis and comparison
We calculated pairwise FST to quantify the genetic differentiation among the four groups. Pairwise FST ranged from 0.019 to 0.068 with an average of 0.043, which is smaller than that between diverged taurine cattle breeds [20], and is consistent with the gene flow occurring between wild and domestic yaks. Moreover, the FST between groups of domestic yaks ranged from 0.019 (group 2 vs group 3) to 0.027 (group 1 vs group 2), consistent with a very low degree of differentiation among the domestic breeds. FST between domestic and wild yaks ranged from 0.058 to 0.068, with the lowest FST occurring between group 3 and the wild group. This indicates a higher degree of crossbreeding between group 3 breeds and wild yaks, creating hybrid breeds such as the DaTong yak. Group 3 had a lower FST compared to those of the other domestic groups (Additional file 1: Table S4). This further indicated the higher degree of hybridization of group 3 and is consistent with the ADMIXTURE analysis results.
Regions under directional selection should show specific signatures of variation, including high population differentiation, lower levels of nucleotide diversity, and long-range haplotype homozygosity [21]. To determine whether directional selection might have occurred in groups of domestic yaks, we first explored the genomic landscape of the population differentiation to identify candidate genes. Comparing the extremely high FST values (top 0.1%) with wild yaks using a sliding window analysis, we identified 37, 56, and 39 potential positively selected genes in groups 1, 2, and 3, respectively (Additional file 1: Fig. S5, Table S5). There were 11 genes (THEMIS2, XKR8, SMPDL3B, RPA2, TNFSF15, PACSIN2, TCIRG1, NDUFS8, CHKA, ALDH3B1, and SUV420H1) shared by all the three groups (Fig. 2), which represent the core differentiation genes between domesticated and wild yaks. Four of these genes (TCIRG1, NDUFS8, ALDH3B1, CHKA) play an important role in metabolic pathways. TCIRG1 encodes a subunit of a large protein complex known as a vacuolar H+-ATPase (V-ATPase). This protein helps regulate the pH of cells and their surrounding environment, and V-ATPase-dependent organelle acidification is necessary for intracellular processes such as protein sorting, zymogen activation, and receptor-mediated endocytosis [22]. NDUFS8 encodes a subunit of mitochondrial NADH: ubiquinone oxidoreductase, a multimeric enzyme of the respiratory chain responsible for NADH oxidation, ubiquinone reduction, and the ejection of protons from mitochondria [23]. ALDH3B1 encodes an isozyme that may play a major role in the detoxification of aldehydes generated by alcohol metabolism and lipid peroxidation [24]. CHKA has a key role in phospholipid biosynthesis and may contribute to tumor cell growth [25]. RPA2 can activate the ataxia telangiectasia and Rad3-related protein kinases to involved in DNA metabolism such as DNA replication, repair, recombination, telomere maintenance, and the responses to DNA damage [26]. The PACSIN2 gene encodes a member of the protein kinase C and casein kinase substrate in neurons family and is associated with disease and immunity [27]. Significant differentiation between domesticated and wild yaks was identified in 16,17, and 4 genes in groups 1, 2, and 3, respectively, which suggests differences in selection among the different groups of domestic yaks (Additional file 1: Fig. S5).
For a more global comparison of the selection differences among domestic yaks, we relaxed the selection threshold to the top 1% FST region and performed the enrichment analysis of candidate genes. When comparing group 1 with group 2, the top 1% FST region contained 136 genes that were enriched for the GO terms MHC related (GO:0002504, GO:0042613) and molybdenum ion binding (GO:0030151). Several disease pathways were enriched such as type I diabetes mellitus, graft-versus-host disease, rheumatoid arthritis, asthma, Leishmaniasis, and autoimmune thyroid disease (Additional file 1: Table S6, Fig. S6). Some immune related pathways were also enriched, such as allograft rejection, intestinal immune network for IgA production, antigen processing and presentation, and hematopoietic cell lineage. Thus, some of these genes may contribute to adaptation to the extremely similar but locally distinct environments of grass, insects, and climate. The comparison between group 1 and group 3 identified 206 selection difference genes that were enriched for similar GO and KEGG pathways to those mentioned above, but not molybdenum ion binding and hematopoietic cell lineage (Additional file 1: Table S6). The comparison between group 2 and group 3 identified 185 genes that were not significantly enriched for any pathways. A greater number of genes were identified for group 3 than for the other groups, but there was less enrichment, which may be related to their extensive distribution or higher degree of hybridization.
To better understand the selection acting on the three domestic yak groups, we identified the potential selective-sweep region using the signatures of high FST and greater difference value of pi [14, 16] (Additional file 1: Figs. S7–S9). We identified 298, 365, and 383 selective-sweep genes in groups 1, 2, and 3, respectively, and 70 of these were shared by all domestic yak groups, which reflects the influence of domestication (Additional file 1: Fig. S10, Table S7). Some of these genes were related to energy metabolism (PFKFB1, SLC25A10, MRPL12), nerve development and growth (ATP2B2, CACNA1B, GHRL), and phagocytes (ARFGAP3, HGS, CCDC137, ACTG1, ZC3H3, XKR7). PFKFB1 encodes a member of the bifunctional 6-phosphofructo-2-kinase family that forms a homodimer that catalyzes both the synthesis and degradation of fructose-2,6-biphosphate using independent catalytic domains [28]. GHRL is the ligand for growth hormone secretagogue receptor type 1 (GHSR), which induces the release of growth hormone from the pituitary. This has an appetite-stimulating effect, induces adiposity, stimulates gastric acid secretion, and is involved in growth regulation [29, 30]. The selections of metabolic, organ development will be beneficial to their increased reproductive, the selection in nerve development probably have contributed to the taming of yaks, the selection in phagocytes and response to stimulus will be beneficial to their immunity and livability. For the selection genes specific to the three domestic groups, we found the main functions of group 1 genes are immunity and disease; two genes (KDM1A, VEGFD) related to primitive erythrocyte differentiation were found in group 2 [31, 32]; and the specific selection genes of group 3 function in disease and metabolism. In addition, the wild yak’s selective-sweep region may indicate the direction of natural selection. We found 24 overlap genes that were also identified for all three domestic groups and were enriched in functions of pathogenic Escherichia coli infection and immunity (leukocyte transendothelial migration, phagosome) (Additional file 1: Table S8).
Introgression analysis in Yaks
Introgression has occurred extensively in bovine species [18, 19, 33]. We identified gene introgression between domestic yaks, wild yaks, and Tibetan cattle using ChromoPainter [34] software, and found an interesting phenomenon of unbalanced introgression among these groups (Fig. 3, Additional file 2: Table S9). First, the introgression from Tibetan cattle to yaks was far less than that from yaks to Tibetan cattle, for both wild yaks (1.5 M vs 8 M) and domestic yaks (2.1 M vs 44.6 M). Previous studies also reported unbalanced introgression of 5.54 M (from Tibetan cattle to yaks) vs 30.7 M (from yaks to Tibetan cattle) [19]. It is noteworthy that introgression between Tibetan cattle and domestic yaks is more frequent than that between Tibetan cattle and wild yaks. Second, the introgression from wild yaks into domestic yaks was far less than that from domestic yaks into wild yaks (118 M vs 455 M).
To further explore the influence of introgression on yak, we analyzed the related genes in the introgression area. First, we found 521 genes that overlapped the area introgressed from yaks to Tibetan cattle, including 11 genes that function in bile secretion pathways, seven genes associated with endocrine and other factor-regulated calcium reabsorption, nine genes associated with gastric acid secretion, and six glyceride metabolism-related genes (Additional file 1: Table S10). These genes related to digestion and nutrient absorption infiltrated from yaks into Tibetan cattle, helping them better absorb nutrients and energy in the scarce food structure on the plateau. Some introgression genes related to disease and signal transduction also help Tibetan cattle better adapt to the plateau environment. We found 129 genes in the region introgressed from Tibetan cattle into yaks that are involved in signal transduction, physiological regulation (circadian rhythm), metabolism of phosphoinositide pathways, but no significant enrichment (Additional file 1: Table S11).