  • Research article
  • Open access

Whole genome duplications have provided teleosts with many roads to peptide loaded MHC class I molecules



In sharks, chickens, rats, frogs, medaka and zebrafish there is haplotypic variation in MHC class I and closely linked genes involved in antigen processing, peptide translocation and peptide loading. At least in chicken, such haplotypes of MHCIa, TAP2 and tapasin have been shown to influence the repertoire of pathogen epitopes presented to CD8+ T-cells, with subsequent effects on cell-mediated immune responses.


Examining MHCI haplotype variation in Atlantic salmon using transcriptome and genome resources we found little evidence for polymorphism in antigen processing genes closely linked to the classical MHCIa genes. Looking at other genes involved in MHCI assembly and antigen processing we found retention of functional gene duplicates originating from the second vertebrate genome duplication event providing cyprinids, salmonids, and neoteleosts with the potential of several different peptide-loading complexes. One of these gene duplications has also been retained in the tetrapod lineage with orthologs in frogs, birds and opossum.


We postulate that the unique salmonid whole genome duplication (SGD) is responsible for eliminating haplotypic content in the paralog MHCIa regions, possibly due to frequent recombination and reorganization events at early stages after the SGD. In return, multiple rounds of whole genome duplication have provided Atlantic salmon, other teleosts and even lower vertebrates with alternative peptide loading complexes. How this affects antigen presentation remains to be established.


Currently our view of antigen processing, peptide loading and peptide presentation in teleosts relies mainly on extrapolation of functional data generated in humans, a few other mammals and chickens. In these species major histocompatibility complex class I (MHCI) molecules are key players in discriminating self from non-self. Classical MHCI (MHCIa) molecules, consisting of an alpha chain and a non-covalently associated beta2-microglobulin (b2m) chain, are displayed at the surface of most cells, where they normally present peptides derived from self-molecules chopped into smaller fragments in the cytoplasm. If foreign elements such as viruses are present in the cytoplasm, they too are prone to degradation and presentation by MHCIa molecules. After peptide loading, the MHCIa molecules are transported to the cell surface for recognition by CD8+ T cells, thereby initiating an immune response when the peptide originates from a non-self protein.

Newly synthesized MHCIa molecules are processed in the endoplasmic reticulum (ER) to become mature, properly folded molecules [reviewed in [1]]. This transition to maturity is aided by the molecular chaperones calnexin (CANX), calreticulin (CALR) and heat shock protein family A member 5 (HSPA5 alias BiP), which assist in correct folding and assembly of the MHCIa molecule with the beta2-microglobulin chain. Subsequent association with ERp57 (protein disulphide isomerase family A member 3 alias PDIA3), an enzyme that catalyzes disulphide bond formation, finalizes maturation of the MHCIa molecule. To form the peptide-loading complex, the CALR and ERp57-stabilized MHCIa molecule then associates with tapasin (TAPBP) and the transporter associated with antigen processing (TAP1-TAP2) heterodimer [2].

Peptide loading of MHCIa molecules is a multi-step process starting with the degradation of proteins in the cytosol. Proteins tagged for destruction by ubiquitin are generally degraded by a proteasome consisting of an inner core with seven alpha and seven beta subunits associated with one or two regulatory particles [3]. Upon stimulation such as an infection, an MHCIa-specific proteasome, defined as the immunoproteasome, is induced by interferon gamma: three of the seven proteasome beta components, PSMB8, PSMB9 and PSMB10, replace the constitutive components PSMB5, PSMB6 and PSMB7. This change in components produces peptides that preferably carry hydrophobic or positively charged residues at the C terminus, which are optimally suited for binding to the antigen transporter TAP and to MHCIa molecules [4]. Two additional interferon-induced subunits called PSME1 and PSME2 provide an added regulatory element to the core immunoproteasome, but how they influence the MHCI peptide repertoire is debated [5]. It should be noted that the interferon-inducible immunoproteasome components have not been identified in birds [6].
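The C-terminal preference described above can be illustrated with a small Python sketch. It is not taken from the article: the function names are hypothetical, and the residue sets used for "hydrophobic" and "positively charged" are one common convention assumed here for illustration only.

```python
# Illustrative sketch only. The residue classes below are an assumption
# (a common textbook grouping), not a definition given in the article.
HYDROPHOBIC = set("AVLIMFWY")
POSITIVE = set("KR")


def c_terminal_class(peptide: str) -> str:
    """Classify the C-terminal residue of a peptide in one-letter code."""
    if not peptide:
        raise ValueError("empty peptide")
    last = peptide[-1].upper()
    if last in HYDROPHOBIC:
        return "hydrophobic"
    if last in POSITIVE:
        return "positive"
    return "other"


def fraction_preferred(peptides) -> float:
    """Fraction of peptides ending in a hydrophobic or positive residue."""
    peptides = list(peptides)
    if not peptides:
        return 0.0
    hits = sum(c_terminal_class(p) != "other" for p in peptides)
    return hits / len(peptides)
```

Applied to two peptide pools, for example from constitutive and immunoproteasome digests, `fraction_preferred` would give a crude measure of how strongly each pool is skewed towards the preferred C termini.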

The protein degradation products generated in cytosol are then transported into the ER through an MHCIa specific transporter consisting of the molecules TAP1 and TAP2. This heterodimer preferentially translocates peptides with a length of 8–12 residues into the ER in an ATP-dependent process [7, 8]. Longer peptides may be transported, but then in a kinked conformation as TAP has a length restriction for the peptide N and C-terminal distance [9]. The three N-terminal and the last C-terminal residues of the peptide are critical for binding to TAP [10,11,12].
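As a minimal sketch of the constraints just described, the following Python function reports whether a peptide falls in the preferred 8–12 residue window and extracts the positions stated to be critical for TAP binding (the three N-terminal residues and the last C-terminal residue). The function name and the returned dictionary layout are hypothetical, and the sketch does not model binding affinity in any way.

```python
def tap_summary(peptide: str, min_len: int = 8, max_len: int = 12) -> dict:
    """Summarise TAP-relevant features of a peptide (one-letter code).

    The 8-12 window follows the length preference stated in the text;
    everything else is a simple string operation, not a binding model.
    """
    if len(peptide) < 4:
        raise ValueError("peptide too short to separate N- and C-terminal positions")
    n = len(peptide)
    return {
        "length": n,
        "in_preferred_window": min_len <= n <= max_len,
        "n_terminal_3": peptide[:3],   # three N-terminal residues
        "c_terminal": peptide[-1],     # last C-terminal residue
    }
```

For a 15-residue peptide the function simply reports `in_preferred_window` as `False`; whether such a peptide is still transported in a kinked conformation is outside the scope of this sketch.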

In the ER lumen, the MHCIa-b2m-CALR-ERp57 complex associates with tapasin (TAPBP) and transporter associated with antigen processing TAP1/TAP2 into what is defined as the peptide-loading complex (PLC) [13]. TAPBP acts as a bridge between TAP and MHCIa enhancing TAP stability and peptide translocation. TAPBP also stabilizes empty MHCIa molecules and optimizes MHCIa loading with peptides. MHCIa alleles have a variable TAPBP dependency where some alleles do not require TAPBP for loading with high affinity peptides [14]. TAPBP is also linked to the protein disulphide isomerase ERp57, contributing to structural stability of the PLC [15]. To finalize the functional PLC complex, ERp57 is linked to calreticulin, while calreticulin holds on to the MHCIa glycan located at the C-terminal end of the alpha 1 domain [16].

As the length of peptides translocated into the ER varies, some need N-terminal trimming to fit properly into the MHCIa groove, as the C-terminus is already MHCIa compatible. ERAP1/ERAP2 can trim both free and MHCIa-bound peptides [17]. Optimal MHCIa peptide binding may also be influenced by the tapasin-related molecule (TAPBPR), although this molecule is not a member of the PLC [18, 19]. As with TAPBP, MHCIa alleles vary in their association with TAPBPR [19]. Recently, Neerincx et al. [20] showed that TAPBPR associates with UDP-glucose:glycoprotein glucosyltransferase 1 (UGT1), a folding sensor in the calnexin/calreticulin quality control cycle that is known to regenerate the Glc1Man9GlcNAc2 moiety on glycoproteins. Thus, TAPBPR could serve a dual role, also routing empty or low-affinity bound MHCIa molecules back to the PLC for refolding.

In chicken, there is a classical, dominantly expressed and highly polymorphic MHCIa (BF2) gene residing within the major MHC region. This gene is flanked by the TAP1 and TAP2 genes, while a single TAPBP gene is located less than 40 kb away [21,22,23]. The chicken tapasin and TAP genes are polymorphic, with each unique BF2, TAP1, TAP2 and TAPBP combination segregating as a stable functional haplotype. TAPBP and TAP polymorphism govern the peptides available for binding to the BF2 allele [24, 25]. Thus BF2 alleles in heterozygous animals can in theory bind peptides pumped by both haplotypes, as opposed to alleles in homozygous animals, thereby broadening the MHCIa peptide repertoire. The importance of peptide binding can be exemplified by Marek's disease in chicken, where MHCIa alleles binding a large variety of peptides induce protection while those binding a restricted number of peptides are linked to susceptibility. Presumably, presenting a larger repertoire of pathogen epitopes activates a wider range of T cell clones, a response needed for protection against this pathogen. To avoid inducing autoimmune reactions, the MHCIa molecules with a wide peptide binding repertoire are only present in low copy numbers on the cell surface, while MHCIa molecules with a narrow peptide repertoire are expressed at much higher levels on the cell surface [24, 25]. A similar picture was described in humans, where lower surface expression levels also corresponded with a broader peptide binding ability. Although TAPBP and TAP are not polymorphic in humans, allelic variation in how the molecules associate with TAPBP was suggested to influence the peptide binding repertoire.

Polymorphism in genes closely linked to MHCIa is also reported in other tetrapods. In rats, two allelic variants of TAP2 linked to specific subsets of MHCIa were shown to deliver different spectra of peptides [26, 27]. In frogs, linked and highly divergent biallelic PSMB8, TAP1, TAP2 and MHCIa sequence variants of ancient origin have been described [28, 29]. Unfortunately, the functional difference between these frog gene variants has not yet been investigated.

As in frogs, the PSMB8, TAP2 and tapasin molecules are all encoded within the major MHC class I regions of teleosts [30]. This region also contains a teleost-specific duplication of the PSMB9 gene denoted PSMB12 and a gene duplicate of the human PSMB10 gene denoted PSMB13. The teleost PSMB10 gene is located within the extended MHCIa region in some species, while in others it has been translocated elsewhere [30, 31]. Polymorphism in the proteasome component PSMB8 has been reported [32, 33], where two distinct lineages of PSMB8 denoted PSMB8A and PSMB8F were found in sharks, cyprinids and salmonids [34]. The two variants existed as duplicate genes in shark, but were defined as alleles in cyprinids and salmonids by Tsukamoto et al. [34], while McConnell et al. defined them as paralogs [31]. When comparing gene sequences from three medaka MHCIa haplotypes, sequence polymorphism was observed both within the PSMB8A gene and in the PSMB10 gene [33]. Recently, haplotypic variation was also reported in zebrafish, where haplotypes containing a varying number of MHCIa genes were linked to polymorphic TAP2, tapasin and PSMB8 molecules [31]. Thus, functional haplotypes linked to the classical MHCIa genes may be a common trait in teleosts.

Atlantic salmon is a species with a single classical MHCIa gene [35] similar to chicken, but with a duplicated MHCI region due to the unique salmonid whole genome duplication (SGD) event that occurred 94 million years ago [30, 36]. The MHCIa region on chromosome 27 harbors the single classical MHCIa locus denoted UBA, while the paralog MHCIb region on chromosome 14 has one expressed non-classical MHCI gene denoted UDA and one or two pseudogenes denoted UCA [37, 38]. The MHCI genes in both regions are flanked by tapasin, the proteasome components PSMB8, PSMB9, PSMB12, PSMB13 and TAP2 alongside several other genes also present in the MHCI region of other teleosts. Similar to other teleosts, the TAP1 gene is located outside the major MHCI region. Based on the reported MHCIa haplotypes in sharks, rats, chicken, frogs, zebrafish and medaka, we set out to investigate the peptide loading machinery and functional MHCIa haplotypes in salmonids compared to other teleosts.


The fact that sequence variation has been found in genes closely linked to MHCI that influence peptide processing, transport and loading in a diverse range of species including the teleosts medaka and zebrafish (See Table 1) could indicate that this is a general trend. To investigate functional MHCIa-linked polymorphism in salmonids we looked at available genomes from Atlantic salmon [39], rainbow trout ([40] and unpublished assembly) and coho salmon (unpublished assembly) as well as Northern pike [41], a species basal to the salmonids that has not experienced the salmonid whole genome duplication event. For Atlantic salmon and trout there are also some BAC sequences covering the paralog MHCI regions [37, 42]. See Fig. 1 for phylogenetic relationships between the various teleost species.

Table 1 Polymorphism in MHC class I linked genes
Fig. 1

Phylogeny of tetrapod and ray-finned species with approximate dating (MYA, Million Years Ago) shown on top. Timing of the first and second vertebrate whole genome duplications (VGD1 + 2), the teleost-specific whole genome duplication (TGD) and the salmonid whole genome duplication (SGD) events are shown in red font. Tetrapods are shown in an orange box, Ostariophysian species in a blue box, salmonids in a red box and neoteleosts in a green box. Relevant literature for the respective branch nodes [71,72,73,74] is indicated.

PSMB8F linked haplotypes

Based on data from medaka [33, 43, 44] and zebrafish [31] (summarized in Additional file 1: Figure S1), a good marker for functional teleost MHCIa haplotypes is the PSMB8F gene variant. The term haplotype here defines a group of closely linked gene variants (alleles) residing within one chromosomal region that are inherited together from a single parent. Thus, to investigate polymorphism in genes involved in antigen processing in salmonids, we first searched salmonid genome resources for the presence of the PSMB8A and PSMB8F variants reported to exist as allelic variants in salmonids [34]. In the following we use -a and -b extensions for paralog genes originating from the salmonid whole genome duplication. In the case of MHCI, Ia and -a extensions refer to genes originating from the salmonid classical UBA regions, and Ib to genes in the paralog region containing non-classical MHCI genes. Each chromosomal region containing MHCI and physically linked genes is in the following shown as, e.g., Atlantic salmon Ia_#A, referring to a specific collection of gene variants in the Ia region, while Ia_#B refers to another collection of gene variants in this same region.

None of the previously published MHCIa or MHCIb regions from Atlantic salmon [37, 38], nor those in rainbow trout [42], determined from sequencing of BAC clones, contained a bona fide PSMB8F gene (Figs. 2 and 3, Additional file 1: Figure S1, Additional file 1: Text S1). Neither was this presumed allelic variant present in the paralog MHCI regions of the Atlantic salmon genome, which originates from a double haploid [39]. However, in both assemblies of the rainbow trout genome ([40] and the new unpublished GenBank assembly GCA_002163495.1) we found the PSMB8F gene located between the TAP2 and BRD2 loci in the Ia region (Fig. 2, trout haplotype Ia_#B). Although automatically annotated as two separate open reading frames in the most recent genome assembly (CDQ70115 + CDQ70116), this could well constitute a functional locus in trout, as supported by the GenBank transcriptome shotgun assembly sequences GBTD01219587.1 and EZ768376.1. This Ia_#B haplotype also contained a PSMB8 pseudogene between the UBA and PSMB13a genes, similar to the location of the PSMB8 gene in the majority of other analysed teleosts [30]. The pseudogene nature of this locus is supported by a PSMB8 pseudogene in the same position in the previously published trout Ia_#A haplotype [42]. Looking back at the published trout MHCI BAC sequences, both the Ia_#A and Ib_#A haplotypes terminated immediately after the TAP2 locus, so the PSMB8F gene could potentially also have been present in these haplotypes. Given that the PSMB8 gene is a pseudogene in both the trout Ia_#A and Ia_#B haplotypes, its loss is perhaps compensated for by a functional PSMB8F gene. In the coho (Oncorhynchus kisutch) genome, a PSMB8F gene is also located between the TAP2 and BRD2 genes of the Ia region (Fig. 2, Additional file 1: Figure S1, Additional file 1: Text S2), but based on the genome assembly this is a pseudogene.

Fig. 2

Overview of the included MHC class Ia and Ib haplotypes from Atlantic salmon (Salmo salar), rainbow trout (Oncorhynchus mykiss) and coho salmon (Oncorhynchus kisutch), in addition to the assumed Northern pike (Esox lucius) allelic haplotypes represented by the Chr. 10 region (Ia) and the unplaced genomic scaffolds NW80 (NW_017859580.1) and NW71 (NW_017859271.1). Color coding for individual genes is shown at the bottom of the figure. Sequences originate from previously published BAC sequences [rainbow trout haplotypes Ia_#A and Ib_#A: [42]; Atlantic salmon haplotypes Ia_#A&#B, Ib_#A&#B: [37, 38]], from genomes available in GenBank [Atlantic salmon Ia_#C and Ib_#C: [39], Northern pike: [41]] or from unpublished genome assemblies for rainbow trout and coho salmon. See Materials and Methods for additional references. A 7.2 Mb region dividing MHCI-related genes in the Atlantic salmon genome Ib region is indicated by a black line. Gene boxes are colored red for MHCIA, dark green for PSMB subunits, yellow for TAP2, blue for TAPBP and grey for remaining genes. Individual genes are shown above the boxes using the abbreviations B, C, D, E, F, L, G for individual U lineage genes; Z for Z lineage genes; 8, 8F, 9, 10, 12, 13 for individual PSMB genes; and 1 through 7 for undefined MHCI lineage genes in Northern pike. X denotes pseudogenes. More regional details can be found in Additional file 1: Figure S1.

Fig. 3

Phylogeny of deduced PSMB8 amino acid sequences from selected species. The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model [75]. The percentage of trees in which the associated taxa clustered together (100 bootstrap trials) are shown next to the branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. All positions with less than 95% site coverage were eliminated. Salmonid and Northern pike PSMB8 sequences originate from the haplotypes described in Materials and Methods with the exception of the Atlantic salmon PSMB8F sequence which is from GenBank with accession number in parenthesis. Zebrafish haplotype sequences are from McConnell et al. [31] and medaka haplotype sequences are from Hd-rR [43], HN1 [44] and cab [33]. Accession numbers are shown in parenthesis. Pseudogene fragments are not included in the phylogenetic analysis, but are included in Additional file 1: Text S1 and Additional file 1: Text S2. The tree is unrooted and some bootstrap values are not shown for clarity

Fig. 4

Phylogeny of deduced TAP2 amino acid sequences from selected species. The evolutionary history was inferred by using the Maximum Likelihood method based on the Le and Gascuel 2008 model [76]. The percentage of trees in which the associated taxa clustered together (100 bootstrap trials) are shown next to the branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Positions with less than 95% site coverage were eliminated. For sequences not originating from the selected haplotypes accession numbers are shown in parenthesis. Gene sequences linked to haplotypes containing the PSMB8F gene variant (Fig. 2) are shown using red font. Sequence references not shown in figure can be found in [31]. The tree is unrooted and some bootstrap values are not shown for clarity

Fig. 5

A visual summary of the complexity in the number of genes involved in MHCI peptide cleavage, peptide transport, peptide loading and editing in Atlantic salmon. Endoplasmic reticulum (ER) lumen and cytosolic compartments are shown. The number of Atlantic salmon genes per human gene ortholog is shown in red font. Like (L) extensions are used when Atlantic salmon sequences exist in duplicates, with one group clustering closer to the human ortholog than the gene-like sequences. Black arrows indicate movement through the compartments while red arrows point to the location of molecules.

Fig. 6

Phylogeny of TAPBP, TAPBPR and TAPBPL amino acid sequences from selected species. The evolutionary history was inferred by using the Maximum Likelihood method based on the Whelan and Goldman model [77]. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The percentage of trees in which the associated taxa clustered together (100 bootstrap trials) are shown next to the branches. All positions with less than 95% site coverage were eliminated. Sequence references are as follows: Atlantic salmon (Salmo salar) TAPBPa_#C NP_001117077.1, TAPBPR NP_001133983.1, TAPBPL1a XP_014069540.1, TAPBPL1b XP_014017660.1, TAPBPL2 XP_014062182.1; Northern pike (Esox lucius) TAPBP XP_010899738.2; Zebrafish (Danio rerio) TAPBP GDQH01003123.1, TAPBPR XP_001919985.2, TAPBPL AAI71514.1; Medaka (Oryzias latipes) TAPBPR XP_011483883.1, TAPBPL XP_004075780.1; Spotted gar (Lepisosteus oculatus) TAPBP GFIM01016833, TAPBPR XP_015193320.1, TAPBPL GFIM01040944.1; Frog (Xenopus laevis) TAPBPL XP_018100952.1; Turtle (Chrysemys picta bellii) TAPBPL XP_005298961.1; Chicken (Gallus gallus) TAPBP1 NP_001029988.2, TAPBPR NP_001026543.1, TAPBPL merged transcripts BU342879.1, BU369515.1 and BX257449.3; Kiwi (Apteryx australis mantelli) TAPBPL XP_013817376.1; Opossum (Monodelphis domestica) TAPBPL XP_007485846.1; Human (Homo sapiens) TAPBPR NP_060479.3, TAPBP NP_003181.3. See Additional file 1: Text S1 and Additional file 1: Text S4 for sequences and amino acid alignment. The tree is unrooted and some bootstrap values are not shown for clarity.

As opposed to rainbow trout, there seems to be a bona fide coho PSMB8 gene in the expected position between the UBA and PSMB13a loci. Thus, the trout and coho data do not support the allelic nature of the PSMB8A and PSMB8F sequence variants proposed by Tsukamoto et al. [34], but rather suggest that they are two different loci residing at different locations within the salmonid MHCIa region. This is further supported by small PSMB8F pseudogene fragments between the TAP2 and BRD2 genes in all Atlantic salmon Ia and Ib haplotypes (Fig. 2, Additional file 1: Figure S1, Additional file 1: Text S1, Additional file 1: Text S2), which indicates that the gene occupied this position in a predecessor of Atlantic salmon, coho salmon and rainbow trout. There are also PSMB8 fragments elsewhere in salmonid genomes, such as between the trout Ib UCA and UEA genes and between the coho Ib region UEA and UDA genes, suggesting this gene has been shuffled around during salmonid evolution. Northern pike, a diploid species basal to the salmonids [41], does not add any clarity to the evolution of these genes, as it contains a PSMB8F pseudogene between a seemingly functional PSMB8 gene and two duplicate MHCI genes. We did, however, find an expressed PSMB8F match for both Northern pike (GATF01015276.1) and Atlantic salmon (ACI66984.1) (Additional file 1: Text S2), suggesting this gene may be functional in some haplotypes.

In zebrafish the PSMB8F gene is located in a unique haplotype on chromosome 19 denoted 19D alongside a single classical MHCIa gene denoted UGA and two highly divergent TAP2 genes [31]. The two other haplotypes A and B described by McConnell et al. [31] contain the PSMB8A variant only in a location different from the PSMB8F gene in the 19D haplotype (Additional file 1; Figure S1). It should be noted that this zebrafish UGA gene is not identical to the non-classical gene denoted UGA in salmonids. Even if the PSMB8F gene has been silenced in most trout, coho and Northern pike haplotypes, there should still be remnants of other gene polymorphisms in these species. We thus contrasted closely linked gene sequence variants in PSMB8F positive regions against those in Atlantic salmon regions without the PSMB8F variant (Fig. 2). Based on data from zebrafish and medaka suggesting that TAPBP, TAP2 and PSMB variants have evolved to serve specific MHCIa alleles [31, 34], we first investigated the phylogenetic relationship amongst the MHCIa gene sequences. As teleosts have alpha 1 domain lineages that cluster into lineages shared between distantly related species [30, 33, 45] this analysis was performed based on individual alpha 1–3 domains.

The alpha 1 domain sequences of the rainbow trout UBA locus, the second pike UBA locus and the zebrafish UGA locus all cluster together alongside other alpha 1 domain sequences from lineage V (Additional file 1: Figure S2) [30, 45]. The two other PSMB8F-containing regions did not show linkage between the MHCIa alpha 1 domain lineage and the PSMB8F gene sequence, as the coho alpha 1 domain sequence clusters alongside lineage I sequences and the pike UBA1 sequence clusters alongside lineage III sequences. Comparing pike with the Atlantic salmon MHCIa region could suggest that the pike UBA1 locus is an ortholog of the Atlantic salmon ULA locus while the pike UBA2 locus is the ortholog of the salmon UBA locus, although the pike UBA1 locus still contains a transmembrane region as opposed to the salmon ULA gene. If so, coho is the only deviation from a functional haplotype including MHCIa gene variants and the PSMB8F gene, based on the alpha 1 domain analysis. However, the sequenced Northern pike animal was not a double haploid, as seen from the two unassembled scaffolds containing five additional U lineage loci and a duplicate PSMB-TAP2 region (Additional file 1: Text S1, scaffolds NW_017859271.1 and NW_017859580.1), which most likely represent an allelic variant. There are currently no data available that enable these pike genes to be defined as classical or non-classical in terms of polymorphic content, peptide binding ability and tissue expression distribution.

We know that the alpha 1 domain in salmonids fluctuates between different alpha 2 and downstream domains [30, 45,46,47,48], so potentially there has been a translocation or recombination of the coho alpha 1 domain, losing the alpha 1 domain V lineage in this haplotype. This is consistent with the clustering of the coho and rainbow trout alpha 2 domains with other lineage I alpha 2 domains (Additional file 1: Figure S3). For the alpha 2 domain, however, zebrafish UGA clusters with lineage II sequences with a convincing bootstrap value, and the pike sequences form a separate clade unlinked to the other PSMB8F-linked haplotype sequences. For the alpha 3 domain, the sequences cluster in a species-specific manner, providing no further clues as to functional haplotypes. Thus, the MHCI gene sequence analyses do not convincingly support functional haplotypes represented by the PSMB8F gene sequence in salmonids.

Zebrafish did not show any polymorphism in the PSMB12 gene sequences, but did show some sequence variation in the PSMB9 gene [31]. In salmonids, the PSMB9 and PSMB12 genes had more species-specific sequence variation than haplotypic variation, and none of the variable sites in the PSMB9 sequences matched the sequence variation seen in zebrafish (Additional file 1: Figure S3).

Of the phylogenetically related zebrafish PSMB7, PSMB10 and PSMB13 genes, only the PSMB13 gene displayed haplotypic sequence variation. The zebrafish PSMB7 and PSMB10 genes were unlinked to any MHC genes, residing on chromosomes 21 and 4 respectively [31]. In salmonids, both the PSMB13 and the PSMB10 genes reside within both paralog MHCI regions, where the PSMB10 gene is located closer to the MHCI Z lineage genes than to the U lineage loci (Fig. 2 and Additional file 1: Figure S4). The Atlantic salmon PSMB7 gene resides on linkage group 1 with a pseudogene copy on linkage group 11 (Additional file 1: Text S1). As opposed to zebrafish, there is no indication of haplotypic variation in any of these PSMB8F-linked genes in any salmonid species. As an example, the rainbow trout Ia_#A haplotype contains a UBA*0501 allele while the trout Ia_#B haplotype contains a new, as yet unnamed allele, but their PSMB13 sequences are identical. The same holds for Atlantic salmon, where the PSMB13 sequences are identical in the Ia_#A and Ia_#C haplotypes containing the UBA*0201 and UBA*0301 alleles respectively.

In zebrafish, the TAP2 alleles linked to the B and D haplotypes are highly divergent [31]. In salmonids the most prominent sequence variation is seen in the Atlantic salmon TAP2a_#C sequence, where there are 11 amino acid differences within the first 31 amino acids of the N-terminal region when compared to the TAP2 variants of the other two MHCIa haplotypes (Fig. 4 and Additional file 1: Text S3). This TAP2 sequence resembles both those in the rainbow trout Ia region and those in the paralog Ib region, but expression support (transcriptome shotgun assembly sequence GBRB01034973.1) suggests it is a true sequence and not a genome assembly artefact. Northern pike, on the other hand, does display considerable variation between the TAP2 gene sequences located in the main Ia region and the NW80 scaffold (Figs. 2 and 4). Sequence identity between the two pike TAP2 gene sequence variants is 87%, which is intermediate in comparison to the 98%, 96%, 71% and 61% amino acid sequence identities observed between the chicken, rat, frog and zebrafish TAP2 sequence variants respectively. Nine residue positions coincide with polymorphic residues in rat or chicken, and many variable positions are located within and surrounding the first transmembrane domain known to interact with tapasin [49].
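The kind of comparison reported here, percent identity between two variants and the number of differences within an N-terminal window, can be sketched in a few lines of Python. This is an illustration only: the article does not state how identity was computed, so ignoring gapped columns is an assumption, and the function names are hypothetical. The inputs are expected to be two rows of an existing alignment with equal length.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Skipping gapped columns is an assumed convention; other conventions
    (e.g. dividing by the full alignment length) give different values.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def differences_in_prefix(a: str, b: str, n: int = 31) -> int:
    """Count mismatching positions within the first n aligned columns."""
    return sum(1 for x, y in zip(a[:n], b[:n]) if x != y)
```

With two aligned TAP2 variants as input, `differences_in_prefix(seq_a, seq_b, 31)` corresponds to the count of differing residues in the first 31 positions discussed above, provided the alignment has no gaps in that window.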

For the salmonid tapasin (TAPBP) gene, there is less sequence information available, as many of the sequenced BACs did not cover this locus (Fig. 2). There are a few amino acid differences between salmon, trout and coho, but no apparent haplotypic variation. Atlantic salmon also has a third TAPBP gene linked to a duplicate UDA locus 7 Mb upstream from the major Ib region, but this TAPBP gene seems to be a pseudogene (Additional file 1: Figure S5).

Based on the above analyses of salmonid MHCI haplotype gene sequences, we did not find evidence for polymorphism in antigen processing genes, with the exception of a TAP2 sequence variant in Atlantic salmon. This Atlantic salmon TAP2a_#C sequence differs in many of the first 31 amino acid residues of the N-terminal region, a variation that could influence binding affinity for TAPBP [49] but that does not match the sequence variation seen in zebrafish, rat or chicken TAP2 sequences (Additional file 1: Text S3). The lack of polymorphism in the selected genes is supported by current and previous sequence analyses using other GenBank resources. With Northern pike displaying a TAP2 polymorphism matching many of the polymorphic residue positions found in other species, the lack of functional haplotypes in salmonids may be unique.

What about other genes influencing MHCI assembly and peptide loading?

A paralog MHCI region with functional gene copies of PSMB8, PSMB9, PSMB12, PSMB13 and TAP2 (Table 2) provides salmonids with an added complexity in comparison with zebrafish and medaka. This complexity of having many duplicate immune genes, which potentially have acquired slightly new functions as suggested by Lien et al. [39], could also relate to other genes involved in antigen degradation, transport and loading. This spurred us to investigate a broader set of genes influencing antigen processing, transport and loading. Calnexin, which binds newly synthesized MHCI prior to assembly with b2m, is encoded by two duplicate loci in Atlantic salmon (Fig. 5, Additional file 1: Figure S6). Both genes display fair expression in immunologically important tissues such as head kidney and spleen as well as in other organs such as brain and ovary (Table 2). This gene duplication has previously been identified in trout [50] and is a remnant of the SGD, displaying high sequence identity between paralogs. Whether one or both copies serve the classical MHCIa molecules remains to be established.

Table 2 Expression levels of Atlantic salmon genes in various tissues

The calnexin MHCI heavy chain complex then associates with beta2-microglobulin (b2m) (Fig. 5). This b2m gene is encoded by 3–10 loci in trout [51, 52] while the Atlantic salmon genome contains 13 loci where at least one seems to be a pseudogene (Additional file 1: Text S1). These 12–13 Atlantic salmon gene sequences cluster into two distinct clades with 82% amino acid identity (data not shown). Due to high sequence identity within each clade, it is difficult to assess the expressed status of each Atlantic salmon locus (Table 2). Many other teleosts also have two clades of b2m gene sequences [53], but here the amino acid sequence identity between clades is mostly lower than between the salmonid clades (Additional file 1: Figure S7). The gene duplications have occurred individually in many species with for instance unique gene duplications in Ostariophysi represented by zebrafish and Mexican tetra and another unique duplication in neoteleosts represented by tilapia, medaka and stickleback. Which molecules these divergent teleost b2m molecules support is unknown, but at least in medaka and stickleback the two b2m sequence variants can only serve the U and Z lineages as the remaining MHCI lineages S, L and P are not present in these species [30]. A more speculative idea would be that both b2m sequence groups bind to the classical MHCIa molecule and as such influence the peptide repertoire as previously reported for mouse MHCI molecules [54].

Once the b2m molecule is associated with the MHCI alpha chain, calnexin is replaced by calreticulin (Fig. 5). In Atlantic salmon there are three CALR genes, but also three calreticulin-like (CALRL) genes (Additional file 1: Figure S6) with amino acid sequence identity between the two groups ranging from 67 to 70% (data not shown). Phylogenetically, the gene sequences here denoted Atlantic salmon CALR1a, CALR1b, CALR1.2 cluster with the human CALR gene sequence and also the previously published catfish sequence denoted CALRL2 [55]. The remaining three Atlantic salmon gene sequences form a separate cluster alongside the two other catfish sequences denoted catfish CALRL and CALR, clustering with a 99% bootstrap value to the other CALR sequences. Sequences belonging to the CALR and CALRL clades also exist in spotted gar, a species that split off from the teleost lineage prior to the teleost whole genome duplication (TGD) [56] (Fig. 1, Additional file 1: Figure S6), suggesting these genes are remnants of the second vertebrate genome duplication (VGD2). The genes represented by the CALRL1 and CALRL2 clades originated before Ostariophysi branched off from the remaining teleosts, having orthologs also in catfish. They are thus a product of the TGD event that occurred approximately 350 million years ago after spotted gar split off from the main teleost lineage [56]. Based on sequence identity, phylogenetic clustering and chromosomal location, the CALR1a and CALR1b genes as well as the CALRL2a and CALRL2b genes originate from the SGD event [39]. Human and chicken CALR3 sequences form an outgroup and have no detectable orthologs in Atlantic salmon.
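The between-group identity range quoted above can be illustrated with a small sketch. This is not the procedure used in the study, which does not state how identity was calculated. The sketch assumes identity is counted over alignment columns where neither sequence has a gap, which is only one of several common definitions. The function names and the toy sequences in the usage note are hypothetical.

```python
from itertools import product


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues between two aligned sequences.

    Assumption: columns where either sequence has a gap are ignored.
    Other definitions divide by alignment length or shortest sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a.upper(), seq_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_range(group_a, group_b):
    """Lowest and highest pairwise identity between two groups of aligned sequences."""
    values = [percent_identity(a, b) for a, b in product(group_a, group_b)]
    return min(values), max(values)
```

For example, `percent_identity("MKT-LV", "MKSALV")` compares five ungapped columns, four of which match, and `identity_range` applied to two lists of aligned sequences (for instance a CALR group and a CALRL group) returns the kind of range reported in the text.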

Both the CALR and the CALR-like molecules display fair sequence identity as well as conservation of residues known to influence human CALR glycan binding, suggesting both groups bind glycan moieties (Additional file 1: Figure S6). Both teleost CALR and CALRL sequence groups also have an ER retention signal (KDEL) and an acidic C-terminal region known to affect ER-retention and recycling [57], further supporting their functional resemblance to human CALR. However, the CALRL genes are generally expressed approximately 10 times higher than the CALR genes (Table 2). CALRL expression is also not restricted to immunologically important tissues, as seen from the high expression levels in heart, liver and ovary for some of these genes. In catfish, both CALR and CALRL genes were found to be induced upon infection [55]. In rainbow trout, calreticulin was reported as a single copy gene [58], but this sequence is an orthologue of the Atlantic salmon gene here denoted CALRL2b. A later functional study of this trout CALRL2b orthologue reported limited response to endoplasmic reticulum stress and also little response to stimulation with the viral mimicry stimulant polyI:C [59]. Future studies are needed to clarify the functional distinction of teleost CALR and CALRL genes.

The next molecule to associate with the MHCI/b2m/CALR complex is ERp57 alias PDIA3 (Fig. 5). As seen for the CALR and CALRL genes, a similar picture emerges for ERp57. Here there are two Atlantic salmon ERp57 genes, but also three additional ERp57-like (ERp57L) genes (Additional file 1: Figure S8). The Atlantic salmon ERp57a/b genes cluster with human and zebrafish ERp57 sequences while the ERp57L sequences form a separate clade. Amino acid sequence identity between the Atlantic salmon ERp57 and ERp57L groups is approximately 60% (data not shown). Spotted gar has both an ERp57 sequence and an ERp57L sequence, suggesting this duplication originates from the VGD2 event (Fig. 1) [56]. Both the ERp57L1 sequence and the ERp57L2a/−L2b sequences have orthologs in zebrafish and are most likely remnants of the TGD event, where the duplicates have again been retained as expressed copies in Atlantic salmon (Table 2). Both the Atlantic salmon ERp57a and -b genes and the two ERp57L2a and -L2b genes originate from the unique SGD event, with amino acid sequence identities of 95 and 92% respectively. An ERp57 ortholog to the gene sequence here defined as Atlantic salmon ERp57a has been described both in rainbow trout [60] and in seabass [61]. This trout gene was induced upon stimulation by a viral mimicry molecule, but also by ER stressors. How the ERp57L genes would respond to similar stimulations remains unknown, but the ERp57a/b genes display a much higher expression level in most tissues compared to the ERp57L genes (Table 2).

Following ERp57 association, the complex then associates with Tapasin (Fig. 5). The TAPBP and TAPBPR gene sequences have previously been described in trout [62] where both genes were shown to respond to viral infection. In addition to the tapasin sequences linked to the MHCI Ia and Ib regions in Atlantic salmon and a single TAPBPR gene, there are also three additional gene sequences with blast match to TAPBPR and TAPBP here denoted TAPBP-like or TAPBPL (Fig. 6, Additional file 1: Text S4). Sequence identity between Atlantic salmon TAPBP, TAPBPR and TAPBPL is similar to the 22% amino acid identity found between the human TAPBP and TAPBPR sequences [63]. Both zebrafish and spotted gar only have one TAPBPL gene suggesting the TAPBPL duplications are unique to salmonids. This previously undefined TAPBPL gene is also present in frogs, turtles, alligators, birds and marsupials, but seems to have been lost in the lineage leading to placental mammals (Fig. 6). The TAPBPL gene duplication thus occurred prior to the split between the tetrapod and the bony fish lineages i.e. is also a remnant of the VGD2 event (Fig. 1). The two TAPBPL1a and TAPBPL1b genes originate from the SGD with an amino acid sequence identity of 88% but these two gene sequences only have 53% identity to the third TAPBPL2 sequence which then seems of a more ancient origin.

Many of the TAPBP and TAPBPR residues known to interact with MHC class I [18, 19, 64] are also conserved in the TAPBPL sequences (Additional file 1: Text S4). The human TAPBP C95 residue known to bind ERp57 [64] is not conserved in any of the other TAPBP, TAPBPR or TAPBPL sequences, questioning the relevance of this residue in other species. However, a unique cysteine in the TAPBPL sequences located 11 amino acids further downstream could have a similar function as the human TAPBP C95 residue, or it could resemble the unique C94 residue in TAPBPR known to interact with UGT1 [20]. The N-linked glycosylation site at N233 known to interact with CALR [65] is preserved in some of the TAPBP, TAPBPR and TAPBPL sequences. The single TAPBP lysine residue in the transmembrane region associating with TAP [2] is not conserved in the TAPBPL sequences, while some TAPBPL sequences display an ER retention signal as found in mammalian TAPBP [66]. Thus, based on conservation of many MHCI-interacting residues, the glycosylation site used to interact with CALR and the ER retention signal, TAPBPL sequences share more structural similarities with TAPBP than with TAPBPR, supporting their name as TAPBPL and not TAPBPRL. Although we can only speculate as to the TAPBPL function, all three Atlantic salmon genes are expressed and their expression profiles resemble that of the TAPBP and TAPBPR genes with some tissue specific patterns (Table 2).

In addition to paralog functional copies of the immunoproteasome components PSMB8–13 (Fig. 2) [37], Atlantic salmon also has duplicate copies of the interferon-inducible PSME1 and PSME2 regulatory subunits. These duplicates originate from the SGD with amino acid sequence identities of 93 and 91% respectively (Additional file 1: Figure S9) and display fair expression in a wide variety of organs (Table 2). As with the majority of paralogs, we do not know how these paralogs influence the peptide repertoire available for MHCI binding. Atlantic salmon also has four genes for the PSME3 subunit (Additional file 1: Figure S9), but at least in humans this subunit is not interferon inducible and thus not a part of the immunoproteasome. Whether this also holds true for Atlantic salmon remains to be established.

Once peptides are transported inside the ER, their N-terminal ends are further trimmed by ERAP1 and ERAP2 molecules [17]. For ERAP1 there is only one gene, while there are two copies of the ERAP2 gene (Fig. 5, Additional file 1: Figure S10). These two ERAP2 genes originate from the SGD with 93% amino acid sequence identity, and both gene copies display low expression levels mainly in immunologically important tissues (Table 2).


As opposed to what has been found in sharks, rats, frogs, chicken, medaka and zebrafish, allelic polymorphism in genes closely linked to the MHCIa gene was not evident in salmonids.

Northern pike, a diploid species basal to the salmonids [41], does display TAP2 polymorphism where many of the variable amino acid residues coincide with variable residue positions found in other species suggesting this polymorphism has been lost en route to salmonids.

It seems odd that there are functional MHCIa haplotypes in zebrafish and medaka but not in salmonids. One explanation for the loss of functional Ia haplotypes is the unique salmonid genome duplication that occurred approximately 94 million years ago [39]. Medaka and zebrafish have not experienced an additional WGD event after the third WGD that occurred in an ancestor to all teleost fish approximately 320 million years ago [56]. WGD provides raw material for evolutionary diversification, but this must be balanced against negative dosage effects, regulatory errors, negative protein-protein interactions and mitotic mistakes. Most WGDs are followed by a rediploidization phase with extensive reorganizations that counter the negative effects of duplication mentioned above. As seen in Atlantic salmon, a burst of transposon-mediated repeat expansions most likely triggered this rediploidization phase, resulting in increased homeologue sequence divergence and large chromosomal rearrangements such as fusions, fissions, deletions and inversions [39] that disrupted the possibility for homeologous pairing during meiosis. Potentially this extensive rearrangement and sequence divergence eradicated the functional haplotypes found in other teleost species, in part including Northern pike. However, it should be noted that this study only includes a few haplotypes, so there may be other haplotypes with polymorphic genes influencing MHCIa peptide processing, transport and loading.

Having paralog MHCI regions, the PSMB, TAP and TAPBP genes in the Atlantic salmon Ia and Ib regions could have evolved to serve MHCI molecules encoded by individual regions where genes within the Ia region serve the UBA molecule while genes within the Ib region serve the UDA molecule. Without functional data, these paralog genes could of course also serve MHCI molecules originating from both regions.

The SGD also provided salmonids with many paralog genes that are retained as extant expressed and presumably functional copies. Lien et al. [39] found that SGD duplicates tended to belong to closely related but still different co-expression clusters suggesting far more instances of neo-functionalization than subfunctionalization. Many Atlantic salmon genes involved in peptide processing, transport and loading exist in multiple expressed copies with sequence identity reflecting which of the three whole genome duplication events they originated from. Atlantic salmon has CALR gene duplicates originating from both the SGD as well as the TGD in addition to a gene duplication that originates from the second vertebrate whole genome duplication (VGD2, Fig. 1). This provides Atlantic salmon with three CALR genes and three CALRL genes. As SGD paralogs tend to acquire novel functions based on transcript expression patterns [39], the CALR and CALRL gene duplicates may provide Atlantic salmon with a variety of CALR-like functions. A similar picture is seen for ERp57 and TAPBP, where gene duplications have provided Atlantic salmon with five expressed ERp57/ERp57L genes and six expressed TAPBP/TAPBPL genes. As spotted gar also contains these CALR/CALRL, ERp57/ERp57L and TAPBP/TAPBPL gene duplications, they originate from the VGD2 event and not from the teleost specific genome duplication event. This is further supported by the fact that a TAPBPL gene is also present in the tetrapod lineage with what seems to be a bona fide gene even in opossum. How this TAPBPL molecule affects MHCI peptide loading or editing is unclear, but it may hold new and intriguing functions as seen for the TAPBPR gene [18, 19, 20].

It is tempting to speculate that there exist alternative peptide-loading complexes (PLCs) in teleosts based on the presence of duplicate CALR/CALRL, ERp57/ERp57L and TAPBP/TAPBPL genes in zebrafish, salmonids and medaka. If this holds true, one may only speculate on which MHC molecules these complexes associate with and how that difference in PLC affects the peptides transported and loaded into the peptide binding groove of these MHCI molecules. Expression data suggest that the CALR, ERp57L and TAPBP/TAPBPL genes are functionally related, as they all display fairly low expression levels in identical tissues. Expression levels of CALRL and ERp57 are much higher and more diversified, suggesting they may have different or additional roles.


We found no evidence pointing to functional polymorphism in TAP, TAPBP, PSMB8, PSMB9, PSMB12 and PSMB13 genes closely linked to classical UBA alleles in salmonids as opposed to what has been found previously in zebrafish, medaka and several other species. The unique salmonid whole genome duplication has most likely disrupted such haplotypes with frequent recombination between chromosomal paralogs prior to their diversification. However, several whole genome duplications have provided salmonids with many duplicated genes involved in peptide generation, loading and editing, most likely broadening their biological function compared to what is found in mammals. A surprise is the functional retention of many genes originating from the second vertebrate whole genome duplication event, providing both spotted gar and teleosts with potential alternative versions of the peptide-loading complex. Future studies are needed to understand the functional relevance of these gene duplications including the alternative PLCs.


Data mining and bioinformatics

MHC class I haplotype gene sequences originate either from genomes available in GenBank or previously sequenced BAC clones as follows: Atlantic salmon (Salmo salar) BACs [37, 38]: Haplotype Ia_#A BACs 868O01 (EF441211), 92I04 (EF427384.1), 539M19 (EF427383) and 129P21 (GQ505858); Haplotype Ia_#B BAC 714P22 (EF210363); Haplotype Ib_#A BAC 8I14 (EF427379); Haplotype Ib_#B BAC 438J08 (FJ969490); Genome assembly GCA_000233375.4 [39] Haplotype Ia_#C Chr.27 NC_027326: 10.000.000–10.656.000; Haplotype Ib_#C genome Chr.14 region NC_027313: 50.800.000–59.500.000. Rainbow trout (Oncorhynchus mykiss) BACs [42] Haplotype Ia_#A BAC AB162342; Haplotype Ib_#A BAC AB162343; Trout genome (GCA_002163495.1) Haplotype Ia_#B Chr.18 (CM007952.1); Haplotype Ib_#B Chr.14 (CM007948.1). Coho salmon (Oncorhynchus kisutch) genome GCA_002021735.1 haplotype Ia region Chr.17 NC_034190: 25.300.000–26.000.000; Haplotype Ib region Chr.14 NC_034187: 22.700.000–23.500.000. Northern pike (Esox lucius) genome (GCA_000721915.3) Haplotype Ia region Chr.10 (NC_025977.3), and presumed allelic haplotype represented by the unplaced genomic scaffolds NW_017859580.1 (NW80) continued with NW_017859271.1 (NW71). Some genomic un-annotated gene sequences as well as unlinked gene sequences were identified using various blastN and TblastN searches of Ensembl and NCBI databases using evolutionarily diverged as well as species-specific sequences. Open reading frames were predicted using FGENESH [67] and verified using available expressed resources. Potential genomic assembly errors could influence the analyses. Some smaller pseudogene remnants that did not contribute to evolutionary understanding were neglected. Expressed matches were identified through TblastN searches against EST, GenBank nucleotide (cDNA) and available TSA/SRA resources.
RPKM values shown in Table 2 were defined using CLC Genomics Workbench 6.0.5 [68] and SRR transcriptome runs from a single individual [39] as follows: HK (head kidney) SRR1422860, gills SRR1422858, spleen SRR1422870, heart SRR1422862, gut SRR1422859, muscle SRR1422866, brain SRR1422856, liver SRR22865, nose SRR1422867, skin SRR1422869, ovary SRR1422871, testis SRR1422872.
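For readers unfamiliar with the unit, the standard definition of RPKM (reads per kilobase of transcript per million mapped reads) can be written as a short sketch. It shows only the normalization and does not reproduce the read mapping or counting settings of the software used here. The function name and the numbers in the usage note are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / (gene length in kb * total mapped reads in millions)
         = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    if total_mapped_reads <= 0:
        raise ValueError("total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

As an illustration with made-up numbers, a 2000 bp transcript with 500 mapped reads in a library of 10 million mapped reads gives `rpkm(500, 2000, 10_000_000)`, that is 500 reads over 2 kb and 10 million reads. Because the value is normalized per library, comparisons between tissues assume comparable library composition.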

Phylogenetic analysis

All amino acid sequence alignments were performed using Clustal X [69]. The phylogenetic trees were inferred using best-fit models calculated by MEGA7 [70] and bootstrapped using 100 replicates.
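The resampling step behind a bootstrap value can be sketched as follows. This is a generic illustration of column resampling with replacement, not the MEGA7 implementation, and the tree inference applied to each replicate is not shown. The function name, the seed parameter and the toy alignment in the usage note are hypothetical.

```python
import random


def bootstrap_alignments(alignment, replicates=100, seed=0):
    """Yield pseudo-replicate alignments by resampling columns with replacement.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    Each replicate has the same number of columns as the input alignment.
    """
    names = list(alignment)
    if not names:
        raise ValueError("alignment is empty")
    length = len(alignment[names[0]])
    if length == 0 or any(len(alignment[n]) != length for n in names):
        raise ValueError("sequences must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield {n: "".join(alignment[n][c] for c in columns) for n in names}
```

A tree is then inferred from each replicate, and the bootstrap value of a clade is the percentage of replicate trees in which that clade appears. With 100 replicates, as used here, a clade recovered in 99 replicate trees has a 99% bootstrap value. For example, `list(bootstrap_alignments({"a": "MKTL", "b": "MKSL"}))` yields 100 resampled toy alignments.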









MHC: Major histocompatibility complex

MYA: Million years ago

PSMB: Proteasome subunit beta

SGD: Salmonid-specific whole genome duplication

TAP: Transport associated protein

TGD: Teleost-specific whole genome duplication

VGD: Vertebrate genome duplication


  1. Neefjes J, Jongsma ML, Paul P, Bakke O. Towards a systems understanding of MHC class I and MHC class II antigen presentation. Nat Rev Immunol. 2011;11(12):823–36.

  2. Sadasivan B, Lehner PJ, Ortmann B, Spies T, Cresswell P. Roles for calreticulin and a novel glycoprotein, tapasin, in the interaction of MHC class I molecules with TAP. Immunity. 1996;5(2):103–14.

  3. Adams J. The proteasome: structure, function, and role in the cell. Cancer Treat Rev. 2003;29(1):3–9.

  4. Ferrington DA, Gregerson DS. Immunoproteasomes: structure, function, and antigen presentation. Prog Mol Biol Transl Sci. 2012;109:75–112.

  5. Cascio P. PA28alphabeta: the enigmatic magic ring of the proteasome? Biomol Ther. 2014;4(2):566–84.

  6. Erath S, Groettrup M. No evidence for immunoproteasomes in chicken lymphoid organs and activated lymphocytes. Immunogenetics. 2015;67(1):51–60.

  7. van Endert PM, Tampe R, Meyer TH, Tisch R, Bach JF, McDevitt HO. A sequential model for peptide binding and transport by the transporters associated with antigen processing. Immunity. 1994;1(6):491–500.

  8. Nijenhuis M, Hammerling GJ. Multiple regions of the transporter associated with antigen processing (TAP) contribute to its peptide binding site. J Immunol. 1996;157(12):5467–77.

  9. Herget M, Baldauf C, Scholz C, Parcej D, Wiesmuller KH, Tampe R, Abele R, Bordignon E. Conformation of peptides bound to the transporter associated with antigen processing (TAP). Proc Natl Acad Sci U S A. 2011;108(4):1349–54.

  10. Peters B, Tong W, Sidney J, Sette A, Weng Z. Examining the independent binding assumption for binding of peptide epitopes to MHC-I molecules. Bioinformatics. 2003;19(14):1765–72.

  11. Uebel S, Kraas W, Kienle S, Wiesmuller KH, Jung G, Tampe R. Recognition principle of the TAP transporter disclosed by combinatorial peptide libraries. Proc Natl Acad Sci U S A. 1997;94(17):8976–81.

  12. van Endert PM, Riganelli D, Greco G, Fleischhauer K, Sidney J, Sette A, Bach JF. The peptide-binding motif for the human transporter associated with antigen processing. J Exp Med. 1995;182(6):1883–95.

  13. Cresswell P, Bangia N, Dick T, Diedrich G. The nature of the MHC class I peptide loading complex. Immunol Rev. 1999;172:21–8.

  14. Peh CA, Burrows SR, Barnden M, Khanna R, Cresswell P, Moss DJ, McCluskey J. HLA-B27-restricted antigen presentation in the absence of tapasin reveals polymorphism in mechanisms of HLA class I peptide loading. Immunity. 1998;8(5):531–42.

  15. Dick TP, Bangia N, Peaper DR, Cresswell P. Disulfide bond isomerization and the assembly of MHC class I-peptide complexes. Immunity. 2002;16(1):87–98.

  16. Kozlov G, Pocanschi CL, Rosenauer A, Bastos-Aristizabal S, Gorelik A, Williams DB, Gehring K. Structural basis of carbohydrate recognition by calreticulin. J Biol Chem. 2010;285(49):38612–20.

  17. Chen H, Li L, Weimershaus M, Evnouchidou I, van Endert P, Bouvier M. ERAP1-ERAP2 dimers trim MHC I-bound precursor peptides; implications for understanding peptide editing. Sci Rep. 2016;6:28902.

  18. Hermann C, Trowsdale J, Boyle LH. TAPBPR: a new player in the MHC class I presentation pathway. Tissue Antigens. 2015;85(3):155–66.

  19. Morozov GI, Zhao H, Mage MG, Boyd LF, Jiang J, Dolan MA, Venna R, Norcross MA, McMurtrey CP, Hildebrand W, et al. Interaction of TAPBPR, a tapasin homolog, with MHC-I molecules promotes peptide editing. Proc Natl Acad Sci U S A. 2016;113(8):E1006–15.

  20. Neerincx A, Hermann C, Antrobus R, van Hateren A, Cao H, Trautwein N, Stevanovic S, Elliott T, Deane JE, Boyle LH. TAPBPR bridges UDP-glucose:glycoprotein glucosyltransferase 1 onto MHC class I to provide quality control in the antigen presentation pathway. elife. 2017;6:e23049.

  21. Kaufman J, Jacob J, Shaw I, Walker B, Milne S, Beck S, Salomonsen J. Gene organisation determines evolution of function in the chicken MHC. Immunol Rev. 1999;167:101–17.

  22. van Hateren A, Carter R, Bailey A, Kontouli N, Williams AP, Kaufman J, Elliott T. A mechanistic basis for the co-evolution of chicken tapasin and major histocompatibility complex class I (MHC I) proteins. J Biol Chem. 2013;288(45):32797–808.

  23. Walker BA, Hunt LG, Sowa AK, Skjodt K, Gobel TW, Lehner PJ, Kaufman J. The dominantly expressed class I molecule of the chicken MHC is explained by coevolution with the polymorphic peptide transporter (TAP) genes. Proc Natl Acad Sci U S A. 2011;108(20):8396–401.

  24. Chappell P, Meziane El K, Harrison M, Magiera L, Hermann C, Mears L, Wrobel AG, Durant C, Nielsen LL, Buus S, et al. Expression levels of MHC class I molecules are inversely correlated with promiscuity of peptide binding. elife. 2015;4:e05345.

  25. Tregaskes CA, Harrison M, Sowa AK, van Hateren A, Hunt LG, Vainio O, Kaufman J. Surface expression, peptide repertoire, and thermostability of chicken class I molecules correlate with peptide transporter specificity. Proc Natl Acad Sci U S A. 2016;113(3):692–7.

  26. Rudolph MG, Stevens J, Speir JA, Trowsdale J, Butcher GW, Joly E, Wilson IA. Crystal structures of two rat MHC class Ia (RT1-a) molecules that are associated differentially with peptide transporter alleles TAP-A and TAP-B. J Mol Biol. 2002;324(5):975–90.

  27. Joly E, Le Rolle AF, Gonzalez AL, Mehling B, Stevens J, Coadwell WJ, Hunig T, Howard JC, Butcher GW. Co-evolution of rat TAP transporters and MHC class I RT1-a molecules. Curr Biol. 1998;8(3):169–72.

  28. Namikawa C, Salter-Cid L, Flajnik MF, Kato Y, Nonaka M, Sasaki M. Isolation of Xenopus LMP-7 homologues. Striking allelic diversity and linkage to MHC. J Immunol. 1995;155(4):1964–71.

  29. Ohta Y, Powis SJ, Lohr RL, Nonaka M, Pasquier LD, Flajnik MF. Two highly divergent ancient allelic lineages of the transporter associated with antigen processing (TAP) gene in Xenopus: further evidence for co-evolution among MHC class I region genes. Eur J Immunol. 2003;33(11):3017–27.

  30. Grimholt U, Tsukamoto K, Azuma T, Leong J, Koop BF, Dijkstra JM. A comprehensive analysis of teleost MHC class I sequences. BMC Evol Biol. 2015;15:32-49.

  31. McConnell SC, Hernandez KM, Wcisel DJ, Kettleborough RN, Stemple DL, Yoder JA, Andrade J, de Jong JL. Alternative haplotypes of antigen processing genes in zebrafish diverged early in vertebrate evolution. Proc Natl Acad Sci U S A. 2016;113(34):E5014–23.

  32. Miura F, Tsukamoto K, Mehta RB, Naruse K, Magtoon W, Nonaka M. Transspecies dimorphic allelic lineages of the proteasome subunit beta-type 8 gene (PSMB8) in the teleost genus Oryzias. Proc Natl Acad Sci U S A. 2010;107(50):21599–604.

  33. Nonaka MI, Nonaka M. Evolutionary analysis of two classical MHC class I loci of the medaka fish, Oryzias Latipes: haplotype-specific genomic diversity, locus-specific polymorphisms, and interlocus homogenization. Immunogenetics. 2010;62(5):319–32.

  34. Tsukamoto K, Miura F, Fujito NT, Yoshizaki G, Nonaka M. Long-lived dichotomous lineages of the proteasome subunit beta type 8 (PSMB8) gene surviving more than 500 million years as alleles or paralogs. Mol Biol Evol. 2012;29(10):3071–9.

  35. Grimholt U, Larsen S, Nordmo R, Midtlyng P, Kjoeglum S, Storset A, Saebo S, Stet RJ. MHC polymorphism and disease resistance in Atlantic salmon (Salmo salar); facing pathogens with single expressed major histocompatibility class I and class II loci. Immunogenetics. 2003;55(4):210–9.

  36. Macqueen DJ, Johnston IA. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proc R Soc B. 2013;281:20132881.

  37. Lukacs MF, Harstad H, Bakke HG, Beetz-Sargent M, McKinnel L, Lubieniecki KP, Koop BF, Grimholt U. Comprehensive analysis of MHC class I genes from the U-, S-, and Z-lineages in Atlantic salmon. BMC Genomics. 2010;11:154.

  38. Lukacs MF, Harstad H, Grimholt U, Beetz-Sargent M, Cooper GA, Reid L, Bakke HG, Phillips RB, Miller KM, Davidson WS, et al. Genomic organization of duplicated major histocompatibility complex class I regions in Atlantic salmon (Salmo salar). BMC Genomics. 2007;8:251.

  39. Lien S, Koop BF, Sandve SR, Miller JR, Kent MP, Nome T, Hvidsten TR, Leong JS, Minkley DR, Zimin A, et al. The Atlantic salmon genome provides insights into rediploidization. Nature. 2016;533(7602):200–5.

  40. Berthelot C, Brunet F, Chalopin D, Juanchich A, Bernard M, Noel B, Bento P, Da Silva C, Labadie K, Alberti A, et al. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nat Comm. 2014;5:3657.

  41. Rondeau EB, Minkley DR, Leong JS, Messmer AM, Jantzen JR, von Schalburg KR, Lemon C, Bird NH, Koop BF. The genome and linkage map of the northern pike (Esox lucius): conserved synteny revealed between the salmonid sister group and the Neoteleostei. PLoS One. 2014;9(7):e102089.

  42. Shiina T, Dijkstra JM, Shimizu S, Watanabe A, Yanagiya K, Kiryu I, Fujiwara A, Nishida-Umehara C, Kaba Y, Hirono I, et al. Interchromosomal duplication of major histocompatibility complex class I regions in rainbow trout (Oncorhynchus mykiss), a species with a presumably recent tetraploid ancestry. Immunogenetics. 2005;56(12):878–93.

  43. Matsuo M, Asakawa S, Shimizu N, Kimura H, Nonaka M. Nucleotide sequence of the MHC class I genomic region of a teleost, the medaka (Oryzias latipes). Immunogenetics. 2002;53:930–40.

  44. Tsukamoto K, Hayashi S, Matsuo M, Nonaka M, Kondo M, Shima MI, Asakawa S, Shimizu N, Nonaka M. Unprecedented intraspecific diversity of the MHC class I region of a teleost medaka, Oryzias latipes. Immunogenetics. 2005;57:420–31.

  45. Kiryu I, Dijkstra JM, Sarder RI, Fujiwara A, Yoshiura Y, Ototake M. New MHC class Ia Domain lineages in rainbow trout (Oncorhynchus mykiss) which are shared with other fish species. Fish Shellfish Immunol. 2005;18:243–54.

  46. Hansen JD, Strassburger P, Du PL. Conservation of an alpha 2 domain within the teleostean world, MHC class I from the rainbow trout Oncorhynchus mykiss. Dev Comp Immunol. 1996;20(6):417–25.

  47. Nonaka MI, Aizawa K, Mitani H, Bannai HP, Nonaka M. Retained orthologous relationships of the MHC class I genes during euteleost evolution. Mol Biol Evol. 2011;28(11):3099–112.

  48. Shum BP, Guethlein L, Flodin LR, Adkison MA, Hedrick RP, Nehring RB, Stet RJ, Secombes C, Parham P. Modes of salmonid MHC class I and II evolution differ from the primate paradigm. J Immunol. 2001;166(5):3297–308.

  49. Koch J, Guntrum R, Tampe R. The first N-terminal transmembrane helix of each subunit of the antigenic peptide transporter TAP is essential for independent tapasin binding. FEBS Lett. 2006;580(17):4091–6.

  50. Sever L, Vo NT, Bols NC, Dixon B. Rainbow trout (Oncorhynchus mykiss) contain two calnexin genes which encode distinct proteins. Dev Comp Immunol. 2014;42(2):211–9.

  51. Magor KE, Shum BP, Parham P. The beta 2-microglobulin locus of rainbow trout (Oncorhynchus mykiss) contains three polymorphic genes. J Immunol. 2004;172(6):3635–43.

  52. Shum BP, Azumi K, Zhang S, Kehrer SR, Raison RL, Detrich HW, Parham P. Unexpected beta2-microglobulin sequence diversity in individual rainbow trout. Proc Natl Acad Sci U S A. 1996;93(7):2779–84.

  53. Kondo H, Darawiroj D, Gung YT, Yasuike M, Hirono I, Aoki T. Identification of two distinct types of beta-2 microglobulin in marine fish, Pagrus major and Seriola quinqueradiata. Vet Immunol Immunopathol. 2010;134(3–4):284–8.

  54. Perarnau B, Siegrist CA, Gillet A, Vincent C, Kimura S, Lemonnier FA. Beta 2-microglobulin restriction of antigen presentation. Nature. 1990;346(6286):751–4.

  55. Liu H, Peatman E, Wang W, Abernathy J, Liu S, Kucuktas H, Lu J, Xu DH, Klesius P, Waldbieser G, et al. Molecular responses of calreticulin genes to iron overload and bacterial challenge in channel catfish (Ictalurus punctatus). Dev Comp Immunol. 2011;35(3):267–72.

  56. Braasch I, Gehrke AR, Smith JJ, Kawasaki K, Manousaki T, Pasquier J, Amores A, Desvignes T, Batzel P, Catchen J, et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat Genet. 2016;48(4):427–37.

  57. Raghavan M, Wijeyesakere SJ, Peters LR, Del Cid N. Calreticulin in the immune system: ins and outs. Trends Immunol. 2013;34(1):13–21.

  58. Kales S, Fujiki K, Dixon B. Molecular cloning and characterization of calreticulin from rainbow trout (Oncorhynchus mykiss). Immunogenetics. 2004;55(10):717–23.

  59. Kales SC, Bols NC, Dixon B. Calreticulin in rainbow trout: a limited response to endoplasmic reticulum (ER) stress. Comp Biochem Physiol B Biochem Mol Biol. 2007;147(4):607–15.

  60. Sever L, Bols NC, Dixon B. The cloning and inducible expression of the rainbow trout ERp57 gene. Fish Shellfish Immunol. 2013;34(2):410–9.

  61. Pinto RD, Moreira AR, Pereira PJ, dos Santos NM. Two thioredoxin-superfamily members from sea bass (Dicentrarchus labrax, L.): characterization of PDI (PDIA1) and ERp57 (PDIA3). Fish Shellfish Immunol. 2013;35(4):1163–75.

  62. Landis ED, Palti Y, Dekoning J, Drew R, Phillips RB, Hansen JD. Identification and regulatory analysis of rainbow trout tapasin and tapasin-related genes. Immunogenetics. 2006;58(1):56–69.

  63. Teng MS, Stephens R, Du Pasquier L, Freeman T, Lindquist JA, Trowsdale J. A human TAPBP (TAPASIN)-related gene, TAPBP-R. Eur J Immunol. 2002;32(4):1059–68.

  64. Dong G, Wearsch PA, Peaper DR, Cresswell P, Reinisch KM. Insights into MHC class I peptide loading from the structure of the tapasin-ERp57 thiol oxidoreductase heterodimer. Immunity. 2009;30(1):21–32.

  65. Rizvi SM, Del Cid N, Lybarger L, Raghavan M. Distinct functions for the glycans of tapasin and heavy chains in the assembly of MHC class I molecules. J Immunol. 2011;186(4):2309–20.

  66. Paulsson KM, Jevon M, Wang JW, Li S, Wang P. The double lysine motif of tapasin is a retrieval signal for retention of unstable MHC class I molecules in the endoplasmic reticulum. J Immunol. 2006;176(12):7482–8.

  67. Salamov AA, Solovyev VV. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 2000;10(4):516–22.

  68. CLC Genomics Workbench 6.

  69. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23(21):2947–8.

  70. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–4.

  71. Benton MJ, Donoghue PC. Paleontological evidence to date the tree of life. Mol Biol Evol. 2007;24(1):26–53.

  72. Campbell MA, Lopez JA, Sado T, Miya M. Pike and salmon as sister taxa: detailed intraclade resolution and divergence time estimation of Esociformes + Salmoniformes based on whole mitochondrial genome sequences. Gene. 2013;530(1):57–65.

  73. Near TJ, Eytan RI, Dornburg A, Kuhn KL, Moore JA, Davis MP, Wainwright PC, Friedman M, Smith WL. Resolution of ray-finned fish phylogeny and timing of diversification. Proc Natl Acad Sci U S A. 2012;109(34):13698–703.

  74. Crete-Lafreniere A, Weir LK, Bernatchez L. Framing the Salmonidae Family phylogenetic portrait: a more complete picture from increased taxon sampling. PLoS One. 2012;7(10):e46662.

  75. Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. CABIOS. 1992;8:275–82.

  76. Le SQ, Gascuel O. An improved general amino acid replacement matrix. Mol Biol Evol. 2008;25(7):1307–20.

  77. Whelan S, Goldman N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001;18(5):691–9.

  78. Nonaka M, Yamada-Namikawa C, Flajnik MF, Du Pasquier L. Trans-species polymorphism of the major histocompatibility complex-encoded proteasome subunit LMP7 in an amphibian genus, Xenopus. Immunogenetics. 2000;51(3):186–92.


Acknowledgements

Not applicable.


Funding

This study was funded by the Norwegian Veterinary Institute, which had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

Availability of data and materials

All data supporting the conclusions of this article are referred to or included within the article and its additional files. The datasets used and/or analysed during the current study are available from the corresponding author on request.

Author information

Authors and Affiliations



UG was the only scientist involved in data gathering, analyses and manuscript drafting. The author read and approved the final manuscript.

Corresponding author

Correspondence to Unni Grimholt.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The author declares that she has no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Figure S1. Teleost MHCI haplotypes. Text S1. Deduced amino acid gene sequences. Text S2. Alignment of deduced PSMB8 amino acid sequences. Figure S2. MHCI data. Figure S3. PSMB9 and PSMB12 data. Figure S4. PSMB7, PSMB10 and PSMB13 data. Text S3. Alignment of deduced TAP2 amino acid sequences. Figure S5. TAPBP data. Figure S6. CANX, CALR and CALRL data. Figure S7. B2m data. Figure S8. ERp57 and ERp57L data. Text S4. TAPBP, TAPBPR and TAPBPL data. Figure S9. PSME data. Figure S10. ERAP data. (PDF 4922 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Grimholt, U. Whole genome duplications have provided teleosts with many roads to peptide loaded MHC class I molecules. BMC Evol Biol 18, 25 (2018).
