Schematic representations of the PPE38 gene region in the H37 reference strain published sequences. The PPE38 regions from the published H37Rv (2a) and H37Ra (2b) sequences are shown. Colour coding is as follows: PPE38 pale blue, PPE71 dark blue, MRA_2374 pale green, MRA_2375 dark green. Locations of the PPE38F/R and PPE38 IntF/R primers are shown.

2a. H37Rv ATCC reference strain (published whole genome sequence). The published H37Rv sequence represents the RvD7 genotype. Recombination between PPE38 and PPE71 results in a single PPE38/71 gene (Rv2352c) and loss of the two esx-like genes MRA_2374 and MRA_2375. The PPE38F/R primers (black arrows) are predicted to produce an amplicon of 1335 bp from the RvD7 genotype. It is impossible to determine which of the PPE38 and PPE71 genes has been deleted, hence the mixture of colours used. The published H37Rv sequence is not representative of the H37Rv ATCC reference strain, most clinical isolates, or the H37Ra whole genome sequence. This genotype is also seen in strains SAWC 2240 (CAS, F20), SAWC 1748 (Pre-Haarlem, F24), SAWC 1595 (Quebec/S), SAWC 1841 (Haarlem, F4), CPHL_A (WA-1, M. africanum), T17 (PGG1, EAI), EAS054 (PGG1, EAI), strain C (LCC, "3 bander") and Haarlem (PGG2, F4) [see additional file 1].

2b. H37Rv ATCC reference strain (actual) and H37Ra (published whole genome sequence). This represents the ancestral MTBC genotype, which is also seen in M. canettii. It contains the two identical genes PPE38 (MRA_2373) and PPE71 (MRA_2376), separated by the two esx-like genes MRA_2374 and MRA_2375. Gene annotations are as reported for the published H37Ra sequence. Locations of primers used for PCR and sequence analysis are indicated (black arrows). This is also the true genotype of the ATCC reference strain H37Rv.