Genetic immune escape landscape in primary and metastatic cancer

feature-image

Play all audios:

Loading...

ABSTRACT Studies have characterized the immune escape landscape across primary tumors. However, whether late-stage metastatic tumors present differences in genetic immune escape (GIE)


prevalence and dynamics remains unclear. We performed a pan-cancer characterization of GIE prevalence across six immune escape pathways in 6,319 uniformly processed tumor samples. To address


the complexity of the HLA-I locus in the germline and in tumors, we developed LILAC, an open-source integrative framework. One in four tumors harbors GIE alterations, with high mechanistic


and frequency variability across cancer types. GIE prevalence is generally consistent between primary and metastatic tumors. We reveal that GIE alterations are selected for in tumor


evolution and focal loss of heterozygosity of _HLA-I_ tends to eliminate the HLA allele, presenting the largest neoepitope repertoire. Finally, high mutational burden tumors showed a


tendency toward focal loss of heterozygosity of _HLA-I_ as the immune evasion mechanism, whereas, in hypermutated tumors, other immune evasion strategies prevail. SIMILAR CONTENT BEING


VIEWED BY OTHERS IMMUNOGENOMIC PAN-CANCER LANDSCAPE REVEALS IMMUNE ESCAPE MECHANISMS AND IMMUNOEDITING HISTORIES Article Open access 03 August 2021 FREQUENCY-DEPENDENT SELECTION OF


NEOANTIGENS FOSTERS TUMOR IMMUNE ESCAPE AND PREDICTS IMMUNOTHERAPY RESPONSE Article Open access 25 June 2024 PERSISTENT MUTATION BURDEN DRIVES SUSTAINED ANTI-TUMOR IMMUNE RESPONSES Article


Open access 26 January 2023 MAIN Cancer immune escape is the process whereby tumor cells prevent their elimination by the immune system1,2. Tumors acquire this capacity as a response to the


accumulation of tumor-specific alterations, which may be presented—in the form of neoepitopes—by the major histocompatibility complex class I (MHC-I). Escape from immune system recognition


often involves tumor-specific genomic alterations in immune-related pathways, a process named genetic immune escape (GIE). GIE alterations operate through different mechanisms, including


partial or complete abrogation of neoepitope presentation3 or suppression of proapoptotic signals from the surrounding immune cells4. Therefore, identification of GIE events across human


cancers is key to understanding the interplay between cancer cells and the immune system, as well as to enable effective precision medicine based on immunotherapy. Previous studies have


performed cancer type-specific molecular profiling of GIE events and their phenotypic implications in several cancer types, including non-small-cell lung cancer5,6 (NSCLC) and colorectal


carcinoma7, among others8,9. Others have performed an extensive analysis of loss of heterozygosity (LOH) of _HLA-I_ across thousands of tumor samples10. However, a pan-cancer analysis of the


prevalence and impact of diverse GIE events is currently lacking. In addition, the focus of these studies was to portray GIE in early stage primary tumors, whereas the changes induced by


exposure to treatment and by the metastatic bottleneck have not been comprehensively addressed. One of the main challenges to perform such analyses lies in the extraordinary diversity of the


_HLA-I_ locus, with >15,000 different sequences of the _HLA-A_, _HLA-B_ and _HLA-C_ genes reported to date11. This extensive polymorphism hampers the identification of tumor-specific


somatic alterations, prompting the development of tools that specifically identify LOH of _HLA-I_ (ref. 12) or _HLA-I_ somatic mutations13 from whole-exome sequencing (WES) and whole-genome


sequencing (WGS) data. However, none of these tools provides an integrative characterization of the _HLA-I_ tumor status in both the germline and the tumor, which includes _HLA-I_ typing,


allelic imbalance, LOH of _HLA-I_ and somatic mutation annotation. In the present study, we present a pan-cancer landscape of the GIE prevalence in primary (represented by the PCAWG


(pan-cancer analysis of whole genomes) cohort) and unmatched metastatic patients (represented by the Hartwig cohort). Furthermore, to address the complexity of the _HLA-I_ locus, we


developed LILAC, an open-source integrative framework that characterizes the _HLA-I_ locus, including its tumor status from WGS data. We applied LILAC and a universal tumor-processing


pipeline to establish a comprehensive portrait of GIE events and their positive selection landscape across six different pathways associated with an immune evasion phenotype: the _HLA-I_


locus, the antigen presentation machinery, interferon (IFN)-γ signaling pathway, the programmed cell death ligand 1 (PD-L1) immune checkpoint, the costimulatory signaling by the CD58


receptor and epigenetic immune escape driven by _SETDB1_ (Fig. 1a and Supplementary Table 1). We also studied how the tumor mutational burden (TMB) and other genomic and environmental


features influence the prevalence of GIE alterations, providing insights into tumorigenesis and its interplay with the immune system. RESULTS INFERENCE OF _HLA-I_ TUMOR STATUS WITH LILAC


Inference of the correct _HLA-I_ tumor status is fundamental to identifying GIE alterations (Fig. 1a), to estimate the neoepitope repertoire and burden and to predict the response to immune


checkpoint inhibitors14,15 (ICIs). We have developed LILAC, a framework that performs _HLA-I_ typing for the germline of each patient, as well as determining the status of each of those


alleles in the tumor using WGS data on tumor-normal paired samples as input. LILAC also allows for detection of novel human leukocyte antigen (HLA) alleles and provides allele-specific and


sample-level, quality control measurements (Fig. 1b and Supplementary Note 1). We first assessed LILAC’s _HLA-I_ typing robustness by independently calculating the germline and tumor _HLA-I_


two-field-calling agreement across 6,279 patients, including 4,439 patients from the Hartwig16 dataset and 1,839 from the PCAWG17 cohort. LILAC showed the highest agreement compared with


two state-of-the-art _HLA_ typing tools, Polysolver13 and xHLA18 (Fig. 1c, Extended Data Fig. 1a and Supplementary Data 1). The Hartwig dataset showed higher normal-tumor agreement for all


tools, possibly due to the higher sequencing coverage and read quality of this dataset. In a three-way comparison, LILAC also displayed the highest overlap with the predictions from the


other tools across both datasets (Extended Data Fig. 1b,c). Moreover, LILAC’s _HLA-I_ typing performance on three family trios with diverse genetic ancestries showed a perfect agreement with


previously reported _HLA-I_ types (Fig. 1d). Next, we demonstrated WES applicability by running LILAC on the TRACERx100 lung cohort, where it showed a 98.16% agreement with the _HLA-I_


types originally reported in the publication12 (Fig. 1e). Finally, we evaluated LILAC _HLA-I_ typing sensitivity in a set of 95 samples with challenging _HLA-I_ types—including 10 from tumor


biopsies—with an independent orthogonal and clinically validated _HLA-I_ typing approach (Supplementary Note 1). LILAC showed a perfect 100% two-field agreement across the 564 alleles,


higher than Polysolver (93.09%) and xHLA (98.94%) agreements (Fig. 1f and Supplementary Data 1). To conclude, LILAC reported nine somatic mutations in seven of the tumor biopsies evaluated.


All of them were perfectly matched by the orthogonal approach (Supplementary Data 1). HLA allele-specific, tumor copy number (CN) determination is key to identify LOH of _HLA-I_ genes in


tumors, a well-established mechanism of immune evasion10,12. LILAC annotates allele-specific ploidy levels of each _HLA-I_ allele based on the purity-corrected local tumor CN estimations and


the number of fragments assigned to each allele (Supplementary Note 1). WGS data provide adequate resolution to annotate purity-adjusted minor and major allele ploidy in the HLA-I locus


(Extended Data Fig. 1d,e). Moreover, we quantified LILAC’s agreement with LOHHLA12 in the TRACERx100 lung WES cohort. LILAC and LOHHLA estimates displayed a global 90% agreement (Fig. 1g and


Supplementary Data 1). Importantly, high tumor purity samples showed considerably better concordance than low-purity samples (96.08% in samples with tumor purity ≥0.3, 75.51% when tumor


purity <0.3), reflecting increased challenges for genome-wide CN loss calling in low-purity WES samples. Finally, the three tumor samples harboring LOH of _HLA-I_, according to our


framework and evaluated by the orthogonal approach, displayed a strong allelic imbalance in the experimental validation (Supplementary Data 1). GIE PREVALENCE ACROSS CANCER TYPES We then


combined LILAC with the Hartwig tumor analytical cancer WGS pipeline16,19 to annotate GIE events across 6 pathways strongly associated with immune escape (Fig. 1a and Supplementary Table 1)


across 6,319 uniformly processed WGS samples20, including 1,880 primary patients from PCAWG and 4,439 patients with metastases from Hartwig (Fig. 2a, Extended Data Fig. 2a, Supplementary


Table 2 and Supplementary Note 1). In total, these patients were classified into 58 cancer types, which included 30 tumor types with sufficiently high representativeness (that is, number of


patients ≥15) in the metastatic cohort, 27 in the primary dataset and 20 cancer types with sufficient representation in both datasets (Fig. 2b, Extended Data Fig. 2b,c and Supplementary


Table 2). GIE prevalence showed high mechanistic and frequency variability across primary and metastatic cancer types (Fig. 2c,d, top panels and Supplementary Data 2). The median proportion


of patients harboring GIE alterations per cancer type was 0.27 for the metastatic cohort and 0.20 for primary tumors, both showing highly dispersed distributions (±0.15 s.d. and ±0.19 s.d.


in metastatic and primary tumors, respectively). In certain cancer types, such as pancreatic neuroendocrine (PANET, metastatic), diffuse large B-cell lymphoma (DLBCL, metastatic) and kidney


chromophobe cancer (KICH, primary), GIE was present in >50% of patient samples (65%, 55% and 74%, respectively) whereas in others, such as lung neuroendocrine (LUNET, metastatic), GIE was


an extremely rare event. Overall, one in four patients (26% in metastatic and 24% in primary) presented GIE alterations based on the six investigated pathways (Fig. 2c,d, bottom panels).


The most frequent GIE alteration was partial loss of the _HLA_-_I_ locus (including both LOH of _HLA-I_ and homozygous deletions of _HLA-I_ genes that were grouped as LOH of _HLA-I_ for


simplicity), which was present in 783 (18%) of metastatic and 319 (17%) of primary cancer patients, followed by IFN-γ inactivation (4% in metastatic and 3% in primary) and alterations in the


antigen presentation pathway (4% in metastatic and 3% in primary). CD58 inactivation was the least frequent immune escape event present in only 16 metastatic and 8 primary patients. The


high GIE rates of KICH and PANET were exclusively due to LOH of _HLA-I_ (Fig. 2e), whereas other cancer types displayed a wider range of GIE mechanisms (Fig. 2f). Of note, we did not observe


a significant mutual exclusivity between LOH of _HLA-I_ and other GIE events in cancer types with sufficient representation of multiple GIE mechanisms (Supplementary Data 2). This suggests


that certain tumors may require complementary GIE alterations, such as concurrent alterations that disrupt _HLA-I_-mediated neoepitope presentation and _CD58_ loss21, to effectively escape


immune surveillance. HIGH AGREEMENT BETWEEN PRIMARY AND METASTATIC GIE RATES We next sought to investigate whether there was a GIE prevalence difference between early stage primary and


late-stage metastatic tumors. Comparison by tumor type across the 20 cancer types with sufficient representation showed a broad agreement between both stages (Fig. 3a). Although nine cancer


types showed a certain degree of metastatic enrichment (log2(odds ratio) (log2(OR)) > 0.5; Fig. 3a,b), only in prostate carcinoma (PRAD) and thyroid cancer (THCA) was this difference


statistically significant (Fisher’s exact test corrected _P_ < 0.01). The significant enrichment in these two cancer types might be connected to the substantial genome transformation at


the metastatic transition20. Breaking down pathway-specific differences revealed that THCA metastatic enrichment is the result of increased LOH of _HLA-I_ incidence, whereas the


discrepancies in PRAD are the result of a widespread enrichment across several pathways (Fig. 3b). In general, LOH of _HLA-I_ showed a nonsignificant trend toward metastatic enrichment


across seven of the nine metastatic-enriched cancer types. None of the cancer types showed a significantly higher GIE incidence in primary tumors. POSITIVE SELECTION OF _HLA-I_ ALTERATIONS


We next examined to what extent somatic alterations in _HLA-I_ genes (that is, _HLA-A_, _HLA-B_ and _HLA-C_) were positively selected during tumorigenesis. First, a pan-cancer-grouped


_HLA-I_ analysis revealed a nonsynonymous:synonymous substitution (dN:dS) ratio >1 for nonsense, splice site and truncating variants in both the metastatic and the primary datasets (Fig.


4a), indicating that these genes are subject to positive selection. Next, pan-cancer and gene-specific dN:dS ratios showed that _HLA-A_ and _HLA-B_, but not _HLC-C_, are positively selected


and are mostly enriched in truncating variants but not in missense mutations (Fig. 4b,c). Finally, gene and cancer type-specific analysis showed that _HLA-A_ and _HLA-B_ were deemed as


drivers across several cancer types, including metastatic colorectal, NSCLC and DLBCL as well as the pan-cancer cohorts (Fig. 4d,e and Supplementary Data 3). Somatic point mutations and


small indels (insertions and deletions) of _HLA-I_ genes were evenly distributed along their sequences (Fig. 4f and Extended Data Fig. 3a). The main exception was the recurrent _HLA-A_


Lys210 frameshift indel (chromosome 6 at position 29911899), which was observed in six mismatch repair-deficient (MMRd) metastatic tumors. This genomic region overlaps with a (C)7


homopolymer repeat, which probably explains its susceptibility for the observed base indel. No enrichment for mutations in amino acids involved in the peptide binding was observed. Such


uniform distribution was in agreement with previous observations22 and with the expected profile in tumor-suppressor genes dominated by inactivating variants23. LOH of _HLA-I_ trims the


repertoire of _HLA-I_-presented epitopes in _HLA-I_ heterozygous individuals. Therefore, to further shed light on the tumorigenic role of LOH of _HLA-I_, we developed a randomization


strategy that pinpoints cancer types where the LOH of _HLA-I_ rates were significantly higher than the expected, given their background LOH rates using three genomic resolutions (that is,


nonfocal LOH including all LOH events spanning >75% of the chromosome arm length, focal LOH for those events <75% of the chromosome arm and highly focal LOH for LOH events <3 Mb).


In spite of the global correlation with background genome-wide LOH rates (Extended Data Fig. 3b), our analyses revealed higher-than-expected rates of LOH of _HLA-I_ across several cancer


types in both the metastatic and the primary datasets (G-test goodness of fit _q_ value <0.1; Fig. 4d,e and Supplementary Data 3). PANET (Fig. 4g) and KICH (Extended Data Fig. 3c) showed


nonfocal LOH of _HLA-I_ enrichment. Others, such as metastatic cervix carcinoma (Fig. 4h), metastatic colorectal cancer (Fig. 4i) or primary DLBCL, showed focal or highly focal LOH of


_HLA-I_ patterns. Furthermore, 33 patients with nonsynonymous mutations of _HLA-I_ genes (20% of the total 159 patients with mutations in _HLA-I_ genes) displayed the concurrent loss of the


alternative allele by LOH, potentially leading to complete inactivation of the _HLA_ gene. Finally, we did not observe any biallelic deletion of the entire _HLA-I_ locus (Supplementary Data


3), suggesting that homozygous deletions within the _HLA-I_ might be constrained by purifying selection, featuring the importance of expressing a minimal amount of _HLA-I_ molecules to avoid


immune-alerter signals24. DIFFERENCES BETWEEN FOCAL AND NONFOCAL LOH OF HLA-I Our results suggest that LOH of _HLA-I_ is a positively selected genomic event in certain tumor types. However,


it remains unclear whether these losses target a specific allele and whether both focal and nonfocal LOH of HLA events display similar selective patterns. To address these questions, we


assessed whether LOH of _HLA-I_ tends to involve the allele(s) with the highest neoepitope ratio (that is, higher number of predicted neoepitopes compared with the alternative allele; Fig.


5a). We observed a positive association between the neoepitope ratio and the frequency of the allele with highest neoepitope repertoire to be lost in both the metastatic and the primary


cohorts (Fig. 5b,c). This trend was significantly different from a neutral scenario where both alleles are equally likely to be lost independently of their neoepitope repertoire


(Kolmogorov–Smirnov test metastatic _P_ = 2.47 × 10−5 and primary _P_ = 2.24 × 10−7). Remarkably, the association between neoepitope ratio and the loss frequency became stronger when


selecting for focal LOH of _HLA-I_ events (Fig. 5d,e; _P_ = 1.71 × 10−9 and _P_ = 1.17 × 10−10 for metastatic and primary, respectively). However, it was indistinguishable from a neutral


scenario for nonfocal LOH of _HLA-I_ (Fig. 5f,g; _P_ = 0.32 and _P_ = 0.99 for metastatic and primary, respectively), showing that nonfocal LOH of _HLA-I_ does not select for the allele with


the highest neoepitope repertoire and that its high recurrency in several cancer types may be associated with other selective forces operating on chromosome 6. Furthermore, the majority of


focal LOH of _HLA-I_ events were CN neutral (81% in metastatic tumors and 70% in primary), which was considerably higher than for nonfocal events (65% in metastatic and 35% in primary),


providing further support for the notion that the loss of neoepitope repertoire, and not gene dosage, is the main driving force behind focal LOH of _HLA-I_. POSITIVE SELECTION OF GIE


ALTERATIONS BEYOND _HLA-I_ Alterations in other pathways beyond the HLA-I locus may also lead to immune escape. Hence, we explored signals of positive selection across 18 genes associated


with 5 immune escape pathways (pathways 2–6 in Fig. 1a). Grouped pan-cancer analysis of the dN:dS ratio in these pathways (covering a total of 16 genes, excluding those with an oncogenic


mechanism based on CN amplification; Methods) revealed a >1 ratio for nonsense, splice site and truncating variants in both the metastatic and the primary datasets (Fig. 6a), which was


indicative of positive selection. Refining the analysis for specific genes and cancer types revealed that two genes from the antigen presentation pathway (that is, _B2M_ and _CALR_)


displayed recurrent patterns of inactivating mutations and focal biallelic deletions across several tumor types, as well as in the pan-cancer cohorts (Fig. 6b–d). Moreover,


higher-than-expected frequencies of focal biallelic deletions for several IFN-γ pathway genes, including _JAK1_, _JAK2_ and _IRF2_, were also observed. _CD58_ also harbored a


higher-than-expected number of nonsynonymous mutations and homozygous deletions in DLBCL and the pan-cancer primary cohort. Finally, the chromatin modifier _SETDB1_ was recurrently focally


amplified in multiple cancer types, including metastatic NSCLC (Fig. 6e) and primary breast cancer. Full results are available in Supplementary Data 3. GIE ASSOCIATION WITH CANCER GENOMIC


FEATURES We next investigated whether, aside from cancer-type intrinsic differences, there were other cancer genomic and environmental features associated with GIE prevalence. Thus, we


performed a cancer type-specific univariate logistic regression of 99 tumor genomic features and 366 driver genes against the presence of GIE events (excluding nonfocal LOH of _HLA-I_)


across 38 cancer types (Supplementary Note 3). Moreover, to control for associations that may be secondary to increased mutation and CN variant (CNV) background rates, we filtered out


significant associations that were found in our GIE simulations (Supplementary Note 3). Overall, 35 genomic features and 5 driver genes showed a statistically significant association with


GIE in at least one cancer type (Fig. 7a and Extended Data Fig. 4a). Even after controlling for background mutation rates, TMB and patient’s neoepitope load were strongly associated with GIE


events in DLBCL, pancreas carcinoma and skin melanoma (_q_ value < 0.05, log2(OR) > 0.0 and simulated GIE prevalence ≤2%; Fig. 7a and Extended Data Fig. 4a–c). It is interesting that


clonal TMB and clonal neoepitope load showed a strong positive association with GIE, whereas subclonal TMB and neoepitope load showed a modest correlation (Fig. 7a–c), highlighting the


relevance of mutation cellularity in triggering immune responses. Finally, fusion-derived neoepitopes were significantly associated with GIE in DLBCL and NSCLC (Fig. 7a), which emphasizes


the importance of considering noncanonical sources of neoepitopes beyond small nonsynonymous variants in coding regions. Exposure to certain endogenous and exogenous mutational processes


have been correlated with increased immunogenicity25 and response to ICIs26,27. After controlling for molecular age and excluding non-GIE exclusive associations (that is, associations also


observed in the GIE simulations) several mutational processes showed significant association with GIE incidence (Fig. 7a and Extended Data Fig. 4a). First, MMRd mutational processes were


broadly associated with increased GIE incidence. Similarly, exposure to the APOBEC family of cytidine deaminases was strongly associated with GIE in multiple cancer types, including breast


carcinomas (Fig. 7d and Extended Data Fig. 4d). Last, the mutation burden associated with several exogenous mutational processes, such as ultraviolet light in skin melanoma (Fig. 7e and


Extended Data Fig. 4e) and platinum treatment in NSCLC (Fig. 7f and Extended Data Fig. 4f), was also significantly linked to an increased incidence of GIE events in these cancer types. We


also identified other tumor genomic features that were correlated with GIE. For instance, in colorectal cancers, which also include some patients with anal cancer, human papillomavirus DNA


integration was positively associated with GIE incidence (Fig. 7a). Moreover, high-immune infiltration, as determined by several RNA-sequencing-based deconvolution measurements


(Supplementary Note 3), was significantly linked with higher GIE incidence in this cancer type (Fig. 7g and Extended Data Fig. 4g), which is in agreement with previous reports7. Certain


driver alterations, beyond the GIE pathways considered in the present study, also showed a strong association with GIE events. Specifically, _CASP8_, _KMT2D_, _RPL22_ and _TGFBR2_


alterations tended to co-occur with GIE in patients with colorectal cancer. Of note, _CASP8_ (ref. 13) and _TGFBR2_ (ref. 28) alterations have previously been linked to immune surveillance


escape. Finally, other factors, such as the _HLA-I_ supertype, the germline _HLA-I_ divergence, patient chronological age or exposure to previous treatments, including immunotherapy, failed


to attain significant association with GIE (or the association was also observed in the simulated GIE). All the screened molecular features alongside their cancer type-specific significance


coefficients are available in Supplementary Data 4. THE SELECTED IMMUNE EVASION MECHANISMS DEPENDS ON TMB An increase in mutational load leads to the generation of neoepitopes susceptible to


recognition as neoantigens by the adaptive immune system. Therefore, we investigated the relationship between the frequency of GIE alterations (excluding nonfocal LOH of _HLA-I_) and the


TMB across 20 evenly distributed TMB buckets (Methods). We first observed that GIE frequency steadily increased with the TMB (Fig. 8a; observed GIE) and that this trend was not fully


explained by an increased background mutation and CNV rate (Fig. 8a; simulated GIE). More specifically, as the TMB increases, the observed GIE frequency deviates from the expected frequency


given by the GIE simulations. This is particularly noticeable for (ultra)hypermutated tumors, which showed a GIE incidence two- to threefold higher than the simulations. This trend was still


consistent after controlling for the cancer type (Extended Data Fig. 5a) and mutation clonality (Extended Data Fig. 5b). Using the burden of predicted neoepitopes based on the germline


_HLA-I_ profile as baseline also revealed an almost uniformly increasing distribution across the neoepitope buckets, which becomes sharper and higher than expected after the 17th bucket


(Fig. 8b). It is interesting that, in the bucket grouping samples with ~10–13 mutations per Mb, which is the minimal threshold regularly used as a response to ICIs, we observed an average


GIE frequency of 0.30 ± 0.03 s.d. Similarly, in the group of samples between 26 and 36 mutations per Mb, mostly including hypermutated tumors, the average frequency was 0.42 ± 0.06 s.d.,


whereas beyond ~95 mutations per Mb (considered to be ultra-hypermutated tumors29) we identified GIE alterations in >70% of samples (0.72 ± 0.06 s.d.). Our results thus showed that an


important fraction of patients eligible for ICIs harbored tumor alterations that may hinder recognition and/or elimination by the immune system. We then analyzed the relationship between the


TMB and the presence of specific GIE alterations across the six immune escape pathways included in the present study. Overall, the observed frequency distributions across these pathways


were remarkably different (Fig. 8c and Extended Data Fig. 5c). In fact, different types of _HLA-I_ alterations showed a distinctive frequency distribution along the TMB buckets. Nonfocal LOH


of _HLA-I_ was primarily present in low-TMB tumors, whereas focal LOH of _HLA-I_ showed a clear enrichment for mid and high TMB tumors, peaking around ~10–20 mutations per Mb (average


frequency of 0.22 ± 0.04 s.d.) and displaying an inverted U-shaped distribution. Finally, mutations in _HLA-I_ genes were more frequent in hypermutated tumors (that is, from ~26 mutations


per Mb to 36 mutations per Mb). Similarly, alterations in the antigen presentation machinery and the IFN-γ pathway were predominantly found in hypermutated tumors (Extended Data Fig. 5c).


The remaining pathways did not show any clear TMB preference, probably due the lower prevalence of these alterations in our dataset. Finally, using the number of predicted neoepitopes as


baseline revealed consistent distributions (Fig. 8d and Extended Data Fig. 5d). DISCUSSION In the present study, we have characterized the prevalence and impact of GIE alterations involved


in six major pathways across thousands of uniformly processed primary and metastatic tumors from fifty-eight cancer types. Moreover, we addressed the complexity of identifying tumor-specific


_HLA-I_ alterations by developing LILAC. Our results revealed that, on average, one in four patients bears a GIE event, primarily as a result of LOH of _HLA-I_. However, GIE incidence and


the targeted pathways showed high diversity across cancer types. Importantly, the fact that we did not observe mutual exclusivity between GIE alterations targeting different pathways


suggests that multiple GIE alterations may concur to effectively avoid immune surveillance. Remarkably, our analyses also showed that the frequency of GIE alterations in metastatic patients


are comparable to their primary counterparts across most cancer types. This result is also supported by independent studies6,30, denoting that early stages of tumorigenesis have already


acquired the capacity to escape from immune system recognition. Immune escape alterations were often positively selected during tumor evolution. Specifically, loss-of-function mutations in


_HLA-A_ and _HLA-B_, as well as multiple genes from other immune escape pathways, displayed higher-than-expected frequencies across several cancer types. Nevertheless, _HLA-C_ did not show a


significant enrichment in inactivating variants which may imply that its expression is needed to avoid natural killer-mediated immunity31 and that the neoepitope repertoire of this gene is


generally lower compared with _HLA-A_ and _HLA-B_. Finally, we also observed higher-than-expected LOH of _HLA-I_ rates across multiple cancer types. Related to this, focal and nonfocal LOH


of _HLA-I_ undergo divergent mechanisms of selection. Focal LOH of _HLA-I_ was primarily a CN-neutral event that tended to target the HLA allele with the largest neoepitope repertoire,


indicating an active role in immune evasion. On the contrary, we did not observe such allelic preference for nonfocal LOH of _HLA-I_, suggesting that alternative selective forces, such as


_DAXX_ haploinsufficiency32, are operating in these large-scale chromosome 6 events. Multiple tumor intrinsic and extrinsic features displayed a significant association with increased GIE


incidence. However, in our cohort, a patient’s exposure to previous cancer therapies, including immunotherapies, did not attain a significant association with GIE frequency, indicating that


the efficacy of GIE alterations may be compromised when dealing with the strong immune pressure released by ICIs. The tumor mutation and neoepitope burden influenced both the GIE frequency


and the targeted GIE pathway. Although focal LOH of _HLA-I_ was the most frequent mechanism in mid and high TMB tumors, the loss of certain _HLA-I_ alleles was apparently not sufficient to


cope with the neoepitope load of (ultra)hypermutated tumors, where a nontargeted GIE mechanism, such as antigen presentation abrogation, is probably needed. However, we cannot rule out the


fact that such differences may also be partially shaped by mutation and CNV rate differences across cancer types. It is important to mention that the GIE escalation as the TMB increases was


not entirely attributed to the underlying increase in background mutation rate, particularly in hypermutated tumors. Although the modeling of background GIE rates could be sensitive to the


selected randomization strategy, our results are supported by independent studies based on orthogonal analytical approaches30, evidencing the robustness of our conclusions. The present study


considered a collection of highly confident GIE alterations across six well-characterized, immune-related pathways. However, in our dataset, three of four patients did not harbor GIE


events, highlighting the need to characterize other mechanisms of immune evasion. These may involve not only alternative molecular pathways such as the _HLA-II_ (ref. 33), but also other


types of alterations such as germline variants34 and epigenetic modifications5,35. Finally, tumor extrinsic factors such as clonal hematopoiesis, tumor-associated microbiome or the tissue


architecture may also play an important role in tumor immune evasion. We expect that the combination of cancer genomics with high-resolution characterization of the tumor microenvironment


will aid in further understanding of the interplay between tumor evolution and the immune system. METHODS DATA COLLECTION AND PROCESSING The Hartwig Medical Foundation sequences and


characterizes the genomic landscape for a large number of patients with metastases. A detailed description of the consortium and the whole patient cohort has been given in detail in


Priestley et al.16. In the present study, the Hartwig cohort included 4,784 metastatic tumor samples from 4,468 patients. The Hartwig patient samples have been processed using the Hartwig


analytical pipeline5 (https://github.com/hartwigmedical/pipeline5) implemented in Platinum (v.1.0) (https://github.com/hartwigmedical/platinum). Briefly, Platinum is an open-source pipeline


designed for analyzing WGS tumor data. It enables a comprehensive characterization of tumor WGS samples (for example, somatic point mutations and indels, structural variants, CN changes) in


one single run. Hartwig samples that failed to provide a successful pipeline output, potential nontumor samples, with purity <0.2, with TMB < 50 SNVs/indels, lacking sufficient


informed consent for the present study or without enough read coverage to perform two-field HLA typing (Supplementary Note 1) were discarded. Similarly, for patients with multiple biopsies,


we selected the tumor sample with the most recent biopsy date and, if this information did not exist, we selected the sample with the highest tumor purity. However, some Hartwig patients had


biopsies from different primary tumor locations. In these cases, we kept at least one sample from each primary tumor location and, when there were multiple samples from the same primary


tumor location, we applied the aforementioned biopsy date and tumor purity-filtering criteria. A total number of 4,439 Hartwig samples were whitelisted and used in the present study


(Extended Data Fig. 2a and Supplementary Table 2). Preprocessed RNA-sequencing (RNA-seq) data by ISOFOX (https://github.com/hartwigmedical/hmftools/tree/master/isofox) were available for


1,864 Hartwig samples and were consequently used in the immune infiltration deconvolution analysis. Patient clinical data were obtained from the Hartwig database. Cancer-type labels were


harmonized to maximize the number of samples that had tumor types comparable with the PCAWG dataset (Supplementary Table 2). The PCAWG cohort consisted of 2,835 patient tumors and access for


raw sequencing data for the PCAWG-US was approved by the National Institutes of Health (NIH) for the dataset General Research Use in The Cancer Genome Atlas (TCGA) and downloaded via the


dbGAP download portal. Raw sequencing access to the non-US PCAWG samples was granted via the Data Access Compliance Office (DACO). A detailed description of the consortium and the whole


patient cohort has been given in Campbell et al.17. The samples were fully processed using the same cancer analytical pipeline applied to the Hartwig cohort (BWA38 v.0.7.17, GATK39 v.3.8.0,


SAGE16 v.2.2, GRIDSS40 v.2.9.3, PURPLE16 v.2.53 and LINX41 v.1.17). This enabled a harmonized analysis and eliminated the potential biases introduced by applying different methodological


approaches. Samples that failed to provide a successful pipeline output, with a tumor purity <0.2, potential nontumor samples, blacklisted by the PCAWG original publication17 or without


enough read coverage to perform two-field HLA typing were discarded. Similarly, for patients with multiple samples, we selected the first according to the aliquot ID alphabetical order. A


total number of 1,880 were whitelisted and used in the present study (Extended Data Fig. 2a and Supplementary Table 2). For more details about the re-processing of the PCAWG dataset and the


technical validation see Martínez-Jiménez et al.20. Preprocessed gene level expression data were downloaded for 1,118 samples from the International Cancer Genome Consortium (ICGC) portal


(https://dcc.icgc.org/releases/PCAWG/transcriptome/gene_expression/tophat_star_fpkm_uq.v2_aliquot_gl.tsv.gz). ENSEMBL identifiers were mapped to HUGO symbols. Of these samples, 930 belonged


to biopsies selected for the present study and were therefore used for the RNA analysis in PCAWG samples. The most recent clinical data were downloaded from the PCAWG release page


(https://dcc.icgc.org/releases/PCAWG) on August 2021. Cancer-type labels were harmonized to maximize the number of samples that had tumor types comparable with the Hartwig dataset


(Supplementary Table 2). LILAC All information relative to LILAC’s algorithm, implementation and validation is described in Supplementary Note 1. DEFINITIONS OF GIE ALTERATIONS We searched


in the literature for somatic genomic alterations that are robustly and recurrently associated with immune evasion. We stratified the reported alterations into six major pathways (Fig. 1a


and Supplementary Table 1): * (1) The _HLA-I_: somatic alterations in the _HLA-A_, _HLA-B_ and _HLA-C_ genes have been extensively reported as a mechanism for immune evasion across several


cancer types9,10,12,22. We considered LOH of _HLA-I_, homozygous deletions and somatic nonsynonymous mutations on these genes as immune evasion alterations. We defined LOH for _HLA-A_,


_HLA-B_ and _HLA-C_ as those cases with a minor allele ploidy <0.3 and a major allele ploidy >0.7 according to LILAC annotation. We also relied on LILAC mapping of somatic mutations


into _HLA-A_, _HLA-B_ and _HLA-C_ alleles to report samples with nonsynonymous alterations. Finally, we also used LILAC allele-specific tumor CN estimations to annotate samples with


homozygous deletions of _HLA-A_, _HLA-B_ and _HLA-C_ genes. A gene was homozygous deleted in a sample if the estimated minimum tumor CN of the gene was <0.5. * (2) The antigen


presentation pathway: several studies have reported the immunomodulatory effect of somatic inactivation of genes involved in the antigen presentation machinery (see Supplementary Table 1 for


gene-specific references). The most recurrent alteration is _B2M_ inactivation, but there are other genes involved in antigen presentation and antigen presentation activation, the


inactivation of which has been linked to increased immune evasion, including _CALR_, _TAP1_, _TAP2_, _TAPBP_, _NLRC5_, _CIITA_ and _RFX5_. We defined inactivation events as monoallelic and


biallelic clonal loss-of-function mutations (frameshift variant, stop gained, stop lost, splice acceptor variant, splice donor variant, splice region variant and start lost), biallelic


clonal nonsynonymous mutations not included in the former group (for example, missense mutations) and homozygous deletions. A gene was homozygous deleted in a sample if the estimated minimum


tumor CN of the gene was <0.5. * (3) The IFN-γ pathway: IFN-γ is a cytokine with known proapoptotic and immune booster capacities. Hence, it has been reported that tumors frequently


leverage somatic alterations targeting IFN-γ receptors and downstream effectors to evade immune system surveillance (see Supplementary Table 1 for gene-specific references). More


specifically, we considered that inactivation events (see above for specifics of which type of alterations are included) in _JAK1_, _JAK2_, _IRF2_, _IFNGR1_, _IFNGR2_, _APLNR_ and _STAT1_


have been probed to have the ability to provide an immune evasion phenotype. * (4) The PD-L1 receptor: the PD-L1 receptor, encoded by the _CD274_ gene, plays a major role in suppressing the


adaptive immune system. It has been reported how overexpression of PD-L1 in tumor cells leads to impaired recruitment of immune effectors42. We therefore considered _CD274_ CN amplification


as a genetic mechanism of immune evasion. We defined a _CD274_ CN amplification event as samples with _CD274_ minimum tumor CN >3× the average sample ploidy. * (5) The CD58 receptor: the


CD58 receptor, encoded by _CD58_, plays an essential role in T-cell recognition and stimulation. It has been extensively reported that _CD58_ alterations in B-cell lymphomas lead to immune


evasion21. Moreover, a recent study identified _CD58_ loss as one of the major effectors of impaired T-cell recognition43. Consequently, we considered inactivation events (see above) in


_CD58_ as alterations able to provide an immune escape phenotype. * (6) Epigenetic driven immune escape: it has been recently reported how _SETDB1_ amplification leads to epigenetic


silencing of tumor intrinsic immunogenicity44. _SETDB1_ amplification was recurrently found across several cancer types and was therefore considered in the present study as a mechanism of


immune evasion. We defined a _SETDB1_ CN amplification event as samples with SETDB1 minimum tumor CN >3× the mean sample ploidy. A summary table with all 21 considered genes, their


associated pathway, references and their type of somatic alterations is presented in Supplementary Table 1. GIE MUTUAL EXCLUSIVITY To assess whether LOH of HLA-I events were mutually


exclusive with other GIE events, we performed two statistical tests. First, we performed a left-sided Fisher’s exact test comparing two groups of annotations (LOH of _HLA-I_ and other GIE


events) in a cancer type-specific manner. Second, for each cancer type, we compared the number of samples bearing both LOH of _HLA-I_ and other GIE events with the expected given by 10,000


randomization, using the observed alteration frequency of both groups in the specific cancer type (LOH of _HLA-I_ and other GIE alterations). The significance was computed using an empirical


one-sided _P_ value (that is, number of randomizations with co-occurring events lower than the real observed value divided by the total number of randomizations). PRIMARY AND METASTATIC GIE


PREVALENCE The prevalence of a pathway alteration for a particular cohort was calculated as the number of samples with at least one alteration in the pathway divided by the total number of


cohort samples. The presence of a genetic immune alteration in a given sample was annotated if there was at least one pathway with an alteration in that sample. For the primary versus


metastatic comparison, we performed a tumor type-specific Fisher’s exact test comparing pathway-specific and global escaped status prevalence across the two cohorts. _P_ values were adjusted


with a multiple-testing correction using the Benjamini–Hochberg procedure (_α_ = 0.05). POSITIVE SELECTION: SOMATIC POINT MUTATIONS AND INDELS Positive selection analysis based on somatic


point mutation and small indels was performed using dNdScv and the hg19 reference genome. The analysis was performed in a cohort-specific, cancer-type and pan-cancer manner across the two


datasets. The analysis was restricted to datasets with sufficient representativeness (that is, number of samples ≥15). Global grouped dN:dS ratios of the _HLA-I_ (_HLA-A_, _HLA-B_ and


_HLA-C_) and the 16 non-_HLA-I_ genes potentially targeted by mutations (that is, excluding _SETDB1_ and _CD274_ because their immune escape phenotype is associated with CN gains;


Supplementary Table 1) were calculated in a pan-cancer manner using the gene_list attribute of the dndscv function. We used the geneci() function of dNdScv to estimate the pan-cancer and


gene-specific dN:dS ratios, which include confidence intervals (CIs), of the _HLA-I_ genes. POSITIVE SELECTION: CNAS We devised a statistical test to assess positive selection in LOH,


homozygous deletion (HD) and CN amplification (AMP) events. LOH was defined as those genomic regions where the minor allele ploidy of this gene was <0.3 and the major allele ploidy


>0.7. HD was defined as those regions with estimated minimum CN < 0.5. Similarly, AMP events were defined as those genomic regions with the minimum tumor CN >3× the mean sample


ploidy. For a particular type of genomic event overlapping with a gene, this test compares the number of observed samples bearing the alteration with the expected number after whole-genome


randomization. More specifically, these are the steps followed: * (1) Let us first denote _E_ as the type of query alteration (that is, LOH, HD or AMP), _S_ as a group of samples (usually


samples from the same cancer type and same dataset) and _G_s as the genomic scale (that is, nonfocal for segment lengths >75% chromosome arm, focal for segments <75% of the chromosome


arm and highly focal for segments <3 Mb). * (2) For every sample _S__i_ in {_S_1,_S_2,…_S__T_} we first gather the number and length of observed (_O__i_) segments targeted by _E_ within


that _G_s. Only _E_ events overlapping with autosomes are considered in the present study. Samples that do not harbor any event of type _E_ within that _G_s are ignored. * (3) Next, for


every sample _S__i_ we performed 10 independent randomizations (_R__i_1, _R__i_2, … _R__i_10) of the _O__i_ events, by randomly shuffling these events _E_ along the autosomes. For this, we


used the shuffle function from pybedtools45 with the following parameters (genome=‘hg19’, noOverlapping=True, excl=‘sexual_chomosomes’, allowBeyondChromEnd=False). In certain samples, with


an extremely high segment load (_O__i_ > 10,000) or with mean ploidy of ~1 (that is, monoploid genome), the noOverlapping flag was set to False because the randomization would not


converge. * (4) We then binned the autosomes into 28,824 bins of 100 kb and counted for each bin _k__j_ {_k_1, … _k_28,842} the total number of observed events _O__Tj_ as the sum of observed


events O1k, … OTK overlapping with that bin across all _S_ samples. * (5) Similarly, for each _R__i_th (_R_1, … _R_10) randomization and each bin _k__j_{_k_1,_… k_28,842}, we counted the


total number of simulated events as the sum of events—in that _i_th randomization and overlapping with that bin across all samples in _S_. * (6) We then performed a bin-specific comparison


of the _O__Tk_ with the average number of simulated events _R__TK_ across the ten simulations and performed a statistical test of significance using a G-test goodness of fit. As chromosome


starting bins were highly depleted in the simulated group (_R__TK_), we also computed the global simulated mean across all bins _k__j_{_k_1, … _k_28,842}, and used this as the expected


number of events for the statistical significance assessment. * (7) The _P_ values were adjusted (that is, converted to _q_ values) with a multiple-testing correction using the


Benjamini–Hochberg procedure (_α_ = 0.05). * (8) For each gene, overlapping with one or with multiple _k__j_ bins, we used the minimal adjusted _P_-value significance of the bin(s)


overlapping with the genomic location of the specific gene-coding sequence. Therefore, by definition, two genes sharing the same bins would have a similar _q_ value. We used ENSEMBL v.88 to


perform the annotation of gene exonic regions to hg19 genomic coordinates. We observed that LINE insertions near the HLA-I locus (LINE activation site at chr6:29,920,000) in some esophageal


cancer samples had an incorrect CN estimation due to multiple insertions originating from almost the same site in the same sample. Consequently, these samples were not considered in the


_HLA-I_ homozygous deletion analysis. DISTRIBUTION OF MUTATIONS IN _HLA-I_ GENES LILAC mapped the _HLA-A_, _HLA-B_ and _HLA-C_ somatic mutations detected by SAGE into the inferred _HLA-I_


alleles (see LILAC section). LILAC provides the consequence type and coding sequence position of _HLA-I_ alterations, which was used to display the distribution of mutations across the


_HLA-_I coding sequence. The 34 amino acids involved in peptide presentation were gathered from our neoepitope prioritization pipeline (see below). Pfam _HLA-A_, _HLA-B_ and _HLA-C_ domains


were manually downloaded from the Pfam36 website. TUMOR-SPECIFIC NEOEPITOPES The methodology for the identification and prioritization of neoepitopes is extensively described in


Supplementary Note 2. CALCULATION AND RANDOMIZATION OF NEOEPITOPE RATIO We wanted to evaluate whether LOH of _HLA-I_ tends to select the _HLA-I_ allele with highest neoepitope repertoire.


Let us first introduce the neoepitope allele ratio (nr). Given an _HLA-I_ gene, _G_, we defined nr as _G_A1/2 = _G_A1/(_G_A1 + _G_A2), where _G_A1/2 is the number of predicted neoepitopes of


allele 1 and allele 2, respectively. For each patient tumor sample, the assignment of allele number (that is, allele 1 or allele 2) was randomly performed. Then, we followed the next steps:


* (1) For each patient sample with LOH of _HLA-I_ we calculated the nr across the _HLA-I_ genes targeted by the LOH. Homozygous _HLA-I_ cases were not considered, because their nr is by


definition 0.5. * (2) We then grouped the nr into eight buckets: (0.0–0.35), (0.35, 0.4), (0.4, 0.45), (0.45, 0.5), (0.5, 0.55), (0.55, 0.6), (0.6, 0.65) and (0.65, 1.0). Consequently, each


bucket included _n_ allele pairs with an nr within the bound limits. * (3) Next, we performed 100 bootstraps by randomly subsampling 75% of the total number of available allele pairs in the


bucket. * (4) For each bootstrap iteration _i_th (_i_ \(\in\) 1, … 100) and each bucket we estimated the frequency of allele 1 loss (_F_A1_loss) as the number of cases with allele 1 loss


compared with the total number of cases in that bucket. Similarly, we computed the expected frequency (_F_A1exp) by randomly assigning LOH events to the allele 1 (background probably of


0.5). * (5) We then computed the bucket-specific average and s.d. of _F_A1loss and _F_A1exp values across the 100 bootstraps. * (6) Finally, we performed a Kolmogorov–Smirnov test to compare


the observed distribution with the expected given random distribution of the LOH events. This test was applied to LOH of _HLA-I_, focal LOH of _HLA-I_ and nonfocal LOH of _HLA-I_ events


across the metastatic (Hartwig) and primary (PCAWG) datasets. GIE AND TUMOR GENOMIC FEATURES Check Supplementary Note 3 for a full description of the methods for this section. GIE AND TMB


ASSOCIATION We aggregated the two datasets, metastatic and primary, to increase the robustness of this analysis. We then defined 20 evenly arranged buckets (10 for the cancer type-specific


analyses) of the log10(TMB) scale, starting from the 1st percentile and ending in the 99th percentile values. Next, each sample with a log10(TMB) = _S_tmb, was allocated to the _i_th (_i_


\(\in\) 1, … 20) bucket such as log10(TMB)i−1 < _S_tmb ≤ log10(TMB)i. Samples with an _S_tmb greater than the last bucket threshold (that is, log10(TMB)20) were allocated into the last


bucket. The number of mutations in each bucket was displayed as the number of mutations per megabase by dividing the total number of mutations by 3,000 (that is, approximated number of human


genome megabases). Finally, the GIE frequency (GIEfreq) of the _i_th bucket was defined as the number of GIE samples in the _i_th bucket divided by the total number of available samples in


that bucket. To enable calculation of the uniformity in GIE frequency among samples in the same TMB bucket, we performed _n_ (where _n_ = 1,000) bootstraps of the 50% of samples allocated to


each bucket. We then calculated the average and s.d. of the GIEfreq across the bootstraps. A similar approach was conducted to analyze the relationship between the predicted neoepitope load


and GIE frequency. The number of neoantigens of each bucket was estimated as 1% (ref. 46) and 5% (ref. 47) of the total predicted neoepitopes assigned to that bucket threshold. For the


simulated GIE control, we estimated the average and s.d. across the 100 simulated GIE iterations for each TMB bucket. STATISTICS AND REPRODUCIBILITY Sample sizes were determined by the


availability of samples with sufficient quality from the two datasets included in the present study (PCAWG and Hartwig). Sample-exclusion criteria are thoroughly described in Methods,


Supplementary Note 1 and the original publication describing the harmonized cohort20. The statistical tests and randomization strategies used in each specific analysis are described in


Methods and the figure legends. Unless otherwise specified, the scipy48 (v.1.5.3) library from python v.3.6.9 was used to carry out the statistical tests. All the code and data to reproduce


the analyses presented in the present article have been deposited in public repositories as described in Data availability and Code availability. REPORTING SUMMARY Further information on


research design is available in the Nature Portfolio Reporting Summary linked to this article. DATA AVAILABILITY The Hartwig dataset used in the present study is freely available for


academic use from the Hartwig Medical Foundation through standardized procedures and request forms that can be found at https://www.hartwigmedicalfoundation.nl/en/applying-for-data. This


includes raw sequencing data (.bam files and unmapped reads for hg19 reference genome) as well as the processed data through the latest version of the Hartwig tumor-processing pipeline. The


re-processed PCAWG data using the Hartwig Medical Foundation pipeline (for hg19 reference genome) have also been made available for academic purposes. The ICGC part of the PCAWG dataset17


can be accessed now through the ICGC platform (https://dcc.icgc.org/releases/PCAWG/Hartwig), following their standard access control mechanisms originally put in place. Similarly, users with


authorized access can download the re-processed TCGA portion of the PCAWG dataset at https://icgc.bionimbus.org/files/5310a3ac-0344-458a-88ce-d55445540120. We refer to the accompanying


publication20, including the description of the entire primary and metastatic cohort, for further information about the technical aspects of the re-processing of the PCAWG dataset. Raw


sequencing data of the high-resolution HLA typing performed by GenDx can also be downloaded via European Genome-phenome Archive (http://www.ebi.ac.uk/ega) under accession no.


EGAD00001008643. _HLA-I_ typing, sample-specific GIE events and processed data are now shared as Supplementary Data and Supplementary Tables. CODE AVAILABILITY The Hartwig analytical


processing pipeline is available at https://github.com/hartwigmedical/pipeline5 and implemented in Platinum (https://github.com/hartwigmedical/platinum). LILAC’s source code is available at


https://github.com/hartwigmedical/hmftools/tree/master/lilac. The source code of the neoepitope prioritization pipeline is available at


https://github.com/hartwigmedical/hmftools/tree/master/neo. The source code to reproduce the figures and analysis of the manuscript is available at


https://github.com/UMCUGenetics/Genetic-Immune-Escape. REFERENCES * Vinay, D. S. et al. Immune evasion in cancer: mechanistic basis and therapeutic strategies. _Semin. Cancer Biol._ 35,


S185–S198 (2015). PubMed  Google Scholar  * Jhunjhunwala, S., Hammer, C. & Delamarre, L. Antigen presentation in cancer: insights into tumour immunogenicity and immune evasion. _Nat.


Rev. Cancer_ 21, 298–312 (2021). CAS  PubMed  Google Scholar  * Dhatchinamoorthy, K., Colbert, J. D. & Rock, K. L. Cancer immune evasion through loss of MHC class I antigen presentation.


_Front. Immunol._ 12, 636568 (2021). CAS  PubMed  PubMed Central  Google Scholar  * Castro, F., Cardoso, A. P., Gonçalves, R. M., Serre, K. & Oliveira, M. J. Interferon-gamma at the


crossroads of tumor immune surveillance or evasion. _Front. Immunol._ 9, 847 (2018). PubMed  PubMed Central  Google Scholar  * Rosenthal, R. et al. Neoantigen-directed immune escape in lung


cancer evolution. _Nature_ 567, 479–485 (2019). CAS  PubMed  PubMed Central  Google Scholar  * Pennycuick, A. et al. Immune surveillance in clinical regression of preinvasive squamous cell


lung cancer. _Cancer Discov._ 10, 1489–1499 (2020). CAS  PubMed  PubMed Central  Google Scholar  * Grasso, C. S. et al. Genetic mechanisms of immune evasion in colorectal cancer. _Cancer


Discov._ 8, 730–749 (2018). CAS  PubMed  PubMed Central  Google Scholar  * Restifo, N. P. et al. Loss of functional beta 2-microglobulin in metastatic melanomas from five patients receiving


immunotherapy. _J. Natl Cancer Inst._ 88, 100–108 (1996). CAS  PubMed  Google Scholar  * Fangazio, M. et al. Genetic mechanisms of HLA-I loss and immune escape in diffuse large B cell


lymphoma. _Proc. Natl Acad. Sci. USA_ 118, e2104504118 (2021). CAS  PubMed  PubMed Central  Google Scholar  * Montesion, M. et al. Somatic HLA class I loss Is a widespread mechanism of


immune evasion which refines the use of tumor mutational burden as a biomarker of checkpoint inhibitor response. _Cancer Discov._ 11, 282–292 (2021). CAS  PubMed  Google Scholar  * Robinson,


J. et al. IPD-IMGT/HLA database. _Nucleic Acids Res._ 48, D948–D955 (2020). CAS  PubMed  Google Scholar  * McGranahan, N. et al. Allele-specific HLA loss and immune escape in lung cancer


evolution. _Cell_ 171, 1259–1271.e11 (2017). CAS  PubMed  PubMed Central  Google Scholar  * Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA


genes. _Nat. Biotechnol._ 33, 1152–1158 (2015). CAS  PubMed  PubMed Central  Google Scholar  * Chowell, D. et al. Patient HLA class I genotype influences cancer response to checkpoint


blockade immunotherapy. _Science_ 359, 582–587 (2018). CAS  PubMed  Google Scholar  * Naranbhai, V. et al. HLA-A*03 and response to immune checkpoint blockade in cancer: an epidemiological


biomarker study. _Lancet Oncol._ 23, 172–184 (2022). CAS  PubMed  Google Scholar  * Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. _Nature_ 575, 210–216


(2019). CAS  PubMed  PubMed Central  Google Scholar  * Campbell, P. J. et al. Pan-cancer analysis of whole genomes. _Nature_ 578, 82–93 (2020). Google Scholar  * Xie, C. et al. Fast and


accurate HLA typing from short-read next-generation sequence data with xHLA. _Proc. Natl Acad. Sci. USA_ 114, 8059–8064 (2017). CAS  PubMed  PubMed Central  Google Scholar  * Roepman, P. et


al. Clinical validation of whole genome sequencing for cancer diagnostics. _J. Mol. Diagn._ 23, 816–833 (2021). CAS  PubMed  Google Scholar  * Martínez-Jiménez, F. et al. Pan-cancer whole


genome comparison of primary and metastatic solid tumors. _Nature_ https://doi.org/10.1038/s41586-023-06054-z (2023). * Challa-Malladi, M. et al. Combined genetic inactivation of


β2-microglobulin and CD58 reveals frequent escape from immune recognition in diffuse large B cell lymphoma. _Cancer Cell_ 20, 728–740 (2011). CAS  PubMed  PubMed Central  Google Scholar  *


Castro, A. et al. Elevated neoantigen levels in tumors with somatic mutations in the HLA-A, HLA-B, HLA-C and B2M genes. _BMC Med. Genomics_ 12, 107 (2019). PubMed  PubMed Central  Google


Scholar  * Martínez-Jiménez, F. et al. A compendium of mutational cancer driver genes. _Nat. Rev. Cancer_ 20, 555–572 (2020). PubMed  Google Scholar  * Waldhauer, I. & Steinle, A. NK


cells and cancer immunosurveillance. _Oncogene_ 27, 5932–5943 (2008). CAS  PubMed  Google Scholar  * Boichard, A. et al. APOBEC-related mutagenesis and neo-peptide hydrophobicity:


implications for response to immunotherapy. _OncoImmunology_ 8, 1550341 (2019). PubMed  Google Scholar  * Rousseau, B. et al. The spectrum of benefit from checkpoint blockade in hypermutated


tumors. _N. Engl. J. Med._ 384, 1168–1170 (2021). PubMed  PubMed Central  Google Scholar  * Litchfield, K. et al. Meta-analysis of tumor- and T cell-intrinsic mechanisms of sensitization to


checkpoint inhibition. _Cell_ 184, 596–614.e14 (2021). CAS  PubMed  PubMed Central  Google Scholar  * Dhainaut, M. et al. Spatial CRISPR genomics identifies regulators of the tumor


microenvironment. _Cell_ 185, 1223–1239.e20 (2022). CAS  PubMed  PubMed Central  Google Scholar  * Campbell, B. B. et al. Comprehensive analysis of hypermutation in human cancer. _Cell_ 171,


1042–1056.e10 (2017). CAS  PubMed  PubMed Central  Google Scholar  * Lakatos, E. et al. Evolutionary dynamics of neoantigens in growing tumors. _Nat. Genet._ 52, 1057–1066 (2020). CAS 


PubMed  PubMed Central  Google Scholar  * Blais, M.-E., Dong, T. & Rowland-Jones, S. HLA-C as a mediator of natural killer and T-cell activation: spectator or key player? _Immunology_


133, 1–7 (2011). CAS  PubMed  PubMed Central  Google Scholar  * Jiao, Y. et al. DAXX/ATRX, MEN1, and mTOR pathway genes are frequently altered in pancreatic neuroendocrine tumors. _Science_


331, 1199–1203 (2011). CAS  PubMed  PubMed Central  Google Scholar  * Nijland, M. et al. HLA dependent immune escape mechanisms in B-cell lymphomas: implications for immune checkpoint


inhibitor therapy? _Oncoimmunology_ 6, e1295202 (2017). PubMed  PubMed Central  Google Scholar  * Sayaman, R. W. et al. Germline genetic contribution to the immune landscape of cancer.


_Immunity_ 54, 367–386.e8 (2021). CAS  PubMed  PubMed Central  Google Scholar  * Berglund, A. et al. Methylation of immune synapse genes modulates tumor immunogenicity. _J. Clin. Invest._


130, 974–980 (2020). CAS  PubMed  PubMed Central  Google Scholar  * Mistry, J. et al. Pfam: The protein families database in 2021. _Nucleic Acids Res._ 49, D412–D419 (2021). CAS  PubMed 


Google Scholar  * Davoli, T., Uno, H., Wooten, E. C. & Elledge, S. J. Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy. _Science_


355, eaaf8399 (2017). PubMed  PubMed Central  Google Scholar  * Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. _Bioinformatics_ 25, 1754–1760


(2009). CAS  PubMed  PubMed Central  Google Scholar  * McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. _Genome Res._


20, 1297–1303 (2010). CAS  PubMed  PubMed Central  Google Scholar  * Cameron, D. L. et al. GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph


assembly. _Genome Res._ 27, 2050–2060 (2017). CAS  PubMed  PubMed Central  Google Scholar  * Shale, C. et al. Unscrambling cancer genomes via integrated analysis of structural variation and


copy number. _Cell Genomics_ 2, 100112 (2022). CAS  PubMed  PubMed Central  Google Scholar  * Hamanishi, J. et al. Programmed cell death 1 ligand 1 and tumor-infiltrating CD8+ T lymphocytes


are prognostic factors of human ovarian cancer. _Proc. Natl Acad. Sci. USA_ 104, 3360–3365 (2007). CAS  PubMed  PubMed Central  Google Scholar  * Frangieh, C. J. et al. Multimodal pooled


Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. _Nat. Genet._ 53, 332–341 (2021). CAS  PubMed  PubMed Central  Google Scholar  * Griffin, G. K. et al.


Epigenetic silencing by SETDB1 suppresses tumour intrinsic immunogenicity. _Nature_ 595, 309–314 (2021). CAS  PubMed  PubMed Central  Google Scholar  * Dale, R. K., Pedersen, B. S. &


Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. _Bioinforma_ 27, 3423–3424 (2011). CAS  Google Scholar  * Gartner, J. J. et al. A


machine learning model for ranking candidate HLA class I neoantigens based on known neoepitopes from multiple human tumor types. _Nat. Cancer_ 2, 563–574 (2021). CAS  PubMed  PubMed Central


  Google Scholar  * Wells, D. K. et al. Key parameters of tumor epitope immunogenicity revealed through a consortium approach improve neoantigen prediction. _Cell_ 183, 818–834.e13 (2020).


CAS  PubMed  PubMed Central  Google Scholar  * Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. _Nat. Methods_ 17, 261–272 (2020). CAS  PubMed 


PubMed Central  Google Scholar  Download references ACKNOWLEDGEMENTS This publication and the underlying study have been made possible partly on the basis of the data that the Hartwig


Medical Foundation and the Center of Personalised Cancer Treatment have made available to the study. We thank A. Van Hoeck and A. G. Perez for critical reading of the manuscript and their


valuable input. We also thank A. Van Hoeck for assistance in the dataset collection and L. Nyguen for help in the mutational signature extraction. We thank H. W. Lee for technical assistance


in the TRACERx dataset processing. The present study makes use of data generated by the TRAcking Non-small Cell Lung Cancer Evolution Through Therapy (TRACERx) Consortium and provided by


the UCL Cancer Institute and the Francis Crick Institute. The TRACERx study is sponsored by University College London (UCL), funded by Cancer Research UK and coordinated through the Cancer


Research UK and UCL Cancer Trials Centre. We also thank N. McGranahan, O. Pich and C. Puttick for sharing the original _HLA-I_ typing and LOHHLA calls used in their original publication.


Finally, we thank W. Mulder, M. Penning and H. Merkens for discussions on LILAC’s orthogonal validation. AUTHOR INFORMATION Author notes * Francisco Martínez-Jiménez Present address: Vall


d’Hebron Institute of Oncology, Barcelona, Spain * These authors contributed equally: Francisco Martínez-Jiménez, Peter Priestley. AUTHORS AND AFFILIATIONS * Center for Molecular Medicine


and Oncode Institute, University Medical Center Utrecht, Utrecht, the Netherlands Francisco Martínez-Jiménez & Edwin Cuppen * Hartwig Medical Foundation, Amsterdam, the Netherlands


Francisco Martínez-Jiménez & Edwin Cuppen * Hartwig Medical Foundation Australia, Sydney, New South Wales, Australia Peter Priestley, Charles Shale & Jonathan Baber * GenDx, Utrecht,


the Netherlands Erik Rozemuller Authors * Francisco Martínez-Jiménez View author publications You can also search for this author inPubMed Google Scholar * Peter Priestley View author


publications You can also search for this author inPubMed Google Scholar * Charles Shale View author publications You can also search for this author inPubMed Google Scholar * Jonathan Baber


View author publications You can also search for this author inPubMed Google Scholar * Erik Rozemuller View author publications You can also search for this author inPubMed Google Scholar *


Edwin Cuppen View author publications You can also search for this author inPubMed Google Scholar CONTRIBUTIONS F.M.J., E.C. and P.P. conceived the study. F.M.J., P.P., J.B. and C.S.


provided the methodology. P.P., J.B. and C.S. provided the software. F.M.J., P.P. and E.R. validated the study. F.M.J. and P.P. performed the formal analysis. F.M.J. and P.P. carried out the


investigations. E.C. and P.P. provided the resources. F.M.J., P.P., C.S. and E.R. curated the data. F.M.J. wrote the original draft of the MS. E.C. and P.P. reviewed and edited the MS.


F.M.J. visualized the study. F.M.J. and E.C. supervised the study. F.M.J. and E.C. administered the project. E.C. acquired funding. CORRESPONDING AUTHORS Correspondence to Francisco


Martínez-Jiménez or Edwin Cuppen. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. PEER REVIEW PEER REVIEW INFORMATION _Nature Genetics_ thanks Jimmy Van


den Eynden, Teresa Davoliand and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. ADDITIONAL INFORMATION


PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. EXTENDED DATA EXTENDED DATA FIG. 1 LILAC’S VALIDATION.


A) _HLA-I_ typing tumor and germline agreement in PCAWG. Venn diagrams representing the overlap of germline samples with perfect 2-field _HLA-I_ allotype match in B) Hartwig and C) PCAWG.


D) Left, copy number of minor and major alleles of _HLA-A_, _HLA-B_ and _HLA-C_ in Hartwig dataset. Right, proportion of samples with integer minor and major copy number (copy number


<=0.3 or >=0.7) of _HLA-I_ genes (orange) compared to the average of 1,000 randomly selected genes (gray) in the Hartwig cohort (N=4,439). E) analogous for PCAWG samples (N=1,880).


Box-plots: center line, median; box limits, first and third quartiles; whiskers, lowest/highest data points at first quartile minus/plus 1.5× IQR. N, number of samples. EXTENDED DATA FIG. 2


GIE IN PRIMARY AND METASTATIC TUMORS. A) Workflow of the sample processing pipeline used in this study in Hartwig (left) and PCAWG (right). Each rectangle represents a processing step. The


resulting number of selected samples for this study are displayed at the bottom. Sample-exclusion criteria for LILAC is described in Supp. Note 1. B) number of metastatic (Hartwig) samples


across cancer types that lack sufficient representation in the primary (PCAWG) dataset. C) Analogous representation for primary (PCAWG) samples. Vertical dashed lines (N>= 15 samples)


represent the threshold of samples to consider a cohort as sufficiently populated. EXTENDED DATA FIG. 3 POSITIVE SELECTION OF _HLA-I_ GENES. A) Needle plots representing the pan-cancer


distribution of somatic mutations along the _HLA-A_, _HLA-B_ and _HLA-C_ protein sequences in the primary (PCAWG) dataset. Mutations are colored according to the consequence type. Rectangles


represent the Pfam domains. B) Correlation of LOH of _HLA-I_ rates (y-axis) with the mean genome-wide LOH rates (x-axis) across the metastatic (left) and primary (right) datasets. Each dot


represents a cancer type. Horizontal lines represent genome-wide LOH standard deviation across samples from each cancer type. The regression and 95% confidence intervals of the linear


regression are represented as a solid line and the adjacent shaded area, respectively. Confidence intervals are calculated using 1,000 bootstraps. P-values are obtained from a two-sided test


of no-correlation. C) Distribution of non-focal LOH events along the autosomes in the primary kidney chromophobe cancer cohort. X-ticks represent the chromosomal starting position. Dashed


horizontal lines represent the expected mean after randomization. Vertical dashed lines highlight the _HLA-I_ genomic location. CDS pos, coding sequence position. Std., standard deviation.


EXTENDED DATA FIG. 4 GIE ASSOCIATION WITH CANCER GENOMIC FEATURES. A) Heatmap displaying the association of genomic features with GIE frequency across cancer types. Significant associations


that can not be explained by higher background mutation rate are highlighted by a red border line. Significant associations found in >2% of the GIE simulations are highlighted by a black


border line. B) Comparison of the TMB between samples bearing GIE alterations and non-GIE samples in diffuse large B-cell lymphoma (DLBCL). Left boxplot, using observed GIE events. Right


using simulated GIE events. C) Similar comparison for neoepitope burden in skin melanoma. D) Comparison of the APOBEC mutational exposure between samples bearing simulated GIE alterations


and wild-type (no simulated GIE) in breast cancer in one randomly selected simulation. E) Analogous for ultraviolet light (UV) associated double base substitutions (DBSs) in skin melanoma


and F) for platinum treatment associated DBSs in non-small cell lung cancer (NSCLC). G) Comparison of immune infiltration estimates from Davoli et al.37 between samples bearing simulated GIE


alterations and wild-type (no simulated GIE) in colorectal cancer. Boxplots: center line, median; First section out from the centerline contains 50% of the data. The next sections contain


half the remaining data until we are at the outlier level. Each level out is shaded lighter. N, number of samples. P-values of the boxplots are calculated using a two-sided Mann–Whitney U


test. One of the 100 simulations was randomly selected for all the simulated GIE boxplots. SBS, single base substitution. EXTENDED DATA FIG. 5 IMMUNE EVASION MECHANISMS AND TMB. A) Analogous


to Fig. 7a, but restricted to (from left to right) non-small cell lung cancer, breast cancer and colorectal cancer patients. Top panels represent the total number of bucket-assigned (white


bars with a black contouring line) and GIE-positive (pink bars with pink contouring lines) samples across the ten evenly arranged TMB buckets. bottom panel, representation of observed (pink)


and simulated (gray) GIE frequency across these buckets. For the observed GIE values, average (represented as pink dots) and standard deviation (std, vertical error bars and shaded pink


area) values are computed using 1,000 bootstraps from the total number of cancer type samples classified into each bucket (from the top panel). For the simulated GIE values, average (gray


triangle) and standard deviation (std, vertical bars and shaded gray area) values are computed from 100 GIE simulations using the total number of cancer type samples assigned into each


bucket. B) Analogous to panel a) but using the entire cohort (N=6,319) and twenty evenly arranged buckets as well as clonal TMB (left) and subclonal TMB (right) as baseline. C) Equivalent to


panel a) but using the entire cohort (N=6,319) and twenty evenly arranged buckets and grouping GIE events according to the assigned GIE pathway. Label number represents the assigned immune


escape pathway (from Fig. 1a) D) Similar to panel c) but using the number of predicted neoepitopes as baseline for the buckets. Bottom labels, number of estimated neoantigens as a relative


percentage (1% and 5%) of the number of predicted neoepitopes in the bucket. N, number of samples. Muts/Mb, mutations per megabase. ICI, immune checkpoint inhibitor. SUPPLEMENTARY


INFORMATION SUPPLEMENTARY INFORMATION Supplementary Tables 1 and 2 and Data 1–4 captions, and Supplementary Notes 1–3. REPORTING SUMMARY PEER REVIEW FILE SUPPLEMENTARY TABLES 1 AND 2


Supplementary Table 1 Pathways and genes involved in genetic immune escape. Table 2 Dataset metadata and cancer-type representation. SUPPLEMENTARY DATA 1 _HLA-I_ typing for Hartwig and PCAWG


patients and validation of LILAC. SUPPLEMENTARY DATA 2 Sample-specific GIE annotation and cohort-wise GIE frequency. SUPPLEMENTARY DATA 3 Positive selection in _HLA-I_ and non-_HLA-I_


genes. SUPPLEMENTARY DATA 4 Tumor genomics and clinical features and GIE association. RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0


International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the


source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative


Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by


statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit


http://creativecommons.org/licenses/by/4.0/. Reprints and permissions ABOUT THIS ARTICLE CITE THIS ARTICLE Martínez-Jiménez, F., Priestley, P., Shale, C. _et al._ Genetic immune escape


landscape in primary and metastatic cancer. _Nat Genet_ 55, 820–831 (2023). https://doi.org/10.1038/s41588-023-01367-1 Download citation * Received: 28 June 2022 * Accepted: 10 March 2023 *


Published: 10 May 2023 * Issue Date: May 2023 * DOI: https://doi.org/10.1038/s41588-023-01367-1 SHARE THIS ARTICLE Anyone you share the following link with will be able to read this content:


Get shareable link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative