In:
Cancer Research, American Association for Cancer Research (AACR), Vol. 80, No. 16_Supplement (2020-08-15), p. 1320-1320
Abstract:
Somatic mutation calling from bulk DNA sequencing is a complex problem susceptible to elevated false positive rates. High mapping quality is considered an important feature of reliable variant calls. At the resolution of short reads, ~10% of the genome displays high sequence similarity with at least one other genomic region and is assigned low mapping quality by alignment algorithms. These low-mapping-quality regions represent recurrent blind spots for mutation callers, which discard many of the variants they harbor, overlooking true biological variation. Here, we developed a pipeline to call substitutions in the low-mapping-quality genome. We used a published thesaurus approach to annotate the variant positions with their high-similarity links. We trained a classifier to emulate high-quality consensus calls made in unique regions using 20 features unrelated to mapping quality, reaching ~95% accuracy in those regions. In an independent sample, more than 90% of the thesaurus calls were validated through linked-read sequencing. We then applied the classifier to all candidate substitutions of 2,658 cancer whole genomes from the PCAWG/ICGC consortium, including variants falling in low-mappability regions. We retrieve hidden thesaurus variants genome-wide in ~6% of the genome, including genic, coding, and promoter regions. The number of thesaurus calls is directly proportional to the number of somatic calls falling in the low-mapping-quality genome, and the two sets share a similar trinucleotide context spectrum. Rescuing these mutations reveals hidden signal in known cancer genes, including PIK3CA, and an excess of mutations genome-wide in promoter, untranslated, and coding regions of many other genes. We also find a potential excess of non-synonymous mutations, including in genes from the TRIM and POTE families, which have previously been implicated in multiple cancer types.
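The comparison of trinucleotide context spectra mentioned above can be illustrated with a minimal sketch. It is not the authors' code. It assumes each substitution is given as a three-base reference context centred on the mutated base plus the alternate allele, folds purine references onto the pyrimidine strand (the usual 96-channel convention), and compares two spectra by cosine similarity. The abstract does not state which similarity measure was used, so cosine similarity and all function names here are illustrative assumptions.

```python
from collections import Counter
from math import sqrt

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def channel(context, alt):
    """Map a substitution to its pyrimidine-centred trinucleotide channel.

    context: 3-base reference sequence centred on the mutated base.
    alt: alternate base observed at the central position.
    """
    if context[1] in "AG":
        # Purine reference: report the event on the opposite strand.
        context, alt = revcomp(context), COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def spectrum(mutations):
    """Count substitutions per channel; mutations is an iterable of (context, alt)."""
    return Counter(channel(context, alt) for context, alt in mutations)


def cosine(p, q):
    """Cosine similarity between two channel-count spectra (Counters)."""
    keys = set(p) | set(q)
    dot = sum(p[k] * q[k] for k in keys)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy inputs, not data from the study.
thesaurus_calls = [("TCA", "T"), ("TGA", "A"), ("ACG", "T")]
unique_region_calls = [("TCA", "T"), ("ACG", "T"), ("CCG", "T")]
similarity = cosine(spectrum(thesaurus_calls), spectrum(unique_region_calls))
```

A similarity close to 1 would indicate that the two call sets have similar context preferences. The toy inputs only exercise the functions and say nothing about the reported results.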
Altogether, we developed a pipeline to call somatic substitutions in the low-mapping-quality genome and uncovered hidden somatic changes along the genomes of human cancers. In the future, this pipeline could be extended to indels and structural variants, and applied to the study of de novo germline variants.
Citation Format: Maxime Tarabichi, Jonas Demeulemeester, Annelien Verfaillie, Peter Van Loo, Tomasz Konopka. The landscape of somatic substitutions in the repetitive genome across cancer types [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 1320.
Type of Medium:
Online Resource
ISSN:
0008-5472, 1538-7445
DOI:
10.1158/1538-7445.AM2020-1320
Language:
English
Publisher:
American Association for Cancer Research (AACR)
Publication Date:
2020
ZDB-ID:
2036785-5, 1432-1, 410466-3