Next Article in Journal
Homo and Heterotypic Cellular Cross-Talk in Epithelial Ovarian Cancer Impart Pro-Tumorigenic Properties through Differential Activation of the Notch3 Pathway
Next Article in Special Issue
Contraception and Hormone Replacement Therapy in Healthy Carriers of Germline BRCA1/2 Genes Pathogenic Variants: Results from an Italian Survey
Previous Article in Journal
Isolation of Circulating Tumor Cells from Seminal Fluid of Patients with Prostate Cancer Using Inertial Microfluidics
Previous Article in Special Issue
Analysis of the Clinical Advancements for BRCA-Related Malignancies Highlights the Lack of Treatment Evidence for BRCA-Positive Male Breast Cancer
 
 
cancers-logo
Article Menu

Article Menu

Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Uncovering the Contribution of Moderate-Penetrance Susceptibility Genes to Breast Cancer by Whole-Exome Sequencing and Targeted Enrichment Sequencing of Candidate Genes in Women of European Ancestry

by
Martine Dumont
1,
Nana Weber-Lassalle
2,
Charles Joly-Beauparlant
1,
Corinna Ernst
2,
Arnaud Droit
1,
Bing-Jian Feng
3,4,
Stéphane Dubois
1,
Annie-Claude Collin-Deschesnes
1,
Penny Soucy
1,
Maxime Vallée
1,
Frédéric Fournier
1,
Audrey Lemaçon
1,
Muriel A. Adank
5,
Jamie Allen
6,
Janine Altmüller
7,
Norbert Arnold
8,
Margreet G. E. M. Ausems
9,
Riccardo Berutti
10,
Manjeet K. Bolla
6,
Shelley Bull
11,12,
Sara Carvalho
6,
Sten Cornelissen
13,
Michael R. Dufault
14,
Alison M. Dunning
15,
Christoph Engel
16,
Andrea Gehrig
17,
Willemina R. R. Geurts-Giele
18,
Christian Gieger
19,20,
Jessica Green
11,21,
Karl Hackmann
22,
Mohamed Helmy
23,24,25,
Julia Hentschel
26,
Frans B. L. Hogervorst
5,
Antoinette Hollestelle
27,
Maartje J. Hooning
27,
Judit Horváth
28,
M. Arfan Ikram
29,
Silke Kaulfuß
30,
Renske Keeman
13,
Da Kuang
21,23,
Craig Luccarini
15,
Wolfgang Maier
31,
John W. M. Martens
27,
Dieter Niederacher
32,
Peter Nürnberg
33,
Claus-Eric Ott
34,
Annette Peters
19,35,
Paul D. P. Pharoah
6,15,
Alfredo Ramirez
36,
Juliane Ramser
37,
Steffi Riedel-Heller
38,
Gunnar Schmidt
39,
Mitul Shah
15,
Martin Scherer
40,
Antje Stäbler
41,
Tim M. Strom
10,
Christian Sutter
42,
Holger Thiele
7,
Christi J. van Asperen
43,
Lizet van der Kolk
5,
Rob B. van der Luijt
43,44,
Alexander E. Volk
45,
Michael Wagner
46,
Quinten Waisfisz
47,
Qin Wang
6,
Shan Wang-Gohrke
48,
Bernhard H. F. Weber
49,50,
Genome of the Netherlands Project
,
GHS Study Group
,
Peter Devilee
51,
Sean Tavtigian
4,52,
Gary D. Bader
11,21,23,53,54,
Alfons Meindl
37,
David E. Goldgar
3,4,
Irene L. Andrulis
11,21,
Rita K. Schmutzler
2,
Douglas F. Easton
6,15,
Marjanka K. Schmidt
13,55,
Eric Hahnen
2 and
Jacques Simard
1,56,*
add Show full author list remove Hide full author list
1
Genomics Center, CHU de Québec-Université Laval Research Center, 2705 Laurier Boulevard, Quebec City, QC GIV 4G2, Canada
2
Center for Familial Breast and Ovarian Cancer, Center for Integrated Oncology (CIO), Faculty of Medicine and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
3
Department of Dermatology, University of Utah, Salt Lake City, UT 84103, USA
4
Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA
5
Family Cancer Clinic, The Netherlands Cancer Institute—Antoni van Leeuwenhoek Hospital, 1066 Amsterdam, The Netherlands
6
Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
7
Cologne Center for Genomics (CCG), Faculty of Medicine, University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
8
Institute of Clinical Molecular Biology, Department of Gynaecology and Obstetrics, University Hospital of Schleswig-Holstein, Campus Kiel, Christian-Albrechts University Kiel, 24105 Kiel, Germany
9
Division Laboratories, Pharmacy and Biomedical Genetics, Department of Genetics, University Medical Center Utrecht, 3584 Utrecht, The Netherlands
10
Institute of Human Genetics, Technische Universität München, 81675 Munich, Germany
11
Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON M5G 1X5, Canada
12
Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada
13
Division of Molecular Pathology, The Netherlands Cancer Institute—Antoni van Leeuwenhoek Hospital, 1066 Amsterdam, The Netherlands
14
Precision Medicine and Computational Biology, Sanofi Genzyme, Cambridge, MA 02142, USA
15
Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge CB1 8RN, UK
16
Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, 04107 Leipzig, Germany
17
Centre of Familial Breast and Ovarian Cancer, Department of Medical Genetics, Institute of Human Genetics, University of Würzburg, 97074 Würzburg, Germany
18
Department of Clinical Genetics, Erasmus University Medical Center, 3015 Rotterdam, The Netherlands
19
Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
20
Research Unit Molecular Epidemiology, Helmholtz Zentrum München, German Research Centre for Environmental Health, 85764 Neuherberg, Germany
21
Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
22
Institute for Clinical Genetics, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
23
The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
24
Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
25
Department of Computer Science, Lakehead University, Thunder Bay, ON P7B 5E1, Canada
26
Institute of Human Genetics, University Leipzig, 04103 Leipzig, Germany
27
Department of Medical Oncology, Erasmus MC Cancer Institute, 3015 Rotterdam, The Netherlands
28
Institute of Human Genetics, University of Münster, 48149 Münster, Germany
29
Department of Epidemiology, Erasmus MC University Medical Center, 3015 Rotterdam, The Netherlands
30
Institute of Human Genetics, University Medical Center Göttingen, 37075 Göttingen, Germany
31
German Center for Neurodegenerative Diseases (DZNE), Department of Neurodegenerative Diseases and Geriatric Psychiatry, Medical Faculty, University Hospital Bonn, 53127 Bonn, Germany
32
Department of Gynecology and Obstetrics, University Hospital Düsseldorf, Heinrich-Heine University Düsseldorf, 40225 Düsseldorf, Germany
33
Center for Molecular Medicine Cologne (CMMC), Cologne Center for Genomics (CCG), Faculty of Medicine, University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
34
Institute of Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, 13353 Berlin, Germany
35
Department of Epidemiology, Institute for Medical Information Processing, Biometry and Epidemiology, Medical Faculty, Ludwig-Maximilians-Universität München, 80539 Munich, Germany
36
Division for Neurogenetics and Molecular Psychiatry, Medical Faculty, University of Cologne, 50937 Cologne, Germany
37
Division of Gynaecology and Obstetrics, Klinikum Rechts der Isar der Technischen Universität München, 81675 Munich, Germany
38
Institute of Social Medicine, Occupational Health and Public Health, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
39
Institute of Human Genetics, Hannover Medical School, 30625 Hannover, Germany
40
Department of Primary Medical Care, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
41
Institute of Medical Genetics and Applied Genomics, University of Tübingen, 72076 Tübingen, Germany
42
Institute of Human Genetics, University Hospital Heidelberg, 69120 Heidelberg, Germany
43
Department of Clinical Genetics, Leiden University Medical Center, 2333 Leiden, The Netherlands
44
Department of Medical Genetics, University Medical Center, 3584 Utrecht, The Netherlands
45
Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
46
Department of Neurodegenerative Diseases and Geriatric Psychiatry, University Hospital Bonn, 53127 Bonn, Germany
47
Department of Human Genetics, Amsterdam UMC, Vrije Universiteit Amsterdam, 1081 Amsterdam, The Netherlands
48
Department of Gynaecology and Obstetrics, University of Ulm, 89081 Ulm, Germany
49
Institute of Human Genetics, Regensburg University, 93053 Regensburg, Germany
50
Institute of Clinical Human Genetics, University Hospital Regensburg, 93053 Regensburg, Germany
51
Department of Pathology, Department of Human Genetics, Leiden University Medical Center, 2333 Leiden, The Netherlands
52
Department of Oncological Sciences, University of Utah School of Medicine, Salt Lake City, UT 84132, USA
53
Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
54
Princess Margaret Research Institute, University Health Network, Toronto, ON M5G 0A3, Canada
55
Division of Psychosocial Research and Epidemiology, The Netherlands Cancer Institute—Antoni van Leeuwenhoek Hospital, 1066 Amsterdam, The Netherlands
56
Department of Molecular Medicine, Faculty of Medicine, Université Laval, Quebec, QC G1V 0A6, Canada
*
Author to whom correspondence should be addressed.
Genome of the Netherlands Project and GHS Study Group are listed in Acknowledgments.
Cancers 2022, 14(14), 3363; https://doi.org/10.3390/cancers14143363
Submission received: 12 April 2022 / Revised: 29 June 2022 / Accepted: 1 July 2022 / Published: 11 July 2022

Abstract

:

Simple Summary

Genetic variants explaining approximately 40% of familial breast cancer risk have been identified, thus leaving a significant fraction of the heritability of this disease still unexplained. The exact nature of this missing fraction is unknown; more extensive sequencing efforts could potentially identify new moderate-penetrance breast cancer risk alleles. The aim of this study was to perform a large-scale whole-exome sequencing study, followed by a targeted validation, in breast cancer patients and healthy women of European descent. We identified 20 novel genes with modest evidence of association (p-value < 0.05) for either overall or subtype-specific breast cancer; however, much larger studies are needed to confirm the exact role of these genes in susceptibility to breast cancer.

Abstract

Rare variants in at least 10 genes, including BRCA1, BRCA2, PALB2, ATM, and CHEK2, are associated with increased risk of breast cancer; however, these variants, in combination with common variants identified through genome-wide association studies, explain only a fraction of the familial aggregation of the disease. To identify further susceptibility genes, we performed a two-stage whole-exome sequencing study. In the discovery stage, samples from 1528 breast cancer cases enriched for breast cancer susceptibility and 3733 geographically matched unaffected controls were sequenced. Using five different filtering and gene prioritization strategies, 198 genes were selected for further validation. These genes, and a panel of 32 known or suspected breast cancer susceptibility genes, were assessed in a validation set of 6211 cases and 6019 controls for their association with risk of breast cancer overall, and by estrogen receptor (ER) disease subtypes, using gene burden tests applied to loss-of-function and rare missense variants. Twenty genes showed nominal evidence of association (p-value < 0.05) with either overall or subtype-specific breast cancer. Our study had the statistical power to detect susceptibility genes with effect sizes similar to ATM, CHEK2, and PALB2, however, it was underpowered to identify genes in which susceptibility variants are rarer or confer smaller effect sizes. Larger sample sizes would be required in order to identify such genes.

1. Introduction

Breast cancer is the most common cause of cancer-related death and the most frequently diagnosed cancer among women worldwide [1]. Currently, approximately 40% of familial breast cancer risk is explained by a combination of common low-penetrance variants [2,3], together with coding rarer variants in predisposition genes, such as BRCA1, BRCA2, PALB2, ATM, and CHEK2 that confer higher risks [4]. Thus, a significant fraction of the heritability of this disease is still unexplained. A better understanding of the genetic risk factors contributing to breast cancer can improve risk prediction, and hence better inform screening and prevention strategies, and may also inform understanding of the biology underlying breast cancer predisposition.
While linkage studies have identified high-risk breast cancer susceptibility genes, and genome-wide association studies (GWAS) using single-nucleotide polymorphism (SNP) arrays have identified many common variants conferring modest risks, neither of these approaches are powerful enough to identify rarer variants conferring moderate risk. Several genes associated with moderate risk (two- to four fold) of breast cancer have been identified by candidate gene sequencing approaches. These targeted approaches are limited to our understanding of the pathways involved in breast cancer etiology, and have thus mainly identified new breast cancer susceptibility alleles within genes either coding for proteins that interact with BRCA1 or BRCA2 (e.g., PALB2), or other genes involved in DNA repair processes, such as ATM and CHEK2. Since most genes have not yet been subjected to large-scale sequencing studies, it is possible that other susceptibility genes exist, which would explain some of the residual heritability of breast cancer.
Recently, whole-exome sequencing (WES) has provided a comprehensive, agnostic approach to exploring associations between rarer coding variants and disease. Compared to whole-genome sequencing (WGS), WES has a substantially lower cost and generates results that are more easily interpretable [5]. In the last few years, a number of WES studies have been carried out for breast cancer [6,7]. These studies rely on the sequencing of cases enriched for family history of disease, and matched controls, sometimes in combination with analysis of the segregation of the variants associated with disease in families. However, very few novel susceptibility genes have been robustly identified by these studies. This lack of success may, in part, be due to the lack of statistical power: since deleterious variants in most genes are likely to be very rare, studies need to be very large to detect them. More recently, the largest WES study for overall breast cancer reported increased breast cancer risk associated with ATM, CHEK2, PALB2, and MSH6 [8]. However, while the first three genes are already well established, the association with MSH6 has not been definitively replicated in large, targeted sequencing studies.
In an attempt to overcome these shortcomings, we designed a large-scale two-stage WES study. The discovery step was carried out using WES data from 1528 breast cancer cases enriched for genetic susceptibility to the disease, based on early onset breast cancer, a family history of disease, or bilateral breast cancer, and 3733 geographically matched, unaffected controls. The validation step assessed the associations between the presence of loss-of-function (LoF) and rare missense variants in 198 selected candidate genes and the risk of breast cancer overall, as well as estrogen receptor (ER) disease subtypes, in a replication set of 6211 breast cancer cases and 6019 controls. In addition to these genes, a panel of 32 known or suspected breast cancer susceptibility genes was also included in this targeted enrichment sequencing study.

2. Materials and Methods

2.1. Discovery Stage—Whole-Exome Sequencing (WES) Analysis

2.1.1. Studies and Datasets

For the discovery stage, WES was performed on 1528 breast cancer cases, and the resulting data were compared with whole-exome data obtained from 3733 unaffected controls (3483 unaffected women from four studies, as described below, and 250 unaffected men from GoNL trios) (Table S1). Briefly, breast cancer cases were selected from two clinic-based studies: The German Consortium for Hereditary Breast and Ovarian Cancer (GC-HBOC) (1021 cases) [9], with half of the samples originating from the Cologne region (490 samples) and the other half from the Munich area (531 samples), and The Dutch Familial Bilateral Breast Cancer Study (DFBBCS) (511 cases) [10,11,12]. All subjects had previously tested negative for the presence of BRCA1 and BRCA2 germline mutations in a clinical genetic setting. All subjects from the GC-HBOC had a family history of breast cancer, while all subjects from the DFBBCS had bilateral breast cancer. Control whole-exome data were obtained from four studies: The Rotterdam Study [13], The Genome of the Netherlands Project (GoNL) [14,15], The German Study on Ageing, Cognition, and Dementia (AgeCoDe) [16], and The KORA Study [17]. A more detailed description of these studies and datasets is included in Table S1 and in the Supplementary Methods. All study subjects were recruited based on protocols approved by the Institutional Review Boards at each participating institution, and all subjects provided written informed consent.

2.1.2. Selection of Breast Cancer Cases

Breast cancer cases were selected according to the following criteria: early onset (<50 years of age) and a family history of one or more first-degree relatives with breast cancer diagnosed before 50 years of age, or two relatives diagnosed under the age of 60. A scoring system was established for the selection of families based on the assumption that rare, moderate-risk variants might be expected to confer a higher relative risk at young ages, as is the case for most known susceptibility genes [10]. In addition, the number of affected relatives weighted by their degree of relationship to the case was taken into account. Briefly, families were scored according to the following: (a) For each case in the family, a score of 1 was given if the case was diagnosed at age 50 or older, a score of 1.5 was given if diagnosed between 40–49 years, and a score of 2 if diagnosed at <40 years of age. (b) Cases were then weighted by their degree of relationship to the index case, where a weight of 1 was attributed for the index case, 0.5 for first-degree relatives, and 0.25 for second-degree relatives. (c) For bilateral cases, both cancers were included (thus giving a score of 2). (d) The total score for the family was obtained by the sum of these scores. Carriers of deleterious germline mutations in BRCA1 and BRCA2, and CHEK2 c.1100delC carriers were excluded. Related samples were identified on the basis of genotype data, and the individual with the youngest age of onset was chosen. The score distribution among breast cancer cases was as follows: 0.6% with a score below 2; 16.7% with a score between 2 and <2.5; 33.6% with a score between 2.5 and <3; 25.5% with a score between 3 and <3.5; 14.1% with a score between 3.5 and <4; 9.5% with a score ≥4.

2.1.3. Library Preparation, High-Throughput Sequencing, and Bioinformatics Analysis

Libraries for samples from GC-HBOC cases were prepared using the Agilent SureSelectXT Human All Exon V5 capture kit, while libraries for samples from DFBBCS cases were prepared using NimbleGen SeqCap EZ Exome Library v3.0. (Table S1). Barcoded libraries were sequenced on an Illumina HiSeq2500 for paired-end 100 bp sequencing. Available sequencing data from controls was obtained from other studies. These data were generated using the following capture technologies: NimbleGen SeqCap EZ Exome V2, NimbleGen SeqCap EZ Exome V3, Agilent SureSelect Human All Exon V3 and V5 (Table S1). Whole-exome data from cases and controls were demultiplexed and aligned to the reference genome using BWA-MEM to create aligned BAM files, followed by base quality score recalibration. The analysis of sequencing data from cases and controls was restricted to the region of 34.5 Mb common to the different capture technologies used. For breast cancer cases, a mean coverage of 100× was obtained with >97.5% of bases above 10×. Variant calling was performed using a pipeline based on the GATK HaplotypeCaller. Details of library preparation and bioinformatics analysis including variant calling and quality control procedures are given in Supplementary Methods.

2.1.4. Variant Filtering and Gene Prioritization

Five research groups involved in this project (CHU de Québec-Université Laval Research Centre, Huntsman Cancer Institute, University of Utah, Munich Technical University, University of Toronto, University of Cambridge) performed variant filtering and gene prioritization using different approaches according to their expertise with regard to such analyses (Supplementary Methods). Variant filtering and gene prioritization strategies are summarized in Figure 1 and in Supplementary Methods. Most variant filtering strategies are similar for all groups, reflecting current best practices.

2.1.5. Aggregation of Gene Lists

Following these gene prioritization analyses, the results from each group underwent manual scrutiny and were then merged into a final list of candidate genes for validation. The selection process first included the top 30-ranked genes from each group. Each group then nominated additional putative breast cancer susceptibility genes according to their analysis and expertise. Finally, the list was augmented by the inclusion of genes that were among the top 500-ranked by at least four groups. Through this selection process, 198 candidate genes identified in the WES were selected for targeted resequencing (Table S2). In addition to these candidate genes, this targeted sequencing enrichment stage also included 32 genes for which there was prior evidence of association with breast cancer risk [4,18]. These 32 genes were selected based on their a priori associations with the disease, and were not considered in the individual gene ranking process.

2.2. Validation Stage—Targeted Enrichment Sequencing

2.2.1. Sample Sets

A total of 12,655 samples, including 6518 women affected with breast cancer, and 6137 controls, were selected for the validation stage. These were drawn from five sources, namely: GC-HBOC [9] (n = 5966, 3199 cases and 2767 controls); a Dutch validation set comprising samples from two studies, the Amsterdam Breast Cancer Study-Familial (ABCS-F) [10] and the Rotterdam Breast Cancer Study (RBCS) [19] (n = 1941, 979 cases and 962 controls); Studies of Epidemiology and Risk factors in Cancer Heredity (SEARCH) [20,21,22] (n = 2637, 1289 cases and 1348 controls); the Ontario Familial Breast Cancer Registry (OFBCR) [23] (n = 1191, 600 cases and 591 controls) and CARTaGENE (CaG) [24] (n = 920, 451 cases and 469 controls). The GC-HBOC samples did not overlap with those used for the discovery step. A more detailed description of these studies is included in Table S1 and in Supplementary Methods. All study subjects were recruited based on protocols approved by the Institutional Review Boards at each participating institution, and all subjects provided written informed consent.

2.2.2. Power Calculations

To perform power calculations for the replication stage, we assumed a significance level of p-value < 10−4. The standard “exome-wide” significance level is p < 2.5 × 10−6; this is equivalent to assuming a prior probability 40-fold higher than an average gene. This is also the level proposed by Easton et al. [4] for assessing the evidence for candidate DNA repair genes included in gene panels. To allow for the oversampling of cases with a family history, bilaterality, or early age at onset, we assumed that the effect size (log odds ratio) was increased by a factor of 1.5. Based on these assumptions, the power exceeded 50% for genes with a carrier frequency >0.005 and an odds ratio (OR) in the population >2, or a carrier frequency >0.001 and an OR > 4. Based on the effect sizes and European allele frequencies [18], the power of our study to detect genes with the same characteristics as ATM, CHEK2, and PALB2 was 0.44, 1.0 and 0.95, respectively. However, the power was low to detect genes with the characteristics of RAD51C (0.0039), RAD51D (0.0025), or BARD1 (0.017).

2.2.3. Library Preparation and High-Throughput Sequencing

Library preparation was performed at the Cologne Center for Genomics, the Center for Familial Breast and Ovarian Cancer, and the Genomics Centre at CHU de Québec-Université Laval Research Center. Sequencing was performed at two centers: the Cologne Center for Genomics, Cologne, Germany, using Illumina HiSeq 4000 sequencers for paired-end 75 bp sequencing (mean coverage 140×); and the Genomics Centre at CHU de Québec-Université Laval Research Center, Quebec City, Canada, using Illumina HiSeq 2500 for paired-end 100 bp sequencing (mean coverage 185×). Additional details on library preparation and quality control procedures are given in the Supplementary Methods.

2.2.4. Bioinformatics and Statistical Analyses

Bioinformatics analyses were performed on the Compute Canada supercomputer, Graham. A pipeline was set up following GATK Best Practices [25]. GATK Unified Genotyper was used for variant calling, and variants were annotated and quality-controlled using VICTOR (variant interpretation for clinical testing or research), and both gene-level and variant-level significances were computed using PERCH (polymorphism evaluation, ranking and classification for heritable traits) [26]. PERCH quantitatively integrated deleteriousness prediction, allele frequency information, rare variant association analysis, biological relevance assessment, and the quality of variant calls. Additional details on the bioinformatics pipeline and variant calling procedure are given in the Supplementary Methods. Following these quality control and filtering steps, data from a total of 12,230 samples were further analyzed, including 6211 breast cancer cases and 6019 controls.
The principal association analysis was a gene burden analysis, in which the genotypes were collapsed into a 0/1 variable for each gene, according to whether or not a deleterious variant was carried. For each gene, variant carriers were defined using the VICTOR software suite. LoF variants were defined as (1) StopGain or FrameShift variants, excluding those affecting only the last 5% of coding sequences (CDS); (2) SpliceSite variants that include coding and non-coding regions; and (3) variants that inhibit transcription or translation. Rare missense variants included in these analyses were those showing a deleteriousness score according to the BayesDel algorithm [26] (Tables S3–S5). A final list of 7437 variants was established, including LoF variants (n = 1696) and rare missense variants (n = 5741). Among these, 25 LoF variants had a minor allele frequency (MAF) between 0.01 and 0.001, and 98.5% of LoF variants had an MAF < 0.001; in contrast, 21 missense variants had an MAF between 0.01 and 0.001, and the remaining 99.6% of missense variants had an MAF < 0.001.
ORs and 95% confidence intervals (CI) associated with carrier status were computed using logistic regression, adjusting for study population as a covariate, and p-values were derived from Wald tests. For analyses performed by individual study, p-values were computed via Fisher’s exact tests. All statistical analyses were performed in R v3.6.0.

3. Results

A final list of 7437 variants including LoF (n = 1696) and rare missense variants (n = 5741) were included in the gene burden analyses. Quantile–quantile (Q–Q) plots are shown in Figure S1A for LoF variants, and Figure S1B for missense variants. When known breast cancer susceptibility genes were excluded, there was no evidence of inflation in the test statistics.

3.1. Known or Suspected Breast Cancer Susceptibility Genes

Targeted enrichment sequencing of the panel of 32 known or suspected breast cancer susceptibility genes confirmed the associations of known breast cancer susceptibility genes ATM, CHEK2, and PALB2 (Figure 2, Table 1, Tables S6 and S7). It should be noted that these analyses did not include BRCA1 or BRCA2, as our original study design excluded samples from families that carried mutations in either of these genes.

3.1.1. Loss-of-Function Variants

Significant associations for LoF variants with overall breast cancer risk were observed for CHEK2 (OR = 3.47 (CI 95% 2.33–5.15), p-value = 7.62 × 10−10); ATM (OR = 3.55 (CI 95% 2.16–5.82), p-value = 5.44 × 10−7); and PALB2 (OR = 3.95 (CI 95% 1.98–7.88), p-value = 9.68 × 10−5), while more modest evidence of overall risk association was observed for TP53 (OR = 10.16 (CI 95% 1.31–78.67, p-value = 0.026) (Figure 2 and Table 1). No nominally significant associations were observed for the remaining 28 genes in this panel. Of the four genes that were associated with breast cancer overall, CHEK2 and PALB2 also showed evidence of association with both ER-positive and ER-negative disease (CHEK2 ER-positive: OR = 4.44 (CI 95% 2.82–6.99), p-value = 1.23 × 10−10 and ER-negative: OR = 2.65 (CI 95% 1.24–5.70), p-value = 0.012; PALB2 ER-positive: OR = 5.27 (CI 95% 2.39–11.61), p-value = 3.83 × 10−5 and ER-negative: OR = 6.45 (CI 95% 2.31–17.98), p-value = 3.72 × 10−4) while ATM showed evidence of an association with ER-positive disease (OR = 4.06 (CI 95% 2.37–6.97), p-value = 3.39 × 10−7, p-diff = 0.026 for difference with ER-negative disease) (Figure 2, Table 1). Among the genes that had no evidence of association with overall breast cancer risk, RAD51D showed some evidence of association with ER-negative disease (OR = 11.26 (CI 95% 2.60–48.68), p-value = 1.19 × 10−3, p-diff = 0.049 for difference with ER-positive disease) (Figure 2, Table 1). LoF variants in MSH6 showed evidence of a negative association with breast cancer risk. This effect was observed for overall breast cancer risk (OR = 0.36 (CI 95% 0.21–0.62), p-value = 2.07 × 10−4) as well as for ER-positive specific disease (OR = 0.42 (CI 95% 0.22–0.81), p-value = 9 × 10−3) (Table 1).

3.1.2. Rare Missense Variants

As described in the Materials and Methods section, rare missense variants included in these analyses were those showing a deleteriousness score according to the BayesDel algorithm [26] (Tables S3–S5). In the panel of 32 known or suspected breast cancer susceptibility genes, there was evidence of an association with overall breast cancer risk for rare missense variants in four genes: ATM (OR = 1.67 (CI 95% 1.32–2.10), p-value = 1.56 × 10−5); CHEK2 (OR = 1.95 (CI 95% 1.35–2.82), p-value = 3.43 × 10−4); TP53 (OR = 2.22 (CI 95% 1.26–3.93), p-value = 5.86 × 10−3) and MEN1 (OR = 5.93 (CI 95% 1.33–26.50), p-value = 0.020) (Figure 2 and Table S7). The results of the tumor ER-subtype analyses are shown in Table S7. Among genes that were not associated with overall breast cancer risk, missense variants in BARD1 showed some evidence of association with ER-positive disease (OR = 2.14 (CI 95% 1.13–4.04), p-value = 0.020), while an association with ER-negative breast cancer was observed for missense variants in BRIP1 (OR = 3.78 (CI 95% 1.23–11.55), p-value = 0.020) (Figure 2, Table S7).

3.2. Validation of Candidate Genes Identified at the Discovery Stage

Analysis of the 198 genes selected from the WES analysis identified 20 potential candidate susceptibility genes with evidence of association with breast cancer (p-value < 0.05) (Figure 3, Table 2 and Tables S8–S10).

3.2.1. Loss-of-Function Variants

LoF variants in three genes (ZFAND1, TYRO3, and TMEM206/PACC1) displayed modest evidence of an association with breast cancer risk overall (p-value < 0.05) (Figure 3 and Table 2), with effect sizes for ZFAND1 of 1.73 (CI 95% 1.12–2.68, p-value = 0.014), TYRO3 of 1.40 (CI 95% 1.06–1.83, p-value = 0.016), and TMEM206/PACC1 of 1.70 (CI 95% 1.11–2.62, p-value = 0.016). Of these genes, two were also associated with ER-negative breast cancer risk: ZFAND1 (OR = 2.96 (CI 95% 1.59–5.50), p-value = 6.37 × 10−4) and TYRO3 (OR = 1.66 (CI 95% 1.03–2.68), p-value = 0.038). A negative overall association with breast cancer risk was observed for EML5 (OR = 0.41 (CI 95% 0.18–0.94), p-value = 0.035). Tumor subtype specific analyses also revealed that, among the genes that showed no evidence of an association with overall breast cancer risk, two genes showed some evidence of an association with ER-negative breast cancer, namely DNAH11 (OR = 3.00 (CI 95% 1.42–6.32), p-value = 3.88 × 10−3, p-value for difference with ER-positive disease is 0.012) and PARP2 (OR = 6.89 (CI 95% 1.11–42.78), p-value = 0.038), while LAMC3 (OR = 3.14 (CI 95% 1.20–8.20), p-value = 0.020), MTMR11 (OR = 1.49 (CI 95% 1.03–2.17), p-value = 0.037), EPN3 (OR = 1.97 (CI 95% 1.03–3.76), p-value = 0.039), and SLC22A10 (OR = 3.70 (CI 95% 1.01–13.57), p-value = 0.048) showed evidence of an association with ER-positive disease (Figure 3, Table 2).

3.2.2. Rare Missense Variants

There was modest evidence of an association with overall breast cancer risk for rare missense variants (Table 2) in three genes: TMEM161A (OR = 2.56 (CI 95% 1.19–5.51), p-value = 0.016), SIPA1L1 (OR = 1.68 (CI 95% 1.06–2.67), p-value = 0.026), and ERCC2 (OR = 1.45 (CI 95% 1.01–2.08), p-value = 0.047). Missense variants in TYRO3 showed evidence of a negative association with risk (OR = 0.29 (CI 95% 0.10–0.89), p-value = 0.031), in contrast with the effects observed for LoF variants. Eleven missense variants were observed for this gene, each only once, either in cases or in controls; the exception was variant p.R750C rs145529129, which was predominantly observed in unaffected controls, and thus, appears to explain the majority of the observed protective association. Of the three genes associated with overall breast cancer risk, TMEM161A was also associated with ER-positive breast cancer risk (OR = 2.73 (CI 95% 1.11–6.71), p-value = 0.029). ER-subtype-specific analyses further revealed associations with ER-negative disease for SLC22A10 (OR = 20.76 (CI 95% 2.03–211.95), p-value = 0.011), PHAX (OR = 24.27 (CI 95% 2.42–243.37), p-value = 6.69 × 10−3), SMARCA2 (OR = 4.92 (CI 95% 1.47–16.48), p-value = 9.69 × 10−3), EML5 (OR = 2.74 (CI 95% 1.25–6.01), p-value = 0.012), NTRK3 (OR = 3.17 (CI 95% 1.11–9.11), p-value = 0.032), and MED23 (OR = 2.29 (CI 95% 1.01–5.18), p-value = 0.048), while RNF175 (OR = 2.59 (CI 95% 1.12–6.00), p-value = 0.026) and NCKAP1L (OR = 2.57 (CI 95% 1.12–5.91), p-value = 0.027) were associated with ER-positive breast cancer. As mentioned in the previous section, we observed modest evidence of a negative association with overall breast cancer risk for LoF variants in EML5, which is in contrast with the observed association of missense variants in this gene with ER-negative disease.

3.2.3. Combined Analysis of LoF and Rare Missense Variants

The combined analysis of LoF and rare missense variants (Table 2) identified a new association for ABCC2 (p-value = 0.038) with overall breast cancer, and also identified additional distinct associations for two genes for which evidence of associations had been observed in either separate LoF or missense variants analyses, namely, a modest association of SMARCA2 with overall breast cancer (OR = 2.43 (CI 95% 1.01–5.82), p-value = 0.046), and one for TMEM206/PACC1 with ER-negative breast cancer (OR = 2.26 (CI 95% 1.13–4.51), p-value = 0.021). Moreover, the combined analysis also identified associations (p < 0.05), which had also been observed in separate LoF or missense variants analyses, for the following genes: ZFAND1 (p-value = 0.014), TMEM206/PACC1 (p-value = 0.011), TMEM161A (p-value = 0.014), and SIPA1L1 (p-value = 0.026) with overall breast cancer; ZFAND1 (p-value = 1.78 × 10−3), DNAH11 (p-value = 0.037), SLC22A10 (p-value = 0.011), PHAX (p-value = 5.50 × 10−3), SMARCA2 (p-value = 3.40 × 10−3), and NTRK3 (p-value = 0.032) with ER-negative breast cancer subtype; and MTMR11 (p-value = 0.032), TMEM161A (p-value = 0.015), RNF175 (p-value = 0.031), and NCKAP1L (p-value = 0.048) for ER-positive breast cancer subtype.

3.2.4. Individual LoF and Missense Variant Analysis

As mentioned in the Materials and Methods section, the main association analysis performed in our study was a gene burden analysis. This approach increases the power to detect an association, and has been used by a vast majority of association studies, including those recently published by Dorling et al. [18] and Hu et al. [27]. Nevertheless, it has been shown that single variants can show evidence of association when a weaker signal is obtained at the gene level [28,29]. We therefore examined whether certain individual LoF or missense variants with higher frequencies showed evidence of association, and whether they were driving the observed associations for a given gene. In this analysis, we only examined individual variants observed in at least three carriers, at least two of which were observed in breast cancer cases. As shown in Supplementary Table S11A, a total of seven LoF variants in six different genes showed evidence of association with overall breast cancer (nominal p < 0.05). Three of these variants were observed in known breast cancer susceptibility genes, one in ATM (c.1564_1565delGA, OR = 8.77 (CI 95% 1.11–69.29), p-value = 0.040), which is reported as pathogenic in the ClinVar database [30], and two in CHEK2 (c.1100delC, OR = 3.08 (CI 95% 2.01–4.3), p-value = 2.83 × 10−7; c.444 + 1G > A, OR = 8.12 (CI 95% 1.03–64.13), p-value = 0.047). Three other LoF variants showing evidence of association in our single variant analysis were located in the top three candidate genes identified through the gene burden test: ZFAND1 (c.139-3_139delCAGG, OR = 1.76 (CI 95% 1.11–2.81), p-value = 0.017), TMEM206/PACC1 (c.631delC, OR = 1.81 (CI 95% 1.15–2.86), p-value = 0.011), and TYRO3 (c.308_308 + 1insCCTGAAGTCA, OR = 1.55 (CI 95% 1.15–2.09), p-value = 3.93 × 10−3). These three variants have a much higher frequency than other such variants identified in these genes. They clearly contribute to the association observed at the gene level. Lastly, one LoF variant in the TRPM4 gene had a very high OR, while no evidence of association was observed for this gene when all LoF variants were analyzed in aggregate (TRPM4, c.2254C > T, OR = 10.64 (CI 95% 1.37–82.46), p-value = 0.024). A similar analysis performed on rare missense variants (Table S11B) identified only three variants with nominally significant p-values: one in ATM (p.L2307F, OR = 5.94 (CI 95% 2.06–17.16), p-value = 9.85 × 10−4), one in PMS2 (p.I18T, OR = 4.74 (CI 95% 1.02–22.01), p-value = 0.047), and one in RIC1 (p.R635H, OR = 6.07 (CI 95% 1.36–27.17), p-value = 0.018). With the exception of the ATM gene, PMS2 and RIC1 showed no evidence of association in the gene burden test. Even though these variants showed a deleteriousness score according to the BayesDel algorithm, caution should be taken when interpreting missense variants because their impact may differ greatly depending on their position in the gene (e.g., conserved or functional domains).

4. Discussion

Our study identified 20 new genes showing modest evidence of an association either with overall disease and/or with ER-positive breast cancer or ER-negative breast cancer (p-value < 0.05). These results are based on the analysis of LoF variants and rare missense variants. Of these twenty, three showed evidence of association with LoF variants for overall breast cancer (ZFAND1, TYRO3, TMEM206/PACC1), and a further six with breast cancer subtypes (DNAH11, PARP2, LAMC3, MTMR11, EPN3, SLC22A10). When missense variants were assessed, three further associations were observed for overall breast cancer (TMEM161A, SIPA1L1, ERCC2), and a further seven were observed for subtypes (RNF175, NCKAP1L, PHAX, SMARCA2, EML5, NTRK3, MED23). One additional association was observed when missense and LoF variants were combined (ABCC2). Detailed information on these 20 genes, including a description of the known functions and pathways in which they are involved, as well as the number of LoF and missense variants observed in the different analyses, are provided in Supplementary Tables S2–S5. It should be noted that none of the genes reached the p < 10−4 threshold in the replication stage, suggested for candidate genes [4]. Moreover, the number of associations observed at the p < 0.05 level for each analysis is actually less than the number that would be expected by chance (10, given 198 genes). Thus, larger replication studies will be required to confirm or refute these associations.
Among these genes, ZFAND1, TYRO3, and TMEM206/PACC1 showed evidence of association with LoF variants for overall breast cancer. In ZFAND1, one variant (c.139-3_139delCAGG, rs1260212806) explained the majority of the observed association. It is located in a splice site junction (acceptor site). This variant was observed only in the German study, so it could be more frequent in this population. The TMEM206/PACC1 gene is involved in the progression of colorectal cancer by accelerating cell proliferation and promoting cell migration and invasion [31]. Five LoF variants were identified in this gene, one of which was observed in 93% of the carriers (c.631delC, rs532279691). TYRO3 has been found to be upregulated in various cancers, including AML, CML, multiple myeloma, melanoma, as well as uterine endometrial cancers [32,33,34]. Eight LoF variants were identified in TYRO3, including one recurrent variant c.308_308 + 1insCCTGAAGTCA (rs773930671), accounting for 85% of carriers of LoF variants in this gene. Our analysis of missense variants in TYRO3 revealed evidence of a protective effect, which is opposite to that observed for LoF variants. Eleven rare missense variants were identified in our study, the majority only being observed once, and seven of them only observed in controls.
Tumor subtype-specific analyses were performed and, as described below, some evidence of ER-specific associations was observed. It should be noted that ER status was only available for 3572 (57.51%) breast cancer cases (808 ER-negative cases and 2764 ER-positive cases). Tumor subtype analyses of LoF variants revealed evidence of association for ZFAND1, TYRO3, DNAH11, and PARP2 with ER-negative breast cancer, with ZFAND1 showing the strongest association (OR = 2.96 (CI 95% 1.59–5.50), p-value = 6.37 × 10−4). The observed association for this gene is mainly due to the variant rs1260212806, previously mentioned for breast cancer overall. A few reports have linked variants (other than those observed in the current study) in DNAH11 with breast cancer risk, but these reports were based on comparatively small sample sizes (c.2081_2082del (p.Val694Glyfs*2)) [35] and studies involving specific populations (G > A (rs2494938-India)) [36]. More recently, large-scale GWAS studies performed by our group through the Breast Cancer Association Consortium (BCAC) identified a common variant located at the 3′-UTR region of this gene (rs7971) to be statistically significantly associated with breast cancer risk (p-value = 1.9 × 10−8) [3]. Although further validation is needed, these observations seem to indicate that this locus may comprise both common and rare variants, with some evidence of association with breast cancer risk.
On the other hand, evidence of association with ER-positive disease with regard to LoF variants was observed for MTMR11, LAMC3, and EPN3. For MTMR11, the observed association is mainly due to one recurrent variant, c.14_15insG (rs587606143), which accounts for 94% of carriers of a LoF variant in this gene. Interestingly, a locus at 1q12-q21, which includes MTMR11, was identified by GWAS as being associated with mammographic density (rs11205303, OR = 0.73 (95% CI = 0.66–0.80), p = 2.64 × 10−11) [37]. Large-scale GWAS and fine-mapping studies performed in the BCAC have shown that common variants at 1q21.2 are significantly associated with overall and ER-positive breast cancer risk (overall breast cancer p-value = 9 × 10−14; ER + breast cancer p-value = 1 × 10−12) [3,38]. One study has also reported altered expression levels of MTMR11 in breast cancer cells [39]. These observations make this locus an interesting candidate for further validation and characterization with regard to breast cancer. Although a modest association was observed in the current study for SLC22A10, the frequency of rare variants in cases and controls was very low for this gene (Tables S3–S5).
In addition to clinical estrogen receptor subtyping, analyses performed on the molecular subtypes of breast cancer, namely, luminal A, luminal B, HER2-enriched, basal-like, and normal-like, would have been relevant in the context of this analysis, given the recognized heterogeneity of ER-positive and ER-negative tumors. However, the limited number of samples for which we had data did not allow us to perform a meaningful analysis of these molecular subtypes.
A number of genes showed an excess of missense variants (MAF < 0.01) classified as deleterious using bioinformatics in silico methods, without showing a corresponding excess of LoF variants. The best putative candidates in this group were TMEM161A, ERCC2, and SIPA1L1 for overall breast cancer, RNF175 and NCKAP1L for ER-positive disease, and PHAX, SMARCA2, NTRK3, EML5, and MED23 for ER-negative disease. Because the observed effects did not reach the threshold significance level following the adjustment of probability for multiple testing, we cannot exclude the possibility that the excess of missense variants, in the absence of enrichment for LoF variants, may be a false-positive result. However, these findings may also suggest that some of these genes are intolerant of LoF variants. When looking at the best candidate genes for overall breast cancer, pLI (probability of being loss-of-function intolerant) scores and observed/expected (o/e) scores available on gnomAD seem to suggest that this may be the case for SIPAL1, with pLI = 1, o/e = 0.09 (0.05–0.17). Interestingly, we did not observe any LoF variants in this gene in any of our analyses. On the other hand, ERCC2 (pLI = 0, o/e = 0.82 (0.62–1.09)) and TMEM161A (pLI = 0; o/e = 0.53 (0.35–0.82)) scores seem to indicate that these genes are not LoF intolerant. For ER-subtype-specific associations, LoF intolerant scores were observed for NCKAP1L (pL1 = 0.85, o/e = 0.2 (0.13–0.33)), SMARCA2 (pL1 = 1, o/e = 0.12 (0.08–0.2)), and NTRK3 (pL1 = 1, o/e = 0.12 (0.06–0.26)) (https://gnomad.broadinstitute.org, accessed on 9 February 2022). Among these genes, ERCC2 is of particular interest, as it plays an important role in the nucleotide excision repair pathway, and common polymorphisms in this gene have been associated with risk of various types of cancers [40]. The evidence supporting ERCC2 as a more penetrant breast cancer susceptibility gene, however, is not convincing [41].
Our study also included the analysis of 32 well-established or suspected susceptibility genes included in breast cancer gene panels. For many genes in such panels, the evidence of an association with cancer is often weak, and, therefore, further validation is required to confirm their contribution to breast cancer risk [4]. The current validation study successfully detected confirmed breast cancer susceptibility genes such as ATM, CHEK2, and PALB2. The ORs calculated in our study of these frequently mutated genes are similar to those calculated in recent publications [8,18,42,43,44,45] according to analyses of LoF and rare missense variants. Similar to the current analysis, the study by Dorling et al. [18] did not detect a statistically significant association of missense variants in PALB2 (OR = 0.96 (95% CI 0.87–1.06)) with breast cancer risk. It should be noted, however, that there is a significant overlap of the current study with the Dorling et al. dataset, which may contribute towards explaining these observations. For the remaining suspected susceptibility genes, including genes confirmed to be associated with breast cancer risk or predisposition to specific cancer syndromes, as studied by Lee et al. [46], Dorling et al. [18], and/or Hu et al. [27] (BARD1, CDH1, PTEN, RAD51C, RAD51D, STK11, TP53), the frequencies of LoF and rare missense variants were too low in our study to detect or support an association with breast cancer risk.

Limitations and Weaknesses

It should be noted that this study has some limitations. Firstly, the discovery step involved WES of samples from breast cancer cases followed by analysis of the resulting data with available WES or WGS control data. This study design was motivated by the possibility of taking advantage of several available datasets. However, many factors resulting from this choice may have affected our ability to identify likely candidate genes at the discovery step, such as: (1) The matched control data were derived from different exome enrichment methods. These methods do not have the same efficiency in terms of probe hybridization and exon capture. This may have contributed to uneven exon sequencing depth, which in turn could have affected the identification of variants [47]; (2) For the GoNL control dataset, exome data were extracted from WGS data. Although it is recognized that WGS outperforms WES, and provides a higher proportion of covered transcripts at a lower sequencing coverage and no strand bias [48], the 13× sequencing coverage for these samples may not have been sufficient for the purpose of identifying rare variants. In an attempt to minimize these impacts, several calling scenarios for the joint genotyping of case and control samples, taking into consideration ethnicity and capture technology, were performed in order to ensure that the most comprehensive variety of calling scenarios were covered. Five different variant filtering and prioritization strategies were also performed in an attempt to build the best candidate gene list for the validation step. Despite this, it cannot be excluded that insufficient or non-uniform sequencing coverage across exons between case and controls [49], batch effects, and population or cryptic stratification [50] could have had an impact on variant calling, which may not have been resolved at the bioinformatics level [48]. We also acknowledge the limitations of the currently available in silico tools for the prediction of pathogenic missense variants, and how these limitations could have impacted our gene selection.
Secondly, while the replication stage was quite large, important susceptibility may not have been selected at the discovery stage due to the limitations mentioned above. Moreover, although the validation stage was of a large enough scale to identify genes with effect sizes similar to those of CHEK2, ATM, and PALB2, the power was too low to detect genes with the characteristics of RAD51C, RAD51D, or BARD1. Indeed, the associations with LoF variants in these genes were all non-significant, despite the fact that the estimated ORs (~2) were comparable with those previously reported. Thus, if a susceptibility gene with the characteristics of the latter was among those identified at the discovery stage, the validation step would not have had sufficient power to confirm it. We therefore conclude that it is unlikely that any genes in the replication set with combined LoF frequencies comparable with ATM or CHEK2 are associated with moderate risk, but that moderate-risk genes with lower LoF frequencies could be present.
Finally, our study was based on WES, which does not consider non-coding regulatory regions or large genomic rearrangements. Regulatory variants have been shown to contribute to the heritability of several diseases. In particular, the common susceptibility variants for breast cancer identified through GWAS are mostly located in regulatory regions [51].

5. Conclusions

We found nominal evidence of association with breast cancer overall for LoF variants in three genes (ZFAND1, TMEM206/PACC1, and TYRO3), with ER-negative breast cancer in two genes, DNAH11 and PARP2, and with ER-positive disease in four genes LAMC3, MTMR11, EPN3, and SLC22A10. Analysis of rare missense variants identified evidence of associations of TMEM161A, SIPA1L1, and ERCC2 with overall breast cancer, RNF175 and NCKAP1L with ER-positive disease, and SLC22A10, PHAX, SMARCA2, EML5, NTRK3, and MED23 with ER-negative disease. Our study design shows that we had the power to detect genes with effect sizes similar to some confirmed breast cancer susceptibility genes, such as ATM and CHEK2. The fact that we did not identify similar novel genes suggests that the remaining breast cancer susceptibility could be explained by lower frequency variants. The outcome of our study thus provides crucial information with regards to the planning of future sequencing efforts.
In order to efficiently identify novel lower frequency variants, larger collaborative sequencing efforts will be needed. The data generated in the current project will most certainly be useful for combining with other similar large-scale breast cancer studies in order to obtain better power to detect such variants in moderate-risk genes.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14143363/s1. Supplementary Methods, Figure S1: Quantile–quantile plot of the observed vs. expected chi-square statistics for (A) loss-of-function and (B) missense variants in the analysis of 230 target genes using logistic regression adjusted for the origin cohort. Each circle represents the chi-square statistic for a gene. The red diagonal line represents the predicted association statistics under the global null hypothesis of no association. Table S1: Description of studies included in the discovery and validation analyses. Table S2: Selection of genes for the validation stage according to gene prioritization strategies. Table S3: List of (A) loss-of-function variants and (B) rare missense variants identified in overall breast cancer at the validation stage. Table S4: List of (A) loss-of-function variants and (B) missense variants identified in ER-negative breast cancer cases at the validation stage. Table S5: List of (A) loss-of-function variants and (B) missense variants identified in ER-positive breast cancer cases at the validation stage. Table S6: Associations, by study, of loss-of-function and rare missense variants in 32 genes included in commercial breast cancer susceptibility gene panels and breast cancer risk. OR denotes odds ratio and CI the confidence interval. Table S7: Associations of rare missense variants in 32 genes included in commercial breast cancer gene panels and breast cancer risk overall (6211 cases, 6019 controls), ER-negative (808 cases, 6019 controls), and ER-positive breast cancer risk (2764 cases, 6019 controls). OR denotes odds ratio and CI the confidence interval. Table S8: Associations of (A) loss-of-function variants, (B) rare missense variants, and (C) combined loss-of-function and rare missense variants in 178 genes identified at the discovery stage (6211 cases, 6019 controls) with overall breast cancer risk and ER-negative and ER-positive breast cancer risk. OR denotes odds ratio and CI the confidence interval. Table S9: Associations with breast cancer risk of loss-of-function and rare missense variants in 178 genes identified at the discovery stage, by study. OR denotes odds ratio and CI the confidence interval. Table S10: Associations of loss-of-function and rare missense variants in 178 genes identified at the discovery stage with (A) ER-negative breast cancer risk (808 cases, 6019 controls) and (B) ER-positive breast cancer risk (2764 cases, 6019 controls). OR denotes odds ratio and CI the confidence interval. Table S11: Associations of individual (A) loss-of-function and (B) rare missense variants, observed in at least three carriers, among breast cancer cases and controls, that included at least two breast cancer cases, with overall breast cancer. References [52,53,54,55,56,57,58,59,60,61,62,63,64,65] are cited in the supplementary materials.

Author Contributions

Conceptualization: J.S., D.F.E., D.E.G., E.H., M.K.S., A.D., R.K.S., A.M., G.D.B., I.L.A. and S.T.; methodology: M.D., J.S., C.J.-B., M.V., B.-J.F. and H.T.; software: B.-J.F.; data curation: F.F., M.K.B., J.A. (Jamie Allen) and Q.W. (Qin Wang); investigation: N.W.-L., S.D., A.-C.C.-D., J.A. (Janine Altmüller) and P.N.; formal analysis: M.D., C.J.-B., N.W.-L., C.E. (Corinna Ernst), M.V., F.F., E.H., S.C. (Sara Carvalho), A.L., B.-J.F., M.H., D.F.E., D.E.G., R.B., A.M., G.D.B., M.R.D. and R.K.S.; resources: M.A.A., N.A., M.G.E.M.A., S.B., S.C. (Sten Cornelissen), A.M.D., C.E. (Christoph Engel), A.G., W.R.R.G.-G., C.G., J.G., K.H., F.B.L.H., A.H., M.J.H., J.H. (Judit Horváth), M.A.I., S.K., R.K., D.K., C.L., W.M., J.W.M.M., D.N., C.-E.O., A.P., P.D.P.P., A.R., J.R., S.R.-H., G.S., M.S. (Martin Scherer), M.S. (Mitul Shah), A.S., T.M.S., C.S., C.J.v.A., L.v.d.K., R.B.v.d.L., A.E.V., M.W., Q.W. (Quinten Waisfisz), S.W.-G., B.H.F.W., N.W.-L., P.D., I.L.A., M.K.S., A.M., E.H., R.K.S. and D.F.E.; writing—original draft preparation: M.D., P.S., J.S. and M.V.; writing—review and editing: all authors; funding acquisition: J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was part of: The PERSPECTIVE project, which was supported by the Government of Canada through Genome Canada (#13529) and the Canadian Institutes of Health Research (GPH-129344), the Ministère de l’Économie, de la Science et de l’Innovation du Québec through Genome Quebec, and the Quebec Breast Cancer Foundation; The PRE3VENTION project, which was supported by a grant from the Ministère de l’Économie, de la Science et de l’Innovation du Québec through the PSR-SIIRI-949 program; The PERSPECTIVE I&I project, funded by the Government of Canada through Genome Canada and the Canadian Institutes of Health Research, the Ministère de l’Économie et de l’Innovation du Québec through Genome Québec, the Quebec Breast Cancer Foundation. Part of this work was also supported by NRNB (U.S. National Institutes of Health, National Center for Research Resources grant number P41 GM103504). Study specific funding information is provided in Supplementary Materials.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of CHU de Québec–Université Laval (project # 2014-1201, A13-10-1201 approved 24-10-2013; project # 2015-1866, A14-11-1866 approved 14-11-2014).

Informed Consent Statement

All study subjects were recruited on protocols approved by the Institutional Review Boards at each participating institution. Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The complete dataset will not be made publicly available due to restraints imposed by the ethics committees of individual studies; requests for data can be made to the corresponding author or to the Data Access Coordination Committee (DACCs) of the Breast Cancer Association Consortium (https://bcac.ccge.medschl.cam.ac.uk/, accessed on 1 June 2022).

Acknowledgments

We thank all the individuals who took part in these studies, and all the researchers, clinicians, technicians, and administrative staff who have enabled this work to be carried out. Specific thanks: The authors would like to thank Martine Tranchant for sample management and skillful technical assistance. DFBBCS and GC-HBOC: The authors thank all study participants. DFBBCS would like to thank Hanne Meijers–Heijboer. The Rotterdam Study: The authors are grateful to the study participants, the staff from the Rotterdam Study, and the participating general practitioners and pharmacists. The authors would like to thank the initial principal investigator of this study, Albert Hofman. We thank the members of the Genomics Laboratory and the ERGO support team for their help in sampling the data and in creating the database. GoNL: This study makes use of data generated by the Genome of the Netherlands Project. A full list of the investigators is available from www.nlgenome.nl. Samples were contributed by LifeLines (http://lifelines.nl/lifelines-research/general, accessed on 1 June 2022), The Leiden Longevity Study (http://www.healthy-ageing.nl, accessed on 1 June 2022), The Netherlands Twin Registry (NTR: http://www.tweelingenregister.org, accessed on 1 June 2022), The Rotterdam study, (http://www.erasmus-epidemiology.nl/research/ergo.htm, accessed on 1 June 2022), and The Genetic Research in Isolated Populations program (http://www.epib.nl/research/geneticepi/research.html#gip, accessed on 1 June 2022). The sequencing was carried out in collaboration with the Beijing Institute for Genomics (BGI). AgeCoDe: Members of the AgeCoDe Study Group: Wolfgang Maier, Martin Scherer, Heinz-Harald Abholz, Cadja Bachmann, Horst Bickel, Wolfgang Blank, Hendrik van den Bussche, Sandra Eifflaender-Gorfer, Marion Eisele, Annette Ernst, Angela Fuchs, Kathrin Heser, Frank Jessen, Hanna Kaduszkiewicz, Teresa Kaufeler, Mirjam Köhler, Hans-Helmut König, Alexander Koppara, Carolin Lange, Hanna Leicht, Tobias Luck, Melanie Luppa, Manfred Mayer, Edelgard Mösch, Julia Olbrich, Michael Pentzek, Tina Posselt, Jana Prokein, Anna Schumacher, Steffi Riedel-Heller, Susanne Röhr, Janine Stein, Susanne Steinmann, Franziska Tebarth, Michael Wagner, Klaus Weckbecker, Dagmar Weeg, Jochen Werle, Siegfried Weyerer, Birgitt Wiese, Steffen Wolfsgruber, Thomas Zimmermann. Hendrik van den Bussche (2002–2011). AgeCoDe thanks all participating patients and their general practitioners for their collaboration. KORA Study: The KORA Study Group consists of A. Peters (speaker), L. Schwettmann, R. Leidl, M. Heier, B. Linkohr, H. Grallert, C. Gieger, and their co-workers, who are responsible for the design and conduct of the KORA studies. SEARCH: SEARCH thanks The SEARCH and EPIC teams. OFBCR: The Ontario Familial Breast Cancer Registry thanks Gord Glendon, Teresa Selander, and Nayana Weerasooriya. The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the Breast Cancer Family Registry (BCFR), nor does the mention of trade names, commercial products, or organizations imply endorsement by the USA Government or the BCFR. The GHS Study Group is part of the Gutenberg Health Study. The authors would like to thank Sylvie Laboissière at McGill University and Genome Quebec Innovation Center for providing sequencing services and for helpful advice and technical support.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  1. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021, 71, 209–249. [Google Scholar] [CrossRef] [PubMed]
  2. Michailidou, K.; Hall, P.; Gonzalez-Neira, A.; Ghoussaini, M.; Dennis, J.; Milne, R.L.; Schmidt, M.K.; Chang-Claude, J.; Bojesen, S.E.; Bolla, M.K.; et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat. Genet. 2013, 45, 353–361. [Google Scholar] [CrossRef] [PubMed]
  3. Michailidou, K.; Lindström, S.; Dennis, J.; Beesley, J.; Hui, S.; Kar, S.; Lemaçon, A.; Soucy, P.; Glubb, D.; Rostamianfar, A.; et al. Association analysis identifies 65 new breast cancer risk loci. Nature 2017, 551, 92–94. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Easton, D.F.; Pharoah, P.D.; Antoniou, A.C.; Tischkowitz, M.; Tavtigian, S.V.; Nathanson, K.L.; Devilee, P.; Meindl, A.; Couch, F.J.; Southey, M.; et al. Gene-panel sequencing and the prediction of breast-cancer risk. N. Engl. J. Med. 2015, 372, 2243–2257. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Kiezun, A.; Garimella, K.; Do, R.; Stitziel, N.O.; Neale, B.M.; McLaren, P.J.; Gupta, N.; Sklar, P.; Sullivan, P.F.; Moran, J.L.; et al. Exome sequencing and the genetic basis of complex traits. Nat. Genet. 2012, 44, 623–630. [Google Scholar] [CrossRef] [Green Version]
  6. Chandler, M.R.; Bilgili, E.P.; Merner, N.D. A Review of whole-exome sequencing efforts toward hereditary breast cancer susceptibility gene discovery. Hum. Mutat. 2016, 37, 835–846. [Google Scholar] [CrossRef]
  7. Zelli, V.; Compagnoni, C.; Cannita, K.; Capelli, R.; Capalbo, C.; Di Vito Nolfi, M.; Alesse, E.; Zazzeroni, F.; Tessitore, A. Applications of Next Generation Sequencing to the Analysis of Familial Breast/Ovarian Cancer. High Throughput 2020, 9, 1. [Google Scholar] [CrossRef] [Green Version]
  8. Lu, H.M.; Li, S.; Black, M.H.; Lee, S.; Hoiness, R.; Wu, S.; Mu, W.; Huether, R.; Chen, J.; Sridhar, S.; et al. Association of Breast and Ovarian Cancers with Predisposition Genes Identified by Large-Scale Sequencing. JAMA Oncol. 2019, 5, 51–57. [Google Scholar] [CrossRef]
  9. Kast, K.; Rhiem, K.; Wappenschmidt, B.; Hahnen, E.; Hauke, J.; Bluemcke, B.; Zarghooni, V.; Herold, N.; Ditsch, N.; Kiechle, M.; et al. Prevalence of BRCA1/2 germline mutations in 21,401 families with breast and ovarian cancer. J. Med. Genet. 2016, 53, 465–471. [Google Scholar] [CrossRef] [Green Version]
  10. Schmidt, M.K.; Hogervorst, F.; van Hien, R.; Cornelissen, S.; Broeks, A.; Adank, M.A.; Meijers, H.; Waisfisz, Q.; Hollestelle, A.; Schutte, M.; et al. Age- and Tumor Subtype-Specific Breast Cancer Risk Estimates for CHEK2*1100delC Carriers. J. Clin. Oncol. 2016, 34, 2750–2760. [Google Scholar] [CrossRef] [Green Version]
  11. Kriege, M.; Hollestelle, A.; Jager, A.; Huijts, P.E.; Berns, E.M.; Sieuwerts, A.M.; Meijer-van Gelder, M.E.; Collée, J.M.; Devilee, P.; Hooning, M.J.; et al. Survival and contralateral breast cancer in CHEK2 1100delC breast cancer patients: Impact of adjuvant chemotherapy. Br. J. Cancer 2014, 111, 1004–1013. [Google Scholar] [CrossRef] [PubMed]
  12. Schmidt, M.K.; Tollenaar, R.A.; de Kemp, S.R.; Broeks, A.; Cornelisse, C.J.; Smit, V.T.; Peterse, J.L.; van Leeuwen, F.E.; Van’t Veer, L.J. Breast cancer survival and tumor characteristics in premenopausal women carrying the CHEK2*1100delC germline mutation. J. Clin. Oncol. 2007, 25, 64–69. [Google Scholar] [CrossRef] [PubMed]
  13. Ikram, M.A.; Brusselle, G.; Ghanbari, M.; Goedegebure, A.; Ikram, M.K.; Kavousi, M.; Kieboom, B.C.T.; Klaver, C.C.W.; de Knegt, R.J.; Luik, A.I.; et al. Objectives, design and main findings until 2020 from the Rotterdam Study. Eur. J. Epidemiol. 2020, 35, 483–517. [Google Scholar] [CrossRef] [PubMed]
  14. Boomsma, D.I.; Wijmenga, C.; Slagboom, E.P.; Swertz, M.A.; Karssen, L.C.; Abdellaoui, A.; Ye, K.; Gurvey, V.; Vermaat, M.; Van Dijrk, F.; et al. The Genome of the Netherlands: Design, and project goals. Eur. J. Hum. Genet. 2013, 22, 221–227. [Google Scholar] [CrossRef] [Green Version]
  15. Genome of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat. Genet. 2014, 46, 818–825. [Google Scholar] [CrossRef]
  16. Jessen, F.; Wiese, B.; Bickel, H.; Eiffländer-Gorfer, S.; Fuchs, A.; Kaduszkiewicz, H.; Köhler, M.; Luck, T.; Mösch, E.; Pentzek, M.; et al. Prediction of dementia in primary care patients. PLoS ONE 2011, 6, e16852. [Google Scholar] [CrossRef]
  17. Wichmann, H.E.; Gieger, C.; Illig, T.; MONICA/KORA Study Group. KORA-gen-resource for population genetics, controls and a broad spectrum of disease phenotypes. Das Gesundh. 2005, 67, S26–S30. [Google Scholar] [CrossRef] [Green Version]
  18. Dorling, L.; Carvalho, S.; Allen, J.; González-Neira, A.; Luccarini, C.; Wahlström, C.; Pooley, K.A.; Parsons, M.T.; Fortuno, C.; Wang, Q.; et al. Breast Cancer Risk Genes—Association Analysis in More than 113,000 Women. N. Engl. J. Med. 2021, 384, 428–439. [Google Scholar] [CrossRef]
  19. Liu, J.; Prager-van der Smissen, W.J.C.; Schmidt, M.K.; Collée, J.M.; Cornelissen, S.; Lamping, R.; Nieuwlaat, A.; Foekens, J.A.; Hooning, M.J.; Verhoef, S.; et al. Recurrent HOXB13 mutations in the Dutch population do not associate with increased breast cancer risk. Sci. Rep. 2016, 6, 30026. [Google Scholar] [CrossRef] [Green Version]
  20. Dunning, A.M.; Healey, C.S.; Pharoah, P.D.; Teare, M.D.; Ponder, B.A.; Easton, D.F. A systematic review of genetic polymorphisms and breast Cancer risk. Cancer Epidemiol. Biomark. Prev. 1999, 8, 843–854. [Google Scholar]
  21. Day, N.; Oakes, S.; Luben, R.; Khaw, K.T.; Bingham, S.; Welch, A.; Wareham, N. EPIC-Norfolk: Study design and characteristics of the cohort. european prospective investigation of Cancer. Br. J. Cancer 1999, 80, 95–103. [Google Scholar] [PubMed]
  22. Kataoka, M.; Antoniou, A.; Warren, R.; Leyland, J.; Brown, J.; Audley, T.; Easton, D. Genetic models for the familial aggregation of mammographic breast density. Cancer Epidemiol. Biomark. Prev. 2009, 18, 1277–1284. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  23. John, E.M.; Hopper, J.L.; Beck, J.C.; Knight, J.A.; Neuhausen, S.L.; Senie, R.T.; Ziogas, A.; Andrulis, I.L.; Anton-Culver, H.; Boyd, N.; et al. The Breast Cancer Family Registry: An infrastructure for cooperative multinational, interdisciplinary and translational studies of the genetic epidemiology of breast cancer. Breast Cancer Res. 2004, 6, R375–R389. [Google Scholar] [CrossRef] [Green Version]
  24. Awadalla, P.; Boileau, C.; Payette, Y.; Idaghdour, Y.; Goulet, J.P.; Knoppers, B.; Hamet, P.; Laberge, C.; CARTaGENE Project. Cohort profile of the CARTaGENE study: Quebec’s population-based biobank for public health and personalized genomics. Int. J. Epidemiol. 2013, 42, 1285–1299. [Google Scholar] [CrossRef]
  25. Van der Auwera, G.A.; Carneiro, M.O.; Hartl, C.; Poplin, R.; Del Angel, G.; Levy-Moonshine, A.; Jordan, T.; Shakir, K.; Roazen, D.; Thibault, J.; et al. From Fastq Data to High Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Curr. Protoc. Bioinform. 2013, 43, 11.10.1–11.10.33. [Google Scholar] [CrossRef]
  26. Feng, B.J. PERCH: A unified framework for disease gene prioritization. Hum. Mutat. 2017, 38, 243–251. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Hu, C.; Hart, S.N.; Gnanaolivu, R.; Huang, H.; Lee, K.Y.; Na, J.; Gao, C.; Lilyquist, J.; Yadav, S.; Boddicker, N.J.; et al. A Population-Based Study of Genes Previously Implicated in Breast Cancer. N. Engl. J. Med. 2021, 384, 440–451. [Google Scholar] [CrossRef] [PubMed]
  28. Lee, S.; Abecasis, G.R.; Boehnke, M.; Lin, X. Rare-variant association analysis: Study designs and statistical tests. Am. J. Hum. Genet. 2014, 95, 5–23. [Google Scholar] [CrossRef] [Green Version]
  29. Liu, D.J.; Peloso, G.M.; Zhan, X.; Holmen, O.L.; Zawistowski, M.; Shuang, F.; Nikpay, M.; Auer, P.L.; Goel, A.; Zhang, H.; et al. Meta-analysis of gene-level tests for rare variant association. Nat. Genet. 2014, 46, 200–204. [Google Scholar] [CrossRef] [Green Version]
  30. Landrum, M.J.; Lee, J.M.; Benson, M.; Brown, G.R.; Chao, C.; Chitipiralla, S.; Gu, B.; Hart, J.; Hoffman, D.; Jang, W.; et al. ClinVar: Improving access to variant interpretations and supporting evidence. Nucleic. Acids Res. 2018, 46, D1062–D1067. [Google Scholar] [CrossRef] [Green Version]
  31. Zhao, J.; Zhu, D.; Zhang, X.; Zhang, Y.; Zhou, J.; Dong, M. TMEM206 promotes the malignancy of colorectal cancer cells by interacting with AKT and extracellular signal-regulated kinase signaling pathways. J. Cell. Physiol. 2019, 234, 10888–10898. [Google Scholar] [CrossRef] [PubMed]
  32. Sun, W.S.; Fujimoto, J.; Tamaya, T. Clinical implications of coexpression of growth arrest-specific gene 6 and receptor tyrosine kinases Axl and Sky in human uterine leiomyoma. Mol. Hum. Reprod. 2003, 9, 701–707. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Linger, R.M.; Keating, A.K.; Earp, H.S.; Graham, D.K. TAM receptor tyrosine kinases: Biologic functions, signaling, and potential therapeutic targeting in human cancer. Adv. Cancer Res. 2008, 100, 35–83. [Google Scholar] [CrossRef] [Green Version]
  34. Zhu, S.; Wurdak, H.; Wang, Y.; Galkin, A.; Tao, H.; Li, J.; Lyssiotis, C.A.; Yan, F.; Tu, B.P.; Miraglia, L.; et al. A genomic screen identifies TYRO3 as a MITF regulator in melanoma. Proc. Natl. Acad. Sci. USA 2009, 106, 17025–17030. [Google Scholar] [CrossRef] [Green Version]
  35. Shahi, R.B.; De Brakeleer, S.; Caljon, B.; Pauwels, I.; Bonduelle, M.; Joris, S.; Fontaine, C.; Vanhoeij, M.; Van Dooren, S.; Teugels, E.; et al. Identification of candidate cancer predisposing variants by performing whole-exome sequencing on index patients from BRCA1 and BRCA2-negative breast cancer families. BMC Cancer 2019, 19, 313. [Google Scholar] [CrossRef] [PubMed]
  36. Verma, S.; Bakshi, D.; Sharma, V.; Sharma, I.; Shah, R.; Bhat, A.; Bhat, G.R.; Bhanu Sharma, B.; Wakhloo, A.; Kaul, S.; et al. Genetic variants of DNAH11 and LRFN2 genes and their association with ovarian and breast cancer. Int. J. Gynaecol. Obs. 2020, 148, 118–122. [Google Scholar] [CrossRef] [Green Version]
  37. Fernandez-Navarro, P.; González-Neira, A.; Pita, G.; Díaz-Uriarte, R.; Tais Moreno, L.; Ederra, M.; Pedraz-Pingarrón, C.; Sánchez-Contador, C.; Antonio Vázquez-Carrete, J.; Moreo, P.; et al. Genome wide association study identifies a novel putative mammographic density locus at 1q12-q21. Int. J. Cancer 2015, 136, 2427–2436. [Google Scholar] [CrossRef] [PubMed]
  38. Fachal, L.; Aschard, H.; Beesley, J.; Barnes, D.R.; Allen, J.; Kar, S.; Pooley, K.A.; Dennis, J.; Michailidou, K.; Turman, C.; et al. Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes. Nat. Genet. 2020, 52, 56–73. [Google Scholar] [CrossRef]
  39. Lucci, M.A.; Orlandi, R.; Triulzi, T.; Tagliabue, E.; Balsari, A.; Villa-Moruzzi, E. Expression profile of tyrosine phosphatases in HER2 breast cancer cells and tumors. Cell. Oncol. 2010, 32, 361–372. [Google Scholar] [CrossRef]
  40. Wu, K.-G.; He, X.-F.; Li, Y.-H.; Xie, W.-B.; Huang, X. Association between the XPD/ERCC2 Lys751Gln polymorphism and risk of cancer: Evidence from 224 case-control studies. Tumour Biol. 2014, 35, 11243–11259. [Google Scholar] [CrossRef]
  41. Rump, A.; Benet-Pages, A.; Schubert, S.; Kuhlmann, J.D.; Janavičius, R.; Macháčková, E.; Foretová, L.; Kleibl, Z.; Lhota, F.; Zemankova, P.; et al. Identification and Functional Testing of ERCC2 Mutations in a Multi-national Cohort of Patients with Familial Breast- and Ovarian Cancer. PLoS Genet. 2016, 12, e1006248. [Google Scholar] [CrossRef] [PubMed]
  42. Hauke, J.; Horvath, J.; Groß, E.; Gehrig, A.; Honisch, E.; Hackmann, K.; Schmidt, G.; Arnold, N.; Faust, U.; Sutter, C.; et al. Gene panel testing of 5589 BRCA1/2-negative index patients with breast cancer in a routine diagnostic setting: Results of the German Consortium for Hereditary Breast and Ovarian Cancer. Cancer Med. 2018, 7, 1349–1358. [Google Scholar] [CrossRef] [PubMed]
  43. Couch, F.J.; Shimelis, H.; Hu, C.; Hart, S.N.; Polley, E.C.; Na, J.; Hallberg, E.; Moore, R.; Thomas, A.; Lilyquist, J.; et al. Associations between Cancer Predisposition Testing Panel Genes and Breast Cancer. JAMA Oncol. 2017, 3, 1190–1196. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  44. Thompson, E.R.; Rowley, S.M.; Li, N.; McInerny, S.; Devereux, L.; Wong-Brown, M.W.; Trainer, A.H.; Mitchell, G.; Scott, R.J.; James, P.A.; et al. Panel Testing for Familial Breast Cancer: Calibrating the Tension between Research and Clinical Care. J. Clin. Oncol. 2016, 34, 1455–1459. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Aloraifi, F.; McCartan, D.; McDevitt, T.; Green, A.J.; Bracken, A.; Geraghty, J. Protein-truncating variants in moderate-risk breast cancer susceptibility genes: A meta-analysis of high-risk case-control screening studies. Cancer Genet. 2015, 208, 455–463. [Google Scholar] [CrossRef]
  46. Lee, K.; Seifert, B.A.; Shimelis, H.; Ghosh, R.; Crowley, S.B.; Carter, N.J.; Doonanco, K.; Foreman, A.K.; Ritter, D.I.; Jimenez, S.; et al. Clinical validity assessment of genes frequently tested on hereditary breast and ovarian cancer susceptibility sequencing panels. Genet. Med. 2019, 21, 1497–1506. [Google Scholar] [CrossRef]
  47. Ku, C.S.; Cooper, D.N.; Patrinos, G.P. The Rise and Rise of Exome Sequencing. Public Health Genom. 2016, 19, 315–324. [Google Scholar] [CrossRef] [Green Version]
  48. Lelieveld, S.H.; Spielmann, M.; Mundlos, S.; Veltman, J.A.; Gilissen, C. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein-Coding Regions. Hum. Mutat. 2015, 36, 815–822. [Google Scholar] [CrossRef] [Green Version]
  49. Sims, D.; Sudbery, I.; Ilott, N.E.; Heger, A.; Ponting, C.P. Sequencing depth and coverage: Key considerations in genomic analyses. Nat. Rev. Genet. 2014, 15, 121–132. [Google Scholar] [CrossRef]
  50. Wu, L.; Schaid, D.J.; Sicotte, H.; Wieben, E.D.; Li, H.; Petersen, G.M. Case-only exome sequencing and complex disease susceptibility gene discovery: Study design considerations. J. Med. Genet. 2015, 52, 10–16. [Google Scholar] [CrossRef] [Green Version]
  51. Gusev, A.; Lee, S.H.; Trynka, G.; Finucane, H.; Vilhjálmsson, B.J.; Xu, H.; Zang, C.; Ripke, S.; Bulik-Sullivan, B.; Stahl, E.; et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 2014, 95, 535–552. [Google Scholar] [CrossRef] [Green Version]
  52. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A Flexible Trimmer for Illumina Sequence Data. Bioinformatics 2014, 30, 2114–2120. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  53. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Genomics 2013, 1303, 3997. [Google Scholar]
  54. McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, A.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; et al. The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data. Genome Res. 2010, 20, 1297–1303. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  55. Wang, K.; Li, M.; Hakonarson, H. ANNOVAR: Functional Annotation of Genetic Variants from High-Throughput Sequencing Data. Nucleic Acids Res. 2010, 38, e164. [Google Scholar] [CrossRef] [PubMed]
  56. 1000 Genomes Project Consortium; Auton, A.; Brooks, L.D.; Durbin, R.M.; Garrison, E.P.; Kang, H.M.; Korbel, J.O.; Marchini, J.L.; McCarthy, S.; McVean, G.A.; et al. A Global Reference for Human Genetic Variation. Nature 2015, 526, 68–74. [Google Scholar] [CrossRef] [Green Version]
  57. Liu, X.; Wu, C.; Li, C.; Boerwinkle, E. dbNSFP V3.0: A One-Stop Database of Functional Predictions and Annotations for Hu-man Nonsynonymous and Splice-Site SNVs. Hum. Mutat. 2016, 37, 235–241. [Google Scholar] [CrossRef] [Green Version]
  58. Lek, M.; Karczewski, K.J.; Minikel, E.V.; Samocha, K.E.; Banks, E.; Fennell, T.; O’Donnell-Luria, A.H.; Ware, J.S.; Hill, A.J.; Cummings, B.B.; et al. Analysis of Protein-Coding Genetic Variation in 60,706 Humans. Nature 2016, 536, 285–291. [Google Scholar] [CrossRef] [Green Version]
  59. McLaren, W.; Gil, L.; Hunt, S.E.; Riat, H.S.; Ritchie, G.R.S.; Thormann, A.; Flicek, P.; Cunningham, F. The Ensembl Variant Effect Predictor. Genome Biol. 2016, 17, 122. [Google Scholar] [CrossRef] [Green Version]
  60. Karczewski, K.J.; Francioli, L.C.; Tiao, G.; Cummings, B.B.; Alföldi, J.; Wang, Q.; Collins, R.L.; Laricchia, K.M.; Ganna, A.; Birnbaum, D.P.; et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 2020, 581, 434–443. [Google Scholar] [CrossRef]
  61. Forbes, S.A.; Beare, D.; Gunasekaran, P.; Leung, K.; Bindal, N.; Boutselakis, H.; Ding, M.; Bamford, S.; Cole, C.; Ward, S.; et al. COSMIC: Exploring the World’s Knowledge of Somatic Mutations in Human Cancer. Nucleic Acids Res. 2015, 43, D805–D811. [Google Scholar] [CrossRef]
  62. Kumar, P.; Henikoff, S.; Ng, P.C. Predicting the Effects of Coding Non-Synonymous Variants on Protein Function Using the Sift Algorithm. Nat. Protoc. 2009, 4, 1073–1081. [Google Scholar] [CrossRef] [PubMed]
  63. Adzhubei, I.; Jordan, D.M.; Sunyaev, S.R. Predicting Functional Effect of Human Missense Mutations Using Polyphen-2. Curr. Protoc. Hum. Genet. 2013, 7, 7–20. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  64. Kircher, M.; Witten, D.M.; Jain, P.; O’Roak, B.J.; Cooper, G.M.; Shendure, J. A General Framework for Estimating the Relative Pathogenicity of Human Genetic Variants. Nat. Genet. 2014, 46, 310–315. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  65. Reimand, J.; Bader, G.D. Systematic Analysis of Somatic Mutations in Phosphorylation Signaling Predicts Novel Cancer Driv-ers. Mol. Syst. Biol. 2013, 9, 637. [Google Scholar] [CrossRef]
Figure 1. Study design for breast cancer susceptibility gene discovery by whole-exome sequencing (A) and validation by targeted enrichment sequencing (B).
Figure 1. Study design for breast cancer susceptibility gene discovery by whole-exome sequencing (A) and validation by targeted enrichment sequencing (B).
Cancers 14 03363 g001
Figure 2. Risk of breast cancer overall and tumor subtypes associated with loss-of-function and missense variants in known or suspected breast cancer susceptibility genes included in breast cancer gene panels. Shown are odds ratios (ORs) and 95% confidence intervals (CIs) for breast cancer overall, estrogen receptor (ER)-negative breast cancer, and ER-positive breast cancer associated with loss-of-function variants (A) and missense variants (B) in genes for which evidence of association was observed in at least one of the analyses. Red circles indicate p-value < 0.05. * no loss-of-function variants were observed in MEN1.
Figure 2. Risk of breast cancer overall and tumor subtypes associated with loss-of-function and missense variants in known or suspected breast cancer susceptibility genes included in breast cancer gene panels. Shown are odds ratios (ORs) and 95% confidence intervals (CIs) for breast cancer overall, estrogen receptor (ER)-negative breast cancer, and ER-positive breast cancer associated with loss-of-function variants (A) and missense variants (B) in genes for which evidence of association was observed in at least one of the analyses. Red circles indicate p-value < 0.05. * no loss-of-function variants were observed in MEN1.
Cancers 14 03363 g002
Figure 3. Risk of breast cancer overall and tumor subtypes associated with loss-of-function and missense variants in a subset of targeted genes identified at the discovery stage. Shown are odds ratios (ORs) and 95% confidence intervals (CIs) for breast cancer overall, estrogen receptor (ER)-negative breast cancer, and ER-positive breast cancer associated with loss-of-function variants (A), missense variants (B), and combined loss-of-function and missense variants (C) in genes for which evidence of association was observed in at least one of the analyses. Red circles indicate p-value < 0.05. * no loss-of-function variants were observed in SIPA1L1 and NTRK3.
Figure 3. Risk of breast cancer overall and tumor subtypes associated with loss-of-function and missense variants in a subset of targeted genes identified at the discovery stage. Shown are odds ratios (ORs) and 95% confidence intervals (CIs) for breast cancer overall, estrogen receptor (ER)-negative breast cancer, and ER-positive breast cancer associated with loss-of-function variants (A), missense variants (B), and combined loss-of-function and missense variants (C) in genes for which evidence of association was observed in at least one of the analyses. Red circles indicate p-value < 0.05. * no loss-of-function variants were observed in SIPA1L1 and NTRK3.
Cancers 14 03363 g003
Table 1. Associations of loss-of-function variants in 32 genes included in breast cancer gene panels with overall breast cancer risk and ER-negative and ER-positive breast cancer risk. Shown are odds ratios (ORs), 95% confidence intervals (CIs), and p-values. Associations with p < 0.05 are denoted in bold.
Table 1. Associations of loss-of-function variants in 32 genes included in breast cancer gene panels with overall breast cancer risk and ER-negative and ER-positive breast cancer risk. Shown are odds ratios (ORs), 95% confidence intervals (CIs), and p-values. Associations with p < 0.05 are denoted in bold.
OverallER-NegativeER-Positive
GeneNumber of
Carriers
Number of
Carriers
OR95% CIp-ValueNumber of
Carriers
OR95% CIp-ValueNumber of
Carriers
OR95% CIp-Value
Controls (n = 6019)Cases (n = 6211)Cases (n = 808)Cases (n = 2764)
AKT1110.890.0614.250.93500.000.00 Inf0.99711.740.1127.900.694
ATM20743.552.165.825.44 × 10−730.890.263.020.851424.062.376.973.39 × 10−7
BARD14112.540.817.980.11022.980.5416.570.21273.130.9110.710.069
BRE/BABAM200----0----0----
BRIP18141.640.693.930.26221.440.306.840.64971.780.644.970.274
CDH1040.000.00Inf0.90620.000.00 Inf0.99410.000.00 Inf0.994
CHEK2321113.472.335.157.62 × 10−1092.651.245.700.012564.442.826.991.23 × 10−10
EPCAM310.330.043.220.3430 0.00 Inf0.99510.600.065.770.658
FAM175A/ABRAXAS100----0----0----
FANCC15130.860.411.810.69210.630.084.910.66261.000.372.660.995
FANCM39421.030.671.600.89481.520.703.300.295201.090.631.900.749
GEN1231.540.269.260.63617.680.45131.190.15923.560.3239.340.300
MEN100----0----0----
MLH1110.890.0614.250.9350 0.00 Inf0.99711.740.1127.900.694
MRE11A/MRE1100----0----0----
MSH2020.000.00Inf0.9340----10.000.00 Inf0.994
MSH647190.360.210.622.07 × 10−450.550.221.390.204110.420.220.819.00 × 10−3
MUTYH540.780.212.930.71812.060.2120.540.53710.730.077.190.784
NBN27190.650.361.170.15030.610.182.020.416130.900.461.760.766
NF118190.950.501.810.86730.860.252.920.805131.260.622.590.522
PALB210423.951.987.889.68 × 10−576.452.3117.983.72 × 10−4225.272.3911.613.83 × 10−5
PIK3CA00----0----0----
PMS227160.530.290.980.04430.570.171.870.35180.530.241.160.111
PTEN121.780.1619.650.63700.000.00 Inf0.99711.740.1127.900.694
RAD50990.970.382.440.94311.050.138.590.96782.450.906.710.081
RAD51C361.890.477.550.37023.900.6423.880.14121.190.207.140.847
RAD51D372.180.568.450.258511.262.6048.681.19 × 10−310.740.087.310.795
RECQL9121.280.543.050.57600.000.00 Inf0.99061.640.554.920.379
RINT1650.760.232.500.65500.000.00 Inf0.99220.590.122.900.512
STK1100----0----0----
TP5311110.161.3178.670.02615.000.3180.030.25535.240.5450.390.152
XRCC200----0----0----
Table 2. Associations with breast cancer risk of loss-of-function and rare missense variants in top candidate genes identified at the discovery step. Shown are odds ratios (ORs), 95% confidence intervals (CIs), and p-values for breast cancer overall, estrogen receptor (ER)-negative breast cancer, and ER-positive breast cancer. Associations with p < 0.05 are denoted in bold.
Table 2. Associations with breast cancer risk of loss-of-function and rare missense variants in top candidate genes identified at the discovery step. Shown are odds ratios (ORs), 95% confidence intervals (CIs), and p-values for breast cancer overall, estrogen receptor (ER)-negative breast cancer, and ER-positive breast cancer. Associations with p < 0.05 are denoted in bold.
OverallER-NegativeER-Positive
Number of
Carriers
OR95% CIp-ValueNumber of
Carriers
OR95% CIp-ValueNumber of
Carriers
OR95% CIp-Value
Controls
n = 6019
Cases
n = 6211
Cases
n = 808
Cases
n = 2764
(A) Loss-of-function variants
ZFAND131591.731.122.680.014162.961.595.506.37 × 10−4241.500.872.600.146
TMEM206/PACC133561.701.112.620.01682.210.994.930.052211.470.832.610.186
TYRO3891311.401.061.830.016221.661.032.680.038581.340.951.880.092
DNAH1128270.960.561.620.865103.001.426.323.88 × 10−3100.780.371.630.507
PARP2341.430.326.400.64126.891.1142.780.03810.750.087.450.808
LAMC38172.000.864.640.10721.610.337.870.553113.141.208.200.020
MTMR1175951.240.911.680.174111.200.632.290.583491.491.032.170.037
EPN321241.100.611.990.74220.790.183.400.747181.971.033.760.039
SLC22A107101.410.543.720.48411.910.2117.730.57163.701.0113.570.048
TMEM161A121.940.1821.430.59000.000.00 Inf0.99723.560.3239.340.300
SIPA1L100----0----0----
ERCC212120.990.442.200.97210.620.084.830.64650.830.292.380.724
PHAX121.860.1720.520.61200.000.00 Inf0.9970 0.00 Inf0.995
SMARCA2132.850.3027.420.36517.680.45131.190.15911.800.1128.810.678
EML51880.410.180.940.03500.000.00 Inf0.98640.400.141.190.100
NTRK3010.000.00inf0.9300----0----
MED2314201.320.672.620.42431.210.354.270.763111.430.653.170.375
RNF17511211.850.893.850.09921.460.326.750.62871.420.543.720.478
NCKAP1L330.950.194.710.94800.000.00 Inf0.99510.730.077.190.784
ABCC216261.600.862.990.14020.830.193.670.804121.590.743.420.240
(B) Rare missense variants
ZFAND112161.270.602.700.52721.070.234.890.93050.850.292.450.759
TMEM206/PACC17111.480.573.830.41632.370.619.270.21451.290.414.060.669
TYRO31340.290.100.890.03110.510.073.930.51610.150.021.150.068
DNAH111161150.950.741.240.726211.220.761.970.406520.910.651.280.603
PARP21290.730.311.730.47010.590.084.610.61440.690.222.170.524
LAMC321200.930.511.730.82620.690.163.000.620101.060.492.300.888
MTMR1119281.400.782.520.25651.590.594.310.362131.260.622.560.524
EPN318231.260.682.350.46041.980.656.050.23381.150.482.740.751
SLC22A10154.610.5439.460.163320.762.03211.950.01100.000.00 Inf0.995
TMEM161A9242.561.195.510.01632.440.649.260.190112.731.116.710.029
SIPA1L129501.681.062.670.02630.740.222.480.631221.530.872.690.140
ERCC249731.451.012.080.04760.950.402.250.906311.470.922.350.108
PHAX154.990.5842.780.143324.272.42243.376.69 × 10−311.920.1230.680.646
SMARCA26152.360.926.090.07654.921.4716.489.69 × 10−372.230.746.710.153
EML526331.230.732.060.43292.741.256.010.012131.210.612.410.587
NTRK313151.180.562.480.66953.171.119.110.03271.060.422.690.897
MED2325341.310.782.200.30582.291.015.180.048151.300.672.510.434
RNF17511161.430.663.090.36121.340.296.140.710122.591.126.000.026
NCKAP1L10201.920.904.100.09310.650.085.160.686132.571.125.910.027
ABCC257761.290.911.820.15091.140.562.340.714361.360.882.090.163
(C) Combined analysis: loss-of-function and rare missense variants
ZFAND143751.611.12.340.014182.471.404.361.78 × 10−3291.320.822.150.257
TMEM206/PACC140671.671.122.470.011112.261.134.510.021261.430.862.390.169
TYRO31021351.250.971.630.090231.510.952.410.082591.180.851.640.324
DNAH111441420.950.751.210.696311.531.032.290.037620.890.651.210.449
PARP215130.860.411.810.69431.570.445.580.48450.700.251.960.497
LAMC329371.230.762.010.40240.960.332.800.947211.630.912.930.104
MTMR11941231.270.971.670.082161.300.752.240.345621.441.032.010.032
EPN338471.210.791.860.38561.370.573.300.484261.671.002.810.051
SLC22A108151.820.774.310.17145.841.5122.690.01163.120.9210.610.068
TMEM161A10262.501.25.190.01432.260.618.440.224132.831.226.550.015
SIPA1L129501.681.062.670.02630.740.222.480.631221.530.872.690.140
ERCC260831.340.961.880.08370.900.412.010.802351.320.862.040.210
PHAX273.410.7116.40.126313.492.1584.715.50 × 10−311.260.1114.530.854
SMARCA27182.431.015.830.04665.271.7316.053.40 × 10−382.170.786.030.138
EML544410.890.581.360.58191.400.672.910.366170.830.471.470.519
NTRK313161.250.62.60.55253.171.119.110.03271.060.422.690.897
MED2339531.290.851.960.225101.690.833.430.148261.350.822.250.242
RNF17522371.640.972.790.06641.400.474.120.545192.001.073.750.031
NCKAP1L13231.690.863.350.13010.530.074.090.542142.161.014.650.048
ABCC2721021.381.021.870.039111.080.562.060.821481.420.982.070.067
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Dumont, M.; Weber-Lassalle, N.; Joly-Beauparlant, C.; Ernst, C.; Droit, A.; Feng, B.-J.; Dubois, S.; Collin-Deschesnes, A.-C.; Soucy, P.; Vallée, M.; et al. Uncovering the Contribution of Moderate-Penetrance Susceptibility Genes to Breast Cancer by Whole-Exome Sequencing and Targeted Enrichment Sequencing of Candidate Genes in Women of European Ancestry. Cancers 2022, 14, 3363. https://doi.org/10.3390/cancers14143363

AMA Style

Dumont M, Weber-Lassalle N, Joly-Beauparlant C, Ernst C, Droit A, Feng B-J, Dubois S, Collin-Deschesnes A-C, Soucy P, Vallée M, et al. Uncovering the Contribution of Moderate-Penetrance Susceptibility Genes to Breast Cancer by Whole-Exome Sequencing and Targeted Enrichment Sequencing of Candidate Genes in Women of European Ancestry. Cancers. 2022; 14(14):3363. https://doi.org/10.3390/cancers14143363

Chicago/Turabian Style

Dumont, Martine, Nana Weber-Lassalle, Charles Joly-Beauparlant, Corinna Ernst, Arnaud Droit, Bing-Jian Feng, Stéphane Dubois, Annie-Claude Collin-Deschesnes, Penny Soucy, Maxime Vallée, and et al. 2022. "Uncovering the Contribution of Moderate-Penetrance Susceptibility Genes to Breast Cancer by Whole-Exome Sequencing and Targeted Enrichment Sequencing of Candidate Genes in Women of European Ancestry" Cancers 14, no. 14: 3363. https://doi.org/10.3390/cancers14143363

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop