In:
Science, American Association for the Advancement of Science (AAAS), Vol. 380, No. 6648 (2023-06-02)
Abstract:
Genome-wide association studies (GWASs) have identified thousands of common genetic variants that are predictive of common disease susceptibility, but these variants individually have mild effects on disease owing to the effects of natural selection. By contrast, rare genetic variants can have large effects on common disease risk, but their use in genetic risk prediction has so far been limited by the difficulty of distinguishing pathogenic from benign variants and of estimating the magnitude of their effects.

RATIONALE
PrimateAI-3D is a three-dimensional convolutional neural network for missense variant–effect prediction that was trained with common genetic variants from the population sequencing of 233 primate species. By applying this method to estimate the pathogenicity of rare coding variants in 454,712 UK Biobank individuals, we aimed to improve rare-variant association tests and genetic risk prediction for common diseases and complex traits.

RESULTS
We performed rare-variant burden tests for 90 well-powered, clinically relevant phenotypes in the UK Biobank exome dataset. Stratifying missense variants with PrimateAI-3D greatly improved gene discovery, revealing 73% more significant gene-phenotype associations (false discovery rate < 0.05) than an analysis that did not use PrimateAI-3D. When benchmarked against prior studies, gene-phenotype pairs identified with our method were better supported by orthogonal genetic evidence from GWAS and by genes from related Mendelian disorders. In addition, among existing variant interpretation algorithms, PrimateAI-3D scores showed the strongest correlation with the quantitative effects of rare variants on continuous clinical phenotypes.
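The burden-test-plus-FDR step summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline, and the record does not give their test statistic, covariates, or score thresholds. Assumptions: a gene's per-individual burden is the allele count summed over variants whose pathogenicity score passes a hypothetical threshold; association is a plain least-squares regression of the phenotype on burden with a large-sample normal approximation to the p-value; and multiple testing across genes is controlled with the Benjamini–Hochberg procedure, a standard FDR method that the record does not name. The functions `burden`, `burden_pvalue`, and `benjamini_hochberg` are hypothetical.

```python
import math
from statistics import fmean


def burden(genotypes, scores, threshold):
    """Per-individual burden: allele count summed over the variants whose
    (hypothetical) pathogenicity score is >= threshold."""
    keep = [j for j, s in enumerate(scores) if s >= threshold]
    return [sum(row[j] for j in keep) for row in genotypes]


def burden_pvalue(burdens, phenotype):
    """Two-sided p-value for the slope of phenotype ~ burden (ordinary least
    squares, no covariates), using a normal approximation to the t statistic.
    The approximation is only reasonable for large cohorts."""
    n = len(burdens)
    if n < 3:
        raise ValueError("need at least 3 individuals")
    mx, my = fmean(burdens), fmean(phenotype)
    sxx = sum((x - mx) ** 2 for x in burdens)
    if sxx == 0:
        return 1.0  # burden does not vary (e.g. no carriers): nothing to test
    sxy = sum((x - mx) * (y - my) for x, y in zip(burdens, phenotype))
    beta = sxy / sxx
    rss = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(burdens, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return 0.0 if beta != 0 else 1.0  # perfect fit, or flat phenotype
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k])
```

In use, one would compute `burden_pvalue(burden(G, s, t), y)` for each gene's genotype matrix `G`, variant scores `s`, and a chosen threshold `t`, then pass the list of gene-level p-values to `benjamini_hochberg` to obtain the genes declared significant at FDR 0.05.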
Having validated our method for finding gene-phenotype relationships, we next constructed a rare-variant polygenic risk score (PRS) model by combining the rare-variant genes for each phenotype, weighting variants by their PrimateAI-3D prediction score and by the direction and effect size of each associated gene. For comparison, we constructed common-variant PRS models and evaluated the performance of the two models for genetic risk prediction in a held-out test subset of the cohort. Although common variants better explained overall population variance, rare-variant PRSs had more power at the ends of the distribution to identify individuals at the greatest risk for disease, and thus may be more relevant for population genetic screening and risk management. In contrast to common-variant PRS models derived from European populations, which generalize poorly to non-Europeans, rare-variant PRSs were substantially more portable to different cohorts and ancestry groups that were not seen during model training. Moreover, because the rare- and common-variant PRS models incorporate orthogonal information from nonoverlapping sets of variants, we combined them into a unified model and observed further improvement in genetic risk prediction for common diseases. To understand the extent to which rare-variant PRSs can be expected to improve with increases in discovery cohort size, we repeated our analyses in down-sampled subsets of the UK Biobank cohort. We found that the number of genes contributing to the rare-variant PRS increased linearly, with no signs of plateauing at a half-million exomes. Newly discovered rare-variant genes were strongly enriched at GWAS loci, forming allelic series with effect sizes that were on average ~10-fold larger than those of the corresponding common GWAS variants.
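The rare-variant PRS described above weights each variant by its pathogenicity score and by the signed effect size of its gene. A minimal sketch of that idea follows, under assumptions not taken from the record: each carried variant counts once (heterozygous), an individual's score is the sum of gene effect times variant score, the rare/common combination is a fixed-weight sum of z-scored PRSs, and "individuals at the greatest risk" are simply the top fraction of the score distribution. The study's actual weighting and combination model may differ. All function names, gene names, and weights are hypothetical.

```python
from statistics import fmean, pstdev


def rare_variant_prs(carried, variant_gene, variant_score, gene_effect):
    """Rare-variant PRS for one individual (hypothetical weighting).

    carried:       IDs of rare variants the individual carries
    variant_gene:  variant ID -> gene
    variant_score: variant ID -> pathogenicity score in [0, 1]
    gene_effect:   gene -> signed effect size from its burden test
    Each carried variant contributes gene effect * variant score;
    variants in genes without an association contribute nothing.
    """
    return sum(
        gene_effect[variant_gene[v]] * variant_score.get(v, 0.0)
        for v in carried
        if variant_gene.get(v) in gene_effect
    )


def standardize(xs):
    """Z-score a list of PRS values (all zeros if there is no spread)."""
    mu, sd = fmean(xs), pstdev(xs)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in xs]


def combined_prs(rare, common, w_rare=0.5):
    """Fixed-weight sum of standardized rare- and common-variant PRSs.
    The weight w_rare is an assumption; in practice it would be fit."""
    r, c = standardize(rare), standardize(common)
    return [w_rare * a + (1 - w_rare) * b for a, b in zip(r, c)]


def top_fraction(scores, frac):
    """Indices of individuals in the top `frac` of the score distribution."""
    k = max(1, int(len(scores) * frac))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Scoring the tails with `top_fraction` reflects the abstract's point that rare-variant PRSs are most informative at the extremes of the distribution rather than across the whole population.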
Among well-powered GWAS loci that could be unambiguously assigned to a single gene, the majority showed a subthreshold signal on the rare-variant burden test, indicating that rare penetrant variants exist at a large fraction of GWAS loci and can be incorporated into the rare-variant PRS with further advances in cohort size and variant effect prediction.

CONCLUSION
Understanding the impact of rare variants in common diseases is of prime interest both for precision medicine and for the discovery of drug targets. By leveraging advances in variant effect prediction, we have demonstrated major improvements in rare-variant burden testing and genetic risk prediction. Notably, we observed that nearly all individuals carried at least one rare penetrant variant for the phenotypes we examined, demonstrating the utility of personal genome sequencing for otherwise healthy individuals in the general population.

Polygenic contribution of rare genetic variants to complex human traits, shown for serum cholesterol as a representative example. (Left) Rare-variant burden tests capture the direction and effect sizes of genes in known lipid biosynthesis pathways. (Top right) When used in a rare-variant polygenic risk score, individuals at opposite ends of the PRS separate into high- and low-cholesterol groups. (Bottom right) Rare variants in these genes have larger effects than common variants identified by GWAS and are strongly predictive of individuals who are phenotypic outliers.
Type of Medium:
Online Resource
ISSN:
0036-8075, 1095-9203
DOI:
10.1126/science.abo1131
Language:
English
Publisher:
American Association for the Advancement of Science (AAAS)
Publication Date:
2023
ZDB-ID:
128410-1
ZDB-ID:
2066996-3
ZDB-ID:
2060783-0
SSG:
11