Introduction

Inherited cardiomyopathies are genetic disorders that cause cellular and/or molecular imbalances that affect the functioning of the cardiac muscle. Different cardiomyopathy subtypes are now recognized by the European Society of Cardiology: hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), arrhythmogenic cardiomyopathy (ACM), left ventricular non-compaction cardiomyopathy, and restrictive cardiomyopathy [1]. Cardiomyopathy symptoms can vary from mild symptoms to life-threatening arrhythmias or end-stage heart failure (HF), and the absence or presence of a phenotype is established using a mix of imaging and electrocardiographic techniques. More than 60 genes have now been associated with one or more inherited forms of cardiomyopathy, with all forms characterized by incomplete penetrance and variable expression [2].

One of these genes, the gene encoding phospholamban (PLN), has shown definitive evidence of association with cardiomyopathy [3]. PLN is a regulator of the sarcoplasmic reticulum Ca2+ (SERCA2a) pump and has been implicated in Ca2+ homeostasis in cardiac muscle cells [4]. The pathogenic variant PLN:c.40-42delAGA (p.Arg14del; ClinVar ID VCV000044580.20) is a founder variant identified in the Netherlands, particularly in the northern part of the country (estimated allele frequency 0.07%–0.22%) [5]. This variant has been associated with both DCM and ACM and is characterized by an increased risk of developing malignant ventricular arrhythmias (VAs) and end-stage HF, with age-dependent incomplete penetrance (> 50% of carriers show symptoms above the age of 60 years) [6,7,8].

Given the high penetrance of PLN:c.40-42delAGA, carriers at risk or subjects from suspected families are screened by a cardiologist for early disease detection. However, the under-explored mechanisms behind the incomplete penetrance of this variant create challenges in determining risk for carriers and thus hamper the decision-making process about therapies. To understand the incomplete penetrance and support early disease detection, researchers have been looking for environmental or genetic factors that modify the risk of disease manifestation. Typically, this research collects and analyzes phenotypical, environmental, and demographic information for carriers [8,9,10,11,12]. One very recent study by Verstraelen et al. [13] used abnormalities observed in carriers to calculate the risk for malignant VA and guide the decision of when to start or withhold preventive therapies, including intracardiac defibrillators. Moreover, the search for risk predictors has been extended to investigate the cumulative impact of common genetic variants previously associated with cardiac-related traits in the form of polygenic scores (PGSs), an approach that has generated preliminary evidence for the existence of a polygenic component to both HCM and DCM [14,15,16].

Previous studies searching for modifiers used approaches that had two main features: (i) they only studied carriers and did not include information from the general population and (ii) they focused on factors that increase the risk of developing cardiomyopathy symptoms (i.e., differences between symptomatic and healthy carriers), an approach that could miss potential protective factors that prevent carriers from developing symptoms (i.e., differences in healthy carriers as compared to symptomatic carriers and the general population). An example scenario of the potential gain of including the general population in these designs comes from studies in sickle cell and thalassemia patients, where individuals with milder symptoms showed a higher frequency of common genetic variants in the BCL11A gene, which had previously been associated with increased production of fetal hemoglobin, as compared to both patients with severe symptoms and to the general population [17, 18]. It is only recently that large-scale general population biobanks have enabled unbiased identification of the PLN:c.40-42delAGA variant, providing more power to detect potential protective factors compared to a patient cohort. To our knowledge, no such analysis has been conducted for the detection of potential protective factors to explain the incomplete penetrance of PLN:c.40-42delAGA in cardiomyopathy.

In this study, we aimed to fill this gap by performing association analyses between the absence of cardiomyopathy symptoms and genetic and phenotypic information available for 36,339 volunteers of the Lifelines cohort, a population biobank from the Northern Netherlands [19]. In the absence of the imaging-based measurements typically used for the diagnosis of cardiomyopathy, such as chest X-ray and echocardiogram, we defined the symptomatic group using a combination of electrocardiographic (ECG) abnormalities suggestive of PLN-associated cardiomyopathy and self-reported heart disease symptoms at baseline assessment or during the 5-year follow-up visit (“Methods”). This identified 74 carriers, and we classified the 48 carriers who did not report or show signs or symptoms as asymptomatic. We then compared the distribution of quantitative phenotypes and genetic variation in asymptomatic carriers to the general population (asymptomatic non-carriers) (Fig. 1). We found that, in asymptomatic carriers, QRS duration and heart rate (HR) variability were significantly lower, while HR was increased. We then replicated the lower QRS duration that we observed in asymptomatic carriers in two additional cohorts, providing the first robust evidence for factors that associate with absence of symptoms in carriers. We also observed higher correlation between the PGS of QRS (PGSQRS) and QRS in the symptomatic carriers (meaning a higher response for the polygenic effect in symptomatic carriers). This finding still requires confirmation from replication cohorts with enough genetic information, but it aligns with the emerging evidence of the polygenic nature of many Mendelian disorders [14, 15, 20, 21].

Fig. 1
figure 1

Graphic summary. a Distribution of the PLN:c.40_42delAGA carriers and non-carriers and ratio of the symptomatic and asymptomatic group in the subset of the Lifelines cohort. b Definition of symptomatic status and relative proportions of people showing each clinical outcome or abnormal ECG signs. c Analysis scheme used in this study. Briefly, we screen for potential protective factors by comparing them between asymptomatic non-carriers and asymptomatic carriers and then confirm them by comparing with the other groups (see “Methods”). Furthermore, we assessed whether there is an interaction effect between the symptoms and carrier status and the predictability of PGS for the phenotype that shows potential protective effects. d Potential protective factors analyzed in this study. GWAS, genome-wide association study; HA, heart attack; HF, heart failure; Infarct, infarction; AF, atrial fibrillation; TWAS, transcriptome-wide association study; VE, ventricular ectopic beats; VT, ventricular tachycardia

Results

Distribution of the PLN:c.40_42delAGA Variant and Clinical Profile of Lifelines Samples

Genotyping data, including genotyping information for the PLN:c.40_42delAGA variant, was available for 36,339 Lifelines participants with an average age of 39.9 years (range 4.0–90.0 years, 58.5% males and 41.5% females). Among them, we identified 74 carriers of the deletion (0.2%; 43 females and 31 males), with ages ranging from 8.0 to 68.0 years and an average age of 40.9 years (Supplementary Fig. 1a). The PLN:c.40_42 deletion allele showed a minor allele frequency (MAF) of 0.1%, and none of the participants was homozygous carriers of the variant. Genotype intensities were screened and indicated lack of spurious genotype calls (Supplementary Fig. 2). We then classified all volunteers into four groups according to the presence or absence of self-reported symptoms or ECG abnormalities suggestive of cardiomyopathy (see “Methods”) in the baseline and follow-up visits and to the presence or absence of the PLN:c.40_42delAGA variant. This identified 34,201 asymptomatic non-carriers, 2064 symptomatic non-carriers, 48 asymptomatic carriers, and 26 symptomatic carriers. A description of the self-reported symptoms, diseases, and measured ECG signs that we used for symptomatic definition is shown in the graphic summary (Fig. 1b, Methods).

Using this definition, we estimated an odds ratio (OR) of 8.98 (CI = 5.33–14.79, two-sided p value = 2.18 × 10–14) for the PLN:c.40_42delAGA variant in conferring risk of presenting any of the signs and symptoms and an OR of 11.2 for HF (CI = 5.31–21.64, two-sided p value = 2.05 × 10–8), strongly supporting the previously reported pathogenicity of this variant. We further noticed that asymptomatic carriers were significantly younger than asymptomatic non-carriers (average ages 34.6 and 39.1, respectively, two-sided p value = 0.037), even when restricting the analyses to participants older than 40.0 years (two-sided p value = 0.001, Supplementary Fig. 1b and c), and we therefore used age as a covariate in all our analyses.

Considering that the Lifelines cohort has a family design, in our primary analyses, we also accounted for familial relationships, either using the kinship matrix or removing relatives because they can introduce spurious correlation at both phenotypic and genetic level (“Methods”). While 34 of the 74 carriers did not have other relatives carrying the variant, the remaining 40 carriers belong to 17 families with two or three carriers (Supplementary Fig. 3).

ECG Measurements Are Significantly Different in Asymptomatic Carriers Compared to Population Controls

We analyzed the 38 cardiac, metabolic, and anthropometric quantitative traits that were available for at least 25 of the asymptomatic carriers (Supplementary Table 1). Among these traits, four were significantly different between asymptomatic carriers and asymptomatic non-carriers (multiple testing–adjusted false discovery rate q value ≤ 0.05; see “Methods”) after correcting for age, gender, and kinship. Specifically, asymptomatic carriers showed on average a significantly faster HR of 6.41 beats/min (95% CI = 3.09–9.72, q value = 2.61 × 10–4), a shorter QRS duration of − 5.7 ms (95% CI =  − 9.22 to − 2.24, q value = 1.04 × 10–3) and reduced HR variability measured by both the natural logarithm of the root mean square of successive differences (lnRMSSDc: beta =  − 0.182, standard error (SE) = 0.090, q value = 0.0435) and by the natural logarithm of the standard deviation of the normal-to-normal intervals (lnSDNN: beta =  − 0.216, SE = 0.094, q value = 0.0242) (Supplementary Table 2, Fig. 2). These results were confirmed when we repeated the analyses for these four groups using phenotype values measured in the same individuals during the first follow-up visit (visit 2), which took place approximately 5 years after the first visit, and observed consistent effects (Fig. 2). The results were also consistent when comparing asymptomatic carriers to a subset of age- and sex-matched controls (Supplementary Table 2F).

Fig. 2
figure 2

Forest plot of effect sizes for significant quantitative phenotypes. Squares represent the effect size (beta). Lines represent standard error. Blue squares and the numbers in the columns correspond to analysis results for the baseline visit (visit 1). Red squares correspond to the second visit approximately 5 years later (visit 2). Phenotypes are separated by the scale in panels a and b. N asymptomatic carriers (or non-carriers) refer to participants with non-null information at visit 1; p val, p value for the independent association; q val, false discovery rate for conjoined analysis of all the traits that takes into account 1000 permutations for each of the 38 traits; lnRMSSD, natural logarithm of the root mean square of successive differences; lnRMSSDc, RMSSD corrected by heart rate; lnSSDN, natural logarithm of the normal-to-normal intervals

To exclude the possibility that these results were being driven by a direct impact of the PLN:c.40_42delAGA variant on the phenotype, we compared the distribution of these significant phenotypes among all four groups in our analysis scheme (“Methods”). Here, we observed a consistent association between high HR and low QRS duration and absence of symptoms when comparing asymptomatic carriers to symptomatic non-carriers (Fig. 3, Supplementary Fig. 4): asymptomatic carriers showed significantly higher HR that was faster by 5.58 beats/min (95% CI = 1.63–9.54, two-sided p value = 0.0057) and a QRS duration that was − 7.24 ms lower (95% CI =  − 11.92 to 2.56, two-sided p value = 0.0024). Interestingly, we also found a higher QRS duration in symptomatic carriers (8.52 ms higher, 95% CI = 4.11–12.93, two-sided p value = 1.5 × 10–4) compared to asymptomatic non-carriers, an opposing effect to that seen when comparing the asymptomatic carriers with this group. In contrast, HR variability features were not significantly different in asymptomatic carriers compared to other groups (Supplementary Fig. 5).

Fig. 3
figure 3

Distribution of HR and QRS with carrier status and symptom manifestation. For each phenotype, the distribution within groups is represented in two forms: the truncated violin plots in gray display the density, while the boxplots display the median (center horizontal line), the first and third quartiles (box hinges) and the values up to 1.5 interquartile ranges (IQR) away from the respective hinges (upper and lower whiskers). Values above or below these extremes are represented as individual points. a HR. b QRS duration. The different distribution for asymptomatic carriers compared to all the other groups is visually apparent for both these traits but is more variable in the HR variability-related traits (Supplementary Fig. 3). P values correspond to the regression coefficient of the adjusted trait (Y-axis) in the model described in Eq. 1, comparing the groups indicated by the ticks; only significant p values are shown. The distribution of unadjusted traits can be observed in Supplementary Fig. 3. Asympt, asymptomatic; sympt, symptomatic

Association of Shorter QRS Duration with Absence of Symptoms in Carriers Replicates in an Independent Population Cohort and a Patient Registry

We sought to replicate the associations we observed for HR and QRS in two additional cohorts: a second subset of the Lifelines cohort and a cohort of PLN:c.40_42delAGA carriers from the ACM/PLN patient registry [22], which also includes asymptomatic carriers.

The Lifelines subset used for replication comprised 21,771 volunteers (62.7% females) with an average age of 41.8y (range 8.0–93.0 years), of which 50 carry the PLN:c.40_42delAGA variant. Following the criteria for group assignment from the primary analysis (Fig. 1), this cohort was divided into 32 asymptomatic and 18 symptomatic carriers and 20,447 asymptomatic and 1274 symptomatic non-carriers. When comparing the distribution of QRS and HR between the two Lifelines subsets, we observed the same direction of effect and comparable magnitude (Cochran’s Q test two-sided p value for differences in effect size = 0.134) for QRS duration only (decrease =  − 3.87 ms, 95% CI =  − 7.79 to 0.056, one-sided p value = 0.028, Fig. 4, Supplementary Table 3A). Since some of the volunteers in this second Lifelines subset were relatives of volunteers in our discovery set, we carried out a joint analysis, adjusting for batch-effect and relatedness, which confirmed the legitimacy of this association (pdiscovery = 1 × 10–3 vs pjoint = 2 × 10–4, Supplementary Table 3A).

Fig. 4
figure 4

Comparison of the effect sizes for HR and QRS duration in the two replication datasets and the discovery dataset. The figure shows the effect sizes from the Lifelines discovery cohort and the two replication cohorts: an additional subset of Lifelines participants and the ACM/PLN registry. Note that the effect in the Lifelines cohorts reflects the difference between the asymptomatic carriers and asymptomatic non-carriers, whereas replication in the ACM/PLN registry compares symptomatic carriers to asymptomatic carriers

The observation of low QRS duration was also confirmed in the ACM/PLN patient registry, which comprises 592 carriers of the PLN:c.40_42delAGA variant with an average age of 41 years (range 2.0–80.0 years) (Supplementary Fig. 6). Among the 592 carriers, 423 were classified as symptomatic and 169 as asymptomatic. We analyzed the heart-related phenotypes collected during each individual’s first visit using the same model used in the discovery cohort (“Methods”) except for the genetic principal component (PC) confounders, which were not available for this study because no genome-wide genotyping data was available. We also evaluated if the observed associations showed consistent effect when the symptomatic are defined using the symptoms definitions of ACM/PLN registry (Supplementary Table 3B). Here, we observed consistent replication for QRS durations (Fig. 4, Supplementary Fig. 6, Supplementary Table 3C). Asymptomatic carriers in the ACM/PLN registry showed significantly decreased QRS duration (− 6.91 ms, 95% CI =  − 3.69 to − 10.12, two-sided p value = 2.0 × 10–4) compared to symptomatic carriers, and this effect was consistent even after stratifying the cohort into younger (< 40.0 years) and older (≥ 40.0 years) carriers (Supplementary Fig. 6, Supplementary Table 3C).

Two Common Genetic Variants May Be Associated with Absence of Symptoms in Carriers

We then searched for genetic factors that could explain the incomplete penetrance of PLN:c.40_42delAGA. We first analyzed the genetic region near the variant to look for other variants that could explain symptoms and signs of disease manifestation in carriers. We observed that the majority of carriers (N = 70) share a long identical haplotype of common and low-frequency variants (MAF ≥ 0.01) spanning 1.38 Mb (from 604 to − 766 Kb from the PLN gene), which is consistent with previous observations [8], while the remaining four samples carried distinct haplotypes varying from 1 to 330 single-nucleotide polymorphisms (SNPs). We then took a closer look by including rare variants with minor allele counts ≥ 4 in the carrier chromosomes, which allowed us to test more SNPs differentiating haplotype groups (n ≥ 3). However, neither these SNPs nor those at the boundaries of the long and shared haplotypes were significantly associated with manifestation of symptoms (Supplementary Table 4 and Supplementary Fig. 7).

We next expanded our search for potentially associated genetic factors to the rest of the genome by analyzing the single additive effect of 16,478,162 common variants in autosomes (see “Methods”). For this analysis, we used only unrelated samples (39 asymptomatic carriers and 20,103 asymptomatic non-carriers). Here, we observed a long stretch of hundreds of variants located on chromosome 6 spanning a region of ~ 9.6 Mb that show significant associations at a classical genome-wide threshold of p value ≤ 5 × 10–8 (Supplementary Fig. 8, Supplementary Table 5). All these associations, however, could be attributed to common variants tagging the haplotype carrying the PLN:c.40_42delAGA, and thus, this association was not related to presence/absence of symptoms (Supplementary Text). We also found two associations independent of PLN:c.40_42delAGA (Supplementary Fig. 9) that were close to the classical genome-wide significance threshold of 5 × 10–8: one in chromosome 3 with SNP rs6768326 (allele G beta = 1.52 corresponding to an OR = 4.57, SE = 0.29, two-sided p value = 9.26 × 10–8, Fig. 5a) located near (~ 180 Kb) the gene RAP2B and one in chromosome 16 with SNP rs112525682 (allele A beta = 1.42 corresponding to an OR = 4.14, SE = 0.28, two-sided p value = 2.64 × 10–7, Fig. 5b) located in the gene GSE1. However, we were unable to test for replication of these signals in the additional Lifelines subset as none of these SNPs was directly available due to the different genotyping platforms (“Methods). For the signal in chromosome 3, we could assess a proxy SNP, rs6791558 (r2 = 0.85 rs6768326 in 1000 Genomes Europeans) but found no significant association with asymptomatic carriers in the replication cohort (beta = 0.27, SE = 0.40, two-sided p-value = 0.49, Supplementary Table 5C). No proxy SNP was available for the signal in chromosome 16, so this SNP could not be tested.

Fig. 5
figure 5

Genotype distribution of SNPs in chromosomes 3 and 6 according to carrier status and symptom manifestation. Frequency bar plots depicting the genotypes for a SNP rs6768326 in chromosome 3 and b SNP rs112525682 in chromosome 16. Stacked bars are used to represent the relative counts of individuals with 2 (red), 1 (green), or 0 (blue) alternative alleles. Discrete genotypes were derived by rounding the allele dosage to the nearest integer value or setting it to missing if their value was more than 0.3 away. p values are derived from linear models adjusted for age, sex, and the first PC

Rare Variant Loads in Cardiac-Related Genes Did Not Associate with Presence or Absence of Symptoms in PLN:c.40_42delAGA Carriers in Our Dataset

Finally, we searched for any potential aggregated effect of rare variants at 80 genomic regions previously associated with cardiomyopathies or cardiac phenotypes and for the regions surrounding our two GWAS signals on chromosome 3 and chromosome 16 (see “Methods”). Here, we observed a significant association for variants in the gene SLC35F1 when using the low-frequency threshold (MAF ≤ 0.05, two-sided p value = 4.3 × 10–22) and the rare variant threshold (MAF ≤ 0.01, two-sided p value = 2.8 × 10–26) for variant inclusion (Supplementary Table 6). Subregion analyses in the same gene revealed significant associations with asymptomatic carriers in regions containing exons 1, 2, 4, and 8. These results are interesting because the common variants near SLC35F1 are already known to associate with QRS duration [23], QT interval [24], and resting HR [25]. However, this gene is also in close proximity (< 250 kb) to the PLN gene and spurious association may thus arise due to different combinations of variants in the core haplotype shared by carriers compared to non-carriers. Due to the limited sample size of the symptomatic carrier group, we could not compare symptomatic carriers with asymptomatic carriers to show a different load of rare variants between groups. However, when comparing the rare variant burden in the same sub-regions between symptomatic carriers and asymptomatic non-carriers, these were also significantly different and had the same effect direction (Supplementary Table 6). We therefore concluded that these signals at SLC35F1 are the result of a confounding effect driven by the specific genetic composition of the founder haplotype carrying the PLN:c.40_42delAGA variant and cannot be attributed to the presence or absence of symptoms.

Polygenic Burden for Most Cardiac Phenotypes Does Not Differ Between Asymptomatic and Symptomatic Carriers of the PLN:c.40_42delAGA Variant

We next evaluated potential associations between common genetic variants known to associate with cardiac traits and absence of symptoms by calculating PGSs for 89 unique traits and compared PGS distributions between asymptomatic carriers and asymptomatic non-carriers (“Methods” and Supplementary Table 7A). Several PGSs were significantly different (false discovery rate q value < 0.05, see “Methods”) in asymptomatic carriers (Supplementary Fig. 10 and Supplementary Table 7B), but no PGSs showed significant differences when comparing symptomatic and asymptomatic carriers (Supplementary Fig. 11). When we repeated the associations using PGSs calculated without the variants located in chromosome 6, we only observed a lower PGS for left ventricular end systolic volume corrected by body surface area (LVESVI) in asymptomatic carriers (PGSLVESVI, beta =  − 0.30, SE = 0.15, q = 0.035) when compared to the asymptomatic non-carriers, which indicates that the other differences can be attributed to common variations located near the PLN:c.40_42delAGA variant (Supplementary Table 7C). PGSLVESVI was also lower, albeit non-significant, when comparing symptomatic carriers to the asymptomatic non-carriers; thus, the different polygenic burden observed is likely unrelated to the symptoms.

Polygenic Score of QRS Duration Better Predicts QRS Duration in Symptomatic Carriers

To further investigate the effect of PGSs for cardiac measurements, we modeled their prediction utility on their respective measurements and potential interactions with age, gender, and participant group assignment. Here, we found that the PGSQRS has a stronger impact for predicting QRS in symptomatic carriers (two-sided p value = 4.27 × 10–11, Fig. 6b, Supplementary Table 8) compared to the other groups, and this interaction remained significant when applying inverse-rank normalization to the trait (two-sided p value = 1.53 × 10–5) and when calculating PGSQRS without chromosome 6, thus excluding variants within or near PLN (two-sided p value = 2.30 × 10–5, Supplementary Table 8). Furthermore, PGSQRS was not associated with the severity of symptoms and signs observed in symptomatic carriers (Supplementary Fig. 12). We also found a significant interaction for PGSPR, but this was not robust to normalization of the measurement.

Fig. 6
figure 6

Interaction of PGSQRS with symptomatic carrier status. a Schematic overview of the interaction. PLN:c.40_42delAGA carriers show an increased risk of developing cardiac symptoms. In turn, symptomatic carriers show a higher correlation between PGSQRS and QRS compared to asymptomatic non-carriers. b Scatter plot and regression line for the partial correlation between PGSQRS and QRS for each group studied (colors according to the legend)

No Evidence for Gene Expression Differences Predicted by Common Variants

Finally, we searched for potential gene expression differences that could explain the presence/absence of symptoms in PLN:c.40_42delAGA carriers by performing a transcriptome-wide association study (TWAS). Briefly, we used PrediXcan [26] to impute gene expression levels for up to 8610 genes in the tissues artery aorta, artery tibial, heart atrial appendage, and heart left ventricle and compared the imputed gene expression between asymptomatic carriers and asymptomatic non-carriers and between symptomatic carriers and symptomatic non-carriers (“Methods”). Expression of the PLN gene was also predicted and evaluated. Among all the genes tested, four (MARVELD2, LAMA4, FAM184A, and NCBP1) were significantly different in one of the comparisons and one (NT5DC1) was significantly different in both comparisons. We closely examined the TWAS results in the most relevant tissue, heart left ventricle, but found that these association signals were similar in size and effect direction in both comparisons (Supplementary Fig. 13, Supplementary Table 9); therefore, differences in these genes cannot be related to presence/absence of symptoms in PLN:c.40_42delAGA variant carriers.

Discussion

In this study, we used the Lifelines Dutch general population biobank to search for modifiers of the PLN:c.40_42delAGA variant, a Dutch founder variant known to increase a carrier’s risk of developing malignant VA and HF. We observed that QRS duration was significantly shorter in PLN:c.40_42delAGA carriers who showed no signs or symptoms of the disease as compared to other carriers and non-carriers. We then replicated this observation in another subset of the same population and in a patient registry. QRS duration reflects ventricular activation and depolarization [27], and QRS widening has been established as an intrinsic element of HF that comes with worse outcomes. In line with this, deformations of the QRS complex are frequently found in DCM patients [28], and an increased QRS (≥ 120 ms) duration is frequently observed in severe cardiac outcome (up to 47% of patients with HF) [29]. Increased QRS duration was also previously observed in symptomatic PLN:c.40_42delAGA carriers compared to asymptomatic carriers [13], and thus, until now, increased QRS duration was only considered a risk factor. Here, however, we show that QRS duration was significantly lower in asymptomatic carriers even when compared to the general population, which suggests that short QRS may offset or postpone disease. On the other hand, QRS widening is a risk factor, as explained by its established relationship with myocardial scarring and remodeling. Intriguingly, the lower QRS duration in asymptomatic carriers and the higher QRS in symptomatic carriers could not be explained by differences in genetic predisposition conferred by common variants that were previously associated to QRS duration (PGSQRS).

We observe that the effect of common genetic variation on QRS duration is more pronounced in symptomatic carriers, with the correlation between PGSQRS and measured QRS duration significantly stronger in symptomatic carriers of the PLN:c.40_42delAGA variant. This is more complex than just a genetic interaction because this effect was not seen in asymptomatic PLN:c.40_42delAGA carriers. This effect is also unlikely to be caused by HF by itself because people with the wild-type PLN allele (non-carriers) and HF also show normal sensitivity to the PGSQRS. That said, we have not identified a mechanistic explanation and these results require replication in an independent cohort. However, replication will be challenging because the PLN:c.40_42delAGA variant is less frequent in other populations. According to gnomAD, the variant was identified in only 2 out of 25,328 North-western European samples sequenced and was not found in any of the 39,118 samples of other European or non-European origin.

For now, we can only speculate as to why we observe this complex interaction effect. In this manuscript, we explored the modifiers that make the PLN:c.40_42delAGA variant pathogenic in some people but not in others. However, it might also be that the genetic or environmental modifiers that influence the penetrance of the PLN variant also increase the impact of PGSQRS. This is not unlikely as the main mechanism disturbed by the PLN:c.40_42delAGA variant is calcium homeostasis, and we can infer that part of the pathogenicity might be explained by disturbance in this system. Several genes implicated by the GWAS on QRS are calcium-handling proteins [30], and it has been shown that calcium levels can influence ventricular repolarization, although the impact on QRS duration is weak [31].

Further studies are needed to investigate this hypothesis. For example, as larger GWAS on QRS are carried out and more associated genetic variants are discovered, the increased power and resolution of PGSQRS could help clarify the signal in both groups of carriers by pinpointing a subset(s) of genes involved in specific pathways through pathway-based PGS [32, 33]. Of note, full summary statistics will be required for the PGS construction; if we would only take the leading variants of the QRS GWAS the interaction is not observed (p > 0.05).

Studies aiming to investigate differences in non-genetic factors between symptomatic and asymptomatic carriers will be more difficult to establish. Although genetics explains less than half of the intra-individual variation in QRS [34], the environmental factors that contribute to differences in QRS duration in the general population are largely unknown. We can speculate that physical activity might be one of these, as athletes and those engaging in regular physical training have relevant cardiac structural and electrical changes [35]. Nonetheless, in our Lifelines population cohort, we did not find significant associations between QRS duration and exercise or between QRS duration and participants living in less urban locations (Supplementary Text, Supplementary Table 10A).

We cannot rule out that the association of QRS duration that we observe is merely a proxy for other potentially protective factors that were not measured in Lifelines participants. For example, a recent study carried out on the ACM/PLN patient registry identified certain ECG- or MRI-derived features, such as left ventricular ejection fraction (LVEF) and 24-h premature ventricular contraction (PVC) count, as stronger risk factors for developing ventricular arrhythmia than QRS duration [13]. Given the existing correlation between QRS and LVEF and the 24-h PVC count (Supplementary Fig. 14, Supplementary Text), we cannot exclude that our signal based on QRS actually captures the effect of unmeasured LVEF and 24-h PVC count in our cohort. However, genetics and experimental model studies suggest that QRS duration could have a non-null causal role in symptom manifestation. Previous studies have shown computational evidence and suggestive experimental evidence in other organisms. In Zebrafish, the plna R14del variant causes beat-to-beat variations in cardiac output in the absence of cardiac remodeling, suggesting that the cardiac contractile dysfunction is not caused by but rather causal for cardiac remodeling [36]. Causal relationships between anomalies in the QRS complex, specifically in the Q-R upslope, and increased risk of DCM have been found through the Mendelian randomization approach using genetic variants of ECG signatures [37].

Given the replicated association between shorter QRS duration and absence of symptoms in carriers, we further explored its potential clinical predictive value in a recently proposed predictive model [13], by replacing PVC and LVEF with QRS to simulate situations of absence of MRI measurements, whereas Verstraelen et al. [13] found QRS duration to have limited predictive ability for VA compared to other cardiac measurements, we found that its effect still significantly improved the baseline prediction model in the ACM/PLN registry (Supplementary Fig. 15, Supplementary Text). Therefore, we suggest that QRS duration could be further explored for its clinical value in predicting other severe events and interim outcomes for PLN:c.40_42delAGA carriers.

We increased the statistical power to identify a protective effect by comparing asymptomatic non-carriers to a large set of asymptomatic non-carriers from the general population. This approach has great potential for use in other population-based cohorts, especially for more common monogenic traits with incomplete penetrance. For example, mutations in BRCA genes show an approximate prevalence of carriers without breast cancer symptoms ranging from 0.05 to 0.30 [38]. In this scenario, a biobank with a sample size comparable to our study (n = 40,000) would have 80% power to detect ORs as small as 1.5 (Supplementary Fig. 16C).

Our study comes with limitations. While we used a large population cohort to search for genetic variants that could act as potential genetic modifiers of PLN:c.40_42delAGA, our study was still underpowered for finding associations at the genome-wide level. For example, our sample size provides 80% power for detection of ORs > 4.5 at variants with MAF > 0.15 or of ORs > 5 at variants with MAF > 0.1, with p < 5 × 10–8 (Supplementary Fig. 16). In addition, as we had no power to detect similar or smaller effects at rarer variants, we implemented additional strategies to investigate the cumulative impact of rare variants by aggregating their effects, but we still found no significant signal. As our study did not have whole-genome or exome sequencing data available, our rare variants analyses are underpowered since they are limited to those rare and very rare variants that were genotyped directly or well imputed. We were also limited in investigating our findings in the replication datasets because of lack of genotyping information; no genotyping information was available for the ACM/PLN registry, and the information for the second Lifelines subset came from a different genotyping array and did not include imputed variants. Therefore, we could not validate one of the two suggestively significant findings in GWAS and were unable to replicate the PGS interaction in symptomatic carriers. Additionally, for the general population in the Lifelines cohort, we did not have specialized cardiac phenotypes such as cardiac MRI measurements that could be better indicators of risk or protection. Finally, we recognize that it is possible for both carriers and non-carriers to be misclassified as asymptomatic because, for participants of young age, we cannot establish with certainty who will remain asymptomatic over the long-term. To mitigate this, we used all available follow-up data (5-year follow-up data in Lifelines, including death records, and an average of 9.5 years follow-up in the ACM/PLN registry) to classify individuals in the two groups. Notably, our results hold true when analyzing only the 22 individuals aged > 40 years in Lifelines (Supplementary Table 2E), and replication was consistent in the ACM/PLN registry with the 248 individuals aged > 40 years (Supplementary Table 3C), suggesting that our findings are robust. Additionally, the association of lower QRS duration with asymptomatic carriers was confirmed in the ACM registry when repeating the analyses using a more comprehensive definition of symptomatic groups (Supplementary Table 3C). For this analysis, no individuals who were classified as “asymptomatic” later exhibited symptoms.

In conclusion, we used a large population cohort to identify phenotypic and genetic modifiers of PLN:c.40_42delAGA, a variant that leads to an inherited genetic disorder with incomplete penetrance. By taking advantage of the unbiased identification and inclusion of carriers showing no symptoms, we identified and replicated an association between shorter QRS duration and a lack of disease signs and symptoms. We envision that this approach can be applied to other highly penetrant disorders in other large population cohorts, thereby capitalizing on the hundreds of thousands or millions of participants so that rare disease-causing variants reach a substantial number of carriers with variable symptoms.

Methods

Cohort Description

Lifelines is a multi-disciplinary population-based cohort study with a three-generation design that examines the health and health-related behaviors of 167,729 people living in the north of the Netherlands. Lifelines employs a broad range of investigative procedures to assess the biomedical, socio-demographic, behavioral, physical, and psychological factors that contribute to the health, disease, and quantitative traits of the general population, with a special focus on multi-morbidity and genetics [19]. Participants visited Lifelines research sites for a physical examination, including blood pressure and ECG measurements and blood and urine collection to measure several disease biomarkers. Data from one follow-up visit 5 years after the baseline recruitment visit is now available for most participants. The Lifelines study was approved by the medical ethical committee from the University Medical Center Groningen (METc number: 2007/152). An informed consent was collected for all participants.

Genotype Data

A subset of 38,030 Lifelines participants were genotyped within the UMCG Genotyping Lifelines Initiative (UGLI) using the Infinium Global Screening array® (GSA) Multiethnic Diseases version, which includes around 700 K variants. Samples were processed at the Rotterdam Genotyping Center and the Department of Genetics of the UMCG, following manufacturer instructions. Standard quality control procedures were used to flag and remove low-quality samples and genetic markers, and the genetic content of the final quality-controlled dataset was augmented through genotype imputation carried out using the Haplotype Reference Consortium panel v1.172, as described previously [39]. Quality-controlled, genotyped and imputed genotype information was obtained for 36,339 participants and 39,131,578 genetic variants on autosomes, which were used for genetic analyses. Based on a PC analysis that analyzed all the 36,339 Lifelines participants together with the 1000 Genomes samples [39], only 34 participants were of non-European origin and were not used for the analyses.

Deletion/insertion of rs397516784-AGA (PLN:c.40_42delAGA) was genotyped directly. Genotype intensities were screened and indicated a lack of spurious genotype calls (Supplementary Fig. 2). Quality control parameters indicated that the variant was genotyped with high quality (genotyping call rate = 99.96%, Hardy–Weinberg equilibrium p value = 1). Genotypes at this variant were used to classify samples into carriers and non-carriers according to the presence or absence of the deletion. All identified carriers were of European ancestry.

Definition of Asymptomatic Participants in the Lifelines Cohort

We defined a participant to be asymptomatic for cardiomyopathic disease if they did not report any cardiac symptoms in the self-reported questionnaire at baseline or follow-up visit and the cardiologist did not report any sign of the disease during the ECG measurements at these time points, which are spaced a maximum of 5 years apart. We considered the following to be symptoms of disease: HF, heart attack, infarct, and being treated for arrhythmia. We defined signs of disease as any of the following indicators noted by the ECG software and reviewed by the cardiologists: low voltage or micro-voltages, negative T, ventricular tachycardia, ventricular extrasystole, or atrial fibrillation. A participant presenting one or more of these symptoms or signs was classified as symptomatic. Additionally, for the carriers, we looked for the cause of death report (if death had occurred) to confirm if death was related to cardiac symptoms.

Analysis Scheme

To identify potentially protective factors, we analyzed phenotype and genetic variables using a 2-step procedure to compare different groups of participants according to carrier status and symptom manifestation and followed up significant results with more in-depth analyses (step 3) and replication (step 4) (Fig. 1). Specifically, in step 1, we screened all available variables by comparing asymptomatic carriers with asymptomatic non-carriers. In step 2, variables showing significant differences (q < 0.05) were further investigated by comparing their distribution among the other groups. Specifically, we compared these variables between symptomatic carriers and asymptomatic non-carriers, where we expect to find either no difference or a difference in the opposite direction to that observed in step 1. At the same time, we compared these variables between symptomatic carriers and asymptomatic carriers, where we expect to find a difference in the same direction as in step 1. For these comparisons, we used the equations described in the sections below by coding the outcome variable accordingly.

We used permutations to derive a proper significance threshold for step 1 that accounts for the high correlation between variables and the multiple tests performed. Specifically, we derived empirical q values from the p values of the joint results for all variables of 1000 permutations for each variable. In each permutation, the variable’s value was shuffled independently of the genetically related variables (kinship, PCs and carrier status), which remained linked. This was applied to analyses that investigated quantitative phenotypes and PGSs. For GWAS and rare variant burden analyses, we used the standard thresholds of p < 5 × 10–8 and 2.5 × 10–6 [40], respectively.

Finally, to test the robustness of our results, we repeated the analyses with the normalized phenotypes using the Rank.norm() function of the RNOmni v1.0 R package [41], implemented in R v4.0.3, and with a case–control ratio of 1:4 in age- and sex-matched asymptomatic carriers and asymptomatic non-carriers.

Quantitative Phenotypes: Definition and Analysis

We considered 64 quantitative phenotypes representing anthropometric measures and cardiometabolic functions as potentially protective factors. These phenotypes were measured as previously described [19]. Outliers and unrealistic values were removed, as described in our previous work [39], and further extreme or unrealistic values (HR > 200, QRS > 200, QTC > 600, PQ > 320) were removed for cardiac traits [42]. We also included four quantitative traits associated with HR variability that were derived from the ECG and calculated in the Lifelines cohort in a previous work [43]. The full description of the quantitative traits investigated included, along with cohort descriptive statistics, can be found in Supplementary Table 1. We decided not to use phenotypes with fewer than 25 participants in the asymptomatic carriers group. After applying all the filters described above, 38 phenotypes were considered for further analyses.

To evaluate the association with absence of symptoms, we compared the distribution of each variable between asymptomatic carriers and asymptomatic non-carriers. We used the lmekin() function of the R package coxme v2.2 [44] to fit a linear mixed model that predicts the phenotype under study based on the carrier status (outcome) and adjusted the model by age, age squared, the first four genetic PCs and sex, while accounting for familial relationships. Specifically, we used the following model:

$$\mathrm{Y}\sim \mathrm{age}+{\mathrm{age}}^{2}+\mathrm{sex}+\mathrm{PC}1+\mathrm{PC}2+\mathrm{PC}3+\mathrm{PC}4+\mathrm{outcome}+\sim (1|\mathrm{ID})$$
(1)

where ~ (1|ID) is a random effect equivalent to twice the kinship matrix and outcome is a variable coded as 1 for asymptomatic carriers and 0 for asymptomatic non-carriers. Genetic PCs were included to rule out the potential confounding effects of different geographic distribution of carriers and non-carriers.

PGS Definition and Analysis

We downloaded summary statistics of GWAS on quantitative phenotypes and diseases related to cardiovascular function in the public repositories (Supplementary Table 7). When more than one study or repository was found for the same trait, we kept the study with the largest sample size and the highest number of available SNPs. In total, we found summary statistics for 89 traits (55 cardiac and circulatory system diseases, 10 cardiac magnetic resonance parameters, 15 echocardiographic parameters, 4 electrocardiographic parameters, 2 HR variability parameters and 3 anthropometric measures as control).

The selected summary statistics were used to calculate PGSs using the PRS-CS software, which uses the full summary statistics in a Bayesian regression framework by placing a continuous shrinkage prior on the effect size of SNPs while adjusting by the linkage disequilibrium from a reference panel [45]. This method has been shown to be more efficient, in multiple settings, than the classical clumping and thresholding method [46]. We used the default parameters for all our calculations.

The resulting PGSs were used to identify the genetic signatures of the phenotypes they represent and to assess if these signatures are associated with absence of symptoms in PLN:c.40_42delAGA carriers. We compared the distribution of PGSs between asymptomatic carriers and asymptomatic non-carriers using the following linear model:

$$\mathrm{PGSs}\sim \mathrm{age}+\mathrm{sex}+\mathrm{outcome}$$
(2)

where outcome is a variable coded as 1 for asymptomatic carriers and 0 for asymptomatic non-carriers.

PGSs in this model were used as continuous percentiles of the standardized PGS. We also analyzed the effect of having an extreme PGS value by recoding the PGS variable as binary in the model. Specifically, we assessed the impact of having very high PGS by recoding the variable to 1 when PGS values were above the 80th percentile and to 0 otherwise. Likewise, we assessed the impact of low PGS values by recoding the variable to 1 when PGS values were below the 20th percentile and to 0 otherwise.

Genetic Association Analyses

To evaluate the genomic region nearby the PLN:c.40_42delAGA variant, we selected the imputed genotypes of all genetic variants located within ± 2 Mb, with an imputation information score higher than 0.8 and with no Mendelian errors. We then phased the selected variants in the entire Lifelines cohort using SHAPEITv2.904 [47] with the options –duohmm and –output-max. Focusing only on the 74 carriers and the phased chromosome carrying the variant, we screened the haplotypes in increasing windows surrounding the PLN pArg14del variant. We started with the PLN pArg14del variant and added one SNP on each side to form a window and counted how many different haplotype-alleles were present among carriers. We then repeatedly added one SNP to each side. If the addition of a SNP to either side resulted in a split of the most frequent haplotype of more than a threshold number of samples (a split threshold), we stopped adding SNPs to this side and continued the other. To identify the longest shared haplotype among carriers of the PLN pArg14del variant, we first selected SNPs with MAF ≥ 0.01 and set the split threshold at 3. To include haplotypes that could possibly explain the difference in symptoms, we selected SNPs with a minor allele count ≥ 4 in the group of carriers and set the split threshold at 10. We then compared the frequency of the most frequent haplotypes between symptomatic and asymptomatic carriers using a chi-square test only in the main splits (split threshold ≥ 3). We used the Ghap version 2.0 package in R [48] for the statistical analyses and counting of the haplotypes.

To assess the potential modifier effect of common variants, we carried out a genome-wide association scan and compared allele-dosages of each variant between asymptomatic carriers and asymptomatic non-carriers using plink2.0 (version alpha2-20191006) [49]. For this analysis, we only used unrelated samples because currently available software that can account for family-based cohorts cannot properly analyze such an imbalanced number of cases (asymptomatic carriers) and controls (asymptomatic non-carriers). To identify unrelated samples, we first calculated pairwise kinship coefficients using KING [50] v2.2 and removed individuals with high kinship (> 0.125) in their respective group. For each pair of related individuals, we removed the individual of lower age. We also removed individuals in the group of controls who were related to individuals in the cases group. In all, we investigated the additive effects of 16,478,162 common variants (after filtering by MAF > 0.1 for asymptomatic carriers and MAF > 0.05 for non-carriers, and INFO score > 0.4) with a logistic model that included the first PC, age, gender and the dosages of imputed or genotyped genetic variants, as in the following model:

$$\mathrm{logit}(\mathrm{outcome})\sim \mathrm{age}+\mathrm{sex}+\mathrm{PC}1+\mathrm{SNP}\_\mathrm{dosage}$$
(3)

To screen for association signals that could be driven by an imbalanced allocation of variants in the long haplotype of the PLN:c.40_42delAGA variant, and thus not related to the absence of symptoms, we assessed the correlation between the strength of the association (p value) of SNPs with p value ≤ 5 × 10–4 and their linkage disequilibrium (r2) with the PLN:c.40_42delAGA variant. SNPs that did not fit on the line of this correlation (by visual inspection) were considered suggestive associations and further investigated (Supplementary Fig. 9).

To evaluate the effects of rare variants, we carried out region-based association tests only on low frequency (MAF ≤ 0.05) or rare (MAF ≤ 0.01) variants with INFO score > 0.8, using the robust omnibus approach in the package SKAT [51]. This approach combines variance component tests and variant collapsing tests and corrects for case–control imbalance in binary traits [40]. We interrogated the regions ± 50 Kb in and around 80 genes with known genetic associations to cardiovascular diseases and ECG measurements [52] and regions ± 250 Kb around the two variants identified by our GWAS. For these analyses, we used the same logistic model and samples as in Eq. 3. We confirmed our results using dosages and adjusting for the common variant effect. For the significant genes and regions (p < 2.5 × 10–6) [53], we further analyzed the subregion level by splitting the significant region into ten segments.

Transcriptome-Wide Association Study

We performed a TWAS using PrediXcan [26, 54, 55]. We used the algorithm of PrediXcan to predict individual gene expression levels based on their genotypes. The models, available from PredictDB (http://predictdb.org), are elastic net–based models trained on European populations in the GTex version 8 release (elastic_net_eqtl.tar). For four heart-related tissues (artery aorta, artery tibial, heart atrial appendage and heart left ventricle), we predicted expression profiles for all the individuals included in our study (N = 36,339). Specifically, we imputed gene expression for 7589 genes in artery aorta, 8,610 genes in artery tibial, 6632 genes in heart atrial appendage and 6006 genes in heart left ventricle in all Lifelines participants. We then performed logistic regression to investigate differences in gene expression between asymptomatic carriers and asymptomatic non-carriers, with age, gender and the first 4 PCs from the genotype data as covariates. Specifically, we used the following model:

$$\mathrm{logit}(\mathrm{outcome})\sim \mathrm{age}+\mathrm{sex}+\mathrm{gene}\_\mathrm{expression}+\mathrm{PC}1+\mathrm{PC}2+\mathrm{PC}3+\mathrm{PC}4$$
(4)

Furthermore, we adjusted the p values with Bonferroni correction (implemented with the Statsmodels Python package [56] using the statsmodels.stats.multitest.multipletests() function, with the method variable being “bonferroni”). Similar to our GWAS, we used the same model (4) to investigate the differences for the suggestive genes in the remaining groups’ comparisons.

Validation in the Additional Lifelines Subset Cohort

We first sought to replicate our results in a second subset of 21,771 Lifelines participants for whom the same phenotype variables were available. This subset included individuals who were not genotyped with the Infinium Global Screening array but rather with the FinnGen ThermoFisher Axiom custom array (see URLs), which also includes the PLN:c.40_42delAGA variant. As the dataset had been recently genotyped, no imputed data was available. Neither of the two genetic variants showing suggestive association (SNP rs6768326 on chromosome 3 and SNP rs112525682 on chromosome 16) were directly genotyped with the Axiom array. For analyses, we classified subjects to be asymptomatic or symptomatic carriers or non-carriers following the same definitions we used for our discovery dataset and used the same models described above (models 1 and 3, with the exception of the PCs as they were unavailable in the second subset) to investigate the effect of quantitative phenotypes and genetic variants on asymptomatic carriers. To properly account for known relationships between the discovery Lifelines dataset and this second subset, we also jointly analyzed all Lifelines samples while correcting for the pedigree-derived kinship matrix and the subset in addition to other other covariants.

Validation in the ACM/PLN Patient Registry

We sought to further validate the HR and QRS phenotypes as potentially protective factors in 592 PLN:c.40_42delAGA carriers from the Netherlands Arrhythmogenic Cardiomyopathy (ACM) Registry [22], a national observational cohort study that includes patients with a definite ACM diagnosis and their at-risk relatives. The Registry is coordinated by the Netherlands Heart Institute (NHI, Utrecht, The Netherlands) and follows the Code of Conduct and the Use of Data in Health Research, and the national inclusion of patients is exempt from the Medical Research Involving Human Subjects Act (WMO) as per judgment of the Medical Ethics Committee (METC 18–126/C, Utrecht, The Netherlands). The ACM/PLN Registry is registered at the Netherlands Trial Registry, project 7097 (which can be consulted at the International Clinical Trial Registry Platform (ICTRP),  see URLs).

The registry collects participants’ medical history and (non-)invasive test information (e.g., electrocardiograms, Holter recordings, imaging and electrophysiological studies, pathology reports) at baseline and follow-up visits. No genome-wide array data was available in the registry. For analyses, we classified subjects as asymptomatic or symptomatic carriers or non-carriers according to the definition used in our discovery dataset and used the same models as described above (models 1 and 3) to investigate the effect of quantitative phenotypes on asymptomatic carriers. Out of the 947 individuals in the registry, we excluded individuals who could not be classified as “asymptomatic” or “symptomatic” because of missing values, leaving 592 individuals for further analysis. We further evaluated if the observed associations remained significant and showed consistent effect directions when defining the symptomatic and asymptomatic status using additional outcome parameters collected in the ACM/PLN registry [22], including (non)sustained VA, intracardiac defibrillator interventions, atrial arrhythmias, HF symptoms, hospitalizations and (cardiac) death (Supplementary Table 3B). Additionally, to eliminate possible false correlations introduced by age differences, we further stratified the participants into younger (≤ 40 years of age) and older (> 40 years) groups and repeated the regression analysis. For the analyses in the ACM/PLN registry, we did not remove individuals from the same family because the exact family relationships were not recorded.