Analyzing age-specific genetic effects on human extreme age survival in cohort-based longitudinal studies


The analysis of age-specific genetic effects on human survival at extreme ages faces two obstacles: a deceleration pattern in mortality that deviates from traditional survival models, and the sparseness of the available genetic data. As human late life is a distinct phase of life history, exploring the genetic effects on extreme age survival can be of special interest to evolutionary biology and health science. We introduce a non-parametric survival analysis approach that combines population survival information with individual genotype data in assessing the genetic effects in cohort-based longitudinal studies. Our approach is characterized by non-parametric analysis of late age survival, to capture the observed pattern of mortality deceleration, and by frailty modeling, to account for individual heterogeneity in unobserved frailty. The method is applied to ApoE genotype data in the Danish 1905 birth cohort to estimate the effect of the e4 allele. Our results revealed an age-specific relative risk of the allele that increases nonlinearly with age and non-proportional patterns in the hazard of death for carriers and non-carriers of the allele, suggesting that the e4 mutation preserves a deleterious effect that progressively affects human survival even at extreme ages.


The evolutionary theory of aging assumes that the effect of a gene could change over an individual’s life course, as genetic mutations functioning at late ages are subject to weaker selection than early-acting mutations.1 Age-specific genetic effects have been shown to affect fitness traits in animal models.2 In humans, age-specific effects of genetic variations have been reported to influence body mass index,3 blood pressure4, 5 and survival.6 In late life, the force of natural selection that operates during the reproductive period ceases. In terms of survival, mortality deviates significantly from the popular Gompertz model and is reliably characterized by a deceleration in age-specific mortality rates.7 The paradoxical ‘plateaued’ mortality pattern implies that late life is a distinct phase of life history8 for which exploring the genetic effects can be of special interest to evolutionary biology and health science.


The estimation of an age-dependent genetic effect on survival can often be confounded by differential life course exposure to environmental factors or by the birth cohort effect in age-structured populations.9 For that reason, a good choice is to conduct a follow-up or longitudinal study on a birth cohort, which has generally been feasible only in animal experiments. In human studies, however, longitudinal analysis of genetic association with longevity can be done with old-aged birth cohorts, for example, the Danish 1905 birth cohort,10 to look for genes that affect extreme age survival.9, 11, 12 Although of great interest, estimating genetic effects on late life survival is confronted with the distinct mortality pattern and the sparse genetic data available. In the literature, different theories or models have been proposed to explain the late life mortality pattern,8 among them the heterogeneity model,13 which assumes individual heterogeneity in unobserved frailty that follows a gamma distribution. Jacobsen et al11 applied a Cox regression model with gamma-distributed frailty to the Danish 1905 birth cohort data to estimate the age-dependent effect on extreme age survival for the ApoE gene, the only gene whose role in longevity has been consistently demonstrated.14 This paper introduces a demographic heterogeneity model that combines sparse individual genotype data with population survival information to measure the age-specific genetic effect on survival at advanced ages. The method is applied to ApoE genotype data from the Danish 1905 birth cohort10 to illustrate the patterns of the age-specific effect of the e4 allele on extreme age survival. Results with and without consideration of unobserved frailty are compared and genotype-specific mortality patterns are illustrated.


For a given genetic variation, for example, a SNP, individuals can be grouped according to their genotypes for a certain allele as non-carriers (0 alleles), heterozygous carriers (1 allele) and homozygous carriers (2 alleles), based on which the effect of the allele can be assumed to be additive, dominant or recessive. For simplicity, we divide individuals into carriers and non-carriers of the allele, which is equivalent to a dominant assumption. In terms of survival, the population survival rate in a birth cohort is the weighted mean of the survival rates for allele carriers (≥1 allele) and non-carriers (0 alleles),15

$$\bar{s}(x) = p\,s_1(x) + (1 - p)\,s_0(x) \qquad (1)$$


Here, s̄(x) is the mean survival rate in the birth cohort at age x, p is the frequency of carriers of the allele, and s1(x) and s0(x) are the survival rates for carriers and non-carriers of the allele. The relationship between s1(x) and s0(x) reflects the relative risk of the allele on survival. In a simple proportional hazards model, the hazards of death corresponding to s1(x) and s0(x) are related as μ1(x)=rμ0(x) such that

$$s_1(x) = s_0(x)^{r} \qquad (2)$$


The relationship above is based on the assumption that individuals are homogeneous except for their genotypes of the allele. In reality, however, individuals are heterogeneous in their unobserved factors or frailty, including genetic make-up, which serves as the basis for existing theories that explain mortality deceleration at advanced ages, among them the demographic heterogeneity theory by Vaupel et al.13 It follows that, when an individual’s unobserved frailty, designated as z, is gamma-distributed with mean 1 and variance σ², instead of (2), the relationship between s1(x) and s0(x) becomes

$$s_0(x) = \left[1 - \sigma^2 \ln s'(x)\right]^{-1/\sigma^2}, \qquad s_1(x) = \left[1 - r\,\sigma^2 \ln s'(x)\right]^{-1/\sigma^2} \qquad (3)$$

where s′(x) is a homogeneous baseline survival function. Note that the integration of (3) with (1) combines population survival with genotype frequency and relative risk parameters, which allows assessment of the genetic effect on survival.
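As a rough illustration of how (1) can be combined with (2) or (3), the sketch below recovers genotype-specific survival at a single age from the population survival, the carrier frequency and a trial relative risk. It assumes the standard gamma-frailty form, in which carriers and non-carriers share a homogeneous baseline cumulative hazard h = −ln s′(x) and the carrier hazard is multiplied by r. The function name `genotype_survival`, the bisection solver and the input values are illustrative choices, not taken from the paper.

```python
import math


def genotype_survival(s_bar, p, r, sigma2=0.0):
    """Return (s0, s1) at one age such that p*s1 + (1-p)*s0 equals s_bar.

    s_bar  : population survival at this age (0 < s_bar < 1)
    p      : carrier frequency in the population
    r      : relative risk for carriers (r > 0)
    sigma2 : variance of gamma frailty; 0 means no frailty (s1 = s0**r)
    """
    def surv(h, risk):
        # Group survival given baseline cumulative hazard h.
        if sigma2 == 0.0:
            return math.exp(-risk * h)
        return (1.0 + sigma2 * risk * h) ** (-1.0 / sigma2)

    def mix(h):
        # Weighted mean of carrier and non-carrier survival, as in (1).
        return p * surv(h, r) + (1.0 - p) * surv(h, 1.0)

    # mix(h) decreases from 1 towards 0, so bracket the root and bisect.
    lo, hi = 0.0, 1.0
    while mix(hi) > s_bar:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mix(mid) > s_bar:
            lo = mid
        else:
            hi = mid
    h = 0.5 * (lo + hi)
    return surv(h, 1.0), surv(h, r)


# Hypothetical inputs: 5% cohort survival, carrier frequency 0.318, r = 1.2.
print(genotype_survival(0.05, 0.318, 1.2))              # no frailty
print(genotype_survival(0.05, 0.318, 1.2, sigma2=0.1))  # gamma frailty
```

Setting `sigma2=0` reproduces the proportional hazards case, so the same routine can serve both models.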


Based on (1), the proportions of carriers and non-carriers of the allele at any age x can be estimated as p·s1(x)/s̄(x) and (1−p)·s0(x)/s̄(x), respectively. When genotype data are available for a random sample from the cohort, a likelihood function based on the binomial distribution can be constructed at each age x as

$$L(x) \propto \left[\frac{p\,s_1(x)}{\bar{s}(x)}\right]^{n_1(x)} \left[\frac{(1-p)\,s_0(x)}{\bar{s}(x)}\right]^{n_0(x)} \qquad (4)$$

In (4), n1(x) and n0(x) are the numbers of carriers and non-carriers of the allele at age x, p is the proportion of carriers in the population, which can be available for specific populations, and s̄(x) is the population survival rate at age x obtainable from population statistics. With known s̄(x) and p, (4) can be maximized to estimate the relative risk on survival for carrying the allele. In a longitudinal study on a birth cohort, the maximization of (4) can be done for each age or year of follow-up so that age-specific effects can be estimated. The maximum likelihood estimate (MLE) is obtained by introducing the constraint specified in (1) and optimizing (4) with numerical gradient and Hessian. Note that our MLE is free from specification of any parametric form for the survival function and is thus a non-parametric approach. In addition, it makes use of population data in the analysis of the genetic effect. Moreover, genotype-specific survival or mortality rates can be calculated at each age to further illustrate the genetic influence on mortality at advanced ages.
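A minimal sketch of the age-by-age estimation follows. It assumes the binomial likelihood is written in terms of the expected carrier share p·s1(x)/s̄(x) among survivors, with survival constrained by (1). The text describes optimization with a numerical gradient and Hessian; for brevity this sketch swaps in a golden-section search over log r, which suffices because only one parameter is estimated at each age. All function names, the search bounds and the counts in the usage lines are hypothetical.

```python
import math


def survivor_shares(r, s_bar, p, sigma2=0.0):
    """Expected shares (carriers, non-carriers) among survivors at one age."""
    def surv(h, risk):
        if sigma2 == 0.0:                       # no frailty
            return math.exp(-risk * h)
        return (1.0 + sigma2 * risk * h) ** (-1.0 / sigma2)  # gamma frailty

    def mix(h):
        return p * surv(h, r) + (1.0 - p) * surv(h, 1.0)

    # Solve constraint (1) for the shared baseline cumulative hazard h.
    lo, hi = 0.0, 1.0
    while mix(hi) > s_bar:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mix(mid) > s_bar:
            lo = mid
        else:
            hi = mid
    h = 0.5 * (lo + hi)
    return p * surv(h, r) / s_bar, (1.0 - p) * surv(h, 1.0) / s_bar


def log_likelihood(r, n1, n0, s_bar, p, sigma2=0.0):
    """Binomial log-likelihood of the carrier counts at one age (up to a constant)."""
    share1, share0 = survivor_shares(r, s_bar, p, sigma2)
    return n1 * math.log(share1) + n0 * math.log(share0)


def fit_relative_risk(n1, n0, s_bar, p, sigma2=0.0, r_min=0.2, r_max=5.0):
    """Maximize the log-likelihood over log(r) within [r_min, r_max]."""
    golden = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(r_min), math.log(r_max)

    def f(t):
        return log_likelihood(math.exp(t), n1, n0, s_bar, p, sigma2)

    for _ in range(80):
        c = b - golden * (b - a)
        d = a + golden * (b - a)
        if f(c) > f(d):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
    return math.exp(0.5 * (a + b))


# Hypothetical counts at one age: 209 carriers and 791 non-carriers.
r_plain = fit_relative_risk(209, 791, s_bar=0.05, p=0.318)
r_frail = fit_relative_risk(209, 791, s_bar=0.05, p=0.318, sigma2=0.1)
print(round(r_plain, 3), round(r_frail, 3))
```

Repeating the fit with the counts and s̄(x) for each year of follow-up gives one estimate per age. If the maximum lies outside the search bounds, the routine returns a value at the boundary, so the bounds should be widened in that case.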


Finally, our model allows analysis of sex-specific effects16 by simply replacing the mean cohort survival in (1) with the survival rate for males or females and performing the analysis for each sex separately. However, because most of the survivors at extreme ages are females, non-significant results in males may reflect the small sample sizes available rather than a sex-specific effect. In this case, a combined analysis should be preferred.


We applied our method to the ApoE genotype data on 2662 individuals (584 males and 2078 females) from the Danish 1905 birth cohort,10 collected in a longitudinal survey initiated in 1998. All participants were genotyped at age 92–93 years. Individual survival information has been collected with the latest update at the end of 2010, when 10 subjects were still alive at ages over 104 or 105. For the entire 1905 birth cohort, cohort-specific survival information is available from the Human Mortality Database at http://www.mortality.org/, jointly hosted by the University of California, Berkeley, and the Max Planck Institute for Demographic Research, Rostock, Germany. For the ApoE gene, the frequency of the allele of interest, that is, the e4 allele, was estimated to be 0.174 in the Danish population,17 which corresponds to a carrier frequency of 0.318. As a ‘thrifty’ allele,18 e4 has been shown to confer a higher susceptibility to cardiovascular and Alzheimer’s diseases, and its carriers have higher mortality than non-carriers under contemporary environmental conditions.17, 19 As such, the frequency of the allele is expected to decrease with increasing age in a birth cohort. In our genotype data for the 1905 birth cohort, it is interesting to see that the decreasing pattern continues even at extreme ages, starting at 21.7% at age 93 and reaching 7.8% at age 104, a rapid decrease of about 14 percentage points in 11 years (Figure 1). The nonlinear decline in e4 allele frequency, which accelerates with age, gives a clear indication of a deleterious effect of the allele on human extreme age survival, which needs to be characterized or measured by proper statistical models.
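The carrier frequency quoted above is consistent with Hardy-Weinberg proportions for an allele frequency q = 0.174. That equilibrium assumption is made here for illustration and is not stated in the text. The same kind of arithmetic check applies to the size of the decline between ages 93 and 104:

```latex
\begin{align*}
P(\text{carrier}) &= 1 - (1 - q)^2 = 1 - 0.826^2 = 1 - 0.682276 \approx 0.318,\\
21.7\% - 7.8\% &= 13.9 \text{ percentage points over } 104 - 93 = 11 \text{ years}.
\end{align*}
```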


Figure 1: Frequency of e4 allele carriers in the 2662 subjects from the 1905 birth cohort from age 93 to age 104. There is a clear pattern of rapid decline as age increases, suggesting an increasing risk of the allele on survival over advanced ages.


With known population survival for the entire 1905 birth cohort and the frequency of the e4 allele in the Danish population, we first fitted the likelihood function in (4) without frailty using genotype-specific survival as defined in (2). For each age x, our procedure estimated an age-specific relative risk on surviving from age x to x+1 (Table 1). The estimated risks were all significantly different from one at all ages, with a slight trend of increase at later ages. Figure 2a plots the estimated age-specific relative risks together with their 95% confidence intervals. The figure clearly displays the increasing risk for the e4 allele in the oldest survivors. The highest risk of 1.23 (P=0.026) was obtained at the highest age of 104. We continued our analysis with frailty modeling by introducing gamma-distributed frailty with a mean of 1 and a variance of 0.1 (a value chosen according to our experience in fitting frailty models to oldest-old mortality). From the estimated relative risks (Table 1), one can see that the frailty model gives higher risk estimates than the no-frailty model. In addition to the higher risk estimates, the age-dependent increase in risk is more clearly seen with frailty modeling, although the overall pattern of increase remained the same (Figure 2b).


Figure 2: Estimated age-specific relative risks for carrying the e4 allele over extreme age survival with 95% confidence intervals, which deviate from constant and increase slightly with age. Risk estimates without (a) and with (b) consideration of unobserved heterogeneity show obvious underestimation by the former, suggesting the necessity of frailty modeling.


Using the relationships in (2) and (3), age-specific survival for carriers and non-carriers of the e4 allele can be calculated from the estimated relative risk and baseline survival rate. This allows calculation of the age-specific hazard rate μ(x) because μ(x)=−d(ln s(x))/dx. In Figure 3, we show the non-parametric age-specific hazard functions for the total population starting from age 80 (solid line) and for the e4 allele carriers (dashed line) and non-carriers (dash-dotted line) starting from age 93. Although the mortality patterns for carriers and non-carriers followed the main pattern of the whole cohort, carriers had a higher and non-carriers a lower instantaneous risk of death than the population mean, and overall this deviation grew larger at later ages. Moreover, the population mortality pattern in Figure 3 also exhibits the mortality leveling-off at high ages, suggesting the necessity of frailty modeling.
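With survival available only at yearly ages, the derivative in μ(x)=−d(ln s(x))/dx has to be approximated. One simple option, shown here as a sketch rather than the procedure actually used for Figure 3, is the average hazard over each age interval obtained from differences of log survival. The function name and the survival values are hypothetical.

```python
import math


def interval_hazards(ages, survival):
    """Average hazard on each interval [x_i, x_{i+1}].

    Uses -(ln s(x_{i+1}) - ln s(x_i)) / (x_{i+1} - x_i), the discrete
    counterpart of mu(x) = -d ln s(x) / dx.
    """
    hazards = []
    for i in range(len(ages) - 1):
        width = ages[i + 1] - ages[i]
        drop = math.log(survival[i + 1]) - math.log(survival[i])
        hazards.append(-drop / width)
    return hazards


# Hypothetical survival values for one genotype group, for illustration only.
ages = [93, 94, 95, 96]
survival = [0.050, 0.035, 0.024, 0.016]
for age, hazard in zip(ages, interval_hazards(ages, survival)):
    print(age, round(hazard, 3))
```

Applying the same function to s̄(x), s1(x) and s0(x) gives hazard curves for the whole cohort, carriers and non-carriers that can be compared directly.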


Figure 3: Age-specific hazards of death for the whole 1905 birth cohort starting from age 80 (solid line) and for e4 allele carriers (dashed line) and non-carriers (dash-dotted line) starting from age 93. The genotype-specific mortality deviates remarkably from proportionality.


Note that the calculated patterns of genotype-specific hazards were the same for both the frailty and no-frailty models because optimization of (4) was done for each age; however, the genetic risk was underestimated when unobserved heterogeneity in frailty was ignored.
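The remark above can be illustrated with a small sketch. It assumes that, at a single age, the binomial likelihood is maximized when the fitted carrier share equals the observed share n1/(n1+n0), so that the genotype-specific survivals follow from the weighted-mean constraint (1) alone and only their conversion into a relative risk differs between the two models. The gamma-frailty conversion assumes the standard form in which s^(−σ²) − 1 is proportional to the relative risk. Function names, counts and survival values are hypothetical; σ² = 0.1 is the variance used in the text.

```python
import math


def genotype_survivals(n1, n0, s_bar, p):
    """Survival of non-carriers and carriers implied by the observed carrier share."""
    share1 = n1 / (n1 + n0)
    s1 = share1 * s_bar / p
    s0 = (1.0 - share1) * s_bar / (1.0 - p)
    return s0, s1


def risk_no_frailty(s0, s1):
    # From s1 = s0**r.
    return math.log(s1) / math.log(s0)


def risk_gamma_frailty(s0, s1, sigma2):
    # From s = (1 + sigma2 * risk * h) ** (-1 / sigma2) with a shared h.
    return (s1 ** (-sigma2) - 1.0) / (s0 ** (-sigma2) - 1.0)


# Hypothetical data at one age.
s0, s1 = genotype_survivals(n1=209, n0=791, s_bar=0.05, p=0.318)
r_nf = risk_no_frailty(s0, s1)
r_f = risk_gamma_frailty(s0, s1, sigma2=0.1)
print(round(r_nf, 3), round(r_f, 3))
```

Both models share the same `s0` and `s1`, hence the same genotype-specific hazards, while the frailty-based relative risk is the larger of the two whenever carriers survive worse than non-carriers.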


Finally, we applied the frailty model to another example, SNP rs2764264 in the FOXO3A gene. The SNP was first reported to show association with human longevity in a case–control study conducted in the Italian population.20 Recently, the SNP was tested in both case–control samples and the Danish 1905 cohort, with the significant association replicated only in the case–control samples.21 In Figure 4, we show age-specific risks estimated from our frailty modeling (frequency of carriers of the minor allele set to 0.495 according to Soerensen et al21). Unlike for the e4 allele of the ApoE gene, no risk estimate in Figure 4 reached statistical significance, although there is a slight trend toward a protective effect similar to that reported in case–control studies in the literature.20, 21


Figure 4: Age-specific relative risks for carrying the minor allele of SNP rs2764264 in the FOXO3A gene estimated with consideration of unobserved heterogeneity. No risk estimate reached statistical significance, although there is a slight trend toward a protective effect.


The cohort study is deemed the most ideal design for assessing risk factors that affect human longevity9 and for characterizing their age-specific effects. In humans, longitudinal follow-up for survival analysis is only feasible in very old cohorts, such as the Danish 1905 birth cohort. However, at advanced ages, human survival is characterized by mortality deceleration, which challenges conventional survival models.8 We introduced a non-parametric survival analysis that combines population survival information with individual genotype data in estimating the genetic effects on human longevity. Our method conducts frailty modeling by introducing the simple gamma frailty model. Our comparison with a model that ignores unobserved heterogeneity showed that the latter underestimates the genetic effect, which emphasizes the importance of frailty modeling in genetic risk assessment at advanced ages. The constrained likelihood for parameter estimation integrates population data with individual genotype data and allows non-parametric estimation of genetic risk parameters and the baseline survival function, avoiding the specification of parametric survival models that deviate from the observed mortality pattern. In addition to parameter estimation, our procedure also calculates the non-parametric genotype-specific hazard of death over the observed ages to allow comparison with the population mean death rate (Figure 3).


Our likelihood-based procedure is made possible by restricting estimation to each age separately. As an advantage, this allows measurement of the age-specific genetic effect. As shown in Figure 2, the age-specific pattern of the estimated genetic risk deviates clearly from being constant or linear, which contradicts the proportional hazards assumption. From the hazard functions for carriers and non-carriers of the e4 allele, one can easily see that they are far from proportional. Such a pattern would be missed by traditional survival analysis, such as Cox’s proportional hazards model. In Table 2, we compare the different analyses that have been applied to the ApoE genotype data in the 1905 cohort. The early analysis (with a high censoring rate of 17%) by Bathum et al22 (Table 2) obtained an overall risk for e4 carriers that was only borderline significant. Jacobsen et al11 introduced Aalen’s additive hazards model,23 an extended Cox model, to estimate age-dependent risk assuming additive risks over age intervals. It is interesting that, when applied to the same updated data set (censoring rate 4%), their analysis also reported an increased effect of the e4 allele on longevity, although it was limited to three age intervals. In comparison, our combined analysis of population and individual data enabled estimation for each age up to 104 years, such that patterns of the mean genetic effects and genotype-specific mortality at extreme ages can be examined (Table 2). It can be expected that, with the rapid development of SNP genotyping and genome sequencing, more genetic data will become available for association analysis of human extreme age survival, to which proper statistical models can contribute.


This work was partially supported by the EU Seventh Framework Programme (FP7/2007–2011) under grant agreement no 259679 and NIH/NIA grant P01 AG08761.


Epidemiology, Institute of Public Health, University of Southern Denmark, Odense C, Denmark


Qihua Tan, Rune Jacobsen, Mette Sørensen, Lene Christiansen & Kaare Christensen


Department of Clinical Genetics, Odense University Hospital, Odense C, Denmark


Qihua Tan, Lene Christiansen, Torben A Kruse & Kaare Christensen


Department of Biochemistry and Pharmacology, Odense University Hospital, Odense C, Denmark

