 Research article
 Open Access
 Published:
Joint modelling of longitudinal lipids and time to coronary heart disease in the Jackson Heart Study
BMC Medical Research Methodology volume 20, Article number: 294 (2020)
Abstract
Background
Multiple longitudinal responses together with timetoevent outcome are common in biomedical studies. There are several instances where the longitudinal responses are correlated with each other and at the same time each longitudinal response is associated with the survival outcome. The main purpose of this study is to present and explore a joint modeling approach for multiple correlated longitudinal responses and a survival outcome. The method will be illustrated using the Jackson Heart Study (JHS), which is one of the largest cardiovascular studies among African Americans.
Methods
Four longitudinal responses, i.e., total cholesterol (TC), high density lipoprotein (HDL) cholesterol, triglyceride (TG) and inflammation measured by highsensitivity Creactive protein (hsCRP); and timetocoronary heart disease (CHD) were considered from the JHS. The repeated lipid and hsCRP measurements from a given subject overtime are likely correlated with each other and could influence the subject’s risk for CHD. A joint modeling framework is considered. To deal with the high dimensionality due to the multiple longitudinal profiles, we use a pairwise bivariate model fitting approach that was developed in the context of multivariate Gaussian random effects models. The method is further explored through simulations.
Results
The proposed model performed well in terms of bias and relative efficiency. The JHS data analysis showed that lipid and hsCRP trajectories could exhibit interdependence in their joint evolution and have impact on CHD risk.
Conclusions
We applied a unified and flexible joint modeling approach to analyze multiple correlated longitudinal responses and survival outcome. The method accounts for the correlation among the longitudinal responses as well as the association between each longitudinal response and the survival outcome at once. This helps to explore how the combination of multiple longitudinal trajectories could be related to the survival process.
Background
Longitudinal data together with timetoevent measurements are common in biomedical studies. In cardiovascular researches, for example, as we focus in this paper, lipid levels, i.e., total cholesterol (TC), low density lipoprotein (LDL) cholesterol, high density lipoprotein (HDL) cholesterol and triglyceride (TG) could be measured repeatedly over time along with timetoevent data such as time to coronary heart disease (CHD). Abnormalities in LDL cholesterol, HDL cholesterol, TC and TG levels have all been linked to increased risk for CHD. Furthermore, inflammation measured by highsensitivity Creactive protein (hsCRP) correlates with CHD status. As people age, body mass and composition as well as physical activity levels and diet tend to change which are often associated with changes in lipid levels.
There is an extensive literature available on separate analyses of longitudinal measurement data and timetoevent data including the popular linear mixed effects model for the former [1, 2], and parametric (Weibull model or exponential model) or semiparametric (Cox) proportional hazards model for the latter [3]. However, separate analyses could be inappropriate in some applications as health status (for example, risk to CHD) is likely correlated with longitudinal lipid level trajectory. Furthermore, separate analysis may not adequately address the underlying research questions. Wang and Taylor [4] include the longitudinal marker as a timedependent covariate in the (proportional hazards) survival model. It has been shown, however, that including longitudinal measurements as timevarying covariate may be inappropriate as the measurements could be subject to measurement error [5]. Several approaches of joint modeling of a longitudinal continuous response and a timetoevent outcome have been available. Tsiatis and Davidian [6] proposed a twostage joint model in application to longitudinal CD4 count data as surrogate marker to HIV/AIDS survival whereby a repeated measures random components model is considered in the first stage for the longitudinal CD4 count data while a Cox model is employed in the second stage for estimating the parameters. Henderson et al. [7] proposed a flexible joint modelling framework such that the longitudinal and survival processes are connected via a latent bivariate Gaussian process. A fully Bayesian version of this approach, implemented via Markov chain Monte Carlo (MCMC) methods is presented in Guo and Carlin [8]. Ibrahim et al. [9] applied such joint modeling and explored why it is needed in cancer research. A joint model for longitudinal and survival data with nonparametirc multiplicative random effects approach has been studied by Ding and Wang [10].
The majority of the joint models in the literature focus on a single longitudinal response and a survival outcome. Little attention has been given to modeling multiple longitudinal responses and a survival process simultaneously. Flexible semiparametric multivariate joint model that relate multiple longitudinal outcomes to a timetoevent have been proposed [11, 12] with emphasis on the association between the longitudinal data and survival outcome. Such multivariate joint analysis is also needed when one aims to explore not only the association between an individual longitudinal outcome and a survival process but also the correlation between the multiple longitudinal responses. In the Jackson Heart Study (JHS), described in “Methods” section, longitudinal HDL, TC, TG and hsCRP measurements taken from each subject repeatedly over time are subject to measurement error, potentially correlated with each other, and at the same time each of these lipid and hsCRP profiles is likely associated with the subject’s risk for CHD. It is then useful to assess these features by studying their association and simultaneous covariate effects, which can be addressed in joint modelling context. However, joint modeling of such multivariate longitudinal responses and a survival outcome poses computational challenges due to the high dimensionality in the random effects model.
This paper aims to present a joint modeling approach for multiple longitudinal responses and a survival data. To deal with the high dimensionality due to multiple longitudinal components, a pairwise model fitting approach that was developed in the context of multivariate Gaussian random effects is adopted [13].
The rest of the paper is organized as follows. In “Study data” section, a motivating dataset is described with analysis reported in “Results” section. “Review of basic models” section focuses on brief review of models for separate and joint analysis of a continuous longitudinal response and a survival outcome. “Multivariate joint model” section focuses on a joint model for multivariate continuous longitudinal responses and a timetoevent outcome, followed by a likelihood based estimation approach along the lines of Fieuws and Verbeke [13] in “Estimation” section. “Simulation” section presents results of a simulation study. Discussion and concluding statements are given in “Discussion” and “Conclusion” sections, respectively.
Methods
Study data
The Jackson Heart Study (JHS) is a longitudinal communitybased study that investigates the causes of cardiovascular disease among African Americans. JHS participants were recruited from urban and rural areas from the three counties that make up the Jackson metropolitan area (Hinds, Madison, and Rankin). The 5,306 enrolled participants received three clinical examinations (Exam 1, 200004; Exam 2, 200508; Exam 3, 200913) that generated a plethora of longitudinal data including, but not limited to, cardiovascular disease risk factors, socioeconomic information and biochemical analytes. In addition to the clinic exams, JHS has kept track of the occurrence of events of interest (i.e. coronary heart disease, abbreviated as CHD) through adjudicated events. Details on the design methods and data collection of the JHS can be obtained from Taylor et al. [14] and Carpenter et al. [15].
This paper uses data from two sources of JHS: (1) the longitudinal clinical examinations where sex, diabetes status, hypertension status, statin use, body mass index, inflammation, and lipid levels are extracted from; and (2) the annual followup surveys where Coronary Heart Disease (CHD) adjudicated events through 2016 are obtained. We allow for right censoring, i.e., those who experienced CHD since the baseline year (2000) were considered as incident cases and those who have died, withdrew from the study or have not seen CHD until 2016 are treated as censored. The time variable (event or censoring) begins from the baseline year and we haven’t assumed truncation as present. These variables are summarized in Table 1 by CHD status across the three measurement occasions where count and percentages are used for categorical variables and means and SDs for continuous variables. There are some notable highlights. Of the 4232 participants who have lipids and hsCRP measurements available at baseline, 236 (5.6%) experienced CHD event with a median survival time of 13.8 years. The mean age at baseline for nonCHD was 53.2 ±12 years while 62.0 ±10 years for those who had CHD. The proportion of diabetes as well as hypertension is higher in the CHD incident cases as compared to those without CHD across all visits. The mean HDL seems to increase over time, though the magnitude of mean level in nonCHD participants appears slightly higher when compared across the respective visits. The mean TG, however, tends to decline in both groups and nonCHD participants appear to have lower values. Incident cases are more likely to be men. Individual profiles plots of TG, hsCRP, HDL and TC for a random sample of one hundred participants are shown in Fig. 1. Clearly, there is evidence of betweensubjects variability both at baseline and over time. On the other hand, Fig. 2 shows the average profiles plots of TG, hsCRP, HDL and TC for the full cohort. The average TG trajectory tends to decline while hsCRP and HDL seem to suggest an increasing average trend. TC appears to be more or less stable. A formal statistical test is needed to see the significance of these comparisons which will be dealt with later in “Results” section.
The Jackson Heart Study provides a unique opportunity to study the longitudinal trajectory of lipids and hsCRP and incident CHD. Therefore, we will assess the association between the longitudinal lipids and hsCRP with incident CHD as well as the correlation between the longitudinal lipids and hsCRP measurements in African American participants.
Review of basic models
In “Linear mixed model for longitudinal data”, we will briefly review the linear mixed model for continuous longitudinal data, followed by a basic parametric model for survival data in “Model for timetoevent data” sections. “Joint model” section focuses on a joint model for a longitudinal response and timetoevent outcome.
Linear mixed model for longitudinal data
Let \(\phantom {\dot {i}\!}y_{i1}, y_{i2}, \ldots y_{i n_{i}}\) be longitudinal measurements from the i^{th} subject at times \(\phantom {\dot {i}\!}t_{i1}, t_{i2}, \ldots t_{i n_{i}}\). The model can be written as
where x_{1i}(t_{ij}) are regressors corresponding to unknown regression coefficients β,b_{1i} are random effects with design matrix z_{1}(t_{ij}), and ε_{i} are measurement errors. The subjectspecific random effects b_{1i} and measurement error ε_{i} are independent and assumed to follow normal distribution, i.e., b_{1i}∼N(0,D), and ε_{i}∼N(0,Σ_{i}). Details on linear mixed models can be found in Laird and Ware [1].
Model for timetoevent data
Suppose s_{i} denotes the survival time for subject \(i,~i=1, 2, \dots, n.\) The event of interest may not be observed for some of the subjects until the end of the followup period and hence their event times are right censored. Let c_{i} be the censoring time for subject i. For an indicator function δ_{i}=I(s_{i}≤c_{i}), it follows that, δ_{i}=0 if the survival time for subject i is right censored and δ_{i}=1 otherwise. For subject i the observed timetoevent data are (t_{i},δ_{i}) where t_{i}=min(s_{i},c_{i}),i=1,2…,n.
Weibull and exponential models are commonly used parameteric models in survival data analysis. In a Weibull model the density for the survival time for the i^{th} subject is given by
α>0,λ_{i} can be linked with covariates as λ_{i}=x_{2i}ξ, where x_{2i} is the vector of covariates corresponding to the i^{th} observation and ξ is a vector of regression coefficients. The Weibull model in (2) reduces to the exponential model when α=1.
Joint model
The longitudinal response in “Linear mixed model for longitudinal data” and survival process in “Model for timetoevent data” sections can be correlated with each other. One possible route to study this association is through joint modeling, where the linear mixed model in (1) is linked with the survival model in (2) through shared random effects.
Assuming a linear mixed effects model for the longitudinal part and a Weibull model for the survival part, the joint distribution of the longitudinal response and the timetoevent outcome conditional on the random effects is:
where b_{2i}=θb_{1i},b_{1i} and b_{2i} are latent zeromean bivariate Gaussian processes corresponding to the longitudinal measurements and events, θ is vector of association parameters, and the rest of the quantities are as defined in “Linear mixed model for longitudinal data” and “Model for timetoevent data” sections. For random intercepts b_{0i} and random slopes b_{1i}, for example, one possible model could be
where θ_{1} and θ_{2} measure the association between the two submodels induced by the random intercepts and random slopes, respectively. Other ways of parametrization for the association of the two submodels can also be used as well [7]. Furthermore, b_{1i}=(b_{0i},b_{1i}) is assumed to follow bivariate normal distribution, possibly correlated, with mean 0 and variancecovariance matrix D given by
where ρ is the correlation between the random intercepts and random slopes.
Multivariate joint model
The joint mode in “Joint model” section can be extended to a multivariate setting where k longitudinal continuous responses can be simultaneously modeled with timetoevent data. A multivariate joint model formulation of the k longitudinal responses and the timetoevent outcome follows from combining each single joint model of the longitudinal continuous response and the timetoevent outcome via random effects that are assumed to follow multivariate normal distribution. The general expression for the resulting multivariate joint model can be given by
where f(y_{ℓ},t) is the single joint model corresponding to the ℓ^{th} continuous response and the timetoevent data as described in “Joint model” section and ϕ is a kdimensional Gaussian density for the random effects (b_{1},…,b_{k}), such that, for subject i, each b_{ℓi} is shared between the linear mixed submodel corresponding to the ℓ^{th} longitudinal response and the timetoevent submodel through b_{1i} and b_{2i}, respectively, as shown in “Joint model” section. The likelihood contribution of subject i is l_{i}(Y_{1i},…,Y_{ki},t_{i}Λ), where Λ is the vector of fixed effect and covariance parameters in the joint model. The full multivariate joint model in (4) is complex due to the high dimensionality in the random effects and inference for Λ is not feasible in the standard statistical softwares. An estimation approach to overcome this computational challenge and reduce the dimensionality along the lines of Fieuws and Verbeke [13] will be the topic of “Estimation” section.
Estimation
We employ the pairwise bivariate fitting to reduce the high dimensionality of (4) and estimate the model parameters in Λ [13]. By pairwise fitting we mean that bivariate joint models of all possible pairwise combinations of the longitudinal continuous responses and a timetoevent data are fitted and duplicate parameter estimates are averaged across all pairs. Consider k randomintercept models that are specific to k single joint models where each joint model represents a longitudinal continuous responses and timetoevent outcome as described in “Joint model” section. The possible number of pairs out of the k continuous responses is \(P=\frac {k(k1)}{2}\). For subject i, consider a vector of the r^{th} and s^{th} bivariate longitudinal responses and the timetoevent outcome (Y_{ri}^{′},t_{i}) and (Y_{si}^{′},t_{i}), respectively, where \(\boldsymbol {Y_{{ri}}}=\left (y_{ri1}, \ldots, y_{ri n_{{ri}}}\right)\) and \(\boldsymbol {Y_{{si}}}=\left (y_{si1}, \ldots, y_{si n_{{si}}}\right), r=1, \ldots, k1, s=r+1,\ldots, k.\) We assume that (Y_{ri},t_{i}) and (Y_{si},t_{i}) are conditionally independent given b_{i}=(b_{ri},b_{si})^{′}, where b_{i}∼MVN(0,D) is vector of random effects corresponding to the r^{th} and s^{th} bivariate joint models with mean 0 and covariance matrix D.
The likelihood to be maximized corresponding to the r^{th} and s^{th} bivariate joint model of two longitudinal continuous responses the timetoevent data takes the form
where f and ϕ are as defined in (4) and N is the total number of subjects. Fitting these bivariate models is computationally feasible as there are only two correlated random effects. The loglikelihood across all possible pairs can be given as \({\sum \nolimits }_{p=1}^{P} l\left (\boldsymbol {Y_{p}}  \boldsymbol {\Lambda }^{\prime }_{p}\right)\), where p=1,…,P, and \(\boldsymbol {\Lambda }^{\prime }_{p}\) is the vector of parameters corresponding to the p^{th} bivariate joint model. All pairspecific parameter vectors \(\boldsymbol {\Lambda }^{\prime }_{p}\) can be combined and the so resulting vector of parameters is denoted by Λ^{′}. Assuming that \(\widehat {\boldsymbol {\Lambda }^{\prime }_{p}}\) maximizes \(l\left (\boldsymbol {Y_{p}}  \boldsymbol {\Lambda }^{\prime }_{p}\right), \widehat {\boldsymbol {\Lambda }^{\prime }}\) would maximize l(Λ^{′}). An estimate for each parameter in Λ of the full multivariate joint model defined in “Multivariate joint model” section is obtained by averaging all pairspecific estimates in \(\widehat {\boldsymbol {\Lambda }^{\prime }}\). The resulting averages are asymptotically normal. However, the standard errors can’t be directly computed by simple averaging of their corresponding pairspecific estimates. Details on the averaging of pair specific estimates and expression of the asymptotic normality of Λ^{′} follows in a similar way as the multivariate linear mixed model and multivariate semicontinuous data and can be obtained in Fieuws and Verbeke [13]. This approach can be well generalized to random intercepts and random slopes model of course with additional computational complexity. Each bivariate pair consists of a longitudinal response and a timetoevent outcome as shown in (5) does not have a closed form solution and is fitted numerically with the SAS procedure NLMIXED using adaptive Gaussian quadrature based on quasiNewton optimization technique. Other flexible nonlinear optimizers can be employed as well. A SAS macro has been used to fit the P bivariate models and combine the results.
Results
We analyze the JHS data introduced in “Study data” section. A multivariate joint model of “Multivariate joint model” section will be fitted for the longitudinal measurements of high density lipoprotein (HDL) cholesterol, triglyceride (TG), total cholesterol (TC), highsensitivity Creactive protein (hsCRP), and time to coronary heart disease (CHD) to capture the association between each of these longitudinal response and the timetoCHD as well as the correlation between HDL, TG, TC and hsCRP. Logtransformation was applied to the four longitudinal responses based on earlier exploratory analysis. Let y_{ℓij},ℓ=1,…,4 be the ℓ^{th} logtransformed continuous response for subject i at time j and we model the mean μ_{ℓij} as
where A_{i} baseline age for subject i, M_{i} a gender indicator for subject i coded as (1: male; 0: female), D_{ij} is a diabetic status indicator for subject i at time j coded as (1: diabetic; 0: nondiabetic), H_{ij} is a hypertension status indicator for subject i at time j coded as (1: hypertensive; 0: nonhypertensive); T_{ij} is the j measurement time for subject i in years from baseline; BMI_{ij} is body mass index for subject i at time j, ST_{ij} is a statin medication indictor for subject i at time j coded as (1: medication; 0: no medication), SM_{i} is current smoking indicator for subject i at baseline coded as (1: smoker; 0: nonsmoker) and b_{ℓi} are subject specific random intercepts for the ℓ^{th} continuous response.
Turning to the timetoCHD, we consider an exponential submodel in each joint model with the ℓ^{th} continuous response, and further the hazard at time t is modeled as,
where θ_{ℓ} captures the association between the linear mixed submodel and the exponential submodel induced by the random effects. The correlation between any two pairs of longitudinal responses is captured by the correlation of the respective random intercepts.
Six pairs of bivariate models were fitted for the four responses using the pairwise fitting approach outlined in “Estimation” section. Results are shown in Table 2. Clearly, the estimated association parameter of TG and CHD is significant, suggesting evidence of strong negative association between the two submodels (\(\hat {\theta }=0.4473, p=0.009\)), and this implies that high level of TG is associated with poor survival or shorter timetoCHD. The same is true for the association between hsCRP and CHD (\(\hat {\theta }=0.2504, p=0.005\)). On the contrary, the estimate corresponding to HDL (\(\hat {\theta }=1.0472, p=0.002\)) is positive and significant suggesting high HDL level is associated with better survival. On the other hand, the association parameter estimate corresponding to TC provides no evidence of association between total cholestol and CHD (p=0.151). Consider testing hypotheses about the four association parameters jointly. One can employ a multivariate Wald test for the hypothesis H_{0}=L(θ)=0, where L is a 4 ×4 identity matrix corresponding to \(\widehat {\theta }_{\ell }, \ell =1, \ldots, 4\). Thus, \(W^{2}=\left (\mathbf {L} \boldsymbol {\widehat {\theta }}\right)^{\prime } {\left (\mathbf {L}\text {COV}(\widehat {\boldsymbol {\theta }}) \mathbf {L}^{\prime }\right)}^{1}\left (\mathbf {L}\widehat {\boldsymbol {\theta }}\right) \sim {{\chi }^{2}}_{(4)}\), where \(\text {COV}\left (\widehat {\boldsymbol {\theta }}\right)\) is the 4 ×4 covariance matrix of \(\widehat {\boldsymbol {\theta }}\) obtained from fitting the multivariate joint model. This test suggests joint significance of the association between the four longitudinal responses and CHD (χ^{2}=18.72,p=0.0009). Another important output of this analysis is estimated correlation of the random intercepts. In this regard, we observe evidence of negative correlation between TG and HDL (−0.5026) as well as HDL and hsCRP (−0.1112), while positive correlation between TG and TC (0.4390), TG and hsCRP (0.1478), and HDL and TC (0.2235). However, there is no evidence of significant correlation between TC and hsCRP. The estimated standard deviation of the random effects are significant across all the responses, implying the presence of considerable betweensubjects variability that needs to be appropriately accounted for. All the estimates corresponding to time in the linear mixed submodel \(\left (\hat {\beta _{5}}\right)\) are significant, i.e., negative time effect for TG, but positive effect for HDL, TC and hsCRP, and this is inline with what has been observed from the average profiles plot in Fig. 2. Those who are in statin medication have lower TG, TC and hsCRP, while no signifcant assocation was observed between HDL and statin use. Smokers tend to have elevated mean TG and hsCRP as compared to nonsmokers. Those who are diabetic have higher TG and hsCRP but lower HDL and TC. The estimates of covariate effects in the survival submodel are quite similar except some slight variation depending on which of the longitudinal responses of TG, HDL, TC or hsCRP was modeled in the linear mixed submodel, and this is expected as these lipid and hsCRP measures have different effects on CHD survival as demonstrated by the differences in the estimates of the association parameter (θ) as well as the random intercept standard deviations (d). CHD survival is estimated to be shorter among hypertensive patients. For example, looking the parameter estimate of HTN \(\left (\hat {\xi _{4}}\right)\) correspond to TG, we observe that those who are hypertensive have about 60% (1−e^{−0.9204}=1−0.3984) shorter survival time. Similarly, smokers, diabetic patients, male gender and older age have shorter survival or higher risk to CHD. But no significant association was observed between BMI and CHD survival. Likely due to the smaller number of repeated measurements per subject, incorporating an additional random slope term and association parameter θ_{2} as described in “‘Joint model” section did not improve model fit. This implies that the subject specific random intercepts are sufficient to describe the longitudinal aspect in these data.
For comparisons purpose, separate joint model of “Joint model” section were fitted. The results are summarized in Table 3. When compared with their respective multivariate joint model estimates of Table 2, we notice slight differences in the association parameters estimates corresponding to TG and hsCRP as well as the intercept terms of the survival component. The rest of the parameter estimates appear to be similar. Furthermore, we fitted an exponential model for timetoCHD by including only baseline values of lipids separately while adjusting for the same covariates of the survival submodel considered so far. The estimated regression coefficients (standard error) corresponding to baseline TG, HDL, hsCRP and TC are −0.2354 (0.0876),0.6206 (0.1834),−0.1674 (0.0421), and −0.5710 (0.2344), respectively. Comparing these results with the association parameter estimates of joint multivariate model in Table 2, we clearly observe a remarkable difference in modeling the longitudinal trajectories versus the baseline measurements of lipids. We will explore this further in “Simulation” section with simulation.
Simulation
In this section, we report on a simulation study set up to examine the performance of the pairwise bivariate fitting in joint modeling of multiple longitudinal measurements and a timetoevent data. The estimates from the pairwise bivariate fitting will be compared with the full likelihood multivariate joint model in terms of bias and relative efficiency.
We randomly generated 1000 data sets for three correlated longitudinal responses y_{ℓij} and a timetoevent outcome t_{i} from the multivariate joint model of “Multivariate joint model” section for 1000 subject at 7 time points. The ℓ^{th} longitudinal data were generated as y_{ℓij}=β_{ℓ0}+β_{ℓ1}time_{ij}+β_{ℓ2}male_{i}+b_{ℓi}+ε_{ℓij}, i=1,…,1000;j=0,…,6;ℓ=1,2,3. Similarly, the timetoevent data corresponding to the y_{ℓij} is generated from Weibull(α_{ℓ},μ_{ℓi}), where μ_{ℓi}=ξ_{ℓ0}+ξ_{ℓ1}male_{i}+θ_{ℓ}b_{ℓi}, and the true parameter values for the three responses were β_{1}=(3.7,−1,0.5)^{′},β_{2}=(4,1,−0.5)^{′},β_{3}=(5,1,−0.5)^{′},ξ_{ℓ}=ξ=(5,−0.5), vector of association parameters of the longitudinal and timetoevent data θ=(θ_{1},θ_{2},θ_{3})=(−0.8,0.6,−0.5),α_{ℓ}=α=0.5, the residual errors ε_{ijℓ} are assumed independent, each generated from normal distribution with mean 0 and standard deviations 0.8, 1 and 1. The random intercepts (b_{1i},b_{2i},b_{3i}) are assumed correlated and generated from multivariate normal distribution with mean 0, standard deviations 0.7, 0.5, and 0.5, and correlations: ρ_{12}=−0.5,ρ_{13}=0.6 and ρ_{13}=0.3.
The simulated data were analyzed by the multivariate joint model using pairwise bivariate fitting and full likelihood trivariate joint model. All the 1000 simulations converged in the pairwise fitting while 985 simulations do so in the full trivariate model, i.e., nearly 1.5% failed to converge in the latter. For comparison purposes, results of the 985 successful simulations of the full multivariate joint model and the same number of simulations from the pairwise fitting, based on simulation ID number, were used. The average estimated values from the pairwise bivariate fitting (Mean_{P}) and full trivariate joint model (Mean_{F}) are summarized in Table 4. All of the mean estimates from the pairwise fitting are close to the true values and nearly unbiased for the proposed approach. Furthermore, as shown in Table 5, the Monte Carlo standard errors from the pairwise fitting (MC.SE_{P}) are close to those from the full trivariate joint model (MC.SE_{F}) with their ratios range from 0.997 to 1.056, and this suggests that no considerable efficiency loss of the pairwise bivariate approach relative to the full likelihood multivariate model.
Furthermore, the simulated data were analyzed by Weibull model of “Model for timetoevent data” section to investigate the impact of omitting the longitudinal aspect as well as the correlation between the longitudinal responses. This was achieved by including only basline measurements of the longitudinal responses as covariates in the regression model for the timeoevent outcome. Three separate models were fitted as Weibull(α_{ℓ},μ_{ℓi}), where μ_{ℓi}=ξ_{ℓ0}+ξ_{ℓ1}male_{i}+θ_{ℓ}y_{ℓi},y_{ℓi} is baseline values for the longitudinal response y_{ℓij} and the rest of the quantizes remain defined earlier. The results are summarized in Table 6. Clearly, the impact of taking just the baseline values of y_{ℓi} and ignoring the longitudinal trajectories is remarkable. This can be clearly observed from the considerable bias of the regression coefficients including the association parameter θ.
Discussion
The method has been used to jointly analyze four multivariate longitudinal responses and a timetoevent outcome from a cardiovascular study on African Americans. The data analysis showed inverse relationship between HDL cholesterol and incidence of coronary heart disease, while elevated triglyceride level and highsensitivity Creactive protein are associated with increased risk for CHD. Duncan et al. [16] who studied association between longitudinal lipid trajectories and atherosclerotic cardiovascular disease (ASCVD) including CHD reported a similar finding on the association of HDL cholesterol and ASCVD. This is inline with Wilikins et al. [17] and Skrettebegr et al. [18]. While Duncan’s investigation did not find a significant association between triglyceride and ASCVD, we observe an association of triglyceride with CHD which is consistent with Sarwar et al. [19]. This is likely because in Ducan’s study ASCVDE includes CHD, stroke, or transient ischemic attack, and intermittent claudiction while the present study and Sarwar’s investigation focus on CHD. Jeong et al. [20] reported a similar finding on the association of triglyceride and CHD but reported a significant association between total cholesterol and CHD which is not the case for total cholesterol in the current study. This could be attributed to the fact that Jeong et al. [20] examined young adults (aged 2039 years) while JHS participants are relatively older.
Our simulation study suggest that bivariate fitting of such multivariate joint model yields unbiased estimates which are as nearly efficient as the full likelihood model. These results are inline with the performance of the pairwise bivariate fitting in the context of multivariate Gaussian longitudinal data as shown in Fieuws and Verbeke [13] and of semicontinuous longitudinal data given in KassahunYimer et al. [21]. Pairwise bivariate fitting is approximately unbiased in missing completely at random (MCAR) and missing at random (MAR) when outcomespecific parameters are considered, but these results may not necessarily be generalized to all missing data problems [21]. Details on missing data mechanisms can be obtained in Little and Rubin [22]. Alison [23] presented a survey of methods to handle missing data and indicated that maximum likelihood based inference for handling missing data have nearly optimal statistical properties under MAR. Ding and Wang [10], however, showed that in joint modeling of univariate longitudinal response and timetoevent modeling, informative dropout on the longitudinal component could result in serious bias. Multiple imputation is also an attractive approach in handling missing data in some applications despite the several questions posed in relation to the distributional choices, number of imputations and iterations and prior knowledge about the missingness pattern [22, 23]. Fieuws and Verbeke [13] pointed out efficiency loss can be present when some of the parameters are shared by a set of outcomes. An additional simulation conducted to investigate the impact of taking just the baseline values of longitudinal response and including as covariates in survival regression indicated that parameter estimates could be severely biased. This is inline with earlier simulation studies. For example, Henderson et al. [7] and Ibrahim et al. [9] pointed out that severe bias could occur for some parameter estimates when there is ignored latent association between the timetoevent and longitudinal data.
Apart from several advantages, our proposed approach has also limitations. First, right censoring is assumed for the timetoevent outcome, though other mechanisms of censoring could also be operating in practice. Second, we assumed the link between the longitudinal and the survival submodels is constant over time. Third, earlier simulations suggest that pairwise bivariate fitting is approximately unbiased in MCAR and MAR but this may not be the case for all longitudinal data with missingness. Hence, exploring the proposed approach in various missing data mechanisms and patters together with appropriate model diagnostic tools is a useful area of future work.
Conclusion
In this paper, we have presented and studied a multivariate joint model for multiple longitudinal Gaussian responses and a timetoevent outcome which follows from combining each single joint model of a longitudinal continuous response and the timetoevent outcome via random effects whereby a linear mixed submodel is used for the longitudinal continuous part while a timetoevent submodel is considered for the survival part. The model was illustrated using the Jackson Heart Study. The association between lipid levels and CHD risk obtained from the joint model that takes into account the longitudinal lipid trajectories is substantially higher than that estimated using the baseline lipid values only. The Weibull model (exponential model as its special case) is most common for timetoevent data due to its flexibility, but alternative distributions, such as, the lognormal, loglogistic and gamma could provide better fit to a particular dataset. The choice of one distribution from the other could be based on exploratory graphical tools or formal statistical tests. The pairwise bivariate model fitting approach employed for the Weibull model can be well extended to any other probability density function in the survival submodel.
Availability of data and materials
Data used in this analysis were produced and used in accordance with the policies of the Jackson Heart Study under contracts from the National, Heart, Lung, and Blood Institute and are not the domain of the authors but that of the Jackson Heart Study. These data are available to other researchers for purposes of reproducing the results or replicating the procedures by submitting a manuscript proposal to the Jackson Heart Study at jhspub @umc.edu, as described at https://www.jacksonheartstudy.org/Research/Publications#submitmanuscript. Data updates for the Jackson Heart Study are also deposited regularly in the National Institutes of Health data repositories, dbGaP (https://www.ncbi.nlm.nih.gov/gap/) and BioLincc (https://biolincc.nhlbi.nih.gov/home/).
Abbreviations
 CHD:

Coronary heart disease
 HDL:

High density lipoprotein
 JHS:

hsCRP: Highsensitivity Creactive protein
 Jackson heart study; SD:

Standard deviation
 TC:

Total cholesterol
 TG:

Triglyceride
References
 1
Laird N, Ware J. Randomeffects models for longitudinal data. Biometrics. 1982; 38:963–74.
 2
Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 2000.
 3
Hosmer D, Lemeshow S, May Susanne. Applied survival analysis: regression modeling of timetoevent data. New York: John Wiley & Sons, Inc.; 2008.
 4
Wang Y, Taylor J. Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. J Am Stat Assoc. 2001; 96:895–905.
 5
Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and timetoevent data. Biometrics. 2011; 67:819–29.
 6
Tsiatis A, Davidian M. Joint modeling of longitudinal and timetoevent data: An Overview. Stat Sin. 2004; 14:809–34.
 7
Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1:465–80.
 8
Guo X, Carlin B. Separate and Joint modeling of longitudinal and event time data using standard computer packages. Am Stat. 2004; 58:16–24.
 9
Ibrahim J, Chu H, Chen L. Basic concepts and methods for joint models of longitudinal and survival data. J Clin Oncol. 2010; 28:2796–801.
 10
Ding J, Wang J. Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics. 2008; 64:546–56.
 11
Rizopoulos D, Ghosh Pulak. A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a timetoevent. Stat Med. 2011; 30:1366–80.
 12
Brown E, Ibrahim J, DeGruttola V. A flexible Bspline model for multiple longitudinal biomarkers and survival. Biometrics. 2005; 61:64–73.
 13
Fieuws S, Verbeke G. Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics. 2006; 62:424–31.
 14
Taylor H, Wilson J, Jones D, Sarpong D, Srinivasan A, Garrison R, Nelson C, Wyatt S. Towards resolution of cardivascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethicity Dis. 2005; 15:S64–S617.
 15
Carpenter M, Crow R, Steffes W, Heilbraun J, Evans G, Skelton T, Jensen R, Sarpong D. Laboratory, reading center and coordinating center data management methods in the Jackson Heart Study. Am J Med Sci. 2004; 328:131–44.
 16
Duncan M, Vasan R, Xanthakis V. Trajectories of blood lipid concentrations over the adult life course and risk of cardiovascular disease and allcause mortality: observations from the Framingham study over 35 years. J Am Heart Assoc. 2019; 8:e011433.
 17
Wilkins J, Ning H, Stone N, Criqui M, Zhao L, Greenland P, LIoydJones D. Coronary heart disease risks associated with high levels of HDL cholesterol. J Am Heart Assoc. 2014; 3:e000519.
 18
Skretteberg P, Grundvold I, Kjeldsen S, Erikssen J, Sandvik L, Liestøl K, Erikssen G, Pedersen T, Bodegrad J. HDLcholesterol and prediction of coronary heart disease: Modified by physical fitness? A 28year followup of apparently healthy men. Atherosclerosis. 2012; 220:250–6.
 19
Sarwar N, Danesh J, Eiriksdottir G, Sigurdsson G, Wareham N, Bingham S, Boekholdt M, KayTee K. Triglycerides and the risk of coronary heart disease 10 158 incident cases among 262 525 participants in 29 western prospective studies. Circulation. 2007; 115:450–8.
 20
Jeong SuM, Choi S, Kim K, Kim S, Lee J, Park S, Kim YY, Son J, Yun JM, Park S. Effect of change in total cholesterol levels on cardiovascular disease among young adults. J Am Heart Assoc. 2018; 7:e008819.
 21
KassahunYimer W, Albert P, Lipsky L, Nansel T, Liu A. A joint model for multivariate hierarchical semicontinuous data with replications. Stat Methods Med Res. 2019; 28:858–70.
 22
Little RJA, Rubin DB, Statistical analysis with missing data. NJ: John Wiley & Sons, Inc; 2002.
 23
Allison P. Missing data techniques for structural equation modeling. J Abnorm Psychol. 2003; 112:545–57.
Acknowledgements
The authors thank the Editor and the reviewers for their helpful and constructive comments. The authors also wish to thank the staffs and participants of the JHS. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services.
Funding
The Jackson Heart Study is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute on Minority Health and Health Disparities (NIMHD). The views expressed in this article are those of the authors and do not necessarily represent the views of the NHLBI, the NIH, or the US Department of Health and Human Services.
Author information
Affiliations
Contributions
WKY performed statistical analysis and drafted the manuscript; KAV participated in the data management and reviewed the manuscript; AAO and MEH contributed in the drafting and reviewed the manuscript; YIM, LCS and PA contributed in the data coordination and reviewed the manuscript; AC was involved in data coordination, drafting and critically reviewed the manuscript. All author(s) read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The JHS followed guidelines expressed by the Declaration of Helsinki, and all participants completed written informed consent. The available data contained deidentified information. The JHS was approved by the Institutional Review Boards of Jackson State University, Tougaloo College, and the University of Mississippi Medical Center in Jackson.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visithttp://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
KassahunYimer, W., Valle, K.A., Oshunbade, A.A. et al. Joint modelling of longitudinal lipids and time to coronary heart disease in the Jackson Heart Study. BMC Med Res Methodol 20, 294 (2020). https://doi.org/10.1186/s12874020011777
Received:
Accepted:
Published:
Keywords
 Joint modeling
 Multivariate longitudinal
 Survival data
 Correlated responses