Modeling Frailty Using Birnbaum Saunders Distribution for Bivariate Survival Data
- DOI
- 10.2991/jsta.d.210222.001How to use a DOI?
- Keywords
- Bayesian model comparison; Birnbaum–Saunders frailty; Exponential power distribution; Pareto distribution; Shared frailty; Weibull distribution
- Abstract
Frailty models are survival models, which are designed to explore the properties of the unobserved heterogeneity in individuals pertaining to disease and death. In this paper, we propose the new frailty model called Birnbaum–Saunders frailty model with Weibull, Pareto, and exponential power as baseline distributions. Estimation of the proposed model parameters was done using Bayesian estimation procedure under Markov chain Monte Carlo technique. A simulation study was also presented to compare the true values and the estimated parameters. We then applied these proposed models to kidney infection data. Models comparison was done using various information criteria and Bayes factor. The best model for the data related to infection occur due to insertion of catheter is suggested.
- Copyright
- © 2021 The Authors. Published by Atlantis Press B.V.
- Open Access
- This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
1. INTRODUCTION
Many researchers have used Cox [1] proportional-hazards model to analyze the survival data by considering the study population as homogeneous indicating that all the individuals are considered as having the same risk for getting diseases or death. But, in many practical situations, it may not be possible to consider all the individuals as homogeneous since they are differ from each other in contracting disease due to genetic factor or environmental factor or some other factor, which are generally omitted during the analysis of the survival data. In these conditions, it is not possible to assume the dependency between the survival times. To investigate such data, it is important to explain dependency within the subject in the multiple event times. In such case, it is expected to present an additional random part “frailty” into the proportional hazard model. Here, the random family impact itself is of prime interest in the investigation. These models are becoming alluring and more attractive due to the introduction of extra unobserved variable and name as “frailty,” which is assumed to be shared by a pair of organs.
The frailty model is also known as a random effect model, which is designed to account for heterogeneity amongst the groups of individuals due to unobserved factors. Generally, it is omitted by many researchers. Clayton [2] first proposed the model for such data without using frailty. The term frailty was first used by Vaupel et al. [3]. Keyfitz and Littman [4] also indicate that omitting individual heterogeneity can cause inaccurate conclusions. Balakrishnan and Liu [5] discussed Birnbaum–Saunders (BS) as frailty distribution in semi-parametric frailty model. In addition, BSas frailty distribution is used to analyze medical and cancer data (Leao et al. [6–8]). Many distributions such as gamma, inverse Gaussian, positive stable distribution, power variance function, Weibull, compound Poisson are also engaged as frailty distribution for modeling heterogeneity in the populations.
Generally, the frailty model is developed as an unobserved random variable is adding multiplicatively on the baseline hazard function (BHF). Considering univariate frailty model, let a continuous random variable
The marginal survival function(SF) can be obtained by taking integration of Equation (1.3) with respect to frailty variable
In this manuscript, we consider right censored data with BS as the frailty distribution. Weibull, Pareto and exponential power are used as the baseline distributions to explore the salient features of the shared BS frailty models based on multiplicative hazard. Here, the dependence between the survival times is due to BS distributed common frailty variable
2. GENERAL SHARED FRAILTY MODEL
In the study, suppose there are n individuals having first and second failure times (
The conditional cumulative hazard function (CuHF) for the
The CSF for
Under the assumption of independence, the bivariate CSF for given frailty
The unconditional SF (in bivariate case) at time
Here onward we represent
3. SHARED BIRNBAUM SAUNDERS FRAILTY MODEL
The first frailty distribution we have considered here is BS distribution [9]. The pdf of the BS distribution is unimodal. The shape of the HF plays a significant role in the analysis of lifetime data. Leiva [10] also showed that BS distribution is right-skewed (asymmetrical), continuous, and unimodal. It is otherwise called the fatigue life distribution and has gotten extensive consideration because of its theoretical arguments, its alluring properties and its connection with the normal distribution. In this regard, Mann et al. [11] first conjectured that the HF is not an increasing function, but the average HF is nearly nondecreasing function. Later, Kundu et al. [12] and Bebbington et al. [13] showed that the HF of the BS distribution is a unimodal function. In many real-life situations, the HF may not be monotone, and it increases up to a point and then decreases.
A continuous random variable
The LT is derived as
For the identifiability issues, we assume V has mean one which presents restriction on the parameters
Substituting the LT in Equation (2.5), we get the unconditional bivariate survival function for the
4. BASELINE DISTRIBUTIONS
The first baseline distribution engaged here is Weibull distribution, which is one of the most commonly used distribution for modeling life-time data in survival analysis. A continuous random variable
The second baseline distribution engaged here is the Pareto distribution [14]. A continuous random variable
It is observed that
The third and last baseline distribution we have considered here is the exponential power distribution. A continuous random variable
It is observed that the HF is increasing, decreasing and
5. PROPOSED MODELS
Replacing the CuHF for the Weibull distribution, Pareto distribution, and exponential power distribution in Equation (3.3), we get the unconditional bivariate SFs at time
Here from now onwards we denote Equations (5.1−5.3) as Model I, Model II, and Model III, respectively. Model I is shared BS frailty model with Weibull baseline distribution, Model II is shared BS frailty model with Pareto baseline distribution, and Model III is shared BS frailty model with exponential power baseline distribution.
6. ESTIMATION STRATEGIES
Let us consider n individuals under the study having first and second failure times given by (
The likelihood function (LF) associated with the
Where
The LFs given in Equation (6.1) is very difficult to solve using the Newton–Raphson method. Maximum likelihood estimations (MLEs) may not be convergent since large number of parameters are involved in the models. Therefore, the Bayesian approach of estimation is engaged to estimate the parameters involved in the models, which does not suffer any such kind of troubles. Many authors have studied the Bayesian approach for the estimation of frailty model parameters. Some of the authors are Ibrahim et al. [15] and references therein, Santos and Achcar [16]. Santos and Achcar [16] have used log-normal as frailty distribution with Weibull and generalized gamma distribution as the baseline in parametric models. Ibrahim et al. [15] and references therein used gamma frailty in the Weibull model and the piecewise exponential model.
The joint posterior density function associated with the parameters for given failure times is obtained as
Metropolis–Hasting and Gibbs Sampler Algorithm is utilized for the estimation of the parameters in the models fitted with the above prior density function and likelihood equation. Geweke test and Gelman–Rubin Statistics given by Geweke [17] and Gelman and Rubin [18] to observe the Markov chain to be converged to a stationary distribution. To explore the property of the chain, to calculate the burn-in period and to choose autocorrelation lag, we utilized trace plots, coupling from the past plots and sample autocorrelation plots, respectively.
The adequacy of the models is checked by using predictive intervals, which is based on the sample generated from the posterior predictive density. For comparing different proposed models, we used various information criteria such as Bayesian Information Criteria (BIC), Akaike Information Criteria (AIC), and Deviance Information Criteria (DIC). Also, the Bayes factor
7. APPLICATION TO REAL-LIFE DATA
The models performance was examined by applying the proposed models to the data related to infection of kidney occurring due to catheter insertion [20]. First, Kolmogorov–Smirnov (KS) test is utilized to check the goodness of fit for the given data and it is seen that the p values for the first and second recurrences are adequately enormous to say that there is no motivation to reject the hypothesis that the first and second recurrence time to follow the distributions with SF as given in Equations (4.1), (4.4), and (4.7) in univariate case and this condition is assumed to be valid for bivariate case. The p values obtained from KS test are given in Table 1.
Recurrence Times |
||
---|---|---|
Baseline Distribution | First | Second |
Weibull | 0.6671 | 0.5003 |
Pareto | 0.9736 | 0.8452 |
Exponential power | 0.7012 | 0.4929 |
p values of Kolmogorov–Smirnov (KS) statistics for goodness of fit test for kidney infection data set.
The posterior results of the different proposed models are presented in Tables 2–4. It comprises the estimate (posterior mean), standard error (SE), 95% lower and upper credible limits (LCL and UCL), Gelman and Rubin (GR) statistics values with p values and Geweke (GK) test values. From Tables 2–4, it is observed that regression coefficients for all the models are more or less the same. Also for all these different proposed models, the value zero is not a credible value for all the credible intervals of the regression coefficients, so all the covariates seem to be significant. It is observed that the values of frailty parameter for all the models are different. In particular, the sex effect
Parameter | Estimate | SE | LCL | UCL | GK Values | p Values | GR Values |
---|---|---|---|---|---|---|---|
Burn in period = 7200; | Autocorrelation lag = 250 | ||||||
1.2271 | 0.0322 | 1.1701 | 1.2938 | 0.0070 | 0.5028 | 0.9999 | |
1.1558 | 0.0482 | 1.0510 | 1.2344 | 0.0115 | 0.5046 | 1.0017 | |
0.0178 | 0.0047 | 0.0093 | 0.0271 | 0.0097 | 0.5038 | 1.0002 | |
0.0150 | 0.0045 | 0.0074 | 0.0243 | 0.0019 | 0.5007 | 0.9999 | |
1.1217 | 0.0486 | 1.0409 | 1.2200 | 0.0094 | 0.5037 | 1.0011 | |
0.0082 | 0.0005 | 0.0072 | 0.0091 | 0.0002 | 0.5000 | 1.0001 | |
−1.8987 | 0.3475 | −2.5616 | −1.2270 | −0.0120 | 0.5000 | 1.0008 | |
0.0421 | 0.0047 | 0.0331 | 0.0510 | 0.0106 | 0.5042 | 1.0000 | |
0.0616 | 0.0050 | 0.0528 | 0.0711 | 0.0049 | 0.5019 | 1.0001 | |
0.0021 | 0.0005 | 0.0011 | 0.0030 | 0.0006 | 0.5002 | 1.0000 |
SE, standard error; LCL, lower credible limit; UCL, upper credible limit; GK, Geweke.
Posterior results for Model I using kidney infection data set.
Parameter | Estimate | SE | LCL | UCL | GK Values | p Values | GR Values |
---|---|---|---|---|---|---|---|
Burn in period = 4900; | Autocorrelation lag = 220 | ||||||
2.2022 | 0.2959 | 1.6299 | 2.8432 | −0.0023 | 0.4990 | 1.0005 | |
2.6089 | 0.4641 | 1.7035 | 3.4416 | 0.0045 | 0.5018 | 1.0085 | |
0.0159 | 0.0043 | 0.0102 | 0.0264 | 0.0069 | 0.5027 | 1.0000 | |
0.0089 | 0.0005 | 0.0080 | 0.0099 | 0.0021 | 0.5008 | 0.9999 | |
0.6175 | 0.0514 | 0.5427 | 0.7241 | 0.0234 | 0.5093 | 1.0014 | |
0.0020 | 0.0005 | 0.0010 | 0.0029 | 0.0048 | 0.5019 | 1.0006 | |
−0.8597 | 0.1796 | −1.1968 | −0.5025 | −0.0074 | 0.5019 | 1.0058 | |
0.0620 | 0.0051 | 0.0529 | 0.0712 | 0.0180 | 0.5072 | 1.0033 | |
0.0620 | 0.0051 | 0.0525 | 0.0712 | 0.0051 | 0.5020 | 1.0056 | |
0.0209 | 0.0052 | 0.0116 | 0.0303 | −0.0006 | 0.4997 | 1.0035 |
SE, standard error; LCL, lower credible limit; UCL, upper credible limit; GK, Geweke.
Posterior results for Model II using kidney infection data set.
Parameter | Estimate | SE | LCL | UCL | GK Values | p Values | GR Values |
---|---|---|---|---|---|---|---|
Burn in period = 5100; | Autocorrelation lag = 375 | ||||||
0.5117 | 0.0293 | 0.4515 | 0.5720 | −0.0024 | 0.4990 | 1.0014 | |
0.5952 | 0.0250 | 0.5517 | 0.6415 | −0.0042 | 0.4982 | 0.9999 | |
0.1623 | 0.0330 | 0.1073 | 0.2432 | 0.0024 | 0.5009 | 1.0005 | |
0.0891 | 0.0049 | 0.0807 | 0.0989 | 0.0038 | 0.5015 | 0.9999 | |
1.1938 | 0.0479 | 1.1116 | 1.2846 | 0.0054 | 0.5021 | 1.0020 | |
0.0015 | 0.0004 | 0.0006 | 0.0024 | −0.0043 | 0.4982 | 1.0009 | |
−1.6724 | 0.3614 | −2.3837 | -0.8913 | 0.0023 | 0.4982 | 1.0001 | |
0.0741 | 0.0048 | 0.0646 | 0.0828 | 0.0083 | 0.5033 | 1.0000 | |
0.0822 | 0.0049 | 0.0731 | 0.0906 | 0.0001 | 0.4999 | 1.0000 | |
0.0021 | 0.0005 | 0.0012 | 0.0030 | 0.0089 | 0.5035 | 1.0009 |
SE, standard error; LCL, lower credible limit; UCL, upper credible limit; GK, Geweke.
Posterior results for Model II using kidney infection data set.
The adequacy of the different proposed models is checked by constructing 95% and 50% predictive intervals of a generated random sample from predictive distribution as given in Gelfand and Ghosh [21] and Sahu et al. [22] and counted the number of times recurrence happen in the intervals for first and second kidney infections. Under the Model I, Model II, and Model III, 76, 65; 76, 62; and 71, 58, respectively out of 76 observations are contained in the 95% and 50% predictive intervals. This shows that the three models are adequate for the data.
The best model as per AIC, BIC, and DIC values is Model I, since the values of AIC, BIC, and DIC of Model I are smaller than Model II and Model III, which is given in Table 5. But the distinction between AIC, BIC, and DIC values for Model I, Model II, and Model III are very little, so AIC, BIC, and DIC values may not be commendable to make a choice between the proposed models. Here, we consider Bayes factor to compare model
Model No. | AIC | BIC | DIC | Log likelihood |
---|---|---|---|---|
Model I | 690.1043 | 706.4801 | 675.1946 | −335.0521 |
Model II | 702.0792 | 718.4551 | 685.4432 | −341.0396 |
Model III | 698.0372 | 714.413 | 685.1324 | −339.0186 |
BIC, Bayesian Information Criteria; AIC, Akaike Information Criteria; DIC, Deviance Information Criteria.
AIC, BIC, and DIC values for all the proposed models.
Numerator Model Against Denominator Model | Range | Evidence Against Model in Denominator | |
---|---|---|---|
11.1430 | Very Strong Positive | ||
14.1158 | Not significant | ||
2.9728 | Positive |
The values of Bayes factor and decision based on it.
8. CONCLUSION
In this paper we discussed the results for BS shared frailty models with Weibull, Pareto, and exponential power as baseline distributions. Our main objective is to investigate which model fits better with the given infectious disease data. We performed the analysis of the given data by using R. Since, the LFs do not converge and maximum likelihood fails to estimate the parameters, Bayesian approach was utilized to analyze the data.
The comparison between the three proposed models was done using AIC, BIC, and DIC values. The smallest AIC value was Model I(BS frailty model with Weibull baseline distribution). The same result holds for BIC and DIC value. But these differences are not much significant. To take the decision about Model I, Model II, and Model III, we used the Bayes factor. We observed that, Model I is the best. The values of frailty variances were also calculated, which are 1.2197, 0.3971 and 1.3515 for Model I, Model II and Model III respectively indicating that there is a strong evidence of high degree of heterogeneity in the population of patients. Bayes factor was employed to test the frailty parameter
By referring all the above analysis, we suggested that the new shared BS frailty model with Weibull distribution as baseline distribution is the best in the proposed models for modeling of kidney infection data. This also confirms that model with frailty is appropriate to model kidney infection dataset. Therefore, from the above results, it is evident that cluster effects (frailty variable) significantly incorporate with the recurrence time of kidney infection due to insertion of catheter. One important property that was observed from BS frailty model with Weibull baseline distribution is the estimate of frailty variance, which is larger than earlier frailty models given by McGrilchrist and Aisbett [20] on log-normal frailty, Hanagal and Dabade [23] on compound Poisson frailty, and Hanagal and Dabade [24] on gamma frailty models.
CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.
AUTHORS' CONTRIBUTIONS
The author developed the models, performed the analytic calculations, numerical simulations, wrote the manuscript and discussed the results.
ACKNOWLEDGEMENTS
The author would like to thank the referee for their valuable suggestions and comments which improved the earlier version of the manuscript.
REFERENCES
Cite this article
TY - JOUR AU - Lal Pawimawha PY - 2021 DA - 2021/03/12 TI - Modeling Frailty Using Birnbaum Saunders Distribution for Bivariate Survival Data JO - Journal of Statistical Theory and Applications SP - 355 EP - 363 VL - 20 IS - 2 SN - 2214-1766 UR - https://doi.org/10.2991/jsta.d.210222.001 DO - 10.2991/jsta.d.210222.001 ID - Pawimawha2021 ER -