The package FrailtyCompRisk is originally designed for statisticians and epidemiologists in order to analyze time-to-event data where individuals are nested within centers (e.g., hospitals or clinics), and where multiple causes of failure may occur.
The package is build upon the random-effects model for the subdistribution hazard presented in “Analysing multicentre competing risks data with a mixed proportional hazards model for the subdistribution” [Katsahian S, Resche-Rigon M, Chevret S, Porcher R.].
Consider \(N\) observations of a time-to-event variable \(T\), a cause of event \(\epsilon\), and a covariate \(Z\): \((T_i, \epsilon_i, Z_i)_{i=1,\dots,N}\). Let \(\epsilon = 1\) denote the cause of failure of interest.
The cumulative incidence function (CIF) for cause 1 is defined as: \[ F_1(t)=P(T \le t,\epsilon = 1) \] Fine and Gray proposed a proportional hazards model for the subdistribution hazard associated with \(F_1(t)\), defined as: \[ \alpha_1(t) = \frac{\frac{dF_1(t)}{dt}}{1 - F_1(t)} \] The subdistribution hazard for individual \(i\) is modeled as:
\[ \alpha_{1}(t | Z_i, u) = \alpha_{1,0}(t) e^{(Z_i^\top \beta)} \] where:
\(\alpha_{1,0}(t)\) is an unspecified baseline subdistribution hazard,
\(\beta\) is the vector of covariate effects.
This model assumes that all individuals are independent. To account for clustering (e.g., patients within centers), a random effect is added:
Let \(k(i) \in {1, \dots, K}\) be the cluster (center) to which subject \(i\) belongs. Then the subdistribution hazard becomes:
\[ \alpha_{1,k}(t | Z_i, u) = \alpha_{1,0}(t) e^{(Z_i^\top \beta \ + \ u_{k(i)})} \] where \(u_k \sim \mathcal{N}(0, \theta)\) are independent Gaussian random effects for each cluster \(k\), with variance \(\theta\).
The package uses a restricted maximum likelihood (REML) or maximum likelihood (ML) approach to estimate the parameters \(\beta\), \(\theta\), and \(u_k\). The estimation relies on a Newton-Raphson optimization routine applied to the (restricted) log-likelihood.
The package provides several main functions, \(\\\) Reml_CompRisk_frailty()
Estimates \(\beta\), \(\theta\), and \(u_k\) under the subdistribution hazard
model with random effects. Also returns the \(p\)-value for testing \(H_0: \theta = 0\).
Reml_CompRisk_frailty() can take 5 arguments:
data (mandatory), a data frame with the following
columns:
times : Observed times to event or
censoring.
status : Event type: 0 = censored, 1 = cause of
interest, >1 = other competing risks.
clusters : Cluster membership (e.g., centers or
subjects).
covariates : (optional) Columns from the 4th onward
are used as covariates.
cluster_censoring (optional): Logical. If TRUE,
accounts for center-specific censoring using a frailty model on
censoring times. Default is FALSE.
max_iter (optional): Maximum number of iterations
for the Newton-Raphson algorithm (default = 300).
tol (optional): Convergence tolerance for the
likelihood and frailty variance (default = 1e-6).
threshold (optional): Lower bound for the frailty
variance parameter \(\theta\). If the
estimated value falls below this threshold, frailty is considered
negligible (default = 1e-5).
\(\\\)
Ml_CompRisk() Estimates \(\beta\) for the subdistribution hazard
model without random effects.
Ml_CompRisk() can take 3 arguments:
data (mandatory), a data frame with the following
columns:
times : Observed times to event or
censoring.
status : Event type: 0 = censored, 1 = cause of
interest, >1 = other competing risks.
covariates : (optional) Columns from the 3th onward
are used as covariates.
max_iter (optional): Maximum number of iterations
for the Newton-Raphson algorithm (default = 100).
tol (optional): Convergence tolerance for the
likelihood and frailty variance (default = 1e-6).
\(\\\)
Reml_Cox_frailty() Estimates \(\beta\), \(\theta\), and \(u_k\) under a Cox model with random
effects, and provides a \(p_{value}\)
for testing \(\theta = 0\).
Reml_Cox_frailty() can take 4 arguments:
data (mandatory), a data frame with the following
columns:
times : Observed times to event or
censoring.
status : Event type: 0 = censored, 1 = cause of
interest
clusters : Cluster membership (e.g., centers or
subjects).
covariates : (optional) Columns from the 4th onward
are used as covariates.
max_iter (optional): Maximum number of iterations
for the Newton-Raphson algorithm (default = 300).
tol (optional): Convergence tolerance for the
likelihood and frailty variance (default = 1e-6).
threshold (optional): Lower bound for the frailty
variance parameter \(\theta\). If the
estimated value falls below this threshold, frailty is considered
negligible (default = 1e-5).
\(\\\)
Ml_Cox() Estimates \(\beta\) under a standard Cox proportional
hazards model.
Ml_Cox() can take 3 arguments:
data (mandatory), a data frame with the following
columns:
times : Observed times to event or
censoring.
status : Event type: 0 = censored, 1 = cause of
interest
covariates : (optional) Columns from the 3th onward
are used as covariates.
max_iter (optional): Maximum number of iterations
for the Newton-Raphson algorithm (default = 100).
tol (optional): Convergence tolerance for the
likelihood and frailty variance (default = 1e-6).
Parameters_estimation() Encapsulates the four previous
methods into one unified function.
Parameters_estimation() can take 6 arguments:
data (mandatory), a data frame with the following
columns:
times : Observed times to event or
censoring.
status : Event type: 0 = censored, 1 = cause of
interest, >1 = other competing risks.
clusters : Cluster membership (e.g., centers or
subjects).
covariates : (optional) Columns from the 4th onward
are used as covariates.
method (optional):
cluster_censoring (optional): Logical. If TRUE,
accounts for center-specific censoring using a frailty model on
censoring times. Default is FALSE.
max_iter (optional): Maximum number of iterations
for the Newton-Raphson algorithm (default = 100 or 300, depends on the
method chosen).
tol (optional): Convergence tolerance for the
likelihood and frailty variance (default = 1e-6).
threshold (optional): Lower bound for the frailty
variance parameter \(\theta\). If the
estimated value falls below this threshold, frailty is considered
negligible (default = 1e-5).
\(\\\)
simulate_data() Simulates clustered time-to-event data
with competing risks.
simulate_data() can take 8 arguments:
G : a vector of group or cluster identifiers (length
N, where N is the size sample). Each value indicates which cluster the
individual belongs to.
Z : a matrix of covariates (dimensions N x p), where
p is the size of the vector describing an individual). Can be set to
Matrix(0,0,N) if no covariates are used.
prop : proportion of individuals susceptible to
cause 1 (default: 0.6).
beta : a numeric vector of regression coefficients,
one per covariate (p).
theta : variance of the shared frailty term for
event times (cause 1).
cens : logical, indicating whether censoring should
be simulated (default: FALSE).
pcens : target censoring proportion (default:
0.25).
tau : variance of the shared frailty term for
censoring times (default: 0). \(\\\)
check_data_format() Verifies input data structure and
formatting.
check_data_format() can only take 1 argument, a data
frame. \(\\\)
set.seed(123)
n_cov = 2
n_per_cluster = 15
n_cluster = 20
n = n_cluster * n_per_cluster
G = rep(1:n_cluster, each = n_per_cluster)
Z = matrix(rnorm(n*n_cov,0,1),ncol = n_cov)
df = simulate_data(G,Z,prop = 0.6,beta = c(1,1.2),theta = 0.6,cens = TRUE)
#Estimate using REML with frailty
res <- Reml_CompRisk_frailty(df)
#Print estimated coefficients
res$beta
#> V1 V2
#> 0.8818009 1.0918708
res$theta
#> [1] 0.6087043The algorithm involves computing weighted matrices and solving penalized systems in each iteration. For \(N\) individuals, \(K\) clusters and \(p\) covariables describing each individual:
Sparse matrix multiplication and integration steps: ~ \(\mathcal{O}(N^2)\)
Newton-Raphson updates (inversion of Fisher information matrix): ~ \(\mathcal{O}((p + K)^3)\) in the worst case
In total, the complexity of the algorithm is ~\(\mathcal{O}(N^2 + Np + NK + (p + K)^3)\) in the worst case.
However, the implementation uses sparse matrices (Matrix
package) to significantly reduce practical computational cost.
The Matrix package
Katsahian S, Resche-Rigon M, Chevret S, Porcher R. (2006). Analysing multicentre competing risks data with a mixed proportional hazards model for the subdistribution. Statistics in Medicine, 25(26), 4267–4278.