inlamemi

Fitting measurement error models and missing data imputation models in INLA is not trivial, and requires several workarounds. At the same time, a Bayesian hierarchical framework is very attractive for modeling these types of data, since the hierarchical model structure allows us to describe how the data were collected, and any errors they may have, through the model specification. By supplying informative priors, we may be able to adjust for biases due to these errors, and propagate any uncertainty they cause. Until recently, implementing these kinds of models in R-INLA has been complicated. This package provides a helpful interface that makes measurement error and missing data modeling in R-INLA much more feasible.

Installation

You can install the development version of inlamemi from GitHub with:

# install.packages("devtools")
devtools::install_github("emmaSkarstein/inlamemi")
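
Note that inlamemi relies on the INLA package, which is not available on CRAN. If you do not already have INLA installed, it can be installed from the INLA repository (check https://www.r-inla.org for the currently recommended command, as this may change):

# Install INLA from its own repository (not on CRAN); see www.r-inla.org
install.packages("INLA", repos = c(getOption("repos"), INLA = "https://inla.r-inla-download.org/R/stable"), dep = TRUE)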

When should I use this package?

This package is designed for fitting models where one covariate is subject to classical measurement error, Berkson measurement error, missing observations, or any combination of these three. This includes the case where you only have missing data, in which case the package simply performs missing data imputation.

The model itself must belong to the class of models that can be fitted with R-INLA. That means the package can be used for most common regression types, such as linear and logistic regression, and you can include as many error free covariates as needed. You can also include random effects, specified in the same way as you would normally specify such effects in R-INLA. Interaction effects with the error prone variable are also handled by the package. It is even possible to fit models with multiple error prone or missing covariates, although this should be done with care, since such models rely heavily on the priors you give them.
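
As a small illustration of the random effect syntax, here is a sketch of what a model of interest formula with an IID random effect could look like, using the standard R-INLA f() notation (the variable names y, x, z and group are placeholders for this sketch, not objects defined in this README):

# Sketch only: model of interest with error prone covariate x, error free
# covariate z, and an IID random effect on a (hypothetical) grouping variable
moi_formula_with_re <- y ~ x + z + f(group, model = "iid")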

Overview of examples

Examples of how to use the package can be found in the vignettes.

| Vignette name | Likelihood for MOI | Error type | Other features |
|---|---|---|---|
| Influence of systolic blood pressure on coronary heart disease | Binomial | Classical | Repeated measurements of error prone variable, interaction effect with error variable |
| Survival data with repeated systolic blood pressure measurements | Weibull survival | Classical, missing | Repeated measurements of error prone variable |
| Simulated examples (multiple examples) | Gaussian, Binomial, Poisson | Berkson, classical, missing | Random effect(s) in the model of interest, interaction effects |
| Multiple variables with measurement error and missingness | Gaussian | Classical | Multiple mismeasured covariates |
| How are the models structured? | | | A deep dive into how the data is structured in order to correctly fit the model |
| How to not use inlamemi | | | An illustration of how the models can be fit without using inlamemi, in case you would like to extend the model beyond what inlamemi allows. This vignette can also be useful if you are familiar with R-INLA and want to understand how the models are structured |

Quick guide: How can I use this package?

The dataset simple_data is included in the package. It is a simple simulated data set used to illustrate the package, containing a response variable \(y\), an error free covariate \(z\), and a covariate \(x\) that is observed with classical error, Berkson error and missingness. We wish to fit the model

\[ y = \beta_0 + \beta_x x + \beta_z z + \varepsilon \ , \]

while adjusting for all the errors in \(x\).
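
To get a sense of the data before fitting the model, you can inspect the first few rows (this uses only base R; the column names are whatever simple_data provides):

# Peek at the simulated data set included in the package
head(simple_data)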

First load the package:

library(inlamemi)

Next, we need to specify the formula for the main model and the formula for the imputation model. This is done in the standard way in R:

main_formula <- y ~ x + z

For the imputation model, we take advantage of any correlation between \(x\) and the error free covariate \(z\), so the imputation model will be

\[ x = \alpha_0 + \alpha_z z + \varepsilon_x \ . \]

We write that as

imputation_formula <- x ~ z

When adjusting for measurement error, we are completely dependent on having some information about the measurement error variances \(\sigma_{u_c}^2\) (for the classical error) and \(\sigma_{u_b}^2\) (for the Berkson error), since the size of these variances will affect how the estimates are biased. We can gain information about these variances in a few different ways: if repeated measurements have been made, they can be put directly into the model to estimate the error variance, or if we have some expert knowledge about the error size, that can be used to specify an informative prior for the variance (or rather the precision, since INLA uses the precision rather than the variance).

In this case, since we have in fact simulated the data ourselves, we know that both error variances are close to 1, so we specify priors that have modes at 1.
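
To see why the values c(10, 9) used below give a prior mode of 1, here is a small sketch. It assumes that the prior.prec.* arguments are interpreted as the shape and rate of a Gamma prior on the precision (the usual R-INLA convention; check ?fit_inlamemi to confirm), so that the prior mode is (shape - 1)/rate:

# Mode of a Gamma(shape, rate) prior on a precision parameter
# (assumes prior.prec.* = c(shape, rate); verify against the package docs)
gamma_prior_mode <- function(shape, rate) {
  (shape - 1) / rate
}
gamma_prior_mode(10, 9) # = 1, so the prior mode for each precision is 1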

In the fit_inlamemi function we also need to specify the likelihood for the model of interest, which in this case is Gaussian.

simple_model <- fit_inlamemi(data = simple_data, 
                           formula_moi = main_formula, 
                           formula_imp = imputation_formula, 
                           family_moi = "gaussian",
                           error_type = c("berkson", "classical", "missing"),
                           prior.prec.moi = c(10, 9),
                           prior.prec.berkson = c(10, 9), 
                           prior.prec.classical = c(10, 9),
                           prior.prec.imp = c(10, 9),
                           prior.beta.error = c(0, 1/1000),
                           initial.prec.moi = 1,
                           initial.prec.berkson = 1,
                           initial.prec.classical = 1,
                           initial.prec.imp = 1)

Once we have fit the model, we can view the summary:

summary(simple_model)
#> Formula for model of interest: 
#> y ~ x + z
#> 
#> Formula for imputation model: 
#> x ~ z
#> 
#> Error types: 
#> [1] "berkson"   "classical" "missing"  
#> 
#> Fixed effects for model of interest: 
#>            mean        sd 0.025quant 0.5quant 0.975quant     mode
#> beta.0 1.032389 0.2177000  0.6108145 1.032628   1.437021 1.026118
#> beta.z 1.916272 0.3860412  1.2169144 1.906198   2.559336 1.911731
#> 
#> Coefficient for variable with measurement error and/or missingness: 
#>            mean       sd 0.025quant 0.5quant 0.975quant     mode
#> beta.x 1.973025 0.198868    1.57613 1.974856   2.359133 1.982591
#> 
#> Fixed effects for imputation model: 
#>               mean         sd 0.025quant 0.5quant 0.975quant     mode
#> alpha.x.0 1.033067 0.05061128  0.9337774 1.033074   1.132312 1.033075
#> alpha.x.z 2.024737 0.05227363  1.9222578 2.024721   2.127313 2.024721
#> 
#> Model hyperparameters (apart from beta.x): 
#>                                      mean        sd 0.025quant  0.5quant
#> Precision for model of interest 1.1324156 0.3673197  0.5626732 1.0819324
#> Precision for x berkson model   1.1278492 0.3391049  0.5991666 1.0820368
#> Precision for x classical model 0.9251226 0.1076384  0.7332180 0.9182943
#> Precision for x imp model       0.9774281 0.1239492  0.7552130 0.9699965
#>                                 0.975quant      mode
#> Precision for model of interest   1.992291 0.9892229
#> Precision for x berkson model     1.921300 0.9969446
#> Precision for x classical model   1.156456 0.9035748
#> Precision for x imp model         1.242409 0.9559363

We can also use the default plot method to visualize the fixed effects and the estimated coefficient of the variable with error:

plot(simple_model)