An example of how to estimate the dynamics of a latent factor. Assume that indicators are observed at two time points and you want to estimate how the period-2 value depends on the period-1 value. You need two things:
f₂ = α + β·f₁ + ε.factorana supports this with a two-stage workflow, and two helper functions make the setup a few lines:
define_dynamic_measurement() builds
the Stage 1 measurement model: a 2-factor independent system (or
k-factor, for k periods) with loadings and residual sigmas tied across
periods via equality constraints, but measurement intercepts left
period-specific.build_dynamic_previous_stage()
converts the Stage 1 estimation result into a
previous_stage object for Stage 2, carrying the anchor
period’s intercepts into every factor slot. This anchors the measurement
level under E[f₁] = 0 and lets the observed mean shift
between periods identify the structural intercept α.This vignette walks through a simulated example where α,
β, and σ²_ε are known, shows the recovery, and
explains one subtlety of the workflow (why the anchor-period intercepts
are the right ones to carry into Stage 2).
One latent construct, three linear indicators per period, two periods. The DGP:
\[ f_1 \sim N(0,\sigma^2_{f_1}), \qquad f_2 = \alpha + \beta\,f_1 + \varepsilon, \quad \varepsilon \sim N(0,\sigma^2_\varepsilon) \]
\[ Y_{t,m} = \tau_m + \lambda_m\,f_t + u_{t,m}, \quad u_{t,m} \sim N(0,\sigma_m^2) \]
with measurement parameters τ_m, λ_m,
σ_m shared across the two periods.
library(factorana)
set.seed(41)
n <- 1500
# Structural parameters (what we want to recover)
true_var_f1 <- 1.0
true_alpha <- 0.4
true_beta <- 0.6
true_sigma_e2 <- 0.5
# Shared measurement parameters
item_int <- c(1.5, 1.0, 0.8)
item_load <- c(1.0, 0.9, 1.1) # first loading fixed to 1 in the model
item_sigma <- c(0.7, 0.75, 0.65)
f1 <- rnorm(n, 0, sqrt(true_var_f1))
eps <- rnorm(n, 0, sqrt(true_sigma_e2))
f2 <- true_alpha + true_beta * f1 + eps
gen_Y <- function(f, i) {
item_int[i] + item_load[i] * f + rnorm(length(f), 0, item_sigma[i])
}
dat <- data.frame(
intercept = 1,
eval = 1L,
Y_t1_m1 = gen_Y(f1, 1), Y_t1_m2 = gen_Y(f1, 2), Y_t1_m3 = gen_Y(f1, 3),
Y_t2_m1 = gen_Y(f2, 1), Y_t2_m2 = gen_Y(f2, 2), Y_t2_m3 = gen_Y(f2, 3)
)The wide-format data has columns named
<prefix><item>: Y_t1_m1,
Y_t1_m2, Y_t1_m3 for wave 1, and
Y_t2_m1, … for wave 2.
define_dynamic_measurement()One function call builds the measurement model: a 2-factor independent system (one factor per period) with loadings and sigmas tied across periods, intercepts left free per period.
dyn <- define_dynamic_measurement(
data = dat,
items = c("m1", "m2", "m3"),
period_prefixes = c("Y_t1_", "Y_t2_"),
model_type = "linear",
evaluation_indicator = "eval"
)
# The wrapper generates 5 equality constraints: 2 for loadings (items
# m2, m3; item m1's loading is fixed to 1 on its factor slot) and 3
# for sigmas.
length(dyn$equality_constraints)
#> [1] 5Estimate Stage 1 with estimate_model_rcpp() exactly as
for any model system:
ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1)
result_stage1 <- estimate_model_rcpp(
model_system = dyn$model_system,
data = dat,
control = ctrl,
optimizer = "nlminb",
parallel = FALSE,
verbose = FALSE
)
result_stage1$convergence
#> [1] 0Inspect the intercepts: the wave-1 intercepts should match the DGP
τ_m (because E[f₁] = 0 by convention). The
wave-2 intercepts are shifted by λ_m · E[f₂], which is the
mean-drift artefact we are about to discard.
est <- result_stage1$estimates
tab <- data.frame(
m = 1:3,
DGP_tau = item_int,
wave_1 = round(c(est["Y_t1_m1_intercept"],
est["Y_t1_m2_intercept"],
est["Y_t1_m3_intercept"]), 3),
wave_2 = round(c(est["Y_t2_m1_intercept"],
est["Y_t2_m2_intercept"],
est["Y_t2_m3_intercept"]), 3)
)
knitr::kable(tab, row.names = FALSE)| m | DGP_tau | wave_1 | wave_2 |
|---|---|---|---|
| 1 | 1.5 | 1.488 | 1.905 |
| 2 | 1.0 | 1.000 | 1.332 |
| 3 | 0.8 | 0.778 | 1.234 |
A natural alternative is to stack period-1 and period-2 rows into one
long-format dataset and fit a single-factor CFA. The resulting
intercepts would be the pooled means of Y across
both periods, biased by λ_m · E[f₂]/2 (because the pooled
mean averages the period-1 and period-2 factor means). Plugging those
into Stage 2 under-estimates α by roughly
E[f₂]/2.
The wrapper avoids that: Stage 1 estimates period-specific
intercepts, then build_dynamic_previous_stage() carries
only the anchor period’s intercepts into Stage 2.
build_dynamic_previous_stage() +
SE_lineardummy <- build_dynamic_previous_stage(
dyn = dyn,
stage1_result = result_stage1,
data = dat,
anchor_period = 1L
)
fm_stage2 <- define_factor_model(
n_factors = 2,
n_types = 1,
factor_structure = "SE_linear"
)
ms_stage2 <- define_model_system(
components = list(), # measurement components prepended from previous_stage
factor = fm_stage2,
previous_stage = dummy
)
#> Two-stage SE estimation: Stage 1 (independent) -> Stage 2 (SE_linear)
#> Measurement parameters (loadings, thresholds) will be fixed from Stage 1
#> Factor distribution parameters will use Stage 2 structure
#> Fixing 16 measurement parameters, ignoring 2 factor distribution parameters
init_s2 <- initialize_parameters(ms_stage2, dat, verbose = FALSE)
init_s2$init_params["factor_var_1"] <- unname(dummy$estimates["factor_var_1"])
init_s2$init_params["se_intercept"] <- 0.0
init_s2$init_params["se_linear_1"] <- 0.5
init_s2$init_params["se_residual_var"] <- 0.5
result_stage2 <- estimate_model_rcpp(
model_system = ms_stage2,
data = dat,
init_params = init_s2$init_params,
control = ctrl,
optimizer = "nlminb",
parallel = FALSE,
verbose = FALSE
)
result_stage2$convergence
#> [1] 0est <- result_stage2$estimates
se <- result_stage2$std_errors
ps <- c("factor_var_1", "se_intercept", "se_linear_1", "se_residual_var")
tab <- data.frame(
param = ps,
true = c(true_var_f1, true_alpha, true_beta, true_sigma_e2),
est = round(unname(est[ps]), 4),
se = round(unname(se[ps]), 4)
)
tab$z <- round((tab$est - tab$true) / tab$se, 2)
knitr::kable(tab, row.names = FALSE)| param | true | est | se | z |
|---|---|---|---|---|
| factor_var_1 | 1.0 | 0.9104 | 0.0325 | -2.76 |
| se_intercept | 0.4 | 0.4036 | 0.0217 | 0.17 |
| se_linear_1 | 0.6 | 0.5880 | 0.0236 | -0.51 |
| se_residual_var | 0.5 | 0.5051 | 0.0261 | 0.20 |
se_intercept (the structural α) recovers essentially
exactly, which is the whole point of the workflow. The slope β, residual
variance σ²_ε, and input factor variance σ²_f₁ also recover within their
standard errors.
The standard errors above are model-based (inverse observed
information). When individuals are grouped (for example pupils within
schools, or siblings within households) and that grouping induces
dependence the model does not capture, vcov_factorana()
provides cluster-robust standard errors. Pass cluster as
the name of a grouping column in the data (or a vector of length
nrow(dat)).
Here we attach a synthetic school id (50 schools) and cluster on it:
set.seed(7)
dat$school <- sample(1:50, n, replace = TRUE)
se_cluster <- robust_se(result_stage2, dat, type = "cluster", cluster = "school")
ps <- c("factor_var_1", "se_intercept", "se_linear_1", "se_residual_var")
data.frame(
param = ps,
se_model = round(unname(result_stage2$std_errors[ps]), 4),
se_cluster = round(unname(se_cluster[ps]), 4),
row.names = NULL
)
#> param se_model se_cluster
#> 1 factor_var_1 0.0325 0.0328
#> 2 se_intercept 0.0217 0.0212
#> 3 se_linear_1 0.0236 0.0256
#> 4 se_residual_var 0.0261 0.0302Because the schools were assigned at random here, the cluster-robust and model-based standard errors are close; with genuine within-cluster dependence the cluster-robust values would typically be larger.
Two-stage caveat. These standard errors are computed on the Stage 2 fit and treat the Stage 1 measurement parameters (loadings, sigmas, intercepts) as known. They therefore understate the total uncertainty (the generated-regressor problem). For honest two-stage inference, resample clusters and re-run both stages in a bootstrap; a single-stage robust or cluster-robust covariance is exact only when the measurement system is not itself estimated.
A cluster bootstrap that re-runs both stages on each
resample accounts for the first-stage uncertainty.
bootstrap_factorana_multistage() drives this in one call:
it resamples clusters, and for each replicate fits Stage 1 and then
Stage 2 chained on that replicate’s own Stage 1 fit. Each
(stage, sample) fit is saved to disk and skipped if it
already exists, so the run is restartable and the per-sample work can be
scattered across a compute cluster with
bootstrap_fit_sample().
stage_builders <- list(
# Stage 1: the measurement system (prev_fits is empty for the first stage)
function(prev_fits, data) dyn$model_system,
# Stage 2: SE_linear, chaining on THIS replicate's Stage 1 fit
function(prev_fits, data) {
prev <- build_dynamic_previous_stage(
dyn = dyn, stage1_result = prev_fits[[1]], data = data, anchor_period = 1L)
fm2 <- define_factor_model(n_factors = 2, factor_structure = "SE_linear")
define_model_system(components = list(), factor = fm2, previous_stage = prev)
}
)
boot <- bootstrap_factorana_multistage(
stage_builders, dat, R = 500, cluster = "person",
dir = "boot_run", control = ctrl,
optimizer = "nlminb", parallel = FALSE, verbose = FALSE
)
boot$final$boot_se # Stage 2 bootstrap standard errors (include Stage 1 uncertainty)
boot$final$ci # percentile intervalsFor a multi-node run, generate the samples once with
generate_bootstrap_samples() and call
bootstrap_fit_sample() from an array job (one task per
sample, per stage), then summarise with
collect_bootstrap().
period_prefixes
can have length 3, 4, or more. The wrapper builds a k-factor independent
Stage 1 with all k · (n_items - 1) loadings and
k · n_items sigmas tied across periods. For Stage 2 SE
modelling, pick any two factors (e.g., periods t-1 and
t) and build a 2-factor SE model with
previous_stage.model_type = "oprobit" and n_categories = K.
The wrapper ties the K-1 cutpoints of each item across
periods (in place of sigma).covariates = c("intercept", "x1", ...). Covariate
coefficients on items are estimated per-period in Stage 1 (not tied by
this wrapper); the anchor-period values are carried into Stage 2.?define_factor_model:
factor_structure = "SE_quadratic" adds a quadratic term
γ·f_1² to the structural equation (trap/threshold
dynamics).?define_model_system: the underlying
equality_constraints argument, used by the wrapper.vignette("measurement_system", package = "factorana"):
basics of measurement-system estimation.vignette("roy_model", package = "factorana"): a
different structural setting (discrete sector choice plus continuous
outcomes sharing a latent ability).