v0.1.1 changes the default GAM fitting backend from
mgcv::gam to mgcv::bam (Feature 4
below). bam uses fREML estimation instead of REML, which
differs by ~1-3% in effective degrees of freedom on identical data.
Existing v0.1.0 users will see slightly different EDFs,
asymmetry indices, and per-cell colour fills when they upgrade.
This is the single non-byte-identical change in v0.1.1 — every other
v0.1.1 feature is strictly additive.
Recovery. Set engine = "gam" on
janusplot() or janusplot_data() to reproduce
v0.1.0 numerical output verbatim. The package’s vdiffr visual-regression
suite pins engine = "gam" for exactly this purpose — every
old snapshot remains valid under the backward-compat escape.
bam (with
gam escape)engine = c("bam", "gam") argument,
default "bam". At janusplot’s scale (k = 15-25 vars, 600+
fits per call) mgcv::bam’s block-Lanczos solve + fREML
estimation delivers ~3-10x wall-time speedup vs mgcv::gam
without any new dependency (mgcv already exports
both).bam defaults to fREML
(fast REML); gam defaults to REML. The two methods optimise
the same penalty target via different paths — EDFs differ by ~1-3% on
identical data. v0.1.1 surfaces this as the one numerical break with
v0.1.0, documented prominently above.method argument default is now
NULL, resolved per-engine: "fREML"
for bam, "REML" for gam. Users who passed
method explicitly in v0.1.0 see no behaviour change.discrete = FALSE (bam-only) — opt-in
to mgcv’s covariate-discretisation optimisation. Further ~2-5x speedup
at sub-pixel prediction shift cost.nthreads = 1L (bam-only) — intra-fit
threading. Default 1 to avoid oversubscription when combined with
parallel = TRUE (which fans out across pair-fits
already).engine + method
provenance — both columns now appear in
janusplot(..., with_data = TRUE)$data and on every
janusplot_data()$pairs[[i]] entry. Useful for paper figures
whose methodology section needs to document which backend produced the
EDFs they report.bam inherits from gam —
class(bam_fit) is
c("bam", "gam", "glm", "lm"), so
mgcv::k.check(), predict.gam(), derivative
LP-matrix arithmetic, and every shape-metric extractor work without
modification. Engine is plumbing, not redesign.engine = "gam"
so every legacy visual snapshot continues to validate under the v0.1.0
backward-compat escape. Acts as a regression gate on the
engine = "gam" recovery path.axes rendering knob with four modes:
"original" (default), "standardised",
"centred", "rank". Applied per variable from
raw data; same transform propagates consistently to (a) raw scatter, (b)
spline prediction grid,
"mpg (z)", "mpg (centred)",
"rank(mpg)").mgcv fits are byte-identical across modes — verified in
tests/testthat/test-axes.R via direct column comparison of
with_data output across all four modes on a non-linear
DGP.(k, mode) combination across
k ∈ {2, 5, 10, 15, 20, 26} and
mode ∈ {original, standardised, centred, rank} — 24 panels
— and asserts none emit warnings. Border-label suffixing is
independently verified at k ∈ {3, 12, 20} by walking
patchwork’s plot list and matching against the expected suffix
pattern.n_var >= 25) the cells render only colour fill +
shape-class glyph (no curve, no scatter), so axes is a
documented no-op there. The border labels still pick up the mode suffix
so the user can tell which transform they’re looking at.save_as = NULL — new file-output knob.
When set to a path with extension, writes the assembled matrix to disk
via [ggplot2::ggsave()]; the device is inferred from the extension.
Supported: .png, .pdf, .svg,
.jpg / .jpeg, .tif /
.tiff, .eps, .ps,
.bmp. The function still returns the ggplot — the file is a
side-effect.save_width / save_height /
save_dpi — overrides for the auto-resolved square
(pmax(6, 0.65 * k_n) inches, 300 dpi for raster). Useful
when targeting a fixed-size R Journal figure.compact
argument with values "auto" (default),
"always", and "never". Under
"auto", each matrix renders at a tier resolved from the
pixel-budget ladder:
n_var < 12) — v0.1.0 behaviour: spline + CI
+ scatter
12 <= n_var < 18) — drop raw scatter;
keep spline, CI, colour fill, and at most one annotation
(k_warn if it was requested).18 <= n_var < 25) — spline + colour fill
only.n_var >= 25) — colour mini-tile +
shape-class glyph at the cell centre; no spline.compact_threshold = 12L sets where Tier 1 begins; the full
ladder shifts with it (t2 = t1 + 6,
t3 = t1 + 13). Power users can override the ladder directly
via
compact_levels = list(t1 = ..., t2 = ..., t3 = ...).compact = "never" forces Tier 0 on any matrix, reproducing
v0.1.0 output regardless of n_var.
compact = "always" forces at least Tier 1 even at small
n_var (useful for very dense fixed-size renders).focus_by
argument accepts NA (default — no filter),
"asymmetry", "edf",
"non_linearity" (defined as edf - 1), or
"k_flag". Cells whose chosen metric falls below
focus_threshold (default "q90", a quantile
string, or a numeric cutoff) are rendered in grey85 at
alpha focus_dim_alpha (default 0.25). The
matrix shape is preserved; attention drains visually to high-metric
cells. This is a visual filter, not a statistical one —
the underlying fits and the with_data table are
unchanged.with_data
summary table are byte-identical across compact settings —
a regression-tested invariant.mgcv::k.check():
k_prime, k_index, k_p,
k_flag (Wood’s trifecta:
edf/k' > 0.9 & k-index < 1 & p < 0.05),
and k_check_status ("ok",
"flagged", "unreliable"). Cells with
n_unique(x) < 10 are marked "unreliable"
rather than flagged — k.check()’s simulation p-value is
meaningless at very low unique-x.auto_refit_k = FALSE (default) and
k_max_iter = 2L arguments on janusplot() and
janusplot_data(). With auto_refit_k = TRUE,
every flagged cell is refit with a doubling-k loop until either the flag
clears, the per-cell unique-x cap is reached, or k_max_iter
iterations have passed. Refit trajectory persisted on each cell:
k_initial, k_final, k_iterations,
k_at_cap.k_check_thresholds = list(edf_ratio = 0.9, k_index = 1.0, p = 0.05)
argument. Defaults track mgcv::gam.check() and Wood (2017)
§5.9.cli::cli_inform() summary fires at the end of
janusplot() / janusplot_data() whenever at
least one cell is flagged: total flagged, chance-expected count under α
= 0.05, and a recovery hint pointing at either
auto_refit_k = TRUE or k_max_iter. Quiet on
clean datasets."k_warn" entry in
the annotations vocabulary on janusplot().
When included, flagged cells render a red ! (ASCII) or
⚠ (Unicode) in the top-left corner. Off by default — opt in
with annotations = c("edf", "A", "k_warn").janusplot(..., with_data = TRUE)$data gains 9 new columns:
k_prime, k_index, k_p,
k_flag, k_check_status,
k_initial, k_final, k_iterations,
k_at_cap.mgcv::k.check() runs
its own simulation (default n.rep = 400) for the
basis-deficiency p-value. The diagnostic’s RNG draw is now isolated via
an internal seed-preservation helper, so it does not shift downstream
Monte Carlo consumers — derivative_ci = "simultaneous"
bands and future.seed = TRUE reproducibility are
unaffected.Initial public release of janusplot. The package renders
a pairwise, asymmetric smoothed-association matrix of continuous
variables, with each cell showing the fitted spline from an
mgcv generalised additive model. Upper-triangle cells plot
gam(x_j ~ s(x_i)); lower-triangle cells plot
gam(x_i ~ s(x_j)). The intentional asymmetry surfaces
heteroscedasticity, leverage, and directional non-linearity that a
single scalar correlation hides.
Public surface includes:
janusplot() — the matrix-plot entry point, with
per-cell fit / CI / raw-data controls, hclust-ordered variables,
random-effects adjustment via s(g, bs = "re"), and opt-in
parallel fits via future.apply.janusplot_data() — sibling returning the fitted-model
metadata (EDF, F-test p, asymmetry index, shape classification) without
rendering.janusplot_shape_metrics() +
janusplot_shape_cutoffs() +
janusplot_shape_hierarchy() — the 24-category shape
taxonomy.janusplot_shape_sensitivity() + helpers — recovery-rate
simulation over 12 ground-truth shapes × 7 sample sizes × 500
replicates, with the precomputed shape_sensitivity_demo
dataset for quick inspection.The remainder of this file documents pre-release development history, retained for provenance.
User-visible changes:
data.table dependency removed.
janusplot(..., with_data = TRUE)$data and any other tabular
returns are now always a plain data.frame — no runtime or
documented fallback to data.table::as.data.table(). The
package charter bans data.table as a dependency (plotting
function, overhead unearned). data.table is no longer
listed in Suggests:.janusplot() example no longer passes
show_asymmetry = TRUE (which has been deprecated since
0.0.0.9000 and emitted a soft deprecation warning on every
example render). The default annotations = c("edf", "A")
already surfaces the asymmetry index, so the example is materially
unchanged.Internal / house-style fixes:
match.arg() →
rlang::arg_match() in the two remaining sites
(.shape_glyph() in R/shape-metrics.R and
.display_title() in R/janusplot.R). Brings
every enumerated-argument gate to the same rlang discipline
CONTRIBUTING already documents. Function signatures updated so the enum
set lives on the formal default (as rlang::arg_match()
requires).@family tags added to 7 exports —
janusplot_shape_cutoffs(),
janusplot_shape_hierarchy(),
janusplot_shape_metrics(),
janusplot_shape_sensitivity(),
janusplot_shape_sensitivity_shapes(),
janusplot_shape_sensitivity_summary(),
janusplot_shape_sensitivity_plot(). The
_pkgdown.yml reference index now organises these two
families (shape-metrics and shape-sensitivity)
alongside the existing smooth-associations family,
replacing an alphabetical dump.inst/WORDLIST extended with the 24
Phase-F shape-category names (linear_up,
linear_down, convex_up,
convex_down, concave_up,
concave_down, s_shape, u_shape,
inverted_u, skewed_peak,
broad_peak, rippled_peak,
rippled_monotone, rippled_wave,
warped_wave, complex_wave,
bimodal_ripple, bi_wave,
bi_wave_ripple, …). Keeps
devtools::spell_check() clean.inst/CITATION Article bibentry now
carries email= and comment = c(ORCID = ...)
for consistency with the Manual bibentry, CITATION.cff, and
codemeta.json.DESCRIPTION, CITATION.cff, and
inst/CITATION at 0.0.0.9001.Deferred to a later maintenance sweep (not blocking 0.0.0.9001):
renv.lock snapshot.covr::package_coverage() > 0.85 CI-side floor
assertion.adr/ relocation to docs/adr/.barheight = grid::unit(0.6, "npc") override inside
the guide_colourbar() for both diverging and sequential
scales, and set legend.key.height = grid::unit(1, "null")
in the legend plot’s theme. The bar now tracks the matrix panel height
robustly across figure sizes (corrplot-like behaviour), closing the
visible proportional-scaling mismatch that appeared between narrow and
wide renders. Added a thin frame.colour = "grey50" around
the bar for visual weight.man/figures/lifecycle-*.svg added.
Running usethis::use_lifecycle() copied
lifecycle-experimental, lifecycle-stable,
lifecycle-superseded, and lifecycle-deprecated
into man/figures/. The HTML help pages for every
lifecycle::badge()-tagged function now render the badge
instead of the broken-image placeholder (? in a box) that
appeared when the SVG file was absent.display parameter on
janusplot(). Scalar — one of "fit"
(default, unchanged behaviour for legacy callers), "d1", or
"d2". Controls which single quantity is rendered in every
off-diagonal cell of the matrix. A matrix-level title
("Direct fit", "First derivative f'(x)", or
"Second derivative f''(x)") names the displayed quantity.
To compare fit vs derivative, issue two or three
janusplot() calls and place them side-by-side; each call
keeps its own with_data = TRUE summary table, which now
carries a display column tagging the rendered mode.mgcv’s linear-predictor matrix
via D %*% coef(fit) with variance
D %*% Vp %*% t(D). The method of Wood (2017, §7.2.4), as
popularised for GAMs by Simpson (2018, Frontiers in Ecology and
Evolution). No new dependency (gratia is not
required).derivative_ci parameter on
janusplot() and janusplot_data(). One
of "none" (default), "pointwise", or
"simultaneous". Default is "none" — no CI
ribbon is drawn on derivative panels unless the caller opts in, because
pointwise derivative ribbons over-read local features.
"pointwise" draws fit ± 1.96 * se from the
LP-matrix SE. "simultaneous" draws Monte Carlo
critical-multiplier bands per Simpson (2018): draw \(\tilde{\boldsymbol\beta}_b \sim
N(\hat{\boldsymbol\beta}, V_p)\) and use the \((1-\alpha)\) quantile of the normalised
max-deviation statistic as the critical multiplier on the pointwise
SE.derivative_ci_nsim parameter.
Integer number of Monte Carlo samples used when
derivative_ci = "simultaneous". Default 1000L
(Simpson 2018 uses 10000; 1000 is a throughput-quantile accuracy
compromise affordable for medium matrices).n_grid parameter on
janusplot() and janusplot_data().
Prediction-grid resolution. NULL (default) resolves to 100
when display = "fit" and 200 otherwise. Callers may
override. Larger grids shift the shape-metric values (M,
C, turning / inflection counts) slightly because they are
computed on this same grid — a deliberate side-effect, flagged in
?janusplot and in the Limitations section of the vignette.
Shapes and asymmetry are the primary matrix reading; M/C/counts are
secondary diagnostics.derivatives parameter on
janusplot_data(). Integer vector of orders in
1:2 (multi-order is allowed — unlike
janusplot()’s scalar display, the data
companion returns arbitrary requested orders in a single call). Every
per-pair list in the return carries deriv_yx and
deriv_xy named lists keyed by order, each a data frame with
columns x, fit, se,
lo, hi, ci_type. When
derivative_ci = "simultaneous" each derivative frame also
carries a "crit_multiplier" attribute.references/janusplot-derivatives.bib for re-use in the R
Journal paper.display enforces the
scalar choice via rlang::arg_match();
n_grid < 10 is an error and n_grid > 500
raises an informational note. Orders ≥ 3 are refused by design —
higher-order derivatives of penalised regression splines amplify noise
beyond usable signal at realistic sample sizes (Eilers, Marx &
Durbán 2015).display as a vector (display = c("fit", "d1"))
that stacked panels inside every cell. That pattern is removed; the
scalar API is the only one supported. Callers that reached the
stacked-panel intermediate must switch to one janusplot()
call per quantity.labels parameter with three modes:
"border" (default — variable names along the top + left
margins, mirroring corrplot’s tl.pos = "lt"
convention), "diagonal" (previous in-matrix layout), and
"none" (suppressed). Default flipped to
"border" because border labels free the diagonal cells and
scale better to k > 4 variables.label_srt — rotation of top labels
when labels = "border". Default 45° matches
the visual reference; 0 and 90 are
accepted.label_cex — positive multiplier on
border-label font size. Default 1.labels != "diagonal", giving the matrix a uniform grid
reading.M and C columns renamed to
monotonicity_index and convexity_index across
every data surface: the flat data frame from
janusplot(..., with_data = TRUE),
janusplot_data() per-pair lists
(monotonicity_index_yx / convexity_index_yx /
_xy variants), the return of
janusplot_shape_metrics(), the raw output of
janusplot_shape_sensitivity(), and the precomputed
shape_sensitivity_demo dataset. Paper symbols
M / C remain as mnemonics in the
documentation. Internal classifier parameter names (M,
C) are unchanged because they match the math
convention.janusplot
vignette.result$M / result$C
must update to result$monotonicity_index /
result$convexity_index.janusplot_shape_sensitivity() — new
public function that runs a full-factorial shape-recognition sensitivity
sweep (shapes × sample sizes × noise levels × replicates) and returns a
tidy raw data frame. Optional parallel dispatch via
future.apply.janusplot_shape_sensitivity_shapes() —
lists the 14 canonical ground-truth shapes available to the sweep.janusplot_shape_sensitivity_summary()
— per-cell accuracy aggregation at the fine (24-category) or archetype
(7-family) level.janusplot_shape_sensitivity_plot() —
four built-in diagnostic plots: fine / archetype confusion matrices,
per-shape accuracy grid, recovery curves.shape_sensitivity_demo dataset —
precomputed 2160-fit sweep shipped with the package so the vignette and
examples don’t need to re-run the sweep. Regenerated deterministically
via data-raw/shape_sensitivity_demo.R.shape-recognition-sensitivity.Rmd — presents the sweep
design, the pre-registered hypotheses, and every diagnostic plot on the
precomputed demo dataset.LazyData: true in DESCRIPTION so
shape_sensitivity_demo is accessible without an explicit
data() call under default user expectations.colour_by options.corr
for all three correlation encodings; actual method remains visible in
janusplot_data() column names and the caption.(T, I) dispatch. New fine categories:
skewed_peak, broad_peak,
rippled_peak, wave, warped_wave,
rippled_wave, complex_wave,
bimodal, bimodal_ripple, bi_wave,
bi_wave_ripple, rippled_monotone.label (code). The old right-margin Unicode-glyph list is
retired.annotations default is c("edf", "A"); opt into
per-cell shape markers with annotations = c(..., "shape")
(glyph) or annotations = c(..., "code") (2-letter ASCII —
safer on any font / PDF pipeline). When both are passed,
"code" wins.janusplot(..., with_data)$data gains
shape_code, shape_archetype,
shape_monotonic, shape_linear.
janusplot_data() pairs gain per-direction
shape_code_yx / shape_code_xy and the matching
shape_archetype_*, shape_monotonic_*,
shape_linear_*.janusplot_shape_hierarchy() — returns the 24-row
taxonomy with hierarchy columns (category,
code, archetype, monotonic,
linear, label, gloss). Intended
for downstream group-bys and for cross-referencing the compact 2-letter
codes when they appear in cells or the data table.complex.Academic framing. The broader-tier vocabulary
(linear / non-linear, monotone / non-monotone, convex / concave) is
standard calculus; the archetype layer is anchored by Pya & Wood
(2015) Stat & Comput (shape-constrained additive models)
and Calabrese (2008) Env Tox Chem (dose-response taxonomy). The
(T, I) dispatch is a coarsened Morse-theoretic
critical-point classification (Milnor 1963).
colour_by argument with a diverging RdBu
palette symmetric around zero. Choose colour_by = "pearson"
/ "kendall" for other correlation flavours or
colour_by = "edf" to restore the legacy encoding.fill_by is deprecated — use
colour_by. When supplied, fill_by fires a soft
deprecation warning and forwards to colour_by for one minor
version.annotations — a character vector (subset of
c("edf", "A", "shape")) now controls which corner labels
render on each cell. "A" (asymmetry index) replaces the old
n = ... annotation; "edf" keeps EDF visible as
text; "shape" draws the new shape-category glyph
bottom-right.show_asymmetry is deprecated — use
annotations. When supplied, the legacy argument fires a
soft deprecation warning and is merged into
annotations.M, convexity
C), two discrete counts (n_turning_points,
n_inflections), and a discrete shape_category
from a 12-category taxonomy (linear_up,
linear_down, convex_up,
concave_up, convex_down,
concave_down, u_shape,
inverted_u, s_shape, complex,
flat, indeterminate). Cutoffs are tunable via
janusplot_shape_cutoffs().janusplot_shape_metrics() — public
function to compute the shape descriptor for any fitted
mgcv::gam with a single s() term.janusplot_shape_cutoffs() — public
function returning the default threshold list; callers override
individual thresholds via ....annotations
includes "shape", a compact legend listing every category
present in the matrix is attached alongside the colour bar. Toggle with
show_shape_legend.glyph_style argument ("unicode" default,
"ascii" fallback) for pipelines that lack Unicode curve
glyphs.janusplot(..., with_data = TRUE)$data gains columns
cor_pearson, cor_spearman,
cor_kendall, tie_ratio, M,
C, n_turning_points,
n_inflections, flat_range_ratio,
shape_category, colour_value. The legacy
fill_value column is renamed
colour_value.janusplot_data() pairs — each
per-pair element gains Pearson / Spearman / Kendall correlations, tie
ratio, and per-direction shape descriptors (M_yx,
C_yx, M_xy, C_xy,
n_turning_*, n_inflect_*,
shape_yx, shape_xy).janusplot() — asymmetric smoothed-association matrix
visualisation. For each pair of numeric variables
(X_i, X_j) with i != j, the cell at matrix
position [i, j] renders the fitted spline from
mgcv::gam(X_j ~ s(X_i) + <adjust>). Upper and lower
triangles show the two directional fits. Diagonal cells carry variable
labels.janusplot_data() — programmatic companion returning raw
GAM fits and per-pair metrics (EDF, F-test p-value, deviance explained,
asymmetry index) without constructing a ggplot. Set
keep_fits = TRUE to retain the full mgcv::gam
objects.|EDF_yx − EDF_xy| / (EDF_yx + EDF_xy), bounded in
[0, 1], exposed both via
janusplot_data()$pairs[[i]]$asymmetry_index and
(optionally) as a cell corner annotation via
show_asymmetry = TRUE.adjust = — any
one-sided formula RHS (e.g. ~ s(z) + s(g, bs = "re"))
propagates to every pairwise GAM, producing covariate-adjusted smooths
and supporting random effects out of the box.order = "hclust" — reorder variables
by hierarchical clustering of 1 - |cor| (matching the
corrplot convention) to group strongly correlated variables
visually.na_action = "pairwise" (default) vs
"complete" — per-pair complete observations with annotated
n_used per cell, vs listwise deletion.future.apply — set
parallel = TRUE to dispatch pair fits across a
user-configured future::plan().palette =
accepts one of 16 options:
viridis
(default), magma, inferno,
plasma, cividis, mako,
rocket;turbo (not
colourblind-safe);RdYlBu,
RdBu, PuOr;Spectral;YlOrRd,
YlGnBu, Blues, Greens.fill_by != "none", placed in a dedicated fixed-width column
(1.8 cm) so cells remain square.k —
n and EDF labels shrink sublinearly as the
number of variables grows, keeping small-cell plots legible.n,
EDF, A, fill encoding, and significance
glyphs), showing only the keys actually displayed.with_data = TRUE — optional flat
per-cell summary table returned alongside the ggplot, as a
data.table when the package is installed or a
data.frame otherwise.vdiffr visual-regression
snapshots.R CMD check --as-cran with 0 ERRORs, 0 WARNINGs;
2 NOTEs remain (both environmental: new-submission status; local HTML
Tidy version).r = 0.93, janusplot recovers an asymmetry
index ≈ 0.56 (IQR [0.52, 0.65]), exposing hidden directional structure a
scalar correlation misses.Imports kept minimal: mgcv, ggplot2,
patchwork, grid, stats,
cli, rlang. Optional: data.table,
future.apply, vdiffr, withr,
palmerpenguins, MASS, agridat,
knitr, rmarkdown.