---
title: "Large-data linking with myIO"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Large-data linking with myIO}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## When to use this

myIO's default SVG path handles up to about 20,000 rendered marks
comfortably. Beyond that, the big-data tier uses a coordinator, an engine
adapter, and optional Canvas or WebGL rendering so charts can respond to
brush and zoom interactions over millions of rows. Opt in per chart by
calling `setBigData(widget, source)`.

## Installation

Big-data features require two optional components.

First, install the Suggested R packages used for Arrow encoding, DuckDB
queries, downloads, checksums, and status output:

```{r install-packages, eval = FALSE}
install.packages(c("arrow", "duckdb", "DBI", "base64enc", "cli", "curl", "openssl"))
```

Second, install the DuckDB-WASM runtime when you plan to use the browser
engine. The runtime is downloaded on demand by
`myIO::install_duckdb_wasm()` and is not bundled in the CRAN tarball. It
is cached under `tools::R_user_dir("myIO", "cache")`. For air-gapped
machines, place `duckdb-mvp.wasm` and `duckdb-browser-mvp.worker.js` in a
local directory and call `install_duckdb_wasm(from = "/path/to/dir")`.

```{r install-wasm, eval = FALSE}
myIO::install_duckdb_wasm()
myIO::duckdb_wasm_status()
```
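
On an air-gapped machine, a quick check that the local directory actually holds both runtime files can prevent a failed install. The sketch below is only an illustration: `missing_wasm_files()` is a hypothetical helper, not a myIO function, and it checks file names only, not file contents or versions.

```{r missing-wasm-files, eval = FALSE}
# Hypothetical helper (not part of myIO): return the names of the two
# DuckDB-WASM runtime files that are absent from `dir`.
missing_wasm_files <- function(dir) {
  required <- c("duckdb-mvp.wasm", "duckdb-browser-mvp.worker.js")
  required[!file.exists(file.path(dir, required))]
}

# Only hand the directory to myIO once both files are present.
offline_dir <- "/path/to/dir"
absent <- missing_wasm_files(offline_dir)
if (length(absent) == 0) {
  myIO::install_duckdb_wasm(from = offline_dir)
} else {
  message("Not installing; missing: ", paste(absent, collapse = ", "))
}
```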

## Attaching big data

`setBigData()` accepts several source types. A `data.frame` is encoded as
inline Arrow IPC and embedded in the widget. This is convenient for
portable HTML, but `setBigData()` warns when the encoded payload exceeds
50 MB and errors when it exceeds 200 MB.

```{r data-frame-source, eval = FALSE}
library(myIO)

# One million rows; `id` is the row key used for linking.
big <- data.frame(
  id = seq_len(1e6),
  x = rnorm(1e6),
  y = rnorm(1e6)
)

myIO(engine = "wasm") |>
  addIoLayer(type = "point", label = "points",
             mapping = list(x_var = "x", y_var = "y")) |>
  setBigData(big, rowkey_col = "id")
```
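
To see where a data frame falls relative to the 50 MB and 200 MB limits before attaching it, you can measure its serialized size yourself. The sketch below assumes that myIO measures the size of the Arrow IPC stream and that 1 MB means 1024^2 bytes; neither detail is confirmed here, and `inline_ipc_verdict()` is a hypothetical helper.

```{r inline-size-check, eval = FALSE}
# Hypothetical helper (not part of myIO): classify a serialized Arrow IPC
# payload against the documented 50 MB warning / 200 MB error limits.
# Assumption: 1 MB = 1024^2 bytes; myIO may count megabytes differently.
inline_ipc_verdict <- function(bytes, warn_mb = 50, error_mb = 200) {
  mb <- bytes / 1024^2
  if (mb > error_mb) {
    "error"
  } else if (mb > warn_mb) {
    "warn"
  } else {
    "ok"
  }
}

# Usage (requires arrow): serialize to an in-memory IPC stream, then measure.
# ipc <- arrow::write_to_raw(big, format = "stream")
# inline_ipc_verdict(length(ipc))
```

As rough arithmetic for `big` above: one integer column (4 bytes per value) and two double columns (8 bytes each) come to 20 bytes per row, or about 20 million bytes (roughly 19 MB) for a million rows before Arrow's small metadata overhead, so it should fall in the `"ok"` range.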

An `arrow::Table` uses the same inline IPC path, and therefore the same
size limits.

```{r arrow-source, eval = FALSE}
library(arrow)
library(myIO)

# Reuses `big` from the data.frame example above.
tab <- arrow_table(big)

myIO(engine = "wasm") |>
  addIoLayer(type = "point", label = "points",
             mapping = list(x_var = "x", y_var = "y")) |>
  setBigData(tab, rowkey_col = "id")
```

For larger static assets, pass a local path or URL ending in `.parquet`,
`.csv`, `.arrow`, or `.feather`.

```{r file-source, eval = FALSE}
# Local Parquet file
myIO(engine = "wasm") |>
  addIoLayer(type = "histogram", label = "x",
             mapping = list(x_var = "x")) |>
  setBigData("data/observations.parquet", rowkey_col = "id")

# Remote CSV file
myIO(engine = "wasm") |>
  addIoLayer(type = "point", label = "remote",
             mapping = list(x_var = "x", y_var = "y")) |>
  setBigData("https://example.org/observations.csv", rowkey_col = "id")
```
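
Because file sources are identified by extension, a path or URL can be checked before it reaches `setBigData()`. `source_kind()` below is a hypothetical helper that mirrors the rule stated above; it is not how myIO itself parses sources. It relies on `tools::file_ext()`, so a URL that ends in a query string rather than the extension would be rejected.

```{r source-kind, eval = FALSE}
# Hypothetical helper (not part of myIO): classify a file source by
# location (URL vs local path) and by its extension.
source_kind <- function(src) {
  ext <- tools::file_ext(src)
  supported <- c("parquet", "csv", "arrow", "feather")
  if (!(ext %in% supported)) {
    stop("Unsupported extension '", ext, "'; expected one of: ",
         paste0(".", supported, collapse = ", "), call. = FALSE)
  }
  list(
    location = if (grepl("^https?://", src)) "url" else "local",
    format = ext
  )
}

source_kind("data/observations.parquet")            # local Parquet file
source_kind("https://example.org/observations.csv")  # remote CSV file
```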

A `DBI` connection works only with the `"server"` engine. Also pass
`table = "..."` so myIO knows which table's schema to read.

```{r dbi-source, eval = FALSE}
library(DBI)
library(duckdb)
library(myIO)

# In-memory DuckDB database holding the `big` data frame from above.
con <- dbConnect(duckdb())
dbWriteTable(con, "observations", big)

myIO(engine = "server") |>
  addIoLayer(type = "point", label = "points",
             mapping = list(x_var = "x", y_var = "y")) |>
  setBigData(con, table = "observations", rowkey_col = "id")
```
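
Because a `DBI` source is read by table name, it can help to confirm that the table and the row-key column exist before building the widget. The sketch below uses only standard `DBI` generics (`dbExistsTable()`, `dbListFields()`, `dbQuoteIdentifier()`, `dbGetQuery()`). `check_rowkey_table()` is hypothetical, and treating duplicate row keys as a problem is an assumption, not a documented myIO requirement.

```{r dbi-preflight, eval = FALSE}
library(DBI)

# Hypothetical pre-flight check (not part of myIO). It confirms the table
# exists, contains the row-key column, and that the key has no duplicates.
# Requiring a unique key is an assumption, not a documented myIO rule.
check_rowkey_table <- function(con, table, rowkey_col) {
  if (!dbExistsTable(con, table)) {
    stop("Table '", table, "' not found", call. = FALSE)
  }
  if (!(rowkey_col %in% dbListFields(con, table))) {
    stop("Column '", rowkey_col, "' not found in '", table, "'", call. = FALSE)
  }
  sql <- sprintf(
    "SELECT COUNT(*) AS n, COUNT(DISTINCT %s) AS n_distinct FROM %s",
    as.character(dbQuoteIdentifier(con, rowkey_col)),
    as.character(dbQuoteIdentifier(con, table))
  )
  counts <- dbGetQuery(con, sql)
  counts$n == counts$n_distinct
}

# Small self-contained demo on a separate in-memory DuckDB database.
demo_con <- dbConnect(duckdb::duckdb())
dbWriteTable(demo_con, "observations",
             data.frame(id = 1:3, x = c(0.1, 0.5, 0.9), y = c(2, 4, 6)))
key_is_unique <- check_rowkey_table(demo_con, "observations", "id")
dbDisconnect(demo_con)
```

Against the live connection from the example above, the equivalent call would be `check_rowkey_table(con, "observations", "id")`.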

## The engine argument

Choose the engine with the `engine` argument of `myIO()`:

- `"auto"` (the default) resolves to `"server"` inside a Shiny session and
  to `"wasm"` otherwise.
- `"server"` runs queries in R with `duckdb` and streams Arrow batches to
  the browser. It is a good fit when a Shiny server already exists.
- `"wasm"` runs DuckDB in the browser from the cached WASM runtime, which
  suits static Quarto or R Markdown HTML.
- `"svg"` forces the legacy SVG path without the coordinator and is mainly
  useful for testing.
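
The `"auto"` rule described above amounts to a small decision function. The sketch below restates it in R for clarity; `resolve_engine()` and its `in_shiny` argument are hypothetical and stand in for whatever session detection myIO performs internally.

```{r engine-resolution, eval = FALSE}
# Hypothetical sketch (not myIO's source) of how the documented "auto"
# rule maps to a concrete engine. `in_shiny` stands in for myIO's own
# session detection (for example, something like shiny::isRunning()).
resolve_engine <- function(engine = c("auto", "server", "wasm", "svg"),
                           in_shiny = FALSE) {
  engine <- match.arg(engine)
  if (engine != "auto") {
    return(engine)  # explicit choices are kept as-is
  }
  if (in_shiny) "server" else "wasm"
}

resolve_engine()                        # "wasm"
resolve_engine(in_shiny = TRUE)         # "server"
resolve_engine("svg", in_shiny = TRUE)  # "svg"
```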

## Crosstalk threshold

By default, myIO broadcasts row keys to `crosstalk::SharedData` only when
the selection contains 100,000 rows or fewer. At or below this threshold,
sibling htmlwidgets such as plotly, leaflet, and reactable can react to
myIO brushes. Above it, myIO suppresses the upward broadcast: myIO-to-myIO
linking still works through predicates, a one-time informational message
is logged to the console, and the footer badge reads
`linked: predicate-only`.

Tune the limit with:

```{r threshold-option, eval = TRUE}
# Broadcast row keys for selections of up to 50,000 rows (default 100,000).
options(myIO.crosstalk_threshold = 50000L)
```

The threshold is per selection, not per chart. A narrow brush on a
million-row source can still broadcast if it matches few rows.
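
The per-selection rule can be restated as a single comparison. In the hypothetical `should_broadcast()` below, the default threshold is read from the `myIO.crosstalk_threshold` option with a fallback of 100,000; this mirrors the documented behavior but is not myIO's source code.

```{r threshold-sketch, eval = FALSE}
# Hypothetical sketch (not myIO's source) of the per-selection rule:
# broadcast row keys to crosstalk only when the number of selected rows
# is at or below the threshold.
should_broadcast <- function(n_selected,
                             threshold = getOption("myIO.crosstalk_threshold", 100000L)) {
  n_selected <= threshold
}

# A narrow brush on a million-row source still broadcasts...
should_broadcast(n_selected = 1200, threshold = 100000L)    # TRUE
# ...while a wide brush on the same source is predicate-only.
should_broadcast(n_selected = 350000, threshold = 100000L)  # FALSE
```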

## File-protocol limitation

When a Quarto or R Markdown HTML file is opened directly from the file
manager (that is, over the `file://` protocol), Chromium-based browsers
block dynamic module imports. myIO detects this, falls back to the SVG
path, and logs a one-time informational message to the console. To use
the WASM engine with a local static HTML file, serve it over HTTP, for
example with `servr::httd()` or `quarto preview`.

## Performance expectations

| Input rows | Engine | Renderer | Interaction |
|---|---|---|---|
| <= 20k | `svg` (default) | D3 SVG | Full brush/zoom, publication-quality |
| 20k-100k | `svg` + aggregation | D3 SVG | Smooth; tooltips on pre-aggregated data |
| 100k-1M | `wasm` or `server` | Canvas or WebGL | Brush re-aggregation under 200 ms (WASM) or under 500 ms (server, Shiny) |
| 1M-10M | `wasm` or `server` | WebGL | Target: 60 fps pan/zoom; brush re-aggregation under 300 ms |

## Limits and gotchas

- Inline IPC payloads above 200 MB hard-error; use a file path or a `DBI`
  connection instead.
- Whether a selection broadcasts to Crosstalk depends on how many rows the
  current selection matches, not on the size of the source.
- The WASM binary is about 22 MB. It downloads once per user per version
  and is cached indefinitely; clear it with `clear_duckdb_wasm_cache()`.
- On Posit Connect or shinyapps.io, use the `"server"` engine;
  `install_duckdb_wasm()` is not needed on the server.

## Minimal complete example

```{r minimal-example, eval = FALSE}
library(myIO)

# One-time setup (run once per machine, not on every render):
# install.packages(c("arrow", "duckdb", "DBI", "base64enc", "cli", "curl", "openssl"))
# myIO::install_duckdb_wasm()

set.seed(1)
n <- 250000
events <- data.frame(
  id = seq_len(n),
  time = as.POSIXct("2026-01-01", tz = "UTC") + seq_len(n),  # one event per second
  x = rnorm(n),
  y = rnorm(n),
  group = sample(LETTERS[1:4], n, replace = TRUE)
)

myIO(engine = "wasm") |>
  addIoLayer(type = "point", label = "events",
             mapping = list(x_var = "x", y_var = "y", color = "group")) |>
  setBrush(direction = "xy") |>
  setBigData(events, rowkey_col = "id")
```
