| Title: | Evaluate the Discrepancy Between Two Real-Time Updated Probabilistic Forecasts |
| Version: | 0.1.0 |
| Description: | Methods from Yeh, Rice, and Dubin (2022) <doi:10.1080/00031305.2021.1967781> for comparing two continuously updated probabilistic forecasts under squared (Brier) loss: pointwise loss and variance, a global delta test (Monte Carlo p-values), simulation designs, and a naive pointwise band plot. |
| URL: | https://github.com/chikuang/rtForecastEval |
| BugReports: | https://github.com/chikuang/rtForecastEval/issues |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | dplyr (≥ 1.0.1), ggplot2 (≥ 3.3.2), stats (≥ 4.0.3), tidyr (≥ 1.1.1), RSpectra (≥ 0.16.0), rlist (≥ 0.4.6.1), purrr (≥ 0.3.4), MASS (≥ 7.3-52) |
| Suggests: | pkgload (≥ 1.2.0), knitr (≥ 1.28.0), readr (≥ 1.3.1), reshape2 (≥ 1.4.4), rmarkdown (≥ 2.1.0), gridExtra (≥ 2.3.0), tibble (≥ 3.0.3), sde |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-04-24 02:21:34 UTC; chikuang |
| Author: | Chi-Kuang Yeh |
| Maintainer: | Chi-Kuang Yeh <cyeh@gsu.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-28 18:30:02 UTC |
rtForecastEval: Compare real-time probabilistic forecasts
Description
Methods from Yeh, Rice, and Dubin (2022) doi:10.1080/00031305.2021.1967781 for comparing two continuously updated probabilistic forecasts under squared (Brier) loss: pointwise loss and variance, a global delta test (Monte Carlo p-values), simulation designs, and a naive pointwise band plot.
Author(s)
Maintainer: Chi-Kuang Yeh cyeh@gsu.edu (ORCID)
Other contributors:
Gregory Rice [contributor, thesis advisor]
Joel A. Dubin [contributor, thesis advisor]
References
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). Evaluating real-time probabilistic forecasts with application to National Basketball Association outcome prediction. The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781 (PDF: https://arxiv.org/pdf/2010.00781.pdf).
See Also
Useful links:
Report bugs at https://github.com/chikuang/rtForecastEval/issues
Pointwise mean loss difference and variance
Description
For each time grid point, estimates the pointwise difference in expected
squared (Brier) loss between two probabilistic forecasts, and the variance
factor used for inference (Yeh, Rice, and Dubin, 2022). With default loss
L(x,y) = (x-y)^2, the function computes the mean over games of
L(Y, phat_1) - L(Y, phat_2) at each time, and the influence-style terms
si used to form sigma-squared over four, matching the paper replication
code.
Usage
calc_L_s2(
df,
pA = "phat_1",
pB = "phat_2",
Y = "Y",
grid = "grid",
L = function(x, y) (x - y)^2
)
Arguments
df |
A data frame with one row per (game × time) in long format. |
pA, pB |
Column names for the two forecast vectors (probabilities between 0 and 1). |
Y |
Column name for the binary outcome (0/1). |
grid |
Column name for the normalized time grid between 0 and 1. |
L |
Loss function; default is squared error, \(L(x,y)=(x-y)^2\). |
Value
A tibble with one row per grid value and columns L (mean loss
difference), sigma2 (variance factor), and n (number of rows).
References
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Examples
library(dplyr)
library(rtForecastEval)
ngame <- 8L
nsamp <- 6L
gr <- seq(0, 1, length.out = nsamp)
df <- tidyr::expand_grid(grid = gr, game = seq_len(ngame)) %>%
mutate(
phat_A = stats::runif(dplyr::n()),
phat_B = stats::runif(dplyr::n()),
Y = stats::rbinom(dplyr::n(), 1L, 0.5)
)
calc_L_s2(df, "phat_A", "phat_B", "Y", "grid")
Delta test statistic for comparing two forecasting methods
Description
Computes the global test statistic Z for comparing two real-time forecasters under squared loss: Z equals (n_game / n_samp) times the sum over grid points of the squared pointwise mean loss differences, matching the implementation in the paper replication code (utility.R).
Usage
calc_Z(
df,
pA = "phat_1",
pB = "phat_2",
Y = "Y",
grid = "grid",
nsamp,
ngame,
L = function(x, y) (x - y)^2
)
Arguments
df |
Data frame containing forecasts, outcomes, and |
pA, pB |
Names of columns with the two probability forecasts. |
Y |
Name of the binary outcome column. |
grid |
Name of the column with normalized times between 0 and 1. |
nsamp |
Number of distinct time points (length of the grid). |
ngame |
Number of independent replicates (e.g. games). |
L |
Loss function; default is squared error. |
Value
A single numeric value of the test statistic Z.
References
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Examples
library(dplyr)
library(rtForecastEval)
ngame <- 8L
nsamp <- 6L
gr <- seq(0, 1, length.out = nsamp)
df <- tidyr::expand_grid(grid = gr, game = seq_len(ngame)) %>%
mutate(
phat_A = stats::runif(dplyr::n()),
phat_B = stats::runif(dplyr::n()),
Y = stats::rbinom(dplyr::n(), 1L, 0.5)
)
calc_Z(df, "phat_A", "phat_B", "Y", "grid", nsamp = nsamp, ngame = ngame)
Leading eigenvalues for the delta test covariance
Description
Forms the empirical covariance matrix of either centered or non-centered
forecast differences across time points (see diff_cent vs diff_non_cent
in the paper), and returns the leading eigenvalues scaled by one over
nsamp, for use with calc_pval(). This matches the construction in the
replication utility.R (calc_p_val / eigs_sym).
Usage
calc_eig(df, n_eig = 10, ngame, nsamp, grid = "grid", cent = FALSE)
Arguments
df |
Data frame containing |
n_eig |
Number of leading eigenvalues to extract (dimension D in the paper). |
ngame |
Number of independent games (used for scaling inner products). |
nsamp |
Number of time grid points. |
grid |
Name of the time grid column. |
cent |
If |
Value
A numeric vector of length n_eig of eigenvalues (descending).
References
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Examples
library(dplyr)
library(rtForecastEval)
ngame <- 8L
nsamp <- 6L
gr <- seq(0, 1, length.out = nsamp)
df <- tidyr::expand_grid(grid = gr, game = seq_len(ngame)) %>%
mutate(
phat_A = stats::runif(dplyr::n()),
phat_B = stats::runif(dplyr::n()),
diff_non_cent = phat_A - phat_B
)
calc_eig(df, n_eig = 4L, ngame = ngame, nsamp = nsamp, cent = FALSE)
Monte Carlo p-value and quantiles for the delta test
Description
Given leading eigenvalues of the covariance operator used in the delta test
(from calc_eig()) and the observed statistic \(Z\) (from calc_Z()),
draws a Monte Carlo sample from the weighted sum of chi-square(1) variables
and returns the right-tail p-value and quantiles of the null distribution
(as in the paper’s replication code).
Usage
calc_pval(Z, eig, quan, n_MC = 5000)
Arguments
Z |
Observed test statistic from |
eig |
Numeric vector of leading eigenvalues (length |
quan |
Probabilities for which to return quantiles of the simulated null
statistic (e.g. |
n_MC |
Number of Monte Carlo draws (default |
Details
Monte Carlo draws use R's current random-number stream. The function does
not call set.seed; call it yourself beforehand if you
need reproducible p_val and quantiles.
Value
A list with p_val (one-sided p-value) and quantile (named
vector of quantiles at quan).
References
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Examples
if (requireNamespace("sde", quietly = TRUE)) {
library(dplyr)
nsamp <- 20L
ngame <- 25L
df <- df_gen(N = nsamp, Ngame = ngame) %>%
dplyr::group_by(grid) %>%
dplyr::mutate(
p_bar_12 = mean(phat_A - phat_B),
diff_non_cent = phat_A - phat_B,
diff_cent = phat_A - phat_B - p_bar_12
) %>%
dplyr::ungroup()
Z <- calc_Z(df, "phat_A", "phat_B", "Y", "grid", nsamp, ngame)
eig <- calc_eig(df, n_eig = 5L, ngame = ngame, nsamp = nsamp, cent = FALSE)
set.seed(1)
calc_pval(Z, eig, quan = c(0.9, 0.95), n_MC = 300L)
}
Simulate real-time probabilistic forecasts (paper designs)
Description
Generates replicated “games” and two competing forecast trajectories on a
uniform grid from 0 to 1, following the simulation constructions used in
Yeh, Rice, and Dubin (2022) (see sanity_generator / simulation scripts in
the replication repository). Ornstein–Uhlenbeck (type = "OU") or Brownian
motion (type = "BM") noise can drive the latent processes; outputs include
binary outcomes Y and forecast probabilities phat_A, phat_B suitable
for calc_Z(), calc_eig(), etc.
Usage
df_gen(N, Ngame, type = c("OU", "BM"), a = 1, b = 0.27)
Arguments
N |
Number of interior time steps (grid has |
Ngame |
Number of independent replicates (games). |
type |
|
a, b |
Constants controlling the drift of the latent trajectory (see replication code). |
Details
Requires the sde package (Suggests) for Brownian paths.
Value
A tibble with columns including game, grid, Y, phat_A,
phat_B, and latent W1, W2, etc., in long format.
References
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Examples
if (requireNamespace("sde", quietly = TRUE)) {
d <- df_gen(N = 15L, Ngame = 12L)
head(d)
}
Linear interpolation of a forecast trajectory onto a grid
Description
Maps a vector of probabilities prob observed at times grid onto an
equally spaced grid of length length(grid) between 0 and 1 using linear
interpolation (stats::approx). Useful when aligning ESPN or model outputs
to the common grid used in the NBA analysis (see replication pre_process /
utility.R).
Usage
lin_interp(prob, grid, outcome)
Arguments
prob |
Numeric vector of forecast probabilities. |
grid |
Numeric vector of time points (same length as |
outcome |
Scalar binary outcome \(Y\) to attach to every interpolated row. |
Value
A tibble with columns phat_approx, grid, and Y.
References
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Examples
library(rtForecastEval)
g <- seq(0, 1, length.out = 9L)
lin_interp(prob = seq(0.1, 0.9, length.out = 9L), grid = g, outcome = 1L)
Naive pointwise confidence band for mean loss difference
Description
Plots the pointwise mean loss difference from calc_L_s2() (column L)
against time, with a normal-theory band using sigma2 and the point sample
size n. This is the naive \(t\)-style band from the paper; global
inference uses calc_Z() / calc_pval() instead.
Usage
plot_pcb(
df,
grid = "grid",
L = "L",
var = "sigma2",
title = "Pointwise mean loss difference (A vs B)",
subtitle = "Naive normal band (95%); use calc_pval() for global test",
caption = NULL,
xlab = "Normalized time (grid)",
ylab = "Mean squared loss difference"
)
Arguments
df |
Output of |
grid |
Name of the x-axis column (default |
L |
Name of the pointwise mean difference column (default |
var |
Name of the variance column (default |
title, subtitle, caption |
Passed to |
xlab, ylab |
Axis labels (see |
Details
This plot summarizes relative skill (loss difference) over time — not a
classical calibration plot (forecast vs observed event rate). For a
simple reliability-style view from the same long-format data, see the package
vignette example that bins phat and plots mean Y vs mean forecast.
Value
A ggplot2 object.
References
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Examples
library(dplyr)
library(rtForecastEval)
ngame <- 8L
nsamp <- 6L
gr <- seq(0, 1, length.out = nsamp)
df <- tidyr::expand_grid(grid = gr, game = seq_len(ngame)) %>%
mutate(
phat_A = stats::runif(dplyr::n()),
phat_B = stats::runif(dplyr::n()),
Y = stats::rbinom(dplyr::n(), 1L, 0.5)
)
tmp <- calc_L_s2(df, "phat_A", "phat_B", "Y", "grid")
plot_pcb(tmp)