| Title: | Bayesian Error Propagation and Forecast Uncertainty Decomposition |
|---|---|
| Description: | Provides a full pipeline from regularized or standard regression models (elastic net, linear models, generalized linear models, random forests) to informed Bayesian priors, structured forecast uncertainty decomposition (parameter / environmental / residual, plus a temporal component when the model carries an autocorrelation term), and forecast shelf life analysis (the quantification of when a forecast becomes uninformative). Designed for ecological and genomic forecasting with climate or environmental covariates. Methods build on Bürkner (2017) <doi:10.18637/jss.v080.i01> for Bayesian regression via 'Stan', Friedman, Hastie, and Tibshirani (2010) <doi:10.18637/jss.v033.i01> for elastic net regularization, Wright and Ziegler (2017) <doi:10.18637/jss.v077.i01> for random forests, and Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4> for leave-one-out cross-validation. |
| Authors: | Luis Javier Madrigal-Roca [aut, cre], John Kelly [aut] |
| Maintainer: | Luis Javier Madrigal-Roca <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-30 15:14:44 UTC |
| Source: | https://github.com/luismadrigal98/errortracer |
Returns a data.frame with the uncertainty decomposition stored
inside an et_prediction object:
Variance of the posterior linear predictor — captures uncertainty in fitted regression coefficients.
Additional variance arising from measurement or
prediction uncertainty in the predictor values (estimated via
perturbation in et_predict). Zero when
env_noise = NULL.
Posterior mean of (or its
family-specific analogue) — biological process noise, unmeasured
drivers, and drift. For autocorrelation models this is the
innovation variance, not the stationary marginal variance;
the autocorrelated accumulation is reported separately in
temporal_var.
(Only present when the model formula contains an
autocorrelation term such as ar(), ma(),
arma(), cosy(), unstr(), sar(), or
car().) Variance attributable to residual temporal or
spatial dependence beyond the iid param + residual sum,
computed as
pmax(0, total_var - (param_var + residual_var)).
env_var is deliberately excluded from this gap because it
is an additive perturbation-based augmentation measured outside
of posterior_predict.
Variance of the full posterior predictive draws, including any autocorrelation structure modelled by brms.
decompose_uncertainty(predictions, ...)decompose_uncertainty(predictions, ...)
predictions |
An |
... |
Unused. |
All variance components are guaranteed non-negative. When
temporal_var is present, param_var + residual_var +
temporal_var reconstructs total_var (modulo Monte Carlo error);
when it is absent, param_var + residual_var does.
env_var is always additive on top, representing the extra
variance that would be contributed by perturbing predictors with
env_noise.
A data.frame with columns
obs_id, param_var, env_var, residual_var, total_var
(plus temporal_var for autocorrelation models, and a leading
group column for grouped predictions).
set.seed(1) df <- data.frame(y = rnorm(20), x1 = rnorm(20)) fit <- et_fit(y ~ x1, data = df, chains = 1, iter = 500, warmup = 250, cores = 1, refresh = 0) new_df <- data.frame(x1 = rnorm(5)) pred <- et_predict(fit, newdata = new_df, env_noise = list(x1 = 0.2), n_draws = 200, n_perturb = 50) decomp <- decompose_uncertainty(pred) head(decomp)set.seed(1) df <- data.frame(y = rnorm(20), x1 = rnorm(20)) fit <- et_fit(y ~ x1, data = df, chains = 1, iter = 500, warmup = 250, cores = 1, refresh = 0) new_df <- data.frame(x1 = rnorm(5)) pred <- et_predict(fit, newdata = new_df, env_noise = list(x1 = 0.2), n_draws = 200, n_perturb = 50) decomp <- decompose_uncertainty(pred) head(decomp)
Computes observed coverage probability at multiple nominal CI levels. A well-calibrated Bayesian model should produce 90% CIs that contain the true value 90% of the time, etc.
et_calibrate(predictions, observed, response_col = NULL, ci_levels = NULL, ...)et_calibrate(predictions, observed, response_col = NULL, ci_levels = NULL, ...)
predictions |
An |
observed |
A |
response_col |
Character. Name of the response column in
|
ci_levels |
Numeric vector. CI levels to assess. Defaults to all
levels present in the |
... |
Unused. |
A data.frame with columns:
Nominal CI level.
Same as ci_level.
Fraction of true values falling inside the CI.
Number of observations used.
Signed difference: observed - nominal. Positive = over-coverage (CIs too wide / conservative). Negative = under-coverage (CIs too narrow / overconfident).
Mean CI width across observations. Sharpness and calibration are complementary: a model can be calibrated but useless if sharpness is poor (very wide CIs).
For grouped predictions, a group column is prepended.
set.seed(1) df <- data.frame(y = rnorm(20), x1 = rnorm(20)) fit <- et_fit(y ~ x1, data = df, chains = 1, iter = 500, warmup = 250, cores = 1, refresh = 0) valid_df <- data.frame(y = rnorm(5), x1 = rnorm(5)) pred <- et_predict(fit, newdata = valid_df, n_draws = 200, n_perturb = 50) cal <- et_calibrate(pred, observed = valid_df, response_col = "y") print(cal)set.seed(1) df <- data.frame(y = rnorm(20), x1 = rnorm(20)) fit <- et_fit(y ~ x1, data = df, chains = 1, iter = 500, warmup = 250, cores = 1, refresh = 0) valid_df <- data.frame(y = rnorm(5), x1 = rnorm(5)) pred <- et_predict(fit, newdata = valid_df, n_draws = 200, n_perturb = 50) cal <- et_calibrate(pred, observed = valid_df, response_col = "y") print(cal)
Computes Rhat, effective sample size ratios, divergent transitions, and
leave-one-out cross-validation (LOO-CV) for a fitted et_model.
et_diagnose(model, loo = TRUE, ...)et_diagnose(model, loo = TRUE, ...)
model |
An |
loo |
Logical. Whether to run LOO-CV (can be slow; default
|
... |
Unused. |
A list with elements:
List: rhat_max, rhat_all_ok,
neff_min, neff_all_ok, n_divergences.
List or NULL: elpd_loo, p_loo,
looic, n_bad_pareto_k, loo_object.
Printed summary from brms::summary().
For et_model_list, a named list of per-group diagnostic lists
plus an aggregated summary data.frame.
Wraps brms::brm() and attaches the prior specification,
training data reference, and configuration for downstream uncertainty
decomposition. Pass priors from extract_priors to
use regularized-model coefficients as prior means; omit it for
default (weakly informative) priors.
et_fit( formula, data, priors = NULL, chains = 4L, iter = 2000L, warmup = floor(iter/2), cores = min(chains, parallel::detectCores()), seed = 42L, adapt_delta = 0.95, max_treedepth = 12L, grouping = NULL, eiv = NULL, silent = 2L, ... )et_fit( formula, data, priors = NULL, chains = 4L, iter = 2000L, warmup = floor(iter/2), cores = min(chains, parallel::detectCores()), seed = 42L, adapt_delta = 0.95, max_treedepth = 12L, grouping = NULL, eiv = NULL, silent = 2L, ... )
formula |
An R formula, e.g.\ |
data |
A |
priors |
An |
chains |
Integer. Number of MCMC chains (default 4). |
iter |
Integer. Total iterations per chain, including warmup (default 2000). |
warmup |
Integer. Warmup iterations per chain (default
|
cores |
Integer. Parallel cores (default
|
seed |
Integer. Random seed for reproducibility (default 42). |
adapt_delta |
Numeric. Target acceptance probability for HMC (default 0.95). |
max_treedepth |
Integer. Maximum tree depth (default 12). |
grouping |
Character. Name of a column in |
eiv |
Optional errors-in-variables specification. A named list /
vector mapping predictor names to either a scalar SD or a vector of
per-row SDs (length |
silent |
Integer passed to |
... |
Additional arguments passed to |
An et_model object (or an et_model_list if
grouping is specified).
set.seed(1) df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20)) ps <- extract_priors(lm(y ~ x1 + x2, data = df)) fit <- et_fit(y ~ x1 + x2, data = df, priors = ps, chains = 1, iter = 500, warmup = 250, cores = 1, refresh = 0) print(fit)set.seed(1) df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20)) ps <- extract_priors(lm(y ~ x1 + x2, data = df)) fit <- et_fit(y ~ x1 + x2, data = df, priors = ps, chains = 1, iter = 500, warmup = 250, cores = 1, refresh = 0) print(fit)
A well-calibrated model produces points along the 1:1 diagonal. Points above the diagonal indicate over-coverage (conservative); below indicates under-coverage (anti-conservative).
et_plot_calibration(cal, group_col = NULL)et_plot_calibration(cal, group_col = NULL)
cal |
A |
group_col |
Optional character. Name of a column in |
A ggplot2 object.
Compares Bayesian posterior estimates (95% CI) with the regularized coefficient values used as prior means (shown as crosses).
et_plot_coefficients(model)et_plot_coefficients(model)
model |
An |
A ggplot2 object.
Produces a stacked bar chart showing the relative contributions of
parameter, environmental, and residual variance for each observation,
plus a fourth temporal-autocorrelation component when present in
decomp (see decompose_uncertainty).
et_plot_decomposition(decomp, proportional = TRUE, group_col = NULL)et_plot_decomposition(decomp, proportional = TRUE, group_col = NULL)
decomp |
A |
proportional |
Logical. If |
group_col |
Character. Optional name of a grouping column in
|
A ggplot2 object.
Overlays nested credible interval ribbons on the median forecast, with optional observed values for calibration assessment.
et_plot_forecast( predictions, observed = NULL, response_col = NULL, time_col = NULL )et_plot_forecast( predictions, observed = NULL, response_col = NULL, time_col = NULL )
predictions |
An |
observed |
Optional |
response_col |
Character. Name of the response column in
|
time_col |
Character. Column in |
A ggplot2 object.
Overlays prior and posterior density for each predictor coefficient, visualising how much the data update the priors.
et_plot_prior_posterior(model, max_preds = 8L, n_prior_draws = 4000L)et_plot_prior_posterior(model, max_preds = 8L, n_prior_draws = 4000L)
model |
An |
max_preds |
Integer. Maximum number of predictors to show (default 8). Predictors are shown in the order they appear in the prior specification. |
n_prior_draws |
Integer. Number of random draws for the prior density (default 4000). |
A ggplot2 object.
Visualises the output of et_sensitivity_profile: for each
noise grid point, shows how the environmental share of total
variance and the forecast horizon respond. Observed horizons
are drawn as solid points; projected horizons are hollow points;
lower-bound rows are drawn as upward arrows at the last informative time.
et_plot_sensitivity(sens, show = c("horizon", "env_share", "ratio"))et_plot_sensitivity(sens, show = c("horizon", "env_share", "ratio"))
sens |
An |
show |
|
The x-axis is the noise fraction (when the grid was built from
fraction_grid) or the grid step label otherwise. When both a
numeric fraction and a descriptive label exist, the fraction is preferred
for continuous x-positioning.
A ggplot2 object.
Shows how credible interval width grows over the forecast horizon and marks the threshold beyond which the forecast is uninformative.
et_plot_shelf_life(sl, show_ratio = TRUE)et_plot_shelf_life(sl, show_ratio = TRUE)
sl |
An |
show_ratio |
Logical. If |
A ggplot2 object.
Generates posterior predictive draws for new observations, propagates
environmental measurement uncertainty through the model, and computes
credible intervals. The resulting et_prediction object is the
input to decompose_uncertainty, shelf_life,
and the plotting functions.
et_predict( model, newdata, env_noise = NULL, env_cov = NULL, env_dist = NULL, n_draws = 2000L, ci_levels = c(0.5, 0.8, 0.9, 0.95), n_perturb = NULL, n_env_draws = 1L, interval_type = c("predictive", "linpred"), include_env_in_ci = FALSE, ... )et_predict( model, newdata, env_noise = NULL, env_cov = NULL, env_dist = NULL, n_draws = 2000L, ci_levels = c(0.5, 0.8, 0.9, 0.95), n_perturb = NULL, n_env_draws = 1L, interval_type = c("predictive", "linpred"), include_env_in_ci = FALSE, ... )
model |
An |
newdata |
A |
env_noise |
Environmental measurement / prediction uncertainty. Can be:
|
env_cov |
Correlation structure of the environmental noise. The
magnitudes of the noise come from
A correlation derived from training data is a working assumption: the structure of the errors is assumed to mirror the structure of the values. When this is implausible, pass a matrix directly. |
env_dist |
Distributional form of the per-predictor noise. The
Distributions:
Correlation ( |
n_draws |
Integer. Number of posterior draws to use (default 2000; capped at the number of draws available in the fit). |
ci_levels |
Numeric vector. Credible interval levels to compute
(default |
n_perturb |
Integer. Number of posterior draws used for the
environmental perturbation step (default |
n_env_draws |
Integer. Number of independent environmental
perturbations averaged per posterior draw when estimating
|
interval_type |
Character. Which draws to use when computing credible intervals:
The decomposition components and |
include_env_in_ci |
Logical. When |
... |
Passed to methods. |
An et_prediction object (list) containing:
posterior_predictMatrix [n_draws x n_obs]: full
posterior predictive draws (parameter + residual uncertainty).
posterior_linpredMatrix [n_draws x n_obs]: linear
predictor draws on the link scale (parameter uncertainty only).
lp_perturbedMatrix [n_perturb x n_obs]: linear
predictor on the link scale computed on environmentally perturbed inputs.
sigma_drawsNumeric vector: posterior draws of sigma
(NA for families without a sigma parameter, e.g.\ Binomial).
credible_intervalsdata.frame with columns
row_id, ci_level, lower, median, upper, width.
decompositiondata.frame with columns
obs_id, total_var, param_var, env_var, v_env_mcse, residual_var.
All components are on the response scale.
v_env_mcse is the Monte Carlo SE of env_var.
residual_var is per-observation for non-Gaussian families
(e.g.\ Binomial: ) and constant
for Gaussian.
newdataThe input newdata.
modelReference to the et_model used.
env_covThe correlation matrix actually
used for perturbation (identity for env_cov = NULL).
env_distNamed character vector mapping each predictor to the distribution actually used for its perturbation.
decompose_uncertainty, shelf_life,
et_calibrate
Sweeps a grid of noise magnitudes, re-runs et_predict for
each, and summarises how the environmental variance component and the
forecast shelf life respond. This turns the subjective choice of
env_noise into an auditable curve: instead of reporting a
single shelf life at one (often hand-picked) measurement-error level, the
user sees how the horizon degrades as assumed noise grows.
et_sensitivity_profile( model, newdata, response_scale, fraction_grid = NULL, absolute_grid = NULL, env_cov = NULL, env_dist = NULL, include_env_in_ci = TRUE, ci_level = 0.9, ci_levels = c(0.5, 0.8, 0.9, 0.95), threshold = 1, time_col = NULL, n_draws = 2000L, n_perturb = NULL, max_extrapolation_factor = 10, verbose = TRUE, plausible_range = NULL )et_sensitivity_profile( model, newdata, response_scale, fraction_grid = NULL, absolute_grid = NULL, env_cov = NULL, env_dist = NULL, include_env_in_ci = TRUE, ci_level = 0.9, ci_levels = c(0.5, 0.8, 0.9, 0.95), threshold = 1, time_col = NULL, n_draws = 2000L, n_perturb = NULL, max_extrapolation_factor = 10, verbose = TRUE, plausible_range = NULL )
model |
An |
newdata |
|
response_scale |
Two-element numeric vector |
fraction_grid |
Optional numeric vector of scalar noise fractions. |
absolute_grid |
Optional |
env_cov, env_dist
|
Passed through to |
include_env_in_ci |
Logical. If |
ci_level |
Numeric. Used for shelf life and CI width tracking
(default 0.90). Must be in |
ci_levels |
Numeric vector of CI levels computed per iteration
(default |
threshold |
Numeric. Shelf-life threshold (default 1.0). |
time_col |
Character. Optional time column for shelf life. |
n_draws, n_perturb
|
Passed to |
max_extrapolation_factor |
Passed to |
verbose |
Logical. If |
plausible_range |
Deprecated. Use |
Three argument styles are supported for the grid:
fraction_grid: scalar fractions of each predictor's SD in
newdata. These are passed straight to the scalar form of
et_predict's env_noise. Use this for an apples-
to-apples sweep that scales all predictors together.
absolute_grid: a list of named numeric lists /
vectors; each list element becomes a single env_noise argument
(so you can sweep absolute SDs for one predictor while holding others
fixed).
Neither supplied (NULL): a default log-spaced fraction grid
c(0, 0.01, 0.025, 0.05, 0.1, 0.2, 0.4, 0.8) is used.
Each iteration calls et_predict(..., env_noise = g) then
shelf_life and records:
Mean parameter / environmental / residual / total variance across forecast observations.
The horizon description ("observed", "projected", or
"lower_bound") and numeric horizon if any.
Mean / max CI width / plausible-range ratio.
An et_sensitivity object: a data.frame with one row
per grid point and columns
grid_id1-indexed step.
labelShort descriptor of the env_noise used
(e.g.\ "fraction = 0.05").
fractionScalar fraction when applicable; NA
for absolute-grid rows.
param_var, env_var, residual_var, total_varMean across forecast observations.
env_shareenv_var / (env_var + total_var) —
the fraction of combined predictive variance attributable to
predictor noise (bounded in ).
ci_width_mean, ci_width_maxAt ci_level.
ratio_mean, ratio_maxCI width / plausible range.
horizon_typeOne of "observed", "projected",
"lower_bound".
horizonNumeric horizon (observed or projected) or
NA for lower-bound rows.
horizon_descriptionThe long description returned by
shelf_life.
The returned object carries the original call and response_scale
as attributes for use by et_plot_sensitivity.
et_predict, shelf_life,
et_plot_sensitivity
A named list containing training data, forecast predictors, validation responses, and ground-truth parameters for two synthetic SNP clusters (A and B) at a hypothetical mountain plant site. The dataset is designed to exercise every step of the ErrorTracer pipeline and to support parameter-recovery validation (true coefficients are known).
et_simet_sim
A named list with seven elements:
traindata.frame with 100 rows and 6 columns (2 clusters
50 training years, 1970–2019):
Integer. Calendar year.
Character. SNP cluster identifier: "A" or
"B".
Numeric. Standardised mean growing-season temperature (original unit: °C; standardised using training-period mean and SD).
Numeric. Standardised total growing-season precipitation (original unit: mm).
Numeric. Standardised peak snow water equivalent (original unit: mm).
Numeric. Simulated allele-frequency change on the
arcsin-sqrt scale ( transformation).
This transformation is unbounded and has no fixed
[, 1] constraint; the plausible range should be derived
from the training data or from the theoretical bounds
(\ on the arcsin scale).
forecastdata.frame with 30 rows and 5 columns (2 clusters
15 forecast years, 2020–2034). Same columns as train except
z_diff is absent — these are the prediction targets. Predictors
are standardised using the training-period statistics stored in
standardization.
validationdata.frame with 30 rows and 3 columns (year,
cluster_id, z_diff). True response values for the
forecast period; used with et_calibrate to assess
posterior predictive coverage.
true_paramsNamed list with one element per cluster. Each element is a named
numeric vector of the true data-generating parameters:
intercept, Tmean, PPT, SWE, and
sigma (residual SD).
intercept = 0.00, Tmean = 0.50,
PPT = -0.30, SWE = 0.20, sigma = 0.30
intercept = 0.10, Tmean = 0.30,
PPT = -0.20, SWE = -0.25, sigma = 0.35
env_noiseNamed list. Suggested per-predictor environmental noise SDs (on
the standardised scale) for use with the env_noise argument of
et_predict: Tmean = 0.30, PPT = 0.20,
SWE = 0.15.
standardizationNamed list with one element per predictor. Each element is a
named numeric vector c(mean = ..., sd = ...) giving the
training-period statistics used to standardise the data. Needed if you
wish to back-transform predictions to the original (unstandardised)
scale with unstandardize.
descriptionCharacter string. Human-readable description of the dataset and how it was generated.
The climate time series are simulated as AR(1) processes with a warming
trend in Tmean (+0.015 °C yr, cumulating to ~0.30 SD
above the training mean over the 15-year forecast window). SWE
is generated with a weak dependence on Tmean (negative coupling)
and PPT (positive coupling) plus a dominant independent noise
component, so that the three predictors have only mild pairwise
correlation (|R| < 0.2 on the training subset). This makes all
regression coefficients reliably identifiable in a single replicate.
All predictors are standardised using training-period statistics only.
In addition to the three real climate predictors, the dataset includes
ten independent standardised nuisance predictors d1, ..., d10
with zero true effect on the response. They give the regularized-prior
extraction step a real selection job: cv.glmnet (alpha = 0.5) is expected
to identify the three real predictors and shrink the ten dummies toward
zero. Without them, regularization on three well-identified predictors
merely biases the true coefficients without selecting anything.
Responses are generated from a linear model with Gaussian noise:
The true parameters are stored in et_sim$true_params for
parameter-recovery validation.
The dataset was generated with set.seed(111). The full generation
script is at data-raw/generate_et_sim.R.
Generated by data-raw/generate_et_sim.R with
set.seed(111).
data(et_sim) # Inspect structure str(et_sim, max.level = 2) # Training data head(et_sim$train) # True parameters for cluster A et_sim$true_params$A # Suggested noise SDs for et_predict() et_sim$env_noisedata(et_sim) # Inspect structure str(et_sim, max.level = 2) # Training data head(et_sim$train) # True parameters for cluster A et_sim$true_params$A # Suggested noise SDs for et_predict() et_sim$env_noise
Minimal ggplot2 theme for ErrorTracer plots
et_theme(base_size = 12)et_theme(base_size = 12)
base_size |
Base font size (default 12). |
A ggplot2 theme object.
Converts a fitted model object into a brms prior specification
suitable for et_fit. The coefficient estimates (or
importance weights for ranger) from the regularized or standard fit are used as
informative prior means, so the Bayesian model starts close to the
regularized or standard solution while remaining open to data-driven revision.
extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, ... ) ## S3 method for class 'lm' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, ... ) ## S3 method for class 'glm' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, ... ) ## S3 method for class 'cv.glmnet' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, lambda = "lambda.min", ... ) ## S3 method for class 'glmnet' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, s = 1L, ... ) ## S3 method for class 'ranger' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, ... )extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, ... ) ## S3 method for class 'lm' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, ... ) ## S3 method for class 'glm' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, ... ) ## S3 method for class 'cv.glmnet' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, lambda = "lambda.min", ... ) ## S3 method for class 'glmnet' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, s = 1L, ... ) ## S3 method for class 'ranger' extract_priors( model, multiplier = 2, min_sd = 0.1, intercept_prior_sd = NULL, sigma_prior_scale = 1, ... )
model |
A fitted model. Supported classes:
|
multiplier |
Numeric scalar. Prior SD is set to
|
min_sd |
Numeric scalar. Minimum prior SD to avoid degenerate (spike) priors on near-zero coefficients. Default 0.1. |
intercept_prior_sd |
Optional prior SD for the intercept term. The
default |
sigma_prior_scale |
Scale parameter for the half-Cauchy prior on the residual SD sigma. Default 1.0. |
... |
Additional arguments passed to methods. |
lambda |
Which lambda to extract coefficients from when the model is a
|
s |
Index (integer) or value of lambda to extract coefficients at when
the model is a plain |
For ranger models, signed coefficients are not available. Priors
are centred at zero (direction unknown) and the prior SD for each predictor
is set to multiplier * importance_normalised, where importance is
normalised to the [min_sd, 1] interval. Only variables with
positive permutation importance are included.
An et_prior_spec list containing:
priorA brmsprior object for use in brms::brm().
pred_namesCharacter vector of included predictor names.
coefsNamed numeric vector of regularized coefficients (NULL for ranger, which uses importance instead).
methodCharacter: the dispatch method used.
multiplier, min_sd
Settings echo.
fit_lm <- lm(mpg ~ wt + hp + cyl, data = mtcars) ps <- extract_priors(fit_lm, multiplier = 2, min_sd = 0.1) print(ps)fit_lm <- lm(mpg ~ wt + hp + cyl, data = mtcars) ps <- extract_priors(fit_lm, multiplier = 2, min_sd = 0.1) print(ps)
Quantifies when a forecast becomes uninformative by comparing the
width of credible intervals to a response scale. A forecast is
uninformative when its CI width exceeds threshold * response_scale.
shelf_life( predictions, response_scale, ci_level = 0.9, threshold = 1, time_col = NULL, min_slope_for_projection = 1e-04, max_extrapolation_factor = 10, ..., plausible_range = NULL )shelf_life( predictions, response_scale, ci_level = 0.9, threshold = 1, time_col = NULL, min_slope_for_projection = 1e-04, max_extrapolation_factor = 10, ..., plausible_range = NULL )
predictions |
An |
response_scale |
Numeric vector of length 2 ( |
ci_level |
Numeric. The credible interval level to use (default
0.90). Must be present in the |
threshold |
Numeric. CI width / response scale above which the forecast is uninformative (default 1.0). |
time_col |
Character. Optional column in
|
min_slope_for_projection |
Numeric. Minimum linear slope (of
ratio vs.\ time) required to attempt extrapolation when all periods
are informative. Below this value the shelf life is reported as a
lower bound only. Default |
max_extrapolation_factor |
Numeric. Cap on how far the linear
projection may reach beyond the observed window. If the projected
crossing time exceeds
|
... |
Unused. |
plausible_range |
Deprecated. Use |
The function operates in three modes depending on the available data and whether the uninformative threshold is crossed within the forecast window:
The threshold is crossed within the
forecast/validation window. The shelf life is the first time point
at which ratio >= threshold.
All forecast periods remain informative but
the CI/range ratio is trending upward. A linear trend is fitted to
the ratios and extrapolated to estimate when the threshold would be
reached. The projected crossing time
(where is the threshold, the fitted intercept,
the fitted slope) is reported together with a Monte Carlo
standard error se_t_star derived via the delta method.
All forecast periods are informative with
no upward trend in the ratio. The shelf life is reported as a lower
bound: > last observed time.
The intended workflow is:
Fit the model (et_fit).
Predict on held-out data or a future time window
(et_predict).
Call shelf_life() on those predictions.
If the threshold is not crossed in the held-out window, the projected mode extrapolates the horizon automatically.
An et_shelf_life object (a data.frame) with columns:
Observation index.
Time axis value.
Width of the credible interval at ci_level.
Effective response scale (scalar diff).
CI width / response scale.
Logical; TRUE when ratio < threshold.
For grouped predictions a group column is prepended.
The object carries three attributes that drive print():
horizonNamed list with elements value,
type ("observed", "projected", or
"lower_bound"), last_informative,
description, and (for projected) se_t_star.
horizon_by_groupFor grouped objects: named list of per-group horizon lists.
thresholdThe threshold value used.
set.seed(1) df <- data.frame(y = rnorm(20), year = 2001:2020, x1 = rnorm(20)) fit <- et_fit(y ~ x1, data = df, chains = 1, iter = 500, warmup = 250, cores = 1, refresh = 0) new_df <- data.frame(x1 = rnorm(10), year = 2021:2030) pred <- et_predict(fit, newdata = new_df, n_draws = 200, n_perturb = 50) sl <- shelf_life(pred, response_scale = range(df$y), ci_level = 0.90, threshold = 1.0, time_col = "year", max_extrapolation_factor = 10) print(sl)set.seed(1) df <- data.frame(y = rnorm(20), year = 2001:2020, x1 = rnorm(20)) fit <- et_fit(y ~ x1, data = df, chains = 1, iter = 500, warmup = 250, cores = 1, refresh = 0) new_df <- data.frame(x1 = rnorm(10), year = 2021:2030) pred <- et_predict(fit, newdata = new_df, n_draws = 200, n_perturb = 50) sl <- shelf_life(pred, response_scale = range(df$y), ci_level = 0.90, threshold = 1.0, time_col = "year", max_extrapolation_factor = 10) print(sl)
Standardize a numeric vector (mean-centre, unit-variance)
standardize(x)standardize(x)
x |
Numeric vector. |
Standardized numeric vector. Returns zeros if variance is zero.
Reverse standardization
unstandardize(z, mu, s)unstandardize(z, mu, s)
z |
Standardized values. |
mu |
Original mean. |
s |
Original standard deviation. |
Values on the original scale.