arviz_stats.update_subsample
arviz_stats.update_subsample(loo_orig, data, observations=None, var_name=None, reff=None, log_weights=None, seed=315, method='lpd', log_lik_fn=None, param_names=None, log=True)
Update a sub-sampled PSIS-LOO-CV object with new observations.
Extends a sub-sampled PSIS-LOO-CV result by adding new observations to the sub-sample without recomputing values for previously sampled observations, so the estimate can be refined incrementally as more observations are included.
The sub-sampling method is described in [1].
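The incremental idea can be illustrated with a small, library-independent sketch: new indices are drawn without replacement from the observations that are not yet in the sub-sample, and the earlier indices are kept as they are. This is only a conceptual illustration under stated assumptions, not the code arviz_stats runs internally; the helper name `draw_new_indices` and the starting indices are hypothetical.

```python
import random


def draw_new_indices(n_total, already_sampled, n_new, seed=315):
    """Pick n_new indices from 0..n_total-1 that are not in already_sampled.

    Hypothetical helper for illustration; not part of arviz_stats.
    """
    sampled = set(already_sampled)
    remaining = [i for i in range(n_total) if i not in sampled]
    if n_new > len(remaining):
        raise ValueError("not enough unsampled observations left")
    rng = random.Random(seed)
    # sample() draws without replacement, so no index is picked twice
    return sorted(rng.sample(remaining, n_new))


original = [0, 2, 5, 7]  # made-up indices of a first sub-sample of 8 observations
new = draw_new_indices(8, original, 2)
combined = sorted(original + new)  # earlier indices are reused, only `new` is added
print(new, len(combined))
```

Fixing the seed makes the draw reproducible, which mirrors the role of the `seed` argument when `observations` is given as an integer.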
- Parameters:
- loo_orig: ELPDData
Original PSIS-LOO-CV result created with loo_subsample with pointwise=True.
- data: xarray.DataTree or InferenceData
Input data. It should contain the posterior and the log_likelihood groups.
- observations: int or ndarray, optional
The additional observations to use:
  - An integer specifying the number of new observations to randomly sub-sample without replacement.
  - An array of integer indices specifying the exact new observations to use.
  - If None or 0, returns the original PSIS-LOO-CV result unchanged.
- var_name: str, optional
The name of the variable in the log_likelihood group storing the pointwise log likelihood data to use for the LOO computation.
- reff: float, optional
Relative MCMC efficiency, ess / n, i.e. number of effective samples divided by the number of actual samples. Computed from trace by default.
- log_weights: xarray.DataArray or ELPDData, optional
Smoothed log weights. Can be either:
  - A DataArray with the same shape as the log likelihood data.
  - An ELPDData object from a previous arviz_stats.loo call.
Defaults to None. If not provided, it will be computed using the PSIS-LOO method.
- seed: int, optional
Seed for random sampling.
- method: str, optional
Method used for approximating the pointwise log predictive density:
  - 'lpd': Use the standard log predictive density approximation (default).
  - 'plpd': Use the Point Log Predictive Density approximation, which requires a log_lik_fn.
- log_lik_fn: callable, optional
A function that computes the log-likelihood for a single observation given the mean values of posterior parameters. Required only when method="plpd". The function must accept the observed data value for a single point as its first argument (scalar). Subsequent arguments must correspond to the mean values of the posterior parameters specified by param_names, passed in the same order. It should return a single scalar log-likelihood value.
- param_names: list, optional
Only used when method="plpd". List of parameter names to extract from the posterior. If None, all parameters are used.
- log: bool, optional
Only used when method="plpd". Whether log_lik_fn returns the log-likelihood (True) or the likelihood (False). Default is True.
- Returns:
ELPDData
Object with the following attributes:
elpd: updated approximated expected log pointwise predictive density (elpd)
se: standard error of the elpd (includes approximation and sampling uncertainty)
p: effective number of parameters
n_samples: number of samples in the posterior
n_data_points: total number of data points (N)
warning: True if the estimated shape parameter k of the Pareto distribution exceeds good_k for any observation in the subsample.
elpd_i: DataArray with pointwise elpd values (filled with NaNs for non-subsampled points), only if pointwise=True.
pareto_k: DataArray with Pareto shape values for the subsample (filled with NaNs for non-subsampled points), only if pointwise=True.
scale: scale of the elpd results (“log”, “negative_log”, or “deviance”).
good_k: Threshold for Pareto k warnings.
approx_posterior: True if approximate posterior was used.
subsampling_se: Standard error estimate from subsampling uncertainty only.
subsample_size: Number of observations in the subsample (original + new).
log_p: Log density of the target posterior.
log_q: Log density of the proposal posterior.
thin: Thinning factor for posterior draws.
log_weights: Smoothed log weights.
See also
loo
Exact PSIS-LOO cross-validation.
loo_subsample
PSIS-LOO-CV with subsampling.
compare
Compare models based on ELPD.
References
[1] Magnusson, M., Riis Andersen, M., Jonasson, J., & Vehtari, A. Bayesian Leave-One-Out Cross-Validation for Large Data. Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4244–4253 (2019). https://proceedings.mlr.press/v97/magnusson19a.html (arXiv preprint: https://arxiv.org/abs/1904.10679)
Examples
Calculate an initial sub-sampled PSIS-LOO-CV using 4 observations, then update it with 2 more:
In [1]: from arviz_stats import loo_subsample, update_subsample
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data("non_centered_eight")
   ...: initial_loo = loo_subsample(data, observations=4, var_name="obs", pointwise=True)
   ...: updated_loo = update_subsample(initial_loo, data, observations=2)
   ...: updated_loo

Out[1]:
Computed from 2000 by 6 subsampled log-likelihood values from 8 total observations.

         Estimate       SE subsampling SE
elpd_loo   -30.8      1.4            0.2
p_loo        1.0

------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.70]   (good)        6  100.0%
   (0.70, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%
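For method="plpd", the log_lik_fn contract described under Parameters (observed scalar first, then posterior-mean parameters in param_names order, returning a scalar log-likelihood) might look like the sketch below for a Normal likelihood. The function name `normal_log_lik` and the parameter names "mu" and "sigma" are assumptions made for illustration; they do not refer to variables in any particular dataset, and the call to update_subsample is shown only as a comment.

```python
import math


def normal_log_lik(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at one observed value y.

    y is the observed scalar; mu and sigma stand for posterior-mean
    parameter values (hypothetical names chosen for this sketch).
    """
    return (
        -0.5 * math.log(2.0 * math.pi)
        - math.log(sigma)
        - 0.5 * ((y - mu) / sigma) ** 2
    )


# Hypothetical call, not executed here; it assumes the posterior group
# contains variables named "mu" and "sigma":
# updated_loo = update_subsample(
#     loo_orig, data, observations=2, method="plpd",
#     log_lik_fn=normal_log_lik, param_names=["mu", "sigma"],
# )

# Quick sanity check of the function on its own
at_mean = normal_log_lik(0.0, 0.0, 1.0)
off_mean = normal_log_lik(1.0, 0.0, 1.0)
print(at_mean > off_mean)
```

Because the function returns a log-likelihood, it matches the default log=True; a function returning the plain likelihood would need log=False.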