arviz_stats.update_subsample

arviz_stats.update_subsample(loo_orig, data, observations=None, var_name=None, reff=None, log_weights=None, seed=315, method='lpd', log_lik_fn=None, param_names=None, log=True)

Update a sub-sampled PSIS-LOO-CV object with new observations.

Extends a sub-sampled PSIS-LOO-CV result by adding new observations to the sub-sample without recomputing values for previously sampled observations. This allows for incrementally improving the sub-sampled PSIS-LOO-CV estimate with additional observations.

The sub-sampling method is described in [1].

Parameters:
loo_orig: ELPDData

Original PSIS-LOO-CV result created with loo_subsample with pointwise=True.

data: xarray.DataTree or InferenceData

Input data. It should contain the posterior and the log_likelihood groups.

observations: int or ndarray, optional

The additional observations to use:

  • An integer specifying the number of new observations to randomly sub-sample without replacement.

  • An array of integer indices specifying the exact new observations to use.

  • If None or 0, returns the original PSIS-LOO-CV result unchanged.
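The integer form amounts to drawing new indices, without replacement, from the observations that are not yet in the sub-sample. The following standard-library sketch illustrates that idea only; it is not the library's internal code, and `draw_new_indices`, `n_total`, and `already_sampled` are hypothetical names introduced for illustration.

```python
import random


def draw_new_indices(n_total, already_sampled, n_new, seed=315):
    """Pick n_new indices from range(n_total) that are not in already_sampled.

    Hypothetical helper: illustrates sampling without replacement from the
    observations that have not been sub-sampled yet.
    """
    sampled = set(already_sampled)
    remaining = [i for i in range(n_total) if i not in sampled]
    if n_new > len(remaining):
        raise ValueError("not enough unsampled observations left")
    rng = random.Random(seed)  # seeded for reproducibility
    return sorted(rng.sample(remaining, n_new))


# 8 observations in total, 4 already sub-sampled, 2 new ones requested
new_idx = draw_new_indices(n_total=8, already_sampled=[0, 2, 5, 7], n_new=2)
print(new_idx)
```

Passing an explicit index array instead fixes exactly which observations are added, which can be useful when the choice has to be reproducible independently of the seed.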

var_name: str, optional

The name of the variable in the log_likelihood group that stores the pointwise log likelihood data to use for the LOO computation.

reff: float, optional

Relative MCMC efficiency, ess / n, i.e. the effective sample size divided by the actual number of samples. Computed from the trace by default.
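As a quick arithmetic illustration of this definition, using made-up numbers rather than values from a real trace:

```python
# Hypothetical values, for illustration only
n_chains, n_draws = 4, 500
ess = 1600.0  # assumed effective sample size

n = n_chains * n_draws  # actual number of posterior samples
reff = ess / n          # relative MCMC efficiency
print(reff)
```

A value passed explicitly through `reff` would be a single float of this kind.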

log_weights: xarray.DataArray or ELPDData, optional

Smoothed log weights. Can be either:

  • A DataArray with the same shape as the log likelihood data.

  • An ELPDData object from a previous arviz_stats.loo call.

Defaults to None, in which case the weights are computed using the PSIS-LOO method.

seed: int, optional

Seed for random sampling.

method: str, optional

Method used for approximating the pointwise log predictive density:

  • ‘lpd’: Use standard log predictive density approximation (default)

  • ‘plpd’: Use Point Log Predictive Density approximation which requires a log_lik_fn.

log_lik_fn: callable, optional

A function that computes the log-likelihood for a single observation given the mean values of posterior parameters. Required only when method="plpd". The function must accept the observed data value for a single point as its first argument (scalar). Subsequent arguments must correspond to the mean values of the posterior parameters specified by param_names, passed in the same order. It should return a single scalar log-likelihood value.
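A function matching this description could look like the sketch below, which assumes a normal observation model. The parameter names `mu` and `sigma` are hypothetical and would have to match variables that actually exist in the posterior group.

```python
import math


def normal_log_lik(y, mu, sigma):
    """Log-likelihood of one scalar observation y under Normal(mu, sigma).

    y is the observed value for a single data point; mu and sigma stand for
    posterior-mean parameter values, in the order given by param_names.
    """
    return (
        -0.5 * math.log(2.0 * math.pi)
        - math.log(sigma)
        - 0.5 * ((y - mu) / sigma) ** 2
    )


# Standard normal evaluated at 0: approximately -0.919
print(normal_log_lik(0.0, 0.0, 1.0))
```

Under these assumptions it would be supplied as `log_lik_fn=normal_log_lik` together with `method="plpd"` and `param_names=["mu", "sigma"]`.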

param_names: list, optional

Only used when method="plpd". List of parameter names to extract from the posterior. If None, all parameters are used.

log: bool, optional

Only used when method="plpd". Whether the log_lik_fn returns log-likelihood (True) or likelihood (False). Default is True.

Returns:
ELPDData

Object with the following attributes:

  • elpd: updated approximated expected log pointwise predictive density (elpd)

  • se: standard error of the elpd (includes approximation and sampling uncertainty)

  • p: effective number of parameters

  • n_samples: number of samples in the posterior

  • n_data_points: total number of data points (N)

  • warning: True if the estimated shape parameter k of the Pareto distribution is > good_k for any observation in the subsample.

  • elpd_i: DataArray with pointwise elpd values (filled with NaNs for non-subsampled points), only if pointwise=True.

  • pareto_k: DataArray with Pareto shape values for the subsample (filled with NaNs for non-subsampled points), only if pointwise=True.

  • scale: scale of the elpd results (“log”, “negative_log”, or “deviance”).

  • good_k: Threshold for Pareto k warnings.

  • approx_posterior: True if approximate posterior was used.

  • subsampling_se: Standard error estimate from subsampling uncertainty only.

  • subsample_size: Number of observations in the subsample (original + new).

  • log_p: Log density of the target posterior.

  • log_q: Log density of the proposal posterior.

  • thin: Thinning factor for posterior draws.

  • log_weights: Smoothed log weights.

See also

loo

Exact PSIS-LOO cross-validation.

loo_subsample

PSIS-LOO-CV with subsampling.

compare

Compare models based on ELPD.

References

[1]

Magnusson, M., Riis Andersen, M., Jonasson, J., & Vehtari, A. (2019). Bayesian Leave-One-Out Cross-Validation for Large Data. Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4244–4253. https://proceedings.mlr.press/v97/magnusson19a.html. arXiv preprint: https://arxiv.org/abs/1904.10679

Examples

Calculate an initial sub-sampled PSIS-LOO-CV using 4 observations, then update it with 2 more:

In [1]: from arviz_stats import loo_subsample, update_subsample
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data("non_centered_eight")
   ...: initial_loo = loo_subsample(data, observations=4, var_name="obs", pointwise=True)
   ...: updated_loo = update_subsample(initial_loo, data, observations=2)
   ...: updated_loo
   ...: 
Out[1]: 
Computed from 2000 by 6 subsampled log-likelihood
values from 8 total observations.

         Estimate   SE subsampling SE
elpd_loo     -30.8  1.4            0.2
p_loo          1.0

------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.70]   (good)        6  100.0%
   (0.70, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%