biogeme.bayesian_estimation.bayesian_results module

Derived Bayesian results (posterior summaries) built from RawBayesianResults.

  • Posterior mean -> ‘estimate’ (analogous to MLE estimate)

  • Posterior std -> ‘std_err’ (analogous to MLE standard error)

  • z = mean / std -> ‘z_value’ (rough MLE-like t-stat analogue)

  • p(two-sided) -> min(2*P(theta>0), 2*P(theta<0)) from posterior draws

  • HDI -> credible interval (94% probability mass by default, set via hdi_prob)

Michel Bierlaire Mon Nov 03 2025, 08:55:59
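The mapping listed in the module summary can be illustrated on synthetic draws. The following is a minimal NumPy sketch, not Biogeme's implementation: the helper name summarize_draws is hypothetical, the standard deviation convention (ddof=0) is an assumption, and the HDI is taken as the narrowest interval containing the requested mass, which is the usual definition for a unimodal posterior.

```python
import numpy as np


def summarize_draws(draws, hdi_prob=0.94):
    """Summarize 1-D posterior draws (hypothetical helper, not Biogeme's code)."""
    draws = np.asarray(draws, dtype=float).ravel()
    mean = draws.mean()
    std = draws.std()  # assumption: population standard deviation (ddof=0)
    z_value = mean / std if std > 0 else None

    # Two-sided tail probability: min(2*P(theta>0), 2*P(theta<0)).
    p_pos = np.mean(draws > 0)
    p_neg = np.mean(draws < 0)
    p_value = min(2 * p_pos, 2 * p_neg)

    # HDI: narrowest window covering hdi_prob of the sorted draws.
    sorted_draws = np.sort(draws)
    n = sorted_draws.size
    window = max(1, int(np.ceil(hdi_prob * n)))
    widths = sorted_draws[window - 1:] - sorted_draws[: n - window + 1]
    start = int(np.argmin(widths))
    return {
        "estimate": float(mean),
        "std_err": float(std),
        "z_value": None if z_value is None else float(z_value),
        "p_value": float(p_value),
        "hdi_low": float(sorted_draws[start]),
        "hdi_high": float(sorted_draws[start + window - 1]),
    }


# Synthetic draws standing in for one scalar parameter of a posterior.
rng = np.random.default_rng(0)
summary = summarize_draws(rng.normal(loc=-1.0, scale=0.5, size=4000))
```

A posterior concentrated on one side of zero gives a small p_value, while a posterior centered at zero gives a value close to 1.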

class biogeme.bayesian_estimation.bayesian_results.BayesianResults(raw, *, calculate_likelihood, calculate_waic, calculate_loo, hdi_prob=0.94, strict=False)

Bases: object

Posterior summaries for parameters, derived from RawBayesianResults.

parameters: dict mapping parameter name -> EstimatedBeta

Parameters:
  • raw (RawBayesianResults)

  • calculate_likelihood (bool)

  • calculate_waic (bool)

  • calculate_loo (bool)

  • hdi_prob (float)

  • strict (bool)

array_metadata: dict[str, dict]
arviz_summary()
Return type:

DataFrame

property best_draw_log_likelihood: float | None
chains: int
data_name: str
draws: int
dump(path)

Write the underlying posterior + metadata to a single NetCDF file.

Delegates to RawBayesianResults.save().

Parameters:

path (str) – Output path for the NetCDF file.

Return type:

None

ensure_diagnostics()

Compute R-hat and ESS lazily. Cached after first attempt.

Return type:

None
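For intuition about what R-hat measures, the sketch below computes the classic (non-split) Gelman-Rubin statistic from an array of shape (chain, draw). It is an illustration only: the function name classic_rhat is hypothetical, and the diagnostics actually stored by this class presumably come from ArviZ, whose default R-hat is a rank-normalized split variant that gives different numbers.

```python
import numpy as np


def classic_rhat(samples):
    """Classic Gelman-Rubin statistic for samples of shape (chain, draw)."""
    samples = np.asarray(samples, dtype=float)
    n_chains, n_draws = samples.shape
    # Average within-chain variance.
    within = samples.var(axis=1, ddof=1).mean()
    # Between-chain variance, scaled by the number of draws per chain.
    between = n_draws * samples.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance.
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))


# Two chains exploring the same region versus two chains stuck apart.
agreeing = classic_rhat([[1, 2, 3, 4], [1, 2, 3, 4]])
disagreeing = classic_rhat([[0, 1, 2, 3], [10, 11, 12, 13]])
```

Chains that disagree about the location of the posterior inflate the between-chain term and push the statistic well above 1.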

property expected_log_likelihood: float | None

Posterior expectation of the total log-likelihood.

Computes E_theta[ log L(Y|theta) ] across posterior draws. For pointwise arrays of shape (chain, draw, obs), totals are formed by summing over observations first.

Returns:

Expected total log-likelihood, or None if likelihood was not computed.

Raises:

ValueError – If the stored log-likelihood has an unexpected shape.
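The order of operations described above (sum over observations, then average over draws) can be sketched with NumPy as follows. The function name is hypothetical and the sketch only handles the pointwise (chain, draw, obs) layout; it is not the library's code.

```python
import numpy as np


def expected_total_log_likelihood(pointwise_loglike):
    """E_theta[log L(Y|theta)] from an array of shape (chain, draw, obs)."""
    values = np.asarray(pointwise_loglike, dtype=float)
    if values.ndim != 3:
        raise ValueError(f"expected (chain, draw, obs), got shape {values.shape}")
    total_per_draw = values.sum(axis=-1)  # sum over observations first
    return float(total_per_draw.mean())   # then average over chains and draws


# 2 chains, 3 draws, 4 observations, each with probability 0.5.
example = np.log(np.full((2, 3, 4), 0.5))
result = expected_total_log_likelihood(example)
```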

classmethod from_netcdf(filename, *, calculate_likelihood=True, calculate_waic=True, calculate_loo=True, hdi_prob=0.94, strict=False)

Alternate constructor: build results directly from a NetCDF file.

This uses RawBayesianResults.load() under the hood and then computes posterior summaries.

Parameters:
  • filename (str) – Path to the NetCDF file.

  • calculate_likelihood (bool) – If True, expose/add the ArviZ log_likelihood group and enable predictive criteria.

  • calculate_waic (bool) – If True, compute WAIC (requires calculate_likelihood=True).

  • calculate_loo (bool) – If True, compute LOO (requires calculate_likelihood=True).

  • hdi_prob (float) – Credible mass for the Highest Density Interval.

  • strict (bool) – If True, raise when posterior variables have extra dimensions beyond (chain, draw).

Return type:

BayesianResults

Returns:

A BayesianResults instance built from the file.
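A possible way to call this constructor is sketched below, using only the signature documented above. The wrapper name load_bayesian_results and the file name in the usage comment are hypothetical; the import is placed inside the function so that the snippet can be defined without Biogeme installed.

```python
def load_bayesian_results(filename, hdi_prob=0.94):
    """Load posterior summaries from a NetCDF file written by a previous run."""
    # Lazy import: Biogeme is only needed when the function is called.
    from biogeme.bayesian_estimation.bayesian_results import BayesianResults

    return BayesianResults.from_netcdf(
        filename,
        calculate_likelihood=True,  # required by WAIC and LOO
        calculate_waic=True,
        calculate_loo=False,        # skip LOO in this example
        hdi_prob=hdi_prob,
        strict=False,
    )


# Usage (hypothetical file name):
#   results = load_bayesian_results("my_model.nc")
#   estimates = results.parameter_estimates()  # dict: name -> EstimatedBeta
```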

generate_general_information()
get_beta_values(my_betas=None, *, summary=PosteriorSummary.MEAN)

Retrieve posterior point estimates for a set of parameters.

Parameters:
  • my_betas (list[str] | None) – names of requested parameters. If None, all parameters are returned.

  • summary (PosteriorSummary) – PosteriorSummary enum specifying whether to return the posterior mean, median, or mode. Default: MEAN.

Return type:

dict[str, float]

get_betas_for_sensitivity_analysis(my_betas=None, size=100)

Generate draws from the distribution of the estimates, for sensitivity analysis.

Parameters:
  • my_betas (list[str] | None) – names of the parameters for which draws are requested.

  • size (int) – number of draws. Default: 100.

Return type:

list[dict[str, float]]

Returns:

List of dicts. Each dict has as many entries as there are parameters. The list has as many entries as there are draws.
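The returned structure lends itself to simulating an interval for any quantity derived from the parameters. The sketch below is one possible use, not part of Biogeme: the helper name, the parameter names b_time and b_cost, and the synthetic draws standing in for the method's output are all hypothetical.

```python
import numpy as np


def interval_of_derived_quantity(beta_draws, function, level=0.9):
    """Equal-tailed interval of function(betas) over a list of draws."""
    values = np.array([function(betas) for betas in beta_draws], dtype=float)
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(values, [tail, 1.0 - tail])
    return float(low), float(high)


# Synthetic stand-in for the list returned by get_betas_for_sensitivity_analysis.
rng = np.random.default_rng(1)
fake_draws = [
    {"b_time": float(t), "b_cost": float(c)}
    for t, c in zip(rng.normal(-0.02, 0.002, 100), rng.normal(-0.1, 0.005, 100))
]

# Interval for the ratio of the two coefficients.
low, high = interval_of_derived_quantity(
    fake_draws, lambda betas: betas["b_time"] / betas["b_cost"]
)
```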

hdi_prob: float
property idata: InferenceData
identification_diagnostics(*, identification_threshold, prior_idata=None, var_names=None)

Compute heuristic diagnostics for potential identification issues.

Designed for the workflow where a posterior arviz.InferenceData is available and an optional prior_idata is produced via pm.sample_prior_predictive(..., return_inferencedata=True).

If prior_idata is provided, it is merged into the stored InferenceData using idata.extend(prior_idata) so the resulting NetCDF can contain both posterior and prior groups.

The diagnostics are heuristics (not proofs):

  • Eigen-structure of the posterior covariance (near-zero eigenvalues / large condition number) can indicate weak or non-identification.

  • Comparing posterior vs prior marginal scales highlights parameters that may be largely “identified by the prior” (posterior std close to prior std).

Parameters:
  • prior_idata (InferenceData | None) – Optional prior InferenceData to merge before computing diagnostics.

  • var_names (list[str] | None) – Variables to analyze. If None, uses raw_bayesian_results.beta_names filtered to scalar variables present in the posterior.

  • identification_threshold (float)

Return type:

dict[str, Any]

Returns:

Dictionary with keys has_prior, posterior_cov, prior_cov, per_parameter (DataFrame), flags (list of strings), and (if detected) posterior_near_null_direction / prior_near_null_direction.
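The two heuristics described above can be sketched on plain sample matrices. This is an illustration under stated assumptions, not the method's implementation: the function name, the relative eigenvalue threshold, the 0.9 cut-off on the standard deviation ratio, and the std_ratio key are hypothetical, and the exact meaning of identification_threshold in Biogeme is not specified here.

```python
import numpy as np


def identification_sketch(posterior, prior=None, threshold=1e-6):
    """Heuristic checks on samples of shape (n_samples, n_params)."""
    posterior = np.asarray(posterior, dtype=float)
    cov = np.atleast_2d(np.cov(posterior, rowvar=False))
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    flags = []
    report = {"has_prior": prior is not None, "posterior_cov": cov, "flags": flags}

    # Heuristic 1: a near-zero eigenvalue signals a direction in parameter
    # space along which the posterior carries almost no information.
    if eigenvalues[0] <= threshold * eigenvalues[-1]:
        flags.append("posterior covariance is nearly singular")
        report["posterior_near_null_direction"] = eigenvectors[:, 0]

    # Heuristic 2: a posterior std close to the prior std suggests that the
    # data contributed little beyond the prior for that parameter.
    if prior is not None:
        prior = np.asarray(prior, dtype=float)
        ratio = posterior.std(axis=0) / prior.std(axis=0)
        report["std_ratio"] = ratio
        if np.any(ratio > 0.9):  # hypothetical cut-off
            flags.append("some parameters look identified mainly by the prior")
    return report


# Two perfectly collinear parameters: only their difference is informed.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
collinear_report = identification_sketch(np.column_stack([x, -x]))
```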

list_array_variables()

Return metadata for posterior variables that have extra dims beyond (chain, draw).

Each entry contains: dims (tuple), shape (tuple), sizes (dict), dtype (str).

Return type:

dict[str, dict]

property log_likelihood
property loo: float | None
property loo_res
property loo_se
other_variables()

Return posterior scalar variables that are not listed as parameters.

Useful to expose derived/deterministic quantities stored in the posterior (e.g., total log-likelihood) without mixing them with parameter estimates.

Return type:

dict[str, EstimatedBeta]

property p_loo
property p_waic
parameter_estimates()

Return only the parameters explicitly listed in raw_bayesian_results.beta_names.

Missing names are ignored silently (they may have been skipped if multidimensional or missing in the posterior). The returned dict maps name -> EstimatedBeta.

Return type:

dict[str, EstimatedBeta]

parameters: dict[str, EstimatedBeta]
property posterior_draws: int
posterior_mean_by_observation(var_name)

Return a DataFrame giving the posterior mean for each observation of the requested variable.

The variable must have shape (chain, draw, obs_dim), i.e., exactly one dimension besides ‘chain’ and ‘draw’. The returned DataFrame has one row per observation, indexed by the observation coordinate if available.

Parameters:

var_name (str) – Name of the posterior variable to summarize.

Return type:

DataFrame

Returns:

pd.DataFrame with index = observation and column = posterior mean of var_name.

Raises:

BiogemeError – if the variable is not present, not an array, or not indexed by a single observation dimension.

property posterior_predictive_loglike: float | None

Posterior-predictive log density.

Computes sum_n log(mean_{chain,draw} p(y_n|theta)) using the log-likelihood draws. This is a posterior-predictive criterion (log pointwise predictive density via arithmetic averaging over theta); it is not the maximum-likelihood log-likelihood.

Returns:

Posterior-predictive log density, or None if likelihood was not computed.

Raises:

ValueError – If the stored log-likelihood has an unexpected shape.
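The formula above averages probabilities (not log-probabilities) over draws before taking the logarithm. A minimal NumPy sketch, assuming a pointwise array of shape (chain, draw, obs) and using a hypothetical function name, could look like this; the maximum is subtracted before exponentiating to keep the computation numerically stable.

```python
import numpy as np


def posterior_predictive_log_density(pointwise_loglike):
    """sum_n log(mean_{chain,draw} p(y_n|theta)) for shape (chain, draw, obs)."""
    values = np.asarray(pointwise_loglike, dtype=float)
    if values.ndim != 3:
        raise ValueError(f"expected (chain, draw, obs), got shape {values.shape}")
    n_samples = values.shape[0] * values.shape[1]
    flat = values.reshape(n_samples, values.shape[2])
    # Log-mean-exp per observation, shifted by the column maximum.
    shift = flat.max(axis=0)
    log_mean = shift + np.log(np.exp(flat - shift).mean(axis=0))
    return float(log_mean.sum())


# One chain, two draws, two observations.
probabilities = np.array([[[0.2, 0.5], [0.4, 0.5]]])
lppd = posterior_predictive_log_density(np.log(probabilities))
```

By Jensen's inequality, this quantity is never smaller than the posterior expectation of the total log-likelihood computed from the same draws.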

report_stored_variables()

Report all variables stored in the underlying NetCDF/InferenceData.

This is a convenience method to inspect what PyMC/ArviZ stored in the results file. It lists each variable together with its group, dimensions, and shape. The dimensions typically include chain and draw for posterior quantities.

Return type:

DataFrame

Returns:

A DataFrame with columns group, variable, dims, and shape.

Raises:

BiogemeError – If the inference data is missing or malformed.

short_summary()
summarize_array_variable(name, *, dim, indices=None, hdi_prob=None)

Summarize a multi-dimensional posterior variable for selected indices along one extra dimension.

Parameters:
  • name (str) – Name of the posterior variable to summarize (must be present in array_metadata).

  • dim (str) – Name of the extra dimension along which indices are selected (e.g., an observation dimension).

  • indices (list[int] | None) – Indices to summarize. If None, summarize all indices (may be large).

  • hdi_prob (float | None) – If provided, overrides the instance hdi_prob for this call.

Return type:

dict[int, EstimatedBeta]

Returns:

Mapping index -> EstimatedBeta computed from samples across chains/draws.

Raises:

KeyError – If the variable or dimension is unknown.

property waic
property waic_res
property waic_se
class biogeme.bayesian_estimation.bayesian_results.EstimatedBeta(name, mean, median, mode, std_err, z_value, p_value, hdi_low, hdi_high, rhat, effective_sample_size_bulk, effective_sample_size_tail)

Bases: object

Parameters:
  • name (str)

  • mean (float)

  • median (float)

  • mode (float)

  • std_err (float)

  • z_value (float | None)

  • p_value (float | None)

  • hdi_low (float | None)

  • hdi_high (float | None)

  • rhat (float)

  • effective_sample_size_bulk (float)

  • effective_sample_size_tail (float)

documentation: ClassVar[dict[str, str]] = {
    'ESS (bulk)': 'Effective sample size for the central part of the posterior; values above ~400 are generally considered sufficient.',
    'ESS (tail)': 'Effective sample size for the posterior tails; values above ~100 ensure reliable estimates of extreme quantiles.',
    'HDI low / HDI high': 'Lower and upper bounds of the Highest Density Interval containing the most probable parameter values.',
    'Median': 'Posterior median (50% quantile) of the parameter.',
    'Mode': 'Posterior mode (most frequent value) of the parameter',
    'Name': 'Identifier of the model parameter being estimated.',
    'R-hat (Gelman–Rubin)': 'Convergence diagnostic; values very close to 1 (typically 1.01) indicate well-mixed chains.',
    'Std err.': 'Posterior standard deviation, measuring uncertainty around the mean.',
    'Value': 'Posterior mean (expected value) of the parameter.',
    'p-value': 'Two-sided Bayesian tail probability that the parameter differs in sign from zero.',
    'z-value': 'Standardized estimate (mean divided by std. dev.), indicating signal-to-noise ratio.',
}
effective_sample_size_bulk: float
effective_sample_size_tail: float
hdi_high: float | None
hdi_low: float | None
mean: float
median: float
mode: float
name: str
p_value: float | None
rhat: float
std_err: float
z_value: float | None
class biogeme.bayesian_estimation.bayesian_results.PosteriorSummary(*values)

Bases: str, Enum

Type of posterior point estimate to extract.

MEAN = 'mean'
MEDIAN = 'median'
MODE = 'mode'
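Because the enum derives from both str and Enum, its members can be looked up by value and compare equal to plain strings. The sketch below demonstrates this with a local stand-in class (SummaryKind is hypothetical and merely mirrors the members documented above), so that it runs without Biogeme installed.

```python
from enum import Enum


class SummaryKind(str, Enum):
    """Hypothetical stand-in mirroring the documented PosteriorSummary."""

    MEAN = "mean"
    MEDIAN = "median"
    MODE = "mode"


# Lookup by value returns the corresponding member.
from_text = SummaryKind("median")

# The str mixin makes a member compare equal to its value.
equals_string = SummaryKind.MEAN == "mean"

# All member values, in definition order.
all_values = [kind.value for kind in SummaryKind]
```

With the real class, a member is passed through the documented keyword, for example get_beta_values(summary=PosteriorSummary.MEDIAN).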