biogeme.bayesian_estimation.bayesian_results module¶

Derived Bayesian results (posterior summaries) built from RawBayesianResults.

Posterior mean -> ‘estimate’ (analogous to MLE estimate)
Posterior std -> ‘std_err’ (analogous to MLE standard error)
z = mean / std -> ‘z_value’ (rough MLE-like t-stat analogue)
p(two-sided) -> min(2*P(theta>0), 2*P(theta<0)) from posterior draws
HDI -> credible interval (94% by default, set by hdi_prob)
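The mapping above can be illustrated on a plain list of posterior draws. The following is a minimal sketch using only the Python standard library, not the Biogeme implementation: the helper names `hdi` and `summarize` are hypothetical, the standard deviation uses the sample (n - 1) convention, and the HDI is taken as the narrowest window of sorted draws, which may differ in detail from what ArviZ computes.

```python
import math
import random
import statistics


def hdi(draws, prob=0.94):
    """Narrowest interval containing a fraction `prob` of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    width = max(1, math.ceil(prob * n))  # number of draws inside the interval
    # Slide a window of `width` consecutive sorted draws, keep the narrowest.
    start = min(
        range(n - width + 1),
        key=lambda i: ordered[i + width - 1] - ordered[i],
    )
    return ordered[start], ordered[start + width - 1]


def summarize(draws, prob=0.94):
    """Posterior summaries named after the fields listed above."""
    mean = statistics.fmean(draws)
    std = statistics.stdev(draws)  # sample standard deviation (n - 1)
    frac_pos = sum(d > 0 for d in draws) / len(draws)
    frac_neg = sum(d < 0 for d in draws) / len(draws)
    low, high = hdi(draws, prob)
    return {
        "estimate": mean,
        "std_err": std,
        "z_value": mean / std if std > 0 else None,
        "p_value": min(2 * frac_pos, 2 * frac_neg),
        "hdi_low": low,
        "hdi_high": high,
    }


# Hypothetical draws standing in for one parameter's pooled chains.
rng = random.Random(0)
draws = [rng.gauss(1.0, 0.5) for _ in range(4000)]
summary = summarize(draws)
```

Because the p-value is computed from the share of draws on each side of zero, it stays between 0 and 1 and shrinks as the posterior mass moves away from zero.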

Michel Bierlaire Mon Nov 03 2025, 08:55:59

class biogeme.bayesian_estimation.bayesian_results.BayesianResults(raw, *, calculate_likelihood, calculate_waic, calculate_loo, hdi_prob=0.94, strict=False)[source]¶

Bases: object

Posterior summaries for parameters, derived from RawBayesianResults.

parameters: dict mapping parameter name -> EstimatedBeta

Parameters:

raw (RawBayesianResults)
calculate_likelihood (bool)
calculate_waic (bool)
calculate_loo (bool)
hdi_prob (float)
strict (bool)

array_metadata: dict[str, dict]¶

arviz_summary()[source]¶

Return type: DataFrame

property best_draw_log_likelihood: float | None¶

chains: int¶

data_name: str¶

draws: int¶

dump(path)[source]¶

Write the underlying posterior + metadata to a single NetCDF file.

Delegates to RawBayesianResults.save().

Parameters: path (str) – Output path for the NetCDF file.
Return type: None

ensure_diagnostics()[source]¶

Compute R-hat and ESS lazily. Cached after first attempt.

Return type: None

property expected_log_likelihood: float | None¶

Posterior expectation of the total log-likelihood.

Computes E_theta[ log L(Y|theta) ] across posterior draws. For pointwise arrays of shape (chain, draw, obs), totals are formed by summing over observations first.

Returns: Expected total log-likelihood, or None if likelihood was not computed.
Raises: ValueError – If the stored log-likelihood has an unexpected shape.
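The two-step computation described for expected_log_likelihood (sum over observations, then average over draws) can be sketched with nested lists. The array values below are hypothetical and the layout `loglike[chain][draw][obs]` is an assumption made for illustration; the actual property works on the stored InferenceData.

```python
import statistics

# Hypothetical pointwise log-likelihood, indexed as loglike[chain][draw][obs].
loglike = [
    [[-1.0, -2.0, -0.5], [-1.2, -1.8, -0.4]],  # chain 0, two draws
    [[-0.9, -2.1, -0.6], [-1.1, -1.9, -0.5]],  # chain 1, two draws
]

# Step 1: sum over observations to get one total log-likelihood per draw.
totals = [sum(per_obs) for chain in loglike for per_obs in chain]

# Step 2: average the totals over all chains and draws.
expected_log_likelihood = statistics.fmean(totals)
```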

classmethod from_netcdf(filename, *, calculate_likelihood=True, calculate_waic=True, calculate_loo=True, hdi_prob=0.94, strict=False)[source]¶

Alternate constructor: build results directly from a NetCDF file.

This uses RawBayesianResults.load() under the hood and then computes posterior summaries.

Parameters:

filename (str) – Path to the NetCDF file.
calculate_likelihood (bool) – If True, expose/add the ArviZ log_likelihood group and enable predictive criteria.
calculate_waic (bool) – If True, compute WAIC (requires calculate_likelihood=True).
calculate_loo (bool) – If True, compute LOO (requires calculate_likelihood=True).
hdi_prob (float) – Credible mass for the Highest Density Interval.
strict (bool) – If True, raise when posterior variables have extra dimensions beyond (chain, draw).

Return type:

BayesianResults

Returns:

A BayesianResults instance built from the file.

generate_general_information()[source]¶

get_beta_values(my_betas=None, *, summary=PosteriorSummary.MEAN)[source]¶

Retrieve posterior point estimates for a set of parameters.

Parameters:

my_betas (list[str] | None) – names of requested parameters. If None, all parameters are returned.
summary (PosteriorSummary) – PosteriorSummary enum specifying whether to return the posterior mean, median, or mode. Default: MEAN.

Return type:

dict[str, float]

get_betas_for_sensitivity_analysis(my_betas=None, size=100)[source]¶

Generate draws from the distribution of the estimates, for sensitivity analysis.

Parameters:

my_betas (list[str] | None) – names of the parameters for which draws are requested.
size (int) – number of draws. Default: 100.

Return type:

list[dict[str, float]]

Returns:

A list of dicts. Each dict has as many entries as there are parameters, and the list has as many entries as there are draws.
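The shape of the returned value can be illustrated with a small standalone sketch. It assumes that each returned dict is one joint posterior draw, selected with replacement; the documentation does not state how Biogeme selects the draws, so this is an assumption. The function `sample_joint_draws` and the parameter names `b_time` and `b_cost` are hypothetical.

```python
import random


def sample_joint_draws(posterior, names=None, size=100, seed=None):
    """Pick `size` joint draws; each dict maps parameter name -> value."""
    names = list(posterior) if names is None else list(names)
    n_draws = len(posterior[names[0]])
    rng = random.Random(seed)
    # Use the same draw index for every parameter so that the posterior
    # correlation between parameters is preserved within each dict.
    indices = [rng.randrange(n_draws) for _ in range(size)]
    return [{name: posterior[name][i] for name in names} for i in indices]


# Hypothetical pooled draws for two perfectly correlated parameters.
b_time = [float(i) for i in range(50)]
posterior = {"b_time": b_time, "b_cost": [2.0 * t for t in b_time]}

sensitivity_draws = sample_joint_draws(posterior, size=5, seed=42)
```

Sampling a common index, rather than sampling each parameter independently, keeps each dict a plausible joint value of the parameters.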

hdi_prob: float¶

property idata: InferenceData¶

identification_diagnostics(*, identification_threshold, prior_idata=None, var_names=None)[source]¶

Compute heuristic diagnostics for potential identification issues.

Designed for the workflow where a posterior arviz.InferenceData is available and an optional prior_idata is produced via pm.sample_prior_predictive(..., return_inferencedata=True).

If prior_idata is provided, it is merged into the stored InferenceData using idata.extend(prior_idata) so the resulting NetCDF can contain both posterior and prior groups.

The diagnostics are heuristics (not proofs):

Eigen-structure of the posterior covariance (near-zero eigenvalues / large condition number) can indicate weak or non-identification.
Comparing posterior vs prior marginal scales highlights parameters that may be largely “identified by the prior” (posterior std close to prior std).

Parameters:

prior_idata (InferenceData | None) – Optional prior InferenceData to merge before computing diagnostics.
var_names (list[str] | None) – Variables to analyze. If None, uses raw_bayesian_results.beta_names filtered to scalar variables present in the posterior.
identification_threshold (float)

Return type:

dict[str, Any]

Returns:

Dictionary with keys has_prior, posterior_cov, prior_cov, per_parameter (DataFrame), flags (list of strings), and (if detected) posterior_near_null_direction / prior_near_null_direction.
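The two heuristics can be tried on synthetic draws. The sketch below is a standalone illustration for two parameters, not the method's implementation: the helper names, the simulated draws, and the prior scale are all hypothetical, and no particular meaning is assumed for identification_threshold.

```python
import math
import random
import statistics


def covariance_2x2(x, y):
    """Sample variances and covariance of two equally long draw lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    n = len(x)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return sxx, sxy, syy


def eigenvalues_symmetric_2x2(sxx, sxy, syy):
    """Closed-form eigenvalues (smallest, largest) of [[sxx, sxy], [sxy, syy]]."""
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    return half_trace - radius, half_trace + radius


rng = random.Random(1)
# Hypothetical posterior: only b1 + b2 is pinned down by the data, so the
# draws spread widely along the direction b1 - b2.
shared = [rng.gauss(0.0, 3.0) for _ in range(2000)]
b1 = [s + rng.gauss(0.5, 0.05) for s in shared]
b2 = [-s + rng.gauss(0.5, 0.05) for s in shared]

# Heuristic 1: eigen-structure of the posterior covariance.
smallest, largest = eigenvalues_symmetric_2x2(*covariance_2x2(b1, b2))
condition_number = largest / smallest

# Heuristic 2: posterior scale relative to a hypothetical prior scale.
prior_b1 = [rng.gauss(0.0, 3.0) for _ in range(2000)]
std_ratio = statistics.stdev(b1) / statistics.stdev(prior_b1)
```

In this constructed case the condition number is very large and the ratio of posterior to prior standard deviation is close to 1, which are the two patterns the diagnostics look for. As noted above, both are warning signs rather than proofs.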

list_array_variables()[source]¶

Return metadata for posterior variables that have extra dims beyond (chain, draw).

Each entry contains: dims (tuple), shape (tuple), sizes (dict), dtype (str).

Return type: dict[str, dict]

property log_likelihood¶

property loo: float | None¶

property loo_res¶

property loo_se¶

other_variables()[source]¶

Return posterior scalar variables that are not listed as parameters.

Useful to expose derived/deterministic quantities stored in the posterior (e.g., total log-likelihood) without mixing them with parameter estimates.

Return type: dict[str, EstimatedBeta]

property p_loo¶

property p_waic¶

parameter_estimates()[source]¶

Return only the parameters explicitly listed in raw_bayesian_results.beta_names.

Missing names are ignored silently (they may have been skipped if multidimensional or missing in the posterior). The returned dict maps name -> EstimatedBeta.

Return type: dict[str, EstimatedBeta]

parameters: dict[str, EstimatedBeta]¶

property posterior_draws: int¶

posterior_mean_by_observation(var_name)[source]¶

Return a DataFrame giving the posterior mean for each observation of the requested variable.

The variable must have shape (chain, draw, obs_dim), i.e., exactly one dimension besides ‘chain’ and ‘draw’. The returned DataFrame has one row per observation, indexed by the observation coordinate if available.

Parameters: var_name (str) – Name of the posterior variable to summarize.
Return type: DataFrame
Returns: pd.DataFrame with index = observation and column = posterior mean of var_name.
Raises: BiogemeError – if the variable is not present, not an array, or not indexed by a single observation dimension.
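The reduction itself amounts to averaging over the chain and draw dimensions while keeping the observation dimension. A minimal sketch with nested lists follows; the values and the labels in `obs_labels` are hypothetical, and a plain dict stands in for the DataFrame that the method returns.

```python
import statistics

# Hypothetical variable with shape (chain, draw, obs): values[chain][draw][obs].
values = [
    [[0.2, 0.7], [0.4, 0.5]],  # chain 0, two draws, two observations
    [[0.3, 0.6], [0.3, 0.6]],  # chain 1, two draws, two observations
]
obs_labels = ["obs_0", "obs_1"]  # hypothetical observation coordinate

# Pool chains and draws, then average separately for each observation.
pooled = [per_obs for chain in values for per_obs in chain]
posterior_mean = {
    label: statistics.fmean([row[i] for row in pooled])
    for i, label in enumerate(obs_labels)
}
```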

property posterior_predictive_loglike: float | None¶

Posterior-predictive log density.

Computes sum_n log(mean_{chain,draw} p(y_n|theta)) using the log-likelihood draws. This is a posterior-predictive criterion (log pointwise predictive density via arithmetic averaging over theta); it is not the maximum-likelihood log-likelihood.

Returns: Posterior-predictive log density, or None if likelihood was not computed.
Raises: ValueError – If the stored log-likelihood has an unexpected shape.
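The formula sum_n log(mean_{chain,draw} p(y_n|theta)) can be evaluated from log-likelihood draws without leaving the log scale. The sketch below uses a log-mean-exp helper for numerical stability; this is a common way to evaluate such expressions and is an assumption here, not a statement about how Biogeme computes the property. The data and the name `log_mean_exp` are hypothetical.

```python
import math


def log_mean_exp(values):
    """Return log(mean(exp(v))) computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


# Hypothetical pointwise log-likelihood, indexed as loglike[chain][draw][obs].
loglike = [
    [[-1.0, -2.0, -0.5], [-1.2, -1.8, -0.4]],  # chain 0, two draws
    [[-0.9, -2.1, -0.6], [-1.1, -1.9, -0.5]],  # chain 1, two draws
]

n_obs = len(loglike[0][0])
posterior_predictive_loglike = 0.0
for n in range(n_obs):
    # All draws of log p(y_n | theta) for observation n, pooled over chains.
    draws_for_n = [per_obs[n] for chain in loglike for per_obs in chain]
    posterior_predictive_loglike += log_mean_exp(draws_for_n)
```

Here the average is taken inside the logarithm for each observation. By Jensen's inequality the result is therefore at least as large as the average of the total log-likelihood over the same draws.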

report_stored_variables()[source]¶

Report all variables stored in the underlying NetCDF/InferenceData.

This is a convenience method to inspect what PyMC/ArviZ stored in the results file. It lists each variable together with its group, dimensions, and shape. The dimensions typically include chain and draw for posterior quantities.

Return type: DataFrame
Returns: A DataFrame with columns group, variable, dims, and shape.
Raises: BiogemeError – If the inference data is missing or malformed.

short_summary()[source]¶

summarize_array_variable(name, *, dim, indices=None, hdi_prob=None)[source]¶

Summarize a multi-dimensional posterior variable for selected indices along one extra dimension.

Parameters:

name (str) – Name of the posterior variable to summarize (must be present in array_metadata).
dim (str) – Name of the extra dimension along which indices are selected (e.g., an observation dimension).
indices (list[int] | None) – Indices to summarize. If None, summarize all indices (may be large).
hdi_prob (float | None) – If provided, overrides the instance hdi_prob for this call.

Return type:

dict[int, EstimatedBeta]

Returns:

Mapping index -> EstimatedBeta computed from samples across chains/draws.

Raises:

KeyError – If the variable or dimension is unknown.

property waic¶

property waic_res¶

property waic_se¶

class biogeme.bayesian_estimation.bayesian_results.EstimatedBeta(name, mean, median, mode, std_err, z_value, p_value, hdi_low, hdi_high, rhat, effective_sample_size_bulk, effective_sample_size_tail)[source]¶

Bases: object

Parameters:

name (str)
mean (float)
median (float)
mode (float)
std_err (float)
z_value (float | None)
p_value (float | None)
hdi_low (float | None)
hdi_high (float | None)
rhat (float)
effective_sample_size_bulk (float)
effective_sample_size_tail (float)

documentation: ClassVar[dict[str, str]] = {'ESS (bulk)': 'Effective sample size for the central part of the posterior; values above ~400 are generally considered sufficient.', 'ESS (tail)': 'Effective sample size for the posterior tails; values above ~100 ensure reliable estimates of extreme quantiles.', 'HDI low / HDI high': 'Lower and upper bounds of the Highest Density Interval containing the most probable parameter values.', 'Median': 'Posterior median (50% quantile) of the parameter.', 'Mode': 'Posterior mode (most frequent value) of the parameter', 'Name': 'Identifier of the model parameter being estimated.', 'R-hat (Gelman–Rubin)': 'Convergence diagnostic; values very close to 1 (typically ≤ 1.01) indicate well-mixed chains.', 'Std err.': 'Posterior standard deviation, measuring uncertainty around the mean.', 'Value': 'Posterior mean (expected value) of the parameter.', 'p-value': 'Two-sided Bayesian tail probability that the parameter differs in sign from zero.', 'z-value': 'Standardized estimate (mean divided by std. dev.), indicating signal-to-noise ratio.'}¶

effective_sample_size_bulk: float¶

effective_sample_size_tail: float¶

hdi_high: float | None¶

hdi_low: float | None¶

mean: float¶

median: float¶

mode: float¶

name: str¶

p_value: float | None¶

rhat: float¶

std_err: float¶

z_value: float | None¶

class biogeme.bayesian_estimation.bayesian_results.PosteriorSummary(*values)[source]¶

Bases: str, Enum

Type of posterior point estimate to extract.

MEAN = 'mean'¶

MEDIAN = 'median'¶

MODE = 'mode'¶