biogeme.bayesian_estimation.bayesian_results module¶
Derived Bayesian results (posterior summaries) built from RawBayesianResults.
Posterior mean -> ‘estimate’ (analogous to MLE estimate)
Posterior std -> ‘std_err’ (analogous to MLE standard error)
z = mean / std -> ‘z_value’ (rough MLE-like t-stat analogue)
p(two-sided) -> min(2*P(theta>0), 2*P(theta<0)) from posterior draws
HDI -> credible interval (94% by default; see hdi_prob)
Michel Bierlaire Mon Nov 03 2025, 08:55:59
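The mapping above can be illustrated with a minimal sketch that works on a plain list of pooled draws for one scalar parameter. It is not the module's implementation: the function name `summarize_draws`, the use of the sample standard deviation, and the "narrowest window of sorted draws" HDI rule are assumptions made for illustration.

```python
import math
import statistics


def summarize_draws(draws, hdi_prob=0.94):
    """Summarize pooled posterior draws of one scalar parameter (illustrative)."""
    n = len(draws)
    mean = statistics.fmean(draws)
    std = statistics.stdev(draws)  # assumption: sample std (n - 1 denominator)
    z_value = mean / std if std > 0 else None

    # Two-sided tail probability: min(2*P(theta>0), 2*P(theta<0)).
    p_pos = sum(d > 0 for d in draws) / n
    p_neg = sum(d < 0 for d in draws) / n
    p_value = min(2 * p_pos, 2 * p_neg)

    # HDI: narrowest window of sorted draws holding hdi_prob of the draws.
    ordered = sorted(draws)
    width = max(1, math.ceil(hdi_prob * n))
    starts = range(n - width + 1)
    best = min(starts, key=lambda i: ordered[i + width - 1] - ordered[i])

    return {
        "estimate": mean,
        "std_err": std,
        "z_value": z_value,
        "p_value": p_value,
        "hdi_low": ordered[best],
        "hdi_high": ordered[best + width - 1],
    }


# Hypothetical draws, already pooled over chains.
example = summarize_draws([-1.2, -0.9, -1.1, -1.0, -0.8, -1.3, -0.7, -1.0])
```

With only eight draws the 94% window spans all of them, so the HDI bounds coincide with the smallest and largest draw; with realistic sample sizes the window is strictly narrower than the full range.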
- class biogeme.bayesian_estimation.bayesian_results.BayesianResults(raw, *, calculate_likelihood, calculate_waic, calculate_loo, hdi_prob=0.94, strict=False)[source]¶
Bases: object
Posterior summaries for parameters, derived from RawBayesianResults.
parameters: dict mapping parameter name -> EstimatedBeta
- Parameters:
raw (RawBayesianResults)
calculate_likelihood (bool)
calculate_waic (bool)
calculate_loo (bool)
hdi_prob (float)
strict (bool)
- array_metadata: dict[str, dict]¶
- property best_draw_log_likelihood: float | None¶
- chains: int¶
- data_name: str¶
- draws: int¶
- dump(path)[source]¶
Write the underlying posterior + metadata to a single NetCDF file.
Delegates to RawBayesianResults.save().
- Parameters:
path (str) – Output path for the NetCDF file.
- Return type:
None
- ensure_diagnostics()[source]¶
Compute R-hat and ESS lazily. Cached after first attempt.
- Return type:
None
- property expected_log_likelihood: float | None¶
Posterior expectation of the total log-likelihood.
Computes E_theta[ log L(Y|theta) ] across posterior draws. For pointwise arrays of shape (chain, draw, obs), totals are formed by summing over observations first.
- Returns:
Expected total log-likelihood, or None if likelihood was not computed.
- Raises:
ValueError – If the stored log-likelihood has an unexpected shape.
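As a rough sketch of the computation described for expected_log_likelihood, assuming the pointwise log-likelihood is available as nested lists indexed (chain, draw, obs), each draw is first summed over observations and the totals are then averaged. The helper name `expected_total_log_likelihood` and the numbers are hypothetical.

```python
import statistics


def expected_total_log_likelihood(pointwise):
    """pointwise[chain][draw][obs] holds log p(y_obs | theta_draw)."""
    totals = [
        sum(obs_values)  # total log-likelihood of one draw
        for chain in pointwise
        for obs_values in chain
    ]
    return statistics.fmean(totals)  # average over all chains and draws


# Hypothetical values: 2 chains, 2 draws each, 3 observations.
pointwise = [
    [[-1.0, -2.0, -0.5], [-1.5, -1.5, -0.5]],
    [[-0.5, -2.5, -1.0], [-1.0, -2.0, -1.0]],
]
value = expected_total_log_likelihood(pointwise)
```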
- classmethod from_netcdf(filename, *, calculate_likelihood=True, calculate_waic=False, calculate_loo=True, hdi_prob=0.94, strict=False)[source]¶
Alternate constructor: build results directly from a NetCDF file.
This uses RawBayesianResults.load() under the hood and then computes posterior summaries.
- Parameters:
filename (str) – Path to the NetCDF file.
calculate_likelihood (bool) – If True, expose/add the ArviZ log_likelihood group and enable predictive criteria.
calculate_waic (bool) – If True, compute WAIC (requires calculate_likelihood=True).
calculate_loo (bool) – If True, compute LOO (requires calculate_likelihood=True).
hdi_prob (float) – Credible mass for the Highest Density Interval.
strict (bool) – If True, raise when posterior variables have extra dimensions beyond (chain, draw).
- Return type:
BayesianResults
- Returns:
A BayesianResults instance built from the file.
- get_beta_values(my_betas=None, *, summary=PosteriorSummary.MEAN)[source]¶
Retrieve posterior point estimates for a set of parameters.
- Parameters:
my_betas (list[str] | None) – Names of requested parameters. If None, all parameters are returned.
summary (PosteriorSummary) – PosteriorSummary enum specifying whether to return the posterior mean, median, or mode. Default: MEAN.
- Return type:
dict[str, float]
- get_betas_for_sensitivity_analysis(my_betas=None, size=100)[source]¶
Generate draws from the distribution of the estimates, for sensitivity analysis.
- Parameters:
my_betas (list[str] | None) – Names of the parameters for which draws are requested.
size (int) – Number of draws. Default: 100.
- Return type:
list[dict[str, float]]
- Returns:
List of dicts. Each dict has as many entries as parameters. The list has as many entries as draws.
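The shape of the returned value can be illustrated with a small sketch. It assumes the posterior is available as a mapping from parameter name to a list of pooled draws, and it samples joint draws with replacement; whether the actual method samples this way is not stated here, so treat the sampling scheme, the helper name `draws_for_sensitivity`, and the parameter names `b_time` and `b_cost` as assumptions.

```python
import random


def draws_for_sensitivity(posterior, names=None, size=100, seed=None):
    """posterior maps parameter name -> list of pooled draws (equal lengths)."""
    names = list(posterior) if names is None else names
    n_draws = len(posterior[names[0]])
    rng = random.Random(seed)
    # Pick joint draws by index so correlations between parameters are kept.
    picks = [rng.randrange(n_draws) for _ in range(size)]
    return [{name: posterior[name][i] for name in names} for i in picks]


# Hypothetical posterior with two parameters and four pooled draws.
posterior = {
    "b_time": [-0.9, -1.1, -1.0, -1.2],
    "b_cost": [-0.4, -0.6, -0.5, -0.7],
}
sample = draws_for_sensitivity(posterior, size=5, seed=0)
```

Selecting whole draws by index, rather than sampling each parameter separately, keeps every dict a coherent point of the joint posterior.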
- hdi_prob: float¶
- property idata: DataTree¶
- identification_diagnostics(*, identification_threshold, prior_idata=None, var_names=None)[source]¶
Compute heuristic diagnostics for potential identification issues.
Designed for the workflow where a posterior arviz.InferenceData is available and an optional prior_idata is produced via pm.sample_prior_predictive(..., return_inferencedata=True).
The diagnostics are heuristic indicators of weak or non-identification. They are based on the posterior covariance matrix and, when available, on comparisons between posterior and prior distributions.
The parameter identification_threshold follows the same philosophy as the corresponding threshold used in maximum-likelihood estimation. In the ML case, identification issues are associated with very small eigenvalues of the Hessian. In the Bayesian case, they are associated with very large eigenvalues of the posterior covariance matrix. The threshold therefore controls the maximum tolerated anisotropy of the posterior covariance.
More precisely, a weak-identification direction is reported when
max_eigenvalue / min_positive_eigenvalue >= 1 / identification_threshold
where the eigenvalues are those of the posterior covariance matrix.
For example:
- identification_threshold = 1e-3 corresponds to a condition number threshold of 10^3.
- identification_threshold = 1e-5 corresponds to a condition number threshold of 10^5.
- identification_threshold = 1e-8 corresponds to a condition number threshold of 10^8.
If prior_idata is provided, prior and posterior dispersions are compared in order to identify parameters whose uncertainty is reduced only marginally by the likelihood.
- Parameters:
identification_threshold (float) – Threshold controlling the detection of weak-identification directions. Smaller values require stronger evidence before a warning is reported.
prior_idata (DataTree | None) – Optional prior InferenceData.
var_names (list[str] | None) – Variables to analyze. If None, the estimated model parameters are used.
- Return type:
dict[str, Any]
- Returns:
Dictionary containing covariance diagnostics, prior/posterior comparisons, and warning flags.
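The eigenvalue rule stated above can be checked by hand with a short sketch. It takes the eigenvalues of the posterior covariance matrix as given (computing them is left out), and the helper name `weak_identification` and the eigenvalues are hypothetical; the method itself may report more than this single flag.

```python
def weak_identification(eigenvalues, identification_threshold):
    """Apply max_eigenvalue / min_positive_eigenvalue >= 1 / threshold."""
    positive = [ev for ev in eigenvalues if ev > 0]
    if not positive:
        raise ValueError("no positive eigenvalue")
    condition_number = max(positive) / min(positive)
    flagged = condition_number >= 1.0 / identification_threshold
    return condition_number, flagged


# Hypothetical spectra: the first has one nearly flat direction.
cond, flagged = weak_identification([2.0, 0.5, 1e-4], 1e-3)
cond_ok, flagged_ok = weak_identification([2.0, 0.5, 0.1], 1e-3)
```

In the first call the ratio is about 2 x 10^4, above the 10^3 limit implied by a threshold of 1e-3, so the direction is flagged; in the second call the ratio is 20 and nothing is reported.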
- list_array_variables()[source]¶
Return metadata for posterior variables that have extra dims beyond (chain, draw).
Each entry contains: dims (tuple), shape (tuple), sizes (dict), dtype (str).
- Return type:
dict[str,dict]
- property log_likelihood¶
- property loo: float | None¶
- property loo_res¶
- property loo_se¶
- other_variables()[source]¶
Return posterior scalar variables that are not listed as parameters.
Useful to expose derived/deterministic quantities stored in the posterior (e.g., total log-likelihood) without mixing them with parameter estimates.
- Return type:
dict[str,EstimatedBeta]
- property p_loo¶
- property p_waic¶
- parameter_estimates()[source]¶
Return only the parameters explicitly listed in raw_bayesian_results.beta_names.
Missing names are ignored silently (they may have been skipped if multidimensional or missing in the posterior). The returned dict maps name -> EstimatedBeta.
- Return type:
dict[str,EstimatedBeta]
- parameters: dict[str, EstimatedBeta]¶
- property posterior_draws: int¶
- posterior_mean_by_observation(var_name)[source]¶
Return a DataFrame giving the posterior mean for each observation of the requested variable.
The variable must have shape (chain, draw, obs_dim), i.e., exactly one dimension besides ‘chain’ and ‘draw’. The returned DataFrame has one row per observation, indexed by the observation coordinate if available.
- Parameters:
var_name (str) – Name of the posterior variable to summarize.
- Return type:
DataFrame
- Returns:
pd.DataFrame with index = observation and column = posterior mean of var_name.
- Raises:
BiogemeError – if the variable is not present, not an array, or not indexed by a single observation dimension.
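The reduction performed by posterior_mean_by_observation amounts to averaging over the chain and draw dimensions while keeping the observation dimension. A minimal sketch on nested lists, with a hypothetical helper name and made-up values, and returning a plain list rather than a DataFrame:

```python
def mean_by_observation(samples):
    """samples[chain][draw][obs] -> one posterior mean per observation."""
    draws = [row for chain in samples for row in chain]  # pool the chains
    n_obs = len(draws[0])
    return [sum(row[k] for row in draws) / len(draws) for k in range(n_obs)]


# Hypothetical variable: 2 chains, 2 draws each, 2 observations.
samples = [
    [[0.2, 0.8], [0.4, 0.6]],
    [[0.3, 0.7], [0.3, 0.9]],
]
means = mean_by_observation(samples)
```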
- property posterior_predictive_loglike: float | None¶
Posterior-predictive log density.
Computes sum_n log(mean_{chain,draw} p(y_n|theta)) using the log-likelihood draws. This is a posterior-predictive criterion (log pointwise predictive density via arithmetic averaging over theta); it is not the maximum-likelihood log-likelihood.
- Returns:
Posterior-predictive log density, or None if likelihood was not computed.
- Raises:
ValueError – If the stored log-likelihood has an unexpected shape.
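The formula sum_n log(mean_{chain,draw} p(y_n|theta)) can be sketched as follows, assuming the pointwise log-likelihood is stored as nested lists indexed (chain, draw, obs). The helper names are hypothetical, and the log-mean-exp trick is used here only as a numerically stable way to average probabilities that are stored as logs.

```python
import math


def log_mean_exp(values):
    """Stable log of the arithmetic mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def posterior_predictive_loglike(pointwise):
    """pointwise[chain][draw][obs] holds log p(y_obs | theta_draw)."""
    draws = [row for chain in pointwise for row in chain]  # pool the chains
    n_obs = len(draws[0])
    # Average over draws per observation, then sum the logs over observations.
    return sum(log_mean_exp([row[k] for row in draws]) for k in range(n_obs))


# Hypothetical values: 1 chain, 2 draws, 2 observations.
pointwise = [[
    [math.log(0.5), math.log(0.2)],
    [math.log(0.3), math.log(0.4)],
]]
value = posterior_predictive_loglike(pointwise)
```

Here the averaged probabilities are 0.4 and 0.3, so the result is log(0.12). By Jensen's inequality this quantity is never smaller than the expected total log-likelihood computed from the same draws.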
- report_stored_variables()[source]¶
Report all variables stored in the underlying NetCDF/InferenceData.
This is a convenience method to inspect what PyMC/ArviZ stored in the results file. It lists each variable together with its group, dimensions, and shape. The dimensions typically include chain and draw for posterior quantities.
- Return type:
DataFrame
- Returns:
A DataFrame with columns group, variable, dims, and shape.
- Raises:
BiogemeError – If the inference data is missing or malformed.
- set_diagnostic_figure_references(figure_references)[source]¶
Store references to pre-rendered diagnostic figures.
These references are later transmitted to BayesianResultsSummary so that reports can be generated without access to posterior draws.
- Parameters:
figure_references (dict[str, str] | None) – Mapping from diagnostic figure names to filenames or paths.
- Return type:
None
- summarize_array_variable(name, *, dim, indices=None, hdi_prob=None)[source]¶
Summarize a multi-dimensional posterior variable for selected indices along one extra dimension.
- Parameters:
name (str) – Name of the posterior variable to summarize (must be present in array_metadata).
dim (str) – Name of the extra dimension along which indices are selected (e.g., an observation dimension).
indices (list[int] | None) – Indices to summarize. If None, summarize all indices (may be large).
hdi_prob (float | None) – If provided, overrides the instance hdi_prob for this call.
- Return type:
dict[int, EstimatedBeta]
- Returns:
Mapping index -> EstimatedBeta computed from samples across chains/draws.
- Raises:
KeyError – If the variable or dimension is unknown.
- to_summary()[source]¶
Convert the full Bayesian results into a lightweight summary object.
The returned object contains only derived summaries and metadata needed for inspection and serialization. It does not contain posterior draws.
- Return type:
- Returns:
Lightweight summary of the Bayesian estimation results.
- property waic¶
- property waic_res¶
- property waic_se¶
- class biogeme.bayesian_estimation.bayesian_results.EstimatedBeta(name, mean, median, mode, std_err, z_value, p_value, hdi_low, hdi_high, rhat, effective_sample_size_bulk, effective_sample_size_tail)[source]¶
Bases: object
- Parameters:
name (str)
mean (float)
median (float)
mode (float)
std_err (float)
z_value (float | None)
p_value (float | None)
hdi_low (float | None)
hdi_high (float | None)
rhat (float)
effective_sample_size_bulk (float)
effective_sample_size_tail (float)
- documentation: ClassVar[dict[str, str]] = {'ESS (bulk)': 'Effective sample size for the central part of the posterior; values above ~400 are generally considered sufficient.', 'ESS (tail)': 'Effective sample size for the posterior tails; values above ~100 ensure reliable estimates of extreme quantiles.', 'HDI low / HDI high': 'Lower and upper bounds of the Highest Density Interval containing the most probable parameter values.', 'Median': 'Posterior median (50% quantile) of the parameter.', 'Mode': 'Posterior mode (most frequent value) of the parameter', 'Name': 'Identifier of the model parameter being estimated.', 'R-hat (Gelman–Rubin)': 'Convergence diagnostic; values very close to 1 (typically ≤ 1.01) indicate well-mixed chains.', 'Std err.': 'Posterior standard deviation, measuring uncertainty around the mean.', 'Value': 'Posterior mean (expected value) of the parameter.', 'p-value': 'Two-sided Bayesian tail probability that the parameter differs in sign from zero.', 'z-value': 'Standardized estimate (mean divided by std. dev.), indicating signal-to-noise ratio.'}¶
- effective_sample_size_bulk: float¶
- effective_sample_size_tail: float¶
- hdi_high: float | None¶
- hdi_low: float | None¶
- mean: float¶
- median: float¶
- mode: float¶
- name: str¶
- p_value: float | None¶
- rhat: float¶
- std_err: float¶
- z_value: float | None¶
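The rule-of-thumb thresholds quoted in the documentation dictionary (R-hat at most 1.01, bulk ESS above about 400, tail ESS above about 100) can be turned into a simple check. This is a sketch, not part of the class: the function name `convergence_warnings` is hypothetical and it takes the three diagnostics as plain floats.

```python
def convergence_warnings(rhat, ess_bulk, ess_tail):
    """Return human-readable warnings based on the rule-of-thumb thresholds."""
    warnings = []
    if rhat > 1.01:
        warnings.append(f"R-hat {rhat:.3f} exceeds 1.01")
    if ess_bulk < 400:
        warnings.append(f"bulk ESS {ess_bulk:.0f} is below 400")
    if ess_tail < 100:
        warnings.append(f"tail ESS {ess_tail:.0f} is below 100")
    return warnings


# Hypothetical diagnostics for a well-mixed and a poorly mixed parameter.
ok = convergence_warnings(rhat=1.002, ess_bulk=1850.0, ess_tail=1200.0)
bad = convergence_warnings(rhat=1.08, ess_bulk=150.0, ess_tail=60.0)
```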