1b. Estimation of a logit model with custom priors (Bayesian)

This example estimates the same Bayesian logit model as in Example 1a, but illustrates how to specify custom prior distributions for the model parameters.

The example demonstrates:

  • the use of non-default prior distributions,

  • the specification of custom PyMC prior functions,

  • the use of truncated prior distributions to enforce sign constraints,

  • the estimation of a Bayesian logit model,

  • the extraction and display of the estimation results.

The # %% markers are used to separate the script into notebook cells when the example gallery is converted into Jupyter notebooks.

Tested with Biogeme 3.3.3.

Michel Bierlaire, EPFL Tue Jun 09 2026, 15:10:00

import pymc as pm
from IPython.core.display_functions import display
from pytensor.tensor.variable import TensorVariable

Import the variables and the database prepared in the Swissmetro data-processing example.

from swissmetro_data import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

import biogeme.biogeme_logging as blog
from biogeme.bayesian_estimation import BayesianResults, get_pandas_estimated_parameters
from biogeme.biogeme import BIOGEME
from biogeme.expressions import Beta
from biogeme.models import loglogit

Configure the Biogeme logger. Increasing the verbosity level provides additional information about the estimation process.

logger = blog.get_screen_logger(level=blog.DEBUG)
logger.info('Example b01b_logit.py')
[INFO] 2026-06-16 21:37:52,755 Example b01b_logit.py <plot_b01b_logit.py:57>

Define the alternative-specific constants. By default, Biogeme uses a normal prior distribution, possibly truncated if bounds are specified. Here, we increase the prior standard deviation by setting sigma_prior=30.

asc_car = Beta('asc_car', 0, None, None, 0, sigma_prior=30)
asc_train = Beta('asc_train', 0, None, None, 0, sigma_prior=30)


# For the remaining parameters, we define a custom Student-t prior.
# When bounds are specified, the prior is truncated accordingly.
# Consult the PyMC documentation for the available probability
# distributions.
def negative_student_prior(
    name: str,
    initial_value: float,
    lower_bound: float | None,
    upper_bound: float | None,
) -> TensorVariable:
    """Generate a Student-t prior distribution."""

    if lower_bound is None and upper_bound is None:
        return pm.StudentT(name=name, mu=initial_value, sigma=10.0, nu=5.0)

    rv = pm.StudentT.dist(mu=initial_value, sigma=10.0, nu=5.0)
    return pm.Truncated(
        name, rv, lower=lower_bound, upper=upper_bound, initval=initial_value
    )

Define the coefficients associated with travel time and travel cost. The upper bound is fixed to zero in order to reflect the prior assumption that increases in time and cost reduce utility.

b_time = Beta('b_time', -1, None, 0, 0, prior=negative_student_prior)
b_cost = Beta('b_cost', -1, None, 0, 0, prior=negative_student_prior)

Define the utility functions.

v_train = asc_train + b_time * TRAIN_TT_SCALED + b_cost * TRAIN_COST_SCALED
v_sm = b_time * SM_TT_SCALED + b_cost * SM_COST_SCALED
v_car = asc_car + b_time * CAR_TT_SCALED + b_cost * CAR_CO_SCALED

Associate each utility function with the corresponding alternative identifier.

v = {1: v_train, 2: v_sm, 3: v_car}

Associate the availability conditions with each alternative.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Define the log-likelihood contribution of each observation.

log_probability = loglogit(v, av, CHOICE)

Create the Biogeme object. We explicitly request the standard PyMC sampler. See the PyMC documentation for details about the available sampling algorithms.

the_biogeme = BIOGEME(database, log_probability, mcmc_sampling_strategy='pymc')
the_biogeme.model_name = 'b01b_logit'
[DEBUG] 2026-06-16 21:37:52,756 READ FILE biogeme.toml : automatic <parameters.py:184>
[DEBUG] 2026-06-16 21:37:52,756 Parameter file: /Users/bierlair/MyFiles/github/biogeme/docs/source/examples/bayesian_swissmetro/biogeme.toml <parameters.py:201>
[INFO] 2026-06-16 21:37:53,911 Biogeme parameters read from biogeme.toml. <parameters.py:205>

Estimate the parameters. The estimation code is placed inside a function protected by the standard Python main guard. This is required because PyMC may use multiprocessing, and worker processes re-import the script.

def main() -> None:
    """Estimate the Bayesian logit model and display the results."""

    try:
        results = BayesianResults.from_netcdf(
            filename=f'saved_results/{the_biogeme.model_name}.nc'
        )
    except FileNotFoundError:
        results = the_biogeme.bayesian_estimation()

    # %%
    # Display a short summary of the estimation results.
    print(results.short_summary())

    # %%
    # Convert the estimated parameters into a pandas DataFrame.
    pandas_results = get_pandas_estimated_parameters(
        estimation_results=results,
    )
    display(pandas_results)


if __name__ == '__main__':
    main()
[DEBUG] 2026-06-16 21:37:53,912 Read file saved_results/b01b_logit.nc <raw_bayesian_results.py:220>
load finished in 211 ms
[INFO] 2026-06-16 21:37:54,124 *** Initial values of the parameters are obtained from the file __b01b_logit.iter <biogeme.py:913>
[INFO] 2026-06-16 21:37:54,124 Cannot read file __b01b_logit.iter. Statement is ignored. <biogeme.py:800>
[INFO] 2026-06-16 21:37:54,124 Starting values for the algorithm: {} <biogeme.py:924>
[INFO] 2026-06-16 21:37:54,124 Detected CPU devices: 4 | System logical cores: 12
Current XLA_FLAGS: --xla_force_host_platform_device_count=4
Platform: Darwin 25.5.0 | Python: 3.14.3
 <biogeme.py:976>
/Users/bierlair/MyFiles/github/biogeme/src/biogeme/biogeme.py:1002: UserWarning: The effect of Potentials on other parameters is ignored during prior predictive sampling. This is likely to lead to invalid or biased predictive samples.
  pm.sample_prior_predictive(
/Users/bierlair/.local/share/uv/python/cpython-3.14.3-macos-aarch64-none/lib/python3.14/multiprocessing/popen_fork.py:70: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()

  Progress                                                       Draw                            Divergences                     Step size                       Grad evals                      Speed                                                          Elapsed                         Remaining
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4000                            0                               0.504                           7                               203.07 draws/s                                                 0:00:19                         0:00:00
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4000                            0                               0.496                           7                               197.84 draws/s                                                 0:00:20                         0:00:00
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4000                            0                               0.448                           7                               205.34 draws/s                                                 0:00:19                         0:00:00
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4000                            0                               0.477                           15                              203.72 draws/s                                                 0:00:19                         0:00:00

[INFO] 2026-06-16 21:38:38,193 Diagnostics computation took 20.4 seconds (cached). <bayesian_results.py:268>
posterior_predictive_loglike finished in 215 ms
loo_res finished in 7082 ms (7.08 s)
loo finished in 7082 ms (7.08 s)
[WARNING] 2026-06-16 21:38:48,172 Prior group present but could not be analyzed: Array must not contain infs or NaNs <bayesian_results.py:1389>
[INFO] 2026-06-16 21:38:48,178 File b01b_logit.html has been generated. <html_output.py:770>
[INFO] 2026-06-16 21:38:48,183 Save simulation results on b01b_logit.nc <raw_bayesian_results.py:156>
[INFO] 2026-06-16 21:38:48,846 Saved Bayesian results (posterior + metadata) to b01b_logit.nc <raw_bayesian_results.py:208>
[INFO] 2026-06-16 21:38:48,846 Bayesian estimation completed. The file b01b_logit.nc containing all posterior draws has been generated. <biogeme.py:1097>
Sample size                                              6768
Sampler                                                  NUTS
Number of chains                                         4
Number of draws per chain                                2000
Total number of draws                                    8000
Acceptance rate target                                   0.9
Run time                                                 0:00:23.598062
Posterior predictive log-likelihood (sum of log mean p)  -5329.82
Expected log-likelihood E[log L(Y|θ)]                    -5333.24
Best-draw log-likelihood (posterior upper bound)         -5331.26
LOO (Leave-One-Out Cross-Validation)                     -5336.68
LOO Standard Error                                       59.63
Effective number of parameters (p_LOO)                   6.86
        Name  Value (mean)  Value (median)  ...     R hat   ESS (bulk)   ESS (tail)
0  asc_train     -0.699877       -0.699558  ...  1.001474  3750.888210  4194.742806
1    asc_car     -0.154105       -0.153809  ...  1.001788  4136.173217  4433.298933
2     b_time     -1.279698       -1.279040  ...  1.001252  3809.593607  4343.913550
3     b_cost     -1.084215       -1.083964  ...  1.000339  5118.692669  4750.088592

[4 rows x 12 columns]

Total running time of the script: (0 minutes 56.096 seconds)

Gallery generated by Sphinx-Gallery