1b. Estimation of a logit model (Bayesian)

This example illustrates how to change the prior distribution of the parameters.

Michel Bierlaire, EPFL Thu Nov 20 2025, 08:58:43

import pymc as pm
from IPython.core.display_functions import display
from pytensor.tensor.variable import TensorVariable

import biogeme.biogeme_logging as blog
from biogeme.bayesian_estimation import BayesianResults, get_pandas_estimated_parameters
from biogeme.biogeme import BIOGEME
from biogeme.expressions import Beta
from biogeme.models import loglogit

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)


The logger sets the verbosity of Biogeme. By default, Biogeme is quite silent and generates only warnings. To see more about what is happening behind the scenes, set the level to blog.INFO (or blog.DEBUG for even more detail).

logger = blog.get_screen_logger(level=blog.DEBUG)
logger.info('Example b01b_logit.py')
[INFO] 2025-12-25 14:44:57,044 Example b01b_logit.py <plot_b01b_logit.py:45>

Parameters to be estimated: alternative specific constants. By default, the prior distribution is normal (possibly truncated if bounds are defined), with the mean set to the user-supplied starting value and an explicitly defined scale parameter. Here, we replace the default scale by setting sigma_prior=30.

asc_car = Beta('asc_car', 0, None, None, 0, sigma_prior=30)
asc_train = Beta('asc_train', 0, None, None, 0, sigma_prior=30)
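To see what such a large scale implies, a quick calculation (plain Python, not the Biogeme API) shows that a normal prior with sigma=30 is nearly flat over the range of values these constants typically take, i.e. it is close to non-informative:

```python
import math


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of a normal distribution with mean mu and scale sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


# With sigma=30, the prior log-density barely changes between 0 and 3,
# so the data dominate the posterior for any plausible parameter value.
flatness_gap = normal_logpdf(0.0, 0.0, 30.0) - normal_logpdf(3.0, 0.0, 30.0)
print(flatness_gap)  # about 0.005: essentially flat
```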


# For the other parameters, we use a Student's t distribution truncated at zero. This prior must be
# implemented explicitly, as illustrated below. Consult the PyMC documentation for the catalog of
# available distributions.
def negative_student_prior(
    name: str,
    initial_value: float,
    lower_bound: float | None,
    upper_bound: float | None,
) -> TensorVariable:
    """
    Example of a Student-t prior.
    """

    if lower_bound is None and upper_bound is None:
        return pm.StudentT(name=name, mu=initial_value, sigma=10.0, nu=5.0)

    rv = pm.StudentT.dist(mu=initial_value, sigma=10.0, nu=5.0)
    return pm.Truncated(
        name, rv, lower=lower_bound, upper=upper_bound, initval=initial_value
    )
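What the truncation does can be illustrated without PyMC: rejection sampling (keep only the draws that respect the bound) produces the same truncated Student-t distribution. A minimal numpy sketch, with illustrative parameter values matching the prior above; the function name is hypothetical, not part of Biogeme or PyMC:

```python
import numpy as np

rng = np.random.default_rng(seed=42)


def truncated_student_draws(mu, sigma, nu, upper, size):
    """Draw from a Student-t(mu, sigma, nu) truncated above at `upper`
    by simple rejection sampling: keep only draws below the bound."""
    draws = []
    while len(draws) < size:
        candidates = mu + sigma * rng.standard_t(df=nu, size=size)
        draws.extend(candidates[candidates <= upper])
    return np.array(draws[:size])


# Same location, scale, and degrees of freedom as the prior above,
# truncated at zero as implied by the upper bound of the Beta parameters.
samples = truncated_student_draws(mu=-1.0, sigma=10.0, nu=5.0, upper=0.0, size=10_000)
print(samples.max())  # never exceeds the truncation bound of 0
```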

Coefficients of the attributes. It is useful to set the upper bound to 0 to reflect the prior assumption about the sign of those parameters.

b_time = Beta('b_time', -1, None, 0, 0, prior=negative_student_prior)
b_cost = Beta('b_cost', -1, None, 0, 0, prior=negative_student_prior)

Definition of the utility functions.

v_train = asc_train + b_time * TRAIN_TT_SCALED + b_cost * TRAIN_COST_SCALED
v_sm = b_time * SM_TT_SCALED + b_cost * SM_COST_SCALED
v_car = asc_car + b_time * CAR_TT_SCALED + b_cost * CAR_CO_SCALED

Associate utility functions with the numbering of alternatives.

v = {1: v_train, 2: v_sm, 3: v_car}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

log_probability = loglogit(v, av, CHOICE)
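For one observation, the quantity computed by loglogit is the log of the logit choice probability, restricted to the available alternatives. A minimal sketch in plain Python, with hypothetical utility values (not the Biogeme expression machinery, which evaluates this over the whole database):

```python
import math


def log_logit(v: dict, av: dict, choice: int) -> float:
    """Log-probability of the chosen alternative in a logit model:
    V_choice - log(sum of exp(V_j) over the available alternatives j)."""
    denominator = sum(math.exp(v[j]) for j in v if av[j])
    return v[choice] - math.log(denominator)


# Hypothetical utilities for a single observation, all alternatives available.
v = {1: -0.5, 2: 0.2, 3: -1.0}
av = {1: 1, 2: 1, 3: 1}
log_prob = log_logit(v, av, choice=2)
print(log_prob)  # about -0.587
```

The availability dictionary simply removes unavailable alternatives from the denominator, which is why it is passed to loglogit alongside the utilities.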

Create the Biogeme object. We illustrate how to select a specific sampler; here, the default PyMC sampler. See the PyMC documentation for details.

the_biogeme = BIOGEME(database, log_probability, mcmc_sampling_strategy="pymc")
the_biogeme.model_name = 'b01b_logit'

Estimate the parameters.

try:
    results = BayesianResults.from_netcdf(
        filename=f'saved_results/{the_biogeme.model_name}.nc'
    )
except FileNotFoundError:
    results = the_biogeme.bayesian_estimation()
print(results.short_summary())
/Users/bierlair/python_envs/venv313/lib/python3.13/site-packages/arviz/stats/stats.py:1667: UserWarning: For one or more samples the posterior variance of the log predictive densities exceeds 0.4. This could be indication of WAIC starting to fail.
See http://arxiv.org/abs/1507.04544 for details
  warnings.warn(
Sample size                                              6768
Sampler                                                  NUTS
Number of chains                                         4
Number of draws per chain                                2000
Total number of draws                                    8000
Acceptance rate target                                   0.9
Run time                                                 0:01:06.612717
Posterior predictive log-likelihood (sum of log mean p)  -5329.79
Expected log-likelihood E[log L(Y|θ)]                    -5333.26
Best-draw log-likelihood (posterior upper bound)         -5331.26
WAIC (Widely Applicable Information Criterion)           -5336.74
WAIC Standard Error                                      59.68
Effective number of parameters (p_WAIC)                  6.94
LOO (Leave-One-Out Cross-Validation)                     -5336.74
LOO Standard Error                                       59.68
Effective number of parameters (p_LOO)                   6.94
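The WAIC value and its effective number of parameters reported above are computed by ArviZ; on the log scale they follow directly from the matrix of pointwise log-likelihoods (one row per posterior draw, one column per observation). A numpy sketch of the formulas on a synthetic matrix, not the Biogeme or ArviZ API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic matrix of pointwise log-likelihoods:
# rows = posterior draws, columns = observations.
log_lik = rng.normal(loc=-1.0, scale=0.1, size=(8000, 50))


def elpd_waic(log_lik: np.ndarray) -> tuple[float, float]:
    """WAIC on the log scale: lppd (log pointwise predictive density)
    minus p_waic (posterior variance of the pointwise log-likelihood)."""
    # Log of the posterior-mean likelihood per observation (stable log-sum-exp).
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.mean(np.exp(log_lik - m), axis=0)))
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return lppd - p_waic, p_waic


elpd, p = elpd_waic(log_lik)
print(elpd, p)
```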

Get the results in a pandas table

pandas_results = get_pandas_estimated_parameters(
    estimation_results=results,
)
display(pandas_results)
        Name  Value (mean)  Value (median)  ...     R hat   ESS (bulk)   ESS (tail)
0  asc_train     -0.698959       -0.698565  ...  1.000615  3246.938110  4581.487364
1    asc_car     -0.152807       -0.151998  ...  1.000407  3818.074091  5100.627719
2     b_time     -1.281670       -1.281377  ...  1.000845  3235.206268  4041.160060
3     b_cost     -1.084814       -1.085185  ...  1.000620  5762.587789  4609.336129

[4 rows x 12 columns]
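The R hat and ESS columns are convergence diagnostics computed by ArviZ. The classical (unsplit) Gelman-Rubin statistic behind R hat compares between-chain and within-chain variance, and can be sketched in a few lines of numpy; this is illustrative only, since ArviZ uses a more robust rank-normalized split version:

```python
import numpy as np

rng = np.random.default_rng(1)


def r_hat(chains: np.ndarray) -> float:
    """Gelman-Rubin potential scale reduction for an (M, N) array of
    M chains with N draws each (unsplit version, for brevity)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    b = n * chain_means.var(ddof=1)        # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    var_hat = (n - 1) / n * w + b / n      # pooled variance estimate
    return float(np.sqrt(var_hat / w))


# Four well-mixed chains drawn from the same distribution.
good = rng.normal(size=(4, 2000))
print(r_hat(good))  # close to 1, indicating convergence
```

Values of R hat close to 1, as in the table above, indicate that the four chains have mixed well.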

Total running time of the script: (0 minutes 35.612 seconds)

Gallery generated by Sphinx-Gallery