23a. Binary logit model

Bayesian estimation of a binary logit model. Two alternatives: Train and Car.

Michel Bierlaire, EPFL
Tue Nov 18 2025, 18:42:42

from pathlib import Path

from IPython.core.display_functions import display

See the data processing script: Data preparation for Swissmetro (binary choice).

from swissmetro_binary import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

from biogeme.bayesian_estimation import (
    BayesianResults,
    BayesianResultsSummary,
    get_pandas_estimated_parameters,
)
from biogeme.biogeme import BIOGEME
from biogeme.expressions import Beta
from biogeme.models import loglogit

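Parameters to be estimated. The time and cost coefficients are given an upper bound of zero; the alternative specific constant is unbounded.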

asc_car = Beta('asc_car', 0, None, None, 0)
b_time_car = Beta('b_time_car', 0, None, 0, 0)
b_time_train = Beta('b_time_train', 0, None, 0, 0)
b_cost_car = Beta('b_cost_car', 0, None, 0, 0)
b_cost_train = Beta('b_cost_train', 0, None, 0, 0)

Definition of the utility functions. The model is a binary logit with only two alternatives, train and car. The constant of the train alternative is normalized to zero, so only asc_car is estimated.

v_train = b_time_train * TRAIN_TT_SCALED + b_cost_train * TRAIN_COST_SCALED
v_car = asc_car + b_time_car * CAR_TT_SCALED + b_cost_car * CAR_CO_SCALED
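With two alternatives, the logit probability depends only on the difference between the two utilities. The following is a minimal plain-Python sketch of that closed form, not part of the Biogeme script. It uses the posterior means reported in the parameter table on this page; the attribute values in `trip` are hypothetical and chosen only for illustration.

```python
import math

# Posterior means copied from the parameter table on this page.
beta = {
    'asc_car': -0.901000,
    'b_time_train': -1.155393,
    'b_cost_train': -2.405035,
    'b_time_car': -0.417166,
    'b_cost_car': -1.075706,
}

# Hypothetical scaled attributes for one trip (illustrative values only).
trip = {
    'TRAIN_TT_SCALED': 1.2,
    'TRAIN_COST_SCALED': 0.5,
    'CAR_TT_SCALED': 1.5,
    'CAR_CO_SCALED': 0.8,
}

v_train_value = (
    beta['b_time_train'] * trip['TRAIN_TT_SCALED']
    + beta['b_cost_train'] * trip['TRAIN_COST_SCALED']
)
v_car_value = (
    beta['asc_car']
    + beta['b_time_car'] * trip['CAR_TT_SCALED']
    + beta['b_cost_car'] * trip['CAR_CO_SCALED']
)

# Binary logit: only the utility difference matters.
p_car = 1.0 / (1.0 + math.exp(v_train_value - v_car_value))
p_train = 1.0 - p_car
print(f'P(train) = {p_train:.3f}, P(car) = {p_car:.3f}')
```

Adding the same constant to both utilities leaves the probabilities unchanged, which is why a constant is specified for one alternative only.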

Associate each utility function with the identifier of its alternative (1 for train, 3 for car).

v = {1: v_train, 3: v_car}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log-likelihood function.

log_probability = loglogit(v, av, CHOICE)
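To make the role of the three arguments concrete, here is a plain-Python sketch of the standard logit formula with availability conditions: the log probability of the chosen alternative is its utility minus the log of the sum of exponentiated utilities over the available alternatives. The function `log_logit` and the utility values are hypothetical; this illustrates the formula and is not Biogeme's implementation of `loglogit`, whose handling of edge cases may differ.

```python
import math


def log_logit(utilities, availability, chosen):
    """Log of the logit probability of `chosen` among available alternatives."""
    if not availability[chosen]:
        raise ValueError('The chosen alternative is not available.')
    available = [v for alt, v in utilities.items() if availability[alt]]
    # Subtract the maximum before exponentiating for numerical stability.
    v_max = max(available)
    log_denominator = v_max + math.log(
        sum(math.exp(v - v_max) for v in available)
    )
    return utilities[chosen] - log_denominator


# Hypothetical utility values for one observation, keyed like `v` and `av`.
example_v = {1: -2.6, 3: -2.4}
both_available = {1: 1, 3: 1}
car_unavailable = {1: 1, 3: 0}

print(log_logit(example_v, both_available, chosen=3))
# Train is the only available option, so its probability is 1 and the log is 0.
print(log_logit(example_v, car_unavailable, chosen=1))
```

Summing this quantity over all observations gives the log-likelihood of the sample for one set of parameter values.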

Create the Biogeme object.

the_biogeme = BIOGEME(database, log_probability)
the_biogeme.model_name = 'b23a_binary_logit'

Estimate the posterior distribution of the parameters, or read the results if already available.

yaml_file = Path('saved_results') / f'{the_biogeme.model_name}.yaml'
try:
    summary_results = BayesianResultsSummary.from_yaml_file(filename=yaml_file)
except FileNotFoundError:
    results: BayesianResults = the_biogeme.bayesian_estimation()
    summary_results = results.to_summary()
print(summary_results.short_summary())
Sample size                                              2232
Sampler                                                  NUTS
Number of chains                                         4
Number of draws per chain                                2000
Total number of draws                                    8000
Acceptance rate target                                   0.9
Run time                                                 0:00:30.208479
Posterior predictive log-likelihood (sum of log mean p)  -869.85
Expected log-likelihood E[log L(Y|θ)]                    -875.38
Best-draw log-likelihood (posterior upper bound)         -872.92
LOO (Leave-One-Out Cross-Validation)                     -882.32
LOO Standard Error                                       34.82
Effective number of parameters (p_LOO)                   12.46
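The first two log-likelihood figures differ in where the average over posterior draws is taken. "Sum of log mean p" averages the choice probability of each observation over the draws before taking the log, whereas the expected log-likelihood averages the log-likelihood of the whole sample over the draws. By Jensen's inequality the first is never smaller than the second, consistent with -869.85 and -875.38 above. The sketch below illustrates both quantities on invented probabilities. It also checks the reported figures under the usual definition of the effective number of parameters as the first quantity minus the LOO value (an assumption about how the summary is computed): the difference is 12.47, which matches the reported 12.46 up to rounding.

```python
import math

# Toy per-observation choice probabilities: 3 posterior draws x 2 observations.
# These numbers are invented; they do not come from the estimation results.
prob = [
    [0.60, 0.30],
    [0.70, 0.20],
    [0.50, 0.40],
]
n_draws = len(prob)
n_obs = len(prob[0])

# Sum over observations of the log of the mean probability across draws.
log_mean_p = sum(
    math.log(sum(draw[n] for draw in prob) / n_draws) for n in range(n_obs)
)

# Mean across draws of the log-likelihood of the whole sample.
expected_log_like = sum(
    sum(math.log(p) for p in draw) for draw in prob
) / n_draws

print(log_mean_p, expected_log_like)

# Consistency check on the reported figures, assuming
# p_LOO = (sum of log mean p) - LOO.
p_loo_check = -869.85 - (-882.32)
print(round(p_loo_check, 2))
```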

Present the parameter estimates in a pandas table.

pandas_results = get_pandas_estimated_parameters(
    estimation_results=summary_results,
)
display(pandas_results)
           Name  Value (mean)  ...   ESS (bulk)   ESS (tail)
0       asc_car     -0.901000  ...  5662.453588  4972.486730
1  b_time_train     -1.155393  ...  4941.587572  4619.614336
2  b_cost_train     -2.405035  ...  6274.989100  5192.790538
3    b_time_car     -0.417166  ...  5681.468015  4317.196991
4    b_cost_car     -1.075706  ...  5625.989506  4176.135532

[5 rows x 12 columns]

Report the variables stored in the Bayesian estimation results.

display(summary_results.report_stored_variables())
             group           variable                dims            shape
0    constant_data          CAR_AV_SP               [obs]           [2232]
1    constant_data      CAR_CO_SCALED               [obs]           [2232]
2    constant_data      CAR_TT_SCALED               [obs]           [2232]
3    constant_data             CHOICE               [obs]           [2232]
4    constant_data        TRAIN_AV_SP               [obs]           [2232]
5    constant_data  TRAIN_COST_SCALED               [obs]           [2232]
6    constant_data    TRAIN_TT_SCALED               [obs]           [2232]
7   log_likelihood            _choice  [chain, draw, obs]  [4, 2000, 2232]
8        posterior            asc_car       [chain, draw]        [4, 2000]
9        posterior         b_cost_car       [chain, draw]        [4, 2000]
10       posterior       b_cost_train       [chain, draw]        [4, 2000]
11       posterior         b_time_car       [chain, draw]        [4, 2000]
12       posterior       b_time_train       [chain, draw]        [4, 2000]
13       posterior           log_like  [chain, draw, obs]  [4, 2000, 2232]
14           prior            asc_car       [chain, draw]        [1, 2000]
15           prior         b_cost_car       [chain, draw]        [1, 2000]
16           prior       b_cost_train       [chain, draw]        [1, 2000]
17           prior         b_time_car       [chain, draw]        [1, 2000]
18           prior       b_time_train       [chain, draw]        [1, 2000]
19           prior           log_like  [chain, draw, obs]  [1, 2000, 2232]
20    sample_stats    acceptance_rate       [chain, draw]        [4, 2000]
21    sample_stats          diverging       [chain, draw]        [4, 2000]
22    sample_stats             energy       [chain, draw]        [4, 2000]
23    sample_stats                 lp       [chain, draw]        [4, 2000]
24    sample_stats            n_steps       [chain, draw]        [4, 2000]
25    sample_stats          step_size       [chain, draw]        [4, 2000]
26    sample_stats         tree_depth       [chain, draw]        [4, 2000]
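Each posterior variable is stored with dimensions [chain, draw], so the 4 chains of 2000 draws give the 8000 total draws reported in the summary. As an illustration of how such an array can be pooled into a single posterior mean, here is a sketch on synthetic draws. The numbers are generated at random and are not the actual posterior; the names `draws` and `pooled` are hypothetical.

```python
import random
import statistics

random.seed(0)
n_chains, n_draws = 4, 2000

# Synthetic stand-in for one posterior variable with dims [chain, draw].
draws = [
    [random.gauss(-0.9, 0.1) for _ in range(n_draws)] for _ in range(n_chains)
]

# Pool all chains into a single list of draws.
pooled = [value for chain in draws for value in chain]
posterior_mean = statistics.fmean(pooled)

# Per-chain means; large disagreement between chains would suggest
# that the sampler has not converged.
chain_means = [statistics.fmean(chain) for chain in draws]

print(len(pooled))
print(posterior_mean)
print(chain_means)
```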
