2. Logit and sample with weights (Bayesian)

Example of a logit model with a weighted sample

Michel Bierlaire, EPFL Thu Oct 30 2025, 10:15:17

from pathlib import Path

from IPython.core.display_functions import display

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    GROUP,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

from biogeme.bayesian_estimation import (
    BayesianResults,
    BayesianResultsSummary,
    get_pandas_estimated_parameters,
)
from biogeme.biogeme import BIOGEME
from biogeme.expressions import Beta
from biogeme.models import loglogit

Parameters to be estimated.

asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
asc_sm = Beta('asc_sm', 0, None, None, 1)
b_time = Beta('b_time', 0, None, None, 0)
b_cost = Beta('b_cost', 0, None, None, 0)

Definition of the utility functions.

v_train = asc_train + b_time * TRAIN_TT_SCALED + b_cost * TRAIN_COST_SCALED
v_swissmetro = asc_sm + b_time * SM_TT_SCALED + b_cost * SM_COST_SCALED
v_car = asc_car + b_time * CAR_TT_SCALED + b_cost * CAR_CO_SCALED

Associate utility functions with the numbering of alternatives.

v = {1: v_train, 2: v_swissmetro, 3: v_car}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model.

This is the contribution of each observation to the log likelihood function.

logprob = loglogit(v, av, CHOICE)

Definition of the weight.

WEIGHT_GROUP_2 = 8.890991e-01
WEIGHT_GROUP_3 = 1.2
weight = WEIGHT_GROUP_2 * (GROUP == 2) + WEIGHT_GROUP_3 * (GROUP == 3)

These notes will be included as such in the report file.

USER_NOTES = (
    'Example of a logit model with three alternatives: '
    'Train, Car and Swissmetro.'
    ' Weighted Exogenous Sample Maximum Likelihood estimator (WESML)'
)

Create the Biogeme object. It is possible to control the generation of the HTML and the yaml files. Note that these parameters can also be modified in the .TOML configuration file.

formulas = {'log_like': logprob, 'weight': weight}
the_biogeme = BIOGEME(
    database,
    formulas,
    mcmc_sampling_strategy='pymc',
    user_notes=USER_NOTES,
    generate_html=False,
    generate_yaml=True,
)
the_biogeme.model_name = 'b02_weight'

## %%
# Estimate the posterior distribution of the parameters, or read the summary
# if already available.
yaml_file = Path('saved_results') / f'{the_biogeme.model_name}.yaml'

try:
    summary_results = BayesianResultsSummary.from_yaml_file(filename=yaml_file)
except FileNotFoundError:
    results: BayesianResults = the_biogeme.bayesian_estimation()
    summary_results = results.to_summary()

## %%
print(summary_results.short_summary())
/Users/bierlair/MyFiles/github/biogeme/src/biogeme/biogeme.py:1002: UserWarning: The effect of Potentials on other parameters is ignored during prior predictive sampling. This is likely to lead to invalid or biased predictive samples.
  pm.sample_prior_predictive(
/Users/bierlair/.local/share/uv/python/cpython-3.14.3-macos-aarch64-none/lib/python3.14/multiprocessing/popen_fork.py:70: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()

  Progress                                                       Draw                            Divergences                     Step size                       Grad evals                      Speed                                                          Elapsed                         Remaining
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4000                            0                               0.460                           7                               213.24 draws/s                                                 0:00:18                         0:00:00
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4000                            0                               0.471                           15                              196.91 draws/s                                                 0:00:20                         0:00:00
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4000                            0                               0.612                           7                               196.49 draws/s                                                 0:00:20                         0:00:00
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4000                            0                               0.621                           7                               184.51 draws/s                                                 0:00:21                         0:00:00

posterior_predictive_loglike finished in 205 ms
loo_res finished in 6968 ms (6.97 s)
loo finished in 6968 ms (6.97 s)
Sample size                                              6768
Sampler                                                  NUTS
Number of chains                                         4
Number of draws per chain                                2000
Total number of draws                                    8000
Acceptance rate target                                   0.9
Run time                                                 0:00:23.294466
Posterior predictive log-likelihood (sum of log mean p)  -5667.47
Expected log-likelihood E[log L(Y|θ)]                    -5671.05
Best-draw log-likelihood (posterior upper bound)         -5669.07
LOO (Leave-One-Out Cross-Validation)                     -5674.67
LOO Standard Error                                       63.99
Effective number of parameters (p_LOO)                   7.19

Get the results in a pandas table

pandas_results = get_pandas_estimated_parameters(estimation_results=summary_results)
display(pandas_results)
        Name  Value (mean)  Value (median)  ...     R hat   ESS (bulk)   ESS (tail)
0  asc_train     -0.794843       -0.794365  ...  1.001411  3941.607639  4778.463005
1     b_time     -1.348708       -1.347527  ...  1.001060  3637.842357  4576.099021
2     b_cost     -1.141194       -1.141127  ...  1.000222  5862.829903  5304.052443
3    asc_car     -0.090966       -0.090941  ...  1.000684  4137.729390  5212.387260

[4 rows x 12 columns]

Total running time of the script: (0 minutes 52.386 seconds)

Gallery generated by Sphinx-Gallery