Illustration of additional features of Biogeme
Same model as b01logit, but using LinearUtility and parameter segmentations.
Michel Bierlaire, EPFL Wed Jun 18 2025, 10:57:53
import os
from IPython.core.display_functions import display
import biogeme.biogeme_logging as blog
from biogeme.biogeme import BIOGEME
from biogeme.exceptions import BiogemeError
from biogeme.expressions import Beta, LinearTermTuple, LinearUtility
from biogeme.models import loglogit
from biogeme.results_processing import (
EstimateVarianceCovariance,
generate_html_file,
get_pandas_estimated_parameters,
)
from biogeme.segmentation import Segmentation
See the data processing script: Data preparation for Swissmetro.
from swissmetro_data import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
GA,
MALE,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b01logit_bis.py')
Example b01logit_bis.py
Parameters to be estimated.
asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
Starting values. We use values estimated from a previous run.
b_time = Beta('b_time', -1.28, None, None, 0)
b_cost = Beta('b_cost', -1.08, None, None, 0)
Define segmentations.
gender_segmentation = database.generate_segmentation(
variable=MALE, mapping={0: 'female', 1: 'male'}
)
ga_segmentation = database.generate_segmentation(
variable=GA, mapping={0: 'without_ga', 1: 'with_ga'}
)
segmentations_for_asc = [
gender_segmentation,
ga_segmentation,
]
Segmentation of the constants.
asc_train_segmentation = Segmentation(asc_train, segmentations_for_asc)
segmented_asc_train = asc_train_segmentation.segmented_beta()
asc_car_segmentation = Segmentation(asc_car, segmentations_for_asc)
segmented_asc_car = asc_car_segmentation.segmented_beta()
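To make the parameter names in the estimation output easier to read, here is a plain-Python sketch, with no Biogeme import, of how a segmented Beta expands: one reference parameter plus one "difference" parameter per non-reference category of each segmentation. The helper function is purely illustrative; it only mimics the naming convention observed in the estimation results below, not the actual Biogeme implementation.

```python
# Illustrative helper (not biogeme code): reproduce the naming scheme
# of a segmented parameter as it appears in the estimation output.
def segmented_parameter_names(beta_name, segmentations):
    """segmentations: list of lists of non-reference category labels."""
    names = [f'{beta_name}_ref']
    for other_categories in segmentations:
        names += [f'{beta_name}_diff_{cat}' for cat in other_categories]
    return names


# Gender segmentation: 'female' is the reference, 'male' the other
# category. GA segmentation: 'without_ga' is the reference.
print(segmented_parameter_names('asc_train', [['male'], ['with_ga']]))
# ['asc_train_ref', 'asc_train_diff_male', 'asc_train_diff_with_ga']
```

These are exactly the names reported for the train constant in the results table further down.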
Definition of the utility functions.
terms1 = [
LinearTermTuple(beta=b_time, x=TRAIN_TT_SCALED),
LinearTermTuple(beta=b_cost, x=TRAIN_COST_SCALED),
]
v_train = segmented_asc_train + LinearUtility(terms1)
terms2 = [
LinearTermTuple(beta=b_time, x=SM_TT_SCALED),
LinearTermTuple(beta=b_cost, x=SM_COST_SCALED),
]
v_swissmetro = LinearUtility(terms2)
terms3 = [
LinearTermTuple(beta=b_time, x=CAR_TT_SCALED),
LinearTermTuple(beta=b_cost, x=CAR_CO_SCALED),
]
v_car = segmented_asc_car + LinearUtility(terms3)
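The LinearUtility expression represents the inner product of the coefficients and the variables in its term list. A plain-Python sketch of the value it encodes, using hypothetical numbers rather than data from the Swissmetro file:

```python
# Illustrative sketch (not biogeme code): a LinearUtility term list
# encodes sum(beta * x) over its terms.
def linear_utility(terms):
    # terms: list of (beta_value, x_value) pairs
    return sum(beta * x for beta, x in terms)


# Hypothetical scaled travel time and cost for one observation.
value = linear_utility([(-1.17, 0.5), (-1.09, 0.3)])
```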
Associate utility functions with the numbering of alternatives.
v = {1: v_train, 2: v_swissmetro, 3: v_car}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Definition of the model.
This is the contribution of each observation to the log likelihood function.
logprob = loglogit(v, av, CHOICE)
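For reference, loglogit evaluates, for each observation, the log of the logit probability of the chosen alternative, with unavailable alternatives excluded from the denominator. A self-contained sketch with hypothetical utilities (again, not biogeme code):

```python
import math


# Illustrative sketch of the quantity loglogit evaluates for one
# observation: log P(choice) = V_choice - log(sum over available j
# of exp(V_j)).
def log_logit_probability(v, av, choice):
    # v, av: dicts alternative-id -> utility / availability (0 or 1)
    denom = sum(math.exp(v[j]) for j in v if av[j])
    return v[choice] - math.log(denom)


# Two available alternatives with equal utilities: probability 1/2,
# so the log probability is log(0.5).
v = {1: 0.0, 2: 0.0, 3: 5.0}
av = {1: 1, 2: 1, 3: 0}  # alternative 3 is unavailable
value = log_logit_probability(v, av, 1)
```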
User notes.
These notes will be included verbatim in the report file.
USER_NOTES = (
'Example of a logit model with three alternatives: Train, Car and'
' Swissmetro. Same as 01logit and '
'introducing some options and features. In particular, bioLinearUtility,'
' and automatic segmentation of parameters.'
)
Create the Biogeme object. We include the user notes, and we ask Biogeme not to calculate the second derivatives. The parameter 'calculating_second_derivatives' is a general instruction for Biogeme: with the value 'never', the second derivatives are not calculated even after the algorithm has converged. It means that the statistics will have to rely on bootstrapping or BHHH.
the_biogeme = BIOGEME(
database,
logprob,
user_notes=USER_NOTES,
save_iterations=False,
bootstrap_samples=100,
calculating_second_derivatives='never',
)
Biogeme parameters read from biogeme.toml.
Calculate the null log likelihood for reporting.
As we have used starting values different from 0, the initial model is not the equal probability model.
the_biogeme.calculate_null_loglikelihood(av)
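Under the null (equal probability) model, each observation contributes minus the log of its number of available alternatives, which is why the availability conditions are passed to calculate_null_loglikelihood. A plain-Python sketch with two hypothetical observations:

```python
import math

# Illustrative sketch of the null log likelihood: equal probability
# over the available alternatives, so each observation contributes
# -log(number of available alternatives).
availabilities = [(1, 1, 1), (1, 1, 0)]  # two hypothetical observations
null_ll = sum(-math.log(sum(av)) for av in availabilities)
# First observation: -log(3); second (one alternative unavailable): -log(2).
```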
the_biogeme.model_name = 'b01logit_bis'
Estimate the parameters.
results = the_biogeme.estimate(run_bootstrap=True)
Starting values for the algorithm: {}
As the model is rather complex, we cancel the calculation of the second derivatives. If you want to control the parameters of the algorithm, change it from "automatic" to "simple_bounds" in the TOML file.
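The algorithm choice mentioned above lives in biogeme.toml. The section and key names below are an assumption based on recent Biogeme versions; check the comments in your own generated biogeme.toml:

```toml
# Assumption: section and key names as in recent Biogeme versions.
[Estimation]
optimization_algorithm = "simple_bounds"  # instead of "automatic"
```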
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: BFGS with trust region for simple bounds
Iter. asc_train_ref asc_train_diff_ asc_train_diff_ b_time b_cost asc_car_ref asc_car_diff_ma asc_car_diff_wi Function Relgrad Radius Rho
0 0 0 0 -1.3 -1.1 0 0 0 5.5e+03 0.12 0.5 -0.69 -
1 0 0 0 -1.3 -1.1 0 0 0 5.5e+03 0.12 0.25 -0.051 -
2 -0.25 -0.25 0.25 -1.5 -0.83 0.25 0.25 -0.25 5.3e+03 0.079 0.25 0.41 +
3 -0.2 -0.5 0.5 -1.8 -0.58 0 0 -0.5 5.3e+03 0.1 0.25 0.2 +
4 -0.13 -0.55 0.67 -1.5 -0.8 0.13 0.17 -0.53 5.1e+03 0.041 0.25 0.9 +
5 -0.29 -0.8 0.92 -1.6 -1.1 -0.12 -0.077 -0.78 5.1e+03 0.069 0.25 0.49 +
6 -0.21 -0.84 1.1 -1.3 -1.1 -0.0014 0.11 -0.78 5e+03 0.041 0.25 0.46 +
7 -0.24 -0.99 1.4 -1.3 -1 -0.23 0.026 -0.78 5e+03 0.022 0.25 0.69 +
8 -0.44 -1.2 1.6 -1.3 -1 -0.23 0.2 -0.76 5e+03 0.024 0.25 0.17 +
9 -0.44 -1.2 1.6 -1.3 -1 -0.23 0.2 -0.76 5e+03 0.024 0.12 -0.21 -
10 -0.32 -1.2 1.7 -1.2 -1 -0.34 0.16 -0.76 5e+03 0.01 0.12 0.7 +
11 -0.44 -1.2 1.8 -1.2 -1.1 -0.46 0.23 -0.72 5e+03 0.013 0.12 0.54 +
12 -0.44 -1.2 1.9 -1.1 -1.1 -0.5 0.34 -0.68 4.9e+03 0.013 0.12 0.18 +
13 -0.56 -1.1 1.9 -1.2 -1 -0.63 0.45 -0.55 4.9e+03 0.0059 0.12 0.41 +
14 -0.56 -1.1 1.9 -1.2 -1 -0.63 0.45 -0.55 4.9e+03 0.0059 0.062 0.0044 -
15 -0.54 -1.1 1.9 -1.2 -1.1 -0.64 0.43 -0.55 4.9e+03 0.0028 0.062 0.75 +
16 -0.54 -1.1 1.9 -1.2 -1.1 -0.64 0.43 -0.55 4.9e+03 0.0028 0.031 -2.8 -
17 -0.54 -1.1 1.9 -1.2 -1.1 -0.64 0.43 -0.55 4.9e+03 0.0028 0.016 -0.29 -
18 -0.55 -1.1 1.9 -1.2 -1.1 -0.63 0.44 -0.54 4.9e+03 0.0021 0.016 0.44 +
19 -0.54 -1.1 1.9 -1.2 -1.1 -0.63 0.42 -0.53 4.9e+03 0.0021 0.016 0.46 +
20 -0.54 -1.1 1.9 -1.2 -1.1 -0.62 0.42 -0.51 4.9e+03 0.00047 0.016 0.83 +
21 -0.53 -1.1 1.9 -1.2 -1.1 -0.62 0.42 -0.5 4.9e+03 0.0014 0.016 0.42 +
22 -0.54 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.48 4.9e+03 0.00067 0.016 0.64 +
23 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.46 4.9e+03 0.00032 0.016 0.66 +
24 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.45 4.9e+03 0.00047 0.016 0.59 +
25 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 0.00049 0.016 0.43 +
26 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 0.00049 0.0078 -2.6 -
27 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 0.00022 0.0078 0.5 +
28 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 0.00022 0.0039 -2.3 -
29 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 0.00022 0.002 -1.1 -
30 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 0.00022 0.00098 0.064 -
31 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 0.00017 0.00098 0.45 +
32 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 7.4e-05 0.00098 0.84 +
33 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 6.8e-05 0.00098 0.83 +
34 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 6.2e-05 0.00098 0.87 +
35 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.43 4.9e+03 5.4e-05 0.00098 0.89 +
36 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 5.3e-05 0.00098 0.89 +
37 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 4.5e-05 0.00098 0.89 +
38 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 4.3e-05 0.00098 0.89 +
39 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 3.6e-05 0.00098 0.89 +
40 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 3.4e-05 0.00098 0.9 +
41 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 5.7e-05 0.00098 0.88 +
42 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 2.4e-05 0.0098 0.94 ++
43 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 2.4e-05 0.0033 -9.7 -
44 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 2e-05 0.0033 0.88 +
45 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 2e-05 0.0017 -35 -
46 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 2e-05 0.00084 -3.5 -
47 -0.53 -1.1 1.9 -1.2 -1.1 -0.61 0.41 -0.42 4.9e+03 3.8e-06 0.00084 0.86 -
Optimization algorithm has converged.
Relative gradient: 3.827951698290516e-06
Cause of termination: Relative gradient = 3.8e-06 <= 6.1e-06
Number of function evaluations: 119
Number of gradient evaluations: 71
Number of hessian evaluations: 0
Algorithm: BFGS with trust region for simple bound constraints
Number of iterations: 48
Proportion of Hessian calculation: 0/35 = 0.0%
Optimization time: 0:00:00.269941
Calculate BHHH
Re-estimate the model 100 times for bootstrapping
Bootstraps: 0%| | 0/100 [00:00<?, ?it/s]
100%|██████████| 100/100 [00:36<00:00, 2.73it/s]
File b01logit_bis~00.html has been generated.
File b01logit_bis~00.yaml has been generated.
Get the results in a pandas table.
print('Parameters')
print('----------')
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
Parameters
----------
Name Value ... Bootstrap t-stat. Bootstrap p-value
0 asc_train_ref -0.534241 ... -4.877074 1.076712e-06
1 asc_train_diff_male -1.103475 ... -12.722500 0.000000e+00
2 asc_train_diff_with_ga 1.889291 ... 18.925585 0.000000e+00
3 b_time -1.172988 ... -11.299239 0.000000e+00
4 b_cost -1.089775 ... -14.222431 0.000000e+00
5 asc_car_ref -0.612850 ... -6.348853 2.169263e-10
6 asc_car_diff_male 0.408056 ... 3.940929 8.116649e-05
7 asc_car_diff_with_ga -0.414881 ... -2.072138 3.825259e-02
[8 rows x 5 columns]
Get general statistics.
print('General statistics')
print('------------------')
stats = results.get_general_statistics()
for description, value in stats.items():
print(f'{description}: {value}')
General statistics
------------------
Number of estimated parameters: 8
Sample size: 6768
Excluded observations: 3960
Null log likelihood: -6964.663
Init log likelihood: -5533.155
Final log likelihood: -4943.895
Likelihood ratio test for the null model: 4041.535
Rho-square for the null model: 0.29
Rho-square-bar for the null model: 0.289
Likelihood ratio test for the init. model: 1178.519
Rho-square for the init. model: 0.106
Rho-square-bar for the init. model: 0.105
Akaike Information Criterion: 9903.791
Bayesian Information Criterion: 9958.351
Final gradient norm: 3.7451E-02
Bootstrapping time: 0:00:36.572809
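The goodness-of-fit statistics above follow directly from the log likelihood values. A quick arithmetic check in plain Python; the tiny discrepancies with the report come from the rounding of the printed log likelihoods:

```python
import math

# Recompute the goodness-of-fit statistics from the reported values.
null_ll, final_ll = -6964.663, -4943.895
n_params, sample_size = 8, 6768

aic = -2 * final_ll + 2 * n_params                       # ~9903.79
bic = -2 * final_ll + n_params * math.log(sample_size)   # ~9958.35
rho_squared = 1 - final_ll / null_ll                     # ~0.290
rho_squared_bar = 1 - (final_ll - n_params) / null_ll    # ~0.289
```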
Messages from the optimization algorithm.
print('Optimization algorithm')
print('----------------------')
for description, message in results.optimization_messages.items():
print(f'{description}:\t{message}')
Optimization algorithm
----------------------
Relative gradient: 3.827951698290516e-06
Cause of termination: Relative gradient = 3.8e-06 <= 6.1e-06
Number of function evaluations: 119
Number of gradient evaluations: 71
Number of hessian evaluations: 0
Algorithm: BFGS with trust region for simple bound constraints
Number of iterations: 48
Proportion of Hessian calculation: 0/35 = 0.0%
Optimization time: 0:00:00.269941
Try to generate the HTML output with the robust variance-covariance matrix. This does not work, as the second derivatives matrix has not been calculated.
try:
robust_html_filename = f'{the_biogeme.model_name}_robust.html'
# The following function assumes that the file does not exist.
if os.path.exists(robust_html_filename):
os.remove(robust_html_filename)
generate_html_file(
filename=robust_html_filename,
estimation_results=results,
variance_covariance_type=EstimateVarianceCovariance.ROBUST,
)
print(
f'Estimation results with robust statistics generated: {robust_html_filename}'
)
except BiogemeError as e:
print(f'BiogemeError: {e}')
BiogemeError: Second derivatives matrix not available.
Generate the HTML output with the BHHH variance-covariance matrix.
bhhh_html_filename = f'{the_biogeme.model_name}_bhhh.html'
# The following function assumes that the file does not exist.
if os.path.exists(bhhh_html_filename):
os.remove(bhhh_html_filename)
generate_html_file(
filename=bhhh_html_filename,
estimation_results=results,
variance_covariance_type=EstimateVarianceCovariance.BHHH,
)
print(f'Estimation results with BHHH statistics generated: {bhhh_html_filename}')
File b01logit_bis_bhhh.html has been generated.
Estimation results with BHHH statistics generated: b01logit_bis_bhhh.html
Generate the file in ALogit format.
f12_filename = results.write_f12()
print(f'Estimation results in ALogit format generated: {f12_filename}')
File b01logit_bis~03.F12 has been generated.
Estimation results in ALogit format generated: b01logit_bis~03.F12
Generate LaTeX code with the results.
latex_filename = results.write_latex(include_begin_document=True)
print(f'Estimation results in LaTeX format generated: {latex_filename}')
File b01logit_bis~00.tex has been generated.
Estimation results in LaTeX format generated: b01logit_bis~00.tex
Total running time of the script: (0 minutes 37.546 seconds)