Note
Go to the end to download the full example code.
Logit¶
Estimation of a logit model using sampling of alternatives.
Michel Bierlaire Fri Jul 25 2025, 17:36:23
import pandas as pd
from IPython.core.display_functions import display
import biogeme.biogeme_logging as blog
from alternatives import ID_COLUMN, alternatives, partitions
from biogeme.biogeme import BIOGEME
from biogeme.results_processing import get_pandas_estimated_parameters
from biogeme.sampling_of_alternatives import (
ChoiceSetsGeneration,
GenerateModel,
SamplingContext,
generate_segment_size,
)
from compare import compare
from specification_sampling import V, combined_variables
ID rating price ... rest_lon distance downtown
0 0 1 4 ... 42.220972 71.735518 1.0
1 1 2 2 ... 50.549434 106.267205 0.0
2 2 3 3 ... 97.830520 136.298409 0.0
3 3 4 1 ... 69.152206 85.941147 0.0
4 4 4 3 ... 89.145620 96.773021 0.0
.. .. ... ... ... ... ... ...
95 95 4 3 ... 9.511387 84.166441 0.0
96 96 1 1 ... 92.144641 95.601366 0.0
97 97 4 2 ... 27.657518 30.440555 1.0
98 98 4 4 ... 32.303213 45.027143 1.0
99 99 4 1 ... 13.672495 25.703295 1.0
[100 rows x 16 columns]
Number of asian restaurants: 33
logger = blog.get_screen_logger(level=blog.INFO)
The data file contains several columns associated with synthetic choices. Here we arbitrarily select logit_4.
CHOICE_COLUMN = 'logit_4'
SAMPLE_SIZE = 10
PARTITION = 'asian'
MODEL_NAME = f'logit_{PARTITION}_{SAMPLE_SIZE}_alt'
FILE_NAME = f'{MODEL_NAME}.dat'
OBS_FILE = 'obs_choice.dat'
the_partition = partitions.get(PARTITION)
if the_partition is None:
raise ValueError(f'Unknown partition: {PARTITION}')
segment_sizes = generate_segment_size(SAMPLE_SIZE, the_partition.number_of_segments())
observations = pd.read_csv(OBS_FILE)
context = SamplingContext(
the_partition=the_partition,
sample_sizes=segment_sizes,
individuals=observations,
choice_column=CHOICE_COLUMN,
alternatives=alternatives,
id_column=ID_COLUMN,
biogeme_file_name=FILE_NAME,
utility_function=V,
combined_variables=combined_variables,
)
logger.info(context.reporting())
Size of the choice set: 100
Main partition: 2 segment(s) of size 33, 67
Main sample: 10: 5/33, 5/67
the_data_generation = ChoiceSetsGeneration(context=context)
the_model_generation = GenerateModel(context=context)
biogeme_database = the_data_generation.sample_and_merge(recycle=False)
Generating 10 + 0 alternatives for 10000 observations
0%| | 0/10000 [00:00<?, ?it/s]
1%|▏ | 131/10000 [00:00<00:07, 1301.00it/s]
3%|▎ | 266/10000 [00:00<00:07, 1325.92it/s]
4%|▍ | 401/10000 [00:00<00:07, 1336.43it/s]
5%|▌ | 538/10000 [00:00<00:07, 1347.94it/s]
7%|▋ | 675/10000 [00:00<00:06, 1355.37it/s]
8%|▊ | 811/10000 [00:00<00:06, 1352.79it/s]
9%|▉ | 947/10000 [00:00<00:06, 1346.40it/s]
11%|█ | 1082/10000 [00:00<00:06, 1335.86it/s]
12%|█▏ | 1220/10000 [00:00<00:06, 1348.67it/s]
14%|█▎ | 1358/10000 [00:01<00:06, 1356.92it/s]
15%|█▍ | 1494/10000 [00:01<00:06, 1356.97it/s]
16%|█▋ | 1633/10000 [00:01<00:06, 1364.28it/s]
18%|█▊ | 1773/10000 [00:01<00:05, 1374.08it/s]
19%|█▉ | 1915/10000 [00:01<00:05, 1385.30it/s]
21%|██ | 2054/10000 [00:01<00:05, 1382.53it/s]
22%|██▏ | 2193/10000 [00:01<00:05, 1374.64it/s]
23%|██▎ | 2331/10000 [00:01<00:05, 1368.63it/s]
25%|██▍ | 2468/10000 [00:01<00:05, 1353.81it/s]
26%|██▌ | 2604/10000 [00:01<00:05, 1352.18it/s]
27%|██▋ | 2744/10000 [00:02<00:05, 1364.29it/s]
29%|██▉ | 2881/10000 [00:02<00:05, 1361.53it/s]
30%|███ | 3018/10000 [00:02<00:05, 1354.04it/s]
32%|███▏ | 3154/10000 [00:02<00:05, 1346.68it/s]
33%|███▎ | 3289/10000 [00:02<00:04, 1344.16it/s]
34%|███▍ | 3430/10000 [00:02<00:04, 1362.04it/s]
36%|███▌ | 3567/10000 [00:02<00:04, 1359.02it/s]
37%|███▋ | 3704/10000 [00:02<00:04, 1359.82it/s]
38%|███▊ | 3840/10000 [00:02<00:04, 1357.21it/s]
40%|███▉ | 3978/10000 [00:02<00:04, 1363.35it/s]
41%|████ | 4118/10000 [00:03<00:04, 1371.69it/s]
43%|████▎ | 4256/10000 [00:03<00:04, 1368.92it/s]
44%|████▍ | 4393/10000 [00:03<00:04, 1366.06it/s]
45%|████▌ | 4530/10000 [00:03<00:04, 1354.35it/s]
47%|████▋ | 4667/10000 [00:03<00:03, 1356.87it/s]
48%|████▊ | 4806/10000 [00:03<00:03, 1366.41it/s]
49%|████▉ | 4944/10000 [00:03<00:03, 1367.98it/s]
51%|█████ | 5081/10000 [00:03<00:03, 1365.24it/s]
52%|█████▏ | 5218/10000 [00:03<00:03, 1356.83it/s]
54%|█████▎ | 5354/10000 [00:03<00:03, 1353.83it/s]
55%|█████▍ | 5493/10000 [00:04<00:03, 1364.50it/s]
56%|█████▋ | 5630/10000 [00:04<00:03, 1366.11it/s]
58%|█████▊ | 5767/10000 [00:04<00:03, 1361.97it/s]
59%|█████▉ | 5904/10000 [00:04<00:03, 1363.64it/s]
60%|██████ | 6043/10000 [00:04<00:02, 1370.12it/s]
62%|██████▏ | 6183/10000 [00:04<00:02, 1376.36it/s]
63%|██████▎ | 6324/10000 [00:04<00:02, 1384.49it/s]
65%|██████▍ | 6464/10000 [00:04<00:02, 1388.30it/s]
66%|██████▌ | 6603/10000 [00:04<00:02, 1369.67it/s]
67%|██████▋ | 6741/10000 [00:04<00:02, 1366.02it/s]
69%|██████▉ | 6878/10000 [00:05<00:02, 1362.48it/s]
70%|███████ | 7020/10000 [00:05<00:02, 1378.25it/s]
72%|███████▏ | 7158/10000 [00:05<00:02, 1375.58it/s]
73%|███████▎ | 7296/10000 [00:05<00:02, 1346.10it/s]
74%|███████▍ | 7436/10000 [00:05<00:01, 1359.42it/s]
76%|███████▌ | 7573/10000 [00:05<00:01, 1359.31it/s]
77%|███████▋ | 7714/10000 [00:05<00:01, 1373.18it/s]
79%|███████▊ | 7852/10000 [00:05<00:01, 1300.66it/s]
80%|███████▉ | 7988/10000 [00:05<00:01, 1315.77it/s]
81%|████████ | 8124/10000 [00:05<00:01, 1328.00it/s]
83%|████████▎ | 8261/10000 [00:06<00:01, 1338.80it/s]
84%|████████▍ | 8399/10000 [00:06<00:01, 1350.31it/s]
85%|████████▌ | 8535/10000 [00:06<00:01, 1319.57it/s]
87%|████████▋ | 8668/10000 [00:06<00:01, 1290.57it/s]
88%|████████▊ | 8798/10000 [00:06<00:00, 1285.44it/s]
89%|████████▉ | 8927/10000 [00:06<00:00, 1284.70it/s]
91%|█████████ | 9065/10000 [00:06<00:00, 1311.50it/s]
92%|█████████▏| 9197/10000 [00:06<00:00, 1309.87it/s]
93%|█████████▎| 9329/10000 [00:06<00:00, 1301.96it/s]
95%|█████████▍| 9462/10000 [00:07<00:00, 1309.50it/s]
96%|█████████▌| 9595/10000 [00:07<00:00, 1313.54it/s]
97%|█████████▋| 9730/10000 [00:07<00:00, 1322.07it/s]
99%|█████████▊| 9865/10000 [00:07<00:00, 1327.92it/s]
100%|██████████| 10000/10000 [00:07<00:00, 1308.23it/s]
Define new variables
Defining new variables...: 0%| | 0/10 [00:00<?, ?it/s]
Defining new variables...: 20%|██ | 2/10 [00:00<00:00, 17.76it/s]
Defining new variables...: 40%|████ | 4/10 [00:00<00:00, 18.97it/s]
Defining new variables...: 70%|███████ | 7/10 [00:00<00:00, 19.77it/s]
Defining new variables...: 100%|██████████| 10/10 [00:00<00:00, 19.99it/s]
Defining new variables...: 100%|██████████| 10/10 [00:00<00:00, 19.69it/s]
File logit_asian_10_alt.dat has been created.
logprob = the_model_generation.get_logit()
the_biogeme = BIOGEME(biogeme_database, logprob)
the_biogeme.modelName = MODEL_NAME
Default values of the Biogeme parameters are used.
File biogeme.toml has been created
/Users/bierlair/MyFiles/github/biogeme/docs/source/examples/sampling/plot_b01logit.py:84: DeprecationWarning: 'modelName' is deprecated. Please use 'model_name' instead.
the_biogeme.modelName = MODEL_NAME
Calculate the null log likelihood for reporting.
the_biogeme.calculate_null_loglikelihood({i: 1 for i in range(SAMPLE_SIZE)})
-23025.850929940458
Estimate the parameters
results = the_biogeme.estimate(recycle=False)
*** Initial values of the parameters are obtained from the file __logit_asian_10_alt.iter
Cannot read file __logit_asian_10_alt.iter. Statement is ignored.
Starting values for the algorithm: {}
As the model is not too complex, we activate the calculation of second derivatives. To change this behavior, modify the algorithm to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter. beta_rating beta_price beta_chinese beta_japanese beta_korean beta_indian beta_french beta_mexican beta_lebanese beta_ethiopian beta_log_dist Function Relgrad Radius Rho
0 0.45 -0.39 0.023 1 0.42 0.36 0.097 0.89 -0.044 -0.071 -0.68 1.9e+04 0.066 10 1 ++
1 0.7 -0.4 0.58 1.2 0.68 0.89 0.57 1.2 0.63 0.44 -0.6 1.8e+04 0.01 1e+02 1.1 ++
2 0.75 -0.41 0.61 1.2 0.71 0.94 0.61 1.2 0.66 0.47 -0.6 1.8e+04 0.00033 1e+03 1 ++
3 0.75 -0.41 0.61 1.2 0.71 0.94 0.61 1.2 0.66 0.47 -0.6 1.8e+04 3.8e-07 1e+03 1 ++
Optimization algorithm has converged.
Relative gradient: 3.8171102916004234e-07
Cause of termination: Relative gradient = 3.8e-07 <= 6.1e-06
Number of function evaluations: 13
Number of gradient evaluations: 9
Number of hessian evaluations: 4
Algorithm: Newton with trust region for simple bound constraints
Number of iterations: 4
Proportion of Hessian calculation: 4/4 = 100.0%
Optimization time: 0:00:03.529146
Calculate second derivatives and BHHH
File logit_asian_10_alt.html has been generated.
File logit_asian_10_alt.yaml has been generated.
print(results.short_summary())
Results for model logit_asian_10_alt
Nbr of parameters: 11
Sample size: 10000
Excluded data: 0
Null log likelihood: -23025.85
Final log likelihood: -18389.31
Likelihood ratio test (null): 9273.078
Rho square (null): 0.201
Rho bar square (null): 0.201
Akaike Information Criterion: 36800.62
Bayesian Information Criterion: 36879.94
estimated_parameters = get_pandas_estimated_parameters(estimation_results=results)
display(estimated_parameters)
Name Value Robust std err. Robust t-stat. Robust p-value
0 beta_rating 0.747405 0.015277 48.922826 0.0
1 beta_price -0.406566 0.012767 -31.843948 0.0
2 beta_chinese 0.612019 0.050248 12.180081 0.0
3 beta_japanese 1.187620 0.046358 25.618523 0.0
4 beta_korean 0.710032 0.042348 16.766548 0.0
5 beta_indian 0.937761 0.043111 21.752391 0.0
6 beta_french 0.611551 0.061565 9.933479 0.0
7 beta_mexican 1.223355 0.036539 33.480934 0.0
8 beta_lebanese 0.657816 0.062525 10.520805 0.0
9 beta_ethiopian 0.472634 0.050153 9.423939 0.0
10 beta_log_dist -0.601792 0.015140 -39.748700 0.0
df, msg = compare(estimated_parameters)
print(df)
Name True Value Estimated Value T-Test
0 beta_rating 0.75 0.747405 0.169838
1 beta_price -0.40 -0.406566 0.514300
2 beta_chinese 0.75 0.612019 2.746018
3 beta_japanese 1.25 1.187620 1.345622
4 beta_korean 0.75 0.710032 0.943790
5 beta_indian 1.00 0.937761 1.443713
6 beta_french 0.75 0.611551 2.248835
7 beta_mexican 1.25 1.223355 0.729218
8 beta_lebanese 0.75 0.657816 1.474341
9 beta_ethiopian 0.50 0.472634 0.545648
10 beta_log_dist -0.60 -0.601792 0.118391
print(msg)
Parameters not estimated: ['mu_asian', 'mu_downtown']
Total running time of the script: (0 minutes 14.095 seconds)