Logit

Estimation of a logit model using sampling of alternatives.

Michel Bierlaire Fri Jul 25 2025, 17:36:23

import pandas as pd
from IPython.core.display_functions import display

import biogeme.biogeme_logging as blog
from alternatives import ID_COLUMN, alternatives, partitions
from biogeme.biogeme import BIOGEME
from biogeme.results_processing import get_pandas_estimated_parameters
from biogeme.sampling_of_alternatives import (
    ChoiceSetsGeneration,
    GenerateModel,
    SamplingContext,
    generate_segment_size,
)
from compare import compare
from specification_sampling import V, combined_variables
    ID  rating  price  ...   rest_lon    distance  downtown
0    0       1      4  ...  42.220972   71.735518       1.0
1    1       2      2  ...  50.549434  106.267205       0.0
2    2       3      3  ...  97.830520  136.298409       0.0
3    3       4      1  ...  69.152206   85.941147       0.0
4    4       4      3  ...  89.145620   96.773021       0.0
..  ..     ...    ...  ...        ...         ...       ...
95  95       4      3  ...   9.511387   84.166441       0.0
96  96       1      1  ...  92.144641   95.601366       0.0
97  97       4      2  ...  27.657518   30.440555       1.0
98  98       4      4  ...  32.303213   45.027143       1.0
99  99       4      1  ...  13.672495   25.703295       1.0

[100 rows x 16 columns]
Number of asian restaurants: 33
logger = blog.get_screen_logger(level=blog.INFO)

The data file contains several columns associated with synthetic choices. Here we arbitrarily select logit_4.

CHOICE_COLUMN = 'logit_4'
SAMPLE_SIZE = 10
PARTITION = 'asian'
MODEL_NAME = f'logit_{PARTITION}_{SAMPLE_SIZE}_alt'
FILE_NAME = f'{MODEL_NAME}.dat'
OBS_FILE = 'obs_choice.dat'
the_partition = partitions.get(PARTITION)
if the_partition is None:
    raise ValueError(f'Unknown partition: {PARTITION}')
segment_sizes = generate_segment_size(SAMPLE_SIZE, the_partition.number_of_segments())
observations = pd.read_csv(OBS_FILE)
context = SamplingContext(
    the_partition=the_partition,
    sample_sizes=segment_sizes,
    individuals=observations,
    choice_column=CHOICE_COLUMN,
    alternatives=alternatives,
    id_column=ID_COLUMN,
    biogeme_file_name=FILE_NAME,
    utility_function=V,
    combined_variables=combined_variables,
)
logger.info(context.reporting())
Size of the choice set: 100
Main partition: 2 segment(s) of size 33, 67
Main sample: 10: 5/33, 5/67
the_data_generation = ChoiceSetsGeneration(context=context)
the_model_generation = GenerateModel(context=context)
biogeme_database = the_data_generation.sample_and_merge(recycle=False)
Generating 10 + 0 alternatives for 10000 observations

  0%|          | 0/10000 [00:00<?, ?it/s]
  1%|▏         | 131/10000 [00:00<00:07, 1301.00it/s]
  3%|▎         | 266/10000 [00:00<00:07, 1325.92it/s]
  4%|▍         | 401/10000 [00:00<00:07, 1336.43it/s]
  5%|▌         | 538/10000 [00:00<00:07, 1347.94it/s]
  7%|▋         | 675/10000 [00:00<00:06, 1355.37it/s]
  8%|▊         | 811/10000 [00:00<00:06, 1352.79it/s]
  9%|▉         | 947/10000 [00:00<00:06, 1346.40it/s]
 11%|█         | 1082/10000 [00:00<00:06, 1335.86it/s]
 12%|█▏        | 1220/10000 [00:00<00:06, 1348.67it/s]
 14%|█▎        | 1358/10000 [00:01<00:06, 1356.92it/s]
 15%|█▍        | 1494/10000 [00:01<00:06, 1356.97it/s]
 16%|█▋        | 1633/10000 [00:01<00:06, 1364.28it/s]
 18%|█▊        | 1773/10000 [00:01<00:05, 1374.08it/s]
 19%|█▉        | 1915/10000 [00:01<00:05, 1385.30it/s]
 21%|██        | 2054/10000 [00:01<00:05, 1382.53it/s]
 22%|██▏       | 2193/10000 [00:01<00:05, 1374.64it/s]
 23%|██▎       | 2331/10000 [00:01<00:05, 1368.63it/s]
 25%|██▍       | 2468/10000 [00:01<00:05, 1353.81it/s]
 26%|██▌       | 2604/10000 [00:01<00:05, 1352.18it/s]
 27%|██▋       | 2744/10000 [00:02<00:05, 1364.29it/s]
 29%|██▉       | 2881/10000 [00:02<00:05, 1361.53it/s]
 30%|███       | 3018/10000 [00:02<00:05, 1354.04it/s]
 32%|███▏      | 3154/10000 [00:02<00:05, 1346.68it/s]
 33%|███▎      | 3289/10000 [00:02<00:04, 1344.16it/s]
 34%|███▍      | 3430/10000 [00:02<00:04, 1362.04it/s]
 36%|███▌      | 3567/10000 [00:02<00:04, 1359.02it/s]
 37%|███▋      | 3704/10000 [00:02<00:04, 1359.82it/s]
 38%|███▊      | 3840/10000 [00:02<00:04, 1357.21it/s]
 40%|███▉      | 3978/10000 [00:02<00:04, 1363.35it/s]
 41%|████      | 4118/10000 [00:03<00:04, 1371.69it/s]
 43%|████▎     | 4256/10000 [00:03<00:04, 1368.92it/s]
 44%|████▍     | 4393/10000 [00:03<00:04, 1366.06it/s]
 45%|████▌     | 4530/10000 [00:03<00:04, 1354.35it/s]
 47%|████▋     | 4667/10000 [00:03<00:03, 1356.87it/s]
 48%|████▊     | 4806/10000 [00:03<00:03, 1366.41it/s]
 49%|████▉     | 4944/10000 [00:03<00:03, 1367.98it/s]
 51%|█████     | 5081/10000 [00:03<00:03, 1365.24it/s]
 52%|█████▏    | 5218/10000 [00:03<00:03, 1356.83it/s]
 54%|█████▎    | 5354/10000 [00:03<00:03, 1353.83it/s]
 55%|█████▍    | 5493/10000 [00:04<00:03, 1364.50it/s]
 56%|█████▋    | 5630/10000 [00:04<00:03, 1366.11it/s]
 58%|█████▊    | 5767/10000 [00:04<00:03, 1361.97it/s]
 59%|█████▉    | 5904/10000 [00:04<00:03, 1363.64it/s]
 60%|██████    | 6043/10000 [00:04<00:02, 1370.12it/s]
 62%|██████▏   | 6183/10000 [00:04<00:02, 1376.36it/s]
 63%|██████▎   | 6324/10000 [00:04<00:02, 1384.49it/s]
 65%|██████▍   | 6464/10000 [00:04<00:02, 1388.30it/s]
 66%|██████▌   | 6603/10000 [00:04<00:02, 1369.67it/s]
 67%|██████▋   | 6741/10000 [00:04<00:02, 1366.02it/s]
 69%|██████▉   | 6878/10000 [00:05<00:02, 1362.48it/s]
 70%|███████   | 7020/10000 [00:05<00:02, 1378.25it/s]
 72%|███████▏  | 7158/10000 [00:05<00:02, 1375.58it/s]
 73%|███████▎  | 7296/10000 [00:05<00:02, 1346.10it/s]
 74%|███████▍  | 7436/10000 [00:05<00:01, 1359.42it/s]
 76%|███████▌  | 7573/10000 [00:05<00:01, 1359.31it/s]
 77%|███████▋  | 7714/10000 [00:05<00:01, 1373.18it/s]
 79%|███████▊  | 7852/10000 [00:05<00:01, 1300.66it/s]
 80%|███████▉  | 7988/10000 [00:05<00:01, 1315.77it/s]
 81%|████████  | 8124/10000 [00:05<00:01, 1328.00it/s]
 83%|████████▎ | 8261/10000 [00:06<00:01, 1338.80it/s]
 84%|████████▍ | 8399/10000 [00:06<00:01, 1350.31it/s]
 85%|████████▌ | 8535/10000 [00:06<00:01, 1319.57it/s]
 87%|████████▋ | 8668/10000 [00:06<00:01, 1290.57it/s]
 88%|████████▊ | 8798/10000 [00:06<00:00, 1285.44it/s]
 89%|████████▉ | 8927/10000 [00:06<00:00, 1284.70it/s]
 91%|█████████ | 9065/10000 [00:06<00:00, 1311.50it/s]
 92%|█████████▏| 9197/10000 [00:06<00:00, 1309.87it/s]
 93%|█████████▎| 9329/10000 [00:06<00:00, 1301.96it/s]
 95%|█████████▍| 9462/10000 [00:07<00:00, 1309.50it/s]
 96%|█████████▌| 9595/10000 [00:07<00:00, 1313.54it/s]
 97%|█████████▋| 9730/10000 [00:07<00:00, 1322.07it/s]
 99%|█████████▊| 9865/10000 [00:07<00:00, 1327.92it/s]
100%|██████████| 10000/10000 [00:07<00:00, 1308.23it/s]
Define new variables

Defining new variables...:   0%|          | 0/10 [00:00<?, ?it/s]
Defining new variables...:  20%|██        | 2/10 [00:00<00:00, 17.76it/s]
Defining new variables...:  40%|████      | 4/10 [00:00<00:00, 18.97it/s]
Defining new variables...:  70%|███████   | 7/10 [00:00<00:00, 19.77it/s]
Defining new variables...: 100%|██████████| 10/10 [00:00<00:00, 19.99it/s]
Defining new variables...: 100%|██████████| 10/10 [00:00<00:00, 19.69it/s]
File logit_asian_10_alt.dat has been created.
logprob = the_model_generation.get_logit()
the_biogeme = BIOGEME(biogeme_database, logprob)
the_biogeme.modelName = MODEL_NAME
Default values of the Biogeme parameters are used.
File biogeme.toml has been created
/Users/bierlair/MyFiles/github/biogeme/docs/source/examples/sampling/plot_b01logit.py:84: DeprecationWarning: 'modelName' is deprecated. Please use 'model_name' instead.
  the_biogeme.modelName = MODEL_NAME

Calculate the null log likelihood for reporting.

the_biogeme.calculate_null_loglikelihood({i: 1 for i in range(SAMPLE_SIZE)})
-23025.850929940458

Estimate the parameters

results = the_biogeme.estimate(recycle=False)
*** Initial values of the parameters are obtained from the file __logit_asian_10_alt.iter
Cannot read file __logit_asian_10_alt.iter. Statement is ignored.
Starting values for the algorithm: {}
As the model is not too complex, we activate the calculation of second derivatives. To change this behavior, modify the algorithm to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.     beta_rating      beta_price    beta_chinese   beta_japanese     beta_korean     beta_indian     beta_french    beta_mexican   beta_lebanese  beta_ethiopian   beta_log_dist     Function    Relgrad   Radius      Rho
    0            0.45           -0.39           0.023               1            0.42            0.36           0.097            0.89          -0.044          -0.071           -0.68      1.9e+04      0.066       10        1   ++
    1             0.7            -0.4            0.58             1.2            0.68            0.89            0.57             1.2            0.63            0.44            -0.6      1.8e+04       0.01    1e+02      1.1   ++
    2            0.75           -0.41            0.61             1.2            0.71            0.94            0.61             1.2            0.66            0.47            -0.6      1.8e+04    0.00033    1e+03        1   ++
    3            0.75           -0.41            0.61             1.2            0.71            0.94            0.61             1.2            0.66            0.47            -0.6      1.8e+04    3.8e-07    1e+03        1   ++
Optimization algorithm has converged.
Relative gradient: 3.8171102916004234e-07
Cause of termination: Relative gradient = 3.8e-07 <= 6.1e-06
Number of function evaluations: 13
Number of gradient evaluations: 9
Number of hessian evaluations: 4
Algorithm: Newton with trust region for simple bound constraints
Number of iterations: 4
Proportion of Hessian calculation: 4/4 = 100.0%
Optimization time: 0:00:03.529146
Calculate second derivatives and BHHH
File logit_asian_10_alt.html has been generated.
File logit_asian_10_alt.yaml has been generated.
print(results.short_summary())
Results for model logit_asian_10_alt
Nbr of parameters:              11
Sample size:                    10000
Excluded data:                  0
Null log likelihood:            -23025.85
Final log likelihood:           -18389.31
Likelihood ratio test (null):           9273.078
Rho square (null):                      0.201
Rho bar square (null):                  0.201
Akaike Information Criterion:   36800.62
Bayesian Information Criterion: 36879.94
estimated_parameters = get_pandas_estimated_parameters(estimation_results=results)
display(estimated_parameters)
              Name     Value  Robust std err.  Robust t-stat.  Robust p-value
0      beta_rating  0.747405         0.015277       48.922826             0.0
1       beta_price -0.406566         0.012767      -31.843948             0.0
2     beta_chinese  0.612019         0.050248       12.180081             0.0
3    beta_japanese  1.187620         0.046358       25.618523             0.0
4      beta_korean  0.710032         0.042348       16.766548             0.0
5      beta_indian  0.937761         0.043111       21.752391             0.0
6      beta_french  0.611551         0.061565        9.933479             0.0
7     beta_mexican  1.223355         0.036539       33.480934             0.0
8    beta_lebanese  0.657816         0.062525       10.520805             0.0
9   beta_ethiopian  0.472634         0.050153        9.423939             0.0
10   beta_log_dist -0.601792         0.015140      -39.748700             0.0
df, msg = compare(estimated_parameters)
print(df)
              Name  True Value  Estimated Value    T-Test
0      beta_rating        0.75         0.747405  0.169838
1       beta_price       -0.40        -0.406566  0.514300
2     beta_chinese        0.75         0.612019  2.746018
3    beta_japanese        1.25         1.187620  1.345622
4      beta_korean        0.75         0.710032  0.943790
5      beta_indian        1.00         0.937761  1.443713
6      beta_french        0.75         0.611551  2.248835
7     beta_mexican        1.25         1.223355  0.729218
8    beta_lebanese        0.75         0.657816  1.474341
9   beta_ethiopian        0.50         0.472634  0.545648
10   beta_log_dist       -0.60        -0.601792  0.118391
print(msg)
Parameters not estimated: ['mu_asian', 'mu_downtown']

Total running time of the script: (0 minutes 14.095 seconds)

Gallery generated by Sphinx-Gallery