Logit

Estimation of a logit model using sampling of alternatives.

author:

Michel Bierlaire

date:

Wed Nov 1 17:39:47 2023

import pandas as pd
from biogeme.sampling_of_alternatives import (
    SamplingContext,
    ChoiceSetsGeneration,
    GenerateModel,
    generate_segment_size,
)
import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from compare import compare
from specification_sampling import V, combined_variables
from alternatives import (
    alternatives,
    ID_COLUMN,
    partitions,
)
Number of asian restaurants: 33
logger = blog.get_screen_logger(level=blog.INFO)

The data file contains several columns associated with synthetic choices. Here we arbitrarily select logit_4.

CHOICE_COLUMN = 'logit_4'
SAMPLE_SIZE = 10
PARTITION = 'asian'
MODEL_NAME = f'logit_{PARTITION}_{SAMPLE_SIZE}_alt'
FILE_NAME = f'{MODEL_NAME}.dat'
OBS_FILE = 'obs_choice.dat'
the_partition = partitions.get(PARTITION)
if the_partition is None:
    raise ValueError(f'Unknown partition: {PARTITION}')
segment_sizes = generate_segment_size(SAMPLE_SIZE, the_partition.number_of_segments())
observations = pd.read_csv(OBS_FILE)
context = SamplingContext(
    the_partition=the_partition,
    sample_sizes=segment_sizes,
    individuals=observations,
    choice_column=CHOICE_COLUMN,
    alternatives=alternatives,
    id_column=ID_COLUMN,
    biogeme_file_name=FILE_NAME,
    utility_function=V,
    combined_variables=combined_variables,
)
logger.info(context.reporting())
Size of the choice set: 100
Main partition: 2 segment(s) of size 33, 67
Main sample: 10: 5/33, 5/67
the_data_generation = ChoiceSetsGeneration(context=context)
the_model_generation = GenerateModel(context=context)
biogeme_database = the_data_generation.sample_and_merge(recycle=False)
Generating 10 alternatives for 10000 observations

  0%|          | 0/10000 [00:00<?, ?it/s]
  1%|▏         | 133/10000 [00:00<00:07, 1325.02it/s]
  3%|▎         | 273/10000 [00:00<00:07, 1367.65it/s]
  4%|▍         | 415/10000 [00:00<00:06, 1391.39it/s]
  6%|▌         | 561/10000 [00:00<00:06, 1418.21it/s]
  7%|▋         | 705/10000 [00:00<00:06, 1424.02it/s]
  8%|▊         | 850/10000 [00:00<00:06, 1431.95it/s]
 10%|▉         | 995/10000 [00:00<00:06, 1434.86it/s]
 11%|█▏        | 1141/10000 [00:00<00:06, 1440.42it/s]
 13%|█▎        | 1287/10000 [00:00<00:06, 1444.24it/s]
 14%|█▍        | 1434/10000 [00:01<00:05, 1449.63it/s]
 16%|█▌        | 1579/10000 [00:01<00:05, 1447.92it/s]
 17%|█▋        | 1725/10000 [00:01<00:05, 1449.75it/s]
 19%|█▊        | 1872/10000 [00:01<00:05, 1454.61it/s]
 20%|██        | 2019/10000 [00:01<00:05, 1457.68it/s]
 22%|██▏       | 2165/10000 [00:01<00:05, 1438.65it/s]
 23%|██▎       | 2309/10000 [00:01<00:05, 1406.30it/s]
 24%|██▍       | 2450/10000 [00:01<00:05, 1399.83it/s]
 26%|██▌       | 2591/10000 [00:01<00:05, 1390.57it/s]
 27%|██▋       | 2731/10000 [00:01<00:05, 1389.14it/s]
 29%|██▊       | 2872/10000 [00:02<00:05, 1393.76it/s]
 30%|███       | 3012/10000 [00:02<00:05, 1383.84it/s]
 32%|███▏      | 3151/10000 [00:02<00:04, 1385.18it/s]
 33%|███▎      | 3290/10000 [00:02<00:04, 1375.19it/s]
 34%|███▍      | 3428/10000 [00:02<00:04, 1372.56it/s]
 36%|███▌      | 3566/10000 [00:02<00:04, 1372.29it/s]
 37%|███▋      | 3705/10000 [00:02<00:04, 1375.80it/s]
 38%|███▊      | 3844/10000 [00:02<00:04, 1379.30it/s]
 40%|███▉      | 3983/10000 [00:02<00:04, 1380.56it/s]
 41%|████      | 4122/10000 [00:02<00:04, 1377.72it/s]
 43%|████▎     | 4260/10000 [00:03<00:04, 1375.19it/s]
 44%|████▍     | 4399/10000 [00:03<00:04, 1377.73it/s]
 45%|████▌     | 4537/10000 [00:03<00:03, 1374.31it/s]
 47%|████▋     | 4676/10000 [00:03<00:03, 1377.29it/s]
 48%|████▊     | 4814/10000 [00:03<00:03, 1377.04it/s]
 50%|████▉     | 4952/10000 [00:03<00:03, 1368.55it/s]
 51%|█████     | 5090/10000 [00:03<00:03, 1370.71it/s]
 52%|█████▏    | 5228/10000 [00:03<00:03, 1365.39it/s]
 54%|█████▎    | 5365/10000 [00:03<00:03, 1362.10it/s]
 55%|█████▌    | 5502/10000 [00:03<00:03, 1359.52it/s]
 56%|█████▋    | 5638/10000 [00:04<00:03, 1356.92it/s]
 58%|█████▊    | 5775/10000 [00:04<00:03, 1358.17it/s]
 59%|█████▉    | 5911/10000 [00:04<00:03, 1356.86it/s]
 60%|██████    | 6047/10000 [00:04<00:02, 1357.19it/s]
 62%|██████▏   | 6183/10000 [00:04<00:02, 1354.27it/s]
 63%|██████▎   | 6320/10000 [00:04<00:02, 1356.64it/s]
 65%|██████▍   | 6458/10000 [00:04<00:02, 1360.86it/s]
 66%|██████▌   | 6595/10000 [00:04<00:02, 1355.89it/s]
 67%|██████▋   | 6731/10000 [00:04<00:02, 1350.63it/s]
 69%|██████▊   | 6870/10000 [00:04<00:02, 1360.48it/s]
 70%|███████   | 7007/10000 [00:05<00:02, 1355.43it/s]
 71%|███████▏  | 7143/10000 [00:05<00:02, 1350.47it/s]
 73%|███████▎  | 7280/10000 [00:05<00:02, 1356.24it/s]
 74%|███████▍  | 7416/10000 [00:05<00:01, 1344.12it/s]
 76%|███████▌  | 7551/10000 [00:05<00:01, 1341.40it/s]
 77%|███████▋  | 7687/10000 [00:05<00:01, 1345.57it/s]
 78%|███████▊  | 7822/10000 [00:05<00:01, 1343.59it/s]
 80%|███████▉  | 7960/10000 [00:05<00:01, 1354.25it/s]
 81%|████████  | 8096/10000 [00:05<00:01, 1351.67it/s]
 82%|████████▏ | 8232/10000 [00:05<00:01, 1350.14it/s]
 84%|████████▎ | 8369/10000 [00:06<00:01, 1354.40it/s]
 85%|████████▌ | 8505/10000 [00:06<00:01, 1353.60it/s]
 86%|████████▋ | 8641/10000 [00:06<00:01, 1352.96it/s]
 88%|████████▊ | 8777/10000 [00:06<00:00, 1350.07it/s]
 89%|████████▉ | 8913/10000 [00:06<00:00, 1350.64it/s]
 91%|█████████ | 9053/10000 [00:06<00:00, 1362.44it/s]
 92%|█████████▏| 9190/10000 [00:06<00:00, 1354.36it/s]
 93%|█████████▎| 9326/10000 [00:06<00:00, 1347.15it/s]
 95%|█████████▍| 9463/10000 [00:06<00:00, 1353.14it/s]
 96%|█████████▌| 9601/10000 [00:06<00:00, 1359.32it/s]
 97%|█████████▋| 9738/10000 [00:07<00:00, 1359.82it/s]
 99%|█████████▉| 9878/10000 [00:07<00:00, 1370.32it/s]
100%|██████████| 10000/10000 [00:07<00:00, 1334.90it/s]
Define new variables

Defining new variables...:   0%|          | 0/10 [00:00<?, ?it/s]
Defining new variables...:  20%|██        | 2/10 [00:00<00:00, 14.35it/s]
Defining new variables...:  40%|████      | 4/10 [00:00<00:00, 14.00it/s]
Defining new variables...:  60%|██████    | 6/10 [00:00<00:00, 13.78it/s]
Defining new variables...:  80%|████████  | 8/10 [00:00<00:00, 13.53it/s]
Defining new variables...: 100%|██████████| 10/10 [00:00<00:00, 13.36it/s]
Defining new variables...: 100%|██████████| 10/10 [00:00<00:00, 13.55it/s]
File logit_asian_10_alt.dat has been created.
logprob = the_model_generation.get_logit()
the_biogeme = bio.BIOGEME(biogeme_database, logprob)
the_biogeme.modelName = MODEL_NAME
Default values of the Biogeme parameters are used.
File biogeme.toml has been created

Calculate the null log likelihood for reporting.

the_biogeme.calculate_null_loglikelihood({i: 1 for i in range(SAMPLE_SIZE)})
np.float64(-23025.850929942502)

Estimate the parameters

results = the_biogeme.estimate(recycle=False)
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __logit_asian_10_alt.iter
Cannot read file __logit_asian_10_alt.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.    beta_chinese  beta_ethiopian     beta_french     beta_indian   beta_japanese     beta_korean   beta_lebanese   beta_log_dist    beta_mexican      beta_price     beta_rating     Function    Relgrad   Radius      Rho
    0            0.04           -0.08            0.13            0.36               1            0.44          -0.046           -0.67            0.84            -0.4            0.46      1.9e+04      0.065       10        1   ++
    1            0.58             0.4             0.6            0.87             1.1            0.69            0.64           -0.58             1.1            -0.4             0.7      1.8e+04     0.0099    1e+02      1.1   ++
    2            0.61            0.44            0.64            0.92             1.2            0.71            0.65           -0.59             1.2           -0.41            0.75      1.8e+04    0.00032    1e+03        1   ++
    3            0.61            0.44            0.64            0.92             1.2            0.71            0.65           -0.59             1.2           -0.41            0.75      1.8e+04    3.6e-07    1e+03        1   ++
Results saved in file logit_asian_10_alt.html
Results saved in file logit_asian_10_alt.pickle
print(results.short_summary())
Results for model logit_asian_10_alt
Nbr of parameters:              11
Sample size:                    10000
Excluded data:                  0
Null log likelihood:            -23025.85
Final log likelihood:           -18451.88
Likelihood ratio test (null):           9147.951
Rho square (null):                      0.199
Rho bar square (null):                  0.198
Akaike Information Criterion:   36925.75
Bayesian Information Criterion: 37005.06
estimated_parameters = results.get_estimated_parameters()
estimated_parameters
Value Rob. Std err Rob. t-test Rob. p-value
beta_chinese 0.607892 0.050236 12.100600 0.0
beta_ethiopian 0.436107 0.050743 8.594396 0.0
beta_french 0.637537 0.061926 10.295114 0.0
beta_indian 0.917637 0.043002 21.339266 0.0
beta_japanese 1.152123 0.046626 24.709975 0.0
beta_korean 0.713678 0.042509 16.788924 0.0
beta_lebanese 0.655038 0.062660 10.453805 0.0
beta_log_dist -0.586987 0.014885 -39.435942 0.0
beta_mexican 1.181418 0.036514 32.355573 0.0
beta_price -0.408117 0.012796 -31.894969 0.0
beta_rating 0.751644 0.015430 48.714276 0.0


df, msg = compare(estimated_parameters)
print(df)
              Name  True Value  Estimated Value    T-Test
0      beta_rating        0.75         0.751644 -0.106559
1       beta_price       -0.40        -0.408117  0.634324
2     beta_chinese        0.75         0.607892  2.828786
3    beta_japanese        1.25         1.152123  2.099197
4      beta_korean        0.75         0.713678  0.854456
5      beta_indian        1.00         0.917637  1.915316
6      beta_french        0.75         0.637537  1.816087
7     beta_mexican        1.25         1.181418  1.878265
8    beta_lebanese        0.75         0.655038  1.515505
9   beta_ethiopian        0.50         0.436107  1.259136
10   beta_log_dist       -0.60        -0.586987 -0.874276
print(msg)
Parameters not estimated: ['mu_asian', 'mu_downtown']

Total running time of the script: (0 minutes 11.010 seconds)

Gallery generated by Sphinx-Gallery