Note
Go to the end to download the full example code.
Logit
Estimation of a logit model using sampling of alternatives.
- author:
Michel Bierlaire
- date:
Wed Nov 1 17:39:47 2023
import pandas as pd
from biogeme.sampling_of_alternatives import (
SamplingContext,
ChoiceSetsGeneration,
GenerateModel,
generate_segment_size,
)
import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from compare import compare
from specification_sampling import V, combined_variables
from alternatives import (
alternatives,
ID_COLUMN,
partitions,
)
Number of asian restaurants: 33
logger = blog.get_screen_logger(level=blog.INFO)
The data file contains several columns associated with synthetic choices. Here we arbitrarily select logit_4.
CHOICE_COLUMN = 'logit_4'
SAMPLE_SIZE = 10
PARTITION = 'asian'
MODEL_NAME = f'logit_{PARTITION}_{SAMPLE_SIZE}_alt'
FILE_NAME = f'{MODEL_NAME}.dat'
OBS_FILE = 'obs_choice.dat'
the_partition = partitions.get(PARTITION)
if the_partition is None:
raise ValueError(f'Unknown partition: {PARTITION}')
segment_sizes = generate_segment_size(SAMPLE_SIZE, the_partition.number_of_segments())
observations = pd.read_csv(OBS_FILE)
context = SamplingContext(
the_partition=the_partition,
sample_sizes=segment_sizes,
individuals=observations,
choice_column=CHOICE_COLUMN,
alternatives=alternatives,
id_column=ID_COLUMN,
biogeme_file_name=FILE_NAME,
utility_function=V,
combined_variables=combined_variables,
)
logger.info(context.reporting())
Size of the choice set: 100
Main partition: 2 segment(s) of size 33, 67
Main sample: 10: 5/33, 5/67
the_data_generation = ChoiceSetsGeneration(context=context)
the_model_generation = GenerateModel(context=context)
biogeme_database = the_data_generation.sample_and_merge(recycle=False)
Generating 10 alternatives for 10000 observations
0%| | 0/10000 [00:00<?, ?it/s]
1%|▏ | 133/10000 [00:00<00:07, 1325.02it/s]
3%|▎ | 273/10000 [00:00<00:07, 1367.65it/s]
4%|▍ | 415/10000 [00:00<00:06, 1391.39it/s]
6%|▌ | 561/10000 [00:00<00:06, 1418.21it/s]
7%|▋ | 705/10000 [00:00<00:06, 1424.02it/s]
8%|▊ | 850/10000 [00:00<00:06, 1431.95it/s]
10%|▉ | 995/10000 [00:00<00:06, 1434.86it/s]
11%|█▏ | 1141/10000 [00:00<00:06, 1440.42it/s]
13%|█▎ | 1287/10000 [00:00<00:06, 1444.24it/s]
14%|█▍ | 1434/10000 [00:01<00:05, 1449.63it/s]
16%|█▌ | 1579/10000 [00:01<00:05, 1447.92it/s]
17%|█▋ | 1725/10000 [00:01<00:05, 1449.75it/s]
19%|█▊ | 1872/10000 [00:01<00:05, 1454.61it/s]
20%|██ | 2019/10000 [00:01<00:05, 1457.68it/s]
22%|██▏ | 2165/10000 [00:01<00:05, 1438.65it/s]
23%|██▎ | 2309/10000 [00:01<00:05, 1406.30it/s]
24%|██▍ | 2450/10000 [00:01<00:05, 1399.83it/s]
26%|██▌ | 2591/10000 [00:01<00:05, 1390.57it/s]
27%|██▋ | 2731/10000 [00:01<00:05, 1389.14it/s]
29%|██▊ | 2872/10000 [00:02<00:05, 1393.76it/s]
30%|███ | 3012/10000 [00:02<00:05, 1383.84it/s]
32%|███▏ | 3151/10000 [00:02<00:04, 1385.18it/s]
33%|███▎ | 3290/10000 [00:02<00:04, 1375.19it/s]
34%|███▍ | 3428/10000 [00:02<00:04, 1372.56it/s]
36%|███▌ | 3566/10000 [00:02<00:04, 1372.29it/s]
37%|███▋ | 3705/10000 [00:02<00:04, 1375.80it/s]
38%|███▊ | 3844/10000 [00:02<00:04, 1379.30it/s]
40%|███▉ | 3983/10000 [00:02<00:04, 1380.56it/s]
41%|████ | 4122/10000 [00:02<00:04, 1377.72it/s]
43%|████▎ | 4260/10000 [00:03<00:04, 1375.19it/s]
44%|████▍ | 4399/10000 [00:03<00:04, 1377.73it/s]
45%|████▌ | 4537/10000 [00:03<00:03, 1374.31it/s]
47%|████▋ | 4676/10000 [00:03<00:03, 1377.29it/s]
48%|████▊ | 4814/10000 [00:03<00:03, 1377.04it/s]
50%|████▉ | 4952/10000 [00:03<00:03, 1368.55it/s]
51%|█████ | 5090/10000 [00:03<00:03, 1370.71it/s]
52%|█████▏ | 5228/10000 [00:03<00:03, 1365.39it/s]
54%|█████▎ | 5365/10000 [00:03<00:03, 1362.10it/s]
55%|█████▌ | 5502/10000 [00:03<00:03, 1359.52it/s]
56%|█████▋ | 5638/10000 [00:04<00:03, 1356.92it/s]
58%|█████▊ | 5775/10000 [00:04<00:03, 1358.17it/s]
59%|█████▉ | 5911/10000 [00:04<00:03, 1356.86it/s]
60%|██████ | 6047/10000 [00:04<00:02, 1357.19it/s]
62%|██████▏ | 6183/10000 [00:04<00:02, 1354.27it/s]
63%|██████▎ | 6320/10000 [00:04<00:02, 1356.64it/s]
65%|██████▍ | 6458/10000 [00:04<00:02, 1360.86it/s]
66%|██████▌ | 6595/10000 [00:04<00:02, 1355.89it/s]
67%|██████▋ | 6731/10000 [00:04<00:02, 1350.63it/s]
69%|██████▊ | 6870/10000 [00:04<00:02, 1360.48it/s]
70%|███████ | 7007/10000 [00:05<00:02, 1355.43it/s]
71%|███████▏ | 7143/10000 [00:05<00:02, 1350.47it/s]
73%|███████▎ | 7280/10000 [00:05<00:02, 1356.24it/s]
74%|███████▍ | 7416/10000 [00:05<00:01, 1344.12it/s]
76%|███████▌ | 7551/10000 [00:05<00:01, 1341.40it/s]
77%|███████▋ | 7687/10000 [00:05<00:01, 1345.57it/s]
78%|███████▊ | 7822/10000 [00:05<00:01, 1343.59it/s]
80%|███████▉ | 7960/10000 [00:05<00:01, 1354.25it/s]
81%|████████ | 8096/10000 [00:05<00:01, 1351.67it/s]
82%|████████▏ | 8232/10000 [00:05<00:01, 1350.14it/s]
84%|████████▎ | 8369/10000 [00:06<00:01, 1354.40it/s]
85%|████████▌ | 8505/10000 [00:06<00:01, 1353.60it/s]
86%|████████▋ | 8641/10000 [00:06<00:01, 1352.96it/s]
88%|████████▊ | 8777/10000 [00:06<00:00, 1350.07it/s]
89%|████████▉ | 8913/10000 [00:06<00:00, 1350.64it/s]
91%|█████████ | 9053/10000 [00:06<00:00, 1362.44it/s]
92%|█████████▏| 9190/10000 [00:06<00:00, 1354.36it/s]
93%|█████████▎| 9326/10000 [00:06<00:00, 1347.15it/s]
95%|█████████▍| 9463/10000 [00:06<00:00, 1353.14it/s]
96%|█████████▌| 9601/10000 [00:06<00:00, 1359.32it/s]
97%|█████████▋| 9738/10000 [00:07<00:00, 1359.82it/s]
99%|█████████▉| 9878/10000 [00:07<00:00, 1370.32it/s]
100%|██████████| 10000/10000 [00:07<00:00, 1334.90it/s]
Define new variables
Defining new variables...: 0%| | 0/10 [00:00<?, ?it/s]
Defining new variables...: 20%|██ | 2/10 [00:00<00:00, 14.35it/s]
Defining new variables...: 40%|████ | 4/10 [00:00<00:00, 14.00it/s]
Defining new variables...: 60%|██████ | 6/10 [00:00<00:00, 13.78it/s]
Defining new variables...: 80%|████████ | 8/10 [00:00<00:00, 13.53it/s]
Defining new variables...: 100%|██████████| 10/10 [00:00<00:00, 13.36it/s]
Defining new variables...: 100%|██████████| 10/10 [00:00<00:00, 13.55it/s]
File logit_asian_10_alt.dat has been created.
logprob = the_model_generation.get_logit()
the_biogeme = bio.BIOGEME(biogeme_database, logprob)
the_biogeme.modelName = MODEL_NAME
Default values of the Biogeme parameters are used.
File biogeme.toml has been created
Calculate the null log likelihood for reporting.
the_biogeme.calculate_null_loglikelihood({i: 1 for i in range(SAMPLE_SIZE)})
np.float64(-23025.850929942502)
Estimate the parameters
results = the_biogeme.estimate(recycle=False)
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __logit_asian_10_alt.iter
Cannot read file __logit_asian_10_alt.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter. beta_chinese beta_ethiopian beta_french beta_indian beta_japanese beta_korean beta_lebanese beta_log_dist beta_mexican beta_price beta_rating Function Relgrad Radius Rho
0 0.04 -0.08 0.13 0.36 1 0.44 -0.046 -0.67 0.84 -0.4 0.46 1.9e+04 0.065 10 1 ++
1 0.58 0.4 0.6 0.87 1.1 0.69 0.64 -0.58 1.1 -0.4 0.7 1.8e+04 0.0099 1e+02 1.1 ++
2 0.61 0.44 0.64 0.92 1.2 0.71 0.65 -0.59 1.2 -0.41 0.75 1.8e+04 0.00032 1e+03 1 ++
3 0.61 0.44 0.64 0.92 1.2 0.71 0.65 -0.59 1.2 -0.41 0.75 1.8e+04 3.6e-07 1e+03 1 ++
Results saved in file logit_asian_10_alt.html
Results saved in file logit_asian_10_alt.pickle
print(results.short_summary())
Results for model logit_asian_10_alt
Nbr of parameters: 11
Sample size: 10000
Excluded data: 0
Null log likelihood: -23025.85
Final log likelihood: -18451.88
Likelihood ratio test (null): 9147.951
Rho square (null): 0.199
Rho bar square (null): 0.198
Akaike Information Criterion: 36925.75
Bayesian Information Criterion: 37005.06
estimated_parameters = results.get_estimated_parameters()
estimated_parameters
df, msg = compare(estimated_parameters)
print(df)
Name True Value Estimated Value T-Test
0 beta_rating 0.75 0.751644 -0.106559
1 beta_price -0.40 -0.408117 0.634324
2 beta_chinese 0.75 0.607892 2.828786
3 beta_japanese 1.25 1.152123 2.099197
4 beta_korean 0.75 0.713678 0.854456
5 beta_indian 1.00 0.917637 1.915316
6 beta_french 0.75 0.637537 1.816087
7 beta_mexican 1.25 1.181418 1.878265
8 beta_lebanese 0.75 0.655038 1.515505
9 beta_ethiopian 0.50 0.436107 1.259136
10 beta_log_dist -0.60 -0.586987 -0.874276
print(msg)
Parameters not estimated: ['mu_asian', 'mu_downtown']
Total running time of the script: (0 minutes 11.010 seconds)