Logit
Estimation of a logit model using sampling of alternatives.
- author: Michel Bierlaire
- date: Wed Nov 1 17:39:47 2023
import pandas as pd
from biogeme.sampling_of_alternatives import (
    SamplingContext,
    ChoiceSetsGeneration,
    GenerateModel,
    generate_segment_size,
)
import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from compare import compare
from specification import V, combined_variables
from alternatives import (
    alternatives,
    ID_COLUMN,
    partitions,
)
Number of asian restaurants: 33
logger = blog.get_screen_logger(level=blog.INFO)
The data file contains several columns associated with synthetic choices. Here we arbitrarily select logit_4.
CHOICE_COLUMN = 'logit_4'
SAMPLE_SIZE = 10
PARTITION = 'asian'
MODEL_NAME = f'logit_{PARTITION}_{SAMPLE_SIZE}_alt'
FILE_NAME = f'{MODEL_NAME}.dat'
OBS_FILE = 'obs_choice.dat'
the_partition = partitions.get(PARTITION)
if the_partition is None:
    raise ValueError(f'Unknown partition: {PARTITION}')
segment_sizes = generate_segment_size(SAMPLE_SIZE, the_partition.number_of_segments())
observations = pd.read_csv(OBS_FILE)
context = SamplingContext(
    the_partition=the_partition,
    sample_sizes=segment_sizes,
    individuals=observations,
    choice_column=CHOICE_COLUMN,
    alternatives=alternatives,
    id_column=ID_COLUMN,
    biogeme_file_name=FILE_NAME,
    utility_function=V,
    combined_variables=combined_variables,
)
logger.info(context.reporting())
Size of the choice set: 100
Main partition: 2 segment(s) of size 33, 67
Main sample: 10: 5/33, 5/67
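The report above shows the 10 sampled alternatives split as 5 from the 33 Asian restaurants and 5 from the remaining 67. The sketch below illustrates one way such a stratified draw could be produced. It is not Biogeme's implementation: `split_evenly`, the segment IDs, and the fixed seed are hypothetical names and values chosen only for illustration.

```python
import random


def split_evenly(sample_size: int, n_segments: int) -> list[int]:
    """Split sample_size into n_segments near-equal integer parts (hypothetical helper)."""
    base, extra = divmod(sample_size, n_segments)
    # The first `extra` segments receive one additional draw.
    return [base + 1 if i < extra else base for i in range(n_segments)]


# Assumed IDs: 33 alternatives in the first segment, 67 in the second.
segments = [list(range(0, 33)), list(range(33, 100))]
sizes = split_evenly(10, len(segments))

# Draw without replacement inside each segment.
rng = random.Random(0)
sample = [alt for seg, k in zip(segments, sizes) for alt in rng.sample(seg, k)]

print(sizes, len(sample))  # [5, 5] 10
```

Sampling each segment separately guarantees that both groups are represented in every choice set, which a simple random draw over all 100 alternatives would not.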
the_data_generation = ChoiceSetsGeneration(context=context)
the_model_generation = GenerateModel(context=context)
biogeme_database = the_data_generation.sample_and_merge(recycle=False)
Generating 10 alternatives for 10000 observations
100%|██████████| 10000/10000 [00:12<00:00, 792.30it/s]
Define new variables
Defining new variables...: 100%|██████████| 10/10 [00:01<00:00, 9.77it/s]
File logit_asian_10_alt.dat has been created.
logprob = the_model_generation.get_logit()
the_biogeme = bio.BIOGEME(biogeme_database, logprob)
the_biogeme.modelName = MODEL_NAME
File biogeme.toml has been parsed.
Calculate the null log likelihood for reporting.
the_biogeme.calculateNullLoglikelihood({i: 1 for i in range(SAMPLE_SIZE)})
-23025.850929942502
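The value above can be checked by hand. Assuming the null model assigns equal probability to each of the 10 sampled alternatives, every observation contributes ln(1/10) to the log likelihood. The variable names below are illustrative and not part of the Biogeme API.

```python
import math

n_obs = 10_000  # number of observations, from the log above
n_alt = 10      # SAMPLE_SIZE: alternatives in each sampled choice set

# Equal-probability model: each observation contributes ln(1 / n_alt).
null_ll = n_obs * math.log(1.0 / n_alt)

print(round(null_ll, 2))  # -23025.85
```

The last digits differ slightly from the reported value because the library accumulates the sum observation by observation.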
Estimate the parameters
results = the_biogeme.estimate(recycle=False)
*** Initial values of the parameters are obtained from the file __logit_asian_10_alt.iter
Parameter values restored from __logit_asian_10_alt.iter
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter. beta_chinese beta_ethiopian beta_french beta_indian beta_japanese beta_korean beta_lebanese beta_log_dist beta_mexican beta_price beta_rating Function Relgrad Radius Rho
0 0.62 0.44 0.64 0.93 1.2 0.73 0.71 -0.6 1.2 -0.41 0.76 1.8e+04 1.9e-05 10 1 ++
1 0.62 0.44 0.64 0.93 1.2 0.73 0.71 -0.6 1.2 -0.41 0.76 1.8e+04 1.5e-09 10 1 ++
Results saved in file logit_asian_10_alt~00.html
Results saved in file logit_asian_10_alt~00.pickle
print(results.short_summary())
Results for model logit_asian_10_alt
Nbr of parameters: 11
Sample size: 10000
Excluded data: 0
Null log likelihood: -23025.85
Final log likelihood: -18419.15
Likelihood ratio test (null): 9213.409
Rho square (null): 0.2
Rho bar square (null): 0.2
Akaike Information Criterion: 36860.29
Bayesian Information Criterion: 36939.61
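The statistics in this summary can be recomputed from the two log likelihoods, the number of parameters, and the sample size. The sketch below assumes the usual textbook definitions of these quantities. Because it starts from the rounded values printed above, it agrees with the report only up to rounding.

```python
import math

null_ll = -23025.85   # Null log likelihood (rounded, from the summary)
final_ll = -18419.15  # Final log likelihood (rounded, from the summary)
k = 11                # Nbr of parameters
n = 10_000            # Sample size

lr = -2 * (null_ll - final_ll)           # likelihood ratio test
rho2 = 1 - final_ll / null_ll            # rho square
rho2_bar = 1 - (final_ll - k) / null_ll  # rho bar square
aic = 2 * k - 2 * final_ll               # Akaike Information Criterion
bic = k * math.log(n) - 2 * final_ll     # Bayesian Information Criterion

print(f'LR={lr:.2f} rho2={rho2:.3f} rho2_bar={rho2_bar:.3f} '
      f'AIC={aic:.2f} BIC={bic:.2f}')
```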
estimated_parameters = results.getEstimatedParameters()
estimated_parameters
df, msg = compare(estimated_parameters)
print(df)
              Name  True Value  Estimated Value    T-Test
0      beta_rating        0.75         0.759850 -0.636700
1       beta_price       -0.40        -0.405947  0.467069
2     beta_chinese        0.75         0.624533  2.480988
3    beta_japanese        1.25         1.191176  1.261347
4      beta_korean        0.75         0.726871  0.541905
5      beta_indian        1.00         0.927575  1.688384
6      beta_french        0.75         0.641939  1.725793
7     beta_mexican        1.25         1.216204  0.924075
8    beta_lebanese        0.75         0.708292  0.666092
9   beta_ethiopian        0.50         0.441458  1.155270
10   beta_log_dist       -0.60        -0.595134 -0.323468
print(msg)
Parameters not estimated: ['mu_asian', 'mu_downtown']
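One way to read the comparison table is to flag the parameters whose estimate differs significantly from the true value. The sketch below assumes that the T-Test column is a standardized difference between the true and the estimated value, approximately standard normal, so that 1.96 is the two-sided 5% critical value. The `compare` helper is local to this example, so that interpretation is an assumption.

```python
# T-Test values copied from the table above.
t_tests = {
    'beta_rating': -0.636700,
    'beta_price': 0.467069,
    'beta_chinese': 2.480988,
    'beta_japanese': 1.261347,
    'beta_korean': 0.541905,
    'beta_indian': 1.688384,
    'beta_french': 1.725793,
    'beta_mexican': 0.924075,
    'beta_lebanese': 0.666092,
    'beta_ethiopian': 1.155270,
    'beta_log_dist': -0.323468,
}

CRITICAL_VALUE = 1.96  # two-sided 5% threshold for a standard normal

flagged = [name for name, t in t_tests.items() if abs(t) > CRITICAL_VALUE]
print(flagged)  # ['beta_chinese']
```

With 11 parameters tested at the 5% level, a single rejection is not surprising by itself.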
Total running time of the script: (0 minutes 16.782 seconds)