Note
Go to the end to download the full example code.
12bis. Mixture of logit with panel data and segmented ASCΒΆ
Variant of the panel mixed logit example where one ASC is segmented using the socio-economic variable MALE. This can be used to reproduce the panel renaming issue if shared Variable objects are renamed multiple times.
Michel Bierlaire, EPFL
from IPython.core.display_functions import display
See the data processing script: Panel data preparation for Swissmetro.
from swissmetro_panel import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
MALE,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
import biogeme.biogeme_logging as blog
from biogeme.biogeme import BIOGEME
from biogeme.expressions import Beta, Draws, MonteCarlo, PanelLikelihoodTrajectory, log
from biogeme.models import logit
from biogeme.results_processing import (
EstimationResults,
get_pandas_estimated_parameters,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info("Example b12_panel_segmented_male.py")
Example b12_panel_segmented_male.py
Parameters to be estimated.
b_cost = Beta("b_cost", 0, None, 0, 0)
Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.
b_time = Beta("b_time", 0, None, 0, 0)
It is advised not to use 0 as starting value for the following parameter.
b_time_s = Beta("b_time_s", 1, 1.0e-5, None, 0)
b_time_rnd = b_time + b_time_s * Draws("b_time_rnd", "NORMAL_ANTI")
Random ASCs to address serial correlation.
asc_car = Beta("asc_car", 0, None, None, 0)
asc_car_s = Beta("asc_car_s", 1, 1.0e-5, None, 0)
asc_car_rnd = asc_car + asc_car_s * Draws("asc_car_rnd", "NORMAL_ANTI")
asc_train = Beta("asc_train", 0, None, None, 0)
asc_train_s = Beta("asc_train_s", 1, 1.0e-5, None, 0)
asc_train_rnd = asc_train + asc_train_s * Draws("asc_train_rnd", "NORMAL_ANTI")
asc_sm = Beta("asc_sm", 0, None, None, 0)
asc_sm_s = Beta("asc_sm_s", 1, 1.0e-5, None, 0)
asc_sm_rnd_base = asc_sm + asc_sm_s * Draws("asc_sm_rnd", "NORMAL_ANTI")
Segmentation of one ASC by the socioeconomic variable MALE. This is intentionally written in a way that reuses the same variable in a comparison expression, to reproduce the original issue.
asc_sm_male = Beta("asc_sm_male", 0, None, None, 0)
asc_sm_rnd = asc_sm_rnd_base + asc_sm_male * (MALE == 1)
asc_train_male = Beta("asc_train_male", 0, None, None, 0)
asc_sm_male = Beta("asc_sm_male", 0, None, None, 0)
asc_car_male = Beta("asc_car_male", 0, None, None, 0)
v_train = (
asc_train_rnd
+ asc_train_male * (MALE == 1)
+ b_time_rnd * TRAIN_TT_SCALED
+ b_cost * TRAIN_COST_SCALED
)
v_swissmetro = (
asc_sm_rnd_base
+ asc_sm_male * (MALE == 1)
+ b_time_rnd * SM_TT_SCALED
+ b_cost * SM_COST_SCALED
)
v_car = (
asc_car_rnd
+ asc_car_male * (MALE == 1)
+ b_time_rnd * CAR_TT_SCALED
+ b_cost * CAR_CO_SCALED
)
Associate utility functions with the numbering of alternatives.
v = {1: v_train, 2: v_swissmetro, 3: v_car}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Conditional on the random parameters, the likelihood of one observation is given by the logit model (called the kernel).
choice_probability_one_observation = logit(v, av, CHOICE)
Conditional on the random parameters, the likelihood of all observations for one individual (the trajectory) is the product of the likelihood of each observation.
conditional_trajectory_probability = PanelLikelihoodTrajectory(
choice_probability_one_observation
)
We integrate over the random parameters using Monte-Carlo.
log_probability = log(MonteCarlo(conditional_trajectory_probability))
the_biogeme = BIOGEME(
database,
log_probability,
number_of_draws=5_000,
seed=1223,
calculating_second_derivatives="never",
)
the_biogeme.model_name = "b12_panel_segmented_male"
Biogeme parameters read from biogeme.toml.
Estimate the parameters.
try:
results = EstimationResults.from_yaml_file(
filename=f"saved_results/{the_biogeme.model_name}.yaml"
)
except FileNotFoundError:
results = the_biogeme.estimate()
print(results.short_summary())
Results for model b12_panel_segmented_male
Nbr of parameters: 12
Sample size: 752
Observations: 6768
Excluded data: 0
Final log likelihood: -3543.843
Akaike Information Criterion: 7111.686
Bayesian Information Criterion: 7167.159
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
{'Estimated parameters': Name Value ... BHHH p-value Active bound
0 asc_train 1.233454 ... 1.461649e-06 False
1 asc_train_s 2.521465 ... 0.000000e+00 False
2 asc_train_male -2.132275 ... 2.626788e-13 False
3 b_time -6.026511 ... 0.000000e+00 False
4 b_time_s 3.502407 ... 0.000000e+00 False
5 b_cost -3.523512 ... 0.000000e+00 False
6 asc_sm 0.029208 ... 8.770092e-01 False
7 asc_sm_s 0.000010 ... 9.999874e-01 True
8 asc_sm_male 0.178862 ... 3.960256e-01 False
9 asc_car -1.145479 ... 4.813668e-04 False
10 asc_car_s 3.925120 ... 0.000000e+00 False
11 asc_car_male 1.998629 ... 3.694476e-08 False
[12 rows x 6 columns]}
List of columns of the flat database generated by Biogeme.
for col in the_biogeme.model_elements.database.dataframe.columns:
print(col)
GROUP
SURVEY
SP
ID
PURPOSE
FIRST
TICKET
WHO
LUGGAGE
AGE
MALE
INCOME
GA
ORIGIN
DEST
TRAIN_AV
CAR_AV
SM_AV
TRAIN_TT
TRAIN_CO
TRAIN_HE
SM_TT
SM_CO
SM_HE
SM_SEATS
CAR_TT
CAR_CO
CHOICE
SM_COST
TRAIN_COST
CAR_AV_SP
TRAIN_AV_SP
TRAIN_TT_SCALED
TRAIN_COST_SCALED
SM_TT_SCALED
SM_COST_SCALED
CAR_TT_SCALED
CAR_CO_SCALED
List of variables that have been renamed in the model specification.
for var in the_biogeme.model_elements.expressions_registry.variables:
print(var)
GROUP
SURVEY
SP
ID
PURPOSE
FIRST
TICKET
WHO
LUGGAGE
AGE
MALE
INCOME
GA
ORIGIN
DEST
TRAIN_AV
CAR_AV
SM_AV
TRAIN_TT
TRAIN_CO
TRAIN_HE
SM_TT
SM_CO
SM_HE
SM_SEATS
CAR_TT
CAR_CO
CHOICE
SM_COST
TRAIN_COST
CAR_AV_SP
TRAIN_AV_SP
TRAIN_TT_SCALED
TRAIN_COST_SCALED
SM_TT_SCALED
SM_COST_SCALED
CAR_TT_SCALED
CAR_CO_SCALED
Total running time of the script: (0 minutes 0.114 seconds)