Note
Go to the end to download the full example code.
15b. Discrete mixture with panel dataΒΆ
- Example of a discrete mixture of logit models, also called latent
class model. The datafile is organized as panel data. Compared to 15a. Discrete mixture with panel data, we integrate before the discrete mixture to show that it is equivalent.
Michel Bierlaire, EPFL Sat Jun 21 2025, 17:22:38
from IPython.core.display_functions import display
See the data processing script: Panel data preparation for Swissmetro.
from swissmetro_panel import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
import biogeme.biogeme_logging as blog
from biogeme.biogeme import BIOGEME
from biogeme.expressions import (
Beta,
Draws,
ExpressionOrNumeric,
MonteCarlo,
PanelLikelihoodTrajectory,
log,
)
from biogeme.models import logit
from biogeme.results_processing import (
EstimationResults,
get_pandas_estimated_parameters,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b15panel_discrete_bis.py')
Example b15panel_discrete_bis.py
Parameters to be estimated. One version for each latent class.
NUMBER_OF_CLASSES = 2
b_cost = [Beta(f'b_cost_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.
b_time = [Beta(f'b_time_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
It is advised not to use 0 as starting value for the following parameter.
b_time_s = [
Beta(f'b_time_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
b_time_rnd: list[ExpressionOrNumeric] = [
b_time[i] + b_time_s[i] * Draws(f'b_time_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
We do the same for the constants, to address serial correlation.
asc_car = [
Beta(f'asc_car_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_s = [
Beta(f'asc_car_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_rnd = [
asc_car[i] + asc_car_s[i] * Draws(f'asc_car_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
asc_train = [
Beta(f'asc_train_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_s = [
Beta(f'asc_train_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_RND = [
asc_train[i] + asc_train_s[i] * Draws(f'asc_train_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
asc_sm = [Beta(f'asc_sm_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
asc_sm_s = [
Beta(f'asc_sm_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_sm_rnd = [
asc_sm[i] + asc_sm_s[i] * Draws(f'asc_sm_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
Parameter groups used in the generated reports.
Each group contains the estimated parameters associated with one latent class.
Fixed parameters and parameters that are replaced by identification constraints
are not listed here, as they do not appear in the estimation results. The class
membership parameter is not included in either group, and will therefore be
reported in the automatically generated Other parameters section.
PARAMETER_GROUPS = {
'Class 0': [
'b_cost_class0',
'asc_car_class0',
'asc_car_s_class0',
'asc_train_class0',
'asc_train_s_class0',
'asc_sm_s_class0',
],
'Class 1': [
'b_cost_class1',
'b_time_class1',
'b_time_s_class1',
'asc_car_class1',
'asc_car_s_class1',
'asc_train_class1',
'asc_train_s_class1',
'asc_sm_s_class1',
],
}
Class membership probability.
score_class_0 = Beta('score_class_0', 0, None, None, 0)
probability_class_0 = logit({0: score_class_0, 1: 0}, None, 0)
probability_class_1 = logit({0: score_class_0, 1: 0}, None, 1)
In class 0, it is assumed that the time coefficient is zero.
b_time_rnd[0] = 0
Utility functions.
v_train_per_class = [
ASC_TRAIN_RND[i] + b_time_rnd[i] * TRAIN_TT_SCALED + b_cost[i] * TRAIN_COST_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_swissmetro_per_class = [
asc_sm_rnd[i] + b_time_rnd[i] * SM_TT_SCALED + b_cost[i] * SM_COST_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_car_per_class = [
asc_car_rnd[i] + b_time_rnd[i] * CAR_TT_SCALED + b_cost[i] * CAR_CO_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_per_class = [
{1: v_train_per_class[i], 2: v_swissmetro_per_class[i], 3: v_car_per_class[i]}
for i in range(NUMBER_OF_CLASSES)
]
Associate the availability conditions with the alternatives
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
The choice model is a discrete mixture of logit, with availability conditions We calculate the conditional probability for each class.
choice_probability_per_class = [
MonteCarlo(PanelLikelihoodTrajectory(logit(v_per_class[i], av, CHOICE)))
for i in range(NUMBER_OF_CLASSES)
]
Conditional to the random variables, likelihood for the individual.
choice_probability = (
probability_class_0 * choice_probability_per_class[0]
+ probability_class_1 * choice_probability_per_class[1]
)
We integrate over the random variables using Monte-Carlo.
log_probability = log(choice_probability)
The model is complex, and there are numerical issues when calculating the second derivatives. Therefore, we instruct Biogeme not to evaluate the second derivatives. As a consequence, the statistics reported after estimation are based on the BHHH matrix instead of the Rao-Cramer bound.
the_biogeme = BIOGEME(
database,
log_probability,
number_of_draws=5_000,
seed=1223,
calculating_second_derivatives='never',
group_of_parameters=PARAMETER_GROUPS,
)
the_biogeme.model_name = 'b15b_panel_discrete'
Biogeme parameters read from biogeme.toml.
Estimate the parameters.
try:
results = EstimationResults.from_yaml_file(
filename=f'saved_results/{the_biogeme.model_name}.yaml'
)
except FileNotFoundError:
results = the_biogeme.estimate()
print(results.short_summary())
Results for model b15b_panel_discrete
Nbr of parameters: 15
Sample size: 752
Observations: 6768
Excluded data: 0
Final log likelihood: -3524.886
Akaike Information Criterion: 7079.772
Bayesian Information Criterion: 7149.113
pandas_results = get_pandas_estimated_parameters(
estimation_results=results,
group_of_parameters=PARAMETER_GROUPS,
)
for group_name, pandas_table in pandas_results.items():
display(group_name if group_name else 'Estimated parameters')
display(pandas_table)
Class 0
Name Value BHHH std err. BHHH t-stat. BHHH p-value
1 asc_train_class0 -1.047491 0.683214 -1.533182 0.125231
2 asc_train_s_class0 3.112161 0.647272 4.808122 0.000002
3 b_cost_class0 -1.171741 0.536481 -2.184127 0.028953
4 asc_sm_s_class0 0.039437 6.381571 0.006180 0.995069
5 asc_car_class0 -4.859510 1.626500 -2.987710 0.002811
6 asc_car_s_class0 6.071268 1.855649 3.271777 0.001069
Class 1
Name Value BHHH std err. BHHH t-stat. BHHH p-value
7 asc_train_class1 -0.319506 0.284532 -1.122916 2.614732e-01
8 asc_train_s_class1 1.756304 0.458112 3.833789 1.261843e-04
9 b_time_class1 -6.957603 0.367301 -18.942496 0.000000e+00
10 b_time_s_class1 3.281670 0.353771 9.276248 0.000000e+00
11 b_cost_class1 -4.757314 0.253148 -18.792606 0.000000e+00
12 asc_sm_s_class1 1.699070 0.333620 5.092831 3.527559e-07
13 asc_car_class1 1.020241 0.215801 4.727687 2.270923e-06
14 asc_car_s_class1 2.875929 0.260628 11.034597 0.000000e+00
Other parameters
Name Value BHHH std err. BHHH t-stat. BHHH p-value
0 score_class_0 -1.760147 0.22623 -7.780336 7.327472e-15
Total running time of the script: (0 minutes 0.150 seconds)