16. Discrete mixture with panel data
Example of a discrete mixture of logit models, also called latent class model. The class membership model includes socio-economic variables. The datafile is organized as panel data.
Michel Bierlaire, EPFL. Mon Jun 23 2025, 16:29:45
import biogeme.biogeme_logging as blog
from IPython.core.display_functions import display
from biogeme.biogeme import BIOGEME
from biogeme.expressions import (
Beta,
Draws,
ExpressionOrNumeric,
MonteCarlo,
PanelLikelihoodTrajectory,
log,
)
from biogeme.models import logit
from biogeme.results_processing import (
EstimationResults,
get_pandas_estimated_parameters,
)
See the data processing script: Panel data preparation for Swissmetro.
from swissmetro_panel import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
INCOME,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16_panel_discrete_socio_eco.py')
Example b16_panel_discrete_socio_eco.py
Parameters to be estimated. One version for each latent class.
NUMBER_OF_CLASSES = 2
b_cost = [Beta(f'b_cost_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.
b_time = [Beta(f'b_time_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
It is advised not to use 0 as the starting value for the following parameter.
b_time_s = [
Beta(f'b_time_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
b_time_rnd: list[ExpressionOrNumeric] = [
b_time[i] + b_time_s[i] * Draws(f'b_time_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
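The mechanics of the antithetic normal draws used above ('NORMAL_ANTI') can be sketched outside Biogeme: each standard normal draw is paired with its negation, which reduces simulation variance. A minimal pure-Python illustration, with made-up parameter values (this is not how Biogeme generates draws internally, just the idea):

```python
import random
import statistics

random.seed(1223)

# Antithetic normal draws: generate R/2 standard normals and pair each
# one with its negation, so the draws are symmetric around zero.
R = 10  # total number of draws (must be even)
half = [random.gauss(0.0, 1.0) for _ in range(R // 2)]
draws = half + [-x for x in half]

# A random coefficient b_time + b_time_s * xi, evaluated at each draw.
b_time, b_time_s = -1.0, 0.5  # illustrative values, not estimates
b_time_rnd = [b_time + b_time_s * xi for xi in draws]

# The pairing makes the simulated mean of the coefficient exact.
print(abs(statistics.mean(b_time_rnd) - b_time) < 1e-12)
```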
We do the same for the constants, to address serial correlation.
asc_car = [
Beta(f'asc_car_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_s = [
Beta(f'asc_car_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_rnd = [
asc_car[i] + asc_car_s[i] * Draws(f'asc_car_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
asc_train = [
Beta(f'asc_train_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_s = [
Beta(f'asc_train_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_rnd = [
asc_train[i] + asc_train_s[i] * Draws(f'asc_train_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
asc_sm = [Beta(f'asc_sm_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
asc_sm_s = [
Beta(f'asc_sm_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_sm_rnd = [
asc_sm[i] + asc_sm_s[i] * Draws(f'asc_sm_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
Parameters for the class membership model.
class_cte = Beta('class_cte', 0, None, None, 0)
class_inc = Beta('class_inc', 0, None, None, 0)
In class 0, it is assumed that the time coefficient is zero.
b_time_rnd[0] = 0
Utility functions.
v_train_per_class = [
asc_train_rnd[i] + b_time_rnd[i] * TRAIN_TT_SCALED + b_cost[i] * TRAIN_COST_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_swissmetro_per_class = [
asc_sm_rnd[i] + b_time_rnd[i] * SM_TT_SCALED + b_cost[i] * SM_COST_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_car_per_class = [
asc_car_rnd[i] + b_time_rnd[i] * CAR_TT_SCALED + b_cost[i] * CAR_CO_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v = [
{1: v_train_per_class[i], 2: v_swissmetro_per_class[i], 3: v_car_per_class[i]}
for i in range(NUMBER_OF_CLASSES)
]
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
The choice model is a discrete mixture of logit with availability conditions. We calculate the conditional choice probability for each class.
choice_probability_per_class = [
PanelLikelihoodTrajectory(logit(v[i], av, CHOICE)) for i in range(NUMBER_OF_CLASSES)
]
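Conditional on a class and on the draws, PanelLikelihoodTrajectory combines the logit probabilities of all choices made by the same individual into one joint likelihood: a product over that individual's observations. A hedged sketch with hypothetical probabilities (the 0.6, 0.7, 0.5 values are invented for illustration):

```python
import math

# Hypothetical logit choice probabilities for one individual's T = 3
# observed choices, conditional on one class and one draw.
prob_per_observation = [0.6, 0.7, 0.5]

# The joint conditional likelihood of the individual's panel trajectory
# is the product of the per-observation choice probabilities.
trajectory_likelihood = math.prod(prob_per_observation)
print(round(trajectory_likelihood, 3))  # 0.21
```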
Class membership model.
score_class_0 = class_cte + class_inc * INCOME
prob_class0 = logit({0: score_class_0, 1: 0}, None, 0)
prob_class1 = logit({0: score_class_0, 1: 0}, None, 1)
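The class membership model is a binary logit over the two classes: class 0 has score class_cte + class_inc * INCOME and class 1 is the reference with score 0, so the two probabilities sum to one. A sketch with illustrative (not estimated) parameter values:

```python
import math

# Illustrative values of the membership parameters, not estimates.
class_cte, class_inc = -1.3, -0.2
income = 2.0  # income variable for one individual

# Binary logit: class 0 has score s, the reference class 1 has score 0.
score_class_0 = class_cte + class_inc * income
prob_class0 = math.exp(score_class_0) / (math.exp(score_class_0) + 1.0)
prob_class1 = 1.0 - prob_class0

# The two membership probabilities sum to one by construction.
print(round(prob_class0 + prob_class1, 10))  # 1.0
```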
Conditional on the random variables, this is the likelihood for one individual.
conditional_choice_probability = (
prob_class0 * choice_probability_per_class[0]
+ prob_class1 * choice_probability_per_class[1]
)
We integrate over the random variables using Monte-Carlo simulation.
log_probability = log(MonteCarlo(conditional_choice_probability))
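The expression log(MonteCarlo(...)) averages the class-weighted conditional likelihoods over the draws and only then takes the logarithm: the expectation sits inside the log. A pure-Python sketch with invented conditional likelihoods and membership probability (none of these numbers come from the model):

```python
import math
import random

random.seed(0)

# Hypothetical conditional panel likelihoods for one individual,
# evaluated at R draws, for each of the two classes.
R = 4
lik_class0 = [random.uniform(0.1, 0.3) for _ in range(R)]
lik_class1 = [random.uniform(0.2, 0.5) for _ in range(R)]
prob_class0 = 0.15  # illustrative membership probability
prob_class1 = 1.0 - prob_class0

# MonteCarlo: average the class-weighted likelihood over the draws,
# then take the log; the average is inside the logarithm.
simulated = sum(
    prob_class0 * l0 + prob_class1 * l1
    for l0, l1 in zip(lik_class0, lik_class1)
) / R
log_probability = math.log(simulated)
print(log_probability < 0.0)  # a probability below 1 has a negative log
```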
The model is complex, and there are numerical issues when calculating the second derivatives. Therefore, we instruct Biogeme not to evaluate the second derivatives. As a consequence, the statistics reported after estimation are based on the BHHH matrix instead of the Cramér-Rao bound.
the_biogeme = BIOGEME(
database,
log_probability,
number_of_draws=5_000,
seed=1223,
calculating_second_derivatives='never',
)
the_biogeme.model_name = 'b16_panel_discrete_socio_eco'
Biogeme parameters read from biogeme.toml.
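The BHHH approximation mentioned above estimates the information matrix from first derivatives only: it sums the outer products of the per-individual score (gradient) vectors, so no second derivatives are needed. A minimal sketch with hypothetical scores for 3 individuals and 2 parameters:

```python
# Hypothetical per-individual score vectors (gradients of the
# log likelihood contributions), 3 individuals, 2 parameters.
scores = [[0.5, -0.2], [-0.1, 0.3], [0.4, 0.1]]

# BHHH matrix: sum over individuals of the outer product of the scores.
k = len(scores[0])
bhhh = [[sum(g[i] * g[j] for g in scores) for j in range(k)] for i in range(k)]

# By construction the matrix is symmetric with a positive diagonal.
print(bhhh[0][1] == bhhh[1][0])
```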
Estimate the parameters.
try:
results = EstimationResults.from_yaml_file(
filename=f'saved_results/{the_biogeme.model_name}.yaml'
)
except FileNotFoundError:
results = the_biogeme.estimate()
Flattening database [(6768, 38)].
Database flattened [(752, 362)]
*** Initial values of the parameters are obtained from the file __b16_panel_discrete_socio_eco.iter
Cannot read file __b16_panel_discrete_socio_eco.iter. Statement is ignored.
Starting values for the algorithm: {}
As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the algorithmic parameters, change the algorithm from "automatic" to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: BFGS with trust region for simple bounds
Iter. Function Relgrad Radius Rho
0 4e+03 0.035 1 0.45 +
1 3.9e+03 0.034 1 0.3 +
2 3.7e+03 0.036 1 0.39 +
3 3.6e+03 0.019 1 0.4 +
4 3.6e+03 0.03 1 0.48 +
5 3.6e+03 0.011 1 0.37 +
6 3.6e+03 0.025 1 0.11 +
7 3.5e+03 0.0068 1 0.59 +
8 3.5e+03 0.0068 0.5 -1.8 -
9 3.5e+03 0.0068 0.25 -0.71 -
10 3.5e+03 0.0068 0.12 -0.22 -
11 3.5e+03 0.0038 0.12 0.43 +
12 3.5e+03 0.0068 0.12 0.82 +
13 3.5e+03 0.0081 0.12 0.68 +
14 3.5e+03 0.0029 1.2 0.95 ++
15 3.5e+03 0.0029 0.62 -1.6 -
16 3.5e+03 0.0029 0.31 -0.15 -
17 3.5e+03 0.0033 0.31 0.51 +
18 3.5e+03 0.0058 0.31 0.28 +
19 3.5e+03 0.0017 0.31 0.25 +
20 3.5e+03 0.0017 0.16 -0.84 -
21 3.5e+03 0.0034 0.16 0.2 +
22 3.5e+03 0.0015 0.16 0.8 +
23 3.5e+03 0.0011 0.16 0.38 +
24 3.5e+03 0.0013 1.6 1 ++
25 3.5e+03 0.0013 0.78 -2.6 -
26 3.5e+03 0.0013 0.39 -2.1 -
27 3.5e+03 0.0013 0.2 -0.7 -
28 3.5e+03 0.0013 0.098 -0.28 -
29 3.5e+03 0.0018 0.098 0.23 +
30 3.5e+03 0.001 0.098 0.84 +
31 3.5e+03 0.0011 0.098 0.33 +
32 3.5e+03 0.00047 0.98 0.97 ++
33 3.5e+03 0.0033 0.98 0.72 +
34 3.5e+03 0.0033 0.49 -0.21 -
35 3.5e+03 0.0018 0.49 0.15 +
36 3.5e+03 0.0018 0.24 -0.82 -
37 3.5e+03 0.0016 0.24 0.67 +
38 3.5e+03 0.0016 0.12 -1.3 -
39 3.5e+03 0.0012 0.12 0.42 +
40 3.5e+03 0.0012 0.061 -0.79 -
41 3.5e+03 0.0022 0.061 0.31 +
42 3.5e+03 0.0014 0.061 0.31 +
43 3.5e+03 0.00047 0.061 0.62 +
44 3.5e+03 0.0004 0.061 0.81 +
45 3.5e+03 0.0004 0.031 -0.12 -
46 3.5e+03 0.0004 0.015 -0.3 -
47 3.5e+03 0.00053 0.015 0.19 +
48 3.5e+03 0.00014 0.15 0.93 ++
49 3.5e+03 0.00024 0.15 0.75 +
50 3.5e+03 0.00024 0.076 -0.11 -
51 3.5e+03 0.00017 0.076 0.3 +
52 3.5e+03 0.00034 0.076 0.21 +
53 3.5e+03 0.00014 0.076 0.26 +
54 3.5e+03 0.00034 0.076 0.25 +
55 3.5e+03 0.00034 0.076 0.22 +
56 3.5e+03 0.00034 0.038 -0.42 -
57 3.5e+03 0.00033 0.038 0.44 +
58 3.5e+03 0.00019 0.038 0.63 +
59 3.5e+03 4.7e-05 0.38 0.92 ++
60 3.5e+03 5.9e-05 3.8 1.4 ++
61 3.5e+03 9.6e-05 3.8 0.21 +
62 3.5e+03 7.6e-05 3.8 0.8 +
63 3.5e+03 4e-05 3.8 0.75 +
64 3.5e+03 4.7e-05 38 1.7 ++
65 3.5e+03 4.7e-05 0.3 -0.1 -
66 3.5e+03 7.3e-05 3 1.3 ++
67 3.5e+03 7.3e-05 0.33 -7.2 -
68 3.5e+03 7.3e-05 0.16 -0.74 -
69 3.5e+03 0.00016 0.16 0.67 +
70 3.5e+03 7.4e-05 0.16 0.58 +
71 3.5e+03 7.2e-05 0.16 0.38 +
72 3.5e+03 5.2e-05 0.16 0.79 +
73 3.5e+03 5.2e-05 0.033 -0.41 -
74 3.5e+03 5.2e-05 0.016 -0.31 -
75 3.5e+03 0.00011 0.016 0.6 +
76 3.5e+03 3.6e-05 0.016 0.77 +
77 3.5e+03 4.7e-06 0.016 0.64 +
Optimization algorithm has converged.
Relative gradient: 4.749699240944466e-06
Cause of termination: Relative gradient = 4.7e-06 <= 6.1e-06
Number of function evaluations: 189
Number of gradient evaluations: 111
Number of hessian evaluations: 0
Algorithm: BFGS with trust region for simple bound constraints
Number of iterations: 78
Proportion of Hessian calculation: 0/55 = 0.0%
Optimization time: 0:02:38.021463
Calculate BHHH
File b16_panel_discrete_socio_eco.html has been generated.
File b16_panel_discrete_socio_eco.yaml has been generated.
print(results.short_summary())
Results for model b16_panel_discrete_socio_eco
Nbr of parameters: 16
Sample size: 752
Observations: 6768
Excluded data: 0
Final log likelihood: -3524.399
Akaike Information Criterion: 7080.798
Bayesian Information Criterion: 7154.762
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
Name Value BHHH std err. BHHH t-stat. BHHH p-value
0 class_cte -1.276124 0.483823 -2.637584 8.349885e-03
1 class_inc -0.201190 0.179175 -1.122870 2.614927e-01
2 asc_train_class0 -0.956264 0.677309 -1.411859 1.579914e-01
3 asc_train_s_class0 3.039756 0.646170 4.704268 2.547778e-06
4 b_cost_class0 -1.174641 0.549715 -2.136820 3.261266e-02
5 asc_sm_s_class0 0.078896 6.116803 0.012898 9.897090e-01
6 asc_car_class0 -4.871382 1.645420 -2.960571 3.070692e-03
7 asc_car_s_class0 6.157855 1.884002 3.268497 1.081203e-03
8 asc_train_class1 -0.351970 0.287517 -1.224170 2.208879e-01
9 asc_train_s_class1 1.754739 0.458400 3.827965 1.292071e-04
10 b_time_class1 -6.952903 0.365772 -19.008831 0.000000e+00
11 b_time_s_class1 3.277018 0.369304 8.873506 0.000000e+00
12 b_cost_class1 -4.747042 0.252347 -18.811558 0.000000e+00
13 asc_sm_s_class1 1.707763 0.332193 5.140873 2.734650e-07
14 asc_car_class1 1.013071 0.216166 4.686542 2.778594e-06
15 asc_car_s_class1 2.874308 0.261210 11.003840 0.000000e+00
Total running time of the script: (2 minutes 51.277 seconds)