Note
Go to the end to download the full example code.
Discrete mixture with panel data
Example of a discrete mixture of logit models, also called latent class model. The class membership model includes socio-economic variables. The datafile is organized as panel data.
- author:
Michel Bierlaire, EPFL
- date:
Mon Apr 10 12:07:15 2023
import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import (
Beta,
Variable,
bioDraws,
MonteCarlo,
log,
exp,
bioMultSum,
ExpressionOrNumeric,
)
from biogeme.parameters import Parameters
See the data processing script: Panel data preparation for Swissmetro.
from swissmetro_panel import (
flat_database,
SM_AV,
CAR_AV_SP,
TRAIN_AV_SP,
INCOME,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16panel_discrete_socio_eco.py')
Example b16panel_discrete_socio_eco.py
Parameters to be estimated. One version for each latent class.
NUMBER_OF_CLASSES = 2
B_COST = [Beta(f'B_COST_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.
B_TIME = [Beta(f'B_TIME_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
It is advised not to use 0 as starting value for the following parameter.
B_TIME_S = [
Beta(f'B_TIME_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
B_TIME_RND: list[ExpressionOrNumeric] = [
B_TIME[i] + B_TIME_S[i] * bioDraws(f'B_TIME_RND_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
We do the same for the constants, to address serial correlation.
ASC_CAR = [
Beta(f'ASC_CAR_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_CAR_S = [
Beta(f'ASC_CAR_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_CAR_RND = [
ASC_CAR[i] + ASC_CAR_S[i] * bioDraws(f'ASC_CAR_RND_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN = [
Beta(f'ASC_TRAIN_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_S = [
Beta(f'ASC_TRAIN_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_RND = [
ASC_TRAIN[i] + ASC_TRAIN_S[i] * bioDraws(f'ASC_TRAIN_RND_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
ASC_SM = [Beta(f'ASC_SM_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
ASC_SM_S = [
Beta(f'ASC_SM_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_SM_RND = [
ASC_SM[i] + ASC_SM_S[i] * bioDraws(f'ASC_SM_RND_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
Parameters for the class membership model.
CLASS_CTE = Beta('CLASS_CTE', 0, None, None, 0)
CLASS_INC = Beta('CLASS_INC', 0, None, None, 0)
In class 0, it is assumed that the time coefficient is zero
B_TIME_RND[0] = 0
Utility functions
V1 = [
[
ASC_TRAIN_RND[i]
+ B_TIME_RND[i] * Variable(f'{t}_TRAIN_TT_SCALED')
+ B_COST[i] * Variable(f'{t}_TRAIN_COST_SCALED')
for t in range(1, 10)
]
for i in range(NUMBER_OF_CLASSES)
]
V2 = [
[
ASC_SM_RND[i]
+ B_TIME_RND[i] * Variable(f'{t}_SM_TT_SCALED')
+ B_COST[i] * Variable(f'{t}_SM_COST_SCALED')
for t in range(1, 10)
]
for i in range(NUMBER_OF_CLASSES)
]
V3 = [
[
ASC_CAR_RND[i]
+ B_TIME_RND[i] * Variable(f'{t}_CAR_TT_SCALED')
+ B_COST[i] * Variable(f'{t}_CAR_CO_SCALED')
for t in range(1, 10)
]
for i in range(NUMBER_OF_CLASSES)
]
V = [
[{1: V1[i][t], 2: V2[i][t], 3: V3[i][t]} for t in range(9)]
for i in range(NUMBER_OF_CLASSES)
]
Associate the availability conditions with the alternatives
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
The choice model is a discrete mixture of logit, with availability conditions We calculate the conditional probability for each class.
prob = [
exp(
bioMultSum(
[models.loglogit(V[i][t], av, Variable(f'{t+1}_CHOICE')) for t in range(9)]
)
)
for i in range(NUMBER_OF_CLASSES)
]
Class membership model.
W = CLASS_CTE + CLASS_INC * INCOME
PROB_class0 = models.logit({0: W, 1: 0}, None, 0)
PROB_class1 = models.logit({0: W, 1: 0}, None, 1)
Conditional on the random variables, likelihood for the individual.
probIndiv = PROB_class0 * prob[0] + PROB_class1 * prob[1]
We integrate over the random variables using Monte-Carlo
logprob = log(MonteCarlo(probIndiv))
As the objective is to illustrate the syntax, we calculate the Monte-Carlo approximation with a small number of draws.
the_biogeme = bio.BIOGEME(flat_database, logprob, number_of_draws=100, seed=1223)
the_biogeme.modelName = 'b16panel_discrete_socio_eco'
Biogeme parameters read from biogeme.toml.
Estimate the parameters.
results = the_biogeme.estimate()
As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the parameters, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b16panel_discrete_socio_eco.iter
Cannot read file __b16panel_discrete_socio_eco.iter. Statement is ignored.
The number of draws (100) is low. The results may not be meaningful.
As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the parameters, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: BFGS with trust region for simple bounds
Iter. Function Relgrad Radius Rho
0 4e+03 0.034 1 0.44 +
1 3.8e+03 0.024 1 0.63 +
2 3.8e+03 0.024 0.5 0.052 -
3 3.7e+03 0.029 0.5 0.52 +
4 3.7e+03 0.028 0.5 0.39 +
5 3.7e+03 0.032 5 0.97 ++
6 3.7e+03 0.032 2.5 -2 -
7 3.6e+03 0.018 2.5 0.51 +
8 3.6e+03 0.018 1.2 -2.6 -
9 3.6e+03 0.018 0.62 -2.7 -
10 3.6e+03 0.018 0.31 -1 -
11 3.6e+03 0.018 0.16 0.057 -
12 3.6e+03 0.0069 0.16 0.59 +
13 3.6e+03 0.011 0.16 0.19 +
14 3.6e+03 0.0036 0.16 0.68 +
15 3.6e+03 0.0073 0.16 0.58 +
16 3.6e+03 0.0029 0.16 0.81 +
17 3.6e+03 0.0055 0.16 0.65 +
18 3.6e+03 0.0039 0.16 0.36 +
19 3.6e+03 0.0045 1.6 1 ++
20 3.6e+03 0.0045 0.78 -2.7 -
21 3.6e+03 0.0045 0.39 -0.61 -
22 3.6e+03 0.0099 0.39 0.15 +
23 3.6e+03 0.0099 0.2 -0.024 -
24 3.6e+03 0.0045 0.2 0.46 +
25 3.6e+03 0.0038 0.2 0.24 +
26 3.5e+03 0.0013 0.2 0.69 +
27 3.5e+03 0.0025 0.2 0.39 +
28 3.5e+03 0.0015 2 0.96 ++
29 3.5e+03 0.0015 0.98 -25 -
30 3.5e+03 0.0015 0.49 -14 -
31 3.5e+03 0.0015 0.24 -5 -
32 3.5e+03 0.0015 0.12 -2.3 -
33 3.5e+03 0.0015 0.061 -1.6 -
34 3.5e+03 0.0015 0.031 -0.38 -
35 3.5e+03 0.0028 0.031 0.43 +
36 3.5e+03 0.0028 0.015 0.02 -
37 3.5e+03 0.00093 0.015 0.54 +
38 3.5e+03 0.00092 0.015 0.88 +
39 3.5e+03 0.00087 0.015 0.89 +
40 3.5e+03 0.00091 0.015 0.89 +
41 3.5e+03 0.00087 0.15 0.97 ++
42 3.5e+03 0.00092 1.5 0.99 ++
43 3.5e+03 0.00092 0.76 -0.67 -
44 3.5e+03 0.0022 7.6 1.1 ++
45 3.5e+03 0.0022 3.1 -26 -
46 3.5e+03 0.0022 1.6 -12 -
47 3.5e+03 0.0022 0.78 -9.8 -
48 3.5e+03 0.0022 0.39 -2.9 -
49 3.5e+03 0.0022 0.2 -0.81 -
50 3.5e+03 0.0022 0.098 -0.17 -
51 3.5e+03 0.002 0.098 0.63 +
52 3.5e+03 0.003 0.098 0.4 +
53 3.5e+03 0.0013 0.098 0.88 +
54 3.5e+03 0.0024 0.098 0.73 +
55 3.5e+03 0.00096 0.098 0.9 +
56 3.5e+03 0.0013 0.98 0.93 ++
57 3.5e+03 0.0014 0.98 0.49 +
58 3.5e+03 0.0014 0.49 -4 -
59 3.5e+03 0.0014 0.24 -2 -
60 3.5e+03 0.0014 0.12 -1.3 -
61 3.5e+03 0.0015 0.12 0.14 +
62 3.5e+03 0.0013 0.12 0.43 +
63 3.5e+03 0.0013 0.061 -0.83 -
64 3.5e+03 0.0011 0.061 0.65 +
65 3.5e+03 0.0011 0.031 -0.19 -
66 3.5e+03 0.00053 0.031 0.44 +
67 3.5e+03 0.00054 0.031 0.27 +
68 3.5e+03 0.0004 0.031 0.84 +
69 3.5e+03 0.00037 0.031 0.46 +
70 3.5e+03 0.00044 0.031 0.26 +
71 3.5e+03 0.00033 0.031 0.69 +
72 3.5e+03 0.00033 0.015 -0.036 -
73 3.5e+03 0.00034 0.015 0.51 +
74 3.5e+03 0.00023 0.015 0.89 +
75 3.5e+03 0.00025 0.15 0.92 ++
76 3.5e+03 0.00027 0.15 0.77 +
77 3.5e+03 0.00027 0.077 -2.9 -
78 3.5e+03 0.00027 0.038 -1.8 -
79 3.5e+03 0.00027 0.019 -0.59 -
80 3.5e+03 0.00027 0.0096 0.083 -
81 3.5e+03 0.00022 0.0096 0.8 +
82 3.5e+03 0.00022 0.0048 -0.25 -
83 3.5e+03 0.00021 0.0048 0.49 +
84 3.5e+03 0.00013 0.0048 0.81 +
85 3.5e+03 0.00013 0.048 0.96 ++
86 3.5e+03 0.00013 0.48 0.96 ++
87 3.5e+03 0.00013 0.16 -3.2 -
88 3.5e+03 0.00013 0.079 -2.2 -
89 3.5e+03 0.00013 0.04 -0.84 -
90 3.5e+03 0.0001 0.04 0.37 -
Results saved in file b16panel_discrete_socio_eco.html
Results saved in file b16panel_discrete_socio_eco.pickle
print(results.short_summary())
Results for model b16panel_discrete_socio_eco
Nbr of parameters: 16
Sample size: 752
Excluded data: 0
Final log likelihood: -3545.388
Akaike Information Criterion: 7122.776
Bayesian Information Criterion: 7196.74
pandas_results = results.get_estimated_parameters()
pandas_results
Total running time of the script: (12 minutes 21.457 seconds)