Note
Go to the end to download the full example code
Discrete mixture with panel data
Example of a discrete mixture of logit models, also called latent class model. The class membership model includes socio-economic variables. The datafile is organized as panel data.
- author:
Michel Bierlaire, EPFL
- date:
Mon Apr 10 12:07:15 2023
import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import (
Beta,
Variable,
bioDraws,
MonteCarlo,
log,
exp,
bioMultSum,
)
See the data processing script: Panel data preparation for Swissmetro.
from swissmetro_panel import (
flat_database,
SM_AV,
CAR_AV_SP,
TRAIN_AV_SP,
INCOME,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16panel_discrete_socio_eco.py')
Example b16panel_discrete_socio_eco.py
Parameters to be estimated. One version for each latent class.
NUMBER_OF_CLASSES = 2
B_COST = [Beta(f'B_COST_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.
B_TIME = [Beta(f'B_TIME_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
It is advised not to use 0 as starting value for the following parameter.
B_TIME_S = [
Beta(f'B_TIME_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
B_TIME_RND = [
B_TIME[i] + B_TIME_S[i] * bioDraws(f'B_TIME_RND_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
We do the same for the constants, to address serial correlation.
ASC_CAR = [
Beta(f'ASC_CAR_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_CAR_S = [
Beta(f'ASC_CAR_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_CAR_RND = [
ASC_CAR[i] + ASC_CAR_S[i] * bioDraws(f'ASC_CAR_RND_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN = [
Beta(f'ASC_TRAIN_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_S = [
Beta(f'ASC_TRAIN_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_TRAIN_RND = [
ASC_TRAIN[i] + ASC_TRAIN_S[i] * bioDraws(f'ASC_TRAIN_RND_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
ASC_SM = [Beta(f'ASC_SM_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
ASC_SM_S = [
Beta(f'ASC_SM_S_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
ASC_SM_RND = [
ASC_SM[i] + ASC_SM_S[i] * bioDraws(f'ASC_SM_RND_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
Parameters for the class membership model.
CLASS_CTE = Beta('CLASS_CTE', 0, None, None, 0)
CLASS_INC = Beta('CLASS_INC', 0, None, None, 0)
In class 0, it is assumed that the time coefficient is zero
B_TIME_RND[0] = 0
Utility functions
V1 = [
[
ASC_TRAIN_RND[i]
+ B_TIME_RND[i] * Variable(f'{t}_TRAIN_TT_SCALED')
+ B_COST[i] * Variable(f'{t}_TRAIN_COST_SCALED')
for t in range(1, 10)
]
for i in range(NUMBER_OF_CLASSES)
]
V2 = [
[
ASC_SM_RND[i]
+ B_TIME_RND[i] * Variable(f'{t}_SM_TT_SCALED')
+ B_COST[i] * Variable(f'{t}_SM_COST_SCALED')
for t in range(1, 10)
]
for i in range(NUMBER_OF_CLASSES)
]
V3 = [
[
ASC_CAR_RND[i]
+ B_TIME_RND[i] * Variable(f'{t}_CAR_TT_SCALED')
+ B_COST[i] * Variable(f'{t}_CAR_CO_SCALED')
for t in range(1, 10)
]
for i in range(NUMBER_OF_CLASSES)
]
V = [
[{1: V1[i][t], 2: V2[i][t], 3: V3[i][t]} for t in range(9)]
for i in range(NUMBER_OF_CLASSES)
]
Associate the availability conditions with the alternatives
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
The choice model is a discrete mixture of logit, with availability conditions We calculate the conditional probability for each class.
prob = [
exp(
bioMultSum(
[models.loglogit(V[i][t], av, Variable(f'{t+1}_CHOICE')) for t in range(9)]
)
)
for i in range(NUMBER_OF_CLASSES)
]
Class membership model.
W = CLASS_CTE + CLASS_INC * INCOME
PROB_class0 = models.logit({0: W, 1: 0}, None, 0)
PROB_class1 = models.logit({0: W, 1: 0}, None, 1)
Conditional on the random variables, likelihood for the individual.
probIndiv = PROB_class0 * prob[0] + PROB_class1 * prob[1]
We integrate over the random variables using Monte-Carlo
logprob = log(MonteCarlo(probIndiv))
Create the Biogeme object. As the objective is to illustrate the syntax, we calculate the Monte-Carlo approximation with a small number of draws. To achieve that, we provide a parameter file different from the default one.
the_biogeme = bio.BIOGEME(flat_database, logprob, parameter_file='few_draws.toml')
the_biogeme.modelName = 'b16panel_discrete_socio_eco'
File few_draws.toml has been parsed.
Estimate the parameters.
results = the_biogeme.estimate()
*** Initial values of the parameters are obtained from the file __b16panel_discrete_socio_eco.iter
Cannot read file __b16panel_discrete_socio_eco.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter. Function Relgrad Radius Rho
0 4.1e+03 0.031 10 0.95 ++
1 4.1e+03 0.031 5 -0.059 -
2 3.7e+03 0.037 5 0.54 +
3 3.7e+03 0.037 2.5 -1.8 -
4 3.7e+03 0.037 1.2 -2.7 -
5 3.7e+03 0.037 0.62 -0.88 -
6 3.7e+03 0.035 0.62 0.68 +
7 3.6e+03 0.006 6.2 1 ++
8 3.6e+03 0.006 3.1 -1.7 -
9 3.6e+03 0.006 1.6 -0.018 -
10 3.6e+03 0.0081 1.6 0.63 +
11 3.6e+03 0.0091 1.6 0.27 +
12 3.6e+03 0.0091 0.78 -0.94 -
13 3.6e+03 0.0091 0.39 -0.22 -
14 3.6e+03 0.015 0.39 0.35 +
15 3.6e+03 0.0032 0.39 0.88 +
16 3.5e+03 0.0043 3.9 1.4 ++
17 3.5e+03 0.0043 2 -4.1 -
18 3.5e+03 0.0043 0.98 -2 -
19 3.5e+03 0.0052 0.98 0.31 +
20 3.5e+03 0.00056 9.8 1 ++
21 3.5e+03 0.00077 9.8 0.75 +
22 3.5e+03 9.4e-05 98 1.1 ++
23 3.5e+03 5.8e-06 98 1 ++
Results saved in file b16panel_discrete_socio_eco.html
Results saved in file b16panel_discrete_socio_eco.pickle
print(results.short_summary())
Results for model b16panel_discrete_socio_eco
Nbr of parameters: 16
Sample size: 752
Excluded data: 0
Final log likelihood: -3545.441
Akaike Information Criterion: 7122.881
Bayesian Information Criterion: 7196.845
pandas_results = results.getEstimatedParameters()
pandas_results
Total running time of the script: (1 minutes 14.091 seconds)