16. Latent class model with panel data
Bayesian estimation of a discrete mixture of logit models, also called a latent class model. The class membership model includes socio-economic variables. The data file is organized as panel data.
Michel Bierlaire, EPFL, Sat Nov 15 2025, 18:12:05
from IPython.core.display_functions import display
import biogeme.biogeme_logging as blog
from biogeme.bayesian_estimation import BayesianResults, get_pandas_estimated_parameters
from biogeme.biogeme import BIOGEME
from biogeme.expressions import (
Beta,
DistributedParameter,
Draws,
exp,
log,
)
from biogeme.models import logit
See the data processing script: Panel data preparation for Swissmetro.
from swissmetro_panel import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
INCOME,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16_panel_discrete_socio_eco.py')
Example b16_panel_discrete_socio_eco.py
Parameters to be estimated. One version for each latent class.
NUMBER_OF_CLASSES = 2
b_cost = [Beta(f'b_cost_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
Define a random parameter, normally distributed across individuals, designed to be used for Monte Carlo simulation.
b_time = [Beta(f'b_time_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
It is advised not to use 0 as a starting value for the following scale parameter: when the scale is exactly 0, the random draws have no effect on the model.
b_time_s = [
Beta(f'b_time_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
b_time_rnd = [
DistributedParameter(
f'b_time_rnd_class{i}',
b_time[i] + b_time_s[i] * Draws(f'b_time_eps_class{i}', 'NORMAL'),
)
for i in range(NUMBER_OF_CLASSES)
]
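Written out with ad hoc notation (the symbols below are only shorthand for the parameters defined in the code), the time coefficient of individual \(n\) in class \(i\) is

\[
\text{b\_time\_rnd}_{i,n} = \text{b\_time}_i + \text{b\_time\_s}_i \, \varepsilon_{i,n},
\qquad \varepsilon_{i,n} \sim N(0, 1),
\]

so that, across individuals, it is normally distributed with mean b_time_classi and standard deviation equal to the absolute value of b_time_s_classi.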
We do the same for the constants, to address serial correlation.
asc_car = [
Beta(f'asc_car_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_s = [
Beta(f'asc_car_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_rnd = [
DistributedParameter(
f'asc_car_rnd_class{i}',
asc_car[i] + asc_car_s[i] * Draws(f'asc_car_eps_class{i}', 'NORMAL'),
)
for i in range(NUMBER_OF_CLASSES)
]
asc_train = [
Beta(f'asc_train_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_s = [
Beta(f'asc_train_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_rnd = [
DistributedParameter(
f'asc_train_rnd_class{i}',
asc_train[i] + asc_train_s[i] * Draws(f'asc_train_eps_class{i}', 'NORMAL'),
)
for i in range(NUMBER_OF_CLASSES)
]
asc_sm = [Beta(f'asc_sm_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
asc_sm_s = [
Beta(f'asc_sm_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_sm_rnd = [
DistributedParameter(
f'asc_sm_rnd_class{i}',
asc_sm[i] + asc_sm_s[i] * Draws(f'asc_sm_eps_class{i}', 'NORMAL'),
)
for i in range(NUMBER_OF_CLASSES)
]
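Because the database is declared as panel data, the underlying draws are taken per individual: the same realization of the normal error enters every observation of a given individual. This is how these error components capture the serial correlation mentioned above. Schematically, for the car constant of individual \(n\) in class \(i\) (ad hoc notation again):

\[
\text{asc\_car\_rnd}_{i,n} = \text{asc\_car}_i + \text{asc\_car\_s}_i \, \xi_{i,n},
\qquad \xi_{i,n} \sim N(0, 1),
\]

where \(\xi_{i,n}\) does not vary across the observations of individual \(n\), inducing correlation across that individual's choices. The same construction applies to the train and Swissmetro constants.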
Parameters for the class membership model.
class_cte = Beta('class_cte', 0, None, None, 0)
class_inc = Beta('class_inc', 0, None, None, 0)
In class 0, it is assumed that the time coefficient is zero.
b_time_rnd[0] = 0
Utility functions.
v_train_per_class = [
asc_train_rnd[i] + b_time_rnd[i] * TRAIN_TT_SCALED + b_cost[i] * TRAIN_COST_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_swissmetro_per_class = [
asc_sm_rnd[i] + b_time_rnd[i] * SM_TT_SCALED + b_cost[i] * SM_COST_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_car_per_class = [
asc_car_rnd[i] + b_time_rnd[i] * CAR_TT_SCALED + b_cost[i] * CAR_CO_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_per_class = [
{1: v_train_per_class[i], 2: v_swissmetro_per_class[i], 3: v_car_per_class[i]}
for i in range(NUMBER_OF_CLASSES)
]
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
The choice model is a discrete mixture of logit models, with availability conditions. We calculate the conditional choice probability for each class.
conditional_probability_per_class = [
logit(v_per_class[i], av, CHOICE) for i in range(NUMBER_OF_CLASSES)
]
Class membership model.
score_class_0 = class_cte + class_inc * INCOME
probability_class_1 = 1 / (1 + exp(score_class_0))
probability_class_0 = 1 - probability_class_1
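Written out, the membership probabilities form a binary logit on the score \(s_0 = \text{class\_cte} + \text{class\_inc} \cdot \text{INCOME}\):

\[
P(\text{class } 1) = \frac{1}{1 + e^{s_0}}, \qquad
P(\text{class } 0) = 1 - P(\text{class } 1) = \frac{e^{s_0}}{1 + e^{s_0}},
\]

so a higher score makes class 0 more likely.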
Conditional on the random parameters, the choice probability of each observation is a mixture of the two class-specific probabilities.
conditional_choice_probability = (
probability_class_0 * conditional_probability_per_class[0]
+ probability_class_1 * conditional_probability_per_class[1]
)
We need the log probability per observation.
conditional_log_probability = log(conditional_choice_probability)
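Putting the pieces together, and still conditional on the random parameters, the contribution of one observation choosing alternative \(j\) to the log likelihood is

\[
\log P(j) = \log\Big( P(\text{class } 0)\, P_0(j) + P(\text{class } 1)\, P_1(j) \Big),
\qquad
P_i(j) = \frac{\text{av}_j \, e^{V_{ij}}}{\sum_k \text{av}_k \, e^{V_{ik}}},
\]

where \(P_i(j)\) is the class-specific logit probability with the availability conditions, and \(V_{ij}\) is the utility of alternative \(j\) in class \(i\) defined above.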
the_biogeme = BIOGEME(
database,
conditional_log_probability,
warmup=40,
bayesian_draws=40,
chains=1,
)
the_biogeme.model_name = 'b16_panel_discrete_socio_eco'
Biogeme parameters read from biogeme.toml.
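The settings warmup=40, bayesian_draws=40 and chains=1 are deliberately small so that this example runs quickly when the documentation is built; they also explain the shape-validation warning reported below about the number of chains and draws. For an actual analysis, one would typically run several chains with many more draws. A minimal sketch, using purely illustrative values and a hypothetical variable name that is not used in the rest of this example:

# Hypothetical configuration for a production run: same constructor, larger
# sampling effort. The numbers are illustrative, not a recommendation.
the_biogeme_production = BIOGEME(
    database,
    conditional_log_probability,
    warmup=1000,
    bayesian_draws=1000,
    chains=4,
)
the_biogeme_production.model_name = 'b16_panel_discrete_socio_eco_production'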
Estimate the parameters.
try:
results = BayesianResults.from_netcdf(
filename=f'saved_results/{the_biogeme.model_name}.nc'
)
except FileNotFoundError:
results = the_biogeme.bayesian_estimation()
Loaded NetCDF file size: 10.9 MB
load finished in 324 ms
print(results.short_summary())
/Users/bierlair/python_envs/venv313/lib/python3.13/site-packages/arviz/stats/stats.py:1667: UserWarning: For one or more samples the posterior variance of the log predictive densities exceeds 0.4. This could be indication of WAIC starting to fail.
See http://arxiv.org/abs/1507.04544 for details
warnings.warn(
/Users/bierlair/python_envs/venv313/lib/python3.13/site-packages/arviz/stats/stats.py:797: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.38 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
warnings.warn(
loo_res finished in 52 ms
loo finished in 52 ms
Sample size 6768
Sampler NUTS
Number of chains 1
Number of draws per chain 40
Total number of draws 40
Acceptance rate target 0.9
Run time 0:01:06.651876
Posterior predictive log-likelihood (sum of log mean p) -2154.03
Expected log-likelihood E[log L(Y|θ)] -2347.36
Best-draw log-likelihood (posterior upper bound) -2289.28
WAIC (Widely Applicable Information Criterion) -2744.28
WAIC Standard Error 80.29
Effective number of parameters (p_WAIC) 590.25
LOO (Leave-One-Out Cross-Validation) -2651.28
LOO Standard Error 73.43
Effective number of parameters (p_LOO) 497.24
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
Shape validation failed: input_shape: (1, 40), minimum_shape: (chains=2, draws=4)
Diagnostics computation took 18.3 seconds (cached).
Name Value (mean) ... ESS (bulk) ESS (tail)
0 class_cte -1.985520 ... 1.993462 17.407238
1 class_inc -0.265144 ... 2.829976 21.105249
2 asc_train_class0 -6.728808 ... 4.910675 21.105249
3 asc_train_s_class0 -1.427167 ... 1.691975 24.157661
4 b_cost_class0 -8.732684 ... 3.386415 18.886680
5 asc_sm_s_class0 7.462313 ... 3.103692 44.679600
6 asc_car_class0 -3.999915 ... 4.905896 40.190375
7 asc_car_s_class0 -0.819028 ... 17.224894 40.190375
8 asc_train_class1 0.203905 ... 6.125934 44.679600
9 asc_train_s_class1 2.625066 ... 3.883404 40.190375
10 b_time_class1 -8.874934 ... 1.623690 16.681299
11 b_time_s_class1 5.466661 ... 1.944467 22.003474
12 b_cost_class1 -4.827650 ... 2.680831 44.679600
13 asc_sm_s_class1 -1.767727 ... 2.155726 16.681299
14 asc_car_class1 1.067482 ... 2.194947 21.105249
15 asc_car_s_class1 5.277336 ... 1.990461 16.681299
[16 rows x 12 columns]
Total running time of the script: (0 minutes 18.722 seconds)