16. Latent class model with panel data

Bayesian estimation of a discrete mixture of logit models, also called latent class model. The class membership model includes socio-economic variables. The datafile is organized as panel data.

Michel Bierlaire, EPFL Sat Nov 15 2025, 18:12:05

from IPython.core.display_functions import display

import biogeme.biogeme_logging as blog
from biogeme.bayesian_estimation import BayesianResults, get_pandas_estimated_parameters
from biogeme.biogeme import BIOGEME
from biogeme.expressions import (
    Beta,
    DistributedParameter,
    Draws,
    exp,
    log,
)
from biogeme.models import logit

See the data processing script: Panel data preparation for Swissmetro.

from swissmetro_panel import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    INCOME,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16_panel_discrete_socio_eco.py')
Example b16_panel_discrete_socio_eco.py

Parameters to be estimated. One version for each latent_old class.

NUMBER_OF_CLASSES = 2
b_cost = [Beta(f'b_cost_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.

b_time = [Beta(f'b_time_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

It is advised not to use 0 as starting value for the following parameter.

b_time_s = [
    Beta(f'b_time_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
b_time_rnd = [
    DistributedParameter(
        f'b_time_rnd_class{i}',
        b_time[i] + b_time_s[i] * Draws(f'b_time_eps_class{i}', 'NORMAL'),
    )
    for i in range(NUMBER_OF_CLASSES)
]

We do the same for the constants, to address serial correlation.

asc_car = [
    Beta(f'asc_car_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_s = [
    Beta(f'asc_car_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_rnd = [
    DistributedParameter(
        f'asc_car_rnd_class{i}',
        asc_car[i] + asc_car_s[i] * Draws(f'asc_car_eps_class{i}', 'NORMAL'),
    )
    for i in range(NUMBER_OF_CLASSES)
]

asc_train = [
    Beta(f'asc_train_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_s = [
    Beta(f'asc_train_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_rnd = [
    DistributedParameter(
        f'asc_train_rnd_class{i}',
        asc_train[i] + asc_train_s[i] * Draws(f'asc_train_eps_class{i}', 'NORMAL'),
    )
    for i in range(NUMBER_OF_CLASSES)
]

asc_sm = [Beta(f'asc_sm_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
asc_sm_s = [
    Beta(f'asc_sm_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_sm_rnd = [
    DistributedParameter(
        f'asc_sm_rnd_class{i}',
        asc_sm[i] + asc_sm_s[i] * Draws(f'asc_sm_eps_class{i}', 'NORMAL'),
    )
    for i in range(NUMBER_OF_CLASSES)
]

Parameters for the class membership model.

class_cte = Beta('class_cte', 0, None, None, 0)
class_inc = Beta('class_inc', 0, None, None, 0)

In class 0, it is assumed that the time coefficient is zero

b_time_rnd[0] = 0

Utility functions.

v_train_per_class = [
    asc_train_rnd[i] + b_time_rnd[i] * TRAIN_TT_SCALED + b_cost[i] * TRAIN_COST_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v_swissmetro_per_class = [
    asc_sm_rnd[i] + b_time_rnd[i] * SM_TT_SCALED + b_cost[i] * SM_COST_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v_car_per_class = [
    asc_car_rnd[i] + b_time_rnd[i] * CAR_TT_SCALED + b_cost[i] * CAR_CO_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v_per_class = [
    {1: v_train_per_class[i], 2: v_swissmetro_per_class[i], 3: v_car_per_class[i]}
    for i in range(NUMBER_OF_CLASSES)
]

Associate the availability conditions with the alternatives

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

The choice model is a discrete mixture of logit, with availability conditions We calculate the conditional probability for each class.

conditional_probability_per_class = [
    logit(v_per_class[i], av, CHOICE) for i in range(NUMBER_OF_CLASSES)
]

Class membership model.

score_class_0 = class_cte + class_inc * INCOME
probability_class_1 = 1 / (1 + exp(score_class_0))
probability_class_0 = 1 - probability_class_1

Conditional on the random variables, likelihood for the individual.

conditional_choice_probability = (
    probability_class_0 * conditional_probability_per_class[0]
    + probability_class_1 * conditional_probability_per_class[1]
)

We need the log probability per observation

conditional_log_probability = log(conditional_choice_probability)

%%

the_biogeme = BIOGEME(
    database,
    conditional_log_probability,
    warmup=40,
    bayesian_draws=40,
    chains=1,
)
the_biogeme.model_name = 'b16_panel_discrete_socio_eco'
Biogeme parameters read from biogeme.toml.

Estimate the parameters.

try:
    results = BayesianResults.from_netcdf(
        filename=f'saved_results/{the_biogeme.model_name}.nc'
    )
except FileNotFoundError:
    results = the_biogeme.bayesian_estimation()
Loaded NetCDF file size: 10.9 MB
load finished in 324 ms
print(results.short_summary())
/Users/bierlair/python_envs/venv313/lib/python3.13/site-packages/arviz/stats/stats.py:1667: UserWarning: For one or more samples the posterior variance of the log predictive densities exceeds 0.4. This could be indication of WAIC starting to fail.
See http://arxiv.org/abs/1507.04544 for details
  warnings.warn(
/Users/bierlair/python_envs/venv313/lib/python3.13/site-packages/arviz/stats/stats.py:797: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.38 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  warnings.warn(
loo_res finished in 52 ms
loo finished in 52 ms
Sample size                                              6768
Sampler                                                  NUTS
Number of chains                                         1
Number of draws per chain                                40
Total number of draws                                    40
Acceptance rate target                                   0.9
Run time                                                 0:01:06.651876
Posterior predictive log-likelihood (sum of log mean p)  -2154.03
Expected log-likelihood E[log L(Y|θ)]                    -2347.36
Best-draw log-likelihood (posterior upper bound)         -2289.28
WAIC (Widely Applicable Information Criterion)           -2744.28
WAIC Standard Error                                      80.29
Effective number of parameters (p_WAIC)                  590.25
LOO (Leave-One-Out Cross-Validation)                     -2651.28
LOO Standard Error                                       73.43
Effective number of parameters (p_LOO)                   497.24
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
Shape validation failed: input_shape: (1, 40), minimum_shape: (chains=2, draws=4)
Diagnostics computation took 18.3 seconds (cached).
                  Name  Value (mean)  ...  ESS (bulk)  ESS (tail)
0            class_cte     -1.985520  ...    1.993462   17.407238
1            class_inc     -0.265144  ...    2.829976   21.105249
2     asc_train_class0     -6.728808  ...    4.910675   21.105249
3   asc_train_s_class0     -1.427167  ...    1.691975   24.157661
4        b_cost_class0     -8.732684  ...    3.386415   18.886680
5      asc_sm_s_class0      7.462313  ...    3.103692   44.679600
6       asc_car_class0     -3.999915  ...    4.905896   40.190375
7     asc_car_s_class0     -0.819028  ...   17.224894   40.190375
8     asc_train_class1      0.203905  ...    6.125934   44.679600
9   asc_train_s_class1      2.625066  ...    3.883404   40.190375
10       b_time_class1     -8.874934  ...    1.623690   16.681299
11     b_time_s_class1      5.466661  ...    1.944467   22.003474
12       b_cost_class1     -4.827650  ...    2.680831   44.679600
13     asc_sm_s_class1     -1.767727  ...    2.155726   16.681299
14      asc_car_class1      1.067482  ...    2.194947   21.105249
15    asc_car_s_class1      5.277336  ...    1.990461   16.681299

[16 rows x 12 columns]

Total running time of the script: (0 minutes 18.722 seconds)

Gallery generated by Sphinx-Gallery