16. Discrete mixture with panel data
Example of a discrete mixture of logit models, also called latent class model. The class membership model includes socio-economic variables. The datafile is organized as panel data.
Michel Bierlaire, EPFL. Mon Jun 23 2025, 16:29:45
import biogeme.biogeme_logging as blog
from IPython.core.display_functions import display
from biogeme.biogeme import BIOGEME
from biogeme.expressions import (
Beta,
Draws,
ExpressionOrNumeric,
MonteCarlo,
PanelLikelihoodTrajectory,
log,
)
from biogeme.models import logit
from biogeme.results_processing import (
EstimationResults,
get_pandas_estimated_parameters,
)
See the data processing script: Panel data preparation for Swissmetro.
from swissmetro_panel import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
INCOME,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16_panel_discrete_socio_eco.py')
Example b16_panel_discrete_socio_eco.py
Parameters to be estimated. One version for each latent class.
NUMBER_OF_CLASSES = 2
b_cost = [Beta(f'b_cost_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.
b_time = [Beta(f'b_time_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]
It is advised not to use 0 as the starting value for the following parameter.
b_time_s = [
Beta(f'b_time_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
b_time_rnd: list[ExpressionOrNumeric] = [
b_time[i] + b_time_s[i] * Draws(f'b_time_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
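The mechanics of the antithetic normal draws used above ('NORMAL_ANTI') can be sketched outside Biogeme: each standard normal draw is paired with its negation, which reduces simulation variance. A minimal pure-Python illustration, with made-up parameter values (this is not how Biogeme generates draws internally, just the idea):

```python
import random
import statistics

random.seed(1223)

# Antithetic normal draws: generate R/2 standard normals and pair each
# one with its negation, so the draws are symmetric around zero.
R = 10  # total number of draws (must be even)
half = [random.gauss(0.0, 1.0) for _ in range(R // 2)]
draws = half + [-x for x in half]

# A random coefficient b_time + b_time_s * xi, evaluated at each draw.
b_time, b_time_s = -1.0, 0.5  # illustrative values, not estimates
b_time_rnd = [b_time + b_time_s * xi for xi in draws]

# The pairing makes the simulated mean of the coefficient exact.
print(abs(statistics.mean(b_time_rnd) - b_time) < 1e-12)
```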
We do the same for the constants, to address serial correlation.
asc_car = [
Beta(f'asc_car_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_s = [
Beta(f'asc_car_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_rnd = [
asc_car[i] + asc_car_s[i] * Draws(f'asc_car_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
asc_train = [
Beta(f'asc_train_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_s = [
Beta(f'asc_train_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_rnd = [
asc_train[i] + asc_train_s[i] * Draws(f'asc_train_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
asc_sm = [Beta(f'asc_sm_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
asc_sm_s = [
Beta(f'asc_sm_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_sm_rnd = [
asc_sm[i] + asc_sm_s[i] * Draws(f'asc_sm_rnd_class{i}', 'NORMAL_ANTI')
for i in range(NUMBER_OF_CLASSES)
]
Parameters for the class membership model.
class_cte = Beta('class_cte', 0, None, None, 0)
class_inc = Beta('class_inc', 0, None, None, 0)
In class 0, it is assumed that the time coefficient is zero.
b_time_rnd[0] = 0
Utility functions.
v_train_per_class = [
asc_train_rnd[i] + b_time_rnd[i] * TRAIN_TT_SCALED + b_cost[i] * TRAIN_COST_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_swissmetro_per_class = [
asc_sm_rnd[i] + b_time_rnd[i] * SM_TT_SCALED + b_cost[i] * SM_COST_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v_car_per_class = [
asc_car_rnd[i] + b_time_rnd[i] * CAR_TT_SCALED + b_cost[i] * CAR_CO_SCALED
for i in range(NUMBER_OF_CLASSES)
]
v = [
{1: v_train_per_class[i], 2: v_swissmetro_per_class[i], 3: v_car_per_class[i]}
for i in range(NUMBER_OF_CLASSES)
]
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
The choice model is a discrete mixture of logit with availability conditions. We calculate the conditional choice probability for each class.
choice_probability_per_class = [
PanelLikelihoodTrajectory(logit(v[i], av, CHOICE)) for i in range(NUMBER_OF_CLASSES)
]
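Conditional on a class and on the draws, PanelLikelihoodTrajectory combines the logit probabilities of all choices made by the same individual into one joint likelihood: a product over that individual's observations. A hedged sketch with hypothetical probabilities (the 0.6, 0.7, 0.5 values are invented for illustration):

```python
import math

# Hypothetical logit choice probabilities for one individual's T = 3
# observed choices, conditional on one class and one draw.
prob_per_observation = [0.6, 0.7, 0.5]

# The joint conditional likelihood of the individual's panel trajectory
# is the product of the per-observation choice probabilities.
trajectory_likelihood = math.prod(prob_per_observation)
print(round(trajectory_likelihood, 3))  # 0.21
```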
Class membership model.
score_class_0 = class_cte + class_inc * INCOME
prob_class0 = logit({0: score_class_0, 1: 0}, None, 0)
prob_class1 = logit({0: score_class_0, 1: 0}, None, 1)
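The class membership model is a binary logit over the two classes: class 0 has score class_cte + class_inc * INCOME and class 1 is the reference with score 0, so the two probabilities sum to one. A sketch with illustrative (not estimated) parameter values:

```python
import math

# Illustrative values of the membership parameters, not estimates.
class_cte, class_inc = -1.3, -0.2
income = 2.0  # income variable for one individual

# Binary logit: class 0 has score s, the reference class 1 has score 0.
score_class_0 = class_cte + class_inc * income
prob_class0 = math.exp(score_class_0) / (math.exp(score_class_0) + 1.0)
prob_class1 = 1.0 - prob_class0

# The two membership probabilities sum to one by construction.
print(round(prob_class0 + prob_class1, 10))  # 1.0
```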
Conditional on the random variables, this is the likelihood for one individual.
conditional_choice_probability = (
prob_class0 * choice_probability_per_class[0]
+ prob_class1 * choice_probability_per_class[1]
)
We integrate over the random variables using Monte-Carlo simulation.
log_probability = log(MonteCarlo(conditional_choice_probability))
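The expression log(MonteCarlo(...)) averages the class-weighted conditional likelihoods over the draws and only then takes the logarithm: the expectation sits inside the log. A pure-Python sketch with invented conditional likelihoods and membership probability (none of these numbers come from the model):

```python
import math
import random

random.seed(0)

# Hypothetical conditional panel likelihoods for one individual,
# evaluated at R draws, for each of the two classes.
R = 4
lik_class0 = [random.uniform(0.1, 0.3) for _ in range(R)]
lik_class1 = [random.uniform(0.2, 0.5) for _ in range(R)]
prob_class0 = 0.15  # illustrative membership probability
prob_class1 = 1.0 - prob_class0

# MonteCarlo: average the class-weighted likelihood over the draws,
# then take the log; the average is inside the logarithm.
simulated = sum(
    prob_class0 * l0 + prob_class1 * l1
    for l0, l1 in zip(lik_class0, lik_class1)
) / R
log_probability = math.log(simulated)
print(log_probability < 0.0)  # a probability below 1 has a negative log
```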
The model is complex, and there are numerical issues when calculating the second derivatives. Therefore, we instruct Biogeme not to evaluate the second derivatives. As a consequence, the statistics reported after estimation are based on the BHHH matrix instead of the Cramér-Rao bound.
the_biogeme = BIOGEME(
database,
log_probability,
number_of_draws=5_000,
seed=1223,
calculating_second_derivatives='never',
)
the_biogeme.model_name = 'b16_panel_discrete_socio_eco'
Biogeme parameters read from biogeme.toml.
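The BHHH approximation mentioned above estimates the information matrix from first derivatives only: it sums the outer products of the per-individual score (gradient) vectors, so no second derivatives are needed. A minimal sketch with hypothetical scores for 3 individuals and 2 parameters:

```python
# Hypothetical per-individual score vectors (gradients of the
# log likelihood contributions), 3 individuals, 2 parameters.
scores = [[0.5, -0.2], [-0.1, 0.3], [0.4, 0.1]]

# BHHH matrix: sum over individuals of the outer product of the scores.
k = len(scores[0])
bhhh = [[sum(g[i] * g[j] for g in scores) for j in range(k)] for i in range(k)]

# By construction the matrix is symmetric with a positive diagonal.
print(bhhh[0][1] == bhhh[1][0])
```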
Estimate the parameters.
try:
results = EstimationResults.from_yaml_file(
filename=f'saved_results/{the_biogeme.model_name}.yaml'
)
except FileNotFoundError:
results = the_biogeme.estimate()
Flattening database [(6768, 38)].
Database flattened [(752, 362)]
*** Initial values of the parameters are obtained from the file __b16_panel_discrete_socio_eco.iter
Cannot read file __b16_panel_discrete_socio_eco.iter. Statement is ignored.
Starting values for the algorithm: {}
As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the algorithmic parameters, change the algorithm from "automatic" to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: BFGS with trust region for simple bounds
Iter. Function Relgrad Radius Rho
0 4e+03 0.035 1 0.45 +
1 3.9e+03 0.034 1 0.3 +
2 3.7e+03 0.036 1 0.39 +
3 3.6e+03 0.019 1 0.4 +
4 3.6e+03 0.03 1 0.48 +
5 3.6e+03 0.011 1 0.37 +
6 3.6e+03 0.025 1 0.11 +
7 3.5e+03 0.0068 1 0.59 +
8 3.5e+03 0.0068 0.5 -1.8 -
9 3.5e+03 0.0068 0.25 -0.71 -
10 3.5e+03 0.0068 0.12 -0.22 -
11 3.5e+03 0.0038 0.12 0.43 +
12 3.5e+03 0.0068 0.12 0.82 +
13 3.5e+03 0.0081 0.12 0.68 +
14 3.5e+03 0.0029 1.2 0.95 ++
15 3.5e+03 0.0029 0.62 -1.6 -
16 3.5e+03 0.0029 0.31 -0.15 -
17 3.5e+03 0.0033 0.31 0.51 +
18 3.5e+03 0.0058 0.31 0.28 +
19 3.5e+03 0.0017 0.31 0.25 +
20 3.5e+03 0.0017 0.16 -0.84 -
21 3.5e+03 0.0034 0.16 0.2 +
22 3.5e+03 0.0015 0.16 0.8 +
23 3.5e+03 0.0011 0.16 0.38 +
24 3.5e+03 0.0013 1.6 1 ++
25 3.5e+03 0.0013 0.78 -2.6 -
26 3.5e+03 0.0013 0.39 -2.1 -
27 3.5e+03 0.0013 0.2 -0.7 -
28 3.5e+03 0.0013 0.098 -0.28 -
29 3.5e+03 0.0018 0.098 0.23 +
30 3.5e+03 0.001 0.098 0.84 +
31 3.5e+03 0.0011 0.098 0.33 +
32 3.5e+03 0.00047 0.98 0.97 ++
33 3.5e+03 0.0033 0.98 0.72 +
34 3.5e+03 0.0033 0.49 -0.21 -
35 3.5e+03 0.0018 0.49 0.15 +
36 3.5e+03 0.0018 0.24 -0.82 -
37 3.5e+03 0.0016 0.24 0.67 +
38 3.5e+03 0.0016 0.12 -1.3 -
39 3.5e+03 0.0012 0.12 0.42 +
40 3.5e+03 0.0012 0.061 -0.79 -
41 3.5e+03 0.0022 0.061 0.31 +
42 3.5e+03 0.0014 0.061 0.31 +
43 3.5e+03 0.00047 0.061 0.62 +
44 3.5e+03 0.0004 0.061 0.81 +
45 3.5e+03 0.0004 0.031 -0.12 -
46 3.5e+03 0.0004 0.015 -0.3 -
47 3.5e+03 0.00053 0.015 0.19 +
48 3.5e+03 0.00014 0.15 0.93 ++
49 3.5e+03 0.00024 0.15 0.75 +
50 3.5e+03 0.00024 0.076 -0.11 -
51 3.5e+03 0.00017 0.076 0.3 +
52 3.5e+03 0.00034 0.076 0.21 +
53 3.5e+03 0.00014 0.076 0.26 +
54 3.5e+03 0.00034 0.076 0.25 +
55 3.5e+03 0.00034 0.076 0.22 +
56 3.5e+03 0.00034 0.038 -0.42 -
57 3.5e+03 0.00033 0.038 0.44 +
58 3.5e+03 0.00019 0.038 0.63 +
59 3.5e+03 4.7e-05 0.38 0.92 ++
60 3.5e+03 5.9e-05 3.8 1.4 ++
61 3.5e+03 9.6e-05 3.8 0.21 +
62 3.5e+03 7.6e-05 3.8 0.8 +
63 3.5e+03 4e-05 3.8 0.75 +
64 3.5e+03 4.7e-05 38 1.7 ++
65 3.5e+03 4.7e-05 0.3 -0.1 -
66 3.5e+03 7.3e-05 3 1.3 ++
67 3.5e+03 7.3e-05 0.33 -7.2 -
68 3.5e+03 7.3e-05 0.16 -0.74 -
69 3.5e+03 0.00016 0.16 0.67 +
70 3.5e+03 7.4e-05 0.16 0.58 +
71 3.5e+03 7.2e-05 0.16 0.38 +
72 3.5e+03 5.2e-05 0.16 0.79 +
73 3.5e+03 5.2e-05 0.033 -0.41 -
74 3.5e+03 5.2e-05 0.016 -0.31 -
75 3.5e+03 0.00011 0.016 0.6 +
76 3.5e+03 3.6e-05 0.016 0.77 +
77 3.5e+03 4.7e-06 0.016 0.64 +
Optimization algorithm has converged.
Relative gradient: 4.749699240944466e-06
Cause of termination: Relative gradient = 4.7e-06 <= 6.1e-06
Number of function evaluations: 189
Number of gradient evaluations: 111
Number of hessian evaluations: 0
Algorithm: BFGS with trust region for simple bound constraints
Number of iterations: 78
Proportion of Hessian calculation: 0/55 = 0.0%
Optimization time: 0:02:38.021463
Calculate BHHH
File b16_panel_discrete_socio_eco.html has been generated.
File b16_panel_discrete_socio_eco.yaml has been generated.
print(results.short_summary())
Results for model b16_panel_discrete_socio_eco
Nbr of parameters: 16
Sample size: 752
Observations: 6768
Excluded data: 0
Final log likelihood: -3524.399
Akaike Information Criterion: 7080.798
Bayesian Information Criterion: 7154.762
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
Name Value BHHH std err. BHHH t-stat. BHHH p-value
0 class_cte -1.276124 0.483823 -2.637584 8.349885e-03
1 class_inc -0.201190 0.179175 -1.122870 2.614927e-01
2 asc_train_class0 -0.956264 0.677309 -1.411859 1.579914e-01
3 asc_train_s_class0 3.039756 0.646170 4.704268 2.547778e-06
4 b_cost_class0 -1.174641 0.549715 -2.136820 3.261266e-02
5 asc_sm_s_class0 0.078896 6.116803 0.012898 9.897090e-01
6 asc_car_class0 -4.871382 1.645420 -2.960571 3.070692e-03
7 asc_car_s_class0 6.157855 1.884002 3.268497 1.081203e-03
8 asc_train_class1 -0.351970 0.287517 -1.224170 2.208879e-01
9 asc_train_s_class1 1.754739 0.458400 3.827965 1.292071e-04
10 b_time_class1 -6.952903 0.365772 -19.008831 0.000000e+00
11 b_time_s_class1 3.277018 0.369304 8.873506 0.000000e+00
12 b_cost_class1 -4.747042 0.252347 -18.811558 0.000000e+00
13 asc_sm_s_class1 1.707763 0.332193 5.140873 2.734650e-07
14 asc_car_class1 1.013071 0.216166 4.686542 2.778594e-06
15 asc_car_s_class1 2.874308 0.261210 11.003840 0.000000e+00
Total running time of the script: (2 minutes 51.277 seconds)