Discrete mixture with panel dataΒΆ

Example of a discrete mixture of logit models, also called latent_old class model. The class membership model includes socio-economic variables. The datafile is organized as panel data.

Michel Bierlaire, EPFL Mon Jun 23 2025, 16:29:45

from IPython.core.display_functions import display

See the data processing script: Panel data preparation for Swissmetro.

from swissmetro_panel import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    INCOME,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

import biogeme.biogeme_logging as blog
from biogeme.biogeme import BIOGEME
from biogeme.expressions import (
    Beta,
    Draws,
    ExpressionOrNumeric,
    MonteCarlo,
    PanelLikelihoodTrajectory,
    log,
)
from biogeme.models import logit
from biogeme.results_processing import get_pandas_estimated_parameters

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b16panel_discrete_socio_eco.py')
Example b16panel_discrete_socio_eco.py

Parameters to be estimated. One version for each latent_old class.

NUMBER_OF_CLASSES = 2
b_cost = [Beta(f'b_cost_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

Define a random parameter, normally distributed across individuals, designed to be used for Monte-Carlo simulation.

b_time = [Beta(f'b_time_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)]

It is advised not to use 0 as starting value for the following parameter.

b_time_s = [
    Beta(f'b_time_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
b_time_rnd: list[ExpressionOrNumeric] = [
    b_time[i] + b_time_s[i] * Draws(f'b_time_rnd_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

We do the same for the constants, to address serial correlation.

asc_car = [
    Beta(f'asc_car_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_s = [
    Beta(f'asc_car_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_car_rnd = [
    asc_car[i] + asc_car_s[i] * Draws(f'asc_car_rnd_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

asc_train = [
    Beta(f'asc_train_class{i}', 0, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_s = [
    Beta(f'asc_train_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_train_rnd = [
    asc_train[i] + asc_train_s[i] * Draws(f'asc_train_rnd_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

asc_sm = [Beta(f'asc_sm_class{i}', 0, None, None, 1) for i in range(NUMBER_OF_CLASSES)]
asc_sm_s = [
    Beta(f'asc_sm_s_class{i}', 1, None, None, 0) for i in range(NUMBER_OF_CLASSES)
]
asc_sm_rnd = [
    asc_sm[i] + asc_sm_s[i] * Draws(f'asc_sm_rnd_class{i}', 'NORMAL_ANTI')
    for i in range(NUMBER_OF_CLASSES)
]

Parameters for the class membership model.

class_cte = Beta('class_cte', 0, None, None, 0)
class_inc = Beta('class_inc', 0, None, None, 0)

In class 0, it is assumed that the time coefficient is zero

b_time_rnd[0] = 0

Utility functions.

v_train_per_class = [
    asc_train_rnd[i] + b_time_rnd[i] * TRAIN_TT_SCALED + b_cost[i] * TRAIN_COST_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v_swissmetro_per_class = [
    asc_sm_rnd[i] + b_time_rnd[i] * SM_TT_SCALED + b_cost[i] * SM_COST_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v_car_per_class = [
    asc_car_rnd[i] + b_time_rnd[i] * CAR_TT_SCALED + b_cost[i] * CAR_CO_SCALED
    for i in range(NUMBER_OF_CLASSES)
]
v = [
    {1: v_train_per_class[i], 2: v_swissmetro_per_class[i], 3: v_car_per_class[i]}
    for i in range(NUMBER_OF_CLASSES)
]

Associate the availability conditions with the alternatives

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

The choice model is a discrete mixture of logit, with availability conditions We calculate the conditional probability for each class.

choice_probability_per_class = [
    PanelLikelihoodTrajectory(logit(v[i], av, CHOICE)) for i in range(NUMBER_OF_CLASSES)
]

Class membership model.

score_class_0 = class_cte + class_inc * INCOME
prob_class0 = logit({0: score_class_0, 1: 0}, None, 0)
prob_class1 = logit({0: score_class_0, 1: 0}, None, 1)

Conditional on the random variables, likelihood for the individual.

conditional_choice_probability = (
    prob_class0 * choice_probability_per_class[0]
    + prob_class1 * choice_probability_per_class[1]
)

We integrate over the random variables using Monte-Carlo

log_probability = log(MonteCarlo(conditional_choice_probability))

The model is complex, and there are numerical issues when calculating the second derivatives. Therefore, we instruct Biogeme not to evaluate the second derivatives. As a consequence, the statistics reported after estimation are based on the BHHH matrix instead of the Rao-Cramer bound.

the_biogeme = BIOGEME(
    database,
    log_probability,
    number_of_draws=10_000,
    seed=1223,
    calculating_second_derivatives='never',
)
the_biogeme.model_name = 'b16panel_discrete_socio_eco'
Biogeme parameters read from biogeme.toml.
Flattening database [(6768, 38)].
Database flattened [(752, 362)]
Flattening database [(6768, 38)].
Database flattened [(752, 362)]
Flattening database [(6768, 38)].
Database flattened [(752, 362)]
Flattening database [(6768, 38)].
Database flattened [(752, 362)]

Estimate the parameters.

results = the_biogeme.estimate()
*** Initial values of the parameters are obtained from the file __b16panel_discrete_socio_eco.iter
Parameter values restored from __b16panel_discrete_socio_eco.iter
Starting values for the algorithm: {'class_cte': -1.1838573632904563, 'class_inc': -0.2168566401672651, 'asc_train_class0': -1.0070579844765999, 'asc_train_s_class0': 2.9246102810862666, 'b_cost_class0': -1.245143131711365, 'asc_sm_s_class0': -0.5275443379396421, 'asc_car_class0': -4.8274919022341685, 'asc_car_s_class0': 5.957644303222493, 'asc_train_class1': -0.2685221222610353, 'asc_train_s_class1': 1.5792373811714624, 'b_time_class1': -7.051757160900264, 'b_time_s_class1': 3.3013552142580487, 'b_cost_class1': -4.793999737865342, 'asc_sm_s_class1': 1.942567308396997, 'asc_car_class1': 1.0720186428091785, 'asc_car_s_class1': 2.729276132405476}
As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the parameters, change the algorithm from "automatic" to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: BFGS with trust region for simple bounds
Optimization algorithm has converged.
Relative gradient: 5.147837856960197e-06
Cause of termination: Relative gradient = 5.1e-06 <= 6.1e-06
Number of function evaluations: 1
Number of gradient evaluations: 1
Number of hessian evaluations: 0
Algorithm: BFGS with trust region for simple bound constraints
Number of iterations: 0
Optimization time: 0:00:22.909314
Calculate BHHH
File b16panel_discrete_socio_eco~00.html has been generated.
File b16panel_discrete_socio_eco~00.yaml has been generated.
print(results.short_summary())
Results for model b16panel_discrete_socio_eco
Nbr of parameters:              16
Sample size:                    752
Observations:                   6768
Excluded data:                  0
Final log likelihood:           -3525.907
Akaike Information Criterion:   7083.813
Bayesian Information Criterion: 7157.777
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
                  Name     Value  BHHH std err.  BHHH t-stat.  BHHH p-value
0            class_cte -1.183857       0.468198     -2.528537  1.145389e-02
1            class_inc -0.216857       0.174246     -1.244541  2.133006e-01
2     asc_train_class0 -1.007058       0.638902     -1.576231  1.149725e-01
3   asc_train_s_class0  2.924610       1.040826      2.809894  4.955783e-03
4        b_cost_class0 -1.245143       0.549631     -2.265415  2.348721e-02
5      asc_sm_s_class0 -0.527544       5.292939     -0.099669  9.206068e-01
6       asc_car_class0 -4.827492       1.697726     -2.843505  4.462035e-03
7     asc_car_s_class0  5.957644       1.800618      3.308666  9.374171e-04
8     asc_train_class1 -0.268522       0.287712     -0.933302  3.506638e-01
9   asc_train_s_class1  1.579237       0.505302      3.125331  1.776050e-03
10       b_time_class1 -7.051757       0.380884    -18.514210  0.000000e+00
11     b_time_s_class1  3.301355       0.402533      8.201461  2.220446e-16
12       b_cost_class1 -4.794000       0.254748    -18.818578  0.000000e+00
13     asc_sm_s_class1  1.942567       0.315657      6.154051  7.552836e-10
14      asc_car_class1  1.072019       0.218995      4.895179  9.821623e-07
15    asc_car_s_class1  2.729276       0.267099     10.218233  0.000000e+00

Total running time of the script: (1 minutes 20.455 seconds)

Gallery generated by Sphinx-Gallery