Latent class model

Example of a discrete mixture of logit models (also called a latent class model).

author: Michel Bierlaire, EPFL

date: Sun Apr 9 17:57:07 2023

import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta, log

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
)

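Parameters to be estimated. The last argument of Beta is the status: 0 means the parameter is estimated, 1 means it is fixed at its starting value. ASC_SM is therefore fixed at 0 for normalization.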

ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
ASC_SM = Beta('ASC_SM', 0, None, None, 1)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)
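Only utility differences matter in a logit model, which is why one constant (ASC_SM, whose last Beta argument is 1) is held at 0 rather than estimated. The sketch below checks this invariance numerically in plain Python. The utility values are hypothetical, and the helper `logit_probabilities` is illustrative only, not a Biogeme function.

```python
import math


def logit_probabilities(utilities):
    """Return logit probabilities for a dict {alternative: utility}."""
    # Subtracting the maximum utility improves numerical stability;
    # it is itself a common shift, so it does not change the result.
    v_max = max(utilities.values())
    expv = {alt: math.exp(u - v_max) for alt, u in utilities.items()}
    denom = sum(expv.values())
    return {alt: e / denom for alt, e in expv.items()}


# Hypothetical utilities for train (1), Swissmetro (2) and car (3).
v = {1: -0.4, 2: 0.0, 3: 0.1}
# Adding the same constant to every utility leaves the probabilities
# unchanged, so one alternative specific constant must be fixed.
v_shifted = {alt: u + 5.0 for alt, u in v.items()}

p = logit_probabilities(v)
p_shifted = logit_probabilities(v_shifted)
for alt in v:
    print(alt, round(p[alt], 6), round(p_shifted[alt], 6))
```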

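Class membership probability. PROB_CLASS1 is estimated within the bounds [0, 1], and class 2 receives the complementary probability.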

PROB_CLASS1 = Beta('PROB_CLASS1', 0.5, 0, 1, 0)
PROB_CLASS2 = 1 - PROB_CLASS1

Definition of the utility functions for latent class 1, where the time coefficient is zero.

V11 = ASC_TRAIN + B_COST * TRAIN_COST_SCALED
V12 = ASC_SM + B_COST * SM_COST_SCALED
V13 = ASC_CAR + B_COST * CAR_CO_SCALED

Associate the class 1 utility functions with the alternative numbers used in CHOICE (1: train, 2: Swissmetro, 3: car).

V1 = {1: V11, 2: V12, 3: V13}
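As a worked illustration, the class 1 utilities can be evaluated by hand for one respondent, using the point estimates from the results table at the end of this example (ASC_SM fixed at 0). The scaled cost values are hypothetical, invented for illustration; travel time does not appear because its coefficient is zero in this class.

```python
# Point estimates from the estimation results (ASC_SM is fixed at 0).
ASC_CAR = 0.124605
ASC_TRAIN = -0.397586
ASC_SM = 0.0
B_COST = -1.264065

# Hypothetical scaled costs for one respondent (not taken from the data).
train_cost, sm_cost, car_cost = 0.48, 0.52, 0.65

# Class 1 utilities: cost only, keyed by the alternative numbers in CHOICE.
V1 = {
    1: ASC_TRAIN + B_COST * train_cost,
    2: ASC_SM + B_COST * sm_cost,
    3: ASC_CAR + B_COST * car_cost,
}
for alt, value in V1.items():
    print(alt, round(value, 4))
```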

Definition of the utility functions for latent class 2, where the time coefficient is estimated.

V21 = ASC_TRAIN + B_TIME * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
V22 = ASC_SM + B_TIME * SM_TT_SCALED + B_COST * SM_COST_SCALED
V23 = ASC_CAR + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED

Associate the class 2 utility functions with the same alternative numbers.

V2 = {1: V21, 2: V22, 3: V23}
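The two classes share the constants and the cost coefficient, so for each alternative the class 2 utility differs from the class 1 utility only by the B_TIME term. A minimal check in plain Python, using the estimates from the results table and hypothetical scaled times and costs:

```python
ASC_CAR = 0.124605
ASC_TRAIN = -0.397586
ASC_SM = 0.0
B_TIME = -2.797932
B_COST = -1.264065

# Hypothetical scaled (time, cost) per alternative for one respondent.
attributes = {1: (1.12, 0.48), 2: (0.63, 0.52), 3: (1.17, 0.65)}
asc = {1: ASC_TRAIN, 2: ASC_SM, 3: ASC_CAR}

# Class 1 ignores travel time; class 2 includes it.
V1 = {alt: asc[alt] + B_COST * cost for alt, (tt, cost) in attributes.items()}
V2 = {
    alt: asc[alt] + B_TIME * tt + B_COST * cost
    for alt, (tt, cost) in attributes.items()
}

for alt, (tt, cost) in attributes.items():
    # The difference should equal the time term of class 2.
    print(alt, round(V2[alt] - V1[alt], 4), round(B_TIME * tt, 4))
```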

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
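In models.logit, an alternative whose availability is 0 is excluded from the denominator of the logit formula. The plain-Python sketch below mimics that behavior; the helper `logit_with_availability` and the utility values are hypothetical, for illustration only.

```python
import math


def logit_with_availability(utilities, availability, chosen):
    """Logit probability of `chosen`, ignoring unavailable alternatives."""
    if not availability[chosen]:
        return 0.0
    available = [alt for alt in utilities if availability[alt]]
    v_max = max(utilities[alt] for alt in available)
    denom = sum(math.exp(utilities[alt] - v_max) for alt in available)
    return math.exp(utilities[chosen] - v_max) / denom


V = {1: -1.0, 2: -0.66, 3: -0.70}  # hypothetical utilities
av_all = {1: 1, 2: 1, 3: 1}
av_no_car = {1: 1, 2: 1, 3: 0}  # car not available for this respondent

p_all = [logit_with_availability(V, av_all, alt) for alt in V]
p_no_car = [logit_with_availability(V, av_no_car, alt) for alt in V]
print([round(p, 4) for p in p_all])
print([round(p, 4) for p in p_no_car])
```

Removing an alternative redistributes its probability over the remaining available ones, which still sum to one.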

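The choice model is a discrete mixture of logit models with availability conditions: the probability of the chosen alternative is the class-membership-weighted sum of the two class-specific logit probabilities, and its logarithm is the contribution of the observation to the log likelihood.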

prob1 = models.logit(V1, av, CHOICE)
prob2 = models.logit(V2, av, CHOICE)
prob = PROB_CLASS1 * prob1 + PROB_CLASS2 * prob2
logprob = log(prob)
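For a single observation, this expression is the log of a weighted average of the two class-specific logit probabilities of the chosen alternative. A plain-Python sketch with the estimated parameters, hypothetical attributes, and all alternatives available; the `logit` helper here is illustrative and is not Biogeme's models.logit.

```python
import math

ASC = {1: -0.397586, 2: 0.0, 3: 0.124605}  # ASC_TRAIN, ASC_SM (fixed), ASC_CAR
B_TIME = -2.797932
B_COST = -1.264065
PROB_CLASS1 = 0.250792
PROB_CLASS2 = 1 - PROB_CLASS1

# Hypothetical scaled (time, cost) for one respondent; all alternatives available.
attributes = {1: (1.12, 0.48), 2: (0.63, 0.52), 3: (1.17, 0.65)}
chosen = 2  # Swissmetro


def logit(utilities, alt):
    """Logit probability of `alt` when every alternative is available."""
    denom = sum(math.exp(v) for v in utilities.values())
    return math.exp(utilities[alt]) / denom


V1 = {a: ASC[a] + B_COST * c for a, (t, c) in attributes.items()}
V2 = {a: ASC[a] + B_TIME * t + B_COST * c for a, (t, c) in attributes.items()}

prob1 = logit(V1, chosen)
prob2 = logit(V2, chosen)
# Mixture: weight each class-specific probability by its class probability.
prob = PROB_CLASS1 * prob1 + PROB_CLASS2 * prob2
logprob = math.log(prob)
print(round(prob1, 4), round(prob2, 4), round(prob, 4), round(logprob, 4))
```

Because the weights are nonnegative and sum to one, the mixture probability always lies between the two class-specific probabilities.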

Create the Biogeme object.

the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b07discrete_mixture'

Estimate the parameters.

results = the_biogeme.estimate()
print(results.short_summary())
Results for model b07discrete_mixture
Nbr of parameters:              5
Sample size:                    6768
Excluded data:                  3960
Final log likelihood:           -5208.498
Akaike Information Criterion:   10427
Bayesian Information Criterion: 10461.1
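The information criteria in this summary agree with the usual definitions AIC = 2K - 2LL and BIC = K ln(N) - 2LL, where K is the number of estimated parameters, N the sample size and LL the final log likelihood. Recomputing them from the reported figures reproduces the summary up to rounding:

```python
import math

n_parameters = 5
sample_size = 6768
final_log_likelihood = -5208.498

aic = 2 * n_parameters - 2 * final_log_likelihood
bic = n_parameters * math.log(sample_size) - 2 * final_log_likelihood
print(round(aic, 1), round(bic, 1))
```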
pandas_results = results.getEstimatedParameters()
pandas_results
                Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC_CAR      0.124605      0.050735     2.455992  1.404963e-02
ASC_TRAIN   -0.397586      0.062033    -6.409280  1.462082e-10
B_COST      -1.264065      0.085606   -14.766051  0.000000e+00
B_TIME      -2.797932      0.171663   -16.298946  0.000000e+00
PROB_CLASS1  0.250792      0.021741    11.535649  0.000000e+00
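The robust t-test is the estimate divided by its robust standard error, and the reported p-values are consistent with a two-sided test against the standard normal distribution, p = 2(1 - Phi(|t|)). The sketch below recomputes both columns for the two constants from the rounded table values, using math.erfc, since 2(1 - Phi(x)) = erfc(x / sqrt(2)).

```python
import math

# (value, robust standard error) copied from the table above
estimates = {
    "ASC_CAR": (0.124605, 0.050735),
    "ASC_TRAIN": (-0.397586, 0.062033),
}

results = {}
for name, (value, std_err) in estimates.items():
    t_stat = value / std_err
    # Two-sided normal p-value: 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    results[name] = (t_stat, p_value)
    print(f"{name:10s} t = {t_stat:10.6f}  p = {p_value:.6e}")
```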


