Choice model with latent variable: sequential estimation

Mixture of logit. Measurement equation for the indicators. Sequential estimation.

author:

Michel Bierlaire, EPFL

date:

Fri Apr 14 09:47:53 2023

import sys
import biogeme.biogeme_logging as blog
import biogeme.exceptions as excep
import biogeme.biogeme as bio
import biogeme.distributions as dist
import biogeme.results as res
from biogeme import models
from biogeme.expressions import (
    Beta,
    RandomVariable,
    exp,
    log,
    Integrate,
)

from read_or_estimate import read_or_estimate

from biogeme.data.optima import (
    read_data,
    male,
    age,
    haveChildren,
    highEducation,
    childCenter,
    childSuburb,
    SocioProfCat,
    WaitingTimePT,
    Choice,
    TimePT_scaled,
    TimeCar_scaled,
    MarginalCostPT_scaled,
    CostCarCHF_scaled,
    distance_km_scaled,
    PurpHWH,
    PurpOther,
)

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example m02_sequential_estimation.py')
Example m02_sequential_estimation.py

Read the estimates from the structural equation estimation.

MODELNAME = 'm01_latent_variable'
try:
    struct_results = res.bioResults(pickle_file=f'saved_results/{MODELNAME}.pickle')
except excep.BiogemeError:
    print(
        f'Run first the script {MODELNAME}.py in order to generate the '
        f'file {MODELNAME}.pickle, and move it to the directory saved_results'
    )
    sys.exit()
struct_betas = struct_results.get_beta_values()

Coefficients

coef_intercept = struct_betas['coef_intercept']
coef_age_30_less = struct_betas['coef_age_30_less']
coef_male = struct_betas['coef_male']
coef_haveChildren = struct_betas['coef_haveChildren']
coef_highEducation = struct_betas['coef_highEducation']
coef_artisans = struct_betas['coef_artisans']
coef_employees = struct_betas['coef_employees']
coef_child_center = struct_betas['coef_child_center']
coef_child_suburb = struct_betas['coef_child_suburb']

Latent variable: structural equation

Define a random parameter, normally distributed, designed to be used for numerical integration

omega = RandomVariable('omega')
density = dist.normalpdf(omega)
sigma_s = Beta('sigma_s', 1, None, None, 0)
ACTIVELIFE = (
    coef_intercept
    + coef_child_center * childCenter
    + coef_child_suburb * childSuburb
    + coef_highEducation * highEducation
    + coef_artisans * (SocioProfCat == 5)
    + coef_employees * (SocioProfCat == 6)
    + coef_age_30_less * (age <= 30)
    + coef_male * male
    + coef_haveChildren * haveChildren
    + sigma_s * omega
)

Choice model

ASC_CAR = Beta('ASC_CAR', 0.94, None, None, 0)
ASC_PT = Beta('ASC_PT', 0, None, None, 1)
ASC_SM = Beta('ASC_SM', 0.35, None, None, 0)
BETA_COST_HWH = Beta('BETA_COST_HWH', -2.3, None, None, 0)
BETA_COST_OTHER = Beta('BETA_COST_OTHER', -1.9, None, None, 0)
BETA_DIST = Beta('BETA_DIST', -1.3, None, None, 0)
BETA_TIME_CAR_REF = Beta('BETA_TIME_CAR_REF', -6.1, None, 0, 0)
BETA_TIME_PT_REF = Beta('BETA_TIME_PT_REF', 0, None, 0, 0)
BETA_WAITING_TIME = Beta('BETA_WAITING_TIME', -0.075, None, None, 0)

The coefficient of the latent variable should be initialized to something different from zero. If not, the algorithm may be trapped in a local optimum, and never change the value.

BETA_TIME_PT_AL = Beta('BETA_TIME_PT_AL', 1.5, None, None, 0)
BETA_TIME_PT = BETA_TIME_PT_REF * exp(BETA_TIME_PT_AL * ACTIVELIFE)
BETA_TIME_CAR_AL = Beta('BETA_TIME_CAR_AL', -0.11, None, None, 0)
BETA_TIME_CAR = BETA_TIME_CAR_REF * exp(BETA_TIME_CAR_AL * ACTIVELIFE)

Definition of utility functions:

V0 = (
    ASC_PT
    + BETA_TIME_PT * TimePT_scaled
    + BETA_WAITING_TIME * WaitingTimePT
    + BETA_COST_HWH * MarginalCostPT_scaled * PurpHWH
    + BETA_COST_OTHER * MarginalCostPT_scaled * PurpOther
)

V1 = (
    ASC_CAR
    + BETA_TIME_CAR * TimeCar_scaled
    + BETA_COST_HWH * CostCarCHF_scaled * PurpHWH
    + BETA_COST_OTHER * CostCarCHF_scaled * PurpOther
)

V2 = ASC_SM + BETA_DIST * distance_km_scaled

Associate utility functions with the numbering of alternatives

V = {0: V0, 1: V1, 2: V2}

Conditional on omega, we have a logit model (called the kernel)

condprob = models.logit(V, None, Choice)

We integrate over omega using numerical integration

loglike = log(Integrate(condprob * density, 'omega'))

Read the data

database = read_data()

Create the Biogeme object

the_biogeme = bio.BIOGEME(database, loglike)
the_biogeme.modelName = 'm02_sequential_estimation'
the_biogeme.maxiter = 1000
Biogeme parameters read from biogeme.toml.

If estimation results are saved on file, we read them to speed up the process. If not, we estimate the parameters.

results = read_or_estimate(the_biogeme=the_biogeme, directory='saved_results')
print(results.short_summary())
Results for model m02_sequential_estimation
Nbr of parameters:              11
Sample size:                    1906
Excluded data:                  0
Final log likelihood:           -1205.888
Akaike Information Criterion:   2433.776
Bayesian Information Criterion: 2494.857
print(f'Final log likelihood: {results.data.logLike:.3f}')
print(f'Output file: {results.data.htmlFileName}')
Final log likelihood: -1205.888
Output file: m02_sequential_estimation.html
results.get_estimated_parameters()
Value Rob. Std err Rob. t-test Rob. p-value
ASC_CAR 0.930591 0.126404 7.362020 1.811884e-13
ASC_SM 0.338507 0.158255 2.139000 3.243567e-02
BETA_COST_HWH -2.198955 0.190777 -11.526323 0.000000e+00
BETA_COST_OTHER -1.871430 0.183922 -10.175115 0.000000e+00
BETA_DIST -1.335395 0.042709 -31.267240 0.000000e+00
BETA_TIME_CAR_AL -0.206488 0.213538 -0.966986 3.335508e-01
BETA_TIME_CAR_REF -5.884133 1.034545 -5.687650 1.287992e-08
BETA_TIME_PT_AL 2.021286 3.126650 0.646470 5.179749e-01
BETA_TIME_PT_REF -0.000611 0.002995 -0.203955 8.383888e-01
BETA_WAITING_TIME -0.075037 0.008487 -8.841859 0.000000e+00
sigma_s 3.674751 3.635663 1.010751 3.121354e-01


Total running time of the script: (0 minutes 0.275 seconds)

Gallery generated by Sphinx-Gallery