Measurement equations: discrete indicators

Ordered probit.

Author: Michel Bierlaire, EPFL
Date: Fri Apr 14 09:39:10 2023

import biogeme.biogeme_logging as blog
import biogeme.biogeme as bio
from biogeme.expressions import (
    Beta,
    log,
    Elem,
    bioNormalCdf,
    Variable,
    bioMultSum,
)

from biogeme.data.optima import (
    read_data,
    male,
    age,
    haveChildren,
    highEducation,
    childCenter,
    childSuburb,
    SocioProfCat,
)

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example m01_latent_variable.py')

Example m01_latent_variable.py

Parameters for the structural equation

coef_intercept = Beta('coef_intercept', 0.0, None, None, 0)
coef_age_30_less = Beta('coef_age_30_less', 0.0, None, None, 0)
coef_male = Beta('coef_male', 0.0, None, None, 0)
coef_haveChildren = Beta('coef_haveChildren', 0.0, None, None, 0)
coef_highEducation = Beta('coef_highEducation', 0.0, None, None, 0)
coef_artisans = Beta('coef_artisans', 0.0, None, None, 0)
coef_employees = Beta('coef_employees', 0.0, None, None, 0)
coef_child_center = Beta('coef_child_center', 0.0, None, None, 0)
coef_child_suburb = Beta('coef_child_suburb', 0.0, None, None, 0)

Latent variable: structural equation

ACTIVELIFE = (
    coef_intercept
    + coef_child_center * childCenter
    + coef_child_suburb * childSuburb
    + coef_highEducation * highEducation
    + coef_artisans * (SocioProfCat == 5)
    + coef_employees * (SocioProfCat == 6)
    + coef_age_30_less * (age <= 30)
    + coef_male * male
    + coef_haveChildren * haveChildren
)
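To make the structural equation concrete, the following plain-Python sketch evaluates the same linear combination for one individual, using the point estimates from the results table reported at the end of this page. The individual described by `person` is hypothetical (it is not a row of the Optima data), and the helper `active_life` is introduced here for illustration only; it is not part of Biogeme.

```python
# Point estimates copied from the results table at the end of this page.
estimates = {
    'coef_intercept': -0.550604,
    'coef_child_center': 0.188734,
    'coef_child_suburb': 0.114693,
    'coef_highEducation': -0.066652,
    'coef_artisans': -0.104168,
    'coef_employees': -0.048415,
    'coef_age_30_less': 0.411378,
    'coef_male': 0.114986,
    'coef_haveChildren': -0.054407,
}


def active_life(person):
    """Evaluate the structural equation for one individual (illustrative helper)."""
    return (
        estimates['coef_intercept']
        + estimates['coef_child_center'] * person['childCenter']
        + estimates['coef_child_suburb'] * person['childSuburb']
        + estimates['coef_highEducation'] * person['highEducation']
        + estimates['coef_artisans'] * (person['SocioProfCat'] == 5)
        + estimates['coef_employees'] * (person['SocioProfCat'] == 6)
        + estimates['coef_age_30_less'] * (person['age'] <= 30)
        + estimates['coef_male'] * person['male']
        + estimates['coef_haveChildren'] * person['haveChildren']
    )


# Hypothetical individual: a 25-year-old man without children.
person = {
    'childCenter': 0,
    'childSuburb': 0,
    'highEducation': 0,
    'SocioProfCat': 1,
    'age': 25,
    'male': 1,
    'haveChildren': 0,
}

value = active_life(person)
print(f'ACTIVELIFE for the hypothetical individual: {value:.6f}')
```

Only the intercept, the age term and the male term are active for this individual, so the value is the sum of those three estimates.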

Measurement equations

indicators = [
    'ResidCh01',
    'ResidCh04',
    'ResidCh05',
    'ResidCh06',
    'LifSty07',
    'LifSty10',
]

We define the intercept parameters. The first one is normalized to 0.

inter = {k: Beta(f'inter_{k}', 0, None, None, 0) for k in indicators[1:]}
inter[indicators[0]] = Beta(f'INTER_{indicators[0]}', 0, None, None, 1)

We define the coefficients. The first one is normalized to 1.

coefficients = {k: Beta(f'coeff_{k}', 0, None, None, 0) for k in indicators[1:]}
coefficients[indicators[0]] = Beta(f'B_{indicators[0]}', 1, None, None, 1)

We define the measurement equation for each indicator as a linear function of the latent variable.

models = {k: inter[k] + coefficients[k] * ACTIVELIFE for k in indicators}

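We define the scale parameters of the error terms. The first one is normalized to 1; the others are bounded below by 1.0e-5 so that they stay positive.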

sigma_star = {k: Beta(f'sigma_star_{k}', 1, 1.0e-5, None, 0) for k in indicators[1:]}
sigma_star[indicators[0]] = Beta(f'sigma_star_{indicators[0]}', 1, None, None, 1)

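Symmetric thresholds. The four thresholds are built from two positive parameters, delta_1 and delta_2, so that they are increasing and symmetric around zero.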

delta_1 = Beta('delta_1', 0.1, 1.0e-5, None, 0)
delta_2 = Beta('delta_2', 0.2, 1.0e-5, None, 0)
tau_1 = -delta_1 - delta_2
tau_2 = -delta_1
tau_3 = delta_1
tau_4 = delta_1 + delta_2
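As a quick numerical check of this construction, the sketch below builds the four thresholds from the estimates of `delta_1` and `delta_2` reported in the results table on this page. The function `symmetric_thresholds` is a hypothetical helper written for this illustration, not a Biogeme function.

```python
def symmetric_thresholds(delta_1, delta_2):
    """Return (tau_1, tau_2, tau_3, tau_4) from two positive increments."""
    return (-delta_1 - delta_2, -delta_1, delta_1, delta_1 + delta_2)


# Estimated values of delta_1 and delta_2, copied from the results table.
taus = symmetric_thresholds(0.484421, 1.017231)

for index, tau in enumerate(taus, start=1):
    print(f'tau_{index} = {tau:.6f}')
```

Because both increments are constrained to be positive (lower bound 1.0e-5), the thresholds are strictly increasing by construction, which keeps every ordered probit probability positive.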

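Ordered probit models. For each indicator, the probability of answer j (from 1 to 5) is the difference between the standard normal CDF evaluated at the two normalized thresholds that surround it. The codes 6, -1 and -2 are assigned probability 1, so such answers leave the log likelihood unchanged.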

tau_1_residual = {k: (tau_1 - models[k]) / sigma_star[k] for k in indicators}
tau_2_residual = {k: (tau_2 - models[k]) / sigma_star[k] for k in indicators}
tau_3_residual = {k: (tau_3 - models[k]) / sigma_star[k] for k in indicators}
tau_4_residual = {k: (tau_4 - models[k]) / sigma_star[k] for k in indicators}
dict_prob_indicators = {
    k: {
        1: bioNormalCdf(tau_1_residual[k]),
        2: bioNormalCdf(tau_2_residual[k]) - bioNormalCdf(tau_1_residual[k]),
        3: bioNormalCdf(tau_3_residual[k]) - bioNormalCdf(tau_2_residual[k]),
        4: bioNormalCdf(tau_4_residual[k]) - bioNormalCdf(tau_3_residual[k]),
        5: 1 - bioNormalCdf(tau_4_residual[k]),
        6: 1.0,
        -1: 1.0,
        -2: 1.0,
    }
    for k in indicators
}
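The same five probabilities can be reproduced outside Biogeme with the standard library, which may help when checking the formulas by hand. This is a minimal sketch: `normal_cdf` and `ordered_probit_probabilities` are illustrative names, and the mean and scale used in the example (0 and 1) are arbitrary inputs, not estimates for a particular respondent.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probabilities(mean, sigma, taus):
    """Probabilities of answers 1..len(taus)+1 for an ordered probit model."""
    cdf = [normal_cdf((tau - mean) / sigma) for tau in taus]
    # Pad with 0 and 1 so that each probability is a difference of two bounds.
    bounds = [0.0] + cdf + [1.0]
    return {j: bounds[j] - bounds[j - 1] for j in range(1, len(bounds))}


# Thresholds built from the estimated delta_1 and delta_2 on this page.
delta_1, delta_2 = 0.484421, 1.017231
taus = [-delta_1 - delta_2, -delta_1, delta_1, delta_1 + delta_2]

probabilities = ordered_probit_probabilities(mean=0.0, sigma=1.0, taus=taus)
for answer, probability in probabilities.items():
    print(f'P(answer = {answer}) = {probability:.4f}')
print(f'Sum = {sum(probabilities.values()):.4f}')
```

The differences telescope, so the five probabilities sum to one for any mean and any positive scale; with a mean of zero the distribution of answers is also symmetric around the middle category.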

log_proba = {k: log(Elem(dict_prob_indicators[k], Variable(k))) for k in indicators}
loglike = bioMultSum(log_proba)
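The following sketch mirrors, in plain Python, what the `Elem` lookup and the sum of logarithms do for a single respondent. The probability values and the responses are made up for illustration, and `log_likelihood_contribution` is a hypothetical helper, not a Biogeme function.

```python
import math

# Codes that the dictionary above maps to probability 1.
NEUTRAL_CODES = {6, -1, -2}


def log_likelihood_contribution(responses, probabilities):
    """Sum the log probabilities of the observed answers of one respondent."""
    total = 0.0
    for indicator, answer in responses.items():
        if answer in NEUTRAL_CODES:
            probability = 1.0  # log(1) = 0: the indicator is ignored
        else:
            probability = probabilities[indicator][answer]
        total += math.log(probability)
    return total


# Made-up category probabilities for two indicators (illustration only).
example_probabilities = {
    'ResidCh01': {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.2, 5: 0.1},
    'ResidCh04': {1: 0.3, 2: 0.3, 3: 0.2, 4: 0.1, 5: 0.1},
}

# Made-up respondent: answers 3 to the first indicator, code 6 to the second.
responses = {'ResidCh01': 3, 'ResidCh04': 6}

contribution = log_likelihood_contribution(responses, example_probabilities)
print(f'Log likelihood contribution: {contribution:.6f}')
```

Only the first indicator contributes here, so a respondent with incomplete answers is still used for the indicators that were answered.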

Read the data

database = read_data()

Create the Biogeme object

biogeme = bio.BIOGEME(database, loglike)
biogeme.modelName = 'm01_latent_variable'

Default values of the Biogeme parameters are used.
File biogeme.toml has been created

Estimate the parameters

results = biogeme.estimate()

As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __m01_latent_variable.iter
Cannot read file __m01_latent_variable.iter. Statement is ignored.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.     Function    Relgrad   Radius      Rho
    2.2e+04        1.2      0.5        0    -
    1.6e+04       0.38        5      1.1   ++
    1.6e+04       0.38      2.5      1.1    -
    1.6e+04       0.38      1.2      1.1    -
    1.6e+04       0.38     0.62      1.1    -
    1.6e+04       0.38     0.31      1.1    -
    1.5e+04       0.12     0.31     0.67    +
    1.5e+04      0.055      3.1      1.2   ++
    1.5e+04      0.055      1.6      -20    -
    1.5e+04      0.055     0.78     -1.5    -
    1.4e+04      0.025     0.78      0.9    +
    1.4e+04      0.009     0.78      0.6    +
    1.4e+04      0.017     0.78     0.47    +
    1.4e+04      0.017     0.39     -0.3    -
    1.4e+04      0.011     0.39     0.46    +
    1.4e+04    0.00068      3.9     0.99   ++
    1.4e+04    0.00068        2 -1.2e+02    -
    1.4e+04    0.00068     0.98      -11    -
    1.4e+04    0.00068     0.49    -0.79    -
    1.4e+04    0.00061     0.49     0.72    +
    1.4e+04    2.3e-05     0.49        1    +
Results saved in file m01_latent_variable.html
Results saved in file m01_latent_variable.pickle

results.get_estimated_parameters()

                          Value  Rob. Std err  Rob. t-test  Rob. p-value
coef_age_30_less       0.411378      0.148961     2.761647      0.005751
coef_artisans         -0.104168      0.059503    -1.750638      0.080008
coef_child_center      0.188734      0.050024     3.772861      0.000161
coef_child_suburb      0.114693      0.037611     3.049471      0.002292
coef_employees        -0.048415      0.032817    -1.475281      0.140137
coef_haveChildren     -0.054407      0.027774    -1.958916      0.050123
coef_highEducation    -0.066652      0.043719    -1.524551      0.127371
coef_intercept        -0.550604      0.050882   -10.821125      0.000000
coef_male              0.114986      0.051108     2.249873      0.024457
coeff_LifSty07         1.141445      0.361607     3.156588      0.001596
coeff_LifSty10         0.690129      0.241688     2.855456      0.004298
coeff_ResidCh04        0.243809      0.206352     1.181520      0.237396
coeff_ResidCh05        2.274781      0.613173     3.709854      0.000207
coeff_ResidCh06        1.676242      0.727597     2.303804      0.021234
delta_1                0.484421      0.012086    40.080263      0.000000
delta_2                1.017231      0.024393    41.701887      0.000000
inter_LifSty07        -0.416229      0.171191    -2.431372      0.015042
inter_LifSty10         0.019140      0.113146     0.169159      0.865672
inter_ResidCh04        0.185378      0.096960     1.911893      0.055890
inter_ResidCh05       -0.810642      0.287186    -2.822704      0.004762
inter_ResidCh06        0.168830      0.336571     0.501618      0.615937
sigma_star_LifSty07    1.120757      0.032652    34.324640      0.000000
sigma_star_LifSty10    0.975216      0.026142    37.304279      0.000000
sigma_star_ResidCh04   0.939158      0.024710    38.007332      0.000000
sigma_star_ResidCh05   1.369354      0.052638    26.014470      0.000000
sigma_star_ResidCh06   1.361454      0.038189    35.649960      0.000000
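To read the last two columns, it may help to recompute them from the first two. The sketch below assumes that the robust t-test is the estimate divided by its robust standard error, and that the p-value is the two-sided tail probability under a standard normal distribution; this is an assumption about how the table is produced, although it agrees with the reported figures up to rounding. The function `t_test_and_p_value` is an illustrative helper, not part of Biogeme.

```python
import math


def t_test_and_p_value(value, std_err):
    """Return the t statistic and its two-sided p-value under a standard normal."""
    t_test = value / std_err
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_test) / math.sqrt(2.0))
    return t_test, p_value


# Value and robust standard error of coef_age_30_less, from the table above.
t_test, p_value = t_test_and_p_value(0.411378, 0.148961)
print(f't-test = {t_test:.4f}, p-value = {p_value:.4f}')
```

Applied to `coeff_ResidCh04` (0.243809 with standard error 0.206352), the same calculation gives a t-test of about 1.18, so that coefficient is not significantly different from zero at the 5% level.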

Total running time of the script: (0 minutes 1.273 seconds)
