Box-Cox transforms

Example of a logit model, with a Box-Cox transform of variables.

author: Michel Bierlaire, EPFL

date: Sun Apr 9 17:58:15 2023

import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
)

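Parameters to be estimated. The last argument of Beta is the status: 0 means the parameter is estimated, 1 means it is fixed at its initial value. ASC_SM is fixed at 0 to normalize the alternative-specific constants, which leaves five free parameters.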

ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
ASC_SM = Beta('ASC_SM', 0, None, None, 1)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)
LAMBDA = Beta('LAMBDA', 0, None, None, 0)

Definition of the utility functions.

V1 = (
    ASC_TRAIN
    + B_TIME * models.boxcox(TRAIN_TT_SCALED, LAMBDA)
    + B_COST * TRAIN_COST_SCALED
)
V2 = ASC_SM + B_TIME * models.boxcox(SM_TT_SCALED, LAMBDA) + B_COST * SM_COST_SCALED
V3 = ASC_CAR + B_TIME * models.boxcox(CAR_TT_SCALED, LAMBDA) + B_COST * CAR_CO_SCALED
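
For reference, the Box-Cox transform of a positive variable x with parameter lambda is (x^lambda - 1) / lambda, which tends to log(x) as lambda goes to 0 and equals x - 1 when lambda is 1. The sketch below illustrates that formula in plain NumPy. It is not Biogeme's implementation of models.boxcox; the function name boxcox_sketch and the tolerance tol are assumptions introduced here for illustration.

```python
import numpy as np


def boxcox_sketch(x, lam, tol=1e-8):
    """Illustrative Box-Cox transform of positive values x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if abs(lam) < tol:
        # Limit case: the transform tends to log(x) as lam goes to 0.
        return np.log(x)
    return (x**lam - 1.0) / lam


x = np.array([0.5, 1.0, 2.0])
linear = boxcox_sketch(x, 1.0)      # lam = 1 gives x - 1 (linear specification)
near_log = boxcox_sketch(x, 1e-6)   # small lam is close to log(x)
log_limit = boxcox_sketch(x, 0.0)   # lam = 0 is handled as log(x)
```

Because the same LAMBDA enters all three utility functions, a single estimated value tells whether travel time is better represented linearly (lambda near 1) or logarithmically (lambda near 0).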

Associate utility functions with the numbering of alternatives.

V = {1: V1, 2: V2, 3: V3}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

logprob = models.loglogit(V, av, CHOICE)
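
To make the role of the availability conditions concrete, here is a minimal NumPy sketch of the log of a logit choice probability for a single observation, where the denominator only sums over available alternatives. The function loglogit_sketch and the utility values are hypothetical and only illustrate the formula; they are not part of Biogeme, and the sketch assumes the chosen alternative is available.

```python
import numpy as np


def loglogit_sketch(utilities, availability, chosen):
    """Log of the logit probability of `chosen` (hypothetical helper).

    utilities and availability are dicts keyed by alternative id.
    Assumes the chosen alternative is available.
    """
    available = [alt for alt, avail in availability.items() if avail]
    v = np.array([utilities[alt] for alt in available])
    # Log-sum-exp with the maximum subtracted for numerical stability.
    v_max = v.max()
    log_denominator = v_max + np.log(np.exp(v - v_max).sum())
    return utilities[chosen] - log_denominator


# Made-up utilities for train (1), Swissmetro (2) and car (3).
utilities = {1: -0.5, 2: 0.0, 3: -1.0}
all_available = {1: 1, 2: 1, 3: 1}
no_car = {1: 1, 2: 1, 3: 0}

logp_all = loglogit_sketch(utilities, all_available, chosen=2)
logp_no_car = loglogit_sketch(utilities, no_car, chosen=2)
```

Removing an alternative from the choice set raises the probability of the remaining ones, so logp_no_car is larger than logp_all.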

Create the Biogeme object.

the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b08boxcox'

Check the derivatives of the log likelihood function around 0.

the_biogeme.checkDerivatives(beta=[0, 0, 0, 0, 0], verbose=True)
(-6964.6629791922205, array([  -99.        , -1541.5       ,  -224.60833333, -1510.70259763,
           0.        ]), array([[-1246.        ,   623.        ,   113.97111111,  -216.89261174,
            0.        ],
       [  623.        , -1536.25      ,   154.53194444,  -754.54814588,
            0.        ],
       [  113.97111111,   154.53194444,  -633.136825  ,   164.98825122,
            0.        ],
       [ -216.89261174,  -754.54814588,   164.98825122,  -896.93691608,
         -290.96756803],
       [    0.        ,     0.        ,     0.        ,  -290.96756803,
            0.        ]]), array([-0.00035575, -0.00045858, -0.00030395, -0.00027485,  0.        ]), array([[-2.33821161e-05,  8.70677854e-06,  1.87785348e-05,
         1.28155432e-06,  0.00000000e+00],
       [-1.78165961e-04,  2.44009268e-05, -3.65803303e-05,
        -3.42056321e-05,  0.00000000e+00],
       [ 5.42033138e-06, -7.02175248e-06, -6.80464154e-07,
         1.17407737e-06,  0.00000000e+00],
       [ 3.55529107e-06,  8.99536622e-06,  6.05643180e-07,
        -5.55560746e-06,  2.10474028e-05],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         7.68242649e-06,  0.00000000e+00]]))
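
The five elements of the output above have the shapes of a function value, a gradient, a Hessian, and the differences between the analytical and finite-difference versions of the gradient and of the Hessian. The sketch below shows the idea of such a check on a toy function using central finite differences. The toy log likelihood, the step size, and all names are assumptions made for illustration; this is not how Biogeme implements checkDerivatives.

```python
import numpy as np


def finite_difference_gradient(f, x, step=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = step
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * step)
    return grad


TARGET = np.array([1.0, -2.0])  # hypothetical maximizer of the toy function


def toy_loglike(beta):
    """Toy concave function standing in for a log likelihood."""
    return -np.sum((beta - TARGET) ** 2)


def toy_gradient(beta):
    """Analytical gradient of toy_loglike."""
    return -2.0 * (beta - TARGET)


beta0 = np.zeros(2)
# Difference between analytical and numerical gradients; it should be tiny.
gdiff = toy_gradient(beta0) - finite_difference_gradient(toy_loglike, beta0)
```

Small differences, as in the output above, suggest that the analytical derivatives are coded consistently with the function.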

Estimate the parameters.

results = the_biogeme.estimate()
print(results.short_summary())
Results for model b08boxcox
Nbr of parameters:              5
Sample size:                    6768
Excluded data:                  3960
Final log likelihood:           -5292.095
Akaike Information Criterion:   10594.19
Bayesian Information Criterion: 10628.29
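
The two information criteria in the summary can be recovered from the final log likelihood, the number of parameters, and the sample size, using the usual definitions AIC = 2K - 2LL and BIC = K ln(N) - 2LL. The short check below uses the rounded values printed above, so agreement is only expected to about two decimals.

```python
import math

# Values copied from the summary above (rounded as printed).
final_loglike = -5292.095
n_parameters = 5
sample_size = 6768

aic = 2 * n_parameters - 2 * final_loglike
bic = n_parameters * math.log(sample_size) - 2 * final_loglike

print(f"AIC: {aic:.2f}")
print(f"BIC: {bic:.2f}")
```
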
pandas_results = results.getEstimatedParameters()
pandas_results
               Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC_CAR    -0.004624      0.048008    -0.096310  9.232741e-01
ASC_TRAIN  -0.484974      0.064398    -7.530904  5.040413e-14
B_COST     -1.078534      0.068008   -15.858881  0.000000e+00
B_TIME     -1.674909      0.076558   -21.877701  0.000000e+00
LAMBDA      0.510059      0.077305     6.598023  4.166778e-11
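
The t-test column is the estimate divided by its robust standard error, which tests each parameter against 0. The sketch below reproduces the figures for ASC_CAR from the rounded values in the table, assuming the p-value is two-sided under a standard normal distribution. As an additional illustration that is not part of the output above, it also tests LAMBDA against 1, the value at which the Box-Cox transform becomes linear. The helper name t_and_p is hypothetical.

```python
import math


def t_and_p(value, std_err, null_value=0.0):
    """t statistic against null_value and two-sided normal p-value."""
    t_stat = (value - null_value) / std_err
    # For a standard normal Z, P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value


# ASC_CAR, tested against 0 as in the table.
t_asc_car, p_asc_car = t_and_p(-0.004624, 0.048008)

# LAMBDA, tested against 1 (linear specification); not reported in the table.
t_lambda_vs_one, p_lambda_vs_one = t_and_p(0.510059, 0.077305, null_value=1.0)

print(f"ASC_CAR: t = {t_asc_car:.4f}, p = {p_asc_car:.4f}")
print(f"LAMBDA vs 1: t = {t_lambda_vs_one:.4f}")
```

Under these assumptions, LAMBDA differs significantly from both 0 and 1, which would suggest that neither the logarithmic nor the linear specification of travel time is supported by this model.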


Total running time of the script: (0 minutes 0.482 seconds)