Box-Cox transforms
Example of a logit model, with a Box-Cox transform of variables.
- author: Michel Bierlaire, EPFL
- date: Sun Apr 9 17:58:15 2023
import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta
See the data processing script: Data preparation for Swissmetro.
from swissmetro_data import (
    database,
    CHOICE,
    SM_AV,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
)
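Parameters to be estimated. The arguments of Beta are the name, the initial value, the lower and upper bounds, and a status flag: 0 means the parameter is estimated, 1 means it is fixed at its initial value. ASC_SM is fixed at 0, so the other two constants are measured relative to Swissmetro.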
ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
ASC_SM = Beta('ASC_SM', 0, None, None, 1)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)
LAMBDA = Beta('LAMBDA', 0, None, None, 0)
Definition of the utility functions.
V1 = (
    ASC_TRAIN
    + B_TIME * models.boxcox(TRAIN_TT_SCALED, LAMBDA)
    + B_COST * TRAIN_COST_SCALED
)
V2 = ASC_SM + B_TIME * models.boxcox(SM_TT_SCALED, LAMBDA) + B_COST * SM_COST_SCALED
V3 = ASC_CAR + B_TIME * models.boxcox(CAR_TT_SCALED, LAMBDA) + B_COST * CAR_CO_SCALED
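For readers unfamiliar with the transform used in the utilities above, here is a minimal sketch of the textbook Box-Cox formula on plain floats. The function name `boxcox_sketch` and the tolerance `tol` are hypothetical, and the sketch is not Biogeme's `models.boxcox`, which operates on Biogeme expressions and may treat values of the parameter near zero differently.

```python
import math


def boxcox_sketch(x, lam, tol=1e-8):
    """Textbook Box-Cox transform of a positive number x.

    Returns (x**lam - 1) / lam, and log(x) when lam is (numerically) zero,
    which is the limit of the first expression as lam tends to 0.
    """
    if x <= 0:
        raise ValueError("The Box-Cox transform requires x > 0.")
    if abs(lam) < tol:
        return math.log(x)
    return (x**lam - 1.0) / lam


# Hypothetical scaled travel time, for illustration only.
travel_time = 2.0
print(boxcox_sketch(travel_time, 1.0))  # lam = 1: linear specification, x - 1
print(boxcox_sketch(travel_time, 0.0))  # lam = 0: logarithmic specification
```

With the parameter equal to 1 the time variable enters linearly (up to a constant), and with the parameter equal to 0 it enters logarithmically, so estimating LAMBDA lets the data choose between these specifications and anything in between.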
Associate utility functions with the numbering of alternatives.
V = {1: V1, 2: V2, 3: V3}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Definition of the model. This is the contribution of each observation to the log likelihood function.
logprob = models.loglogit(V, av, CHOICE)
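To make the quantity computed above concrete, the following sketch evaluates the log of a logit choice probability for a single observation with plain numbers. The function name `loglogit_sketch` and the example values are hypothetical; the sketch only illustrates the standard formula with availability conditions and is not Biogeme's implementation.

```python
import math


def loglogit_sketch(utilities, availability, chosen):
    """Log probability of the chosen alternative under a logit model.

    utilities and availability are dicts keyed by alternative id.
    Only alternatives with a nonzero availability enter the denominator.
    """
    if not availability[chosen]:
        raise ValueError("The chosen alternative must be available.")
    available = [alt for alt in utilities if availability[alt]]
    # Subtract the largest utility before exponentiating, for numerical stability.
    v_max = max(utilities[alt] for alt in available)
    log_denominator = v_max + math.log(
        sum(math.exp(utilities[alt] - v_max) for alt in available)
    )
    return utilities[chosen] - log_denominator


# Hypothetical utilities for train (1), Swissmetro (2) and car (3); car is unavailable.
example_v = {1: -0.5, 2: 0.0, 3: -1.0}
example_av = {1: 1, 2: 1, 3: 0}
print(loglogit_sketch(example_v, example_av, chosen=2))
```

Summing this quantity over all observations gives the log likelihood of the sample.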
Create the Biogeme object.
the_biogeme = bio.BIOGEME(database, logprob)
the_biogeme.modelName = 'b08boxcox'
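Check the derivatives of the log likelihood function around 0. The call returns the function value, the analytical gradient, the analytical Hessian, and the differences between the analytical and the finite-difference gradient and Hessian.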
the_biogeme.check_derivatives(beta=[0, 0, 0, 0, 0], verbose=True)
(-6964.6629791922205, array([ -99. , -1541.5 , -224.60833333, -1510.70259763,
0. ]), array([[-1246. , 623. , 113.97111111, -216.89261174,
0. ],
[ 623. , -1536.25 , 154.53194444, -754.54814588,
0. ],
[ 113.97111111, 154.53194444, -633.136825 , 164.98825122,
0. ],
[ -216.89261174, -754.54814588, 164.98825122, -896.93691608,
-290.96756803],
[ 0. , 0. , 0. , -290.96756803,
0. ]]), array([-0.00035575, -0.00045858, -0.00030395, -0.00027485, 0. ]), array([[-2.33821161e-05, 8.70677854e-06, 1.87785348e-05,
1.28155432e-06, 0.00000000e+00],
[-1.78165961e-04, 2.44009268e-05, -3.65803303e-05,
-3.42056321e-05, 0.00000000e+00],
[ 5.42033138e-06, -7.02175248e-06, -6.80464268e-07,
1.17407737e-06, 0.00000000e+00],
[ 3.55529107e-06, 8.99536622e-06, 6.05643180e-07,
-5.55560746e-06, 2.10474028e-05],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
7.68242649e-06, 0.00000000e+00]]))
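A derivative check of this kind compares analytical derivatives with finite-difference approximations; small differences suggest that the analytical derivatives are coded correctly. The sketch below shows the idea on a toy function. All names (`finite_difference_gradient`, `toy_function`, `toy_gradient`) and the central-difference scheme with step `1e-6` are assumptions for illustration, not a description of how Biogeme computes its differences.

```python
def finite_difference_gradient(func, point, step=1e-6):
    """Approximate the gradient of func at point by central differences."""
    gradient = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += step
        backward[i] -= step
        gradient.append((func(forward) - func(backward)) / (2.0 * step))
    return gradient


def toy_function(b):
    """Toy stand-in for a log likelihood function of two parameters."""
    return -((b[0] - 1.0) ** 2) - 3.0 * b[0] * b[1]


def toy_gradient(b):
    """Analytical gradient of toy_function."""
    return [-2.0 * (b[0] - 1.0) - 3.0 * b[1], -3.0 * b[0]]


point = [0.0, 0.0]
analytical = toy_gradient(point)
numerical = finite_difference_gradient(toy_function, point)
# Entries close to zero indicate that the two gradients agree.
differences = [a - n for a, n in zip(analytical, numerical)]
print(differences)
```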
Estimate the parameters.
results = the_biogeme.estimate()
print(results.short_summary())
Results for model b08boxcox
Nbr of parameters: 5
Sample size: 6768
Excluded data: 3960
Final log likelihood: -5292.095
Akaike Information Criterion: 10594.19
Bayesian Information Criterion: 10628.29
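The two information criteria in this summary can be recomputed from the other reported numbers with the standard definitions, AIC = 2K - 2LL and BIC = K ln(N) - 2LL, where K is the number of parameters, N the sample size and LL the final log likelihood. The variable names below are chosen for illustration; the values are copied from the summary above, and the result should agree with the reported criteria up to rounding.

```python
import math

# Values copied from the summary above.
final_log_likelihood = -5292.095
n_parameters = 5
sample_size = 6768

# Standard definitions of the two criteria.
aic = 2 * n_parameters - 2 * final_log_likelihood
bic = n_parameters * math.log(sample_size) - 2 * final_log_likelihood

print(f"AIC: {aic:.2f}")
print(f"BIC: {bic:.2f}")
```

Both criteria penalize the log likelihood for the number of parameters, so they are useful for comparing this specification with one that does not estimate the Box-Cox parameter.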
pandas_results = results.get_estimated_parameters()
pandas_results