Box-Cox transformsΒΆ

Example of a logit model, with a Box-Cox transform of variables.

Michel Bierlaire, EPFL Sat Jun 21 2025, 15:14:39

import biogeme.biogeme_logging as blog
from IPython.core.display_functions import display
from biogeme.biogeme import BIOGEME
from biogeme.expressions import Beta
from biogeme.models import boxcox, loglogit
from biogeme.results_processing import get_pandas_estimated_parameters

See the data processing script: Data preparation for Swissmetro.

from swissmetro_data import (
    CAR_AV_SP,
    CAR_CO_SCALED,
    CAR_TT_SCALED,
    CHOICE,
    SM_AV,
    SM_COST_SCALED,
    SM_TT_SCALED,
    TRAIN_AV_SP,
    TRAIN_COST_SCALED,
    TRAIN_TT_SCALED,
    database,
)

logger = blog.get_screen_logger(level=blog.INFO)

Parameters to be estimated.

asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
asc_sm = Beta('asc_sm', 0, None, None, 1)
b_time = Beta('b_time', 0, None, None, 0)
b_cost = Beta('b_cost', 0, None, None, 0)
boxcox_parameter = Beta('boxcox_parameter', 0, -10, 10, 0)

Definition of the utility functions.

v_train = (
    asc_train
    + b_time * boxcox(TRAIN_TT_SCALED, boxcox_parameter)
    + b_cost * TRAIN_COST_SCALED
)
v_swissmetro = (
    asc_sm + b_time * boxcox(SM_TT_SCALED, boxcox_parameter) + b_cost * SM_COST_SCALED
)
v_car = (
    asc_car + b_time * boxcox(CAR_TT_SCALED, boxcox_parameter) + b_cost * CAR_CO_SCALED
)

Associate utility functions with the numbering of alternatives.

v = {1: v_train, 2: v_swissmetro, 3: v_car}

Associate the availability conditions with the alternatives.

av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

Definition of the model. This is the contribution of each observation to the log likelihood function.

log_probability = loglogit(v, av, CHOICE)

Create the Biogeme object.

the_biogeme = BIOGEME(database, log_probability)
the_biogeme.model_name = 'b08boxcox'
Biogeme parameters read from biogeme.toml.

Check the derivatives of the log likelihood function around 0.

the_biogeme.check_derivatives(verbose=True)
Comparing first derivatives
x                   Gradient     FinDiff    Difference
asc_train          -1541.5    -1541.5     -0.000305176
b_time             -1510.7    -1510.7     -0.000205049
boxcox_parameter       0          0        0
b_cost              -224.608   -224.608   -0.000240072
asc_car              -99        -98.9997  -0.000305176
Comparing second derivatives
Row               Col                 Hessian    FinDiff    Difference
asc_train         asc_train         -1536.25   -1536.25    0.00012207
b_time            asc_train          -754.548   -754.548   1.08605e-05
boxcox_parameter  asc_train             0          0       0
b_cost            asc_train           154.532    154.532  -7.45985e-06
asc_car           asc_train           623        623      -1.14441e-05
asc_train         b_time             -754.548   -754.548  -4.39833e-06
b_time            b_time             -896.937   -896.937   3.46064e-05
boxcox_parameter  b_time             -290.968   -290.968   1.14477e-06
b_cost            b_time              164.988    164.988   2.71933e-05
asc_car           b_time             -216.893   -216.893   3.31385e-05
asc_train         boxcox_parameter      0          0       0
b_time            boxcox_parameter   -290.968   -290.968   2.22989e-05
boxcox_parameter  boxcox_parameter      0          0       0
b_cost            boxcox_parameter      0          0       0
asc_car           boxcox_parameter      0          0       0
asc_train         b_cost              154.532    154.532  -7.45985e-06
b_time            b_cost              164.988    164.988   1.57492e-05
boxcox_parameter  b_cost                0          0       0
b_cost            b_cost             -633.137   -633.137   3.10791e-05
asc_car           b_cost              113.971    113.971   2.65164e-05
asc_train         asc_car             623        623      -6.10352e-05
b_time            asc_car            -216.893   -216.893   1.21577e-05
boxcox_parameter  asc_car               0          0       0
b_cost            asc_car             113.971    113.971  -1.86496e-07
asc_car           asc_car           -1246      -1246       6.48498e-05

CheckDerivativesResults(function=-6964.662979192191, analytical_gradient=array([-1541.5       , -1510.70259763,     0.        ,  -224.60833333,
         -99.        ]), analytical_hessian=array([[-1536.25      ,  -754.54814588,     0.        ,   154.53194444,
          623.        ],
       [ -754.54814588,  -896.93691608,  -290.96756803,   164.98825122,
         -216.89261174],
       [    0.        ,  -290.96756803,     0.        ,     0.        ,
            0.        ],
       [  154.53194444,   164.98825122,     0.        ,  -633.136825  ,
          113.97111111],
       [  623.        ,  -216.89261174,     0.        ,   113.97111111,
        -1246.        ]]), finite_differences_gradient=array([-1541.49969482, -1510.70239258,     0.        ,  -224.60809326,
         -98.99969482]), finite_differences_hessian=array([[-1536.25012207,  -754.54814148,     0.        ,   154.5319519 ,
          623.00006104],
       [ -754.54815674,  -896.93695068,  -290.96759033,   164.98823547,
         -216.8926239 ],
       [    0.        ,  -290.96756918,     0.        ,     0.        ,
            0.        ],
       [  154.5319519 ,   164.98822403,     0.        ,  -633.13685608,
          113.9711113 ],
       [  623.00001144,  -216.89264488,     0.        ,   113.97108459,
        -1246.00006485]]), errors_gradient=array([-0.00030518, -0.00020505,  0.        , -0.00024007, -0.00030518]), errors_hessian=array([[ 1.22070312e-04, -4.39833343e-06,  0.00000000e+00,
        -7.45985247e-06, -6.10351561e-05],
       [ 1.08604556e-05,  3.46063689e-05,  2.22988965e-05,
         1.57491812e-05,  1.21576558e-05],
       [ 0.00000000e+00,  1.14477086e-06,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00],
       [-7.45985244e-06,  2.71932730e-05,  0.00000000e+00,
         3.10791016e-05, -1.86496266e-07],
       [-1.14440917e-05,  3.31384907e-05,  0.00000000e+00,
         2.65163846e-05,  6.48498533e-05]]))

Estimate the parameters

results = the_biogeme.estimate()
*** Initial values of the parameters are obtained from the file __b08boxcox.iter
Parameter values restored from __b08boxcox.iter
Starting values for the algorithm: {'asc_train': -0.48497367640187233, 'b_time': -1.6749085081534583, 'boxcox_parameter': 0.5100594662517955, 'b_cost': -1.0785343734087758, 'asc_car': -0.004623694779428385}
As the model is not too complex, we activate the calculation of second derivatives. To change this behavior, modify the algorithm to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Optimization algorithm has converged.
Relative gradient: 6.882377013481846e-08
Cause of termination: Relative gradient = 6.9e-08 <= 6.1e-06
Number of function evaluations: 1
Number of gradient evaluations: 1
Number of hessian evaluations: 0
Algorithm: Newton with trust region for simple bound constraints
Number of iterations: 0
Optimization time: 0:00:00.005751
Calculate second derivatives and BHHH
File b08boxcox~00.html has been generated.
File b08boxcox~00.yaml has been generated.
print(results.short_summary())
Results for model b08boxcox
Nbr of parameters:              5
Sample size:                    6768
Excluded data:                  3960
Final log likelihood:           -5292.095
Akaike Information Criterion:   10594.19
Bayesian Information Criterion: 10628.29
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
               Name     Value  Robust std err.  Robust t-stat.  Robust p-value
0         asc_train -0.484974         0.064398       -7.530904    5.040413e-14
1            b_time -1.674909         0.076558      -21.877701    0.000000e+00
2  boxcox_parameter  0.510059         0.077305        6.598023    4.166778e-11
3            b_cost -1.078534         0.068008      -15.858881    0.000000e+00
4           asc_car -0.004624         0.048008       -0.096310    9.232741e-01

Total running time of the script: (0 minutes 3.587 seconds)

Gallery generated by Sphinx-Gallery