Box-Cox transforms
Example of a logit model, with a Box-Cox transform of variables.
Michel Bierlaire, EPFL. Sat Jun 21 2025, 15:14:39
import biogeme.biogeme_logging as blog
from IPython.core.display_functions import display
from biogeme.biogeme import BIOGEME
from biogeme.expressions import Beta
from biogeme.models import boxcox, loglogit
from biogeme.results_processing import get_pandas_estimated_parameters
See the data processing script: Data preparation for Swissmetro.
from swissmetro_data import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
logger = blog.get_screen_logger(level=blog.INFO)
Parameters to be estimated. The arguments of Beta are the parameter name, its starting value, a lower bound, an upper bound, and a flag that is 1 if the parameter is fixed at its starting value (as asc_sm, normalized to 0) and 0 if it is estimated.
asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
asc_sm = Beta('asc_sm', 0, None, None, 1)
b_time = Beta('b_time', 0, None, None, 0)
b_cost = Beta('b_cost', 0, None, None, 0)
boxcox_parameter = Beta('boxcox_parameter', 0, -10, 10, 0)
Definition of the utility functions.
v_train = (
asc_train
+ b_time * boxcox(TRAIN_TT_SCALED, boxcox_parameter)
+ b_cost * TRAIN_COST_SCALED
)
v_swissmetro = (
asc_sm + b_time * boxcox(SM_TT_SCALED, boxcox_parameter) + b_cost * SM_COST_SCALED
)
v_car = (
asc_car + b_time * boxcox(CAR_TT_SCALED, boxcox_parameter) + b_cost * CAR_CO_SCALED
)
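For reference, the Box-Cox transform applied to the travel-time variables maps a positive x to (x**lam - 1) / lam, with the limit log(x) as lam tends to 0; lam = 1 keeps the variable affine in x, while lam = 0 gives a log specification. A plain-Python sketch of the transform (biogeme's boxcox handles the lam -> 0 limit internally; the tolerance used here is an assumption of this sketch):

```python
import math


def boxcox_sketch(x: float, lam: float, tol: float = 1e-8) -> float:
    """Box-Cox transform (x**lam - 1) / lam, switching to its
    limit log(x) when lam is numerically close to zero."""
    if abs(lam) < tol:
        return math.log(x)
    return (x**lam - 1.0) / lam
```

With lam = 1 the transform reduces to x - 1, so the estimated boxcox_parameter measures how far the best-fitting time specification is from linear (and from logarithmic at lam = 0).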
Associate utility functions with the numbering of alternatives.
v = {1: v_train, 2: v_swissmetro, 3: v_car}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Definition of the model. This is the contribution of each observation to the log likelihood function.
log_probability = loglogit(v, av, CHOICE)
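For each observation, loglogit evaluates the log of the logit probability of the chosen alternative, with the denominator summed only over available alternatives. A plain-Python sketch of that per-observation computation (a pedagogical stand-in; biogeme's loglogit builds a symbolic expression rather than evaluating floats directly):

```python
import math


def loglogit_sketch(v: dict, av: dict, choice: int) -> float:
    """Log of the logit choice probability:
    V_choice - log(sum of exp(V_j) over available alternatives j)."""
    log_denominator = math.log(
        sum(math.exp(v[j]) for j in v if av[j])
    )
    return v[choice] - log_denominator
```

With identical utilities and all three alternatives available, the log probability is log(1/3); removing one alternative from the choice set raises it to log(1/2).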
Create the Biogeme object.
the_biogeme = BIOGEME(database, log_probability)
the_biogeme.model_name = 'b08boxcox'
Biogeme parameters read from biogeme.toml.
Check the derivatives of the log likelihood function around 0.
the_biogeme.check_derivatives(verbose=True)
Comparing first derivatives
x Gradient FinDiff Difference
asc_train -1541.5 -1541.5 -0.000305176
b_time -1510.7 -1510.7 -0.000205049
boxcox_parameter 0 0 0
b_cost -224.608 -224.608 -0.000240072
asc_car -99 -98.9997 -0.000305176
Comparing second derivatives
Row Col Hessian FinDiff Difference
asc_train asc_train -1536.25 -1536.25 0.00012207
b_time asc_train -754.548 -754.548 1.08605e-05
boxcox_parameter asc_train 0 0 0
b_cost asc_train 154.532 154.532 -7.45985e-06
asc_car asc_train 623 623 -1.14441e-05
asc_train b_time -754.548 -754.548 -4.39833e-06
b_time b_time -896.937 -896.937 3.46064e-05
boxcox_parameter b_time -290.968 -290.968 1.14477e-06
b_cost b_time 164.988 164.988 2.71933e-05
asc_car b_time -216.893 -216.893 3.31385e-05
asc_train boxcox_parameter 0 0 0
b_time boxcox_parameter -290.968 -290.968 2.22989e-05
boxcox_parameter boxcox_parameter 0 0 0
b_cost boxcox_parameter 0 0 0
asc_car boxcox_parameter 0 0 0
asc_train b_cost 154.532 154.532 -7.45985e-06
b_time b_cost 164.988 164.988 1.57492e-05
boxcox_parameter b_cost 0 0 0
b_cost b_cost -633.137 -633.137 3.10791e-05
asc_car b_cost 113.971 113.971 2.65164e-05
asc_train asc_car 623 623 -6.10352e-05
b_time asc_car -216.893 -216.893 1.21577e-05
boxcox_parameter asc_car 0 0 0
b_cost asc_car 113.971 113.971 -1.86496e-07
asc_car asc_car -1246 -1246 6.48498e-05
The call also returns a CheckDerivativesResults object bundling the log likelihood at the evaluation point (-6964.663), the analytical and finite-difference gradients and Hessians, and their element-wise differences; its contents match the comparison tables above.
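The FinDiff columns above are numerical approximations that check_derivatives compares against the analytical derivatives. The idea can be sketched with a forward-difference gradient (the step size and difference scheme here are assumptions of this sketch, not biogeme's exact settings):

```python
def finite_difference_gradient(f, x, h: float = 1e-7):
    """Approximate the gradient of f at x by forward differences:
    g_i ~ (f(x + h * e_i) - f(x)) / h, one coordinate at a time."""
    fx = f(x)
    gradient = []
    for i in range(len(x)):
        x_step = list(x)
        x_step[i] += h
        gradient.append((f(x_step) - fx) / h)
    return gradient
```

Small residual differences like those in the tables (on the order of 1e-4 and below) are expected from truncation and round-off error; a large discrepancy would signal a bug in the analytical derivatives.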
Estimate the parameters
results = the_biogeme.estimate()
*** Initial values of the parameters are obtained from the file __b08boxcox.iter
Parameter values restored from __b08boxcox.iter
Starting values for the algorithm: {'asc_train': -0.48497367640187233, 'b_time': -1.6749085081534583, 'boxcox_parameter': 0.5100594662517955, 'b_cost': -1.0785343734087758, 'asc_car': -0.004623694779428385}
As the model is not too complex, we activate the calculation of second derivatives. To change this behavior, modify the algorithm to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Optimization algorithm has converged.
Relative gradient: 6.882377013481846e-08
Cause of termination: Relative gradient = 6.9e-08 <= 6.1e-06
Number of function evaluations: 1
Number of gradient evaluations: 1
Number of hessian evaluations: 0
Algorithm: Newton with trust region for simple bound constraints
Number of iterations: 0
Optimization time: 0:00:00.005751
Calculate second derivatives and BHHH
File b08boxcox~00.html has been generated.
File b08boxcox~00.yaml has been generated.
print(results.short_summary())
Results for model b08boxcox
Nbr of parameters: 5
Sample size: 6768
Excluded data: 3960
Final log likelihood: -5292.095
Akaike Information Criterion: 10594.19
Bayesian Information Criterion: 10628.29
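The information criteria in the summary follow the usual definitions, AIC = 2K - 2 LL and BIC = K ln(N) - 2 LL, with K = 5 estimated parameters and N = 6768 observations. They can be reproduced from the reported final log likelihood:

```python
import math


def information_criteria(final_log_likelihood: float, n_parameters: int, sample_size: int):
    """Akaike and Bayesian information criteria from the final log likelihood."""
    aic = 2.0 * n_parameters - 2.0 * final_log_likelihood
    bic = n_parameters * math.log(sample_size) - 2.0 * final_log_likelihood
    return aic, bic


# Values reported in the summary above.
aic, bic = information_criteria(-5292.095, 5, 6768)
```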
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
Name Value Robust std err. Robust t-stat. Robust p-value
0 asc_train -0.484974 0.064398 -7.530904 5.040413e-14
1 b_time -1.674909 0.076558 -21.877701 0.000000e+00
2 boxcox_parameter 0.510059 0.077305 6.598023 4.166778e-11
3 b_cost -1.078534 0.068008 -15.858881 0.000000e+00
4 asc_car -0.004624 0.048008 -0.096310 9.232741e-01
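Each robust t-statistic is the estimate divided by its robust standard error, and the p-value is the two-sided tail probability under the standard normal, 2 (1 - Phi(|t|)) = erfc(|t| / sqrt(2)). A quick check against the rounded values in the asc_train and asc_car rows (the tiny insignificant car constant is why its p-value is 0.92):

```python
import math


def robust_t_and_p(value: float, std_err: float):
    """t-statistic and two-sided normal p-value for an estimated parameter."""
    t_stat = value / std_err
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value


t_train, p_train = robust_t_and_p(-0.484974, 0.064398)
t_car, p_car = robust_t_and_p(-0.004624, 0.048008)
```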
Total running time of the script: (0 minutes 3.587 seconds)