Note
Go to the end to download the full example code.
8. Box-Cox transformsΒΆ
Example of a logit model, with a Box-Cox transform of variables.
Michel Bierlaire, EPFL Sat Jun 21 2025, 15:14:39
import biogeme.biogeme_logging as blog
from IPython.core.display_functions import display
from biogeme.biogeme import BIOGEME
from biogeme.expressions import Beta
from biogeme.models import boxcox, loglogit
from biogeme.results_processing import (
EstimationResults,
get_pandas_estimated_parameters,
)
See the data processing script: Data preparation for Swissmetro.
from swissmetro_data import (
CAR_AV_SP,
CAR_CO_SCALED,
CAR_TT_SCALED,
CHOICE,
SM_AV,
SM_COST_SCALED,
SM_TT_SCALED,
TRAIN_AV_SP,
TRAIN_COST_SCALED,
TRAIN_TT_SCALED,
database,
)
logger = blog.get_screen_logger(level=blog.INFO)
Parameters to be estimated.
asc_car = Beta('asc_car', 0, None, None, 0)
asc_train = Beta('asc_train', 0, None, None, 0)
asc_sm = Beta('asc_sm', 0, None, None, 1)
b_time = Beta('b_time', 0, None, None, 0)
b_cost = Beta('b_cost', 0, None, None, 0)
boxcox_parameter = Beta('boxcox_parameter', 1, -10, 10, 0)
Definition of the utility functions.
v_train = (
asc_train
+ b_time * boxcox(TRAIN_TT_SCALED, boxcox_parameter)
+ b_cost * TRAIN_COST_SCALED
)
v_swissmetro = (
asc_sm + b_time * boxcox(SM_TT_SCALED, boxcox_parameter) + b_cost * SM_COST_SCALED
)
v_car = (
asc_car + b_time * boxcox(CAR_TT_SCALED, boxcox_parameter) + b_cost * CAR_CO_SCALED
)
Associate utility functions with the numbering of alternatives.
v = {1: v_train, 2: v_swissmetro, 3: v_car}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Definition of the model. This is the contribution of each observation to the log likelihood function.
log_probability = loglogit(v, av, CHOICE)
Create the Biogeme object.
the_biogeme = BIOGEME(database, log_probability)
the_biogeme.model_name = 'b08_boxcox'
Biogeme parameters read from biogeme.toml.
Check the derivatives of the log likelihood function around 0.
the_biogeme.check_derivatives(verbose=True)
Comparing first derivatives
x Gradient FinDiff Difference
asc_train -1541.5 -1541.5 -0.000305176
b_time -1847.02 -1847.02 -0.00024821
boxcox_parameter 0 0 0
b_cost -224.608 -224.608 -0.000240072
asc_car -99 -98.9997 -0.000305176
Comparing second derivatives
Row Col Hessian FinDiff Difference
asc_train asc_train -1536.25 -1536.25 0.00012207
b_time asc_train -867.351 -867.351 3.94016e-05
boxcox_parameter asc_train 0 0 0
b_cost asc_train 154.532 154.532 -5.5525e-06
asc_car asc_train 623 623 -1.14441e-05
asc_train b_time -867.351 -867.351 8.88401e-06
b_time b_time -1680.78 -1680.78 2.50027e-05
boxcox_parameter b_time -550.136 -550.136 1.66652e-05
b_cost b_time 206.643 206.643 2.73722e-05
asc_car b_time -234.378 -234.378 2.79321e-05
asc_train boxcox_parameter 0 0 0
b_time boxcox_parameter -550.136 -550.136 1.98296e-06
boxcox_parameter boxcox_parameter 0 0 0
b_cost boxcox_parameter 0 0 0
asc_car boxcox_parameter 0 0 0
asc_train b_cost 154.532 154.532 -7.45985e-06
b_time b_cost 206.643 206.643 4.48405e-06
boxcox_parameter b_cost 0 0 0
b_cost b_cost -633.137 -633.137 2.91718e-05
asc_car b_cost 113.971 113.971 2.65164e-05
asc_train asc_car 623 623 -6.10352e-05
b_time asc_car -234.378 -234.378 -6.78168e-07
boxcox_parameter asc_car 0 0 0
b_cost asc_car 113.971 113.971 -1.86496e-07
asc_car asc_car -1246 -1246 6.48498e-05
CheckDerivativesResults(function=-6964.662979192191, analytical_gradient=array([-1541.5 , -1847.01666667, 0. , -224.60833333,
-99. ]), analytical_hessian=array([[-1536.25 , -867.35111111, 0. , 154.53194444,
623. ],
[ -867.35111111, -1680.78233889, -550.13567917, 206.64311667,
-234.37777778],
[ 0. , -550.13567917, 0. , 0. ,
0. ],
[ 154.53194444, 206.64311667, 0. , -633.136825 ,
113.97111111],
[ 623. , -234.37777778, 0. , 113.97111111,
-1246. ]]), finite_differences_gradient=array([-1541.49969482, -1847.01641846, 0. , -224.60809326,
-98.99969482]), finite_differences_hessian=array([[-1536.25012207, -867.35112 , 0. , 154.5319519 ,
623.00006104],
[ -867.35115051, -1680.78236389, -550.13568115, 206.64311218,
-234.3777771 ],
[ 0. , -550.13569583, 0. , 0. ,
0. ],
[ 154.53195 , 206.64308929, 0. , -633.13685417,
113.9711113 ],
[ 623.00001144, -234.37780571, 0. , 113.97108459,
-1246.00006485]]), errors_gradient=array([-0.00030518, -0.00024821, 0. , -0.00024007, -0.00030518]), errors_hessian=array([[ 1.22070312e-04, 8.88400598e-06, 0.00000000e+00,
-7.45985247e-06, -6.10351561e-05],
[ 3.94015841e-05, 2.50027126e-05, 1.98296027e-06,
4.48404950e-06, -6.78168362e-07],
[ 0.00000000e+00, 1.66651472e-05, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00],
[-5.55250375e-06, 2.73722331e-05, 0.00000000e+00,
2.91717529e-05, -1.86496266e-07],
[-1.14440917e-05, 2.79320611e-05, 0.00000000e+00,
2.65163846e-05, 6.48498533e-05]]))
Estimate the parameters.
try:
results = EstimationResults.from_yaml_file(
filename=f'saved_results/{the_biogeme.model_name}.yaml'
)
except FileNotFoundError:
results = the_biogeme.estimate()
*** Initial values of the parameters are obtained from the file __b08_boxcox.iter
Cannot read file __b08_boxcox.iter. Statement is ignored.
Starting values for the algorithm: {}
As the model is not too complex, we activate the calculation of second derivatives. To change this behavior, modify the algorithm to "simple_bounds" in the TOML file.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter. asc_train b_time boxcox_paramete b_cost asc_car Function Relgrad Radius Rho
0 -0.71 -1 1.7 -0.82 -0.5 5.6e+03 0.058 1 0.77 +
1 -0.77 -1.7 0.69 -1.3 -0.35 5.4e+03 0.049 10 1 ++
2 -0.47 -1.7 0.53 -1 0.017 5.3e+03 0.002 1e+02 0.96 ++
3 -0.48 -1.7 0.51 -1.1 -0.0043 5.3e+03 1.3e-05 1e+03 1 ++
4 -0.48 -1.7 0.51 -1.1 -0.0043 5.3e+03 1.8e-09 1e+03 1 ++
Optimization algorithm has converged.
Relative gradient: 1.7992210209022422e-09
Cause of termination: Relative gradient = 1.8e-09 <= 6.1e-06
Number of function evaluations: 16
Number of gradient evaluations: 11
Number of hessian evaluations: 5
Algorithm: Newton with trust region for simple bound constraints
Number of iterations: 5
Proportion of Hessian calculation: 5/5 = 100.0%
Optimization time: 0:00:00.071980
Calculate second derivatives and BHHH
File b08_boxcox.html has been generated.
File b08_boxcox.yaml has been generated.
print(results.short_summary())
Results for model b08_boxcox
Nbr of parameters: 5
Sample size: 6768
Excluded data: 3960
Final log likelihood: -5292.095
Akaike Information Criterion: 10594.19
Bayesian Information Criterion: 10628.29
pandas_results = get_pandas_estimated_parameters(estimation_results=results)
display(pandas_results)
Name Value Robust std err. Robust t-stat. Robust p-value
0 asc_train -0.484973 0.064398 -7.530893 5.040413e-14
1 b_time -1.674910 0.076558 -21.877709 0.000000e+00
2 boxcox_parameter 0.510059 0.077305 6.598018 4.166911e-11
3 b_cost -1.078535 0.068008 -15.858881 0.000000e+00
4 asc_car -0.004623 0.048008 -0.096303 9.232798e-01
Total running time of the script: (0 minutes 1.553 seconds)