Re-estimate the Pareto optimal models

The assisted specification algorithm generates a file containing the Pareto optimal specifications. This script re-estimates these Pareto optimal models. The catalog of specifications is defined in Specification of a catalog of models.

Author: Michel Bierlaire, EPFL
Date: Wed Apr 12 17:46:14 2023

import biogeme.biogeme_logging as blog

try:
    import matplotlib.pyplot as plt

    can_plot = True
except ModuleNotFoundError:
    can_plot = False
from biogeme_optimization.exceptions import OptimizationError
from biogeme.assisted import ParetoPostProcessing
from biogeme.results import compile_estimation_results
from plot_b21multiple_models_spec import the_biogeme

PARETO_FILE_NAME = 'saved_results/b21multiple_models.pareto'

logger = blog.get_screen_logger(blog.INFO)
logger.info('Example b21process_pareto.py')

CSV_FILE = 'b21process_pareto.csv'
SEP_CSV = ','

Example b21process_pareto.py

The constructor of the Pareto post-processing object takes two arguments:

- biogeme_object: the Biogeme object,
- pareto_file_name: the name of the file where the algorithm has stored the estimated models.

the_pareto_post = ParetoPostProcessing(
    biogeme_object=the_biogeme,
    pareto_file_name=PARETO_FILE_NAME,
)

Pareto set initialized from file with 36 elements [8 Pareto] and 0 invalid elements.

the_pareto_post.log_statistics()

Pareto: 8
Considered: 36
Removed: 4
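These three statistics can be read as follows: "Considered" counts every model inserted into the set, "Pareto" counts the models that are currently non-dominated, and "Removed" presumably counts the models that were non-dominated when inserted but were later dominated by a newer model. The following is a minimal pure-Python sketch of such a set, assuming that the two objectives are the negative log likelihood and the number of parameters (both minimized), as in the plot at the end of this example. It is not Biogeme's implementation; the names Candidate, dominates and ParetoSet are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record: one estimated model and its two objectives."""
    name: str
    neg_loglike: float  # objective 1, to be minimized
    n_params: int       # objective 2, to be minimized


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.neg_loglike <= b.neg_loglike and a.n_params <= b.n_params
    better = a.neg_loglike < b.neg_loglike or a.n_params < b.n_params
    return no_worse and better


@dataclass
class ParetoSet:
    pareto: list = field(default_factory=list)      # currently non-dominated
    considered: list = field(default_factory=list)  # every model ever inserted
    removed: list = field(default_factory=list)     # once Pareto, later dominated

    def add(self, cand: Candidate) -> bool:
        """Insert cand and return True if it enters the Pareto set."""
        self.considered.append(cand)
        if any(dominates(p, cand) for p in self.pareto):
            return False
        # Models dominated by the newcomer were Pareto optimal until now.
        self.removed.extend(p for p in self.pareto if dominates(cand, p))
        self.pareto = [p for p in self.pareto if not dominates(cand, p)]
        self.pareto.append(cand)
        return True


# Hypothetical objective values, for illustration only.
ps = ParetoSet()
ps.add(Candidate("m1", 5300.0, 4))
ps.add(Candidate("m2", 5000.0, 7))
ps.add(Candidate("m3", 4990.0, 7))  # dominates m2, which is removed
ps.add(Candidate("m4", 5400.0, 5))  # dominated by m1, never enters the set
print(len(ps.considered), len(ps.pareto), len(ps.removed))  # 4 2 1
```

A model that never enters the Pareto set (such as m4) is counted as considered but not as removed, which is why the three counts need not add up.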

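Complete re-estimation of the Pareto optimal models, including the calculation of the statistics. Setting recycle=False forces a new estimation instead of reading previously saved results.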

all_results = the_pareto_post.reestimate(recycle=False)

Biogeme parameters provided by the user.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b21multiple_models_000000.iter
Parameter values restored from __b21multiple_models_000000.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.         ASC_CAR      ASC_CAR_GA       ASC_TRAIN    ASC_TRAIN_GA          B_COST          B_TIME     lambda_time     Function    Relgrad   Radius      Rho
    0          -0.062           -0.34           -0.94             1.9            -1.1            -1.7            0.36        5e+03     0.0063       10        1   ++
    1          -0.064           -0.31              -1               2            -1.1            -1.7            0.38        5e+03    0.00022    1e+02        1   ++
    2          -0.064           -0.31              -1               2            -1.1            -1.7            0.38        5e+03    2.2e-07    1e+02        1   ++
Results saved in file b21multiple_models_000000~00.html
Results saved in file b21multiple_models_000000~00.pickle
Biogeme parameters provided by the user.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b21multiple_models_000001.iter
Parameter values restored from __b21multiple_models_000001.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.         ASC_CAR      ASC_CAR_GA    ASC_CAR_male       ASC_TRAIN    ASC_TRAIN_GA  ASC_TRAIN_male          B_COST          B_TIME     lambda_time     Function    Relgrad   Radius      Rho
    0           -0.42           -0.45            0.41           -0.22               2            -1.1            -1.1            -1.7            0.34      4.9e+03     0.0002       10        1   ++
    1           -0.42           -0.45            0.41           -0.22               2            -1.1            -1.1            -1.7            0.34      4.9e+03    1.9e-07       10        1   ++
Results saved in file b21multiple_models_000001~00.html
Results saved in file b21multiple_models_000001~00.pickle
Biogeme parameters provided by the user.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b21multiple_models_000002.iter
Parameter values restored from __b21multiple_models_000002.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Results saved in file b21multiple_models_000002~00.html
Results saved in file b21multiple_models_000002~00.pickle
Biogeme parameters provided by the user.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b21multiple_models_000003.iter
Parameter values restored from __b21multiple_models_000003.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.         ASC_CAR      ASC_CAR_GA    ASC_CAR_male       ASC_TRAIN    ASC_TRAIN_GA  ASC_TRAIN_male          B_COST       B_COST_GA          B_TIME     lambda_time     Function    Relgrad   Radius      Rho
    0            -0.4            -0.8            0.39           -0.25             1.9            -1.1              -1            0.89            -1.6            0.39      4.9e+03      0.012       10        1   ++
    1           -0.42              -1            0.41           -0.22               2            -1.2            -1.1            0.92            -1.7            0.33      4.9e+03     0.0004    1e+02        1   ++
    2           -0.42              -1            0.41           -0.22               2            -1.2            -1.1            0.92            -1.7            0.33      4.9e+03    2.5e-06    1e+02        1   ++
Results saved in file b21multiple_models_000003~00.html
Results saved in file b21multiple_models_000003~00.pickle
Biogeme parameters provided by the user.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b21multiple_models_000004.iter
Parameter values restored from __b21multiple_models_000004.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.         ASC_CAR      ASC_CAR_GA    ASC_CAR_male       ASC_TRAIN    ASC_TRAIN_GA  ASC_TRAIN_male          B_COST B_COST_inc-100+ B_COST_inc-50-1 B_COST_inc-unde B_COST_inc-unkn          B_TIME     lambda_time     Function    Relgrad   Radius      Rho
    0           -0.46           -0.32            0.45           -0.28               2            -1.1            -1.5            0.58             0.2           -0.62            0.79            -1.7            0.33      4.9e+03     0.0021       10        1   ++
    1           -0.46           -0.32            0.45           -0.28               2            -1.1            -1.5            0.58             0.2           -0.62            0.79            -1.7            0.33      4.9e+03    3.5e-05       10        1   ++
Results saved in file b21multiple_models_000004~00.html
Results saved in file b21multiple_models_000004~00.pickle
Biogeme parameters provided by the user.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b21multiple_models_000005.iter
Parameter values restored from __b21multiple_models_000005.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.         ASC_CAR       ASC_TRAIN          B_COST          B_TIME     Function    Relgrad   Radius      Rho
    0          -0.072           -0.73           -0.93            -1.2      5.3e+03      0.017        1     0.82    +
    1           -0.15           -0.71            -1.1            -1.3      5.3e+03     0.0009       10        1   ++
    2           -0.15           -0.71            -1.1            -1.3      5.3e+03    3.9e-06       10        1   ++
Results saved in file b21multiple_models_000005~00.html
Results saved in file b21multiple_models_000005~00.pickle
Biogeme parameters provided by the user.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b21multiple_models_000006.iter
Parameter values restored from __b21multiple_models_000006.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.         ASC_CAR       ASC_TRAIN          B_COST          B_TIME     lambda_time     Function    Relgrad   Radius      Rho
    0         -0.0036           -0.37            -1.1            -1.7             0.5      5.3e+03      0.016        1     0.83    +
    1         -0.0049           -0.48            -1.1            -1.7            0.51      5.3e+03    0.00057       10        1   ++
    2         -0.0049           -0.48            -1.1            -1.7            0.51      5.3e+03    8.2e-07       10        1   ++
Results saved in file b21multiple_models_000006~00.html
Results saved in file b21multiple_models_000006~00.pickle
Biogeme parameters provided by the user.
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
*** Initial values of the parameters are obtained from the file __b21multiple_models_000007.iter
Parameter values restored from __b21multiple_models_000007.iter
As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds"
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter.         ASC_CAR      ASC_CAR_GA    ASC_CAR_male       ASC_TRAIN    ASC_TRAIN_GA  ASC_TRAIN_male          B_COST          B_TIME     Function    Relgrad   Radius      Rho
    0           -0.42           -0.45            0.41           -0.22               2            -1.2            -1.1            -1.7      4.9e+03    3.7e-05        1        1
Results saved in file b21multiple_models_000007~00.html
Results saved in file b21multiple_models_000007~00.pickle
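In the logs above, the parameter lambda_time only appears for the models whose description (listed further below) contains TRAIN_TT:boxcox. This suggests, although the catalog itself is defined in another script, that lambda_time is the parameter of a Box-Cox transform of the train travel time. The sketch below assumes the usual definition (x**lam - 1) / lam, whose limit as lam tends to 0 is log(x); it is an illustration, not Biogeme's own function.

```python
import math


def boxcox(x: float, lam: float, eps: float = 1e-5) -> float:
    """Box-Cox transform (x**lam - 1) / lam, using the log limit when lam is near 0.

    Assumption for this sketch: x must be strictly positive.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires a strictly positive argument")
    if abs(lam) < eps:
        return math.log(x)  # limit of the transform as lam -> 0
    return (x ** lam - 1.0) / lam


# lam = 1 differs from the linear specification only by the constant -1,
# lam -> 0 gives the log specification; lambda_time = 0.382 in Model_000000.
print(boxcox(2.0, 1.0), boxcox(2.0, 0.0), boxcox(2.0, 0.382))
```

Under this reading, the linear and log specifications in the catalog are the two special cases of the Box-Cox specification, at the cost of one extra parameter.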

summary, description = compile_estimation_results(all_results, use_short_names=True)
print(summary)

                                   Model_000000  ...     Model_000007
Number of estimated parameters                7  ...                8
Sample size                                6768  ...             6768
Final log likelihood               -4995.755387  ...     -4900.883444
Akaike Information Criterion       10005.510775  ...      9817.766888
Bayesian Information Criterion     10053.250501  ...      9872.326575
ASC_CAR (t-test)                -0.064  (-1.22)  ...  -0.389  (-3.95)
ASC_CAR_GA (t-test)             -0.313  (-1.59)  ...  -0.415  (-2.02)
ASC_TRAIN (t-test)               -1.03  (-13.9)  ...  -0.203  (-2.23)
ASC_TRAIN_GA (t-test)              2.04  (22.8)  ...     2.03  (22.4)
B_COST (t-test)                   -1.1  (-14.8)  ...   -1.06  (-15.2)
B_TIME (t-test)                  -1.67  (-21.3)  ...    -1.7  (-21.5)
lambda_time (t-test)              0.382  (5.18)  ...
ASC_CAR_male (t-test)                            ...    0.377  (3.65)
ASC_TRAIN_male (t-test)                          ...    -1.2  (-14.1)
B_COST_GA (t-test)                               ...
B_COST_inc-100+ (t-test)                         ...
B_COST_inc-50-100 (t-test)                       ...
B_COST_inc-under50 (t-test)                      ...
B_COST_inc-unknown (t-test)                      ...

[19 rows x 8 columns]
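The information criteria in the summary follow the usual definitions, AIC = 2K - 2 ln L and BIC = K ln N - 2 ln L, where K is the number of estimated parameters, N the sample size and ln L the final log likelihood. The short check below recomputes them from the values printed in the table; lower values are preferred, and both criteria penalize additional parameters.

```python
import math


def aic(loglike: float, n_params: int) -> float:
    """Akaike Information Criterion: 2K - 2 ln L."""
    return 2 * n_params - 2 * loglike


def bic(loglike: float, n_params: int, sample_size: int) -> float:
    """Bayesian Information Criterion: K ln N - 2 ln L."""
    return n_params * math.log(sample_size) - 2 * loglike


# Final log likelihood and number of parameters copied from the summary table.
for name, loglike, k in [("Model_000000", -4995.755387, 7), ("Model_000007", -4900.883444, 8)]:
    print(name, round(aic(loglike, k), 3), round(bic(loglike, k, 6768), 3))
```

Model_000007 has one more parameter than Model_000000 but a much better log likelihood, so it is preferred by both criteria.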

print(f'Summary table available in {CSV_FILE}')
summary.to_csv(CSV_FILE, sep=SEP_CSV)

Summary table available in b21process_pareto.csv

Explanation of the short names of the models. Each mapping is printed and also appended to the CSV file.

with open(CSV_FILE, 'a', encoding='utf-8') as f:
    print('\n\n', file=f)
    for k, v in description.items():
        if k != v:
            print(f'{k}: {v}')
            print(f'{k}{SEP_CSV}{v}', file=f)

Model_000000: ASC:GA;B_COST:no_seg;TRAIN_TT:boxcox
Model_000001: ASC:MALE-GA;B_COST:no_seg;TRAIN_TT:boxcox
Model_000002: ASC:GA;B_COST:no_seg;TRAIN_TT:log
Model_000003: ASC:MALE-GA;B_COST:GA;TRAIN_TT:boxcox
Model_000004: ASC:MALE-GA;B_COST:INCOME;TRAIN_TT:boxcox
Model_000005: ASC:no_seg;B_COST:no_seg;TRAIN_TT:linear
Model_000006: ASC:no_seg;B_COST:no_seg;TRAIN_TT:boxcox
Model_000007: ASC:MALE-GA;B_COST:no_seg;TRAIN_TT:log
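Each description is a semicolon-separated list of group:choice pairs, for instance ASC:GA;B_COST:no_seg;TRAIN_TT:boxcox. A small helper such as parse_specification (hypothetical, not part of Biogeme) makes it easy to query the Pareto optimal models, here to list those using the boxcox specification for TRAIN_TT. The descriptions are copied from the output above.

```python
def parse_specification(spec: str) -> dict[str, str]:
    """Split 'GROUP:choice;GROUP:choice;...' into {'GROUP': 'choice', ...}."""
    result = {}
    for item in spec.split(";"):
        group, _, choice = item.partition(":")
        result[group] = choice
    return result


descriptions = {
    "Model_000000": "ASC:GA;B_COST:no_seg;TRAIN_TT:boxcox",
    "Model_000001": "ASC:MALE-GA;B_COST:no_seg;TRAIN_TT:boxcox",
    "Model_000002": "ASC:GA;B_COST:no_seg;TRAIN_TT:log",
    "Model_000003": "ASC:MALE-GA;B_COST:GA;TRAIN_TT:boxcox",
    "Model_000004": "ASC:MALE-GA;B_COST:INCOME;TRAIN_TT:boxcox",
    "Model_000005": "ASC:no_seg;B_COST:no_seg;TRAIN_TT:linear",
    "Model_000006": "ASC:no_seg;B_COST:no_seg;TRAIN_TT:boxcox",
    "Model_000007": "ASC:MALE-GA;B_COST:no_seg;TRAIN_TT:log",
}
parsed = {name: parse_specification(spec) for name, spec in descriptions.items()}

# Models whose train travel time uses the Box-Cox specification.
boxcox_models = [name for name, spec in parsed.items() if spec["TRAIN_TT"] == "boxcox"]
print(boxcox_models)
```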

The following plot illustrates all the models that have been estimated. Each dot corresponds to a model: its x-coordinate is the negative log likelihood and its y-coordinate is the number of parameters. A circle denotes a Pareto optimal model. A cross denotes a model that was Pareto optimal at some point during the algorithm but was later removed because a newly found model dominates it. The plot is only produced if matplotlib is installed.

if can_plot:
    try:
        _ = the_pareto_post.plot(
            label_x='Negative loglikelihood', label_y='Number of parameters'
        )
        plt.show()
    except OptimizationError as e:
        print(f'No plot available: {e}')

Total running time of the script: (0 minutes 1.547 seconds)
