Re-estimate the Pareto optimal models

The assisted specification algorithm generates a file containing the Pareto optimal specifications. This script re-estimates the corresponding models. The catalog of specifications is defined in Specification of a catalog of models.

Michel Bierlaire, EPFL Sat Jun 28 2025, 12:35:57m

from biogeme.results_processing import compile_estimation_results

try:
    import matplotlib.pyplot as plt

    can_plot = True
except ModuleNotFoundError:
    can_plot = False
from biogeme.assisted import ParetoPostProcessing
from plot_b22b_multiple_models_spec import the_biogeme, PARETO_FILE_NAME


PATH_PARETO_FILE_NAME = f'saved_results/{PARETO_FILE_NAME}'

CSV_FILE = 'b22_process_pareto.csv'
SEP_CSV = ','

The constructor of the Pareto post-processing object takes two arguments:

  • the biogeme object,

  • the name of the file where the algorithm has stored the estimated models.

the_pareto_post = ParetoPostProcessing(
    biogeme_object=the_biogeme,
    pareto_file_name=PATH_PARETO_FILE_NAME,
)
the_pareto_post.log_statistics()
all_results = the_pareto_post.reestimate(recycle=True)
summary, description = compile_estimation_results(all_results, use_short_names=True)
print(summary)
                                        Model_000000  ...      Model_000006
Number of estimated parameters                    10  ...                 7
Sample size                                     6768  ...              6768
Final log likelihood                       -594197.4  ...         -5728.627
Akaike Information Criterion                 1188415  ...          11471.25
Bayesian Information Criterion               1188483  ...          11518.99
asc_train_ref (t-test)                1.83  (0.0638)  ...    -1.26  (-6.12)
asc_train_diff_male (t-test)          0.663  (0.303)  ...
asc_train_diff_with_ga (t-test)       4.09  (0.0426)  ...      2.12  (9.55)
b_time (t-test)                      -0.274  (-3.03)  ...       -3  (-19.3)
lambda_tt (t-test)                      2.39  (7.74)  ...
b_cost (t-test)                  -0.104  (-0.000497)  ...    -1.45  (-14.9)
b_headway (t-test)                  1.72  (1.15e-07)  ...  -0.0117  (-3.54)
asc_car_ref (t-test)             -2.63  (-1.46e-308)  ...   -0.233  (-2.53)
asc_car_diff_male (t-test)       -1.39  (-7.72e-309)  ...
asc_car_diff_with_ga (t-test)       -4  (-2.22e-308)  ...     -1.9  (-9.78)
asc_train (t-test)                                    ...
asc_car (t-test)                                      ...

[17 rows x 7 columns]
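
As a sanity check on the table above, the two information criteria can be recomputed from the reported log likelihood, number of parameters and sample size. The sketch below assumes the standard definitions AIC = 2K - 2 ln L and BIC = K ln N - 2 ln L, and uses the rounded values printed for Model_000006. The function names are illustrative and are not part of Biogeme.

```python
import math


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2K - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(log_likelihood: float, n_parameters: int, sample_size: int) -> float:
    """Bayesian Information Criterion: K ln N - 2 ln L."""
    return n_parameters * math.log(sample_size) - 2 * log_likelihood


# Values copied from the Model_000006 column of the summary table.
LOG_LIKELIHOOD = -5728.627
N_PARAMETERS = 7
SAMPLE_SIZE = 6768

aic_value = aic(LOG_LIKELIHOOD, N_PARAMETERS)
bic_value = bic(LOG_LIKELIHOOD, N_PARAMETERS, SAMPLE_SIZE)
print(f'AIC = {aic_value:.2f}')
print(f'BIC = {bic_value:.2f}')
```

Up to rounding, both values match the AIC and BIC reported for Model_000006 in the table.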
print(f'Summary table available in {CSV_FILE}')
summary.to_csv(CSV_FILE, sep=SEP_CSV)
Summary table available in b22_process_pareto.csv

The short names of the models are explained below. The same explanation is appended to the CSV file.

with open(CSV_FILE, 'a', encoding='utf-8') as f:
    print('\n\n', file=f)
    for k, v in description.items():
        if k != v:
            print(f'{k}: {v}')
            print(f'{k}{SEP_CSV}{v}', file=f)
Model_000000: asc:MALE-GA;train_cost_catalog:log;train_headway_catalog:with_headway;train_tt_catalog:boxcox
Model_000001: asc:MALE-GA;train_cost_catalog:log;train_headway_catalog:with_headway;train_tt_catalog:log
Model_000002: asc:MALE-GA;train_cost_catalog:log;train_headway_catalog:without_headway;train_tt_catalog:log
Model_000003: asc:no_seg;train_cost_catalog:linear;train_headway_catalog:without_headway;train_tt_catalog:sqrt
Model_000004: asc:no_seg;train_cost_catalog:sqrt;train_headway_catalog:with_headway;train_tt_catalog:sqrt
Model_000005: asc:GA;train_cost_catalog:log;train_headway_catalog:without_headway;train_tt_catalog:sqrt
Model_000006: asc:GA;train_cost_catalog:log;train_headway_catalog:with_headway;train_tt_catalog:log
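
Judging from the output above, each description appears to be a semicolon-separated list of `catalog:choice` pairs. Under that assumption, a minimal sketch for turning one description into a dictionary could look as follows. The helper `parse_description` is hypothetical and is not part of Biogeme.

```python
def parse_description(description: str) -> dict[str, str]:
    """Split 'catalog:choice;catalog:choice;...' into a dictionary.

    Assumes the format seen in the output above: pairs separated by ';',
    catalog name and selected choice separated by the first ':'.
    """
    pairs = (item.split(':', 1) for item in description.split(';') if item)
    return {catalog: choice for catalog, choice in pairs}


# Description of Model_000006, copied from the output above.
model_6 = (
    'asc:GA;train_cost_catalog:log;'
    'train_headway_catalog:with_headway;train_tt_catalog:log'
)

for catalog, choice in parse_description(model_6).items():
    print(f'{catalog:<25} {choice}')
```

A dictionary of this kind makes it easier to compare models, for example to list all models that share the same travel time specification.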

The following plot shows all models that have been estimated. Each marker corresponds to a model: its x-coordinate is the Akaike Information Criterion (AIC) and its y-coordinate is the Bayesian Information Criterion (BIC). Note that a third objective, the number of parameters, does not appear in this picture. A circle denotes a Pareto optimal model. A cross denotes a model that was Pareto optimal at some point during the algorithm and was later removed because a model dominating it was found. A star denotes a model that has been deemed invalid.

if can_plot:
    _ = the_pareto_post.plot(label_x='AIC', label_y='BIC')
    plt.show()
[Figure: all estimated models, AIC (x-axis) versus BIC (y-axis)]
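
The markers in the plot distinguish Pareto optimal models from dominated ones. As a minimal sketch of that notion, assuming that all three objectives (AIC, BIC, number of parameters) are minimized, the standard definition of dominance can be written as follows. This is not Biogeme's implementation, and the model names and objective values are hypothetical.

```python
Objectives = tuple[float, float, int]  # (AIC, BIC, number of parameters)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(models: dict[str, Objectives]) -> dict[str, Objectives]:
    """Keep the models that are not dominated by any other model."""
    return {
        name: objectives
        for name, objectives in models.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in models.items()
            if other_name != name
        )
    }


# Hypothetical objective values, for illustration only.
toy_models = {
    'A': (100.0, 120.0, 5),
    'B': (95.0, 125.0, 7),
    'C': (110.0, 130.0, 8),  # worse than A on all three objectives
    'D': (98.0, 118.0, 9),
}

print(sorted(pareto_front(toy_models)))
```

In this toy example, model C is dominated by model A and is dropped, while A, B and D each win on at least one objective and remain on the front.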

Any two of the three objectives can be plotted against each other. Here, the AIC (objective 0) is plotted against the number of parameters (objective 2).

if can_plot:
    _ = the_pareto_post.plot(
        objective_x=0, objective_y=2, label_x='AIC', label_y='Number of parameters'
    )
    plt.show()
[Figure: all estimated models, AIC (x-axis) versus number of parameters (y-axis)]

Similarly, the BIC (objective 1) can be plotted against the number of parameters (objective 2).

if can_plot:
    _ = the_pareto_post.plot(
        objective_x=1, objective_y=2, label_x='BIC', label_y='Number of parameters'
    )
    plt.show()
[Figure: all estimated models, BIC (x-axis) versus number of parameters (y-axis)]

Total running time of the script: (0 minutes 10.330 seconds)
