Re-estimation of best models

After running the assisted specification algorithm for the 432 specifications in Combination of many specifications, we use post-processing to re-estimate all Pareto optimal models, and display some information about the algorithm. See Bierlaire and Ortelli (2023).
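To make the notion of Pareto optimality concrete, here is a minimal sketch, independent of Biogeme, that keeps only the non-dominated models in a small set. It assumes every objective is minimized; the model names and objective values are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and strictly
    better on at least one. All objectives are assumed to be minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_optimal(points: dict[str, tuple[float, ...]]) -> set[str]:
    """Return the names of the points that no other point dominates."""
    return {
        name
        for name, objectives in points.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in points.items()
            if other_name != name
        )
    }


# Hypothetical (negative log likelihood, number of parameters) pairs.
candidates = {
    'model_a': (8100.0, 10),
    'model_b': (8060.0, 16),
    'model_c': (8120.0, 12),  # dominated by model_a
    'model_d': (8060.0, 18),  # dominated by model_b
}
print(sorted(pareto_optimal(candidates)))
```

Neither model_a nor model_b dominates the other: one fits better, the other uses fewer parameters. This trade-off is why a Pareto set usually contains several models.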

Michel Bierlaire, EPFL Sun Apr 27 2025, 18:38:57

from IPython.core.display_functions import display

from biogeme.biogeme import BIOGEME
from biogeme.results_processing import get_pandas_estimated_parameters

try:
    import matplotlib.pyplot as plt

    can_plot = True
except ModuleNotFoundError:
    can_plot = False
import biogeme.biogeme_logging as blog
from biogeme.assisted import ParetoPostProcessing

from everything_spec import model_catalog, database

logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b08selected_specification')

PARETO_FILE_NAME = 'saved_results/b07everything_assisted.pareto'
Example b08selected_specification

Create the biogeme object from the catalog.

the_biogeme = BIOGEME(database, model_catalog)
the_biogeme.model_name = 'b09post_processing'
Biogeme parameters read from biogeme.toml.

Create the post-processing object.

post_processing = ParetoPostProcessing(
    biogeme_object=the_biogeme, pareto_file_name=PARETO_FILE_NAME
)
Pareto set initialized from file with 162 elements [13 Pareto] and 0 invalid elements.

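Re-estimate the models. With recycle=True, results already saved in YAML files are read back instead of being estimated again, as the messages below show.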

all_results = post_processing.reestimate(recycle=True)
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000000.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000001.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000002.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000003.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000004.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000005.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000006.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000007.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000008.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000009.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000010.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000011.yaml. There is no guarantee that they correspond to the specified model.
Biogeme parameters provided by the user.
Estimation results read from b09post_processing_000012.yaml. There is no guarantee that they correspond to the specified model.

We retrieve the first estimation results for illustration.

spec, results = next(iter(all_results.items()))
print(spec)
asc:GA-LUGGAGE;b_cost_gen_altspec:altspec;b_time:COMMUTERS;b_time_gen_altspec:altspec;model_catalog:nested existing;train_tt_catalog:power
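The identifier printed above looks like a semicolon-separated list of catalog:selection pairs. Assuming that format holds, the following sketch splits it into a dictionary for easier reading. The helper parse_specification is hypothetical and is not part of Biogeme.

```python
def parse_specification(spec_id: str) -> dict[str, str]:
    """Split 'catalog:selection;catalog:selection' into a dictionary.

    Assumes the format seen in the printed identifier above.
    """
    configuration = {}
    for term in spec_id.split(';'):
        catalog, _, selection = term.partition(':')
        configuration[catalog] = selection
    return configuration


# Identifier copied from the output above.
spec_id = (
    'asc:GA-LUGGAGE;b_cost_gen_altspec:altspec;b_time:COMMUTERS;'
    'b_time_gen_altspec:altspec;model_catalog:nested existing;'
    'train_tt_catalog:power'
)
for catalog, selection in parse_specification(spec_id).items():
    print(f'{catalog:20s} -> {selection}')
```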
print(results.short_summary())
Results for model b09post_processing_000000
Nbr of parameters:              16
Sample size:                    10719
Excluded data:                  9
Final log likelihood:           -8062.586
Akaike Information Criterion:   16157.17
Bayesian Information Criterion: 16273.65
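As a sanity check, the two information criteria in the summary can be recomputed from the other reported quantities with the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The sketch below uses only the values printed above and agrees with the reported criteria up to rounding.

```python
import math

# Values copied from the summary above.
n_parameters = 16
sample_size = 10719
final_log_likelihood = -8062.586

# Standard definitions of the two criteria.
aic = 2 * n_parameters - 2 * final_log_likelihood
bic = n_parameters * math.log(sample_size) - 2 * final_log_likelihood

print(f'AIC: {aic:.2f}')
print(f'BIC: {bic:.2f}')
```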
estimated_parameters = get_pandas_estimated_parameters(estimation_results=results)
display(estimated_parameters)
                                Name     Value  ...  Robust t-stat.  Robust p-value
0                      asc_train_ref -0.855115  ...       -8.524096    0.000000e+00
1                  asc_train_diff_GA  0.972271  ...       12.327532    0.000000e+00
2            asc_train_diff_one_lugg  0.326023  ...        4.900363    9.565975e-07
3        asc_train_diff_several_lugg  0.239988  ...        1.540490    1.234409e-01
4                   b_time_train_ref -1.349547  ...      -21.225551    0.000000e+00
5        b_time_train_diff_commuters  0.441164  ...        3.234451    1.218768e-03
6                             b_cost -0.635630  ...      -13.634236    0.000000e+00
7                        mu_existing  1.728921  ...       15.527111    0.000000e+00
8                        asc_car_ref -0.497087  ...       -5.543355    2.967311e-08
9                    asc_car_diff_GA -0.314420  ...       -2.495324    1.258422e-02
10             asc_car_diff_one_lugg -0.070352  ...       -1.485807    1.373301e-01
11         asc_car_diff_several_lugg -0.355340  ...       -1.999446    4.556013e-02
12                    b_time_car_ref -1.005688  ...      -17.517694    0.000000e+00
13         b_time_car_diff_commuters  0.681442  ...        3.543630    3.946583e-04
14             b_time_swissmetro_ref -1.648466  ...      -23.863946    0.000000e+00
15  b_time_swissmetro_diff_commuters  1.639449  ...        8.447529    0.000000e+00

[16 rows x 5 columns]
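The robust p-values in the table appear consistent with a two-sided test under a standard normal approximation. Assuming that convention, the sketch below recomputes a few of them from the robust t-statistics and flags the parameters that are not significant at the 5% level. The helper two_sided_p_value is illustrative and not a Biogeme function.

```python
import math


def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value under a standard normal approximation:
    2 * (1 - Phi(|t|)), written with the complementary error function."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# Robust t-statistics copied from the table above.
robust_t_stats = {
    'asc_train_diff_one_lugg': 4.900363,
    'asc_train_diff_several_lugg': 1.540490,
    'asc_car_diff_one_lugg': -1.485807,
    'asc_car_diff_several_lugg': -1.999446,
}
for name, t_stat in robust_t_stats.items():
    p_value = two_sided_p_value(t_stat)
    flag = '' if p_value < 0.05 else '  (not significant at 5%)'
    print(f'{name:30s} {p_value:.6e}{flag}')
```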

The following plot illustrates all models that have been estimated. Each dot corresponds to a model. As configured in the call below, the x-coordinate corresponds to the number of parameters (objective 1) and the y-coordinate to the negative log likelihood (objective 0). If the dot is a circle, the model is Pareto optimal. If it is a cross, the model was Pareto optimal at some point during the algorithm and was later removed because a new model dominating it was found. If it is a star, the model has been deemed invalid.

if can_plot:
    _ = post_processing.plot(
        label_x='Nbr of parameters',
        label_y='Negative log likelihood',
        objective_x=1,
        objective_y=0,
    )
[Figure: plot b09post processing]
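The marker convention described above (circle, cross, star) can be mimicked with a plain scatter plot. This is only a sketch of the idea, not the code behind post_processing.plot. The status labels and data points are hypothetical, the axis labels mirror those passed in the call above, and matplotlib is treated as optional, as in the script itself.

```python
# Hypothetical status labels mapped to the marker shapes described above.
MARKERS = {'pareto': 'o', 'removed': 'x', 'invalid': '*'}


def group_by_status(models):
    """Group (x, y, status) triples into per-status coordinate lists."""
    groups = {status: ([], []) for status in MARKERS}
    for x, y, status in models:
        xs, ys = groups[status]
        xs.append(x)
        ys.append(y)
    return groups


# Illustrative (number of parameters, negative log likelihood, status) triples.
models = [
    (10, 8100.0, 'pareto'),
    (16, 8062.6, 'pareto'),
    (12, 8120.0, 'removed'),
    (20, 8300.0, 'invalid'),
]
groups = group_by_status(models)

try:
    import matplotlib.pyplot as plt
except ImportError:
    plt = None

if plt is not None:
    fig, ax = plt.subplots()
    # One scatter call per status so that each gets its own marker.
    for status, (xs, ys) in groups.items():
        ax.scatter(xs, ys, marker=MARKERS[status], label=status)
    ax.set_xlabel('Nbr of parameters')
    ax.set_ylabel('Negative log likelihood')
    ax.legend()
```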

