.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/swissmetro/plot_b04validation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_swissmetro_plot_b04validation.py:


Out-of-sample validation
========================

Example of the out-of-sample validation of a logit model.

:author: Michel Bierlaire, EPFL
:date: Sun Apr  9 17:24:32 2023

.. GENERATED FROM PYTHON SOURCE LINES 12-16

.. code-block:: default

    import biogeme.biogeme as bio
    from biogeme import models
    from biogeme.expressions import Beta

.. GENERATED FROM PYTHON SOURCE LINES 17-18

See the data processing script: :ref:`swissmetro_data`.

.. GENERATED FROM PYTHON SOURCE LINES 18-32

.. code-block:: default

    from swissmetro_data import (
        database,
        CHOICE,
        SM_AV,
        CAR_AV_SP,
        TRAIN_AV_SP,
        TRAIN_TT_SCALED,
        TRAIN_COST_SCALED,
        SM_TT_SCALED,
        SM_COST_SCALED,
        CAR_TT_SCALED,
        CAR_CO_SCALED,
    )

.. GENERATED FROM PYTHON SOURCE LINES 33-34

Parameters to be estimated. The last argument of ``Beta`` is the status
flag: 0 means the parameter is estimated, 1 means it is held fixed at its
initial value. ``ASC_SM`` is therefore fixed to zero.

.. GENERATED FROM PYTHON SOURCE LINES 34-40

.. code-block:: default

    ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
    ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
    ASC_SM = Beta('ASC_SM', 0, None, None, 1)
    B_TIME = Beta('B_TIME', 0, None, None, 0)
    B_COST = Beta('B_COST', 0, None, None, 0)

.. GENERATED FROM PYTHON SOURCE LINES 41-42

Definition of the utility functions.

.. GENERATED FROM PYTHON SOURCE LINES 42-46

.. code-block:: default

    V1 = ASC_TRAIN + B_TIME * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
    V2 = ASC_SM + B_TIME * SM_TT_SCALED + B_COST * SM_COST_SCALED
    V3 = ASC_CAR + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED

.. GENERATED FROM PYTHON SOURCE LINES 47-48

Associate utility functions with the numbering of alternatives.

.. GENERATED FROM PYTHON SOURCE LINES 48-50

.. code-block:: default

    V = {1: V1, 2: V2, 3: V3}

.. GENERATED FROM PYTHON SOURCE LINES 51-52

Associate the availability conditions with the alternatives.

.. GENERATED FROM PYTHON SOURCE LINES 52-54

.. code-block:: default

    av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}

.. GENERATED FROM PYTHON SOURCE LINES 55-57

Definition of the model. This is the contribution of each
observation to the log likelihood function.

.. GENERATED FROM PYTHON SOURCE LINES 57-59

.. code-block:: default

    logprob = models.loglogit(V, av, CHOICE)

.. GENERATED FROM PYTHON SOURCE LINES 60-61

Create the Biogeme object.

.. GENERATED FROM PYTHON SOURCE LINES 61-64

.. code-block:: default

    the_biogeme = bio.BIOGEME(database, logprob)
    the_biogeme.modelName = 'b04validation'

.. GENERATED FROM PYTHON SOURCE LINES 65-66

Estimate the parameters.

.. GENERATED FROM PYTHON SOURCE LINES 66-68

.. code-block:: default

    results = the_biogeme.estimate()

.. GENERATED FROM PYTHON SOURCE LINES 69-77

Validation randomly partitions the data into several slices of about
the same size. Each slice serves in turn as a validation dataset: the
model is re-estimated using all the data except that slice, and the
estimated model is then applied to the slice. The log likelihood of
each observation in the validation set is reported in a dataframe.
Because this is repeated for every slice, the output is a list of
dataframes, one per slice.

.. GENERATED FROM PYTHON SOURCE LINES 77-87

.. code-block:: default

    validation_data = database.split(slices=5)

    validation_results = the_biogeme.validate(results, validation_data)

    # Each element is the dataframe of results for one validation slice.
    for slice_results in validation_results:
        print(
            f'Log likelihood for {slice_results.shape[0]} validation data: '
            f'{slice_results["Loglikelihood"].sum()}'
        )

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /Users/bierlair/venv312/lib/python3.12/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
      return bound(*args, **kwds)
    Log likelihood for 1354 validation data: -1072.160299473448
    Log likelihood for 1354 validation data: -1040.1429748703445
    Log likelihood for 1353 validation data: -1023.5165744383794
    Log likelihood for 1353 validation data: -1123.2285680838186
    Log likelihood for 1353 validation data: -1085.467451885923

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.092 seconds)
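
The slicing step used for the validation can be illustrated without
Biogeme. The following is a minimal sketch using only the Python standard
library. It is not the implementation behind ``database.split``: the
function name ``split_into_slices``, its ``seed`` argument, and the use of
row indices instead of dataframes are assumptions made for illustration.
The row count, 6768, is the sum of the slice sizes reported in the output
of this example.

.. code-block:: python

    import random


    def split_into_slices(n_rows, slices, seed=0):
        """Return one (estimation_rows, validation_rows) pair per slice.

        Hypothetical helper: rows are identified by their position.
        """
        rows = list(range(n_rows))
        # Shuffle so that each slice is a random subset of the rows.
        random.Random(seed).shuffle(rows)
        # Slice sizes differ by at most one row.
        base, extra = divmod(n_rows, slices)
        pairs = []
        start = 0
        for k in range(slices):
            size = base + (1 if k < extra else 0)
            validation = sorted(rows[start:start + size])
            held_out = set(validation)
            # Every row outside the slice is used for re-estimation.
            estimation = [r for r in range(n_rows) if r not in held_out]
            pairs.append((estimation, validation))
            start += size
        return pairs


    pairs = split_into_slices(n_rows=6768, slices=5)
    for estimation, validation in pairs:
        print(
            f'{len(estimation)} rows for estimation, '
            f'{len(validation)} rows for validation'
        )

Each row appears in exactly one validation set, so every observation is
predicted once by a model that was estimated without it.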