.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/swissmetro/plot_b14nested_endogenous_sampling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_swissmetro_plot_b14nested_endogenous_sampling.py:


Nested logit with corrections for endogenous sampling
======================================================

A sample is endogenous if the probability that an individual is
included in the sample depends on the choice that the individual has made.
In that case, the exogenous sample maximum likelihood (ESML) estimator
is no longer appropriate, and corrections are needed. See
`Bierlaire, Bolduc, McFadden (2008) `_.
This example illustrates the correction.

Michel Bierlaire, EPFL

Sat Jun 21 2025, 17:13:33

.. GENERATED FROM PYTHON SOURCE LINES 17-27

.. code-block:: Python


    import biogeme.biogeme_logging as blog
    import numpy as np
    from IPython.core.display_functions import display
    from biogeme.biogeme import BIOGEME
    from biogeme.expressions import Beta
    from biogeme.models import get_mev_for_nested, logmev_endogenous_sampling
    from biogeme.nests import NestsForNestedLogit, OneNestForNestedLogit
    from biogeme.results_processing import get_pandas_estimated_parameters


.. GENERATED FROM PYTHON SOURCE LINES 28-29

See the data processing script: :ref:`swissmetro_data`.

.. GENERATED FROM PYTHON SOURCE LINES 29-46

.. code-block:: Python


    from swissmetro_data import (
        CAR_AV_SP,
        CAR_CO_SCALED,
        CAR_TT_SCALED,
        CHOICE,
        SM_AV,
        SM_COST_SCALED,
        SM_TT_SCALED,
        TRAIN_AV_SP,
        TRAIN_COST_SCALED,
        TRAIN_TT_SCALED,
        database,
    )

    logger = blog.get_screen_logger(level=blog.INFO)
    logger.info('Example b14nested_endogenous_sampling.py')


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Example b14nested_endogenous_sampling.py


.. GENERATED FROM PYTHON SOURCE LINES 47-48

Parameters to be estimated.

.. GENERATED FROM PYTHON SOURCE LINES 48-55

.. code-block:: Python


    # Beta(name, initial value, lower bound, upper bound, status):
    # status 0 means the parameter is estimated, status 1 means it is fixed.
    asc_car = Beta('asc_car', 0, None, None, 0)
    asc_train = Beta('asc_train', 0, None, None, 0)
    # The Swissmetro constant is fixed to 0 for normalization.
    asc_sm = Beta('asc_sm', 0, None, None, 1)
    b_time = Beta('b_time', 0, None, None, 0)
    b_cost = Beta('b_cost', 0, None, None, 0)
    # The nest parameter is bounded between 1 and 10.
    nest_parameter = Beta('nest_parameter', 1, 1, 10, 0)


.. GENERATED FROM PYTHON SOURCE LINES 56-59

In this example, we assume that the three modes exist, and that the
sampling protocol is choice-based. The probability that a respondent
who chose alternative i belongs to the sample is R_i.

.. GENERATED FROM PYTHON SOURCE LINES 59-63

.. code-block:: Python


    R_TRAIN = 4.42e-2
    R_SM = 3.36e-3
    R_CAR = 7.5e-3


.. GENERATED FROM PYTHON SOURCE LINES 64-65

The correction terms are the logarithms of these quantities.

.. GENERATED FROM PYTHON SOURCE LINES 65-67

.. code-block:: Python


    # Keys follow the numbering of the alternatives: 1 = train, 2 = Swissmetro, 3 = car.
    correction = {1: np.log(R_TRAIN), 2: np.log(R_SM), 3: np.log(R_CAR)}


.. GENERATED FROM PYTHON SOURCE LINES 68-69

Definition of the utility functions.

.. GENERATED FROM PYTHON SOURCE LINES 69-73

.. code-block:: Python


    v_train = asc_train + b_time * TRAIN_TT_SCALED + b_cost * TRAIN_COST_SCALED
    v_swissmetro = asc_sm + b_time * SM_TT_SCALED + b_cost * SM_COST_SCALED
    v_car = asc_car + b_time * CAR_TT_SCALED + b_cost * CAR_CO_SCALED


.. GENERATED FROM PYTHON SOURCE LINES 74-75

Associate utility functions with the numbering of alternatives.

.. GENERATED FROM PYTHON SOURCE LINES 75-77

.. code-block:: Python


    v = {1: v_train, 2: v_swissmetro, 3: v_car}


.. GENERATED FROM PYTHON SOURCE LINES 78-79

Associate the availability conditions with the alternatives.

.. GENERATED FROM PYTHON SOURCE LINES 79-81

.. code-block:: Python


    av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}


.. GENERATED FROM PYTHON SOURCE LINES 82-86

Definition of nests. Only the non-trivial nests must be defined. A
trivial nest is a nest containing exactly one alternative.
In this example, we create a nest for the existing modes, that is, train (1)
and car (3).

.. GENERATED FROM PYTHON SOURCE LINES 86-93

.. code-block:: Python


    existing = OneNestForNestedLogit(
        nest_param=nest_parameter, list_of_alternatives=[1, 3], name='existing'
    )

    nests = NestsForNestedLogit(choice_set=list(v), tuple_of_nests=(existing,))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The following elements do not appear in any nest and are assumed each to be alone in a separate nest: {2}. If it is not the intention, check the assignment of alternatives to nests.


.. GENERATED FROM PYTHON SOURCE LINES 94-96

The choice model is a nested logit, with corrections for endogenous sampling.
We first obtain the expression of the Gi function for nested logit.

.. GENERATED FROM PYTHON SOURCE LINES 96-98

.. code-block:: Python


    probability_generating_function = get_mev_for_nested(v, av, nests)


.. GENERATED FROM PYTHON SOURCE LINES 99-100

Then we calculate the MEV log probability, accounting for the correction.

.. GENERATED FROM PYTHON SOURCE LINES 100-104

.. code-block:: Python


    log_probability = logmev_endogenous_sampling(
        v, probability_generating_function, av, correction, CHOICE
    )


.. GENERATED FROM PYTHON SOURCE LINES 105-106

Create the Biogeme object.

.. GENERATED FROM PYTHON SOURCE LINES 106-109

.. code-block:: Python


    the_biogeme = BIOGEME(database, log_probability)
    the_biogeme.model_name = 'b14nested_endogenous_sampling'


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Biogeme parameters read from biogeme.toml.


.. GENERATED FROM PYTHON SOURCE LINES 110-111

Estimate the parameters.

.. GENERATED FROM PYTHON SOURCE LINES 111-113

.. code-block:: Python


    results = the_biogeme.estimate()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    *** Initial values of the parameters are obtained from the file __b14nested_endogenous_sampling.iter
    Parameter values restored from __b14nested_endogenous_sampling.iter
    Starting values for the algorithm: {'asc_train': -2.768565181199355, 'b_time': -0.9745266807210456, 'b_cost': -0.999260565958647, 'nest_parameter': 1.6310201876602868, 'asc_car': -1.1274305042219672}
    As the model is rather complex, we cancel the calculation of second derivatives. If you want to control the parameters, change the algorithm from "automatic" to "simple_bounds" in the TOML file.
    Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
    ** Optimization: BFGS with trust region for simple bounds
    Iter. asc_train b_time b_cost nest_parameter asc_car Function Relgrad  Radius      Rho
        0      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05   0.017 -1.6e+03 -
        1      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05  0.0087 -1.1e+03 -
        2      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05  0.0043 -5.5e+02 -
        3      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05  0.0022 -2.5e+02 -
        4      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05  0.0011 -1.2e+02 -
        5      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05 0.00054      -56 -
        6      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05 0.00027      -27 -
        7      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05 0.00014      -13 -
        8      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05 6.8e-05       -6 -
        9      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05 3.4e-05     -2.5 -
       10      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.6e-05 1.7e-05    -0.75 -
       11      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.5e-05 1.7e-05     0.13 +
       12      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.4e-05 1.7e-05     0.79 +
       13      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 9.7e-06 1.7e-05     0.64 +
       14      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 1.1e-05 1.7e-05     0.72 +
       15      -2.8  -0.97     -1            1.6    -1.1  5.2e+03 5.3e-06 1.7e-05     0.73 +
    Optimization algorithm has converged.
    Relative gradient: 5.30365030058552e-06
    Cause of termination: Relative gradient = 5.3e-06 <= 6.1e-06
    Number of function evaluations: 27
    Number of gradient evaluations: 11
    Number of hessian evaluations: 0
    Algorithm: BFGS with trust region for simple bound constraints
    Number of iterations: 16
    Proportion of Hessian calculation: 0/5 = 0.0%
    Optimization time: 0:00:00.405032
    Calculate second derivatives and BHHH
    File b14nested_endogenous_sampling~00.html has been generated.
    File b14nested_endogenous_sampling~00.yaml has been generated.


.. GENERATED FROM PYTHON SOURCE LINES 114-116

.. code-block:: Python


    print(results.short_summary())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Results for model b14nested_endogenous_sampling
    Nbr of parameters:                  5
    Sample size:                        6768
    Excluded data:                      3960
    Final log likelihood:               -5202.916
    Akaike Information Criterion:       10415.83
    Bayesian Information Criterion:     10449.93


.. GENERATED FROM PYTHON SOURCE LINES 117-119

.. code-block:: Python


    pandas_results = get_pandas_estimated_parameters(estimation_results=results)
    display(pandas_results)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                 Name     Value  Robust std err.  Robust t-stat.  Robust p-value
    0       asc_train -2.768592         0.080536      -34.376951             0.0
    1          b_time -0.974563         0.110775       -8.797674             0.0
    2          b_cost -0.999335         0.064140      -15.580480             0.0
    3  nest_parameter  1.631009         0.062667       26.026438             0.0
    4         asc_car -1.127430         0.060631      -18.595046             0.0


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.059 seconds)


.. _sphx_glr_download_auto_examples_swissmetro_plot_b14nested_endogenous_sampling.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_b14nested_endogenous_sampling.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_b14nested_endogenous_sampling.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_b14nested_endogenous_sampling.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
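
The role of the correction terms in the example above can be illustrated
with a small NumPy sketch that does not depend on Biogeme. It assumes that
the corrected model has the logit-like form in which the probability of
alternative i is proportional to exp(V_i + ln G_i + ln R_i), that the
top-level scale is normalized to 1, and that all alternatives are available.
The function names ``nested_logit_log_gi`` and ``corrected_probabilities``
and the utility values are hypothetical and chosen for illustration only;
this is not the implementation behind ``logmev_endogenous_sampling``.

.. code-block:: Python

    import numpy as np


    def nested_logit_log_gi(v, nest, mu_m):
        """Return ln G_i for a nested logit with a single non-trivial nest.

        Alternatives outside ``nest`` are alone in their own nest, so that
        ln G_i = 0 for them (assumption: top-level scale equal to 1).
        """
        log_gi = {i: 0.0 for i in v}
        # Sum over the nest of exp(mu_m * V_j).
        nest_sum = sum(np.exp(mu_m * v[j]) for j in nest)
        for i in nest:
            log_gi[i] = (mu_m - 1.0) * v[i] + (1.0 / mu_m - 1.0) * np.log(nest_sum)
        return log_gi


    def corrected_probabilities(v, log_gi, correction):
        """Logit-like probabilities built on V_i + ln G_i + correction_i."""
        alternatives = sorted(v)
        h = np.array([v[i] + log_gi[i] + correction[i] for i in alternatives])
        h = h - h.max()  # shift for numerical stability
        weights = np.exp(h)
        return dict(zip(alternatives, weights / weights.sum()))


    # Hypothetical utility values for train (1), Swissmetro (2) and car (3).
    v = {1: -1.0, 2: 0.0, 3: -0.5}
    # Same sampling probabilities as in the example above.
    correction = {1: np.log(4.42e-2), 2: np.log(3.36e-3), 3: np.log(7.5e-3)}

    log_gi = nested_logit_log_gi(v, nest=[1, 3], mu_m=1.63)
    probabilities = corrected_probabilities(v, log_gi, correction)
    print(probabilities)

If all the R_i are equal, the correction cancels out in the ratio and the
probabilities reduce to those of the nested logit without correction.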