.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/swissmetro/plot_b14nested_endogenous_sampling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_swissmetro_plot_b14nested_endogenous_sampling.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_swissmetro_plot_b14nested_endogenous_sampling.py:


Nested logit with corrections for endogenous sampling
=====================================================

The sample is said to be endogenous if the probability for an
individual to be in the sample depends on the choice that has been
made. In that case, the exogenous sample maximum likelihood (ESML)
estimator is no longer appropriate, and corrections need to be made.
See `Bierlaire, Bolduc, McFadden (2008)
<https://dx.doi.org/10.1016/j.trb.2007.09.003>`_.

This is illustrated in this example.

:author: Michel Bierlaire, EPFL
:date: Sun Apr  9 18:25:03 2023

.. GENERATED FROM PYTHON SOURCE LINES 18-26

.. code-block:: Python


    import numpy as np
    import biogeme.biogeme_logging as blog
    import biogeme.biogeme as bio
    from biogeme import models
    from biogeme.expressions import Beta
    from biogeme.nests import OneNestForNestedLogit, NestsForNestedLogit


.. GENERATED FROM PYTHON SOURCE LINES 27-28

See the data processing script: :ref:`swissmetro_data`.

.. GENERATED FROM PYTHON SOURCE LINES 28-45

.. code-block:: Python

    from swissmetro_data import (
        database,
        CHOICE,
        SM_AV,
        CAR_AV_SP,
        TRAIN_AV_SP,
        TRAIN_TT_SCALED,
        TRAIN_COST_SCALED,
        SM_TT_SCALED,
        SM_COST_SCALED,
        CAR_TT_SCALED,
        CAR_CO_SCALED,
    )

    logger = blog.get_screen_logger(level=blog.INFO)
    logger.info('Example b14nested_endogenous_sampling.py')


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Example b14nested_endogenous_sampling.py 


.. GENERATED FROM PYTHON SOURCE LINES 46-47

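Parameters to be estimated. The constant ``ASC_SM`` is fixed to zero
for normalization (last argument set to 1), and the nest parameter
``MU`` is constrained to lie between 1 and 10.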

.. GENERATED FROM PYTHON SOURCE LINES 47-54

.. code-block:: Python

    ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
    ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
    ASC_SM = Beta('ASC_SM', 0, None, None, 1)
    B_TIME = Beta('B_TIME', 0, None, None, 0)
    B_COST = Beta('B_COST', 0, None, None, 0)
    MU = Beta('MU', 1, 1, 10, 0)


.. GENERATED FROM PYTHON SOURCE LINES 55-58

In this example, we assume that the three modes exist, and that the
sampling protocol is choice-based. The probability that a respondent
who chose alternative i belongs to the sample is ``R_i``.

.. GENERATED FROM PYTHON SOURCE LINES 58-62

.. code-block:: Python

    R_TRAIN = 4.42e-2
    R_SM = 3.36e-3
    R_CAR = 7.5e-3


.. GENERATED FROM PYTHON SOURCE LINES 63-64

The correction terms are the logarithms of these quantities.

.. GENERATED FROM PYTHON SOURCE LINES 64-66

.. code-block:: Python

    correction = {1: np.log(R_TRAIN), 2: np.log(R_SM), 3: np.log(R_CAR)}
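
To see why the correction enters as a logarithm, it may help to look at
the simpler case of a plain logit model, which is not the model estimated
in this example. By Bayes' rule, the probability of observing
alternative i in a choice-based sample is proportional to ``R_i`` times
the population choice probability, which for logit amounts to adding
``log(R_i)`` to each utility. The sketch below checks this numerically in
pure Python. The utility values are hypothetical and chosen only for
illustration.

.. code-block:: Python

    import math

    # Sampling probabilities from the example above.
    R = {1: 4.42e-2, 2: 3.36e-3, 3: 7.5e-3}

    # Hypothetical utilities for one individual (illustration only).
    V = {1: -0.5, 2: 0.2, 3: -0.1}


    def logit(utilities):
        """Logit probabilities for a dict of utilities."""
        denominator = sum(math.exp(v) for v in utilities.values())
        return {i: math.exp(v) / denominator for i, v in utilities.items()}


    # Choice probabilities in the population.
    population = logit(V)

    # Bayes' rule: probability of observing i, given that the individual
    # has been sampled.
    weights = {i: R[i] * population[i] for i in V}
    total = sum(weights.values())
    in_sample = {i: w / total for i, w in weights.items()}

    # Same probabilities, obtained by adding log(R_i) to each utility.
    corrected = logit({i: V[i] + math.log(R[i]) for i in V})

    for i in V:
        print(i, round(in_sample[i], 6), round(corrected[i], 6))

The two columns coincide, so for logit the sampling protocol only shifts
the alternative specific constants. For nested logit and other MEV
models this is not sufficient, which is why a dedicated estimator is
used in this example.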


.. GENERATED FROM PYTHON SOURCE LINES 67-68

Definition of the utility functions.

.. GENERATED FROM PYTHON SOURCE LINES 68-72

.. code-block:: Python

    V1 = ASC_TRAIN + B_TIME * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
    V2 = ASC_SM + B_TIME * SM_TT_SCALED + B_COST * SM_COST_SCALED
    V3 = ASC_CAR + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED


.. GENERATED FROM PYTHON SOURCE LINES 73-74

Associate utility functions with the numbering of alternatives.

.. GENERATED FROM PYTHON SOURCE LINES 74-76

.. code-block:: Python

    V = {1: V1, 2: V2, 3: V3}


.. GENERATED FROM PYTHON SOURCE LINES 77-78

Associate the availability conditions with the alternatives.

.. GENERATED FROM PYTHON SOURCE LINES 78-80

.. code-block:: Python

    av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}


.. GENERATED FROM PYTHON SOURCE LINES 81-85

Definition of nests. Only the non-trivial nests must be defined. A
trivial nest is a nest containing exactly one alternative.  In this
example, we create a nest for the existing modes, that is train (1)
and car (3).

.. GENERATED FROM PYTHON SOURCE LINES 85-92

.. code-block:: Python


    existing = OneNestForNestedLogit(
        nest_param=MU, list_of_alternatives=[1, 3], name='existing'
    )

    nests = NestsForNestedLogit(choice_set=list(V), tuple_of_nests=(existing,))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The following elements do not appear in any nest and are assumed each to be alone in a separate nest: {2}. If it is not the intention, check the assignment of alternatives to nests. 


.. GENERATED FROM PYTHON SOURCE LINES 93-95

The choice model is a nested logit, with corrections for endogenous sampling.
We first obtain the expression of the Gi function for nested logit.

.. GENERATED FROM PYTHON SOURCE LINES 95-97

.. code-block:: Python

    Gi = models.get_mev_for_nested(V, av, nests)
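
As a rough illustration of the role played by the Gi terms, the following
pure-Python sketch evaluates ``ln G_i`` for a nested logit model with the
scale of the root normalized to 1, and checks that the MEV form of the
choice probability matches the usual two-level nested logit formula. It
does not use Biogeme. The names ``log_gi`` and ``nest_sum``, the utility
values and the nest parameter are hypothetical, and availability
conditions are ignored.

.. code-block:: Python

    import math

    # Hypothetical utilities and nest parameter (illustration only).
    V = {1: -0.5, 2: 0.2, 3: -0.1}
    # Each nest: (nest parameter, alternatives). Alternative 2 is alone.
    nests = [(1.6, [1, 3]), (1.0, [2])]


    def nest_sum(mu_m, alternatives):
        """Sum of exp(mu_m * V_j) over the alternatives of one nest."""
        return sum(math.exp(mu_m * V[j]) for j in alternatives)


    def log_gi():
        """ln G_i for nested logit, with the root scale normalized to 1."""
        result = {}
        for mu_m, alternatives in nests:
            log_sum = math.log(nest_sum(mu_m, alternatives))
            for i in alternatives:
                result[i] = (mu_m - 1.0) * V[i] + (1.0 / mu_m - 1.0) * log_sum
        return result


    # MEV form: logit applied to V_i + ln G_i.
    H = {i: V[i] + g for i, g in log_gi().items()}
    denominator = sum(math.exp(h) for h in H.values())
    p_mev = {i: math.exp(h) / denominator for i, h in H.items()}

    # Two-level form: P(i) = P(nest) * P(i | nest).
    inclusive = [math.log(nest_sum(mu_m, alts)) / mu_m for mu_m, alts in nests]
    total = sum(math.exp(value) for value in inclusive)
    p_nested = {}
    for (mu_m, alts), value in zip(nests, inclusive):
        p_nest = math.exp(value) / total
        for i in alts:
            p_nested[i] = p_nest * math.exp(mu_m * V[i]) / nest_sum(mu_m, alts)

    for i in sorted(V):
        print(i, round(p_mev[i], 6), round(p_nested[i], 6))

For the alternative that is alone in its nest, ``ln G_i`` is zero, so its
utility enters the formula unchanged.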


.. GENERATED FROM PYTHON SOURCE LINES 98-99

Then we calculate the MEV log probability, accounting for the correction.

.. GENERATED FROM PYTHON SOURCE LINES 99-101

.. code-block:: Python

    logprob = models.logmev_endogenous_sampling(V, Gi, av, correction, CHOICE)
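
A minimal sketch of such a corrected log probability, assuming it takes
the form of a logit applied to ``V_i + ln G_i + ln R_i`` as in the paper
cited above. This is not the Biogeme implementation: the function name
``log_prob_endogenous`` and the numerical values of ``V`` and ``log_gi``
are hypothetical, and availability conditions are ignored.

.. code-block:: Python

    import math


    def log_prob_endogenous(V, log_gi, correction, choice):
        """Log of exp(H_choice) / sum_j exp(H_j), H_i = V_i + ln G_i + ln R_i."""
        H = {i: V[i] + log_gi[i] + correction[i] for i in V}
        largest = max(H.values())
        # Log-sum-exp, shifted by the largest term for numerical stability.
        log_sum = largest + math.log(
            sum(math.exp(h - largest) for h in H.values())
        )
        return H[choice] - log_sum


    # Hypothetical utilities and ln G_i values (illustration only).
    V = {1: -0.5, 2: 0.2, 3: -0.1}
    log_gi = {1: -0.31, 2: 0.0, 3: -0.07}

    # Correction terms built from the sampling probabilities of the example.
    correction = {
        1: math.log(4.42e-2),
        2: math.log(3.36e-3),
        3: math.log(7.5e-3),
    }

    log_probs = {i: log_prob_endogenous(V, log_gi, correction, i) for i in V}
    print(log_probs)

If all the ``R_i`` are equal, the correction terms cancel out in this
expression and the uncorrected MEV probability is recovered.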


.. GENERATED FROM PYTHON SOURCE LINES 102-103

Create the Biogeme object.

.. GENERATED FROM PYTHON SOURCE LINES 103-106

.. code-block:: Python

    the_biogeme = bio.BIOGEME(database, logprob)
    the_biogeme.modelName = 'b14nested_endogenous_eampling'


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Biogeme parameters read from biogeme.toml. 


.. GENERATED FROM PYTHON SOURCE LINES 107-108

Estimate the parameters.

.. GENERATED FROM PYTHON SOURCE LINES 108-110

.. code-block:: Python

    results = the_biogeme.estimate()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds" 
    *** Initial values of the parameters are obtained from the file __b14nested_endogenous_eampling.iter 
    Cannot read file __b14nested_endogenous_eampling.iter. Statement is ignored. 
    As the model is not too complex, we activate the calculation of second derivatives. If you want to change it, change the name of the algorithm in the TOML file from "automatic" to "simple_bounds" 
    Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 
    ** Optimization: Newton with trust region for simple bounds 
    Iter.         ASC_CAR       ASC_TRAIN          B_COST          B_TIME              MU     Function    Relgrad   Radius      Rho      
        0               1              -1            0.47              -1               2      8.1e+03       0.15        1     0.56    + 
        1               0            -1.1          0.0036              -2             1.6      6.1e+03      0.068       10      1.1   ++ 
        2            -1.5              -3           -0.95           -0.12             2.2      5.4e+03      0.036       10     0.71    + 
        3            -1.4            -2.9           -0.91           -0.46             1.9      5.3e+03      0.015    1e+02      1.3   ++ 
        4            -1.2            -2.8           -0.98           -0.86             1.7      5.2e+03     0.0048    1e+03      1.1   ++ 
        5            -1.1            -2.8              -1           -0.97             1.6      5.2e+03    0.00035    1e+04        1   ++ 
        6            -1.1            -2.8              -1           -0.97             1.6      5.2e+03      2e-06    1e+04        1   ++ 
    Results saved in file b14nested_endogenous_eampling.html 
    Results saved in file b14nested_endogenous_eampling.pickle 


.. GENERATED FROM PYTHON SOURCE LINES 111-113

.. code-block:: Python

    print(results.short_summary())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Results for model b14nested_endogenous_eampling
    Nbr of parameters:              5
    Sample size:                    6768
    Excluded data:                  3960
    Final log likelihood:           -5202.916
    Akaike Information Criterion:   10415.83
    Bayesian Information Criterion: 10449.93
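
The two information criteria can be recomputed from the other quantities
in the summary, assuming the usual definitions ``AIC = 2K - 2L`` and
``BIC = K ln(N) - 2L``, where K is the number of parameters, N the sample
size and L the final log likelihood. The variable names below are
illustrative only.

.. code-block:: Python

    import math

    # Values copied from the summary above.
    n_parameters = 5
    sample_size = 6768
    final_loglikelihood = -5202.916

    aic = 2 * n_parameters - 2 * final_loglikelihood
    bic = n_parameters * math.log(sample_size) - 2 * final_loglikelihood

    print(f'AIC: {aic:.2f}')
    print(f'BIC: {bic:.2f}')

Both values agree with the reported ones up to the rounding of the log
likelihood.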


.. GENERATED FROM PYTHON SOURCE LINES 114-116

.. code-block:: Python

    pandas_results = results.get_estimated_parameters()
    pandas_results


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Value</th>
          <th>Rob. Std err</th>
          <th>Rob. t-test</th>
          <th>Rob. p-value</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>ASC_CAR</th>
          <td>-1.127438</td>
          <td>0.060630</td>
          <td>-18.595448</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>ASC_TRAIN</th>
          <td>-2.768608</td>
          <td>0.080535</td>
          <td>-34.377505</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>B_COST</th>
          <td>-0.999353</td>
          <td>0.064141</td>
          <td>-15.580617</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>B_TIME</th>
          <td>-0.974561</td>
          <td>0.110774</td>
          <td>-8.797759</td>
          <td>0.0</td>
        </tr>
        <tr>
          <th>MU</th>
          <td>1.630982</td>
          <td>0.062661</td>
          <td>26.028460</td>
          <td>0.0</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />
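
The t-tests reported in the table are for the null hypothesis that the
parameter equals 0. For the nest parameter, a more informative null
hypothesis is ``MU = 1``, in which case the nested logit model reduces to
a logit model. The sketch below recomputes both statistics from the
rounded values in the table. The variable names are illustrative only.

.. code-block:: Python

    # Estimate and robust standard error of MU, copied from the table above.
    mu_value = 1.630982
    mu_rob_stderr = 0.062661

    # Reported statistic: test of the null hypothesis MU = 0.
    t_against_zero = mu_value / mu_rob_stderr

    # Statistic for the null hypothesis MU = 1 (nested logit reduces to logit).
    t_against_one = (mu_value - 1.0) / mu_rob_stderr

    print(f't-test against 0: {t_against_zero:.2f}')
    print(f't-test against 1: {t_against_one:.2f}')

The statistic against 1 is about 10, so the hypothesis that the nest
parameter equals 1 would be rejected at the usual significance levels.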


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.383 seconds)


.. _sphx_glr_download_auto_examples_swissmetro_plot_b14nested_endogenous_sampling.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_b14nested_endogenous_sampling.ipynb <plot_b14nested_endogenous_sampling.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_b14nested_endogenous_sampling.py <plot_b14nested_endogenous_sampling.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_b14nested_endogenous_sampling.zip <plot_b14nested_endogenous_sampling.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_