.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/programmers/plot_database.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_programmers_plot_database.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_programmers_plot_database.py:


biogeme.database
================

Examples of use of several functions.

This is designed for programmers who need examples of use of the
functions of the module. The examples are designed to illustrate the
syntax. They do not correspond to any meaningful model.

:author: Michel Bierlaire
:date: Thu Nov 16 18:36:59 2023

.. GENERATED FROM PYTHON SOURCE LINES 15-26

.. code-block:: Python


    import biogeme.version as ver
    import pandas as pd
    import numpy as np
    import biogeme.database as db
    from biogeme.expressions import Variable, exp, bioDraws
    from biogeme.expressions import TypeOfElementaryExpression
    from biogeme.native_draws import description_of_native_draws, RandomNumberGeneratorTuple
    from biogeme.segmentation import DiscreteSegmentationTuple
    from biogeme.exceptions import BiogemeError


.. GENERATED FROM PYTHON SOURCE LINES 27-28

Version of Biogeme.

.. GENERATED FROM PYTHON SOURCE LINES 28-30

.. code-block:: Python

    print(ver.get_text())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    biogeme 3.2.14 [2024-08-05]
    Home page: http://biogeme.epfl.ch
    Submit questions to https://groups.google.com/d/forum/biogeme
    Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)


.. GENERATED FROM PYTHON SOURCE LINES 31-32

We set the seed so that the outcome of random operations is always the same.

.. GENERATED FROM PYTHON SOURCE LINES 32-34

.. code-block:: Python

    np.random.seed(90267)


.. GENERATED FROM PYTHON SOURCE LINES 35-36

Create a database from a pandas data frame.

.. GENERATED FROM PYTHON SOURCE LINES 36-51

.. code-block:: Python

    df = pd.DataFrame(
        {
            'Person': [1, 1, 1, 2, 2],
            'Exclude': [0, 0, 1, 0, 1],
            'Variable1': [1, 2, 3, 4, 5],
            'Variable2': [10, 20, 30, 40, 50],
            'Choice': [1, 2, 3, 1, 2],
            'Av1': [0, 1, 1, 1, 1],
            'Av2': [1, 1, 1, 1, 1],
            'Av3': [0, 1, 1, 1, 1],
        }
    )
    my_data = db.Database('test', df)
    print(my_data)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    biogeme database test:
       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3
    0       1        0          1         10       1    0    1    0
    1       1        0          2         20       2    1    1    1
    2       1        1          3         30       3    1    1    1
    3       2        0          4         40       1    1    1    1
    4       2        1          5         50       2    1    1    1


.. GENERATED FROM PYTHON SOURCE LINES 52-56

`values_from_database`: evaluates an expression for each entry of the
database. It takes an expression as argument and returns a numpy
array, with one element per entry in the database, containing
the calculated quantities.
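
To make the semantics concrete, the same row-wise evaluation can be
sketched in plain pandas, without Biogeme. This is only an illustration
of what the result contains, not the library's implementation. The data
frame reproduces the two relevant columns of the example.

.. code-block:: Python

    import pandas as pd

    df = pd.DataFrame(
        {
            'Variable1': [1, 2, 3, 4, 5],
            'Variable2': [10, 20, 30, 40, 50],
        }
    )
    # One value per row, returned as a numpy array of floats.
    values = (df['Variable1'] + df['Variable2']).to_numpy(dtype=float)
    print(values)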

.. GENERATED FROM PYTHON SOURCE LINES 58-64

.. code-block:: Python

    Variable1 = Variable('Variable1')
    Variable2 = Variable('Variable2')
    expr = Variable1 + Variable2
    result = my_data.values_from_database(expr)
    print(result)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [11. 22. 33. 44. 55.]


.. GENERATED FROM PYTHON SOURCE LINES 65-69

`check_segmentation`: checks that the segmentation covers the complete database.
A segmentation is a partition of the dataset based on the value of
one of the variables. For instance, we can segment on the Choice
variable.
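
The idea behind such a check can be sketched in plain Python. The
function `segment_sizes` below is hypothetical and is not part of
Biogeme. It raises a `ValueError` where the example in this page
catches a `BiogemeError`, and its messages are illustrative only.

.. code-block:: Python

    from collections import Counter


    def segment_sizes(values, mapping):
        """Return the size of each segment, or raise if not a partition."""
        counts = Counter(values)
        # Every observed value must belong to a segment.
        for value, count in counts.items():
            if value not in mapping:
                raise ValueError(
                    f'Value {value} appears {count} time(s), defines no segment.'
                )
        # Every segment must contain at least one observation.
        for value, name in mapping.items():
            if value not in counts:
                raise ValueError(f'No observation takes the value {value} ({name}).')
        return {name: counts[value] for value, name in mapping.items()}


    choice = [1, 2, 3, 1, 2]
    print(segment_sizes(choice, {1: 'Alt. 1', 2: 'Alt. 2', 3: 'Alt. 3'}))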

.. GENERATED FROM PYTHON SOURCE LINES 71-76

.. code-block:: Python

    correct_mapping = {1: 'Alt. 1', 2: 'Alt. 2', 3: 'Alt. 3'}
    correct_segmentation = DiscreteSegmentationTuple(
        variable='Choice', mapping=correct_mapping
    )


.. GENERATED FROM PYTHON SOURCE LINES 77-79

If the segmentation is well defined, the function returns the size
of each segment in the database.

.. GENERATED FROM PYTHON SOURCE LINES 81-83

.. code-block:: Python

    my_data.check_segmentation(correct_segmentation)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    {'Alt. 1': np.int64(2), 'Alt. 2': np.int64(2), 'Alt. 3': np.int64(1)}


.. GENERATED FROM PYTHON SOURCE LINES 84-89

.. code-block:: Python

    incorrect_mapping = {1: 'Alt. 1', 2: 'Alt. 2'}
    incorrect_segmentation = DiscreteSegmentationTuple(
        variable='Choice', mapping=incorrect_mapping
    )


.. GENERATED FROM PYTHON SOURCE LINES 90-91

If the segmentation is incorrect, an exception is raised.

.. GENERATED FROM PYTHON SOURCE LINES 93-98

.. code-block:: Python

    try:
        my_data.check_segmentation(incorrect_segmentation)
    except BiogemeError as e:
        print(e)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Variable Choice takes the value 3 [1 times], and it does not define any segment.


.. GENERATED FROM PYTHON SOURCE LINES 99-104

.. code-block:: Python

    another_incorrect_mapping = {1: 'Alt. 1', 2: 'Alt. 2', 4: 'Does not exist'}
    another_incorrect_segmentation = DiscreteSegmentationTuple(
        variable='Choice', mapping=another_incorrect_mapping
    )


.. GENERATED FROM PYTHON SOURCE LINES 105-110

.. code-block:: Python

    try:
        my_data.check_segmentation(another_incorrect_segmentation)
    except BiogemeError as e:
        print(e)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Variable Choice does not take the value 4 representing segment "Does not exist"


.. GENERATED FROM PYTHON SOURCE LINES 111-114

`check_availability_of_chosen_alt`: checks whether the chosen alternative
is available for each entry in the database.
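
As an illustration of the logic, and not of Biogeme's implementation,
the check can be sketched with pandas and numpy. The dictionary
`avail_columns` is an assumption of this sketch: it maps each
alternative to the name of its availability column instead of to a
Biogeme expression.

.. code-block:: Python

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            'Choice': [1, 2, 3, 1, 2],
            'Av1': [0, 1, 1, 1, 1],
            'Av2': [1, 1, 1, 1, 1],
            'Av3': [0, 1, 1, 1, 1],
        }
    )
    # Availability column associated with each alternative.
    avail_columns = {1: 'Av1', 2: 'Av2', 3: 'Av3'}
    # For each row, look up the availability of the chosen alternative.
    chosen_is_available = np.array(
        [df.at[row, avail_columns[alt]] != 0 for row, alt in df['Choice'].items()]
    )
    print(chosen_is_available)

Only the first row fails the check: alternative 1 is chosen there
while `Av1` is 0.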

.. GENERATED FROM PYTHON SOURCE LINES 114-122

.. code-block:: Python

    Av1 = Variable('Av1')
    Av2 = Variable('Av2')
    Av3 = Variable('Av3')
    Choice = Variable('Choice')
    avail = {1: Av1, 2: Av2, 3: Av3}
    result = my_data.check_availability_of_chosen_alt(avail, Choice)
    print(result)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [False  True  True  True  True]


.. GENERATED FROM PYTHON SOURCE LINES 123-125

`choice_availability_statistics`: calculates the number of times each
alternative is chosen and the number of times it is available.
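
The two counts can be reproduced with a short pandas sketch. It is not
Biogeme's implementation, and `avail_columns` is a hypothetical helper
mapping each alternative to the name of its availability column.

.. code-block:: Python

    import pandas as pd

    df = pd.DataFrame(
        {
            'Choice': [1, 2, 3, 1, 2],
            'Av1': [0, 1, 1, 1, 1],
            'Av2': [1, 1, 1, 1, 1],
            'Av3': [0, 1, 1, 1, 1],
        }
    )
    avail_columns = {1: 'Av1', 2: 'Av2', 3: 'Av3'}
    # For each alternative: (number of times chosen, number of times available).
    stats = {
        alt: (int((df['Choice'] == alt).sum()), int((df[column] != 0).sum()))
        for alt, column in avail_columns.items()
    }
    print(stats)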

.. GENERATED FROM PYTHON SOURCE LINES 125-127

.. code-block:: Python

    my_data.choice_availability_statistics(avail, Choice)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    {np.float64(1.0): (2, np.float64(4.0)), np.float64(2.0): (2, np.float64(5.0)), np.float64(3.0): (1, np.float64(4.0))}


.. GENERATED FROM PYTHON SOURCE LINES 128-130

`suggest_scaling`: suggests a scaling of the variables in the database.

.. GENERATED FROM PYTHON SOURCE LINES 130-132

.. code-block:: Python

    my_data.data.columns


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Index(['Person', 'Exclude', 'Variable1', 'Variable2', 'Choice', 'Av1', 'Av2',
           'Av3'],
          dtype='object')


.. GENERATED FROM PYTHON SOURCE LINES 133-135

.. code-block:: Python

    my_data.suggest_scaling()


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Column</th>
          <th>Scale</th>
          <th>Largest</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>3</th>
          <td>Variable2</td>
          <td>0.01</td>
          <td>50</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 136-138

.. code-block:: Python

    my_data.suggest_scaling(columns=['Variable1', 'Variable2'])


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Column</th>
          <th>Scale</th>
          <th>Largest</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>1</th>
          <td>Variable2</td>
          <td>0.01</td>
          <td>50</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 139-142

`scale_column`: multiplies an entire column by a scale value.

Before.

.. GENERATED FROM PYTHON SOURCE LINES 142-144

.. code-block:: Python

    my_data.data


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Person</th>
          <th>Exclude</th>
          <th>Variable1</th>
          <th>Variable2</th>
          <th>Choice</th>
          <th>Av1</th>
          <th>Av2</th>
          <th>Av3</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>10</td>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1</td>
          <td>0</td>
          <td>2</td>
          <td>20</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
        </tr>
        <tr>
          <th>2</th>
          <td>1</td>
          <td>1</td>
          <td>3</td>
          <td>30</td>
          <td>3</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
        </tr>
        <tr>
          <th>3</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
          <td>40</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
        </tr>
        <tr>
          <th>4</th>
          <td>2</td>
          <td>1</td>
          <td>5</td>
          <td>50</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 145-147

.. code-block:: Python

    my_data.scale_column('Variable2', 0.01)


.. GENERATED FROM PYTHON SOURCE LINES 148-149

After.

.. GENERATED FROM PYTHON SOURCE LINES 149-151

.. code-block:: Python

    my_data.data


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Person</th>
          <th>Exclude</th>
          <th>Variable1</th>
          <th>Variable2</th>
          <th>Choice</th>
          <th>Av1</th>
          <th>Av2</th>
          <th>Av3</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0.1</td>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1</td>
          <td>0</td>
          <td>2</td>
          <td>0.2</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
        </tr>
        <tr>
          <th>2</th>
          <td>1</td>
          <td>1</td>
          <td>3</td>
          <td>0.3</td>
          <td>3</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
        </tr>
        <tr>
          <th>3</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
          <td>0.4</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
        </tr>
        <tr>
          <th>4</th>
          <td>2</td>
          <td>1</td>
          <td>5</td>
          <td>0.5</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 152-154

`add_column`: adds a new column to the database, calculated from an expression.
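
For comparison, the column computed in this example can be sketched
directly with numpy and pandas. This only illustrates the arithmetic
and does not go through Biogeme expressions. `Variable2` holds the
values obtained after the scaling by 0.01 applied earlier.

.. code-block:: Python

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            'Variable1': [1, 2, 3, 4, 5],
            # Variable2 after the scaling by 0.01.
            'Variable2': [0.1, 0.2, 0.3, 0.4, 0.5],
        }
    )
    # Same formula as the expression exp(0.5 * Variable2) / Variable1.
    df['NewVariable'] = np.exp(0.5 * df['Variable2']) / df['Variable1']
    print(df['NewVariable'].tolist())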

.. GENERATED FROM PYTHON SOURCE LINES 154-161

.. code-block:: Python

    Variable1 = Variable('Variable1')
    Variable2 = Variable('Variable2')
    expression = exp(0.5 * Variable2) / Variable1
    # expression = Variable2 * Variable1
    result = my_data.add_column(expression, 'NewVariable')
    print(my_data.data['NewVariable'].tolist())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [1.0512710963760241, 0.5525854590378239, 0.38727808090942767, 0.30535068954004246, 0.25680508333754826]


.. GENERATED FROM PYTHON SOURCE LINES 162-164

.. code-block:: Python

    my_data.data


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Person</th>
          <th>Exclude</th>
          <th>Variable1</th>
          <th>Variable2</th>
          <th>Choice</th>
          <th>Av1</th>
          <th>Av2</th>
          <th>Av3</th>
          <th>NewVariable</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0.1</td>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0</td>
          <td>1.051271</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1</td>
          <td>0</td>
          <td>2</td>
          <td>0.2</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.552585</td>
        </tr>
        <tr>
          <th>2</th>
          <td>1</td>
          <td>1</td>
          <td>3</td>
          <td>0.3</td>
          <td>3</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.387278</td>
        </tr>
        <tr>
          <th>3</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
          <td>0.4</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.305351</td>
        </tr>
        <tr>
          <th>4</th>
          <td>2</td>
          <td>1</td>
          <td>5</td>
          <td>0.5</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.256805</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 165-169

`split`: shuffles the data and splits it into slices. For each
slice, an estimation set and a validation set are generated. The
validation set is the slice itself. The estimation set is the rest
of the data.
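
The procedure described above can be sketched as follows. The function
`k_fold_split` is hypothetical and is not Biogeme's implementation. It
assumes that the index of the data frame contains no duplicates, and it
uses its own random generator, so the slices differ from those shown
in this page.

.. code-block:: Python

    import numpy as np
    import pandas as pd


    def k_fold_split(df, slices, seed=None):
        """Return a list of (estimation, validation) data frame pairs."""
        rng = np.random.default_rng(seed)
        # Shuffle the row positions, then cut them into nearly equal slices.
        shuffled = rng.permutation(len(df))
        folds = np.array_split(shuffled, slices)
        pairs = []
        for fold in folds:
            # The validation set is the slice itself.
            validation = df.iloc[np.sort(fold)]
            # The estimation set is the rest of the data.
            estimation = df.drop(index=validation.index)
            pairs.append((estimation, validation))
        return pairs


    df = pd.DataFrame({'Variable1': [1, 2, 3, 4, 5]})
    for estimation, validation in k_fold_split(df, slices=3, seed=0):
        print(list(estimation.index), list(validation.index))

Each row appears in exactly one validation set, and never in the
estimation set paired with it.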

.. GENERATED FROM PYTHON SOURCE LINES 171-180

.. code-block:: Python

    dataSets = my_data.split(3)
    for i in dataSets:
        print("==========")
        print("Estimation:")
        print(type(i[0]))
        print(i[0])
        print("Validation:")
        print(i[1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /Users/bierlair/venv312/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:57: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
      return bound(*args, **kwds)
    ==========
    Estimation:
    <class 'pandas.core.frame.DataFrame'>
       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    0       1        0          1        0.1       1    0    1    0     1.051271
    3       2        0          4        0.4       1    1    1    1     0.305351
    4       2        1          5        0.5       2    1    1    1     0.256805
    Validation:
       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    1       1        0          2        0.2       2    1    1    1     0.552585
    2       1        1          3        0.3       3    1    1    1     0.387278
    ==========
    Estimation:
    <class 'pandas.core.frame.DataFrame'>
       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    1       1        0          2        0.2       2    1    1    1     0.552585
    2       1        1          3        0.3       3    1    1    1     0.387278
    4       2        1          5        0.5       2    1    1    1     0.256805
    Validation:
       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    0       1        0          1        0.1       1    0    1    0     1.051271
    3       2        0          4        0.4       1    1    1    1     0.305351
    ==========
    Estimation:
    <class 'pandas.core.frame.DataFrame'>
       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    1       1        0          2        0.2       2    1    1    1     0.552585
    2       1        1          3        0.3       3    1    1    1     0.387278
    0       1        0          1        0.1       1    0    1    0     1.051271
    3       2        0          4        0.4       1    1    1    1     0.305351
    Validation:
       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    4       2        1          5        0.5       2    1    1    1     0.256805


.. GENERATED FROM PYTHON SOURCE LINES 181-183

`count`: counts the number of observations that have a specific
value in a given column.

.. GENERATED FROM PYTHON SOURCE LINES 185-187

For instance, count the number of entries for individual 1.
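
In plain pandas, the same count amounts to comparing the column with
the value and summing the matches. This is a sketch for illustration,
not the library's code.

.. code-block:: Python

    import pandas as pd

    df = pd.DataFrame({'Person': [1, 1, 1, 2, 2]})
    # Number of rows where the column 'Person' takes the value 1.
    number_of_entries = int((df['Person'] == 1).sum())
    print(number_of_entries)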

.. GENERATED FROM PYTHON SOURCE LINES 187-189

.. code-block:: Python

    my_data.count('Person', 1)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    np.int64(3)


.. GENERATED FROM PYTHON SOURCE LINES 190-193

`remove`: removes from the database all entries such that the value
of the expression is not 0.
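
The effect on the data can be sketched with a pandas filter. It is an
illustration only, with the column `Exclude` playing the role of the
expression: rows where it is not 0 are dropped, and the remaining rows
keep their original index.

.. code-block:: Python

    import pandas as pd

    df = pd.DataFrame(
        {
            'Person': [1, 1, 1, 2, 2],
            'Exclude': [0, 0, 1, 0, 1],
        }
    )
    # Keep only the rows where the exclusion condition evaluates to 0.
    kept = df[df['Exclude'] == 0]
    print(kept)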

.. GENERATED FROM PYTHON SOURCE LINES 193-197

.. code-block:: Python

    exclude = Variable('Exclude')
    my_data.remove(exclude)
    my_data.data


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Person</th>
          <th>Exclude</th>
          <th>Variable1</th>
          <th>Variable2</th>
          <th>Choice</th>
          <th>Av1</th>
          <th>Av2</th>
          <th>Av3</th>
          <th>NewVariable</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0.1</td>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0</td>
          <td>1.051271</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1</td>
          <td>0</td>
          <td>2</td>
          <td>0.2</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.552585</td>
        </tr>
        <tr>
          <th>3</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
          <td>0.4</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.305351</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 198-199

`dump_on_file`: dumps the database in a CSV formatted file, and
returns the name of the file.

.. GENERATED FROM PYTHON SOURCE LINES 199-201

.. code-block:: Python

    my_data.dump_on_file()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    'test_dumped.dat'


.. GENERATED FROM PYTHON SOURCE LINES 202-204

The content of the file can be inspected from a shell with
`cat test_dumped.dat`.

.. GENERATED FROM PYTHON SOURCE LINES 206-220

.. code-block:: Python


    # `generate_draws`: generates draws for each variable. It takes as
    #                   arguments a dict indexed by the names of the
    #                   variables, describing the types of draws, the
    #                   list of names of the variables that require
    #                   draws to be generated, and the number of draws.
    #                   Each type can be a native type or any type
    #                   defined with `set_random_number_generators`.
    #                   It returns a 3-dimensional table with draws.
    #                   The 3 dimensions are
    #
    #               1. number of individuals
    #               2. number of draws
    #               3. number of variables


.. GENERATED FROM PYTHON SOURCE LINES 221-222

List of native types and their descriptions.

.. GENERATED FROM PYTHON SOURCE LINES 222-224

.. code-block:: Python

    description_of_native_draws()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    {'UNIFORM': 'Uniform U[0, 1]', 'UNIFORM_ANTI': 'Antithetic uniform U[0, 1]', 'UNIFORM_HALTON2': 'Halton draws with base 2, skipping the first 10', 'UNIFORM_HALTON3': 'Halton draws with base 3, skipping the first 10', 'UNIFORM_HALTON5': 'Halton draws with base 5, skipping the first 10', 'UNIFORM_MLHS': 'Modified Latin Hypercube Sampling on [0, 1]', 'UNIFORM_MLHS_ANTI': 'Antithetic Modified Latin Hypercube Sampling on [0, 1]', 'UNIFORMSYM': 'Uniform U[-1, 1]', 'UNIFORMSYM_ANTI': 'Antithetic uniform U[-1, 1]', 'UNIFORMSYM_HALTON2': 'Halton draws on [-1, 1] with base 2, skipping the first 10', 'UNIFORMSYM_HALTON3': 'Halton draws on [-1, 1] with base 3, skipping the first 10', 'UNIFORMSYM_HALTON5': 'Halton draws on [-1, 1] with base 5, skipping the first 10', 'UNIFORMSYM_MLHS': 'Modified Latin Hypercube Sampling on [-1, 1]', 'UNIFORMSYM_MLHS_ANTI': 'Antithetic Modified Latin Hypercube Sampling on [-1, 1]', 'NORMAL': 'Normal N(0, 1) draws', 'NORMAL_ANTI': 'Antithetic normal draws', 'NORMAL_HALTON2': 'Normal draws from Halton base 2 sequence', 'NORMAL_HALTON3': 'Normal draws from Halton base 3 sequence', 'NORMAL_HALTON5': 'Normal draws from Halton base 5 sequence', 'NORMAL_MLHS': 'Normal draws from Modified Latin Hypercube Sampling', 'NORMAL_MLHS_ANTI': 'Antithetic normal draws from Modified Latin Hypercube Sampling'}


.. GENERATED FROM PYTHON SOURCE LINES 225-229

.. code-block:: Python

    random_draws1 = bioDraws('random_draws1', 'NORMAL_MLHS_ANTI')
    random_draws2 = bioDraws('random_draws2', 'UNIFORM_MLHS_ANTI')
    random_draws3 = bioDraws('random_draws3', 'UNIFORMSYM_MLHS_ANTI')


.. GENERATED FROM PYTHON SOURCE LINES 230-231

We build an expression that involves the three random variables.

.. GENERATED FROM PYTHON SOURCE LINES 231-236

.. code-block:: Python

    x = random_draws1 + random_draws2 + random_draws3
    dict_of_draws = x.dict_of_elementary_expression(TypeOfElementaryExpression.DRAWS)
    types = {name: expression.drawType for name, expression in dict_of_draws.items()}
    print(types)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'random_draws1': 'NORMAL_MLHS_ANTI', 'random_draws2': 'UNIFORM_MLHS_ANTI', 'random_draws3': 'UNIFORMSYM_MLHS_ANTI'}


.. GENERATED FROM PYTHON SOURCE LINES 237-238

Generation of the draws.
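
The table of draws is indexed as (observation, draw, variable). The
numpy-only sketch below builds a table with that layout from plain
antithetic draws, where the second half of each series mirrors the
first half. It is an assumption-laden illustration: it does not use
Modified Latin Hypercube Sampling and is not how Biogeme generates
its draws.

.. code-block:: Python

    import numpy as np

    n_obs, n_draws = 3, 10
    rng = np.random.default_rng(90267)

    # Antithetic normal draws: the second half mirrors the first half.
    half_normal = rng.standard_normal((n_obs, n_draws // 2))
    normal_anti = np.concatenate([half_normal, -half_normal], axis=1)

    # Antithetic uniform draws on [0, 1]: u is paired with 1 - u.
    half_uniform = rng.random((n_obs, n_draws // 2))
    uniform_anti = np.concatenate([half_uniform, 1.0 - half_uniform], axis=1)

    # Stack along a third axis: (observations, draws, variables).
    table = np.stack([normal_anti, uniform_anti], axis=2)
    print(table.shape)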

.. GENERATED FROM PYTHON SOURCE LINES 238-243

.. code-block:: Python

    the_draws_table = my_data.generate_draws(
        types, ['random_draws1', 'random_draws2', 'random_draws3'], 10
    )
    the_draws_table


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    array([[[-0.5605896 ,  0.17260212, -0.35933972],
            [-0.13811324,  0.53162299,  0.85919231],
            [ 1.82908818,  0.04835596,  0.13484656],
            [ 0.38628367,  0.78402643, -0.65819673],
            [ 1.1505448 ,  0.6848117 ,  0.68832765],
            [ 0.5605896 ,  0.82739788,  0.35933972],
            [ 0.13811324,  0.46837701, -0.85919231],
            [-1.82908818,  0.95164404, -0.13484656],
            [-0.38628367,  0.21597357,  0.65819673],
            [-1.1505448 ,  0.3151883 , -0.68832765]],

           [[-0.64437973,  0.2080917 , -0.46801586],
            [ 0.00796208,  0.12040568, -0.10798735],
            [-1.6477843 ,  0.99704115,  0.21589817],
            [-1.19741369,  0.83479683,  0.95349066],
            [ 0.45912504,  0.6264498 , -0.29536472],
            [ 0.64437973,  0.7919083 ,  0.46801586],
            [-0.00796208,  0.87959432,  0.10798735],
            [ 1.6477843 ,  0.00295885, -0.21589817],
            [ 1.19741369,  0.16520317, -0.95349066],
            [-0.45912504,  0.3735502 ,  0.29536472]],

           [[ 0.79534986,  0.59656342,  0.03577992],
            [ 0.94867479,  0.35276091, -0.77071189],
            [-1.04456302,  0.88297895,  0.44668319],
            [ 0.15248012,  0.43261985,  0.51955543],
            [-0.35858978,  0.33330463, -0.98648324],
            [-0.79534986,  0.40343658, -0.03577992],
            [-0.94867479,  0.64723909,  0.77071189],
            [ 1.04456302,  0.11702105, -0.44668319],
            [-0.15248012,  0.56738015, -0.51955543],
            [ 0.35858978,  0.66669537,  0.98648324]]])


.. GENERATED FROM PYTHON SOURCE LINES 244-252

`set_random_number_generators`: defines user-defined random number
generators. It takes as argument a dictionary of generators. The
keys of the dictionary are the names of the generators, and
must be different from the pre-defined generators in Biogeme:
NORMAL, UNIFORM and UNIFORMSYM. Each value of the dictionary is a
`RandomNumberGeneratorTuple` that associates a description with a
function taking two arguments: the number of series to generate
(typically, the size of the database), and the number of draws per
series.
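
As a sketch of the expected signature, a generator is a function that
returns an array with one row per series and one column per draw. The
function `gumbel_draws` below is a hypothetical example that is not
part of Biogeme. It relies on `numpy.random.gumbel`, and it is not
registered with a database here.

.. code-block:: Python

    import numpy as np


    def gumbel_draws(sample_size: int, number_of_draws: int) -> np.ndarray:
        """Return standard Gumbel draws, one row per series."""
        return np.random.gumbel(
            loc=0.0, scale=1.0, size=(sample_size, number_of_draws)
        )


    draws = gumbel_draws(sample_size=3, number_of_draws=10)
    print(draws.shape)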

.. GENERATED FROM PYTHON SOURCE LINES 254-256

We first define functions returning draws, given the number of
observations and the number of draws.

.. GENERATED FROM PYTHON SOURCE LINES 259-260

A lognormal distribution.

.. GENERATED FROM PYTHON SOURCE LINES 260-264

.. code-block:: Python

    def log_normal_draws(sample_size: int, number_of_draws: int) -> np.ndarray:
        return np.exp(np.random.randn(sample_size, number_of_draws))


.. GENERATED FROM PYTHON SOURCE LINES 265-266

An exponential distribution.

.. GENERATED FROM PYTHON SOURCE LINES 266-270

.. code-block:: Python

    def exponential_draws(sample_size: int, number_of_draws: int) -> np.ndarray:
        return -1.0 * np.log(np.random.rand(sample_size, number_of_draws))


.. GENERATED FROM PYTHON SOURCE LINES 271-273

We associate these functions with a name in a dictionary.

.. GENERATED FROM PYTHON SOURCE LINES 273-283

.. code-block:: Python

    rnd_dict = {
        'LOGNORMAL': RandomNumberGeneratorTuple(
            generator=log_normal_draws, description='Draws from lognormal distribution'
        ),
        'EXP': RandomNumberGeneratorTuple(
            generator=exponential_draws, description='Draws from exponential distributions'
        ),
    }
    my_data.set_random_number_generators(rnd_dict)


.. GENERATED FROM PYTHON SOURCE LINES 284-285

We can now generate draws from these distributions.

.. GENERATED FROM PYTHON SOURCE LINES 285-295

.. code-block:: Python

    random_draws1 = bioDraws('random_draws1', 'LOGNORMAL')
    random_draws2 = bioDraws('random_draws2', 'EXP')
    x = random_draws1 + random_draws2
    the_draws = x.dict_of_elementary_expression(TypeOfElementaryExpression.DRAWS)
    the_types = {name: expression.drawType for name, expression in the_draws.items()}
    the_draws_table = my_data.generate_draws(
        draw_types=the_types, names=['random_draws1', 'random_draws2'], number_of_draws=10
    )
    print(the_draws_table)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[[2.15336577 0.35541854]
      [0.92036707 0.38330687]
      [1.35125462 2.83842826]
      [0.27817501 0.46249413]
      [0.5007549  0.6961861 ]
      [1.11902088 1.05840875]
      [0.6539865  0.15909907]
      [0.11955894 0.38886736]
      [0.60108954 0.40525196]
      [3.93153651 0.35868107]]

     [[4.60723253 0.18021421]
      [1.27062239 2.2373742 ]
      [2.73460167 1.17203962]
      [5.61600938 1.8920716 ]
      [2.54756523 0.07930524]
      [0.77284243 2.56028383]
      [5.16153268 0.59225528]
      [0.58972275 0.67940422]
      [0.88324351 0.63497716]
      [3.67625403 3.030641  ]]

     [[2.24536739 0.70518133]
      [0.46930501 0.67990918]
      [4.86579395 0.4097506 ]
      [2.14129298 0.8086017 ]
      [0.20614091 0.06963184]
      [0.2096891  0.02382351]
      [1.70933977 0.78170648]
      [0.63660909 1.83653019]
      [1.14977308 0.75890102]
      [0.26832114 4.20117546]]]


.. GENERATED FROM PYTHON SOURCE LINES 296-298

`sample_with_replacement`: extracts a random sample from the database,
with replacement. Useful for bootstrapping. By default, the sample
has the same size as the database.
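
Sampling with replacement can be sketched with `DataFrame.sample` from
pandas. This is an illustration, not Biogeme's implementation. The
`random_state` argument is an assumption of the sketch, added only to
make it reproducible.

.. code-block:: Python

    import pandas as pd

    df = pd.DataFrame(
        {'Person': [1, 1, 2], 'Variable1': [1, 2, 4]}, index=[0, 1, 3]
    )
    # Bootstrap sample of the same size as the data.
    bootstrap = df.sample(n=len(df), replace=True, random_state=0)
    # A sample larger than the data is possible because rows can repeat.
    larger = df.sample(n=6, replace=True, random_state=0)
    print(len(bootstrap), len(larger))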

.. GENERATED FROM PYTHON SOURCE LINES 300-302

.. code-block:: Python

    my_data.sample_with_replacement()


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Person</th>
          <th>Exclude</th>
          <th>Variable1</th>
          <th>Variable2</th>
          <th>Choice</th>
          <th>Av1</th>
          <th>Av2</th>
          <th>Av3</th>
          <th>NewVariable</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>3</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
          <td>0.4</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.305351</td>
        </tr>
        <tr>
          <th>3</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
          <td>0.4</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.305351</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1</td>
          <td>0</td>
          <td>2</td>
          <td>0.2</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.552585</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 303-305

.. code-block:: Python

    my_data.sample_with_replacement(6)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Person</th>
          <th>Exclude</th>
          <th>Variable1</th>
          <th>Variable2</th>
          <th>Choice</th>
          <th>Av1</th>
          <th>Av2</th>
          <th>Av3</th>
          <th>NewVariable</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>3</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
          <td>0.4</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.305351</td>
        </tr>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0.1</td>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0</td>
          <td>1.051271</td>
        </tr>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0.1</td>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0</td>
          <td>1.051271</td>
        </tr>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0.1</td>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0</td>
          <td>1.051271</td>
        </tr>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0.1</td>
          <td>1</td>
          <td>0</td>
          <td>1</td>
          <td>0</td>
          <td>1.051271</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1</td>
          <td>0</td>
          <td>2</td>
          <td>0.2</td>
          <td>2</td>
          <td>1</td>
          <td>1</td>
          <td>1</td>
          <td>0.552585</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 306-308

`panel`: defines the data as panel data. Takes as argument the name
of the column that identifies individuals.
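
The idea behind the panel structure can be sketched in plain Python: each
individual is mapped to the first and last row of its block of
observations. This is only an illustration that assumes the rows of an
individual are contiguous. The helper ``individual_map`` is hypothetical
and may differ from what Biogeme does internally.

.. code-block:: Python

    def individual_map(ids):
        """Map each individual to the (first, last) row positions of its rows.

        Assumes that the rows of each individual are contiguous.
        """
        result = {}
        for position, individual in enumerate(ids):
            # Keep the first position seen, and update the last one.
            first, _ = result.get(individual, (position, position))
            result[individual] = (first, position)
        return result


    # Values of the 'Person' column in the example database.
    person_column = [1, 1, 2]
    panel_map = individual_map(person_column)
    print(panel_map)  # {1: (0, 1), 2: (2, 2)}
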

.. GENERATED FROM PYTHON SOURCE LINES 308-310

.. code-block:: Python

    my_panel_data = db.Database('test', df)


.. GENERATED FROM PYTHON SOURCE LINES 311-312

The data is not yet considered panel data.

.. GENERATED FROM PYTHON SOURCE LINES 312-314

.. code-block:: Python

    my_panel_data.is_panel()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    False


.. GENERATED FROM PYTHON SOURCE LINES 315-317

.. code-block:: Python

    my_panel_data.panel('Person')


.. GENERATED FROM PYTHON SOURCE LINES 318-319

Now it is panel data.

.. GENERATED FROM PYTHON SOURCE LINES 319-321

.. code-block:: Python

    print(my_panel_data.is_panel())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    True


.. GENERATED FROM PYTHON SOURCE LINES 322-324

.. code-block:: Python

    print(my_panel_data)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    biogeme database test:
       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    0       1        0          1        0.1       1    0    1    0     1.051271
    1       1        0          2        0.2       2    1    1    1     0.552585
    2       2        0          4        0.4       1    1    1    1     0.305351
    Panel data
       0  1
    1  0  1
    2  2  2


.. GENERATED FROM PYTHON SOURCE LINES 325-327

When draws are generated for panel data, a set of draws is generated
per person, not per observation.
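
To make the shape of such draws concrete, here is a minimal sketch that
builds one set of base-3 Halton draws per individual. The ``halton``
helper and the ``skip`` value are assumptions: the second column of the
output printed below appears to coincide with elements 11 to 30 of the
base-3 Halton sequence, but this is inferred from the printed values
only, not from the Biogeme documentation.

.. code-block:: Python

    def halton(index, base):
        """Return element `index` (1-based) of the Halton sequence in `base`."""
        value, fraction = 0.0, 1.0 / base
        while index > 0:
            index, digit = divmod(index, base)
            value += digit * fraction
            fraction /= base
        return value


    n_individuals = 2  # two persons in the example panel
    n_draws = 10       # draws per person
    skip = 10          # assumption inferred from the printed output

    # One list of draws per individual, not per observation.
    draws = [
        [halton(skip + person * n_draws + r + 1, 3) for r in range(n_draws)]
        for person in range(n_individuals)
    ]
    print(len(draws), len(draws[0]))  # 2 10

All observations of the same person would then share that person's list
of draws, which is why the first dimension has two entries although the
database has three rows.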

.. GENERATED FROM PYTHON SOURCE LINES 327-330

.. code-block:: Python

    random_draws1 = bioDraws('random_draws1', 'NORMAL')
    random_draws2 = bioDraws('random_draws2', 'UNIFORM_HALTON3')


.. GENERATED FROM PYTHON SOURCE LINES 331-332

We build an expression that involves the two random variables.

.. GENERATED FROM PYTHON SOURCE LINES 332-340

.. code-block:: Python

    x = random_draws1 + random_draws2
    the_draws = x.dict_of_elementary_expression(TypeOfElementaryExpression.DRAWS)
    types = {name: expression.drawType for name, expression in the_draws.items()}
    the_draws_table = my_panel_data.generate_draws(
        types, ['random_draws1', 'random_draws2'], 10
    )
    print(the_draws_table)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[[-1.57792232  0.7037037 ]
      [ 0.10870961  0.14814815]
      [ 0.05140378  0.48148148]
      [ 1.800922    0.81481481]
      [-1.85148982  0.25925926]
      [ 0.87938314  0.59259259]
      [ 1.353763    0.92592593]
      [-0.46741631  0.07407407]
      [-1.09546279  0.40740741]
      [-0.09265338  0.74074074]]

     [[ 1.92991243  0.18518519]
      [-0.29388122  0.51851852]
      [-0.49084943  0.85185185]
      [ 0.2439256   0.2962963 ]
      [ 0.42498657  0.62962963]
      [-2.72496968  0.96296296]
      [ 2.0755831   0.01234568]
      [ 0.44793057  0.34567901]
      [-0.13185245  0.67901235]
      [-1.04344227  0.12345679]]]


.. GENERATED FROM PYTHON SOURCE LINES 341-344

`get_number_of_observations`: reports the number of observations in the
database. Note that it returns the same value regardless of whether the
database contains panel data or not.

.. GENERATED FROM PYTHON SOURCE LINES 344-346

.. code-block:: Python

    my_data.get_number_of_observations()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    3


.. GENERATED FROM PYTHON SOURCE LINES 347-349

.. code-block:: Python

    my_panel_data.get_number_of_observations()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    3


.. GENERATED FROM PYTHON SOURCE LINES 350-353

`get_sample_size`: reports the size of the sample. If the data is
cross-sectional, it is the number of observations in the
database. If the data is panel, it is the number of individuals.
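
The distinction between the two quantities can be sketched without
Biogeme, using only the values of the ``Person`` column of the example
database. The variable names are illustrative.

.. code-block:: Python

    # Values of the 'Person' column in the example database.
    person_column = [1, 1, 2]

    # One observation per row, whether the data is panel or not.
    number_of_observations = len(person_column)

    # Cross-sectional data: each row is one element of the sample.
    sample_size_cross_sectional = len(person_column)

    # Panel data: each distinct individual is one element of the sample.
    sample_size_panel = len(set(person_column))

    print(number_of_observations, sample_size_cross_sectional, sample_size_panel)
    # 3 3 2
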

.. GENERATED FROM PYTHON SOURCE LINES 353-355

.. code-block:: Python

    my_data.get_sample_size()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    3


.. GENERATED FROM PYTHON SOURCE LINES 356-358

.. code-block:: Python

    my_panel_data.get_sample_size()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    2


.. GENERATED FROM PYTHON SOURCE LINES 359-362

`sample_individual_map_with_replacement`: extracts a random sample of
the individual map from a panel data database, with
replacement. Useful for bootstrapping.
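
The sketch below illustrates why individuals, rather than rows, are
sampled for panel data: every draw brings along all observations of the
selected person. It is plain Python under stated assumptions. The map
copies the one printed for the example panel, and the seed and variable
names are hypothetical.

.. code-block:: Python

    import random

    # Individual -> (first row, last row), as printed for the example panel.
    individual_map = {1: (0, 1), 2: (2, 2)}

    rng = random.Random(2023)  # arbitrary seed, for reproducibility only

    # Sample ten individuals with replacement.
    sampled_ids = rng.choices(list(individual_map), k=10)
    sampled_map = [(person, *individual_map[person]) for person in sampled_ids]

    # Expand each sampled individual into its full block of rows.
    sampled_rows = [
        row
        for _, first, last in sampled_map
        for row in range(first, last + 1)
    ]
    print(len(sampled_map), len(sampled_rows))

The number of rows in the expanded sample varies from one draw to the
next, because individuals do not all have the same number of
observations.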

.. GENERATED FROM PYTHON SOURCE LINES 362-363

.. code-block:: Python

    my_panel_data.sample_individual_map_with_replacement(10)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>0</th>
          <th>1</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>2</th>
          <td>2</td>
          <td>2</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0</td>
          <td>1</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0</td>
          <td>1</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0</td>
          <td>1</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0</td>
          <td>1</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0</td>
          <td>1</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0</td>
          <td>1</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0</td>
          <td>1</td>
        </tr>
        <tr>
          <th>2</th>
          <td>2</td>
          <td>2</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.036 seconds)


.. _sphx_glr_download_auto_examples_programmers_plot_database.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_database.ipynb <plot_database.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_database.py <plot_database.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_database.zip <plot_database.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_