.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/programmers/plot_database.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_programmers_plot_database.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_programmers_plot_database.py:


biogeme.database
================

Examples of use of several functions.

This is designed for programmers who need examples of use of the
functions of the module. The examples are designed to illustrate the
syntax. They do not correspond to any meaningful model.

Michel Bierlaire

Sun Jun 29 2025, 02:30:19

.. GENERATED FROM PYTHON SOURCE LINES 15-31

.. code-block:: Python

    import numpy as np
    import pandas as pd
    from IPython.core.display_functions import display

    from biogeme.database import (
        Database,
        PanelDatabase,
        check_availability_of_chosen_alt,
        choice_availability_statistics,
    )
    from biogeme.exceptions import BiogemeError
    from biogeme.expressions import Variable, exp
    from biogeme.segmentation import DiscreteSegmentationTuple, verify_segmentation
    from biogeme.version import get_text

.. GENERATED FROM PYTHON SOURCE LINES 32-33

Version of Biogeme.

.. GENERATED FROM PYTHON SOURCE LINES 33-35

.. code-block:: Python

    print(get_text())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    biogeme 3.3.1 [2025-09-03]
    Home page: http://biogeme.epfl.ch
    Submit questions to https://groups.google.com/d/forum/biogeme
    Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

.. GENERATED FROM PYTHON SOURCE LINES 36-37

We set the seed so that the outcome of random operations is always the same.

.. GENERATED FROM PYTHON SOURCE LINES 37-39

.. code-block:: Python

    np.random.seed(90267)

.. GENERATED FROM PYTHON SOURCE LINES 40-41

Create a database from a pandas data frame.

.. GENERATED FROM PYTHON SOURCE LINES 41-57

.. code-block:: Python

    df = pd.DataFrame(
        {
            'Person': [1, 1, 1, 2, 2],
            'Exclude': [0, 0, 1, 0, 1],
            'Variable1': [1, 2, 3, 4, 5],
            'Variable2': [10, 20, 30, 40, 50],
            'Choice': [1, 2, 3, 1, 2],
            'Av1': [0, 1, 1, 1, 1],
            'Av2': [1, 1, 1, 1, 1],
            'Av3': [0, 1, 1, 1, 1],
        }
    )
    my_data = Database('test', df)
    print(my_data)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    biogeme database test

.. GENERATED FROM PYTHON SOURCE LINES 58-62

`verify_segmentation`: checks that the segmentation covers the complete
database. A segmentation is a partition of the dataset based on the value
of one of the variables. For instance, we can segment on the Choice
variable.

.. GENERATED FROM PYTHON SOURCE LINES 64-69

.. code-block:: Python

    correct_mapping = {1: 'Alt. 1', 2: 'Alt. 2', 3: 'Alt. 3'}
    correct_segmentation = DiscreteSegmentationTuple(
        variable='Choice', mapping=correct_mapping
    )

.. GENERATED FROM PYTHON SOURCE LINES 70-72

If the segmentation is well defined, the call completes without raising an
exception.

.. GENERATED FROM PYTHON SOURCE LINES 74-76

.. code-block:: Python

    verify_segmentation(dataframe=my_data.dataframe, segmentation=correct_segmentation)

.. GENERATED FROM PYTHON SOURCE LINES 77-82

.. code-block:: Python

    incorrect_mapping = {1: 'Alt. 1', 2: 'Alt. 2'}
    incorrect_segmentation = DiscreteSegmentationTuple(
        variable='Choice', mapping=incorrect_mapping
    )

.. GENERATED FROM PYTHON SOURCE LINES 83-84

If the segmentation is incorrect, an exception is raised. Here, the value 3
of Choice is not covered by the mapping.

.. GENERATED FROM PYTHON SOURCE LINES 86-93

.. code-block:: Python

    try:
        verify_segmentation(
            dataframe=my_data.dataframe, segmentation=incorrect_segmentation
        )
    except BiogemeError as e:
        print(e)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    The following entries are missing in the segmentation: {np.float64(3.0)}.

.. GENERATED FROM PYTHON SOURCE LINES 94-99

.. code-block:: Python

    another_incorrect_mapping = {1: 'Alt. 1', 2: 'Alt. 2', 4: 'Does not exist'}
    another_incorrect_segmentation = DiscreteSegmentationTuple(
        variable='Choice', mapping=another_incorrect_mapping
    )

.. GENERATED FROM PYTHON SOURCE LINES 100-107

.. code-block:: Python

    try:
        verify_segmentation(
            dataframe=my_data.dataframe, segmentation=another_incorrect_segmentation
        )
    except BiogemeError as e:
        print(e)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    The following entries are missing in the segmentation: {np.float64(3.0)}. Segmentation entries do not exist in the data: {4}.

.. GENERATED FROM PYTHON SOURCE LINES 108-111

`check_availability_of_chosen_alt`: checks whether the chosen alternative is
available for each entry in the database. The result contains one Boolean
per entry. Here, the first entry is False because alternative 1 is chosen
while Av1 is 0.

.. GENERATED FROM PYTHON SOURCE LINES 111-119

.. code-block:: Python

    Av1 = Variable('Av1')
    Av2 = Variable('Av2')
    Av3 = Variable('Av3')
    Choice = Variable('Choice')
    avail = {1: Av1, 2: Av2, 3: Av3}
    result = check_availability_of_chosen_alt(database=my_data, avail=avail, choice=Choice)
    print(result)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [False  True  True  True  True]

.. GENERATED FROM PYTHON SOURCE LINES 120-122

`choice_availability_statistics`: counts, for each alternative, the number
of times it is chosen and the number of times it is available.

.. GENERATED FROM PYTHON SOURCE LINES 122-131

.. code-block:: Python

    statistics = choice_availability_statistics(
        database=my_data, avail=avail, choice=Choice
    )
    for alternative, choice_available in statistics.items():
        print(
            f'Alternative {alternative} is chosen {choice_available.chosen} times '
            f'and available {choice_available.available} times'
        )

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Alternative 1.0 is chosen 2 times and available 4.0 times
    Alternative 2.0 is chosen 2 times and available 5.0 times
    Alternative 3.0 is chosen 1 times and available 4.0 times

.. GENERATED FROM PYTHON SOURCE LINES 132-134

`suggest_scaling`: suggests a scaling of the variables in the database.

.. GENERATED FROM PYTHON SOURCE LINES 134-136
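
The rule used by `suggest_scaling` to choose a scale is not described in this
example. For illustration only, the sketch below implements one plausible
heuristic in plain pandas, under stated assumptions: a numeric column is
flagged when its largest absolute value exceeds a threshold (assumed here to
be 10), and the suggested scale is the power of ten that brings that value to
at most 1. The function `sketch_suggest_scaling` and its `threshold`
parameter are hypothetical and are not part of Biogeme. On the values of this
example, it happens to flag Variable2 with a scale of 0.01, which is also
what Biogeme reports below.

.. code-block:: Python

    import math

    import pandas as pd


    def sketch_suggest_scaling(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
        """Hypothetical heuristic, not Biogeme's implementation.

        A numeric column is flagged when its largest absolute value exceeds
        `threshold` (an assumed value); the suggested scale is the power of
        ten that brings this value to at most 1.
        """
        rows = []
        for column in df.select_dtypes(include='number').columns:
            largest = float(df[column].abs().max())
            if largest > threshold:
                # Smallest integer k such that largest / 10**k <= 1.
                exponent = math.ceil(math.log10(largest))
                rows.append(
                    {'Column': column, 'Scale': 10.0**-exponent, 'Largest': largest}
                )
        return pd.DataFrame(rows, columns=['Column', 'Scale', 'Largest'])


    # Same values as the Person, Variable1, Variable2 and Choice columns above.
    df = pd.DataFrame(
        {
            'Person': [1, 1, 1, 2, 2],
            'Variable1': [1, 2, 3, 4, 5],
            'Variable2': [10, 20, 30, 40, 50],
            'Choice': [1, 2, 3, 1, 2],
        }
    )
    print(sketch_suggest_scaling(df))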

.. code-block:: Python

    display(my_data.dataframe.columns)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Index(['Person', 'Exclude', 'Variable1', 'Variable2', 'Choice', 'Av1', 'Av2',
           'Av3'],
          dtype='object')

.. GENERATED FROM PYTHON SOURCE LINES 137-140

.. code-block:: Python

    suggested_scaling = my_data.suggest_scaling()
    display(suggested_scaling)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

          Column  Scale  Largest
    3  Variable2   0.01     50.0

.. GENERATED FROM PYTHON SOURCE LINES 141-142

It is possible to obtain the scaling for selected variables only.

.. GENERATED FROM PYTHON SOURCE LINES 142-145

.. code-block:: Python

    suggested_scaling = my_data.suggest_scaling(columns=['Variable1', 'Variable2'])
    display(suggested_scaling)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

          Column  Scale  Largest
    1  Variable2   0.01     50.0

.. GENERATED FROM PYTHON SOURCE LINES 146-149

`scale_column`: multiplies an entire column by a scale value.

Before:

.. GENERATED FROM PYTHON SOURCE LINES 149-151

.. code-block:: Python

    display(my_data.dataframe)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3
    0     1.0      0.0        1.0       10.0     1.0  0.0  1.0  0.0
    1     1.0      0.0        2.0       20.0     2.0  1.0  1.0  1.0
    2     1.0      1.0        3.0       30.0     3.0  1.0  1.0  1.0
    3     2.0      0.0        4.0       40.0     1.0  1.0  1.0  1.0
    4     2.0      1.0        5.0       50.0     2.0  1.0  1.0  1.0

.. GENERATED FROM PYTHON SOURCE LINES 152-154

.. code-block:: Python

    my_data.scale_column('Variable2', 0.01)

.. GENERATED FROM PYTHON SOURCE LINES 155-156

After:

.. GENERATED FROM PYTHON SOURCE LINES 156-158

.. code-block:: Python

    display(my_data.dataframe)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3
    0     1.0      0.0        1.0        0.1     1.0  0.0  1.0  0.0
    1     1.0      0.0        2.0        0.2     2.0  1.0  1.0  1.0
    2     1.0      1.0        3.0        0.3     3.0  1.0  1.0  1.0
    3     2.0      0.0        4.0        0.4     1.0  1.0  1.0  1.0
    4     2.0      1.0        5.0        0.5     2.0  1.0  1.0  1.0

.. GENERATED FROM PYTHON SOURCE LINES 159-161

`define_variable`: adds a new column to the database, calculated from an
expression.
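
Conceptually, the new column contains the value of the expression evaluated
on each row of the database. As a cross-check, the sketch below reproduces,
with plain pandas and NumPy, the computation performed in the example that
follows, assuming the values of Variable1 and Variable2 displayed above after
scaling. It only illustrates the arithmetic; it is not how Biogeme evaluates
expressions internally, and the last digits may differ from Biogeme's output
because of floating-point rounding.

.. code-block:: Python

    import numpy as np
    import pandas as pd

    # Values of Variable1 and Variable2 after Variable2 has been scaled by 0.01.
    df = pd.DataFrame(
        {
            'Variable1': [1.0, 2.0, 3.0, 4.0, 5.0],
            'Variable2': [0.1, 0.2, 0.3, 0.4, 0.5],
        }
    )

    # exp(0.5 * Variable2) / Variable1, evaluated for every row at once.
    df['NewVariable'] = np.exp(0.5 * df['Variable2']) / df['Variable1']
    print(df['NewVariable'].tolist())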

.. GENERATED FROM PYTHON SOURCE LINES 161-167

.. code-block:: Python

    Variable1 = Variable('Variable1')
    Variable2 = Variable('Variable2')
    expression = exp(0.5 * Variable2) / Variable1
    result = my_data.define_variable(name='NewVariable', expression=expression)
    print(my_data.dataframe['NewVariable'].tolist())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [1.051271096376024, 0.5525854590378239, 0.38727808090942767, 0.30535068954004246, 0.2568050833375483]

.. GENERATED FROM PYTHON SOURCE LINES 168-171

.. code-block:: Python

    display(my_data.dataframe)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    0     1.0      0.0        1.0        0.1     1.0  0.0  1.0  0.0     1.051271
    1     1.0      0.0        2.0        0.2     2.0  1.0  1.0  1.0     0.552585
    2     1.0      1.0        3.0        0.3     3.0  1.0  1.0  1.0     0.387278
    3     2.0      0.0        4.0        0.4     1.0  1.0  1.0  1.0     0.305351
    4     2.0      1.0        5.0        0.5     2.0  1.0  1.0  1.0     0.256805

.. GENERATED FROM PYTHON SOURCE LINES 172-175

`remove`: removes from the database all entries for which the value of the
expression is not 0.

.. GENERATED FROM PYTHON SOURCE LINES 175-180

.. code-block:: Python

    exclude = Variable('Exclude')
    my_data.remove(exclude)
    display(my_data.dataframe)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    0     1.0      0.0        1.0        0.1     1.0  0.0  1.0  0.0     1.051271
    1     1.0      0.0        2.0        0.2     2.0  1.0  1.0  1.0     0.552585
    2     2.0      0.0        4.0        0.4     1.0  1.0  1.0  1.0     0.305351

.. GENERATED FROM PYTHON SOURCE LINES 181-183

`bootstrap_sample`: extracts a random sample of the same size as the
database, with replacement. This is useful for bootstrapping.

.. GENERATED FROM PYTHON SOURCE LINES 185-186

One bootstrap sample:

.. GENERATED FROM PYTHON SOURCE LINES 186-189

.. code-block:: Python

    bootstrap_sample = my_data.bootstrap_sample()
    display(bootstrap_sample.dataframe)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    0     1.0      0.0        1.0        0.1     1.0  0.0  1.0  0.0     1.051271
    1     1.0      0.0        1.0        0.1     1.0  0.0  1.0  0.0     1.051271
    2     1.0      0.0        1.0        0.1     1.0  0.0  1.0  0.0     1.051271

.. GENERATED FROM PYTHON SOURCE LINES 190-191

Another bootstrap sample:

.. GENERATED FROM PYTHON SOURCE LINES 191-194

.. code-block:: Python

    bootstrap_sample = my_data.bootstrap_sample()
    display(bootstrap_sample.dataframe)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  NewVariable
    0     1.0      0.0        2.0        0.2     2.0  1.0  1.0  1.0     0.552585
    1     2.0      0.0        4.0        0.4     1.0  1.0  1.0  1.0     0.305351
    2     1.0      0.0        2.0        0.2     2.0  1.0  1.0  1.0     0.552585

.. GENERATED FROM PYTHON SOURCE LINES 195-197

If the database is organised for panel data, where several observations are
available for each individual, the database must be flattened so that each
row corresponds to an individual.

.. GENERATED FROM PYTHON SOURCE LINES 197-203

.. code-block:: Python

    my_panel_data = PanelDatabase(database=my_data, panel_column='Person')
    flattened_dataframe, largest_group = my_panel_data.flatten_database(
        missing_data='999999'
    )
    print(f'The size of the largest group of data per individual is {largest_group}')

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    The size of the largest group of data per individual is 2

.. GENERATED FROM PYTHON SOURCE LINES 204-209

The names of the columns of the flat dataframe are the names of the original
columns, with a suffix. For each variable column in the original DataFrame
(excluding the column identifying the individuals), the output contains
multiple columns named `columnname__panel__XX`, where `XX` is the zero-padded
observation index (starting at 01). Additionally, for each observation index,
a `relevant___panel__XX` column indicates whether the observation is relevant
(1) or padded with the missing-data value (0). In the output below, the
columns Exclude and Av2, whose values are identical in all rows of this
example, also appear once without a suffix.

.. GENERATED FROM PYTHON SOURCE LINES 209-213
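
The wide layout described above can be sketched in plain pandas. The function
below is a hypothetical re-implementation for illustration only
(`sketch_flatten` is not part of Biogeme, and the numeric default for
`missing_data` is an assumption): it numbers the observations of each
individual, pivots every other column into `name__panel__XX` columns, adds a
`relevant___panel__XX` indicator, and fills the slots of individuals with
fewer observations with the missing-data value. Unlike the Biogeme output
shown below, it does not keep unsuffixed copies of columns such as Exclude
and Av2, and its column order may differ.

.. code-block:: Python

    import pandas as pd


    def sketch_flatten(
        df: pd.DataFrame, panel_column: str, missing_data: float = 999999
    ) -> tuple[pd.DataFrame, int]:
        """Hypothetical sketch of panel flattening, not Biogeme's implementation."""
        data = df.copy()
        # Observation index (1, 2, ...) of each row within its individual.
        data['_obs'] = data.groupby(panel_column).cumcount() + 1
        largest_group = int(data['_obs'].max())
        group_sizes = data.groupby(panel_column).size()
        value_columns = [c for c in df.columns if c != panel_column]

        # One row per individual; columns are (variable, observation index).
        wide = data.pivot(index=panel_column, columns='_obs', values=value_columns)
        flat = pd.DataFrame(index=wide.index)
        for obs in range(1, largest_group + 1):
            suffix = f'__panel__{obs:02d}'
            # 1 if the individual has an observation with this index, 0 if padded.
            flat[f'relevant_{suffix}'] = (group_sizes >= obs).astype(int)
            for column in value_columns:
                flat[f'{column}{suffix}'] = wide[column][obs].fillna(missing_data)
        return flat.reset_index(), largest_group


    # Small panel: individual 1 has two observations, individual 2 only one.
    panel = pd.DataFrame(
        {
            'Person': [1, 1, 2],
            'Choice': [1, 2, 1],
            'Variable1': [1.0, 2.0, 4.0],
        }
    )
    flat, largest = sketch_flatten(panel, panel_column='Person')
    print(largest)
    print(flat)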

.. code-block:: Python

    print('The columns of the flat dataframe are:')
    for col in flattened_dataframe.columns:
        print(f'\t{col}')
    display(flattened_dataframe)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    The columns of the flat dataframe are:
            Person
            Exclude
            Av2
            relevant___panel__01
            Exclude__panel__01
            Variable1__panel__01
            Variable2__panel__01
            Choice__panel__01
            Av1__panel__01
            Av2__panel__01
            Av3__panel__01
            NewVariable__panel__01
            relevant___panel__02
            Exclude__panel__02
            Variable1__panel__02
            Variable2__panel__02
            Choice__panel__02
            Av1__panel__02
            Av2__panel__02
            Av3__panel__02
            NewVariable__panel__02
       Person  Exclude  Av2  ...  Av2__panel__02  Av3__panel__02  NewVariable__panel__02
    0     1.0      0.0  1.0  ...             1.0             1.0                0.552585
    1     2.0      0.0  1.0  ...          999999          999999                  999999

    [2 rows x 21 columns]


.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.204 seconds)


.. _sphx_glr_download_auto_examples_programmers_plot_database.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_database.ipynb <plot_database.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_database.py <plot_database.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_database.zip <plot_database.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    Gallery generated by Sphinx-Gallery