biogeme.tools.database module

biogeme.tools.database.countNumberOfGroups(df, column)

Warning

This function is deprecated. Use count_number_of_groups() instead.

Return type:

int

Parameters:
  • df (DataFrame)

  • column (str)

biogeme.tools.database.count_number_of_groups(df, column)

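Counts the number of groups of consecutive identical values in a column. For instance, the sequence 1, 2, 2, 3, 3, 3, 4, 1, 1 contains 5 groups: 1 | 2, 2 | 3, 3, 3 | 4 | 1, 1. A value that reappears later, such as 1 here, starts a new group.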

Example:

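>>> import pandas as pd
>>> from biogeme.tools.database import count_number_of_groups
>>> df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3],
...                    'value': [1000, 2000, 3000, 4000,
...                              5000, 5000, 10000, 20000]})
>>> count_number_of_groups(df, 'ID')
6

>>> count_number_of_groups(df, 'value')
7
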
Return type:

int

Parameters:
  • df (DataFrame)

  • column (str)
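The counting rule described above (a new group starts whenever a value differs from the one just before it) can be sketched in plain pandas. This is a minimal sketch, not necessarily how biogeme implements count_number_of_groups; the helper name count_groups_sketch is hypothetical. It assumes the column contains no missing values: because NaN != NaN, each NaN would be counted as a group of its own.

```python
import pandas as pd


def count_groups_sketch(df, column):
    """Hypothetical helper: number of runs of consecutive identical values."""
    values = df[column]
    # A new group starts wherever a value differs from the previous one.
    # The first row always starts a group, because shift() puts NaN there.
    # Caveat (assumption): NaN entries would each count as a separate group.
    starts = values != values.shift()
    return int(starts.sum())


sequence = pd.DataFrame({'x': [1, 2, 2, 3, 3, 3, 4, 1, 1]})
print(count_groups_sketch(sequence, 'x'))  # 5
```

Comparing each value with its shifted predecessor keeps the computation vectorized, so no explicit Python loop over the rows is needed.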

biogeme.tools.database.flatten_database(df, merge_id, row_name=None, identical_columns=None)

Combines several rows of a pandas DataFrame into a single row per group. For instance, consider the following database:

   ID  Age  Cost   Name
0   1   23    34  Item3
1   1   23    45  Item4
2   1   23    12  Item7
3   2   45    65  Item3
4   2   45    34  Item7

With merge_id set to ‘ID’ and row_name set to ‘Name’, the function generates the same data in the following format:

    Age  Item3_Cost  Item4_Cost  Item7_Cost
ID
1    23          34        45.0          12
2    45          65         NaN          34
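A similar reshaping can be approximated with DataFrame.pivot. The sketch below is a hypothetical helper (flatten_by_name_sketch), not biogeme's implementation. It assumes that identical_columns is given explicitly and that each value of row_name occurs at most once per group, because pivot raises a ValueError on duplicate (index, column) pairs. Missing combinations are filled with NaN, so the Cost columns may come out as floats rather than exactly as in the table above.

```python
import pandas as pd

# Example data from the table above
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Age': [23, 23, 23, 45, 45],
    'Cost': [34, 45, 12, 65, 34],
    'Name': ['Item3', 'Item4', 'Item7', 'Item3', 'Item7'],
})


def flatten_by_name_sketch(df, merge_id, row_name, identical_columns):
    """Hypothetical helper: one row per merge_id, one column per (name, variable)."""
    # Columns that are constant within a group are kept once per group
    common = df.groupby(merge_id)[identical_columns].first()
    # All remaining columns vary across the rows of a group
    varying = [c for c in df.columns
               if c not in identical_columns + [merge_id, row_name]]
    # Requires unique (merge_id, row_name) pairs; missing pairs become NaN
    wide = df.pivot(index=merge_id, columns=row_name, values=varying)
    # The columns are (variable, name) pairs: rename them to '<name>_<variable>'
    wide.columns = [f'{name}_{var}' for var, name in wide.columns]
    return common.join(wide)


print(flatten_by_name_sketch(df, 'ID', 'Name', ['Age']))
```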

If row_name is None, the rows of each group are numbered sequentially, and the function generates the same data in the following format:

    Age  1_Cost 1_Name  2_Cost 2_Name  3_Cost 3_Name
ID
1    23      34  Item3      45  Item4    12.0  Item7
2    45      65  Item3      34  Item7     NaN    NaN
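The sequential numbering used when row_name is None can be sketched with groupby().cumcount(), which gives the 0-based position of each row within its group. This is a hypothetical helper (flatten_numbered_sketch), not biogeme's code; the temporary column name _pos is an assumption, and identical_columns is again passed explicitly. Groups with fewer rows receive NaN for the missing positions, as for ID 2 in the table above.

```python
import pandas as pd

# Example data from the first table
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Age': [23, 23, 23, 45, 45],
    'Cost': [34, 45, 12, 65, 34],
    'Name': ['Item3', 'Item4', 'Item7', 'Item3', 'Item7'],
})


def flatten_numbered_sketch(df, merge_id, identical_columns):
    """Hypothetical helper mimicking row_name=None: rows numbered 1, 2, 3, ..."""
    common = df.groupby(merge_id)[identical_columns].first()
    # '_pos' is a temporary column: 1-based position of each row in its group
    numbered = df.assign(_pos=df.groupby(merge_id).cumcount() + 1)
    varying = [c for c in df.columns if c not in identical_columns + [merge_id]]
    wide = numbered.pivot(index=merge_id, columns='_pos', values=varying)
    # Rename (variable, position) pairs to '<position>_<variable>'
    wide.columns = [f'{pos}_{var}' for var, pos in wide.columns]
    # Order the columns by position first: 1_Cost, 1_Name, 2_Cost, ...
    n_max = numbered['_pos'].max()
    ordered = [f'{pos}_{var}' for pos in range(1, n_max + 1) for var in varying]
    return common.join(wide[ordered])


print(flatten_numbered_sketch(df, 'ID', ['Age']))
```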

Parameters:
  • df (pandas.DataFrame) – initial data frame

  • merge_id (str) – name of the column identifying the rows to be merged. In the example above: ‘ID’.

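  • row_name (str) – name of the column whose values label the rows within a group; in the new dataframe, these values become the prefixes of the column names. In the example above: ‘Name’. If None, the rows of each group are numbered sequentially (1, 2, 3, and so on), and these numbers are used as prefixes instead.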

  • identical_columns (list(str)) – names of the columns whose values are identical across all rows of a group; each of them appears only once in the result. In the example above: [‘Age’]. If None, these columns are detected automatically, which may be slow on large databases.

Returns:

reformatted database

Return type:

pandas.DataFrame
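When identical_columns is None, the columns have to be detected from the data. One plausible approach, sketched below with a hypothetical helper (detect_identical_columns_sketch), counts the distinct values of every column within every group and keeps the columns that never exceed one. This is an assumption about the technique, not biogeme's actual code. It does, however, show why detection requires a pass over every column of every group, which can become costly on large databases.

```python
import pandas as pd


def detect_identical_columns_sketch(df, merge_id):
    """Hypothetical helper: columns whose value never changes within a group."""
    # Distinct values per column and group; dropna=False counts NaN as a value
    distinct = df.groupby(merge_id).nunique(dropna=False)
    # Keep a column only if every group has at most one distinct value in it
    # (merge_id is excluded explicitly in case it appears among the columns)
    return [col for col in distinct.columns
            if col != merge_id and (distinct[col] <= 1).all()]


df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Age': [23, 23, 23, 45, 45],
    'Cost': [34, 45, 12, 65, 34],
    'Name': ['Item3', 'Item4', 'Item7', 'Item3', 'Item7'],
})
print(detect_identical_columns_sketch(df, 'ID'))  # ['Age']
```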