biogeme.tools.database module
- biogeme.tools.database.countNumberOfGroups(df, column)
Warning
This function is deprecated. Use count_number_of_groups() instead.
- Return type:
int
- Parameters:
df (DataFrame)
column (str)
- biogeme.tools.database.count_number_of_groups(df, column)
This function counts the number of groups of consecutive identical values in a column. For instance, the sequence 1, 2, 2, 3, 3, 3, 4, 1, 1 contains 5 groups.
Example:
>>> df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3], 'value': [1000, 2000, 3000, 4000, 5000, 5000, 10000, 20000]})
>>> count_number_of_groups(df, 'ID')
6
>>> count_number_of_groups(df, 'value')
7
- Return type:
int
- Parameters:
df (DataFrame)
column (str)
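The counting rule described above can be illustrated with plain pandas. This is a minimal sketch, not Biogeme's implementation: the helper name count_consecutive_groups is hypothetical, and it assumes that a group is a run of consecutive identical values, as the documented examples suggest.

```python
import pandas as pd


def count_consecutive_groups(df: pd.DataFrame, column: str) -> int:
    """Hypothetical helper: count runs of consecutive identical values."""
    series = df[column]
    if series.empty:
        return 0
    # A new group starts wherever a value differs from the one before it.
    # The first row is compared with NaN, so it always starts a group.
    # Assumption: the column has no missing values (each NaN would count
    # as its own group, because NaN != NaN).
    starts = series != series.shift()
    return int(starts.sum())


df = pd.DataFrame(
    {
        'ID': [1, 1, 2, 3, 3, 1, 2, 3],
        'value': [1000, 2000, 3000, 4000, 5000, 5000, 10000, 20000],
    }
)
print(count_consecutive_groups(df, 'ID'))     # 6
print(count_consecutive_groups(df, 'value'))  # 7
```

Note that the value 1 appears in two separate runs of the 'ID' column and is therefore counted twice; the result is not the number of distinct values.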
- biogeme.tools.database.flatten_database(df, merge_id, row_name=None, identical_columns=None)
Combine several rows of a Pandas database into one. For instance, consider the following database:
   ID  Age  Cost   Name
0   1   23    34  Item3
1   1   23    45  Item4
2   1   23    12  Item7
3   2   45    65  Item3
4   2   45    34  Item7
If row_name is ‘Name’, the function generates the same data in the following format:
    Age  Item3_Cost  Item4_Cost  Item7_Cost
ID
1    23          34        45.0          12
2    45          65         NaN          34
If row_name is None, the function generates the same data in the following format:
    Age  1_Cost 1_Name  2_Cost 2_Name  3_Cost 3_Name
ID
1    23      34  Item3      45  Item4    12.0  Item7
2    45      65  Item3      34  Item7     NaN    NaN
- Parameters:
df (pandas.DataFrame) – initial data frame
merge_id (str) – name of the column that identifies rows that should be merged. In the above example: ‘ID’
row_name (str) – name of the column that provides the names of the rows in the new dataframe. In the example above: ‘Name’. If None, the rows are numbered sequentially.
identical_columns (list(str)) – names of the columns that contain identical values across the rows of a group. In the example above: [‘Age’]. If None, these columns are detected automatically, which may be slow on large databases.
- Returns:
reformatted database
- Return type:
pandas.DataFrame
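The reshaping described for the case where row_name is given can be approximated with standard pandas operations. The sketch below is not Biogeme's implementation: flatten_sketch is a hypothetical name, identical_columns must be supplied explicitly, and the result may differ from flatten_database in column dtypes and ordering.

```python
import pandas as pd


def flatten_sketch(df, merge_id, row_name, identical_columns):
    """Hypothetical illustration: one output row per value of merge_id."""
    # Columns that are constant within a group: keep one value per group.
    common = df.groupby(merge_id)[identical_columns].first()
    # Every other column varies across the rows of a group.
    excluded = {merge_id, row_name, *identical_columns}
    varying = [c for c in df.columns if c not in excluded]
    # Spread the varying columns: one output column per (column, row label).
    wide = df.pivot(index=merge_id, columns=row_name, values=varying)
    # Flatten the two-level column index into names such as 'Item3_Cost'.
    wide.columns = [f'{label}_{col}' for col, label in wide.columns]
    return common.join(wide)


df = pd.DataFrame(
    {
        'ID': [1, 1, 1, 2, 2],
        'Age': [23, 23, 23, 45, 45],
        'Cost': [34, 45, 12, 65, 34],
        'Name': ['Item3', 'Item4', 'Item7', 'Item3', 'Item7'],
    }
)
flat = flatten_sketch(df, merge_id='ID', row_name='Name', identical_columns=['Age'])
print(flat)
```

Group 2 has no row named Item4, so its Item4_Cost entry is NaN, as in the documented output above.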