biogeme.database.sampling module
This module provides utility functions for performing sampling operations on pandas DataFrames, including standard bootstrapping and panel-based sampling.
Michel Bierlaire Wed Mar 26 19:39:21 2025
- biogeme.database.sampling.sample_panel_with_replacement(df, individual_map, size=None)
Draws a sample of individuals with replacement from a panel dataset.
- Parameters:
df (DataFrame) – The input DataFrame representing the full dataset.
individual_map (DataFrame) – A DataFrame mapping each individual ID to (start, end) row indices.
size (int | None) – The number of individuals to sample. Defaults to the number of individuals in the map.
- Return type:
DataFrame
- Returns:
A new DataFrame containing the rows of the sampled individuals, with the index reset.
- Raises:
BiogemeError – if the individual_map is missing or empty.
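The panel sampling described above can be illustrated with a minimal sketch in plain pandas. This is not Biogeme's implementation: the helper name `panel_bootstrap_sketch`, the `seed` argument, the column names `start` and `end`, and the choice of inclusive end positions are all assumptions made for illustration, and the sketch raises `ValueError` rather than `BiogemeError`.

```python
import numpy as np
import pandas as pd


def panel_bootstrap_sketch(df, individual_map, size=None, seed=None):
    """Hypothetical stand-in for panel sampling with replacement.

    Assumes individual_map is indexed by individual ID and has two
    columns, 'start' and 'end', holding inclusive row positions in df.
    """
    if individual_map is None or individual_map.empty:
        raise ValueError("individual_map is missing or empty")
    rng = np.random.default_rng(seed)
    n = len(individual_map) if size is None else size
    # Draw individuals (rows of the map) with replacement.
    drawn = rng.integers(0, len(individual_map), size=n)
    blocks = []
    for pos in drawn:
        row = individual_map.iloc[pos]
        start, end = int(row["start"]), int(row["end"])
        # Copy the whole block of rows belonging to this individual.
        blocks.append(df.iloc[start:end + 1])
    return pd.concat(blocks).reset_index(drop=True)


df = pd.DataFrame(
    {"person": [1, 1, 2, 2, 2, 3], "choice": [0, 1, 1, 0, 1, 0]}
)
individual_map = pd.DataFrame(
    {"start": [0, 2, 5], "end": [1, 4, 5]}, index=[1, 2, 3]
)
sample = panel_bootstrap_sketch(df, individual_map, size=4, seed=0)
```

Because whole blocks are copied, an individual drawn twice contributes all of its rows twice, which keeps each individual's observations together in the resampled data.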
- biogeme.database.sampling.sample_with_replacement(df, size=None)
- Return type:
DataFrame
- Parameters:
df (DataFrame)
size (int | None)
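This entry carries no description, but the module summary mentions standard bootstrapping. As a rough illustration of what a row-level bootstrap looks like in plain pandas, the sketch below uses `DataFrame.sample` with `replace=True`. The helper name `bootstrap_rows_sketch`, the `seed` argument, the default of `size` to the number of rows, and the index reset are assumptions, not documented behavior of `sample_with_replacement`.

```python
import pandas as pd


def bootstrap_rows_sketch(df, size=None, seed=None):
    """Hypothetical row-level bootstrap, not Biogeme's implementation."""
    # Assumption: when size is None, draw as many rows as df contains.
    n = len(df) if size is None else size
    # replace=True lets the same row be drawn more than once.
    resampled = df.sample(n=n, replace=True, random_state=seed)
    return resampled.reset_index(drop=True)


df = pd.DataFrame({"x": [10, 20, 30, 40]})
resampled = bootstrap_rows_sketch(df, seed=1)
larger = bootstrap_rows_sketch(df, size=7, seed=1)
```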
- biogeme.database.sampling.split_validation_sets(df, slices, group_column=None)
Splits a DataFrame into multiple (estimation, validation) pairs for cross-validation.
- Parameters:
df (DataFrame) – The input DataFrame to split.
slices (int) – The number of folds (must be >= 2).
group_column (str | None) – Optional column name used to group rows (e.g., individual ID). If provided, groups are kept together in folds.
- Return type:
list[tuple[DataFrame, DataFrame]]
- Returns:
A list of (estimation, validation) DataFrame tuples.
- Raises:
BiogemeError – if the number of slices is less than 2 or the group column is not found in the DataFrame.
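The fold construction described above can be sketched in plain pandas and NumPy. This is only an illustration under stated assumptions: the helper name `split_folds_sketch`, the `seed` argument, the random shuffling of units, and the use of `ValueError` in place of `BiogemeError` are not taken from Biogeme, whose actual assignment of rows to folds may differ.

```python
import numpy as np
import pandas as pd


def split_folds_sketch(df, slices, group_column=None, seed=None):
    """Hypothetical k-fold split returning (estimation, validation) pairs."""
    if slices < 2:
        raise ValueError("slices must be >= 2")
    if group_column is not None and group_column not in df.columns:
        raise ValueError(f"column {group_column!r} not found")
    rng = np.random.default_rng(seed)
    # One key per row: the row position, or the group label if grouping.
    if group_column is None:
        keys = np.arange(len(df))
    else:
        keys = df[group_column].to_numpy()
    # Shuffle the distinct units, then cut them into `slices` folds.
    units = rng.permutation(pd.unique(keys))
    pairs = []
    for fold in np.array_split(units, slices):
        in_fold = np.isin(keys, fold)
        # Rows in the fold validate; all remaining rows estimate.
        pairs.append((df[~in_fold], df[in_fold]))
    return pairs


df = pd.DataFrame(
    {"person": [1, 1, 2, 2, 3, 3, 4, 4], "y": [0, 1, 1, 0, 1, 1, 0, 0]}
)
pairs = split_folds_sketch(df, slices=2, group_column="person", seed=0)
```

Splitting on the distinct group labels rather than on rows is what keeps all rows of one individual on the same side of each (estimation, validation) pair.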