biogeme.database.sampling module

This module provides utility functions for performing sampling operations on pandas DataFrames, including standard bootstrapping and panel-based sampling.

Michel Bierlaire Wed Mar 26 19:39:21 2025

biogeme.database.sampling.sample_panel_with_replacement(df, individual_map, size=None)

Draws a sample of individuals with replacement from a panel dataset.

Parameters:

df (DataFrame) – The input DataFrame representing the full dataset.
individual_map (DataFrame) – A DataFrame mapping each individual ID to (start, end) row indices.
size (int | None) – The number of individuals to sample. Defaults to the number of individuals in the map.

Return type:

DataFrame

Returns:

A new DataFrame containing the rows of the sampled individuals, with the index reset.

Raises:

BiogemeError – if the individual_map is missing or empty.
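
The idea of panel sampling can be illustrated with plain pandas. The following is a minimal sketch, not Biogeme's implementation: the function name `sketch_sample_panel`, the column names `start` and `end`, the `seed` argument, and the convention that `end` is an inclusive row position are all assumptions made for the example. It raises `ValueError` where the real function is documented to raise `BiogemeError`.

```python
import numpy as np
import pandas as pd


def sketch_sample_panel(df, individual_map, size=None, seed=None):
    """Hypothetical helper: draw individuals with replacement, keeping each one's rows together."""
    if individual_map is None or individual_map.empty:
        raise ValueError("individual_map is missing or empty")
    # By default, draw as many individuals as the map contains.
    n = len(individual_map) if size is None else size
    rng = np.random.default_rng(seed)
    # Positions of the drawn individuals in the map; repeats are allowed.
    chosen = rng.integers(0, len(individual_map), size=n)
    blocks = []
    for pos in chosen:
        start = int(individual_map["start"].iloc[pos])
        end = int(individual_map["end"].iloc[pos])
        # Assumption: `end` is inclusive, hence the +1 in the slice.
        blocks.append(df.iloc[start : end + 1])
    return pd.concat(blocks).reset_index(drop=True)


# Three individuals observed 2, 3 and 1 times respectively.
df = pd.DataFrame(
    {"person": [1, 1, 2, 2, 2, 3], "choice": [0, 1, 1, 0, 1, 0]}
)
individual_map = pd.DataFrame(
    {"start": [0, 2, 5], "end": [1, 4, 5]}, index=[1, 2, 3]
)
sample = sketch_sample_panel(df, individual_map, seed=0)
```

Because whole blocks of rows are copied, an individual drawn twice contributes all of its observations twice, which preserves the within-individual structure of the panel.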

biogeme.database.sampling.sample_with_replacement(df, size=None)

Parameters:

df (DataFrame)
size (int | None)

Return type:

DataFrame
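
The reference gives no description for this function. Going by its name and the module summary, it presumably performs standard row-level bootstrapping. The sketch below shows that technique with pandas' own `DataFrame.sample`. The helper name `sketch_bootstrap`, the `seed` argument, the default sample size, and the index reset are assumptions for illustration, not documented behavior of Biogeme.

```python
import pandas as pd


def sketch_bootstrap(df, size=None, seed=None):
    """Hypothetical helper: draw rows with replacement (standard bootstrap)."""
    # Assumption: when size is None, draw as many rows as the input has.
    n = len(df) if size is None else size
    # replace=True lets the same row be drawn more than once.
    return df.sample(n=n, replace=True, random_state=seed).reset_index(drop=True)


df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})
boot = sketch_bootstrap(df, seed=0)
```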

biogeme.database.sampling.split_validation_sets(df, slices, group_column=None)

Splits a DataFrame into multiple (estimation, validation) pairs for cross-validation.

Parameters:

df (DataFrame) – The input DataFrame to split.
slices (int) – The number of folds (must be >= 2).
group_column (str | None) – Optional column name used to group rows (e.g., individual ID). If provided, groups are kept together in folds.

Return type:

list[tuple[DataFrame, DataFrame]]

Returns:

A list of (estimation, validation) DataFrame tuples.

Raises:

BiogemeError – if the number of slices is less than 2, or if the group column is not found.
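
The fold construction described above can be sketched in plain pandas. This is an illustration under stated assumptions rather than Biogeme's code: the helper name `sketch_split`, the `seed` argument, and the random shuffling of units before folding are hypothetical, and `ValueError` stands in for `BiogemeError`.

```python
import numpy as np
import pandas as pd


def sketch_split(df, slices, group_column=None, seed=None):
    """Hypothetical helper: build (estimation, validation) pairs for k-fold validation."""
    if slices < 2:
        raise ValueError("slices must be >= 2")
    if group_column is not None and group_column not in df.columns:
        raise ValueError(f"column {group_column!r} not found")
    # The unit assigned to a fold is a row, or a whole group if group_column is given.
    if group_column is None:
        labels = pd.Series(np.arange(len(df)), index=df.index)
    else:
        labels = df[group_column]
    rng = np.random.default_rng(seed)
    # Assumption: units are shuffled before being cut into folds.
    units = rng.permutation(labels.unique())
    pairs = []
    for fold in np.array_split(units, slices):
        in_validation = labels.isin(fold)
        # Estimation set: everything outside the fold. Validation set: the fold itself.
        pairs.append((df[~in_validation], df[in_validation]))
    return pairs


df = pd.DataFrame(
    {"person": [1, 1, 2, 2, 3, 3, 4, 4], "choice": [0, 1, 1, 0, 1, 0, 0, 1]}
)
pairs = sketch_split(df, slices=2, group_column="person", seed=0)
```

With `group_column="person"`, all rows of a given person land on the same side of each split, so no individual appears in both the estimation and the validation set of a pair.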