biogeme.validation.prepare_validation module

Split data into estimation and validation samples.

class biogeme.validation.prepare_validation.EstimationValidationIndices(estimation, validation)

Bases: NamedTuple

Parameters:

estimation (Index)
validation (Index)

estimation: Index – Alias for field number 0

validation: Index – Alias for field number 1
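
Because the class is a NamedTuple, its two fields can be read by name, by position, or by unpacking. The sketch below illustrates this with a hypothetical stand-in class, FoldIndices, that stores plain Python lists instead of pandas Index objects; it does not import Biogeme and is not the library's own definition.

```python
from typing import NamedTuple


class FoldIndices(NamedTuple):
    """Hypothetical stand-in for EstimationValidationIndices (plain lists, not Index)."""

    estimation: list[int]
    validation: list[int]


fold = FoldIndices(estimation=[0, 1, 2, 3], validation=[4, 5])

# Each field is reachable by name or by position (field numbers 0 and 1).
assert fold.estimation is fold[0]
assert fold.validation is fold[1]

# A named tuple also unpacks like an ordinary tuple.
estimation_rows, validation_rows = fold
```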

biogeme.validation.prepare_validation.split(dataframe, slices, groups=None)

Splits a DataFrame into multiple training and validation index sets for cross-validation.

This function returns a list of EstimationValidationIndices named tuples, each containing the indices for an estimation (training) set and a validation set. If a grouping column is specified, the split ensures that all entries with the same group ID remain in the same fold.

Parameters:

dataframe (DataFrame) – The full dataset to split.
slices (int) – The number of folds/slices. Must be >= 2.
groups (str | None) – Optional name of the column containing group identifiers. If provided, all rows with the same group ID are kept in the same fold.

Return type:

list[EstimationValidationIndices]

Returns:

A list of EstimationValidationIndices tuples containing index sets, one per fold.

Raises:

ValueError – If slices is less than 2.
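
The behaviour described above can be sketched without Biogeme or pandas. The code below is an illustration under stated assumptions, not the library's implementation: split_sketch, FoldIndices, group_ids, and seed are hypothetical names, rows are identified by integer position rather than by a pandas Index, and the shuffling and round-robin assignment are choices made for this example only. It shows the two documented guarantees: a ValueError when slices is below 2, and rows that share a group ID always landing in the same fold.

```python
import random
from typing import Hashable, NamedTuple, Optional, Sequence


class FoldIndices(NamedTuple):
    """Hypothetical stand-in for EstimationValidationIndices."""

    estimation: list[int]
    validation: list[int]


def split_sketch(
    n_rows: int,
    slices: int,
    group_ids: Optional[Sequence[Hashable]] = None,
    seed: int = 0,
) -> list[FoldIndices]:
    """Return one (estimation, validation) pair of row positions per fold."""
    if slices < 2:
        raise ValueError(f"slices must be >= 2, got {slices}")
    if group_ids is not None and len(group_ids) != n_rows:
        raise ValueError("group_ids must contain one entry per row")

    # Without groups, every row is its own unit; with groups, the group ID is the unit.
    keys = list(range(n_rows)) if group_ids is None else list(group_ids)

    # Unique units in first-seen order, then shuffled (assumption: random assignment).
    units = list(dict.fromkeys(keys))
    random.Random(seed).shuffle(units)

    # Deal the units round-robin so fold sizes differ by at most one unit.
    fold_of = {unit: i % slices for i, unit in enumerate(units)}

    folds = []
    for k in range(slices):
        validation = [row for row, key in enumerate(keys) if fold_of[key] == k]
        estimation = [row for row, key in enumerate(keys) if fold_of[key] != k]
        folds.append(FoldIndices(estimation, validation))
    return folds


# Six rows in three groups, split into three folds.
groups = ["a", "a", "b", "b", "c", "c"]
folds = split_sketch(n_rows=6, slices=3, group_ids=groups)
```

Each element of folds holds the rows to estimate on and the rows to validate on for that fold; within a fold, no group appears on both sides.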