biogeme.validation.prepare_validation module

Split data into validation and estimation samples

class biogeme.validation.prepare_validation.EstimationValidationIndices(estimation, validation)

Bases: NamedTuple

Parameters:
  • estimation (Index)

  • validation (Index)

estimation: Index

Alias for field number 0

validation: Index

Alias for field number 1
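Because the class is a NamedTuple, its two fields can be read by name, by position, or by unpacking. The sketch below illustrates this with a hypothetical stand-in class, `IndexPair`, which holds plain lists rather than pandas Index objects so that it runs without Biogeme or pandas installed. It is not the library's class.

```python
from typing import NamedTuple


class IndexPair(NamedTuple):
    """Hypothetical stand-in with the same two fields as
    EstimationValidationIndices, holding lists instead of pandas Index objects."""

    estimation: list
    validation: list


pair = IndexPair(estimation=[0, 1, 2], validation=[3, 4])

# Field 0 is `estimation`, field 1 is `validation`.
by_name = pair.estimation
by_position = pair[0]

# A named tuple also unpacks like an ordinary tuple.
estimation_rows, validation_rows = pair
```

Access by name is usually the safer choice, since it does not depend on remembering the field order.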

biogeme.validation.prepare_validation.split(dataframe, slices, groups=None)

Splits a DataFrame into multiple training and validation index sets for cross-validation.

This function returns a list of EstimationValidationIndices named tuples, each containing the indices for an estimation (training) set and a validation set. If a grouping column is specified, the split ensures that all entries with the same group ID remain in the same fold.

Parameters:
  • dataframe (DataFrame) – The full dataset to split.

  • slices (int) – The number of folds/slices. Must be >= 2.

  • groups (str | None) – Optional name of the column containing group identifiers. If provided, all rows with the same group ID are kept in the same fold.

Return type:

list[EstimationValidationIndices]

Returns:

A list of EstimationValidationIndices tuples containing index sets, one per fold.

Raises:

ValueError – If slices is less than 2.
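The group-keeping rule described above can be illustrated with a minimal standard-library sketch. It is not Biogeme's implementation. The names `FoldIndices` and `split_sketch` are hypothetical, rows are represented by plain lists of labels and group IDs instead of a DataFrame, and the round-robin assignment of groups to folds is an assumption made only to keep the example deterministic.

```python
from typing import NamedTuple


class FoldIndices(NamedTuple):
    """Hypothetical stand-in for EstimationValidationIndices (plain lists)."""

    estimation: list
    validation: list


def split_sketch(row_labels, slices, group_ids=None):
    """Return one FoldIndices per fold, keeping each group in a single fold."""
    if slices < 2:
        raise ValueError("slices must be >= 2")
    # Without grouping, every row acts as its own group.
    if group_ids is None:
        group_ids = list(row_labels)
    if len(group_ids) != len(row_labels):
        raise ValueError("group_ids and row_labels must have the same length")

    # Distinct groups in order of first appearance.
    distinct = list(dict.fromkeys(group_ids))
    # Assumption: groups are dealt to folds round-robin (no shuffling).
    fold_of_group = {g: i % slices for i, g in enumerate(distinct)}

    folds = []
    for k in range(slices):
        # Rows whose group belongs to fold k are held out for validation.
        validation = [
            r for r, g in zip(row_labels, group_ids) if fold_of_group[g] == k
        ]
        # All remaining rows form the estimation (training) set.
        estimation = [
            r for r, g in zip(row_labels, group_ids) if fold_of_group[g] != k
        ]
        folds.append(FoldIndices(estimation, validation))
    return folds


rows = [0, 1, 2, 3, 4, 5]
households = ["a", "a", "b", "c", "c", "d"]  # hypothetical group IDs
folds = split_sketch(rows, slices=2, group_ids=households)
```

In this sketch, rows 0 and 1 (household "a") always land in the same set, as do rows 3 and 4 (household "c"). With Biogeme installed, the documented call takes the DataFrame itself and the name of the grouping column, for example `split(dataframe=df, slices=2, groups="household")`, where `df` and the `household` column are hypothetical.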