biogeme.sampling_of_alternatives.sampling_context module

Defines a class that characterizes the context in which sampling of alternatives is applied.

author:

Michel Bierlaire

date:

Wed Sep 6 14:38:31 2023

class biogeme.sampling_of_alternatives.sampling_context.CrossVariableTuple(name, formula)[source]

Bases: NamedTuple

A cross variable is a variable that involves socio-economic attributes of the individuals, and attributes of the alternatives. It can only be calculated after the sampling has been made.

Parameters:
formula: Expression

Alias for field number 1

name: str

Alias for field number 0
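
The following sketch illustrates why a cross variable can only be evaluated once the sample is known. It does not use Biogeme: CrossVariableSketch is a hypothetical stand-in for CrossVariableTuple with the same field order, its formula is a plain Python callable rather than a Biogeme Expression, and the attribute names (income, price) are invented for the example.

```python
from typing import Callable, NamedTuple


class CrossVariableSketch(NamedTuple):
    """Hypothetical stand-in for CrossVariableTuple (not the Biogeme class)."""

    name: str  # field number 0
    formula: Callable[[dict, dict], float]  # field number 1


# Interaction between an attribute of the individual and one of the alternative.
price_over_income = CrossVariableSketch(
    name="price_over_income",
    formula=lambda person, alt: alt["price"] / person["income"],
)

person = {"income": 50.0}
# Alternatives assumed to have been sampled for this individual.
sampled_alternatives = [{"id": 3, "price": 10.0}, {"id": 7, "price": 25.0}]

# The variable has one value per (individual, sampled alternative) pair,
# so it cannot be computed before the sampled alternatives are known.
values = {
    alt["id"]: price_over_income.formula(person, alt)
    for alt in sampled_alternatives
}
```

In the real class, the formula is an Expression that Biogeme evaluates on the generated data file; the callable above only mimics that role.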

class biogeme.sampling_of_alternatives.sampling_context.SamplingContext(the_partition, sample_sizes, individuals, choice_column, alternatives, id_column, biogeme_file_name, utility_function, combined_variables, mev_partition=None, mev_sample_sizes=None, cnl_nests=None)[source]

Bases: object

Class gathering the data needed to perform an estimation with samples of alternatives

Parameters:
  • the_partition (Partition) – Partition used for the sampling.

  • sample_sizes (Iterable[int]) – number of alternatives to draw from each segment.

  • individuals (DataFrame) – Pandas data frame containing all the individuals as rows. One column must contain the choice of each individual.

  • choice_column (str) – name of the column containing the choice of each individual.

  • alternatives (DataFrame) – Pandas data frame containing all the alternatives as rows. One column must contain a unique ID identifying the alternatives. The other columns contain variables to include in the data file.

  • id_column (str) – name of the column containing the IDs of the alternatives.

  • utility_function (Expression) – definition of the generic utility function

  • combined_variables (list[CrossVariableTuple]) – definition of interaction variables

  • mev_partition (Optional[Partition]) – If a second choice set needs to be sampled for the MEV terms, the corresponding partition is provided here.

  • biogeme_file_name (str)

  • mev_sample_sizes (Iterable[int] | None)

  • cnl_nests (NestsForCrossNestedLogit | None)

alternatives: DataFrame
biogeme_file_name: str
check_expression(expression)[source]

Verifies if the variables contained in the expression can be found in the databases

Return type:

None

Parameters:

expression (Expression)

check_mev_partition()[source]

Check if the MEV partition is a valid partition of the MEV alternatives. It does not need to cover the full choice set.

Return type:

None

check_partition()[source]

Check if the partition is truly a partition. If not, an exception is raised

Raises:
  • BiogemeError – if some elements are present in more than one subset.

  • BiogemeError – if the size of the union of the subsets does not match the expected total size

  • BiogemeError – if an alternative in the partition does not appear in the database of alternatives

  • BiogemeError – if a segment is empty

  • BiogemeError – if the number of sampled alternatives in a stratum is incorrect, that is, zero or larger than the stratum size.

Return type:

None
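
The conditions listed under Raises can be sketched in plain Python as follows. This is not the Biogeme implementation: check_partition_sketch is a hypothetical function, ValueError stands in for BiogemeError, and the order of the checks is an assumption. The length comparison at the top is an extra safeguard that does not appear in the list above.

```python
def check_partition_sketch(segments, sample_sizes, all_ids):
    """Raise ValueError if the segments are not a valid partition of all_ids.

    Hypothetical helper; ValueError stands in for BiogemeError.
    """
    segments = list(segments)
    sample_sizes = list(sample_sizes)
    # Extra safeguard (not in the documented list): one size per segment.
    if len(segments) != len(sample_sizes):
        raise ValueError("one sample size is needed per segment")
    seen = set()
    for segment, size in zip(segments, sample_sizes):
        if not segment:
            raise ValueError("a segment is empty")
        unknown = segment - all_ids
        if unknown:
            raise ValueError(f"unknown alternatives: {sorted(unknown)}")
        overlap = segment & seen
        if overlap:
            raise ValueError(f"present in several segments: {sorted(overlap)}")
        if size <= 0 or size > len(segment):
            raise ValueError(
                f"invalid sample size {size} for a segment of size {len(segment)}"
            )
        seen |= segment
    # The union of the segments must have the size of the full choice set.
    if len(seen) != len(all_ids):
        raise ValueError("the segments do not cover the full choice set")


# A valid partition of five alternatives into two strata passes silently.
check_partition_sketch([{1, 2}, {3, 4, 5}], [1, 2], {1, 2, 3, 4, 5})
```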

check_valid_alternatives(set_of_ids)[source]

Check if the IDs in the set are indeed valid alternatives. Typically used to check if a nest is well defined.

Parameters:

set_of_ids (set[int]) – set of identifiers to check

Raises:

BiogemeError – if at least one id is invalid.

Return type:

None
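
As a rough illustration, the validity test amounts to a set difference between the identifiers to check and the known alternatives. The helper name invalid_ids and the sample identifiers are hypothetical, and the real method raises BiogemeError instead of printing a message.

```python
def invalid_ids(set_of_ids, known_ids):
    """Return the identifiers that do not correspond to a known alternative."""
    return set(set_of_ids) - set(known_ids)


known_ids = {10, 11, 12, 13}  # IDs assumed to be in the alternatives database
nest = {11, 13, 99}  # a nest definition containing one unknown ID

missing = invalid_ids(nest, known_ids)
if missing:
    print(f"Invalid alternative IDs: {sorted(missing)}")
```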

choice_column: str
cnl_nests: Optional[NestsForCrossNestedLogit] = None
combined_variables: list[CrossVariableTuple]
id_column: str
include_cnl_alphas()[source]
Return type:

None

individuals: DataFrame
mev_partition: Optional[Partition] = None
mev_sample_sizes: Optional[Iterable[int]] = None
reporting()[source]

Summarizes the configuration specified by the context object.

Return type:

str

sample_sizes: Iterable[int]
the_partition: Partition
utility_function: Expression
class biogeme.sampling_of_alternatives.sampling_context.StratumTuple(subset, sample_size)[source]

Bases: NamedTuple

A stratum is an element of a partition of the full choice set, combined with the number of alternatives that must be sampled.

Parameters:
  • subset (set[int])

  • sample_size (int)

sample_size: int

Alias for field number 1

subset: set[int]

Alias for field number 0
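
The sketch below shows how a list of strata could drive the sampling. StratumSketch is a hypothetical stand-in for StratumTuple with the same field order. How Biogeme itself draws the alternatives is not described on this page, so uniform sampling without replacement within each stratum is an assumption made for the example.

```python
import random
from typing import NamedTuple


class StratumSketch(NamedTuple):
    """Hypothetical stand-in for StratumTuple, with the same field order."""

    subset: set[int]  # field number 0
    sample_size: int  # field number 1


def sample_from_strata(strata, rng):
    """Draw sample_size alternatives without replacement from each stratum."""
    sample = []
    for stratum in strata:
        # Sort first so that a seeded generator gives reproducible draws.
        candidates = sorted(stratum.subset)
        sample.extend(rng.sample(candidates, stratum.sample_size))
    return sample


strata = [StratumSketch({1, 2, 3}, 2), StratumSketch({4, 5, 6, 7}, 1)]
drawn = sample_from_strata(strata, random.Random(0))
```

Here drawn contains two distinct alternatives from the first stratum followed by one from the second, three in total.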