biogeme.sampling_of_alternatives.sampling_context module

Defines a class that characterizes the context in which sampling of alternatives is applied.

author:: Michel Bierlaire
date:: Wed Sep 6 14:38:31 2023

class biogeme.sampling_of_alternatives.sampling_context.CrossVariableTuple(name, formula)[source]

Bases: NamedTuple

A cross variable is a variable that involves socio-economic attributes of the individuals, and attributes of the alternatives. It can only be calculated after the sampling has been made.

Parameters:

name (str)
formula (Expression)

formula: Expression: Alias for field number 1

name: str: Alias for field number 0

class biogeme.sampling_of_alternatives.sampling_context.SamplingContext(the_partition, sample_sizes, individuals, choice_column, alternatives, id_column, biogeme_file_name, utility_function, combined_variables, mev_partition=None, mev_sample_sizes=None, cnl_nests=None)[source]

Bases: object

Class gathering the data needed to perform an estimation with samples of alternatives

Parameters:

the_partition (Partition) – Partition used for the sampling.
sample_sizes (Iterable[int]) – number of alternatives to draw from each segment.
individuals (DataFrame) – Pandas data frame containing all the individuals as rows. One column must contain the choice of each individual.
choice_column (str) – name of the column containing the choice of each individual.
alternatives (DataFrame) – Pandas data frame containing all the alternatives as rows. One column must contain a unique ID identifying the alternatives. The other columns contain variables to include in the data file.
id_column (str) – name of the column containing the Ids of the alternatives.
utility_function (Expression) – definition of the generic utility function
combined_variables (list[CrossVariableTuple]) – definition of interaction variables
mev_partition (Optional[Partition]) – If a second choice set needs to be sampled for the MEV terms, the corresponding partition is provided here.
biogeme_file_name (str)
mev_sample_sizes (Iterable[int] | None)
cnl_nests (NestsForCrossNestedLogit | None)

alternatives: DataFrame

biogeme_file_name: str

check_expression(expression)[source]

Verifies that the variables contained in the expression can be found in the databases.

Return type:: None
Parameters:: expression (Expression)

check_mev_partition()[source]

Check if the partition is a partition of the MEV alternatives. It does not need to cover the full choice set

Return type:: None

check_partition()[source]

Check if the partition is truly a partition. If not, an exception is raised

Raises:

BiogemeError – if some elements are present in more than one subset.
BiogemeError – if the size of the union of the subsets does not match the expected total size
BiogemeError – if an alternative in the partition does not appear in the database of alternatives
BiogemeError – if a segment is empty
BiogemeError – if the number of sampled alternatives in a stratum is incorrect, that is, zero or larger than the stratum size.

Return type:

None
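The conditions listed above can be sketched in plain Python. This is not the Biogeme implementation: the function name, its arguments (a list of `(subset, sample_size)` pairs and the set of known alternative IDs), and the use of `ValueError` in place of `BiogemeError` are assumptions chosen to keep the example self-contained.

```python
def check_partition_sketch(strata, all_ids):
    """Hypothetical re-statement of the checks described for check_partition.

    strata: list of (subset, sample_size) pairs.
    all_ids: set of alternative IDs found in the database of alternatives.
    ValueError stands in for BiogemeError.
    """
    seen = set()
    for subset, sample_size in strata:
        if not subset:
            raise ValueError("a segment is empty")
        if subset & seen:
            raise ValueError("some elements are present in more than one subset")
        if not subset <= all_ids:
            raise ValueError("an alternative is missing from the database")
        if sample_size <= 0 or sample_size > len(subset):
            raise ValueError("incorrect number of sampled alternatives")
        seen |= subset
    if len(seen) != len(all_ids):
        raise ValueError("the union of the subsets has the wrong size")


# A valid partition of four alternatives into two strata: no exception.
check_partition_sketch([({1, 2}, 1), ({3, 4}, 2)], all_ids={1, 2, 3, 4})
```

Accepting `(subset, sample_size)` pairs keeps each sample size next to the stratum it applies to, which is the same pairing that `StratumTuple` expresses.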

check_valid_alternatives(set_of_ids)[source]

Check if the IDs in the set are indeed valid alternatives. Typically used to check if a nest is well defined.

Parameters:: set_of_ids (set[int]) – set of identifiers to check
Raises:: BiogemeError – if at least one id is invalid.
Return type:: None
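A validity check of this kind amounts to a set difference. The sketch below is an assumption-laden illustration, not the library code: `known_ids` stands for the IDs found in the `id_column` of the alternatives data frame, and `ValueError` replaces `BiogemeError`.

```python
def check_valid_alternatives_sketch(set_of_ids, known_ids):
    """Hypothetical version of the check: every ID must be a known alternative.

    known_ids is an assumed extra argument; the real method takes only
    set_of_ids and is documented as raising BiogemeError.
    """
    invalid = set_of_ids - known_ids
    if invalid:
        raise ValueError(f"invalid alternative IDs: {sorted(invalid)}")


# A nest made of known alternatives passes silently.
check_valid_alternatives_sketch({1, 3}, known_ids={1, 2, 3, 4})
```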

choice_column: str

cnl_nests: Optional[NestsForCrossNestedLogit] = None

combined_variables: list[CrossVariableTuple]

id_column: str

include_cnl_alphas()[source]

Return type:: None

individuals: DataFrame

mev_partition: Optional[Partition] = None

mev_sample_sizes: Optional[Iterable[int]] = None

reporting()[source]

Summarizes the configuration specified by the context object.

Return type:: str

sample_sizes: Iterable[int]

the_partition: Partition

utility_function: Expression

class biogeme.sampling_of_alternatives.sampling_context.StratumTuple(subset, sample_size)[source]

Bases: NamedTuple

A stratum is an element of a partition of the full choice set, combined with the number of alternatives that must be sampled.

Parameters:

subset (set[int])
sample_size (int)

sample_size: int: Alias for field number 1

subset: set[int]: Alias for field number 0