Sampling

Module in charge of functionalities related to the sampling of alternatives. The code is organized into sub-modules.

biogeme.sampling_of_alternatives.choice_set_generation module

Module in charge of functionalities related to the choice set generation

For the main sample, all alternatives except the last one must be used: 0 to J-1. For MEV models, the approximation of the sum capturing the nests requires another sample not based on the choice.

author:: Michel Bierlaire
date:: Fri Oct 27 12:50:06 2023

class biogeme.sampling_of_alternatives.choice_set_generation.ChoiceSetsGeneration(context)[source]

Bases: object

Class in charge of generating the choice sets for each individual.

Parameters:: context (SamplingContext) –

__init__(context)[source]

Constructor

Parameters:: context (SamplingContext) – contains all the information that is needed to perform the sampling of alternatives.

define_new_variables(database)[source]

Create the new variables

Parameters:: database (Database) – database, in Biogeme format.

get_attributes_from_expression(expression)[source]

Extract the names of the attributes of alternatives from an expression

Return type:: set[str]
Parameters:: expression (Expression) –

process_row(individual_row)[source]

Process one row of the individual database

Parameters:: individual_row (Series) – row corresponding to one individual
Return type:: dict
Returns:: a dictionary containing the data for the extended row

sample_and_merge(recycle=False)[source]

Loops over the individuals and generates a choice set for each of them

Parameters:: recycle (bool) – if True and the data file already exists, it is not re-created.
Return type:: Database
Returns:: database for Biogeme

biogeme.sampling_of_alternatives.generate_model module

Generation of models estimated with samples of alternatives

author:: Michel Bierlaire
date:: Fri Sep 22 12:14:59 2023

class biogeme.sampling_of_alternatives.generate_model.GenerateModel(context)[source]

Bases: object

Class in charge of generating the biogeme expression for the loglikelihood function

Parameters:: context (SamplingContext) –

__init__(context)[source]

Constructor

Parameters:: context (SamplingContext) – contains all the information that is needed to perform the sampling of alternatives.

generate_utility(prefix, suffix)[source]

Generate the utility function for one alternative

Parameters:

prefix (str) – prefix to add to the attributes
suffix (str) – suffix to add to the attributes

Return type:

Expression

get_cross_nested_logit()[source]

Returns the expression for the log likelihood of the cross-nested logit model

Return type:: Expression

get_logit()[source]

Returns the expression for the log likelihood of the logit model

Return type:: Expression

get_nested_logit(nests)[source]

Returns the expression for the log likelihood of the nested logit model

Parameters:: nests (NestsForNestedLogit) – A tuple containing as many items as nests. Each item is also a tuple containing two items:

an object of type biogeme.expressions.expr.Expression representing the nest parameter,
a list containing the identifiers of the alternatives belonging to the nest.

Return type:: Expression

Example:

nesta = MUA, [1, 2, 3]
nestb = MUB, [4, 5, 6]
nests = nesta, nestb

biogeme.sampling_of_alternatives.sampling_context module

Defines a class that characterizes the context in which sampling of alternatives is applied

author:: Michel Bierlaire
date:: Wed Sep 6 14:38:31 2023

class biogeme.sampling_of_alternatives.sampling_context.CrossVariableTuple(name, formula)[source]

Bases: NamedTuple

A cross variable is a variable that involves socio-economic attributes of the individuals, and attributes of the alternatives. It can only be calculated after the sampling has been made.

Parameters:

name (str) –
formula (Expression) –

formula: Expression: Alias for field number 1

name: str: Alias for field number 0

class biogeme.sampling_of_alternatives.sampling_context.SamplingContext(the_partition, sample_sizes, individuals, choice_column, alternatives, id_column, biogeme_file_name, utility_function, combined_variables, mev_partition=None, mev_sample_sizes=None, cnl_nests=None)[source]

Bases: object

Class gathering the data needed to perform an estimation with samples of alternatives

Parameters:

the_partition (Partition) – Partition used for the sampling.
sample_sizes (Iterable[int]) – number of alternatives to draw from each segment.
individuals (DataFrame) – Pandas data frame containing all the individuals as rows. One column must contain the choice of each individual.
choice_column (str) – name of the column containing the choice of each individual.
alternatives (DataFrame) – Pandas data frame containing all the alternatives as rows. One column must contain a unique ID identifying the alternatives. The other columns contain variables to include in the data file.
id_column (str) – name of the column containing the IDs of the alternatives.
biogeme_file_name (str) –
utility_function (Expression) – definition of the generic utility function
combined_variables (list[CrossVariableTuple]) – definition of interaction variables
mev_partition (Optional[Partition]) – If a second choice set needs to be sampled for the MEV terms, the corresponding partition is provided here.
mev_sample_sizes (Iterable[int] | None) –
cnl_nests (NestsForCrossNestedLogit | None) –

__eq__(other): Return self==value.

__init__(the_partition, sample_sizes, individuals, choice_column, alternatives, id_column, biogeme_file_name, utility_function, combined_variables, mev_partition=None, mev_sample_sizes=None, cnl_nests=None)

Parameters:

the_partition (Partition) –
sample_sizes (Iterable[int]) –
individuals (DataFrame) –
choice_column (str) –
alternatives (DataFrame) –
id_column (str) –
biogeme_file_name (str) –
utility_function (Expression) –
combined_variables (list[CrossVariableTuple]) –
mev_partition (Partition | None) –
mev_sample_sizes (Iterable[int] | None) –
cnl_nests (NestsForCrossNestedLogit | None) –

Return type:

None

alternatives: DataFrame

biogeme_file_name: str

check_expression(expression)[source]

Verifies if the variables contained in the expression can be found in the databases

Return type:: None
Parameters:: expression (Expression) –

check_mev_partition()[source]

Check if the partition is a partition of the MEV alternatives. It does not need to cover the full choice set

Return type:: None

check_partition()[source]

Check if the partition is truly a partition. If not, an exception is raised

Raises:

BiogemeError – if some elements are present in more than one subset.
BiogemeError – if the size of the union of the subsets does not match the expected total size
BiogemeError – if an alternative in the partition does not appear in the database of alternatives
BiogemeError – if a segment is empty
BiogemeError – if the number of sampled alternatives in a stratum is incorrect, that is, zero or larger than the stratum size.

Return type:

None
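The checks listed above can be illustrated with a small standalone sketch. It is not Biogeme's implementation: the function name validate_partition, its arguments, and the use of ValueError in place of BiogemeError are assumptions made so that the example runs without Biogeme installed.

```python
def validate_partition(
    segments: list[set[int]],
    sample_sizes: list[int],
    all_ids: set[int],
) -> None:
    """Hypothetical stand-in for the documented checks (not Biogeme code)."""
    if len(segments) != len(sample_sizes):
        raise ValueError('One sample size is required per segment.')
    seen: set[int] = set()
    for segment, size in zip(segments, sample_sizes):
        if not segment:
            raise ValueError('A segment is empty.')
        unknown = segment - all_ids
        if unknown:
            raise ValueError(f'Unknown alternatives: {sorted(unknown)}')
        overlap = segment & seen
        if overlap:
            raise ValueError(f'Present in several segments: {sorted(overlap)}')
        if size <= 0 or size > len(segment):
            raise ValueError(f'Invalid sample size {size} for a segment of size {len(segment)}.')
        seen |= segment
    # After the checks above, equal sizes mean the union covers all alternatives.
    if len(seen) != len(all_ids):
        raise ValueError('The segments do not cover all alternatives.')


# A valid partition of {1, ..., 5} into two segments: no exception is raised.
validate_partition([{1, 2, 3}, {4, 5}], [2, 1], {1, 2, 3, 4, 5})
```

Running the checks once, before any sampling, means that a badly specified partition is reported immediately rather than in the middle of the loop over individuals.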

check_valid_alternatives(set_of_ids)[source]

Check if the IDs in the set are indeed valid alternatives. Typically used to check if a nest is well defined

Parameters:: set_of_ids (set[int]) – set of identifiers to check
Raises:: BiogemeError – if at least one id is invalid.
Return type:: None

choice_column: str

cnl_nests: Optional[NestsForCrossNestedLogit] = None

combined_variables: list[CrossVariableTuple]

id_column: str

individuals: DataFrame

mev_partition: Optional[Partition] = None

mev_sample_sizes: Optional[Iterable[int]] = None

reporting()[source]

Summarizes the configuration specified by the context object.

Return type:: None

sample_sizes: Iterable[int]

the_partition: Partition

utility_function: Expression

class biogeme.sampling_of_alternatives.sampling_context.StratumTuple(subset, sample_size)[source]

Bases: NamedTuple

A stratum is an element of a partition of the full choice set, combined with the number of alternatives that must be sampled.

Parameters:

subset (set[int]) –
sample_size (int) –

sample_size: int: Alias for field number 1

subset: set[int]: Alias for field number 0
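To show what "alias for field number" means in practice, here is a minimal sketch that re-declares a named tuple with the same two fields. The class below is a local stand-in defined only for illustration, so the example does not require Biogeme; the stratum values are made up.

```python
from typing import NamedTuple


class StratumTuple(NamedTuple):
    """Local stand-in mirroring the two documented fields."""

    subset: set[int]
    sample_size: int


# A stratum of four alternatives, from which two must be sampled.
stratum = StratumTuple(subset={1, 2, 3, 4}, sample_size=2)

# Fields can be read by name or by position (0: subset, 1: sample_size).
assert stratum.subset == stratum[0]
assert stratum.sample_size == stratum[1]

# A named tuple also unpacks like a regular tuple.
subset, sample_size = stratum
print(len(subset), sample_size)
```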

biogeme.sampling_of_alternatives.sampling_of_alternatives module

Module in charge of functionalities related to the sampling of alternatives

author:: Michel Bierlaire
date:: Thu Sep 7 10:14:54 2023

class biogeme.sampling_of_alternatives.sampling_of_alternatives.SamplingOfAlternatives(context)[source]

Bases: object

Class dealing with the various methods needed to estimate models with samples of alternatives

Parameters:: context (SamplingContext) –

__init__(context)[source]

Constructor

Parameters:: context (SamplingContext) – contains all the information that is needed to perform the sampling of alternatives.

sample_alternatives(chosen)[source]

Performs the sampling of alternatives

Parameters:: chosen (int) – ID of the chosen alternative, that must be included in the choice set.
Return type:: DataFrame
Returns:: data frame containing a sample of alternatives. The first one is the chosen alternative
Raises:: BiogemeError – if the chosen alternative is unknown.
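The following sketch illustrates one way to draw a stratified sample in which the chosen alternative always comes first. It is an assumption-laden illustration, not the Biogeme implementation: the function name sample_with_chosen is hypothetical, it returns a list of IDs rather than a data frame, it raises ValueError instead of BiogemeError, and it assumes that the chosen alternative counts toward the sample size of its own stratum.

```python
import random


def sample_with_chosen(
    strata: list[tuple[set[int], int]],
    chosen: int,
    rng: random.Random,
) -> list[int]:
    """Hypothetical sketch: stratified sample with the chosen alternative first."""
    if not any(chosen in subset for subset, _ in strata):
        raise ValueError(f'Unknown alternative: {chosen}')
    sampled = [chosen]
    for subset, sample_size in strata:
        if chosen in subset:
            # Assumption: the chosen alternative takes one slot of its stratum.
            candidates = sorted(subset - {chosen})
            sampled += rng.sample(candidates, sample_size - 1)
        else:
            sampled += rng.sample(sorted(subset), sample_size)
    return sampled


# Two strata: draw 2 alternatives from the first and 3 from the second.
strata = [({1, 2, 3, 4}, 2), ({5, 6, 7, 8, 9}, 3)]
sample = sample_with_chosen(strata, chosen=6, rng=random.Random(0))
print(sample)
```

Sorting each subset before sampling makes the draw reproducible for a given seed, since the iteration order of a set is not guaranteed.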

sample_mev_alternatives()[source]

Performs the sampling of alternatives for the MEV terms. Here, the chosen alternative is ignored.

Return type:: DataFrame
Returns:: data frame containing a sample of alternatives

biogeme.sampling_of_alternatives.sampling_of_alternatives.generate_segment_size(sample_size, number_of_segments)[source]

This function calculates the size of each segment, so that they are as close to each other as possible, and cover the full sample

Parameters:

sample_size (int) – total size of the sample
number_of_segments (int) – number of segments

Returns:

list of length number_of_segments, containing the segment sizes

Return type:

list[int]
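As an illustration of the behavior described for generate_segment_size, the sketch below splits a sample size into near-equal integer parts. It is a standalone stand-in, not the library code: the name split_evenly is hypothetical, and placing the larger segments first is an assumption, since the documentation does not say which segments receive the extra elements.

```python
def split_evenly(sample_size: int, number_of_segments: int) -> list[int]:
    """Split sample_size into number_of_segments near-equal integer sizes."""
    if number_of_segments <= 0:
        raise ValueError('number_of_segments must be positive')
    base, remainder = divmod(sample_size, number_of_segments)
    # Assumption: the first `remainder` segments receive one extra element.
    return [base + 1 if i < remainder else base for i in range(number_of_segments)]


sizes = split_evenly(10, 3)
print(sizes)  # [4, 3, 3]
```

The sizes differ by at most one and always sum to the requested sample size.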