Sampling
Module in charge of functionalities related to the sampling of alternatives. The code is organized into sub-modules.
biogeme.sampling_of_alternatives.choice_set_generation module
Module in charge of functionalities related to the choice set generation
For the main sample, all alternatives except the last one must be used: 0 to J-1. For MEV models, the approximation of the sum capturing the nests requires another sample not based on the choice.
- author:
Michel Bierlaire
- date:
Fri Oct 27 12:50:06 2023
- class biogeme.sampling_of_alternatives.choice_set_generation.ChoiceSetsGeneration(context)[source]
Bases:
object
Class in charge of generating the choice sets for each individual.
- Parameters:
context (SamplingContext) –
- __init__(context)[source]
Constructor
- Parameters:
context (SamplingContext) – contains all the information that is needed to perform the sampling of alternatives.
- define_new_variables(database)[source]
Create the new variables
- Parameters:
database (Database) – database, in Biogeme format.
- get_attributes_from_expression(expression)[source]
Extract the names of the attributes of alternatives from an expression
- Return type:
set[str]
- Parameters:
expression (Expression) –
biogeme.sampling_of_alternatives.generate_model module
Generation of models estimated with samples of alternatives
- author:
Michel Bierlaire
- date:
Fri Sep 22 12:14:59 2023
- class biogeme.sampling_of_alternatives.generate_model.GenerateModel(context)[source]
Bases:
object
Class in charge of generating the biogeme expression for the loglikelihood function
- Parameters:
context (SamplingContext) –
- __init__(context)[source]
Constructor
- Parameters:
context (SamplingContext) – contains all the information that is needed to perform the sampling of alternatives.
- generate_utility(prefix, suffix)[source]
Generate the utility function for one alternative
- Parameters:
prefix (str) – prefix to add to the attributes
suffix (str) – suffix to add to the attributes
- Return type:
- get_cross_nested_logit()[source]
Returns the expression for the log likelihood of the cross-nested logit model
- Return type:
- get_nested_logit(nests)[source]
Returns the expression for the log likelihood of the nested logit model
- Parameters:
nests (NestsForNestedLogit) – A tuple containing as many items as nests. Each item is also a tuple containing two items:
an object of type biogeme.expressions.expr.Expression representing the nest parameter,
a list containing the identifiers of the alternatives belonging to the nest.
Example:
nesta = MUA, [1, 2, 3]
nestb = MUB, [4, 5, 6]
nests = nesta, nestb
- Return type:
biogeme.sampling_of_alternatives.sampling_context module
Defines a class that characterizes the context in which sampling of alternatives is applied
- author:
Michel Bierlaire
- date:
Wed Sep 6 14:38:31 2023
- class biogeme.sampling_of_alternatives.sampling_context.CrossVariableTuple(name, formula)[source]
Bases:
NamedTuple
A cross variable is a variable that involves socio-economic attributes of the individuals, and attributes of the alternatives. It can only be calculated after the sampling has been made.
- Parameters:
name (str) –
formula (Expression) –
-
formula:
Expression
Alias for field number 1
-
name:
str
Alias for field number 0
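The idea of a cross variable can be illustrated with a small standalone sketch. It does not use Biogeme: the class `CrossVariableSketch`, the helper `add_cross_variables`, and the column names `x` and `y` are hypothetical, and the formula is a plain Python callable standing in for a Biogeme Expression. The point is only that the variable needs both the individual and the sampled alternative, so it is evaluated after sampling.

```python
from typing import Callable, NamedTuple


class CrossVariableSketch(NamedTuple):
    """Hypothetical stand-in for CrossVariableTuple (name, formula)."""

    name: str
    # Assumption: a callable replaces the Biogeme Expression used by the library.
    formula: Callable[[dict, dict], float]


def add_cross_variables(individual, sampled_alternatives, cross_variables):
    """Evaluate each cross variable for every sampled alternative."""
    rows = []
    for alternative in sampled_alternatives:
        row = dict(alternative)  # copy, so the input is not modified
        for cross_variable in cross_variables:
            row[cross_variable.name] = cross_variable.formula(individual, alternative)
        rows.append(row)
    return rows


# Manhattan distance between the individual and the alternative.
distance = CrossVariableSketch(
    name="distance",
    formula=lambda ind, alt: abs(ind["x"] - alt["x"]) + abs(ind["y"] - alt["y"]),
)

individual = {"x": 0, "y": 0}
sampled = [{"id": 1, "x": 3, "y": 4}, {"id": 2, "x": 1, "y": 1}]
rows = add_cross_variables(individual, sampled, [distance])
print(rows)
```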
- class biogeme.sampling_of_alternatives.sampling_context.SamplingContext(the_partition, sample_sizes, individuals, choice_column, alternatives, id_column, biogeme_file_name, utility_function, combined_variables, mev_partition=None, mev_sample_sizes=None, cnl_nests=None)[source]
Bases:
object
Class gathering the data needed to perform an estimation with samples of alternatives
- Parameters:
the_partition (Partition) – Partition used for the sampling.
sample_sizes (Iterable[int]) – number of alternatives to draw from each segment.
individuals (DataFrame) – Pandas data frame containing all the individuals as rows. One column must contain the choice of each individual.
choice_column (str) – name of the column containing the choice of each individual.
alternatives (DataFrame) – Pandas data frame containing all the alternatives as rows. One column must contain a unique ID identifying the alternatives. The other columns contain variables to include in the data file.
id_column (str) – name of the column containing the IDs of the alternatives.
biogeme_file_name (str) –
utility_function (Expression) – definition of the generic utility function
combined_variables (list[CrossVariableTuple]) – definition of interaction variables
mev_partition (Optional[Partition]) – If a second choice set needs to be sampled for the MEV terms, the corresponding partition is provided here.
mev_sample_sizes (Iterable[int] | None) –
cnl_nests (NestsForCrossNestedLogit | None) –
- __eq__(other)
Return self==value.
- __init__(the_partition, sample_sizes, individuals, choice_column, alternatives, id_column, biogeme_file_name, utility_function, combined_variables, mev_partition=None, mev_sample_sizes=None, cnl_nests=None)
- Parameters:
the_partition (Partition) –
sample_sizes (Iterable[int]) –
individuals (DataFrame) –
choice_column (str) –
alternatives (DataFrame) –
id_column (str) –
biogeme_file_name (str) –
utility_function (Expression) –
combined_variables (list[CrossVariableTuple]) –
mev_partition (Partition | None) –
mev_sample_sizes (Iterable[int] | None) –
cnl_nests (NestsForCrossNestedLogit | None) –
- Return type:
None
-
alternatives:
DataFrame
-
biogeme_file_name:
str
- check_expression(expression)[source]
Verifies if the variables contained in the expression can be found in the databases
- Return type:
None
- Parameters:
expression (Expression) –
- check_mev_partition()[source]
Check if the partition is a partition of the MEV alternatives. It does not need to cover the full choice set
- Return type:
None
- check_partition()[source]
Check if the partition is truly a partition. If not, an exception is raised
- Raises:
BiogemeError – if some elements are present in more than one subset.
BiogemeError – if the size of the union of the subsets does not match the expected total size
BiogemeError – if an alternative in the partition does not appear in the database of alternatives
BiogemeError – if a segment is empty
BiogemeError – if the number of sampled alternatives in a stratum is incorrect, that is, zero or larger than the stratum size.
- Return type:
None
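The conditions listed above can be sketched with plain Python sets. This is an illustration under stated assumptions, not the library's code: the function `check_partition_sketch` and its arguments are hypothetical, and `ValueError` stands in for `BiogemeError`.

```python
def check_partition_sketch(
    segments: list[set[int]], sample_sizes: list[int], all_ids: set[int]
) -> None:
    """Raise ValueError if the segments do not form a valid partition of all_ids."""
    if len(segments) != len(sample_sizes):
        raise ValueError("one sample size is needed per segment")
    seen: set[int] = set()
    for segment, size in zip(segments, sample_sizes):
        if not segment:
            raise ValueError("a segment is empty")
        if segment & seen:
            raise ValueError(f"alternatives in several segments: {sorted(segment & seen)}")
        if not segment <= all_ids:
            raise ValueError(f"unknown alternatives: {sorted(segment - all_ids)}")
        if size <= 0 or size > len(segment):
            raise ValueError(f"invalid sample size {size} for a segment of {len(segment)}")
        seen |= segment
    # Segments are disjoint subsets of all_ids, so equal sizes mean full coverage.
    if len(seen) != len(all_ids):
        raise ValueError("the segments do not cover the full choice set")


# A valid partition of five alternatives into two segments: no exception.
check_partition_sketch([{1, 2}, {3, 4, 5}], [1, 2], {1, 2, 3, 4, 5})
```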
- check_valid_alternatives(set_of_ids)[source]
Check if the IDs in the set are indeed valid alternatives. Typically used to check if a nest is well defined
- Parameters:
set_of_ids (set[int]) – set of identifiers to check
- Raises:
BiogemeError – if at least one id is invalid.
- Return type:
None
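A check of this kind amounts to a set difference. The sketch below is hypothetical: `check_valid_ids_sketch` and `known_ids` are illustrative names, and `ValueError` stands in for `BiogemeError`.

```python
def check_valid_ids_sketch(set_of_ids: set[int], known_ids: set[int]) -> None:
    """Raise ValueError if any identifier is not a known alternative."""
    invalid = set_of_ids - known_ids
    if invalid:
        raise ValueError(f"invalid alternative ids: {sorted(invalid)}")


known_ids = {1, 2, 3, 4, 5, 6}
check_valid_ids_sketch({1, 2, 3}, known_ids)  # a well-defined nest: no exception
```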
-
choice_column:
str
-
cnl_nests:
Optional[NestsForCrossNestedLogit] = None
-
combined_variables:
list[CrossVariableTuple]
-
id_column:
str
-
individuals:
DataFrame
-
mev_sample_sizes:
Optional[Iterable[int]] = None
- reporting()[source]
Summarizes the configuration specified by the context object.
- Return type:
None
-
sample_sizes:
Iterable[int]
-
utility_function:
Expression
- class biogeme.sampling_of_alternatives.sampling_context.StratumTuple(subset, sample_size)[source]
Bases:
NamedTuple
A stratum is an element of a partition of the full choice set, combined with the number of alternatives that must be sampled.
- Parameters:
subset (set[int]) –
sample_size (int) –
-
sample_size:
int
Alias for field number 1
-
subset:
set[int]
Alias for field number 0
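Because a stratum is a named tuple, its fields can be read by name, by position, or by unpacking. The sketch below uses a hypothetical mirror class, `StratumSketch`, with the same two fields in the same order, so that it runs without Biogeme installed.

```python
from typing import NamedTuple


class StratumSketch(NamedTuple):
    """Hypothetical mirror of StratumTuple: field 0 is subset, field 1 is sample_size."""

    subset: set[int]
    sample_size: int


# Draw two alternatives out of the three that belong to this stratum.
stratum = StratumSketch(subset={1, 2, 3}, sample_size=2)

# Access by name, or unpack in field order.
subset, sample_size = stratum
print(stratum.subset, stratum.sample_size)
```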
biogeme.sampling_of_alternatives.sampling_of_alternatives module
Module in charge of functionalities related to the sampling of alternatives
- author:
Michel Bierlaire
- date:
Thu Sep 7 10:14:54 2023
- class biogeme.sampling_of_alternatives.sampling_of_alternatives.SamplingOfAlternatives(context)[source]
Bases:
object
Class dealing with the various methods needed to estimate models with samples of alternatives
- Parameters:
context (SamplingContext) –
- __init__(context)[source]
Constructor
- Parameters:
context (SamplingContext) – contains all the information that is needed to perform the sampling of alternatives.
- sample_alternatives(chosen)[source]
Performs the sampling of alternatives
- Parameters:
chosen (int) – ID of the chosen alternative, that must be included in the choice set.
- Return type:
DataFrame
- Returns:
data frame containing a sample of alternatives. The first one is the chosen alternative
- Raises:
BiogemeError – if the chosen alternative is unknown.
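The behaviour described above can be sketched as stratified sampling that always places the chosen alternative first. This is a simplified illustration, not the library's implementation: `sample_alternatives_sketch` is a hypothetical function that returns a list of IDs rather than a data frame, `ValueError` stands in for `BiogemeError`, and it assumes that the chosen alternative takes up one of the slots of its own stratum.

```python
import random


def sample_alternatives_sketch(
    strata: list[tuple[set[int], int]], chosen: int, rng: random.Random
) -> list[int]:
    """Return sampled IDs, the chosen alternative first.

    Each stratum is a (subset, sample_size) pair.
    """
    if not any(chosen in subset for subset, _ in strata):
        raise ValueError(f"unknown alternative: {chosen}")
    sampled = [chosen]
    for subset, sample_size in strata:
        if chosen in subset:
            # Assumption: the chosen alternative counts toward its stratum's size.
            candidates = sorted(subset - {chosen})
            number_to_draw = sample_size - 1
        else:
            candidates = sorted(subset)
            number_to_draw = sample_size
        # Draw without replacement within the stratum.
        sampled.extend(rng.sample(candidates, number_to_draw))
    return sampled


strata = [({1, 2, 3, 4}, 2), ({5, 6, 7}, 1)]
result = sample_alternatives_sketch(strata, chosen=6, rng=random.Random(0))
print(result)
```

Passing a seeded `random.Random` instance keeps the draw reproducible, which is convenient when comparing estimation results across runs.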
- biogeme.sampling_of_alternatives.sampling_of_alternatives.generate_segment_size(sample_size, number_of_segments)[source]
This function calculates the size of each segment, so that they are as close to each other as possible, and cover the full sample
- Parameters:
sample_size (int) – total size of the sample
number_of_segments (int) – number of segments
- Returns:
list of length number_of_segments, containing the segment sizes
- Return type:
list[int]
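One way to obtain segment sizes that differ by at most one is integer division with the remainder spread over the first segments. The sketch below illustrates that idea. `segment_sizes_sketch` is a hypothetical name, and the position of the larger segments in Biogeme's own result may differ.

```python
def segment_sizes_sketch(sample_size: int, number_of_segments: int) -> list[int]:
    """Split sample_size into number_of_segments sizes that differ by at most one."""
    if number_of_segments <= 0:
        raise ValueError("number_of_segments must be positive")
    base, remainder = divmod(sample_size, number_of_segments)
    # The first `remainder` segments receive one extra element.
    return [base + 1 if i < remainder else base for i in range(number_of_segments)]


sizes = segment_sizes_sketch(10, 3)
print(sizes)  # [4, 3, 3]
```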