Tools
Generic convenient tools.
biogeme.tools module
Implements some useful functions
- author
Michel Bierlaire
- date
Sun Apr 14 10:46:10 2019
- class biogeme.tools.LRTuple(message, statistic, threshold)
Bases:
tuple
- message
Alias for field number 0
- statistic
Alias for field number 1
- threshold
Alias for field number 2
- biogeme.tools.calculatePrimeNumbers(upperBound)[source]
Calculate prime numbers
- Parameters
upperBound (int) – prime numbers up to this value will be computed
- Returns
array with prime numbers
- Return type
list(int)
- Raises
biogemeError – if the upperBound is incorrectly defined (negative number, e.g.)
>>> tools.calculatePrimeNumbers(10)
[2, 3, 5, 7]
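For readers who want to see how such a list can be produced, here is a minimal sketch based on the sieve of Eratosthenes. It is not taken from the biogeme source: the name primes_up_to is hypothetical, the bound is assumed to be inclusive, and ValueError stands in for biogemeError.

```python
def primes_up_to(upper_bound):
    """Return all primes <= upper_bound (hypothetical helper, not biogeme's code)."""
    if upper_bound < 0:
        # Stand-in for the biogemeError mentioned in the documentation.
        raise ValueError("upper_bound must be non-negative")
    is_prime = [True] * (upper_bound + 1)
    # 0 and 1 are not prime; the slice also handles upper_bound < 1.
    is_prime[0:2] = [False] * min(2, upper_bound + 1)
    for p in range(2, int(upper_bound ** 0.5) + 1):
        if is_prime[p]:
            # Cross out every multiple of p, starting at p * p.
            for multiple in range(p * p, upper_bound + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


print(primes_up_to(10))
```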
- biogeme.tools.calculate_correlation(nests, results, alternative_names=None)[source]
Calculate the correlation matrix of a nested or cross-nested logit model.
- Parameters
nests (tuple(tuple(biogeme.expressions.Expression, list(int))), or tuple(tuple(biogeme.expressions.Expression, dict(int:biogeme.expressions.Expression)))) –
A tuple containing as many items as nests.
Each item is also a tuple containing two items:
an object of type biogeme.expressions.expr.Expression representing the nest parameter,
for the nested logit model, a list containing the list of identifiers of the alternatives belonging to the nest.
for the cross-nested logit model, a dictionary mapping the alternative ids with the cross-nested parameters for the corresponding nest. If an alternative is missing in the dictionary, the corresponding alpha is set to zero.
Example for the nested logit:
nesta = MUA, [1, 2, 3]
nestb = MUB, [4, 5, 6]
nests = nesta, nestb
Example for the cross-nested logit:
alphaA = {1: alpha1a, 2: alpha2a, 3: alpha3a, 4: alpha4a, 5: alpha5a, 6: alpha6a}
alphaB = {1: alpha1b, 2: alpha2b, 3: alpha3b, 4: alpha4b, 5: alpha5b, 6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb
results (biogeme.results.bioResults) – estimation results
alternative_names (dict(int: str)) – a dictionary mapping the alternative IDs with their name. If None, the IDs are used as names.
- biogeme.tools.checkDerivatives(theFunction, x, names=None, logg=False)[source]
Verifies the analytical derivatives of a function by comparing them with finite difference approximations.
- Parameters
theFunction (function) –
A function object that takes a vector as an argument, and returns a tuple:
The first element of the tuple is the value of the function \(f\),
the second is the gradient of the function,
the third is the hessian.
x (numpy.array) – arguments of the function
names (list(string)) – the names of the entries of x (for reporting).
logg (bool) – if True, messages will be displayed.
- Returns
tuple f, g, h, gdiff, hdiff where
f is the value of the function at x,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical gradient and the finite difference approximation
hdiff is the difference between the analytical hessian and the finite difference approximation
- Return type
float, numpy.array, numpy.array, numpy.array, numpy.array
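The sketch below illustrates the kind of callable described above and what the returned difference gdiff represents. It does not call biogeme: the function my_function and the central-difference check are illustrative assumptions, using only numpy.

```python
import numpy as np


def my_function(x):
    """f(x) = x0^2 + 3*x0*x1 + 2*x1^2 with its analytical gradient and hessian."""
    f = x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2
    g = np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + 4.0 * x[1]])
    h = np.array([[2.0, 3.0], [3.0, 4.0]])
    return f, g, h


x = np.array([1.0, -2.0])
f, g, h = my_function(x)

# Central-difference approximation of the gradient, used only as a reference.
eps = 1e-6
g_approx = np.array([
    (my_function(x + eps * e)[0] - my_function(x - eps * e)[0]) / (2.0 * eps)
    for e in np.eye(len(x))
])

# Entries close to zero suggest that the analytical gradient is coded correctly.
gdiff = g - g_approx
print(gdiff)
```

A function with this shape is what would be passed as theFunction, together with x, to checkDerivatives.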
- biogeme.tools.correlation_cross_nested(nests)[source]
Calculate the correlation matrix of the error terms of all alternatives of a cross-nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.
- Parameters
nests (tuple) –
a tuple containing as many items as nests. Each item is also a tuple containing two items:
an object of type biogeme.expressions.expr.Expression representing the nest parameter,
a dictionary mapping the alternative ids with the cross-nested parameters for the corresponding nest. If an alternative is missing in the dictionary, the corresponding alpha is set to zero.
Example:
alphaA = {1: alpha1a, 2: alpha2a, 3: alpha3a, 4: alpha4a, 5: alpha5a, 6: alpha6a}
alphaB = {1: alpha1b, 2: alpha2b, 3: alpha3b, 4: alpha4b, 5: alpha5b, 6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb
- Returns
correlation matrix
- Return type
pd.DataFrame
- Raises
biogemeError – if the requested number is non positive or a float
- biogeme.tools.correlation_nested(nests)[source]
Calculate the correlation matrix of the error terms of all alternatives of a nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.
- Parameters
nests (tuple) –
A tuple containing as many items as nests. Each item is also a tuple containing two items:
an object of type biogeme.expressions.expr.Expression representing the nest parameter,
a list containing the list of identifiers of the alternatives belonging to the nest.
Example:
nesta = MUA, [1, 2, 3]
nestb = MUB, [4, 5, 6]
nests = nesta, nestb
- Returns
correlation matrix
- Return type
pd.DataFrame
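As an illustration of the matrix being described, the sketch below uses the textbook nested logit result that, with mu normalized to one, two distinct alternatives in the same nest m have correlation 1 - 1/mu_m^2, and alternatives in different nests are uncorrelated. It is an assumption that this matches the library's computation. The helper nested_correlation is hypothetical and takes plain numbers instead of biogeme expressions.

```python
import numpy as np
import pandas as pd


def nested_correlation(nests):
    """Correlation matrix from (mu_m, [alternative ids]) pairs with numeric mu_m.

    Hypothetical helper for illustration; not biogeme's implementation.
    """
    alternatives = sorted(alt for _, members in nests for alt in members)
    corr = pd.DataFrame(
        np.eye(len(alternatives)), index=alternatives, columns=alternatives
    )
    for mu_m, members in nests:
        rho = 1.0 - 1.0 / mu_m ** 2  # same-nest correlation when mu = 1
        for i in members:
            for j in members:
                if i != j:
                    corr.loc[i, j] = rho
    return corr


nests = ((2.0, [1, 2, 3]), (1.0, [4, 5, 6]))
corr = nested_correlation(nests)
print(corr)
```

A nest parameter equal to one, as in the second nest, yields zero correlation, which corresponds to the multinomial logit case.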
- biogeme.tools.countNumberOfGroups(df, column)[source]
Count the number of groups of consecutive identical values in a column. For instance, the sequence 1, 2, 2, 3, 3, 3, 4, 1, 1 contains 5 groups.
Example:
>>> df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3], 'value': [1000, 2000, 3000, 4000, 5000, 5000, 10000, 20000]})
>>> tools.countNumberOfGroups(df, 'ID')
6
>>> tools.countNumberOfGroups(df, 'value')
7
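The same count can be reproduced with plain pandas by marking each row whose value differs from the previous one. This is a sketch of the idea rather than the library's code, and count_runs is a hypothetical name. Note that missing values (NaN) would each start a new group under this approach.

```python
import pandas as pd


def count_runs(df, column):
    """Count runs of consecutive identical values (hypothetical helper)."""
    # A row starts a new group when it differs from the row above it;
    # the first row always differs from the NaN introduced by shift().
    starts_new_group = df[column] != df[column].shift()
    return int(starts_new_group.sum())


df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3, 1, 2, 3],
    'value': [1000, 2000, 3000, 4000, 5000, 5000, 10000, 20000],
})
print(count_runs(df, 'ID'), count_runs(df, 'value'))
```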
- biogeme.tools.covariance_cross_nested(i, j, nests)[source]
Calculate the covariance between the error terms of two alternatives of a cross-nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.
- Parameters
i (int) – first alternative
j (int) – second alternative
nests (tuple) –
a tuple containing as many items as nests. Each item is also a tuple containing two items:
an object of type biogeme.expressions.expr.Expression representing the nest parameter,
a dictionary mapping the alternative ids with the cross-nested parameters for the corresponding nest. If an alternative is missing in the dictionary, the corresponding alpha is set to zero.
Example:
alphaA = {1: alpha1a, 2: alpha2a, 3: alpha3a, 4: alpha4a, 5: alpha5a, 6: alpha6a}
alphaB = {1: alpha1b, 2: alpha2b, 3: alpha3b, 4: alpha4b, 5: alpha5b, 6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb
- Returns
value of the covariance
- Return type
float
- Raises
biogemeError – if the requested number is non positive or a float
- biogeme.tools.findiff_H(theFunction, x)[source]
Calculates the hessian of a function \(f\) using finite differences
- Parameters
theFunction (function) – A function object that takes a vector as an argument, and returns a tuple. The first element of the tuple is the value of the function \(f\), and the second is the gradient of the function. The other elements are not used.
x (numpy.array) – argument of the function
- Returns
numpy matrix containing the hessian calculated by finite differences.
- Return type
numpy.array
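One common way to obtain such a matrix is to apply forward differences to the gradient returned by the function, one coordinate at a time. The sketch below follows that idea. The helper fd_hessian, its step-size rule, and the test function are assumptions made for illustration and may differ from what findiff_H does internally.

```python
import numpy as np


def fd_hessian(fct, x, eps=1e-7):
    """Approximate the hessian by forward differences of the gradient.

    fct(x) must return a tuple whose second element is the gradient.
    Hypothetical helper, not biogeme's implementation.
    """
    x = np.asarray(x, dtype=float)
    g0 = np.asarray(fct(x)[1], dtype=float)
    n = len(x)
    hessian = np.zeros((n, n))
    for i in range(n):
        # Scale the step with the magnitude of the coordinate.
        step = eps * max(abs(x[i]), 1.0)
        x_shifted = x.copy()
        x_shifted[i] += step
        g_shifted = np.asarray(fct(x_shifted)[1], dtype=float)
        # Column i holds the derivative of the gradient with respect to x_i.
        hessian[:, i] = (g_shifted - g0) / step
    return hessian


def example(x):
    """f(x) = x0^2 * x1 and its gradient."""
    f = x[0] ** 2 * x[1]
    g = np.array([2.0 * x[0] * x[1], x[0] ** 2])
    return f, g


H = fd_hessian(example, np.array([1.5, 2.0]))
print(H)
```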
- biogeme.tools.findiff_g(theFunction, x)[source]
Calculates the gradient of a function \(f\) using finite differences
- Parameters
theFunction (function) – A function object that takes a vector as an argument, and returns a tuple. The first element of the tuple is the value of the function \(f\). The other elements are not used.
x (numpy.array) – argument of the function
- Returns
numpy vector, same dimension as x, containing the gradient calculated by finite differences.
- Return type
numpy.array
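A forward-difference gradient needs only the first element of the tuple. The following sketch shows the idea. The helper fd_gradient and its step-size rule are illustrative assumptions, not the code of findiff_g.

```python
import numpy as np


def fd_gradient(fct, x, eps=1e-7):
    """Approximate the gradient by forward differences of the function value.

    fct(x) must return a tuple whose first element is f(x).
    Hypothetical helper, not biogeme's implementation.
    """
    x = np.asarray(x, dtype=float)
    f0 = fct(x)[0]
    gradient = np.zeros(len(x))
    for i in range(len(x)):
        # Scale the step with the magnitude of the coordinate.
        step = eps * max(abs(x[i]), 1.0)
        x_shifted = x.copy()
        x_shifted[i] += step
        gradient[i] = (fct(x_shifted)[0] - f0) / step
    return gradient


def example(x):
    """f(x) = x0^2 * x1, returned as a one-element tuple."""
    return (x[0] ** 2 * x[1],)


g = fd_gradient(example, np.array([1.5, 2.0]))
print(g)
```

The result has the same dimension as x and can be compared entry by entry with an analytical gradient.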
- biogeme.tools.flatten_database(df, merge_id, row_name=None, identical_columns=None)[source]
Combine several rows of a Pandas database into one. For instance, consider the following database:
   ID  Age  Cost   Name
0   1   23    34  Item3
1   1   23    45  Item4
2   1   23    12  Item7
3   2   45    65  Item3
4   2   45    34  Item7
If row_name is ‘Name’, the function generates the same data in the following format:
    Age  Item3_Cost  Item4_Cost  Item7_Cost
ID
1    23          34        45.0          12
2    45          65         NaN          34
If row_name is None, the function generates the same data in the following format:
    Age  1_Cost 1_Name  2_Cost 2_Name  3_Cost 3_Name
ID
1    23      34  Item3      45  Item4    12.0  Item7
2    45      65  Item3      34  Item7     NaN    NaN
- Parameters
df (pandas.DataFrame) – initial data frame
merge_id (str) – name of the column that identifies rows that should be merged. In the above example: ‘ID’
row_name (str) – name of the column that provides the names of the rows in the new dataframe. In the example above: ‘Name’. If None, the rows are numbered sequentially.
identical_columns (list(str)) – names of the columns that contain identical values across the rows of a group. In the example above: [‘Age’]. If None, these columns are automatically detected. On a large database, this automatic detection may cause a performance issue.
- Returns
reformatted database
- Return type
pandas.DataFrame
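To make the transformation concrete, the sketch below rebuilds the row_name=‘Name’ case from the example using plain pandas operations (groupby, pivot, concat). It is only an approximation of the behavior described here, not the library's implementation, and column dtypes may differ from those shown in the tables above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Age': [23, 23, 23, 45, 45],
    'Cost': [34, 45, 12, 65, 34],
    'Name': ['Item3', 'Item4', 'Item7', 'Item3', 'Item7'],
})

# 'Age' is identical across the rows of each ID, so keep a single copy.
constant = df.groupby('ID')['Age'].first()

# Spread 'Cost' over one column per value of 'Name'.
wide = df.pivot(index='ID', columns='Name', values='Cost')
wide.columns = [f'{name}_Cost' for name in wide.columns]

# One row per ID; combinations absent from the input become NaN.
flat = pd.concat([constant, wide], axis=1)
print(flat)
```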
- biogeme.tools.getPrimeNumbers(n)[source]
Get a given number of prime numbers
- Parameters
n (int) – number of primes that are requested
- Returns
array with prime numbers
- Return type
list(int)
- Raises
biogemeError – if the requested number is non positive or a float
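A minimal sketch of how a fixed number of primes can be generated by trial division is given below. The name first_primes is hypothetical, ValueError stands in for biogemeError, and the library may well use a different algorithm.

```python
def first_primes(n):
    """Return the first n prime numbers (hypothetical helper, not biogeme's code)."""
    if not isinstance(n, int) or n <= 0:
        # Stand-in for the biogemeError mentioned in the documentation.
        raise ValueError("n must be a positive integer")
    primes = []
    candidate = 2
    while len(primes) < n:
        # Only primes up to sqrt(candidate) need to be tested as divisors.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(first_primes(5))
```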
- biogeme.tools.likelihood_ratio_test(model1, model2, significance_level=0.05)[source]
This function performs a likelihood ratio test between a restricted and an unrestricted model.
- Parameters
model1 (tuple(float, int)) – the final loglikelihood of one model, and the number of estimated parameters.
model2 (tuple(float, int)) – the final loglikelihood of the other model, and the number of estimated parameters.
significance_level (float) – level of significance of the test. Default: 0.05
- Returns
a tuple containing:
a message with the outcome of the test
the statistic, that is minus two times the difference between the loglikelihood of the two models
the threshold of the chi square distribution.
- Return type
LRTuple(str, float, float)
- Raises
biogemeError – if the unrestricted model has a lower log likelihood than the restricted model.
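The computation behind the test can be sketched with scipy as follows. This is an illustration under stated assumptions, not the library's code: the helper lr_test is hypothetical, the model with more parameters is taken to be the unrestricted one, ValueError stands in for biogemeError, and the log likelihood values are made up for the example.

```python
from scipy.stats import chi2


def lr_test(model1, model2, significance_level=0.05):
    """Likelihood ratio test on two (loglikelihood, n_parameters) tuples.

    Hypothetical helper; assumes the model with more parameters is unrestricted.
    """
    unrestricted, restricted = sorted(
        (model1, model2), key=lambda m: m[1], reverse=True
    )
    ll_u, k_u = unrestricted
    ll_r, k_r = restricted
    if ll_u < ll_r:
        # Stand-in for the biogemeError mentioned in the documentation.
        raise ValueError("unrestricted model has a lower log likelihood")
    # Minus two times the difference between the two log likelihoods.
    statistic = -2.0 * (ll_r - ll_u)
    # Degrees of freedom: number of restrictions.
    threshold = chi2.ppf(1.0 - significance_level, k_u - k_r)
    reject_restricted = statistic > threshold
    return statistic, threshold, reject_restricted


# Illustrative values only.
statistic, threshold, reject = lr_test((-1340.8, 5), (-1338.5, 7))
print(statistic, threshold, reject)
```

When the statistic exceeds the threshold, the restricted model is rejected at the chosen significance level.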