# Tools

Generic convenient tools.

## biogeme.tools module

Implements some useful functions.

author: Michel Bierlaire

date: Sun Apr 14 10:46:10 2019

class biogeme.tools.LRTuple(message, statistic, threshold)

Bases: tuple

message

Alias for field number 0

statistic

Alias for field number 1

threshold

Alias for field number 2
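The "Bases: tuple" and "Alias for field number N" entries are what Python generates for a `collections.namedtuple`. The sketch below assumes LRTuple is such a namedtuple and shows that each field can be read by name or by position. The message text and numbers are made up for illustration.

```python
from collections import namedtuple

# Assumption: LRTuple is a namedtuple with these three fields, in this order.
LRTuple = namedtuple('LRTuple', ['message', 'statistic', 'threshold'])

# Hypothetical values, only to show field access.
result = LRTuple(message='H0 cannot be rejected', statistic=1.2, threshold=3.84)

# Field 1 can be read either by name or by index.
print(result.statistic, result[1])
```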

biogeme.tools.calculatePrimeNumbers(upperBound)

Calculate prime numbers.

Parameters

upperBound (int) – prime numbers up to this value will be computed

Returns

array with prime numbers

Return type

list(int)

Raises

biogemeError – if the upperBound is incorrectly defined (e.g., a negative number)

>>> tools.calculatePrimeNumbers(10)
[2, 3, 5, 7]
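A common way to compute all primes up to a bound is the sieve of Eratosthenes. The sketch below uses it; it is not necessarily how Biogeme implements calculatePrimeNumbers. The function name `calculate_prime_numbers`, the inclusive treatment of the bound, and the use of `ValueError` in place of biogemeError are assumptions.

```python
def calculate_prime_numbers(upper_bound):
    """Hypothetical sketch: all primes <= upper_bound (sieve of Eratosthenes).

    ValueError stands in for biogemeError.
    """
    if not isinstance(upper_bound, int) or upper_bound < 0:
        raise ValueError(f'Incorrect upper bound: {upper_bound}')
    if upper_bound < 2:
        return []
    is_prime = [True] * (upper_bound + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(upper_bound ** 0.5) + 1):
        if is_prime[p]:
            # Multiples of p below p*p were already crossed out by smaller primes.
            for multiple in range(p * p, upper_bound + 1, p):
                is_prime[multiple] = False
    return [k for k, prime in enumerate(is_prime) if prime]


print(calculate_prime_numbers(10))  # [2, 3, 5, 7]
```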

biogeme.tools.calculate_correlation(nests, results, alternative_names=None)

Calculate the correlation matrix of a nested or cross-nested logit model.

Parameters
• nests (tuple(tuple(biogeme.expressions.Expression, list(int))), or tuple(tuple(biogeme.expressions.Expression, dict(int:biogeme.expressions.Expression)))) –

A tuple containing as many items as nests.

Each item is also a tuple containing two items:

• an object of type biogeme.expressions.expr.Expression representing the nest parameter,

• the alternatives of the nest: for the nested logit model, a list of the identifiers of the alternatives belonging to the nest; for the cross-nested logit model, a dictionary mapping the alternative IDs to the cross-nested parameters for the corresponding nest. If an alternative is missing from the dictionary, the corresponding alpha is set to zero.

Example for the nested logit:

nesta = MUA, [1, 2, 3]
nestb = MUB, [4, 5, 6]
nests = nesta, nestb

Example for the cross-nested logit:

alphaA = {1: alpha1a,
          2: alpha2a,
          3: alpha3a,
          4: alpha4a,
          5: alpha5a,
          6: alpha6a}
alphaB = {1: alpha1b,
          2: alpha2b,
          3: alpha3b,
          4: alpha4b,
          5: alpha5b,
          6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb


• results (biogeme.results.bioResults) – estimation results

• alternative_names (dict(int: str)) – a dictionary mapping the alternative IDs to their names. If None, the IDs are used as names.

biogeme.tools.checkDerivatives(theFunction, x, names=None, logg=False)

Verifies the analytical derivatives of a function by comparing them with finite difference approximations.

Parameters
• theFunction (function) –

A function object that takes a vector as an argument and returns a tuple:

• the first element of the tuple is the value of the function $f$,

• the second is the gradient of the function,

• the third is the Hessian.

• x (numpy.array) – arguments of the function

• names (list(str)) – the names of the entries of x (for reporting).

• logg (bool) – if True, messages will be displayed.

Returns

a tuple (f, g, h, gdiff, hdiff), where

• f is the value of the function at x,

• g is the analytical gradient,

• h is the analytical Hessian,

• gdiff is the difference between the analytical gradient and the finite-difference approximation,

• hdiff is the difference between the analytical Hessian and the finite-difference approximation.

Return type

tuple(float, numpy.array, numpy.array, numpy.array, numpy.array)
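The check described above can be sketched with forward differences: perturb one coordinate at a time, difference the function values to approximate the gradient, and difference the analytical gradients to approximate the Hessian column by column. The function `check_derivatives_sketch`, the fixed step of 1e-6, and the Rosenbrock test function are assumptions for illustration, not Biogeme's implementation.

```python
import numpy as np


def rosenbrock(x):
    """Test function returning (f, gradient, hessian), the tuple shape described above."""
    f = 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    g = np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])
    h = np.array([
        [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])
    return f, g, h


def check_derivatives_sketch(the_function, x, step=1e-6):
    """Hypothetical sketch: compare analytical derivatives with forward differences."""
    f, g, h = the_function(x)
    n = len(x)
    g_fd = np.empty(n)
    h_fd = np.empty((n, n))
    for i in range(n):
        x_step = np.array(x, dtype=float)
        x_step[i] += step
        f_i, g_i, _ = the_function(x_step)
        g_fd[i] = (f_i - f) / step       # partial derivative of f along coordinate i
        h_fd[:, i] = (g_i - g) / step    # column i of the Hessian, from the gradient
    return f, g, h, g - g_fd, h - h_fd


x0 = np.array([-1.2, 1.0])
f, g, h, gdiff, hdiff = check_derivatives_sketch(rosenbrock, x0)
print(np.max(np.abs(gdiff)), np.max(np.abs(hdiff)))
```

Small entries in gdiff and hdiff (here of the order of the step size times the curvature) indicate that the analytical derivatives are consistent; a coding error in g or h typically shows up as an entry several orders of magnitude larger.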

biogeme.tools.correlation_cross_nested(nests)

Calculate the correlation matrix of the error terms of all alternatives of a cross-nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.

Parameters

nests (tuple) –

a tuple containing as many items as nests. Each item is also a tuple containing two items:

• an object of type biogeme.expressions.expr.Expression representing the nest parameter,

• a dictionary mapping the alternative IDs to the cross-nested parameters for the corresponding nest. If an alternative is missing from the dictionary, the corresponding alpha is set to zero.

Example:

alphaA = {1: alpha1a,
          2: alpha2a,
          3: alpha3a,
          4: alpha4a,
          5: alpha5a,
          6: alpha6a}
alphaB = {1: alpha1b,
          2: alpha2b,
          3: alpha3b,
          4: alpha4b,
          5: alpha5b,
          6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb


Returns

correlation matrix

Return type

pd.DataFrame

biogeme.tools.correlation_nested(nests)

Calculate the correlation matrix of the error terms of all alternatives of a nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.

Parameters

nests (tuple) –

A tuple containing as many items as nests. Each item is also a tuple containing two items:

• an object of type biogeme.expressions.expr.Expression representing the nest parameter,

• a list of the identifiers of the alternatives belonging to the nest.

Example:

nesta = MUA, [1, 2, 3]
nestb = MUB, [4, 5, 6]
nests = nesta, nestb


Returns

correlation matrix

Return type

pd.DataFrame
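For a nested logit model with the homogeneity parameter normalized to one, the error terms of two distinct alternatives in the same nest $m$ have correlation $1 - 1/\mu_m^2$, and alternatives in different nests are uncorrelated. The sketch below computes that matrix with plain floats for the nest parameters instead of Biogeme expressions; the function name `correlation_nested_sketch` and the numeric values are assumptions for illustration.

```python
import pandas as pd


def correlation_nested_sketch(nests):
    """Hypothetical sketch: correlation matrix of a nested logit model (mu = 1).

    nests: tuple of (mu_m, list of alternative ids), with numeric mu_m >= 1.
    """
    alternatives = sorted(alt for _, alts in nests for alt in alts)
    corr = pd.DataFrame(0.0, index=alternatives, columns=alternatives)
    for mu_m, alts in nests:
        # Alternatives sharing nest m are correlated through that nest.
        for i in alts:
            for j in alts:
                corr.loc[i, j] = 1.0 - 1.0 / mu_m ** 2
    # Each error term is perfectly correlated with itself.
    for i in alternatives:
        corr.loc[i, i] = 1.0
    return corr


nests = ((2.0, [1, 2, 3]), (1.5, [4, 5, 6]))
print(correlation_nested_sketch(nests))
```

With these values, alternatives 1, 2 and 3 are pairwise correlated at 0.75 and alternatives 4, 5 and 6 at about 0.556, while a value of $\mu_m = 1$ gives zero correlation, as in a multinomial logit model.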

biogeme.tools.countNumberOfGroups(df, column)

This function counts the number of groups of consecutive identical values in a column. For instance, the sequence 1,2,2,3,3,3,4,1,1 contains 5 groups: 1 | 2,2 | 3,3,3 | 4 | 1,1.

Example:

>>> df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3],
...                    'value': [1000, 2000, 3000, 4000, 5000, 5000, 10000, 20000]})
>>> tools.countNumberOfGroups(df, 'ID')
6
>>> tools.countNumberOfGroups(df, 'value')
7
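One way to count such groups with pandas is to mark every row whose value differs from the previous row and sum the marks. The sketch below does that; the name `count_number_of_groups` is hypothetical and this is not necessarily how Biogeme implements the function.

```python
import pandas as pd


def count_number_of_groups(df, column):
    """Hypothetical sketch: number of runs of consecutive identical values."""
    values = df[column]
    # A new group starts wherever the value differs from the previous row.
    # shift() puts NaN in the first position, so the first row always starts a group.
    return int((values != values.shift()).sum())


df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3],
                   'value': [1000, 2000, 3000, 4000, 5000, 5000, 10000, 20000]})
print(count_number_of_groups(df, 'ID'), count_number_of_groups(df, 'value'))  # 6 7
```

Because NaN compares unequal to itself, each missing value in the column would start its own group in this sketch.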

biogeme.tools.covariance_cross_nested(i, j, nests)

Calculate the covariance between the error terms of two alternatives of a cross-nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.

Parameters
• i (int) – first alternative

• j (int) – second alternative

• nests (tuple) –

a tuple containing as many items as nests. Each item is also a tuple containing two items:

• an object of type biogeme.expressions.expr.Expression representing the nest parameter,

• a dictionary mapping the alternative IDs to the cross-nested parameters for the corresponding nest. If an alternative is missing from the dictionary, the corresponding alpha is set to zero.

Example:

alphaA = {1: alpha1a,
          2: alpha2a,
          3: alpha3a,
          4: alpha4a,
          5: alpha5a,
          6: alpha6a}
alphaB = {1: alpha1b,
          2: alpha2b,
          3: alpha3b,
          4: alpha4b,
          5: alpha5b,
          6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb


Returns

value of the covariance

Return type

float

biogeme.tools.findiff_H(theFunction, x)

Calculates the Hessian of a function $f$ using finite differences.

Parameters
• theFunction (function) – A function object that takes a vector as an argument and returns a tuple. The first element of the tuple is the value of the function $f$, and the second is the gradient of the function. The other elements are not used.

• x (numpy.array) – argument of the function

Returns

numpy array containing the Hessian calculated by finite differences.

Return type

numpy.array
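Since the function supplies the gradient, each column of the Hessian can be approximated by differencing the gradient along one coordinate. The sketch below assumes forward differences with a fixed step of 1e-6 and symmetrizes the result; the name `findiff_hessian`, the step, and the symmetrization are assumptions, not a description of Biogeme's code. A quadratic function is used as a check because its gradient is linear, so the exact Hessian is known.

```python
import numpy as np


def findiff_hessian(the_function, x, step=1e-6):
    """Hypothetical sketch: Hessian from forward differences of the gradient.

    the_function(x) returns a tuple whose second element is the gradient.
    """
    g = the_function(x)[1]
    n = len(x)
    h = np.empty((n, n))
    for i in range(n):
        x_step = np.array(x, dtype=float)  # copy, so x itself is not modified
        x_step[i] += step
        h[:, i] = (the_function(x_step)[1] - g) / step
    # Forward differences are not exactly symmetric; average with the transpose.
    return 0.5 * (h + h.T)


A = np.array([[4.0, 1.0], [1.0, 3.0]])


def quadratic(x):
    """f(x) = 0.5 x'Ax, whose gradient is Ax and whose Hessian is A."""
    return 0.5 * x @ A @ x, A @ x


print(findiff_hessian(quadratic, np.array([0.5, -1.0])))
```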

biogeme.tools.findiff_g(theFunction, x)

Calculates the gradient of a function $f$ using finite differences.

Parameters
• theFunction (function) – A function object that takes a vector as an argument and returns a tuple. The first element of the tuple is the value of the function $f$. The other elements are not used.

• x (numpy.array) – argument of the function

Returns

numpy vector, same dimension as x, containing the gradient calculated by finite differences.

Return type

numpy.array
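A forward-difference gradient only needs function values: perturb each coordinate in turn and divide the change in $f$ by the step. The sketch below assumes a step of $\sqrt{\varepsilon_{\text{mach}}}\max(1, |x_i|)$, a common heuristic that may differ from Biogeme's choice; the name `findiff_gradient` and the test function are hypothetical.

```python
import numpy as np


def findiff_gradient(the_function, x):
    """Hypothetical sketch: forward-difference gradient of f.

    Only the first element of the tuple returned by the_function is used.
    """
    x = np.asarray(x, dtype=float)
    f = the_function(x)[0]
    g = np.empty_like(x)
    for i in range(len(x)):
        # Step scaled to the magnitude of x[i] (assumed heuristic).
        step = np.sqrt(np.finfo(float).eps) * max(1.0, abs(x[i]))
        x_step = x.copy()
        x_step[i] += step
        g[i] = (the_function(x_step)[0] - f) / step
    return g


def f_and_grad(x):
    """f(x) = sum(exp(x_i)); its exact gradient is exp(x)."""
    return np.sum(np.exp(x)), np.exp(x)


x0 = np.array([0.0, 1.0, -2.0])
print(findiff_gradient(f_and_grad, x0) - np.exp(x0))
```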

biogeme.tools.flatten_database(df, merge_id, row_name=None, identical_columns=None)

Combine several rows of a Pandas database into one. For instance, consider the following database:

   ID  Age  Cost   Name
0   1   23    34  Item3
1   1   23    45  Item4
2   1   23    12  Item7
3   2   45    65  Item3
4   2   45    34  Item7


If row_name is ‘Name’, the function generates the same data in the following format:

    Age  Item3_Cost  Item4_Cost  Item7_Cost
ID
1    23          34        45.0          12
2    45          65         NaN          34


If row_name is None, the function generates the same data in the following format:

    Age  1_Cost 1_Name  2_Cost 2_Name  3_Cost 3_Name
ID
1    23      34  Item3      45  Item4    12.0  Item7
2    45      65  Item3      34  Item7     NaN    NaN

Parameters
• df (pandas.DataFrame) – initial data frame

• merge_id (str) – name of the column that identifies rows that should be merged. In the above example: ‘ID’

• row_name (str) – name of the column that provides the names of the rows in the new data frame; these names are used as prefixes of the new column names. In the example above: ‘Name’. If None, the rows are numbered sequentially.

• identical_columns (list(str)) – names of the columns that contain identical values across the rows of a group. In the example above: [‘Age’]. If None, these columns are detected automatically, which may be slow on large databases.

Returns

reformatted database

Return type

pandas.DataFrame
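The row_name case of the example above can be reproduced with pandas `pivot`: the identical columns are kept once per group, and every other column is spread out with one column per row name. The sketch below is a hypothetical reimplementation (`flatten_by_name`) that requires identical_columns to be given explicitly and assumes each (merge_id, row_name) pair appears at most once; it is not Biogeme's code, and column dtypes may differ from the output shown above.

```python
import pandas as pd


def flatten_by_name(df, merge_id, row_name, identical_columns):
    """Hypothetical sketch of the row_name case of flatten_database.

    Assumes each (merge_id, row_name) pair appears at most once.
    """
    # Columns that are constant within a group are kept once per group.
    constant = df.groupby(merge_id)[identical_columns].first()
    # The remaining columns are spread out, one column per row name.
    varying = [c for c in df.columns
               if c not in identical_columns + [merge_id, row_name]]
    wide = df.pivot(index=merge_id, columns=row_name, values=varying)
    # With a list of values, pivot labels columns by (column, row name) pairs.
    wide.columns = [f'{name}_{col}' for col, name in wide.columns]
    return constant.join(wide)


df = pd.DataFrame({'ID': [1, 1, 1, 2, 2],
                   'Age': [23, 23, 23, 45, 45],
                   'Cost': [34, 45, 12, 65, 34],
                   'Name': ['Item3', 'Item4', 'Item7', 'Item3', 'Item7']})
print(flatten_by_name(df, 'ID', 'Name', ['Age']))
```

Missing combinations, such as Item4 for ID 2, become NaN, as in the documented output.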

Get a given number of prime numbers

Parameters

n (int) – number of primes that are requested

Returns

array with prime numbers

Return type

list(int)

Raises

biogemeError – if the requested number is non-positive or a float
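The function documented above returns the first n primes. A simple way to do this is trial division by the primes found so far, stopping once the square of the divisor exceeds the candidate. The sketch below assumes that approach; the name `get_prime_numbers` and the use of `ValueError` in place of biogemeError are hypothetical.

```python
def get_prime_numbers(n):
    """Hypothetical sketch: the first n prime numbers.

    ValueError stands in for biogemeError.
    """
    if not isinstance(n, int) or isinstance(n, bool) or n <= 0:
        raise ValueError(f'Incorrect number: {n}')
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no known prime p with p*p <= candidate divides it.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(get_prime_numbers(5))  # [2, 3, 5, 7, 11]
```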

biogeme.tools.likelihood_ratio_test(model1, model2, significance_level=0.05)

This function performs a likelihood ratio test between a restricted and an unrestricted model.

Parameters
• model1 (tuple(float, int)) – the final loglikelihood of one model, and the number of estimated parameters.

• model2 (tuple(float, int)) – the final loglikelihood of the other model, and the number of estimated parameters.

• significance_level (float) – level of significance of the test. Default: 0.05

Returns

a tuple containing:

• a message with the outcome of the test

• the statistic, that is, minus two times the difference between the log likelihood of the restricted model and that of the unrestricted model,

• the threshold of the chi-square distribution at the requested significance level.

Return type

LRTuple(str, float, float)

Raises

biogemeError – if the unrestricted model has a lower log likelihood than the restricted model.
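In the standard form of the test, the quantities above can be written as follows. The symbols are introduced here for illustration: $\mathcal{L}_R$ and $K_R$ are the final log likelihood and number of estimated parameters of the restricted model, $\mathcal{L}_U$ and $K_U$ those of the unrestricted model, and $\alpha$ is significance_level. Taking $K_U - K_R$ as the degrees of freedom of the chi-square threshold is an assumption based on the usual textbook test.

$$
\mathrm{LR} = -2\,(\mathcal{L}_R - \mathcal{L}_U), \qquad \text{threshold} = \chi^2_{1-\alpha}(K_U - K_R).
$$

With hypothetical values $\mathcal{L}_R = -1000.0$, $K_R = 4$, $\mathcal{L}_U = -995.0$, $K_U = 5$ and $\alpha = 0.05$:

$$
\mathrm{LR} = -2\,(-1000.0 + 995.0) = 10.0 > \chi^2_{0.95}(1) \approx 3.841,
$$

so the restricted model would be rejected at the 5% level. If $\mathcal{L}_U < \mathcal{L}_R$, the statistic would be negative; this is the situation in which the function raises biogemeError.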