Tools

Generic convenience tools.

biogeme.tools module

Implements some useful functions

author:

Michel Bierlaire

date:

Sun Apr 14 10:46:10 2019

class biogeme.tools.LRTuple(message, statistic, threshold)

Bases: NamedTuple

Tuple for the likelihood ratio test

Parameters:
  • message (str) –

  • statistic (float) –

  • threshold (float) –

message: str

Alias for field number 0

statistic: float

Alias for field number 1

threshold: float

Alias for field number 2

class biogeme.tools.ModelNames(prefix='Model')

Bases: object

Class generating model names from unique configuration strings

Parameters:

prefix (str) –

__init__(prefix='Model')
Parameters:

prefix (str) –

class biogeme.tools.TemporaryFile

Bases: object

Class generating a temporary file, so that the user does not need to worry about its location or even its name.

Example:

with TemporaryFile() as filename:
    with open(filename, 'w') as f:
        print('stuff', file=f)
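For readers curious how such a context manager can work, here is a minimal sketch using only the standard library. The class name `ScratchFile` and the file name `scratch.txt` are hypothetical, and this is not necessarily how biogeme implements `TemporaryFile`. The sketch assumes the file lives in a fresh temporary directory that is deleted on exit.

```python
import os
import shutil
import tempfile


class ScratchFile:
    """Hypothetical stand-in for a TemporaryFile-like context manager."""

    def __enter__(self) -> str:
        # Create a private directory and hand back a file path inside it.
        self._directory = tempfile.mkdtemp()
        return os.path.join(self._directory, "scratch.txt")

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Remove the directory and everything written into it.
        shutil.rmtree(self._directory)


with ScratchFile() as filename:
    with open(filename, "w") as f:
        print("stuff", file=f)
    existed_inside = os.path.exists(filename)

exists_after = os.path.exists(filename)
```

Inside the `with` block the file exists; once the block ends, the directory and the file are gone.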

biogeme.tools.calculate_prime_numbers(upper_bound)

Calculate prime numbers

Parameters:

upper_bound (int) – prime numbers up to this value will be computed

Return type:

list[int]

Returns:

list of prime numbers

Raises:

BiogemeError – if upper_bound is incorrectly defined (e.g., a negative number)

>>> tools.calculate_prime_numbers(10)
[2, 3, 5, 7]
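As an illustration of how such a list can be computed, the sketch below uses the sieve of Eratosthenes. The helper `primes_up_to` is hypothetical and is not claimed to be the library's algorithm. It assumes the bound is inclusive and raises `ValueError` where biogeme would raise `BiogemeError`.

```python
def primes_up_to(upper_bound: int) -> list[int]:
    """Sieve of Eratosthenes; hypothetical helper, bound treated as inclusive."""
    if upper_bound < 0:
        raise ValueError("upper_bound must be non-negative")
    is_prime = [True] * (upper_bound + 1)
    # 0 and 1 are not prime.
    for k in range(min(2, upper_bound + 1)):
        is_prime[k] = False
    p = 2
    while p * p <= upper_bound:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p squared.
            for multiple in range(p * p, upper_bound + 1, p):
                is_prime[multiple] = False
        p += 1
    return [k for k, flag in enumerate(is_prime) if flag]


small_primes = primes_up_to(10)
```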

biogeme.tools.checkDerivatives(the_function, x, names=None, logg=False)

Verifies the analytical derivatives of a function by comparing them with finite difference approximations.

Parameters:
  • the_function (Callable[[ndarray], tuple[float, ndarray, ndarray]]) –

    A function object that takes a vector as an argument, and returns a tuple:

    • The first element of the tuple is the value of the function \(f\),

    • the second is the gradient of the function,

    • the third is the hessian.

  • x (ndarray) – arguments of the function

  • names (Optional[list[str]]) – the names of the entries of x (for reporting).

  • logg (Optional[bool]) – if True, messages will be displayed.

Return type:

tuple[float, ndarray, ndarray, ndarray, ndarray]

Returns:

tuple f, g, h, gdiff, hdiff where

  • f is the value of the function at x,

  • g is the analytical gradient,

  • h is the analytical hessian,

  • gdiff is the difference between the analytical gradient and the finite difference approximation

  • hdiff is the difference between the analytical hessian and the finite difference approximation
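To make the expected shape of `the_function` concrete, the sketch below defines a small quadratic that returns its value, gradient and hessian, then compares the analytical gradient with a finite-difference approximation. It uses plain Python lists instead of `ndarray` to stay dependency-free. The helper `gradient_gap`, the central-difference scheme and the step size are assumptions made for illustration, not the library's implementation.

```python
def the_function(x):
    """f(x) = x0^2 + 3*x0*x1, returned with its gradient and hessian."""
    f = x[0] ** 2 + 3.0 * x[0] * x[1]
    g = [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]
    h = [[2.0, 3.0], [3.0, 0.0]]
    return f, g, h


def gradient_gap(function, x, step=1e-6):
    """Analytical gradient minus a central finite-difference approximation."""
    _, g, _ = function(x)
    gaps = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += step
        backward[i] -= step
        approx = (function(forward)[0] - function(backward)[0]) / (2.0 * step)
        gaps.append(g[i] - approx)
    return gaps


gdiff = gradient_gap(the_function, [1.0, 2.0])
```

Entries of `gdiff` close to zero suggest the analytical gradient is coded correctly; a large entry points at the coordinate whose derivative is wrong.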

biogeme.tools.countNumberOfGroups(df, column)

This function counts the number of groups of consecutive identical values in a column. For instance, the sequence 1,2,2,3,3,3,4,1,1 contains 5 groups.

Example:

>>> df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3],
...                    'value': [1000, 2000, 3000, 4000,
...                              5000, 5000, 10000, 20000]})
>>> tools.countNumberOfGroups(df, 'ID')
6

>>> tools.countNumberOfGroups(df, 'value')
7

Return type:

int

Parameters:
  • df (DataFrame) –

  • column (str) –
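The counting rule can be sketched on a plain Python sequence with `itertools.groupby`, which splits a sequence into runs of consecutive equal values. The helper `count_runs` is hypothetical and works on lists rather than on a DataFrame column.

```python
from itertools import groupby


def count_runs(values) -> int:
    """Count maximal runs of consecutive equal values (hypothetical helper)."""
    return sum(1 for _ in groupby(values))


runs_in_sequence = count_runs([1, 2, 2, 3, 3, 3, 4, 1, 1])
runs_in_ids = count_runs([1, 1, 2, 3, 3, 1, 2, 3])
```

A value that reappears after a different value starts a new group, which is why the count can exceed the number of distinct values.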

biogeme.tools.findiff_H(the_function, x)

Calculates the hessian of a function \(f\) using finite differences

Parameters:
  • the_function (Callable[[ndarray], tuple[float, ndarray, Any]]) – A function object that takes a vector as an argument, and returns a tuple. The first element of the tuple is the value of the function \(f\), and the second is the gradient of the function. The other elements are not used.

  • x (ndarray) – argument of the function

Return type:

ndarray

Returns:

numpy matrix containing the hessian calculated by finite differences.
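Since the function supplies its gradient, one way to approximate the hessian is to difference that gradient along each coordinate. The sketch below does this with forward differences on plain lists. The helper `hessian_by_differences`, the step size and the test function `quadratic` are assumptions for illustration and may differ from what the library does.

```python
def hessian_by_differences(function, x, step=1e-7):
    """Approximate the hessian by forward differences of the gradient."""
    n = len(x)
    g0 = function(x)[1]
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        shifted = list(x)
        shifted[j] += step
        gj = function(shifted)[1]
        for i in range(n):
            # Column j holds the change in the gradient per unit change of x_j.
            hessian[i][j] = (gj[i] - g0[i]) / step
    return hessian


def quadratic(x):
    """f(x) = x0^2 + 3*x0*x1 with its analytical gradient."""
    f = x[0] ** 2 + 3.0 * x[0] * x[1]
    g = [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]
    return f, g


H = hessian_by_differences(quadratic, [1.0, 2.0])
```

For this quadratic the exact hessian has rows (2, 3) and (3, 0), and the approximation agrees up to rounding error.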

biogeme.tools.findiff_g(the_function, x)

Calculates the gradient of a function \(f\) using finite differences

Parameters:
  • the_function (Callable[[ndarray], tuple[float, ...]]) – A function object that takes a vector as an argument, and returns a tuple. The first element of the tuple is the value of the function \(f\). The other elements are not used.

  • x (ndarray) – argument of the function

Return type:

ndarray

Returns:

numpy vector, same dimension as x, containing the gradient calculated by finite differences.
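One common scheme, given here as an illustration rather than as the exact formula used by the library, is the forward difference. The symbols \(e_i\) (the \(i\)-th unit vector), \(\varepsilon > 0\) (a small step) and \(n\) (the dimension of \(x\)) are introduced here for the purpose of the formula:

\[
\frac{\partial f}{\partial x_i}(x) \approx \frac{f(x + \varepsilon e_i) - f(x)}{\varepsilon}, \qquad i = 1, \ldots, n.
\]

This costs one extra function evaluation per coordinate, and its truncation error shrinks linearly with \(\varepsilon\).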

biogeme.tools.flatten_database(df, merge_id, row_name=None, identical_columns=None)

Combine several rows of a Pandas database into one. For instance, consider the following database:

   ID  Age  Cost   Name
0   1   23    34  Item3
1   1   23    45  Item4
2   1   23    12  Item7
3   2   45    65  Item3
4   2   45    34  Item7

If row_name is ‘Name’, the function generates the same data in the following format:

    Age  Item3_Cost  Item4_Cost  Item7_Cost
ID
1    23          34        45.0          12
2    45          65         NaN          34

If row_name is None, the function generates the same data in the following format:

    Age  1_Cost 1_Name  2_Cost 2_Name  3_Cost 3_Name
ID
1    23      34  Item3      45  Item4    12.0  Item7
2    45      65  Item3      34  Item7     NaN    NaN

Parameters:
  • df (pandas.DataFrame) – initial data frame

  • merge_id (str) – name of the column that identifies rows that should be merged. In the above example: ‘ID’

  • row_name (str) – name of the column that provides the names of the rows in the new dataframe. In the example above: ‘Name’. If None, the rows are numbered sequentially.

  • identical_columns (list(str)) – names of the columns that contain identical values across the rows of a group. In the example above: [‘Age’]. If None, these columns are detected automatically, which may be slow on a large database.

Returns:

reformatted database

Return type:

pandas.DataFrame
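The reshaping logic for the case where row_name is ‘Name’ can be sketched with plain dictionaries instead of pandas. The helper `flatten_rows` is hypothetical and only mirrors the first example above. Unlike the DataFrame version, a missing combination is simply absent rather than filled with NaN.

```python
rows = [
    {"ID": 1, "Age": 23, "Cost": 34, "Name": "Item3"},
    {"ID": 1, "Age": 23, "Cost": 45, "Name": "Item4"},
    {"ID": 1, "Age": 23, "Cost": 12, "Name": "Item7"},
    {"ID": 2, "Age": 45, "Cost": 65, "Name": "Item3"},
    {"ID": 2, "Age": 45, "Cost": 34, "Name": "Item7"},
]


def flatten_rows(rows, merge_id, row_name, identical_columns):
    """Merge rows sharing merge_id into one dict per group (hypothetical helper)."""
    flat = {}
    for row in rows:
        merged = flat.setdefault(row[merge_id], {})
        for column, value in row.items():
            if column in (merge_id, row_name):
                continue
            if column in identical_columns:
                # Shared within the group: keep a single copy.
                merged[column] = value
            else:
                # Varies within the group: prefix with the row's name.
                merged[f"{row[row_name]}_{column}"] = value
    return flat


flat = flatten_rows(rows, "ID", "Name", ["Age"])
```

Here `flat[1]` holds the Age plus three prefixed Cost entries, while `flat[2]` has no `Item4_Cost` key because that group has no Item4 row.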

biogeme.tools.format_timedelta(td)

Format a timedelta in a “human-readable” way

Return type:

str

Parameters:

td (timedelta) –

biogeme.tools.generate_unique_ids(list_of_ids)

If the list contains duplicates, the duplicated IDs are renamed so that all IDs are unique.

Parameters:

list_of_ids (list[str]) – list of ids

Returns:

a dict that maps each unique name to its original name
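The shape of the returned mapping can be illustrated with a small sketch. The helper `make_unique` is hypothetical, and the renaming rule (appending `_0`, `_1`, ... to duplicated names) is an assumption made for the example; the documentation above does not state which naming scheme the library applies.

```python
from collections import Counter


def make_unique(ids: list[str]) -> dict[str, str]:
    """Map each unique name to its original name (hypothetical suffix scheme)."""
    totals = Counter(ids)
    seen = Counter()
    mapping = {}
    for name in ids:
        if totals[name] == 1:
            mapping[name] = name
        else:
            # Assumed renaming rule: append a running index to duplicates.
            mapping[f"{name}_{seen[name]}"] = name
            seen[name] += 1
    return mapping


mapping = make_unique(["a", "b", "a"])
```

Names that occur once map to themselves, and each renamed duplicate maps back to the name it came from. This sketch does not guard against a generated name colliding with an existing ID.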

biogeme.tools.get_prime_numbers(n)

Get a given number of prime numbers

Parameters:

n (int) – number of primes that are requested

Return type:

list[int]

Returns:

list containing the first n prime numbers

Raises:

BiogemeError – if the requested number is not a positive integer (e.g., it is zero, negative, or a float)
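Requesting a fixed count of primes, rather than all primes below a bound, can be sketched with trial division. The helper `first_primes` is hypothetical, not the library's algorithm, and raises `ValueError` where biogeme would raise `BiogemeError`.

```python
def first_primes(n: int) -> list[int]:
    """Return the first n primes by trial division (hypothetical helper)."""
    if not isinstance(n, int) or n <= 0:
        raise ValueError("n must be a positive integer")
    primes = []
    candidate = 2
    while len(primes) < n:
        # A candidate is prime if no smaller prime up to its square root divides it.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


five_primes = first_primes(5)
```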

biogeme.tools.likelihood_ratio_test(model1, model2, significance_level=0.05)

This function performs a likelihood ratio test between a restricted and an unrestricted model.

Parameters:
  • model1 (tuple[float, int]) – the final loglikelihood of one model, and the number of estimated parameters.

  • model2 (tuple[float, int]) – the final loglikelihood of the other model, and the number of estimated parameters.

  • significance_level (float) – level of significance of the test. Default: 0.05

Return type:

LRTuple

Returns:

a tuple containing:

  • a message with the outcome of the test

  • the statistic, that is, minus two times the difference between the log likelihood of the restricted model and that of the unrestricted model

  • the threshold of the chi square distribution.

Raises:

BiogemeError – if the unrestricted model has a lower log likelihood than the restricted model.
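The arithmetic behind the test can be sketched without any dependency. The function `lr_test_sketch`, the log likelihood values and the parameter counts below are made up for illustration. The sketch assumes the caller states which model is restricted, takes the degrees of freedom to be the difference in the number of parameters, and reads the 5% threshold from a short table of standard chi-square critical values. In practice the threshold for an arbitrary significance level would typically come from a statistics library such as `scipy.stats.chi2.ppf`.

```python
# Critical values of the chi-square distribution at the 5% level,
# indexed by degrees of freedom (standard table values).
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def lr_test_sketch(restricted, unrestricted):
    """Each argument is (final log likelihood, number of parameters)."""
    ll_r, k_r = restricted
    ll_u, k_u = unrestricted
    if ll_u < ll_r:
        raise ValueError("unrestricted model has the lower log likelihood")
    # Minus two times (restricted minus unrestricted log likelihood).
    statistic = -2.0 * (ll_r - ll_u)
    threshold = CHI2_CRITICAL_5PCT[k_u - k_r]
    if statistic > threshold:
        message = "restricted model rejected at the 5% level"
    else:
        message = "restricted model not rejected at the 5% level"
    return message, statistic, threshold


message, statistic, threshold = lr_test_sketch((-1340.8, 2), (-1330.5, 4))
```

With these illustrative numbers the statistic is about 20.6 against a threshold of 5.991 for two degrees of freedom, so the restricted model is rejected.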

biogeme.tools.unique_product(*iterables, max_memory_mb=1024)

Generate the Cartesian product of multiple iterables, keeping only the unique entries. Raises a MemoryError if memory usage exceeds the specified threshold.

Parameters:
  • iterables (Iterable) – Variable number of iterables to compute the Cartesian product from.

  • max_memory_mb (int) – Maximum memory usage in megabytes (default: 1024 MB).

Returns:

Yields unique entries from the Cartesian product.

Return type:

Iterator[tuple]
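The de-duplication idea can be sketched with `itertools.product` and a set of tuples already seen. The generator `unique_product_sketch` is hypothetical and deliberately omits the memory guard controlled by `max_memory_mb`, since the way the library measures memory is not described here.

```python
from itertools import product


def unique_product_sketch(*iterables):
    """Yield each distinct tuple of the Cartesian product once."""
    seen = set()
    for combination in product(*iterables):
        if combination not in seen:
            # Remember the tuple so later duplicates are skipped.
            seen.add(combination)
            yield combination


result = list(unique_product_sketch([1, 1, 2], "ab"))
```

The set of seen tuples grows with the number of distinct entries, which is the reason a memory limit is useful for large products.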