biogeme.tools

Examples of use of several functions.

This page is intended for programmers who need examples of how to call the functions of the module. The examples only illustrate the syntax; they do not correspond to any meaningful model.

author: Michel Bierlaire

date: Sat Dec 2 13:09:42 2023

import numpy as np
import pandas as pd
from biogeme.version import getText
from biogeme import tools
import biogeme.biogeme_logging as blog
import biogeme.exceptions as excep

Version of Biogeme.

print(getText())
biogeme 3.2.13 [2023-12-23]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

Define a logger that displays messages on the screen at the INFO level.

logger = blog.get_screen_logger(level=blog.INFO)

Define a function and its derivatives:

\[f = \log(x_0) + \exp(x_1),\]
\[g = \left( \begin{array}{c} \frac{1}{x_0} \\ \exp(x_1) \end{array}\right),\]
\[H = \left(\begin{array}{cc} -\frac{1}{x_0^2} & 0 \\ 0 & \exp(x_1) \end{array}\right).\]
def my_function(x: np.ndarray) -> tuple[float, np.ndarray, np.ndarray]:
    """Implementation of the test function.

    :param x: point at which the function and its derivatives must be evaluated.
    """
    f = np.log(x[0]) + np.exp(x[1])
    g = np.empty(2)
    g[0] = 1.0 / x[0]
    g[1] = np.exp(x[1])
    H = np.empty((2, 2))
    H[0, 0] = -1.0 / x[0] ** 2
    H[0, 1] = 0.0
    H[1, 0] = 0.0
    H[1, 1] = np.exp(x[1])
    return f, g, H

Evaluate the function at the point

\[x = \left( \begin{array}{c} 1.1 \\ 1.1 \end{array}\right).\]
x = np.array([1.1, 1.1])
f, g, H = my_function(x)
f
3.099476203750758
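
As a quick check, the value can be reproduced by evaluating the expression directly (a simple verification, not a function of the module).

np.log(1.1) + np.exp(1.1)  # approximately 0.0953 + 3.0042, i.e. the value of f above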

We use a pandas DataFrame for a nicer display.

pd.DataFrame(g)
          0
0  0.909091
1  3.004166


pd.DataFrame(H)
          0         1
0 -0.826446  0.000000
1  0.000000  3.004166


Calculate an approximation of the gradient by finite differences.

g_fd = tools.findiff_g(my_function, x)
pd.DataFrame(g_fd)
          0
0  0.909091
1  3.004166


Check the precision of the approximation.

pd.DataFrame(g - g_fd)
              0
0  4.185955e-08
1 -1.645947e-07
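
For intuition, a forward-difference approximation of the gradient can also be written by hand. The sketch below uses an arbitrary step size; it is not necessarily the step used internally by findiff_g.

def forward_diff_gradient(the_function, point, step=1.0e-7):
    """Approximate the gradient of the_function at point by forward differences."""
    f_0 = the_function(point)[0]
    approx = np.zeros_like(point)
    for i in range(len(point)):
        shifted = point.copy()
        shifted[i] += step
        approx[i] = (the_function(shifted)[0] - f_0) / step
    return approx

forward_diff_gradient(my_function, x)  # close to the analytical gradient g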


Calculate an approximation of the Hessian by finite differences.

H_fd = tools.findiff_H(my_function, x)
pd.DataFrame(H_fd)
          0         1
0 -0.826446  0.000000
1  0.000000  3.004166


Check the precision of the approximation.

pd.DataFrame(H - H_fd)
              0             1
0 -8.264656e-08  0.000000e+00
1  0.000000e+00 -1.645947e-07
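
The Hessian can be approximated in the same spirit, column by column, from finite differences of the analytical gradient. Again, this is only a sketch with an arbitrary step size, not the implementation used by findiff_H.

def forward_diff_hessian(the_function, point, step=1.0e-7):
    """Approximate the Hessian of the_function at point from forward differences of the gradient."""
    g_0 = the_function(point)[1]
    size = len(point)
    approx = np.zeros((size, size))
    for i in range(size):
        shifted = point.copy()
        shifted[i] += step
        approx[:, i] = (the_function(shifted)[1] - g_0) / step
    return approx

forward_diff_hessian(my_function, x)  # close to the analytical Hessian H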


There is a function that checks the analytical derivatives by comparing them to their finite difference approximation.

f, g, h, gdiff, hdiff = tools.checkDerivatives(my_function, x, names=None, logg=True)
x               Gradient        FinDiff         Difference
x[0]            +9.090909E-01   +9.090909E-01   +4.185955E-08
x[1]            +3.004166E+00   +3.004166E+00   -1.645947E-07
Row             Col             Hessian         FinDiff         Difference
x[0]            x[0]            -8.264463E-01   -8.264462E-01   -8.264656E-08
x[0]            x[1]            +0.000000E+00   +0.000000E+00   +0.000000E+00
x[1]            x[0]            +0.000000E+00   +0.000000E+00   +0.000000E+00
x[1]            x[1]            +3.004166E+00   +3.004166E+00   -1.645947E-07

To make the report easier to read, names can be given to the variables.

f, g, h, gdiff, hdiff = tools.checkDerivatives(
    my_function, x, names=['First', 'Second'], logg=True
)
x               Gradient        FinDiff         Difference
First           +9.090909E-01   +9.090909E-01   +4.185955E-08
Second          +3.004166E+00   +3.004166E+00   -1.645947E-07
Row             Col             Hessian         FinDiff         Difference
First           First           -8.264463E-01   -8.264462E-01   -8.264656E-08
First           Second          +0.000000E+00   +0.000000E+00   +0.000000E+00
Second          First           +0.000000E+00   +0.000000E+00   +0.000000E+00
Second          Second          +3.004166E+00   +3.004166E+00   -1.645947E-07
pd.DataFrame(gdiff)
              0
0  4.185955e-08
1 -1.645947e-07


hdiff
array([[-8.26465610e-08,  0.00000000e+00],
       [ 0.00000000e+00, -1.64594663e-07]])
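
The returned differences can also be checked programmatically, for instance in a unit test. The tolerance below is an arbitrary choice.

assert np.all(np.abs(gdiff) < 1.0e-5), 'analytical gradient seems inconsistent'
assert np.all(np.abs(hdiff) < 1.0e-5), 'analytical Hessian seems inconsistent'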

Prime numbers: calculate the prime numbers less than or equal to an upper bound.

myprimes = tools.calculate_prime_numbers(10)
myprimes
[2, 3, 5, 7]
myprimes = tools.calculate_prime_numbers(100)
myprimes
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Calculate a given number of prime numbers.

myprimes = tools.get_prime_numbers(7)
myprimes
[2, 3, 5, 7, 11, 13, 17]
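
For intuition, the list of primes up to a bound can be reproduced with a basic sieve of Eratosthenes. The sketch below is only illustrative; it is not the implementation used by calculate_prime_numbers.

def sieve_primes(upper_bound: int) -> list[int]:
    """Return all prime numbers less than or equal to upper_bound."""
    if upper_bound < 2:
        return []
    is_prime = np.ones(upper_bound + 1, dtype=bool)
    is_prime[:2] = False
    for candidate in range(2, int(upper_bound**0.5) + 1):
        if is_prime[candidate]:
            # Mark the multiples of the candidate as composite.
            is_prime[candidate * candidate :: candidate] = False
    return np.flatnonzero(is_prime).tolist()

sieve_primes(10)  # [2, 3, 5, 7]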

Counting groups of data. Here, a group is a run of consecutive rows sharing the same value in a given column.

alist = [1, 2, 2, 3, 3, 3, 4, 1, 1]
df = pd.DataFrame(
    {
        'ID': [1, 1, 2, 3, 3, 1, 2, 3],
        'value': [1000, 2000, 3000, 4000, 5000, 5000, 10000, 20000],
    }
)
tools.countNumberOfGroups(df, 'ID')
6
tools.countNumberOfGroups(df, 'value')
7
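
The same counts can be reproduced with plain pandas by counting how often the value changes from one row to the next. This sketch only illustrates what is being counted (runs of consecutive identical values); it is not the implementation of countNumberOfGroups.

int((df['ID'] != df['ID'].shift()).sum())  # 6
int((df['value'] != df['value'].shift()).sum())  # 7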

Likelihood ratio test. Each model is described by a tuple containing its final log likelihood and its number of estimated parameters.

model1 = (-1340.8, 5)
model2 = (-1338.49, 7)

A likelihood ratio test is performed. The function returns the outcome of the test, the statistic, and the threshold.

tools.likelihood_ratio_test(model1, model2)
LRTuple(message='H0 cannot be rejected at level 5.0%', statistic=4.619999999999891, threshold=5.991464547107979)
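
These numbers can be reproduced by hand: the statistic is twice the difference between the two log likelihoods, and the threshold is a quantile of the chi-squared distribution with degrees of freedom equal to the difference in the number of parameters. The sketch below uses scipy directly and mirrors the output above.

from scipy.stats import chi2

loglike_restricted, n_parameters_restricted = model1
loglike_unrestricted, n_parameters_unrestricted = model2
statistic = -2 * (loglike_restricted - loglike_unrestricted)
degrees_of_freedom = n_parameters_unrestricted - n_parameters_restricted
threshold = chi2.ppf(1 - 0.05, df=degrees_of_freedom)
statistic, threshold  # approximately (4.62, 5.99)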

The default significance level is 5% (0.05). It can be changed.

tools.likelihood_ratio_test(model1, model2, significance_level=0.9)
LRTuple(message='H0 can be rejected at level 90.0%', statistic=4.619999999999891, threshold=0.21072103131565265)

The order in which the models are presented is irrelevant.

tools.likelihood_ratio_test(model2, model1)
LRTuple(message='H0 cannot be rejected at level 5.0%', statistic=4.619999999999891, threshold=5.991464547107979)

But the unrestricted model must have a higher log likelihood than the restricted one.

model1 = (-1340.8, 7)
model2 = (-1338.49, 5)
try:
    tools.likelihood_ratio_test(model1, model2)
except excep.BiogemeError as e:
    print(e)
The unrestricted model (-1340.8, 7) has a lower log likelihood than the restricted one (-1338.49, 5)
