biogeme.expressions

Examples of use of several functions.

This is designed for programmers who need examples of use of the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model.

author:

Michel Bierlaire

date:

Tue Nov 21 14:53:49 2023

from biogeme.version import getText
import numpy as np
import pandas as pd
import biogeme.expressions as ex
import biogeme.database as db
import biogeme.exceptions as excep
from biogeme import models
from biogeme import tools
from biogeme.expressions import IdManager, TypeOfElementaryExpression
import biogeme.biogeme_logging as blog

Version of Biogeme.

print(getText())
biogeme 3.2.13 [2023-12-23]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)
logger = blog.get_screen_logger(level=blog.INFO)

Simple expressions

Simple expressions can be evaluated both with the functions getValue`(implemented in Python) and the `getValue_c (implemented in C++). They do not require a database.

x = ex.Beta('x', 2, None, None, 1)
x
Beta('x', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1)
x.getValue()
2
x.getValue_c(prepareIds=True)
2.0
y = ex.Beta('y', 3, None, None, 1)
y
Beta('y', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1)
y.getValue()
3
y.getValue_c(prepareIds=True)
3.0

Note that if the parameter has to be estimated, its value cannot be obtained.

unknown_parameter = ex.Beta('x', 2, None, None, 0)
try:
    unknown_parameter.getValue()
except excep.BiogemeError as e:
    print(e)
Parameter x must be estimated from data.
one = ex.Numeric(1)
one
`1.0`
one.getValue()
1.0
one.getValue_c(prepareIds=True)
1.0

Addition

z = x + y
z.getValue()
5
z.getValue_c(prepareIds=True)
5.0

Substraction

z = x - y
z.getValue()
-1
z.getValue_c(prepareIds=True)
-1.0

Multiplication

z = x * y
z.getValue()
6
z.getValue_c(prepareIds=True)
6.0

Division

z = x / y
z.getValue()
0.6666666666666666
z.getValue_c(prepareIds=True)
0.6666666666666666

Power

z = x**y
z.getValue()
8
z.getValue_c(prepareIds=True)
8.0

Exponential

z = ex.exp(x)
z.getValue()
7.38905609893065
z.getValue_c(prepareIds=True)
7.38905609893065

Logarithm

z = ex.log(x)
z.getValue()
0.6931471805599453
z.getValue_c(prepareIds=True)
0.6931471805599453

Minimum

z = ex.bioMin(x, y)
z.getValue()
2
z.getValue_c(prepareIds=True)
2.0

Maximum

z = ex.bioMax(x, y)
z.getValue()
3
z.getValue_c(prepareIds=True)
3.0

And

z = x & y
z.getValue()
1.0
z.getValue_c(prepareIds=True)
1.0
z = x & 0
z.getValue()
0.0
z.getValue_c(prepareIds=True)
0.0

Or

z = x | y
z.getValue()
1.0
z.getValue_c(prepareIds=True)
1.0
z = ex.Numeric(0) | ex.Numeric(0)
z.getValue()
0.0
z.getValue_c(prepareIds=True)
0.0

Equal

z = x == y
z.getValue()
0
z.getValue_c(prepareIds=True)
0.0
z = (x + 1) == y
z.getValue()
1
z.getValue_c(prepareIds=True)
1.0

Not equal

z = x != y
z.getValue()
1
z.getValue_c(prepareIds=True)
1.0
z = (x + 1) != y
z.getValue()
0
z.getValue_c(prepareIds=True)
0.0

Lesser or equal

z = x <= y
z.getValue()
1
z.getValue_c(prepareIds=True)
1.0

Greater or equal

z = x >= y
z.getValue()
0
z.getValue_c(prepareIds=True)
0.0

Lesser than

z = x < y
z.getValue()
1
z.getValue_c(prepareIds=True)
1.0

Greater than

z = x > y
z.getValue()
0
z.getValue_c(prepareIds=True)
0.0

Opposite

z = -x
z.getValue()
-2
z.getValue_c(prepareIds=True)
-2.0

Sum of multiples expressions

listOfExpressions = [x, y, 1 + x, 1 + y]
z = ex.bioMultSum(listOfExpressions)
z.getValue()
12.0
z.getValue_c(prepareIds=True)
12.0

The result is the same as the following, but it implements the sum in a more efficient way.

z = x + y + 1 + x + 1 + y
z.getValue()
12.0
z.getValue_c(prepareIds=True)
12.0

Element: this expression considers a dictionary of expressions, and an expression for the index. The index is evaluated, and the value of the corresponding expression in the dictionary is returned.

my_dict = {1: ex.exp(-1), 2: ex.log(1.2), 3: 1234}
index = x
index.getValue()
2
z = ex.Elem(my_dict, index)
z.getValue()
0.1823215567939546
z.getValue_c(prepareIds=True)
0.1823215567939546
index = x - 1
index.getValue()
1.0
z = ex.Elem(my_dict, index)
z.getValue()
0.36787944117144233
z.getValue_c(prepareIds=True)
0.36787944117144233
index = x - 2
index.getValue()
0.0

If the value returned as index does not corresponds to an entry in the dictionary, an exception is raised.

z = ex.Elem(my_dict, index)
try:
    z.getValue()
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')
Exception raised: Key 0 is not present in the dictionary. Available keys: dict_keys([1, 2, 3])
z = ex.Elem(my_dict, index)
try:
    z.getValue_c(prepareIds=True)
except RuntimeError as e:
    print(f'Exception raised: {e}')
Exception raised: src/cythonbiogeme/cpp/bioExprElem.cc:58: Biogeme exception: Key (-(x lit[0],fixed[0](2),2)=0) is not present in dictionary:
  1: exp(-1)
  2: log(1.2)
  3: 1234

Complex expressions

When an expression is deemed complex in Biogeme, the getValue function is not available. Only the getValue_c function must be used. It calculates the expressions using a C++ implementation of the expression.

Normal CDF: it calculates

\[\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}\omega^2}d\omega.\]
z = ex.bioNormalCdf(x)
z.getValue_c(prepareIds=True)
0.9772498680518218
z = ex.bioNormalCdf(0)
z.getValue_c(prepareIds=True)
0.5

Derivative

z = 30 * x + 20 * y
zx = ex.Derive(z, 'x')
zx.getValue_c(prepareIds=True)
30.0
zx = ex.Derive(z, 'y')
zx.getValue_c(prepareIds=True)
20.0

Integral: let’s calculate the integral of the pdf of a normal distribution:

\[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}\omega^2}d\omega = 1.\]
omega = ex.RandomVariable('omega')
pdf = ex.exp(-omega * omega / 2)
z = ex.Integrate(pdf, 'omega') / np.sqrt(2 * np.pi)
z.getValue_c(prepareIds=True)
1.0

In order to change the bounds of integration, a change of variables must be performed. Let’s calculate

\[\int_0^1 x^2 dx=\frac{1}{3}.\]

If \(a\) is the lower bound of integration, and \(b\) is the upper bound, the change of variable is

\[x = a + \frac{b-a}{1+e^{-\omega}},\]

and

\[dx = \frac{(b-a)e^{-\omega}}{(1+e^{-\omega})^2}d\omega.\]
a = 0
b = 1
t = a + (b - a) / (1 + ex.exp(-omega))
dt = (b - a) * ex.exp(-omega) * (1 + ex.exp(-omega)) ** -2
integrand = t * t
z = ex.Integrate(integrand * dt / (b - a), 'omega')
z.getValue_c(prepareIds=True)
0.3333323120662822

Expressions using a database

df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [10, 20, 30, 40, 50],
        'Variable2': [100, 200, 300, 400, 500],
        'Choice': [2, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
my_data = db.Database('test', df)

Linear utility: it defines a linear conbinations of parameters are variables.

beta1 = ex.Beta('beta1', 10, None, None, 0)
beta2 = ex.Beta('beta2', 20, None, None, 0)
v1 = ex.Variable('Variable1')
v2 = ex.Variable('Variable2')
listOfTerms = [
    (beta1, v1),
    (beta2, v2),
]
z = ex.bioLinearUtility(listOfTerms)
z.getValue_c(database=my_data, prepareIds=True)
array([ 2100.,  4200.,  6300.,  8400., 10500.])

It is equivalent to the following, but implemented in a more efficient way.

z = beta1 * v1 + beta2 * v2
z.getValue_c(database=my_data, prepareIds=True)
array([ 2100.,  4200.,  6300.,  8400., 10500.])

Monte Carlo: we approximate the integral

\[\int_0^1 x^2 dx=\frac{1}{3}\]

using Monte-Carlo integration. As draws require a database, it is calculated for each entry in the database.

draws = ex.bioDraws('draws', 'UNIFORM')
z = ex.MonteCarlo(draws * draws)
z.getValue_c(database=my_data, prepareIds=True)
array([0.32299649, 0.33650473, 0.33501673, 0.35513668, 0.33056744])

Panel Trajectory: we first calculate a quantity for each entry in the database.

v1 = ex.Variable('Variable1')
v2 = ex.Variable('Variable2')
p = v1 / (v1 + v2)
p.getValue_c(database=my_data, prepareIds=True)
array([0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909])

We now declare the data as “panel”, based on the identified Person. It means that the first three rows correspond to a sequence of three observations for individual 1, and the last two, a sequence of two observations for individual 2. The panel trajectory calculates the expression for each row associated with an individual, and calculate the product.

my_data.panel('Person')

In this case, we expect the following for individual 1:

0.09090909**3
0.0007513147783621339

And the following for individual 2:

0.09090909**2
0.0082644626446281

We verify that it is indeed the case:

z = ex.PanelLikelihoodTrajectory(p)
z.getValue_c(database=my_data, prepareIds=True)
array([0.00075131, 0.00826446])

More complex expressions

We set the number of draws for Monte-Carlo integration. It should be a large number. For the sake of computational efficiency, as this notebook is designed to illustrate the various function, we use a low value.

NUMBER_OF_DRAWS = 100

We first create a small database

df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [10, 20, 30, 40, 50],
        'Variable2': [100, 200, 300, 400, 500],
        'Choice': [2, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
df
Person Exclude Variable1 Variable2 Choice Av1 Av2 Av3
0 1 0 10 100 2 0 1 0
1 1 0 20 200 2 1 1 1
2 1 1 30 300 3 1 1 1
3 2 0 40 400 1 1 1 1
4 2 1 50 500 2 1 1 1


my_data = db.Database('test', df)

The following type of expression is a literal called Variable that corresponds to an entry in the database.

Person = ex.Variable('Person')
Variable1 = ex.Variable('Variable1')
Variable2 = ex.Variable('Variable2')
Choice = ex.Variable('Choice')
Av1 = ex.Variable('Av1')
Av2 = ex.Variable('Av2')
Av3 = ex.Variable('Av3')

It is possible to add a new column to the database, that creates a new variable that can be used in expressions.

newvar_b = my_data.DefineVariable('newvar_b', Variable1 + Variable2)
my_data.data
Person Exclude Variable1 Variable2 Choice Av1 Av2 Av3 newvar_b
0 1 0 10 100 2 0 1 0 110.0
1 1 0 20 200 2 1 1 1 220.0
2 1 1 30 300 3 1 1 1 330.0
3 2 0 40 400 1 1 1 1 440.0
4 2 1 50 500 2 1 1 1 550.0


It is equivalent to the following Pandas statement.

my_data.data['newvar_p'] = my_data.data['Variable1'] + my_data.data['Variable2']
my_data.data
Person Exclude Variable1 Variable2 Choice Av1 Av2 Av3 newvar_b newvar_p
0 1 0 10 100 2 0 1 0 110.0 110
1 1 0 20 200 2 1 1 1 220.0 220
2 1 1 30 300 3 1 1 1 330.0 330
3 2 0 40 400 1 1 1 1 440.0 440
4 2 1 50 500 2 1 1 1 550.0 550


Do not use chaining comparison expressions with Biogeme. Not only it does not provide the expected expression, but it does not trigger a warning or an exception.

my_expression = 200 <= Variable2 <= 400
print(my_expression)
(Variable2 <= `400.0`)

The reason is that Python executes 200 <= Variable2 <= 400 as (200 <= Variable2) and (Variable2 <= 400). The and operator cannot be overloaded in Python. Therefore, it does not return a Biogeme expression. Note that Pandas does not allow chaining either, and has implemented a between function instead.

my_data.data['chaining_p'] = my_data.data['Variable2'].between(200, 400)
my_data.data
Person Exclude Variable1 Variable2 Choice Av1 Av2 Av3 newvar_b newvar_p chaining_p
0 1 0 10 100 2 0 1 0 110.0 110 False
1 1 0 20 200 2 1 1 1 220.0 220 True
2 1 1 30 300 3 1 1 1 330.0 330 True
3 2 0 40 400 1 1 1 1 440.0 440 True
4 2 1 50 500 2 1 1 1 550.0 550 False


The following type of expression is another literal, corresponding to an unknown parameter. Note that the value is just a starting value for the algorithm.

beta1 = ex.Beta('beta1', 0.2, None, None, 0)
beta2 = ex.Beta('beta2', 0.4, None, None, 0)

The last argument allows to fix the value of the parameter to the value.

beta3 = ex.Beta('beta3', 1, None, None, 1)
beta4 = ex.Beta('beta4', 0, None, None, 1)

Arithmetic operators are overloaded to allow standard manipulations of expressions.

expr0 = beta3 + beta4
print(expr0)
(Beta('beta3', 1, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) + Beta('beta4', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))

The evaluation of expressions can be done in two ways. For simple expressions, the fonction getValue, implemented in Python, returns the value of the expression.

expr0.getValue()
1

It is possible to modify the values of the parameters.

newvalues = {'beta1': 1, 'beta2': 2, 'beta3': 3, 'beta4': 2}
expr0.change_init_values(newvalues)
expr0.getValue()
5

Consider another expression:

\[e_1 = 2 \beta_1 - \frac{\exp(-\beta_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)},\]

where \((\beta_2 \geq \beta_1)\) equals 1 if \(\beta_2 \geq \beta_1\) and 0 otherwise.

expr1 = 2 * beta1 - ex.exp(-beta2) / (
    beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4)
)
print(expr1)
((`2.0` * Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) - (exp((-Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0))) / ((Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) >= Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))) + (Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) < Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))))))

The function getValue_c is implemented in C++, and works for any expression. When use outside a specific context, the IDs must be explicitly prepared.

expr1.getValue_c(prepareIds=True)
-1.275800115089098

It actually calls the function getValueAndDerivates, and returns its first output (without calculating the derivatives).

f, g, h, bhhh = expr1.getValueAndDerivatives(prepareIds=True)
f
-1.275800115089098

We create a pandas DataFrame just to have a nicer display of the results.

pd.DataFrame(g)
0
0 2.0000
1 5.8653


pd.DataFrame(h)
0 1
0 0.0 0.000000
1 0.0 -31.002302


pd.DataFrame(bhhh)
0 1
0 4.000000 11.730601
1 11.730601 34.401749


Note that the BHHH matrix is the outer product of the gradient with itself.

pd.DataFrame(np.outer(g, g))
0 1
0 4.000000 11.730601
1 11.730601 34.401749


If the derivatives are not needed, their calculation can be skipped. Here, we calculate the gradient, but not the hessian.

expr1.getValueAndDerivatives(gradient=True, hessian=False, bhhh=False, prepareIds=True)
(-1.275800115089098, array([2.       , 5.8653004]), None, None)

It can also generate a function that takes the value of the parameters as argument, and provides a tuple with the value of the expression and its derivatives. By default, it returns the value of the function, its gradient and its hessian.

the_function = expr1.createFunction()

We evaluate it at one point…

the_function([1, 2])
(1.9323323583816936, array([2.        , 0.10150146]), array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]]))

… and at another point.

the_function([10, -2])
(23.694528049465326, array([ 2.        , -1.84726402]), array([[0.        , 0.        ],
       [0.        , 1.84726402]]))

We can use it to check the derivatives.

tools.checkDerivatives(the_function, [1, 2], logg=True)
x               Gradient        FinDiff         Difference
x[0]            +2.000000E+00   +2.000000E+00   -1.167734E-09
x[1]            +1.015015E-01   +1.015014E-01   +1.629049E-08
Row             Col             Hessian FinDiff         Difference
x[0]            x[0]            +0.000000E+00   +0.000000E+00   +0.000000E+00
x[0]            x[1]            +0.000000E+00   +0.000000E+00   +0.000000E+00
x[1]            x[0]            +0.000000E+00   +0.000000E+00   +0.000000E+00
x[1]            x[1]            -1.691691E-01   -1.691691E-01   -3.203118E-08

(1.9323323583816936, array([2.        , 0.10150146]), array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]]), array([-1.16773435e-09,  1.62904950e-08]), array([[ 0.00000000e+00,  0.00000000e+00],
       [ 0.00000000e+00, -3.20311803e-08]]))

And it is possible to also obtain the BHHH matrix.

the_function = expr1.createFunction(bhhh=True)
the_function([1, 2])
(1.9323323583816936, array([2.        , 0.10150146]), array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]]), array([[4.        , 0.20300292],
       [0.20300292, 0.01030255]]))

It can take a database as input, and evaluate the expression and its derivatives for each entry in the database. In the following example, as no variable of the database is involved in the expression, the output of the expression is the same for each entry.

results = expr1.getValueAndDerivatives(database=my_data, aggregation=False)
print(len(results))
4
f_array, g_array, h_array, bhhh_array = results

for f, g, h, bhhh in zip(f_array, g_array, h_array, bhhh_array):
    print('******')
    print(f'{f=}')
    print(f'{g=}')
    print(f'{h=}')
    print(f'{bhhh=}')
******
f=1.9323323583816936
g=array([2.        , 0.10150146])
h=array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]])
bhhh=array([[4.        , 0.20300292],
       [0.20300292, 0.01030255]])
******
f=1.9323323583816936
g=array([2.        , 0.10150146])
h=array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]])
bhhh=array([[4.        , 0.20300292],
       [0.20300292, 0.01030255]])
******
f=1.9323323583816936
g=array([2.        , 0.10150146])
h=array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]])
bhhh=array([[4.        , 0.20300292],
       [0.20300292, 0.01030255]])
******
f=1.9323323583816936
g=array([2.        , 0.10150146])
h=array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]])
bhhh=array([[4.        , 0.20300292],
       [0.20300292, 0.01030255]])
******
f=1.9323323583816936
g=array([2.        , 0.10150146])
h=array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]])
bhhh=array([[4.        , 0.20300292],
       [0.20300292, 0.01030255]])

If aggregation is set to True, the results are accumulated as a sum.

f, g, h, bhhh = expr1.getValueAndDerivatives(database=my_data, aggregation=True)
print(f'{f=}')
print(f'{g=}')
print(f'{h=}')
print(f'{bhhh=}')
f=9.661661791908468
g=array([10.        ,  0.50750731])
h=array([[ 0.        ,  0.        ],
       [ 0.        , -0.84584552]])
bhhh=array([[20.        ,  1.01501462],
       [ 1.01501462,  0.05151273]])

The following function scans the expression and extracts a dict with all free parameters.

expr1.set_of_elementary_expression(TypeOfElementaryExpression.FREE_BETA)
{'beta2', 'beta1'}

Options can be set to extract free parameters, fixed parameters, or both.

expr1.set_of_elementary_expression(TypeOfElementaryExpression.FIXED_BETA)
{'beta4', 'beta3'}
expr1.set_of_elementary_expression(TypeOfElementaryExpression.BETA)
{'beta2', 'beta4', 'beta1', 'beta3'}

It is possible also to extract an elementary expression from its name.

expr1.getElementaryExpression('beta2')
Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)

Let’s consider an expression involving two variables \(V_1\) and \(V_2\):

\[e_2 = 2 \beta_1 V_1 - \frac{\exp(-\beta_2 V_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)},\]

where \((\beta_2 \geq \beta_1)\) equals 1 if \(\beta_2 \geq \beta_1\) and 0 otherwise. Note that, in our example, the second term is numerically negligible with respect to the first one.

expr2 = 2 * beta1 * Variable1 - ex.exp(-beta2 * Variable2) / (
    beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4)
)
print(expr2)
(((`2.0` * Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) * Variable1) - (exp(((-Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) * Variable2)) / ((Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) >= Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))) + (Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) < Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))))))

It is not a simple expression anymore, and only the function getValue_c can be invoked. If we try the getValue function, it raises an exception.

try:
    expr2.getValue()
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')
Exception raised: Parameter beta1 must be estimated from data.

As the expression is called out of a specific context, it should be instructed to prepare its IDs. Note that if no database is provided, an exception is raised when the formula contains variables. Indeed, the values of these variables cannot be found anywhere.

try:
    expr2.getValue_c(prepareIds=True)
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')
Exception raised: No database is provided and an expression contains variables: {'Variable1', 'Variable2'}
expr2.getValue_c(database=my_data, aggregation=False, prepareIds=True)
array([ 4.,  8., 12., 16., 20.])

The following function extracts the names of the parameters apprearing in the expression.

expr2.set_of_elementary_expression(TypeOfElementaryExpression.BETA)
{'beta2', 'beta4', 'beta1', 'beta3'}

The list of parameters can also be obtained in the form of a dictionary.

expr2.dict_of_elementary_expression(TypeOfElementaryExpression.BETA)
{'beta1': Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0), 'beta2': Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0), 'beta3': Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1), 'beta4': Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1)}

The list of variables can also be obtained in the form of a dictionary.

expr2.dict_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)
{'Variable1': Variable1, 'Variable2': Variable2}

or a set…

expr2.set_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)
{'Variable1', 'Variable2'}

Expressions are defined recursively, using a tree representation. The following function describes the type of the upper most node of the tree.

expr2.getClassName()
'Minus'

The signature is a formal representation of the expression, assigning identifiers to each node of the tree, and representing them starting from the leaves. It is easy to parse, and is passed to the C++ implementation.

As the expression is used out of a specific context, it must be prepared before using it.

expr2.prepare(database=my_data, numberOfDraws=0)
expr2.getStatusIdManager()
print(expr2)
(((`2.0` * Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) * Variable1) - (exp(((-Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) * Variable2)) / ((Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) >= Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))) + (Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) < Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))))))
expr2.getSignature()
[b'<Numeric>{5751161296},2.0', b'<Beta>{5750675920}"beta1"[0],0,0', b'<Times>{5750670544}(2),5751161296,5750675920', b'<Variable>{5682743680}"Variable1",6,2', b'<Times>{5751161344}(2),5750670544,5682743680', b'<Beta>{5750672224}"beta2"[0],1,1', b'<UnaryMinus>{5751165568}(1),5750672224', b'<Variable>{5682752224}"Variable2",7,3', b'<Times>{5751165808}(2),5751165568,5682752224', b'<exp>{5751165040}(1),5751165808', b'<Beta>{5750672224}"beta2"[0],1,1', b'<Beta>{5750672848}"beta3"[1],2,0', b'<Beta>{5750670736}"beta4"[1],3,1', b'<GreaterOrEqual>{5752127712}(2),5750672848,5750670736', b'<Times>{5752133232}(2),5750672224,5752127712', b'<Beta>{5750675920}"beta1"[0],0,0', b'<Beta>{5750672848}"beta3"[1],2,0', b'<Beta>{5750670736}"beta4"[1],3,1', b'<Less>{5752129776}(2),5750672848,5750670736', b'<Times>{5752128096}(2),5750675920,5752129776', b'<Plus>{5752143840}(2),5752133232,5752128096', b'<Divide>{5752143312}(2),5751165040,5752143840', b'<Minus>{5752140912}(2),5751161344,5752143312']

The elementary expressions are

  • free parameters,

  • fixed parameters,

  • random variables (for numerical integration),

  • draws (for Monte-Carlo integration), and

  • variables from the database.

The following function extracts all elementary expressions from a list of formulas, give them a unique numbering, and return them organized by group, as defined above (with the exception of the variables, that are directly available in the database).

collection_of_formulas = [expr1, expr2]
formulas = IdManager(collection_of_formulas, my_data, None)

Unique numbering for all elementary expressions.

formulas.elementary_expressions.indices
{'beta1': 0, 'beta2': 1, 'beta3': 2, 'beta4': 3, 'Person': 4, 'Exclude': 5, 'Variable1': 6, 'Variable2': 7, 'Choice': 8, 'Av1': 9, 'Av2': 10, 'Av3': 11, 'newvar_b': 12, 'newvar_p': 13, 'chaining_p': 14}
formulas.free_betas
ElementsTuple(expressions={'beta1': Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0), 'beta2': Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)}, indices={'beta1': 0, 'beta2': 1}, names=['beta1', 'beta2'])

Each elementary expression has two ids. One unique index across all elementary expressions, and one unique within each specific group.

[(i.elementaryIndex, i.betaId) for k, i in formulas.free_betas.expressions.items()]
[(0, 0), (1, 1)]
formulas.free_betas.names
['beta1', 'beta2']
formulas.fixed_betas
ElementsTuple(expressions={'beta3': Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1), 'beta4': Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1)}, indices={'beta3': 0, 'beta4': 1}, names=['beta3', 'beta4'])
[(i.elementaryIndex, i.betaId) for k, i in formulas.fixed_betas.expressions.items()]
[(2, 0), (3, 1)]
formulas.fixed_betas.names
['beta3', 'beta4']
formulas.random_variables
ElementsTuple(expressions={}, indices={}, names=[])

Monte Carlo integration is based on draws.

my_draws = ex.bioDraws('my_draws', 'UNIFORM')
expr3 = ex.MonteCarlo(my_draws * my_draws)
print(expr3)
MonteCarlo((bioDraws("my_draws", "UNIFORM") * bioDraws("my_draws", "UNIFORM")))

Note that draws are different from random variables, used for numerical integration.

expr3.set_of_elementary_expression(TypeOfElementaryExpression.RANDOM_VARIABLE)
set()

The following function reports the draws involved in an expression.

expr3.set_of_elementary_expression(TypeOfElementaryExpression.DRAWS)
{'my_draws'}

The following function checks if draws are defined outside MonteCarlo, and return their names.

wrong_expression = my_draws + ex.MonteCarlo(my_draws * my_draws)
wrong_expression.check_draws()
{'my_draws'}

Checking the correct expression returns an empty set.

expr3.check_draws()
set()

The expression is a Monte-Carlo integration.

expr3.getClassName()
'MonteCarlo'

Note that the draws are associated with a database. Therefore, the evaluation of expressions involving Monte Carlo integration can only be done on a database. If none is provided, an exception is raised.

try:
    expr3.getValue_c(numberOfDraws=NUMBER_OF_DRAWS)
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')
Exception raised: An expression involving MonteCarlo integration must be associated with a database.

Here is its value. It is an approximation of

\[\int_0^1 x^2 dx=\frac{1}{3}.\]
expr3.getValue_c(database=my_data, numberOfDraws=NUMBER_OF_DRAWS, prepareIds=True)
array([0.34528706, 0.32486975, 0.32311505, 0.31370743, 0.40247583])

Here is its signature.

expr3.prepare(database=my_data, numberOfDraws=NUMBER_OF_DRAWS)
expr3.getSignature()
[b'<bioDraws>{5752139904}"my_draws",0,0', b'<bioDraws>{5752139904}"my_draws",0,0', b'<Times>{5752137408}(2),5752139904,5752139904', b'<MonteCarlo>{5752129728}(1),5752137408']

The same integral can be calculated using numerical integration, declaring a random variable.

omega = ex.RandomVariable('omega')

Numerical integration calculates integrals between \(-\infty\) and \(+\infty\). Here, the interval being \([0,1]\), a change of variables is required.

a = 0
b = 1
x = a + (b - a) / (1 + ex.exp(-omega))
dx = (b - a) * ex.exp(-omega) * (1 + ex.exp(-omega)) ** (-2)
integrand = x * x
expr4 = ex.Integrate(integrand * dx / (b - a), 'omega')

In this case, omega is a random variable.

expr4.dict_of_elementary_expression(TypeOfElementaryExpression.RANDOM_VARIABLE)
{'omega': omega}
print(expr4)
Integrate(((((`0.0` + (`1.0` / (`1.0` + exp((-omega))))) * (`0.0` + (`1.0` / (`1.0` + exp((-omega)))))) * ((`1.0` * exp((-omega))) * ((`1.0` + exp((-omega))) ** `-2.0`))) / `1.0`), "omega")

The folllowing function checks if random variables are defined outside an Integrate statement.

wrong_expression = x * x
wrong_expression.check_rv()
{'omega'}

The same function called from the correct expression returns an empty set.

expr4.check_rv()
set()

Calculating its value requires the C++ implementation.

expr4.getValue_c(my_data, prepareIds=True)
array([0.33333231, 0.33333231, 0.33333231, 0.33333231, 0.33333231])

We illustrate now the Elem function. It takes two arguments: a dictionary, and a formula for the key. For each entry in the database, the formula is evaluated, and its result identifies which formula in the dictionary should be evaluated. Here is ‘Person’ is 1, the expression is

\[e_1=2 \beta_1 - \frac{\exp(-\beta_2)}{\beta_3 (\beta_2 \geq \beta_1)},\]

and if ‘Person’ is 2, the expression is

\[e_2=2 \beta_1 V_1 - \frac{\exp(-\beta_2 V_2) }{ \beta_3 (\beta_2 \geq \beta_1)}.\]

As it is a regular expression, it can be included in any formula. Here, we illustrate it by dividing the result by 10.

elemExpr = ex.Elem({1: expr1, 2: expr2}, Person)
expr5 = elemExpr / 10
print(expr5)
({{1:((`2.0` * Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) - (exp((-Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0))) / ((Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) >= Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))) + (Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) < Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1)))))), 2:(((`2.0` * Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) * Variable1) - (exp(((-Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) * Variable2)) / ((Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) >= Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))) + (Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) < Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))))))}[Person] / `10.0`)
expr5.dict_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)
{'Person': Person, 'Variable1': Variable1, 'Variable2': Variable2}

Note that Variable1 and Variable2 have previously been involved in another formula. Therefore, they have been numbered according to this formula, and this numbering is invalid for the new expression expr5. An error is triggered

try:
    expr5.getValue_c(database=my_data)
except excep.BiogemeError as e:
    print(e)
Expression evaluated out of context. Set prepareIds to True.
expr5.getValue_c(database=my_data, prepareIds=True)
array([-0.12758001, -0.12758001, -0.12758001,  1.6       ,  2.        ])
testElem = ex.MonteCarlo(ex.Elem({1: my_draws * my_draws}, 1))
testElem.audit()
([], [])

The next expression is simply the sum of multiple expressions. The argument is a list of expressions.

expr6 = ex.bioMultSum([expr1, expr2, expr4])
print(expr6)
bioMultSum(((`2.0` * Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) - (exp((-Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0))) / ((Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) >= Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))) + (Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) < Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1)))))), (((`2.0` * Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) * Variable1) - (exp(((-Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0)) * Variable2)) / ((Beta('beta2', 0.4, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) >= Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1))) + (Beta('beta1', 0.2, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * (Beta('beta3', 3, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) < Beta('beta4', 2, -1.3407807929942596e+154, 1.3407807929942596e+154, 1)))))), Integrate(((((`0.0` + (`1.0` / (`1.0` + exp((-omega))))) * (`0.0` + (`1.0` / (`1.0` + exp((-omega)))))) * ((`1.0` * exp((-omega))) * ((`1.0` + exp((-omega))) ** `-2.0`))) / `1.0`), "omega"))
expr6.getValue_c(database=my_data, numberOfDraws=NUMBER_OF_DRAWS, prepareIds=True)
array([ 3.0575322,  7.0575322, 11.0575322, 15.0575322, 19.0575322])

We now illustrate how to calculate a logit model, that is

\[\frac{y_1 e^{V_1}}{y_0 e^{V_0}+y_1 e^{V_1}+y_2 e^{V_2}}\]

where \(V_0=-\beta_1\), \(V_1=-\beta_2\) and \(V_2=-\beta_1\), and \(y_i = 1\), \(i=1,2,3\).

V = {0: -beta1, 1: -beta2, 2: -beta1}
av = {0: 1, 1: 1, 2: 1}
expr7 = ex._bioLogLogit(V, av, 1)
expr7.getValue_c(prepareIds=True)
-1.2362866960692136

If the alternative is not in the choice set, an exception is raised.

expr7_wrong = ex.LogLogit(V, av, 3)
try:
    expr7_wrong.getValue()
except excep.BiogemeError as e:
    print(f'Exception: {e}')
Exception: Alternative 3 does not appear in the list of utility functions: dict_keys([0, 1, 2])

It is actually better to use the C++ implementation, available in the module models.

expr8 = models.loglogit(V, av, 1)
expr8.getValue_c(database=my_data, prepareIds=True)
array([-1.2362867, -1.2362867, -1.2362867, -1.2362867, -1.2362867])

As the result is a numpy array, it can be used for any calculation. Here, we show how to calculate the logsum.

for v in V.values():
    print(v.getValue_c(database=my_data, prepareIds=True))
[-0.2 -0.2 -0.2 -0.2 -0.2]
[-0.4 -0.4 -0.4 -0.4 -0.4]
[-0.2 -0.2 -0.2 -0.2 -0.2]
logsum = np.log(
    np.sum(
        [np.exp(v.getValue_c(database=my_data, prepareIds=True)) for v in V.values()],
        axis=1,
    )
)
logsum
array([1.40943791, 1.20943791, 1.40943791])

It is possible to calculate the derivative of a formula with respect to a literal:

\[e_9=\frac{\partial e_8}{\partial \beta_2}.\]
expr9 = ex.Derive(expr8, 'beta2')
expr9.getValue_c(database=my_data, prepareIds=True)
array([-0.70953921, -0.70953921, -0.70953921, -0.70953921, -0.70953921])
expr9.elementaryName
'beta2'

Biogeme also provides an approximation of the CDF of the normal distribution:

\[e_{10}= \frac{1}{{\sigma \sqrt {2\pi } }}\int_{-\infty}^t e^{{{ - \left( {x - \mu } \right)^2 } \mathord{\left/ {\vphantom {{ - \left( {x - \mu } \right)^2 } {2\sigma ^2 }}} \right. } {2\sigma ^2 }}}dx.\]
expr10 = ex.bioNormalCdf(Variable1 / 10 - 1)
expr10.getValue_c(database=my_data, prepareIds=True)
array([0.5       , 0.84134475, 0.97724987, 0.9986501 , 0.99996833])

Min and max operators are also available. To avoid any ambiguity with the Python operator, they are called bioMin and bioMax.

expr11 = ex.bioMin(expr5, expr10)
expr11.getValue_c(database=my_data, prepareIds=True)
array([-0.12758001, -0.12758001, -0.12758001,  0.9986501 ,  0.99996833])
expr12 = ex.bioMax(expr5, expr10)
expr12.getValue_c(database=my_data, prepareIds=True)
array([0.5       , 0.84134475, 0.97724987, 1.6       , 2.        ])

For the sake of efficiency, it is possible to specify explicitly a linear function, where each term is the product of a parameter and a variable.

terms = [
    (beta1, ex.Variable('Variable1')),
    (beta2, ex.Variable('Variable2')),
    (beta3, ex.Variable('newvar_b')),
]
expr13 = ex.bioLinearUtility(terms)
expr13.getValue_c(database=my_data, prepareIds=True)
array([ 372.,  744., 1116., 1488., 1860.])

In terms of specification, it is equivalent to the expression below. But the calculation of the derivatives is more efficient, as the linear structure of the specification is exploited.

expr13bis = beta1 * Variable1 + beta2 * Variable2 + beta3 * newvar_b
expr13bis.getValue_c(database=my_data, prepareIds=True)
array([ 372.,  744., 1116., 1488., 1860.])

A Pythonic way to write a linear utility function.

variables = ['v1', 'v2', 'v3', 'cost', 'time', 'headway']
coefficients = {f'{v}': ex.Beta(f'beta_{v}', 0, None, None, 0) for v in variables}
terms = [coefficients[v] * ex.Variable(v) for v in variables]
util = sum(terms)
print(util)
((((((`0.0` + (Beta('beta_v1', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * v1)) + (Beta('beta_v2', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * v2)) + (Beta('beta_v3', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * v3)) + (Beta('beta_cost', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * cost)) + (Beta('beta_time', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * time)) + (Beta('beta_headway', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * headway))

If the data is organized a panel data, it means that several rows correspond to the same individual. The expression PanelLikelihoodTrajectory calculates the product of the expression evaluated for each row. If Monte Carlo integration is involved, the same draws are used for each them.

Our database contains 5 observations.

my_data.getSampleSize()
5
my_data.panel('Person')

Once the data has been labeled as “panel”, it is considered that there are only two series of observations, corresponding to each person. Each of these observations is associated with several rows of observations.

my_data.getSampleSize()
2

If we try to evaluate again the integral \(\int_0^1 x^2 dx=\frac{1}{3}\), an exception is raised.

try:
    expr3.getValue_c(database=my_data)
except excep.BiogemeError as e:
    print(f'Exception: {e}')
As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("my_draws", "UNIFORM") * bioDraws("my_draws", "UNIFORM")))
Exception: As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("my_draws", "UNIFORM") * bioDraws("my_draws", "UNIFORM")))

This is detected by the audit function, called before the expression is evaluated.

expr3.audit(database=my_data)
(['As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("my_draws", "UNIFORM") * bioDraws("my_draws", "UNIFORM")))'], [])

We now evaluate an expression for panel data.

c1 = ex.bioDraws('draws1', 'NORMAL_HALTON2')
c2 = ex.bioDraws('draws2', 'NORMAL_HALTON2')
U1 = ex.Beta('beta1', 0, None, None, 0) * Variable1 + 10 * c1
U2 = ex.Beta('beta2', 0, None, None, 0) * Variable2 + 10 * c2
U3 = 0
U = {1: U1, 2: U2, 3: U3}
av = {1: Av1, 2: Av2, 3: Av3}
expr14 = ex.log(
    ex.MonteCarlo(ex.PanelLikelihoodTrajectory(models.logit(U, av, Choice)))
)
expr14.prepare(database=my_data, numberOfDraws=NUMBER_OF_DRAWS)
expr14
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('beta2', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))
expr14.getValue_c(database=my_data, numberOfDraws=NUMBER_OF_DRAWS, prepareIds=True)
array([-3.91914292, -2.11209896])
expr14.getValueAndDerivatives(
    database=my_data,
    numberOfDraws=NUMBER_OF_DRAWS,
    gradient=True,
    hessian=True,
    aggregation=False,
)
(array([-3.91914292, -2.11209896]), array([[-12.31921998,  76.80780015],
       [ -3.14130423,  68.58695767]]), array([[[  -165.65755306,   1546.42166536],
        [  1546.42166536, -16565.75530623]],

       [[  -987.62129533,   9777.04786414],
        [  9777.04786414, -98762.12953279]]]), array([[[ 151.76318103, -946.21218663],
        [-946.21218663, 5899.4381646 ]],

       [[   9.86779229, -215.45250047],
        [-215.45250047, 4704.17076195]]]))
expr14.getValueAndDerivatives(
    database=my_data,
    numberOfDraws=NUMBER_OF_DRAWS,
    gradient=True,
    hessian=True,
    aggregation=True,
)
(-6.0312418791725335, array([-15.46052422, 145.39475782]), array([[  -1153.27884839,   11323.46952949],
       [  11323.46952949, -115327.88483902]]), array([[  161.63097331, -1161.6646871 ],
       [-1161.6646871 , 10603.60892655]]))

A Python function can also be obtained for this expression. Note that it is available only for the aggregated version, summing over the database.

the_function = expr14.createFunction(
    database=my_data, numberOfDraws=NUMBER_OF_DRAWS, gradient=True, hessian=True
)
the_function([0, 0])
(-6.0312418791725335, array([-15.46052422, 145.39475782]), array([[  -1153.27884839,   11323.46952949],
       [  11323.46952949, -115327.88483902]]))
the_function([0.1, 0.1])
(-49.645666583910895, array([  39.99999992, -553.04056325]), array([[-1.62802916e-06,  1.46098013e-05],
       [ 1.46098013e-05, -1.18518603e+03]]))

It is possible to fix the value of some (or all) beta parameters

print(expr14)
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('beta2', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))
expr14.fix_betas({'beta2': 0.123})
print(expr14)
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('beta2', 0.123, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))

The name of the parameter can also be changed while fixing its value.

expr14.fix_betas({'beta2': 123}, prefix='prefix_', suffix='_suffix')
print(expr14)
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('prefix_beta2_suffix', 123, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))

It can also be renamed using the following function.

expr14.rename_elementary(['prefix_beta2_suffix'], prefix='PREFIX_', suffix='_SUFFIX')
print(expr14)
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, -1.3407807929942596e+154, 1.3407807929942596e+154, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('PREFIX_prefix_beta2_suffix_SUFFIX', 123, -1.3407807929942596e+154, 1.3407807929942596e+154, 1) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))

Signatures

The Python library communicates the expressions to the C++ library using a syntax called a “signature”. We describe and illustrate now the signature for each expression. Each expression is identified by an identifier provided by Python using the function ‘id’.

id(expr1)
5751164800

Numerical expression

<Numeric>{identifier},0.0

ex.Numeric(0).getSignature()
[b'<Numeric>{11840887536},0.0']

Beta parameters

<Beta>{identifier}”name”[status],uniqueId,betaId’ where

  • status is 0 for free parameters, and non zero for fixed parameters,

  • uniqueId is a unique index given by Biogeme to all elementary expressions,

  • betaId is a unique index given by Biogeme to all free parameters, and to all fixed parameters.

As the signature requires an Id, we need to prepare the expression first.

beta1.prepare(database=my_data, numberOfDraws=0)
beta1.getSignature()
[b'<Beta>{5750675920}"beta1"[0],0,0']
beta3.prepare(database=my_data, numberOfDraws=0)
beta3.getSignature()
[b'<Beta>{5750672848}"beta3"[1],0,0']

Variables

<Variable>{identifier}”name”,uniqueId,variableId where

  • uniqueId is a unique index given by Biogeme to all elementary expressions,

  • variableId is a unique index given by Biogeme to all variables.

Variable1.getSignature()
[b'<Variable>{5682743680}"Variable1",6,2']

Random variables

<RandomVariable>{identifier}”name”,uniqueId,randomVariableId where

  • uniqueId is a unique index given by Biogeme to all elementary expressions,

  • randomVariableId is a unique index given by Biogeme to all random variables.

omega.prepare(database=my_data, numberOfDraws=0)
omega.getSignature()
[b'<RandomVariable>{5752133328}"omega",0,0']

Draws

<bioDraws>{identifier}”name”,uniqueId,drawId where

  • uniqueId is a unique index given by Biogeme to all elementary expressions,

  • drawId is a unique index given by Biogeme to all draws.

my_draws.prepare(database=my_data, numberOfDraws=NUMBER_OF_DRAWS)
my_draws.getSignature()
[b'<bioDraws>{5752139904}"my_draws",0,0']

General expression

<operator>{identifier}(numberOfChildren),idFirstChild,idSecondChild,idThirdChild, etc…

where the number of identifiers given after the comma matches the reported number of children.

Specific examples are reported below.

Binary operator

<code><operator>{identifier}(2),idFirstChild,idSecondChild </code> where operator is one of:

  • Plus

  • Minus

  • Times

  • Divide

  • Power

  • bioMin

  • bioMax

  • And

  • Or

  • Equal

  • NotEqual

  • LessOrEqual

  • GreaterOrEqual

  • Less

  • Greater

the_sum = beta1 + Variable1
the_sum.getSignature()
[b'<Beta>{5750675920}"beta1"[0],0,0', b'<Variable>{5682743680}"Variable1",6,2', b'<Plus>{11840884560}(2),5750675920,5682743680']

Unary operator

<operator>{identifier}(1),idChild, where operator is one of:

  • UnaryMinus

  • MonteCarlo

  • bioNormalCdf

  • PanelLikelihoodTrajectory

  • exp

  • log

m = -beta1
m.getSignature()
[b'<Beta>{5750675920}"beta1"[0],0,0', b'<UnaryMinus>{11840895024}(1),5750675920']

LogLogit

<LogLogit>{identifier}(nbrOfAlternatives),chosenAlt,altNumber,utility,availability,altNumber,utility,availability, etc.

expr7.prepare(database=my_data, numberOfDraws=NUMBER_OF_DRAWS)
expr7.getSignature()
[b'<Numeric>{5751137264},1.0', b'<Beta>{5750675920}"beta1"[0],0,0', b'<UnaryMinus>{5752137600}(1),5750675920', b'<Beta>{5750672224}"beta2"[0],1,1', b'<UnaryMinus>{5752131120}(1),5750672224', b'<Beta>{5750675920}"beta1"[0],0,0', b'<UnaryMinus>{5752140144}(1),5750675920', b'<Numeric>{5752140240},1.0', b'<Numeric>{5752129008},1.0', b'<Numeric>{5751129728},1.0', b'<_bioLogLogit>{5752138128}(3),5751137264,0,5752137600,5752140240,1,5752131120,5752129008,2,5752140144,5751129728']

Derive

<Derive>{identifier},id of expression to derive,unique index of elementary expression

expr9.prepare(database=my_data, numberOfDraws=NUMBER_OF_DRAWS)
expr9.getSignature()
[b'<Numeric>{5751144416},1.0', b'<Beta>{5750675920}"beta1"[0],0,0', b'<UnaryMinus>{5752137600}(1),5750675920', b'<Beta>{5750672224}"beta2"[0],1,1', b'<UnaryMinus>{5752131120}(1),5750672224', b'<Beta>{5750675920}"beta1"[0],0,0', b'<UnaryMinus>{5752140144}(1),5750675920', b'<Numeric>{5751133568},1.0', b'<Numeric>{5751137504},1.0', b'<Numeric>{5751129632},1.0', b'<_bioLogLogit>{5751129776}(3),5751144416,0,5752137600,5751133568,1,5752131120,5751137504,2,5752140144,5751129632', b'<Derive>{5751133664},5751129776,1']

Integrate

<Integrate>{identifier},id of expression to derive,index of random variable

expr4.prepare(database=my_data, numberOfDraws=NUMBER_OF_DRAWS)
expr4.getSignature()
[b'<Numeric>{5752140336},0.0', b'<Numeric>{5752134816},1.0', b'<Numeric>{5752140048},1.0', b'<RandomVariable>{5752133328}"omega",0,0', b'<UnaryMinus>{5752142016}(1),5752133328', b'<exp>{5752137456}(1),5752142016', b'<Plus>{5752140192}(2),5752140048,5752137456', b'<Divide>{5752135248}(2),5752134816,5752140192', b'<Plus>{5752134096}(2),5752140336,5752135248', b'<Numeric>{5752140336},0.0', b'<Numeric>{5752134816},1.0', b'<Numeric>{5752140048},1.0', b'<RandomVariable>{5752133328}"omega",0,0', b'<UnaryMinus>{5752142016}(1),5752133328', b'<exp>{5752137456}(1),5752142016', b'<Plus>{5752140192}(2),5752140048,5752137456', b'<Divide>{5752135248}(2),5752134816,5752140192', b'<Plus>{5752134096}(2),5752140336,5752135248', b'<Times>{5752130928}(2),5752134096,5752134096', b'<Numeric>{5752129488},1.0', b'<RandomVariable>{5752133328}"omega",0,0', b'<UnaryMinus>{5752141200}(1),5752133328', b'<exp>{5752129248}(1),5752141200', b'<Times>{5752141632}(2),5752129488,5752129248', b'<Numeric>{5752129536},1.0', b'<RandomVariable>{5752133328}"omega",0,0', b'<UnaryMinus>{5752141392}(1),5752133328', b'<exp>{5752142976}(1),5752141392', b'<Plus>{5752130736}(2),5752129536,5752142976', b'<Numeric>{5752141920},-2.0', b'<Power>{5752130016}(2),5752130736,5752141920', b'<Times>{5752131792}(2),5752141632,5752130016', b'<Times>{4926460880}(2),5752130928,5752131792', b'<Numeric>{5752143024},1.0', b'<Divide>{5752141872}(2),4926460880,5752143024', b'<Integrate>{5752131264},5752141872,0']

Elem

<Elem>{identifier}(number_of_expressions),keyId,value1,expression1,value2,expression2, etc…

where

  • keyId is the identifier of the expression calculating the key,

  • the number of pairs valuex,expressionx must correspond to the value of number_of_expressions

elemExpr.prepare(database=my_data, numberOfDraws=NUMBER_OF_DRAWS)
elemExpr.getSignature()
[b'<Variable>{5682741376}"Person",4,0', b'<Numeric>{5751165904},2.0', b'<Beta>{5750675920}"beta1"[0],0,0', b'<Times>{5751162352}(2),5751165904,5750675920', b'<Beta>{5750672224}"beta2"[0],1,1', b'<UnaryMinus>{5751177088}(1),5750672224', b'<exp>{5751176992}(1),5751177088', b'<Beta>{5750672224}"beta2"[0],1,1', b'<Beta>{5750672848}"beta3"[1],2,0', b'<Beta>{5750670736}"beta4"[1],3,1', b'<GreaterOrEqual>{5751175984}(2),5750672848,5750670736', b'<Times>{5751163744}(2),5750672224,5751175984', b'<Beta>{5750675920}"beta1"[0],0,0', b'<Beta>{5750672848}"beta3"[1],2,0', b'<Beta>{5750670736}"beta4"[1],3,1', b'<Less>{5751164896}(2),5750672848,5750670736', b'<Times>{5751161200}(2),5750675920,5751164896', b'<Plus>{5751162928}(2),5751163744,5751161200', b'<Divide>{5751175936}(2),5751176992,5751162928', b'<Minus>{5751164800}(2),5751162352,5751175936', b'<Numeric>{5751161296},2.0', b'<Beta>{5750675920}"beta1"[0],0,0', b'<Times>{5750670544}(2),5751161296,5750675920', b'<Variable>{5682743680}"Variable1",6,2', b'<Times>{5751161344}(2),5750670544,5682743680', b'<Beta>{5750672224}"beta2"[0],1,1', b'<UnaryMinus>{5751165568}(1),5750672224', b'<Variable>{5682752224}"Variable2",7,3', b'<Times>{5751165808}(2),5751165568,5682752224', b'<exp>{5751165040}(1),5751165808', b'<Beta>{5750672224}"beta2"[0],1,1', b'<Beta>{5750672848}"beta3"[1],2,0', b'<Beta>{5750670736}"beta4"[1],3,1', b'<GreaterOrEqual>{5752127712}(2),5750672848,5750670736', b'<Times>{5752133232}(2),5750672224,5752127712', b'<Beta>{5750675920}"beta1"[0],0,0', b'<Beta>{5750672848}"beta3"[1],2,0', b'<Beta>{5750670736}"beta4"[1],3,1', b'<Less>{5752129776}(2),5750672848,5750670736', b'<Times>{5752128096}(2),5750675920,5752129776', b'<Plus>{5752143840}(2),5752133232,5752128096', b'<Divide>{5752143312}(2),5751165040,5752143840', b'<Minus>{5752140912}(2),5751161344,5752143312', b'<Elem>{5752132320}(2),5682741376,1,5751164800,2,5752140912']

bioLinearUtility

<bioLinearUtility>{identifier}(numberOfTerms), beta1_exprId, beta1_uniqueId, beta1_name, variable1_exprId, variable1_uniqueId, variable1_name, etc…

where 6 entries are provided for each term:

  • beta1_exprId is the expression id of the beta parameter

  • beta1_uniqueId is the unique id of the beta parameter

  • beta1_name is the name of the parameter

  • variable1_exprId is the expression id of the variable

  • variable1_uniqueId is the unique id of the variable

  • variable1_name is the name of the variable

expr13.prepare(database=my_data, numberOfDraws=NUMBER_OF_DRAWS)
expr13.getSignature()
[b'<Beta>{5750675920}"beta1"[0],0,0', b'<Beta>{5750672224}"beta2"[0],1,1', b'<Beta>{5750672848}"beta3"[1],2,0', b'<Variable>{11840880816}"Variable1",5,2', b'<Variable>{11840884224}"Variable2",6,3', b'<Variable>{11840890752}"newvar_b",11,8', b'<bioLinearUtility>{11840885328}(3),5750675920,0,beta1,11840880816,5,Variable1,5750672224,1,beta2,11840884224,6,Variable2,5750672848,2,beta3,11840890752,11,newvar_b']

Total running time of the script: (0 minutes 0.303 seconds)

Gallery generated by Sphinx-Gallery