biogeme.expressions
Examples of use of several functions.
This is designed for programmers who need examples of use of the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model.
- author: Michel Bierlaire
- date: Tue Nov 21 14:53:49 2023
import numpy as np
import pandas as pd
import biogeme.biogeme_logging as blog
import biogeme.database as db
import biogeme.exceptions as excep
import biogeme.expressions as ex
import biogeme.tools.derivatives
from biogeme import models
from biogeme.expressions import IdManager, TypeOfElementaryExpression, LinearTermTuple
from biogeme.function_output import (
    BiogemeDisaggregateFunctionOutput,
    BiogemeFunctionOutput,
    NamedBiogemeFunctionOutput,
    NamedBiogemeDisaggregateFunctionOutput,
)
from biogeme.version import get_text
Version of Biogeme.
print(get_text())
biogeme 3.2.14 [2024-08-05]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)
logger = blog.get_screen_logger(level=blog.DEBUG)
Simple expressions
Simple expressions can be evaluated both with the function get_value (implemented in Python) and with the function get_value_c (implemented in C++). They do not require a database.
x = ex.Beta('x', 2, None, None, 1)
x
Beta('x', 2, None, None, 1)
x.get_value()
2
x.get_value_c(prepare_ids=True)
2.0
y = ex.Beta('y', 3, None, None, 1)
y
Beta('y', 3, None, None, 1)
y.get_value()
3
y.get_value_c(prepare_ids=True)
3.0
Note that if the parameter has to be estimated, its value cannot be obtained.
unknown_parameter = ex.Beta('x', 2, None, None, 0)
try:
    unknown_parameter.get_value()
except excep.BiogemeError as e:
    print(e)
one = ex.Numeric(1)
one
`1.0`
one.get_value()
1.0
one.get_value_c(prepare_ids=True)
1.0
Addition
z = x + y
z.get_value()
5
z.get_value_c(prepare_ids=True)
5.0
Subtraction
z = x - y
z.get_value()
-1
z.get_value_c(prepare_ids=True)
-1.0
Multiplication
z = x * y
z.get_value()
6
z.get_value_c(prepare_ids=True)
6.0
Division
z = x / y
z.get_value()
0.6666666666666666
z.get_value_c(prepare_ids=True)
0.6666666666666666
Power
z = x**y
z.get_value()
8
z.get_value_c(prepare_ids=True)
8.0
Exponential
z = ex.exp(x)
z.get_value()
np.float64(7.38905609893065)
z.get_value_c(prepare_ids=True)
7.38905609893065
Logarithm
z = ex.log(x)
z.get_value()
np.float64(0.6931471805599453)
z.get_value_c(prepare_ids=True)
0.6931471805599453
Minimum
z = ex.bioMin(x, y)
z.get_value()
2
z.get_value_c(prepare_ids=True)
2.0
Maximum
z = ex.bioMax(x, y)
z.get_value()
3
z.get_value_c(prepare_ids=True)
3.0
And
z = x & y
z.get_value()
1.0
z.get_value_c(prepare_ids=True)
1.0
z = x & 0
z.get_value()
0.0
z.get_value_c(prepare_ids=True)
0.0
Or
z = x | y
z.get_value()
1.0
z.get_value_c(prepare_ids=True)
1.0
z = ex.Numeric(0) | ex.Numeric(0)
z.get_value()
0.0
z.get_value_c(prepare_ids=True)
0.0
Equal
z = x == y
z.get_value()
0
z.get_value_c(prepare_ids=True)
0.0
z = (x + 1) == y
z.get_value()
1
z.get_value_c(prepare_ids=True)
1.0
Not equal
z = x != y
z.get_value()
1
z.get_value_c(prepare_ids=True)
1.0
z = (x + 1) != y
z.get_value()
0
z.get_value_c(prepare_ids=True)
0.0
Less than or equal
z = x <= y
z.get_value()
1
z.get_value_c(prepare_ids=True)
1.0
Greater or equal
z = x >= y
z.get_value()
0
z.get_value_c(prepare_ids=True)
0.0
Less than
z = x < y
z.get_value()
1
z.get_value_c(prepare_ids=True)
1.0
Greater than
z = x > y
z.get_value()
0
z.get_value_c(prepare_ids=True)
0.0
Opposite
z = -x
z.get_value()
-2
z.get_value_c(prepare_ids=True)
-2.0
Sum of multiple expressions
listOfExpressions = [x, y, 1 + x, 1 + y]
z = ex.bioMultSum(listOfExpressions)
z.get_value()
12.0
z.get_value_c(prepare_ids=True)
12.0
The result is the same as the following, but it implements the sum in a more efficient way.
z = x + y + 1 + x + 1 + y
z.get_value()
12.0
z.get_value_c(prepare_ids=True)
12.0
Element: this expression considers a dictionary of expressions, and an expression for the index. The index is evaluated, and the value of the corresponding expression in the dictionary is returned.
my_dict = {1: ex.exp(-1), 2: ex.log(1.2), 3: 1234}
index = x
index.get_value()
2
z = ex.Elem(my_dict, index)
z.get_value()
np.float64(0.1823215567939546)
z.get_value_c(prepare_ids=True)
0.1823215567939546
index = x - 1
index.get_value()
1.0
z = ex.Elem(my_dict, index)
z.get_value()
np.float64(0.36787944117144233)
z.get_value_c(prepare_ids=True)
0.36787944117144233
index = x - 2
index.get_value()
0.0
If the value returned as index does not correspond to an entry in the dictionary, an exception is raised.
z = ex.Elem(my_dict, index)
try:
    z.get_value()
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')
Exception raised: Key 0 is not present in the dictionary. Available keys: dict_keys([1, 2, 3])
z = ex.Elem(my_dict, index)
try:
    z.get_value_c(prepare_ids=True)
except RuntimeError as e:
    print(f'Exception raised: {e}')
Exception raised: src/cythonbiogeme/cpp/bioExprElem.cc:58: Biogeme exception: Key (-(x lit[0],fixed[0](2),2)=0) is not present in dictionary:
1: exp(-1)
2: log(1.2)
3: 1234
Complex expressions
When an expression is deemed complex in Biogeme, the get_value function is not available. Only the get_value_c function can be used. It evaluates the expression using a C++ implementation.
Normal CDF: it calculates
\[\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}\omega^2} \, d\omega.\]
z = ex.bioNormalCdf(x)
z.get_value_c(prepare_ids=True)
0.9772498680518218
z = ex.bioNormalCdf(0)
z.get_value_c(prepare_ids=True)
0.5
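As a cross-check that does not rely on Biogeme, the standard normal CDF can be written with the error function from Python's standard library. This is only a sketch; the helper name normal_cdf is ours, chosen for illustration, and is not part of Biogeme.

```python
import math


def normal_cdf(value: float) -> float:
    """Standard normal CDF written with the error function erf."""
    return 0.5 * (1.0 + math.erf(value / math.sqrt(2.0)))


# Same two evaluation points as in the Biogeme calls above.
cdf_at_two = normal_cdf(2.0)
cdf_at_zero = normal_cdf(0.0)
print(cdf_at_two, cdf_at_zero)
```

Both values should agree with the outputs of bioNormalCdf shown above up to floating-point rounding.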
Derivative
z = 30 * x + 20 * y
zx = ex.Derive(z, 'x')
zx.get_value_c(prepare_ids=True)
30.0
zx = ex.Derive(z, 'y')
zx.get_value_c(prepare_ids=True)
20.0
Integral: let’s calculate the integral of the pdf of a normal distribution:
omega = ex.RandomVariable('omega')
pdf = ex.exp(-omega * omega / 2)
z = ex.Integrate(pdf, 'omega') / np.sqrt(2 * np.pi)
z.get_value_c(prepare_ids=True)
1.0
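In order to change the bounds of integration, a change of variables must be performed. Let’s calculate
\[\int_0^1 t^2 \, dt = \frac{1}{3}.\]
If \(a\) is the lower bound of integration, and \(b\) is the upper bound, the change of variable is
\[t = a + \frac{b-a}{1+e^{-\omega}},\]
and
\[dt = (b-a) \, e^{-\omega} (1+e^{-\omega})^{-2} \, d\omega.\]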
a = 0
b = 1
t = a + (b - a) / (1 + ex.exp(-omega))
dt = (b - a) * ex.exp(-omega) * (1 + ex.exp(-omega)) ** -2
integrand = t * t
z = ex.Integrate(integrand * dt / (b - a), 'omega')
z.get_value_c(prepare_ids=True)
0.3333323120662822
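The logistic change of variable can be checked without Biogeme. The sketch below, in plain Python, integrates the transformed integrand over a truncated real line with a composite trapezoidal rule. The function names, the truncation to [-30, 30] and the number of steps are our own illustrative choices; this is not the quadrature that Biogeme uses internally.

```python
import math


def transformed_integrand(omega: float, a: float = 0.0, b: float = 1.0) -> float:
    """Value of t^2 * (dt/domega) / (b - a) after the logistic change of variable."""
    logistic = 1.0 / (1.0 + math.exp(-omega))
    t = a + (b - a) * logistic
    # logistic * (1 - logistic) equals exp(-omega) / (1 + exp(-omega))**2
    dt = (b - a) * logistic * (1.0 - logistic)
    return t * t * dt / (b - a)


def trapezoid(func, lower: float, upper: float, steps: int) -> float:
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


# The real line is truncated to [-30, 30]; the integrand is negligible outside.
approximation = trapezoid(transformed_integrand, -30.0, 30.0, 6000)
print(approximation)
```

The result should be close to 1/3, the exact value of the integral of the square function on the unit interval.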
Expressions using a database
df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [10, 20, 30, 40, 50],
        'Variable2': [100, 200, 300, 400, 500],
        'Choice': [2, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
my_data = db.Database('test', df)
Linear utility: it defines a linear combination of parameters and variables.
beta1 = ex.Beta('beta1', 10, None, None, 0)
beta2 = ex.Beta('beta2', 20, None, None, 0)
v1 = ex.Variable('Variable1')
v2 = ex.Variable('Variable2')
list_of_terms = [
    LinearTermTuple(beta=beta1, x=v1),
    LinearTermTuple(beta=beta2, x=v2),
]
z = ex.bioLinearUtility(list_of_terms)
z.get_value_c(database=my_data, prepare_ids=True)
array([ 2100., 4200., 6300., 8400., 10500.])
It is equivalent to the following, but implemented in a more efficient way.
z = beta1 * v1 + beta2 * v2
z.get_value_c(database=my_data, prepare_ids=True)
array([ 2100., 4200., 6300., 8400., 10500.])
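The numbers in the array can be reproduced by hand. The sketch below uses plain Python lists that copy the columns Variable1 and Variable2 of the small database, together with the values 10 and 20 given to beta1 and beta2 above; it does not call Biogeme.

```python
variable1 = [10, 20, 30, 40, 50]
variable2 = [100, 200, 300, 400, 500]
beta1, beta2 = 10, 20  # same values as the Beta objects defined above

# One linear combination per row of the database.
utilities = [beta1 * v1 + beta2 * v2 for v1, v2 in zip(variable1, variable2)]
print(utilities)  # [2100, 4200, 6300, 8400, 10500]
```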
Monte Carlo: we approximate the integral
\[\int_0^1 x^2 \, dx = \frac{1}{3}\]
using Monte-Carlo integration. As draws require a database, it is calculated for each entry in the database.
draws = ex.bioDraws('draws', 'UNIFORM')
z = ex.MonteCarlo(draws * draws)
z.get_value_c(database=my_data, prepare_ids=True)
array([0.32299649, 0.33650473, 0.33501673, 0.35513668, 0.33056744])
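The principle can be sketched without Biogeme: the average of the squared uniform draws estimates the integral of the square function on the unit interval. The function name, the seed and the number of draws below are our own illustrative choices, and the generator is Python's, not the one used by Biogeme, so the digits will differ from the array above.

```python
import random


def monte_carlo_mean_of_squares(number_of_draws: int, seed: int = 0) -> float:
    """Average of u**2 over uniform draws u on [0, 1)."""
    generator = random.Random(seed)
    total = sum(generator.random() ** 2 for _ in range(number_of_draws))
    return total / number_of_draws


estimate = monte_carlo_mean_of_squares(100_000)
print(estimate)
```

With more draws, the estimate gets closer to 1/3, at the usual Monte-Carlo rate proportional to the inverse square root of the number of draws.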
Panel Trajectory: we first calculate a quantity for each entry in the database.
v1 = ex.Variable('Variable1')
v2 = ex.Variable('Variable2')
p = v1 / (v1 + v2)
p.get_value_c(database=my_data, prepare_ids=True)
array([0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909])
We now declare the data as “panel”, based on the identifier Person. It means that the first three rows correspond to a sequence of three observations for individual 1, and the last two, a sequence of two observations for individual 2. The panel trajectory calculates the expression for each row associated with an individual, and calculates the product.
my_data.panel('Person')
In this case, we expect the following for individual 1:
0.09090909**3
0.0007513147783621339
And the following for individual 2:
0.09090909**2
0.0082644626446281
We verify that it is indeed the case:
z = ex.PanelLikelihoodTrajectory(p)
z.get_value_c(database=my_data, prepare_ids=True)
array([0.00075131, 0.00826446])
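The grouping logic can be sketched in plain Python: compute the row-level quantity, group the rows by individual, and multiply within each group. The lists below copy the relevant columns of the small database; the variable names are ours and the code does not call Biogeme.

```python
import math
from collections import defaultdict

persons = [1, 1, 1, 2, 2]
variable1 = [10, 20, 30, 40, 50]
variable2 = [100, 200, 300, 400, 500]

# Row-level quantity p = Variable1 / (Variable1 + Variable2).
row_values = [v1 / (v1 + v2) for v1, v2 in zip(variable1, variable2)]

# Group the rows by individual, then multiply within each group.
rows_by_person = defaultdict(list)
for person, value in zip(persons, row_values):
    rows_by_person[person].append(value)

trajectory = {person: math.prod(values) for person, values in rows_by_person.items()}
print(trajectory)
```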
More complex expressions
We set the number of draws for Monte-Carlo integration. It should be a large number. For the sake of computational efficiency, as this notebook is designed to illustrate the various functions, we use a low value.
NUMBER_OF_DRAWS = 100
We first create a small database.
df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [10, 20, 30, 40, 50],
        'Variable2': [100, 200, 300, 400, 500],
        'Choice': [2, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
df
my_data = db.Database('test', df)
The following type of expression is a literal called Variable that corresponds to an entry in the database.
Person = ex.Variable('Person')
Variable1 = ex.Variable('Variable1')
Variable2 = ex.Variable('Variable2')
Choice = ex.Variable('Choice')
Av1 = ex.Variable('Av1')
Av2 = ex.Variable('Av2')
Av3 = ex.Variable('Av3')
It is possible to add a new column to the database. This creates a new variable that can be used in expressions.
newvar_b = my_data.define_variable('newvar_b', Variable1 + Variable2)
my_data.data
It is equivalent to the following Pandas statement.
my_data.data['newvar_p'] = my_data.data['Variable1'] + my_data.data['Variable2']
my_data.data
Do not chain comparison expressions with Biogeme: chaining does not produce the expected expression. In the run below, Biogeme detects the attempt and raises an exception.
try:
    my_expression = 200 <= Variable2 <= 400
except excep.BiogemeError as e:
    print(e)
Expression (Variable2 >= `200.0`) cannot be used in a boolean expression. Use & for "and" and | for "or"
The reason is that Python executes 200 <= Variable2 <= 400 as (200 <= Variable2) and (Variable2 <= 400). The and operator cannot be overloaded in Python. Therefore, it does not return a Biogeme expression. Note that Pandas does not allow chaining either, and has implemented a between function instead.
my_data.data['chaining_p'] = my_data.data['Variable2'].between(200, 400)
my_data.data
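The mechanism can be reproduced with a toy class. In the sketch below, Symbol is a hypothetical stand-in for an expression class (it is not a Biogeme class): comparisons return new Symbol objects, & is overloaded, and truth-value conversion is refused. Chaining then fails because Python inserts an implicit and, which needs a truth value.

```python
class Symbol:
    """Toy expression (hypothetical, not a Biogeme class) that records comparisons."""

    def __init__(self, text: str):
        self.text = text

    def __le__(self, other):
        return Symbol(f'({self.text} <= {other})')

    def __ge__(self, other):
        return Symbol(f'({self.text} >= {other})')

    def __and__(self, other):
        return Symbol(f'({self.text} & {other.text})')

    def __bool__(self):
        # "and" cannot be overloaded: Python asks for a truth value instead.
        raise TypeError(f'{self.text} cannot be used in a boolean expression')


v = Symbol('v')

# Explicit combination with & builds the intended expression.
combined = (v >= 200) & (v <= 400)
print(combined.text)  # ((v >= 200) & (v <= 400))

# Chaining is evaluated as (200 <= v) and (v <= 400), which calls bool().
message = None
try:
    chained = 200 <= v <= 400
except TypeError as error:
    message = str(error)
print(message)  # (v >= 200) cannot be used in a boolean expression
```

Note that 200 <= v is resolved through the reflected method __ge__ of Symbol, which is why the message mentions v >= 200, in the same way as the Biogeme message above mentions Variable2 >= 200.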
The following type of expression is another literal, corresponding to an unknown parameter. Note that the value is just a starting value for the algorithm.
beta1 = ex.Beta('beta1', 0.2, None, None, 0)
beta2 = ex.Beta('beta2', 0.4, None, None, 0)
The last argument, when set to 1, fixes the parameter to the given value.
beta3 = ex.Beta('beta3', 1, None, None, 1)
beta4 = ex.Beta('beta4', 0, None, None, 1)
Arithmetic operators are overloaded to allow standard manipulations of expressions.
expr0 = beta3 + beta4
print(expr0)
(Beta('beta3', 1, None, None, 1) + Beta('beta4', 0, None, None, 1))
The evaluation of expressions can be done in two ways. For simple expressions, the function get_value, implemented in Python, returns the value of the expression.
expr0.get_value()
1
It is possible to modify the values of the parameters.
newvalues = {'beta1': 1, 'beta2': 2, 'beta3': 3, 'beta4': 2}
expr0.change_init_values(newvalues)
expr0.get_value()
[WARNING] 2024-08-05 19:59:53,816 Parameter beta3 is fixed, but its value is changed from 1 to 3. <beta_parameters.py:183>
[WARNING] 2024-08-05 19:59:53,817 Parameter beta4 is fixed, but its value is changed from 0 to 2. <beta_parameters.py:183>
5
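Consider another expression:
\[2 \beta_1 - \frac{\exp(-\beta_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)},\]
where \((\beta_3 \geq \beta_4)\) equals 1 if \(\beta_3 \geq \beta_4\) and 0 otherwise, and similarly for \((\beta_3 < \beta_4)\).
expr1 = 2 * beta1 - ex.exp(-beta2) / (
    beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4)
)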
print(expr1)
((`2.0` * Beta('beta1', 0.2, None, None, 0)) - (exp((-Beta('beta2', 0.4, None, None, 0))) / ((Beta('beta2', 0.4, None, None, 0) * (Beta('beta3', 3, None, None, 1) >= Beta('beta4', 2, None, None, 1))) + (Beta('beta1', 0.2, None, None, 0) * (Beta('beta3', 3, None, None, 1) < Beta('beta4', 2, None, None, 1))))))
The function get_value_c is implemented in C++, and works for any expression. When used outside a specific context, the IDs must be explicitly prepared.
expr1.get_value_c(prepare_ids=True)
-1.275800115089098
It actually calls the function get_value_and_derivatives, and returns its first output (without calculating the derivatives).
the_output = expr1.get_value_and_derivatives(prepare_ids=True)
the_output.function
np.float64(-1.275800115089098)
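The value can be reproduced by hand. The sketch below, in plain Python, uses the parameter values at this point of the script (0.2 and 0.4 for beta1 and beta2, and 3 and 2 for the fixed beta3 and beta4); Python booleans play the role of the 0/1 indicators.

```python
import math

beta1, beta2, beta3, beta4 = 0.2, 0.4, 3, 2

# True and False behave as 1 and 0 in arithmetic, like the indicators.
denominator = beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4)
value = 2 * beta1 - math.exp(-beta2) / denominator
print(value)
```

Since beta3 is larger than beta4, the denominator reduces to beta2, and the value agrees with the output of get_value_c up to rounding.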
We create a pandas DataFrame just to have a nicer display of the results.
pd.DataFrame(the_output.gradient)
pd.DataFrame(the_output.hessian)
pd.DataFrame(the_output.bhhh)
Note that the BHHH matrix is the outer product of the gradient with itself.
pd.DataFrame(np.outer(the_output.gradient, the_output.gradient))
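The same outer product can be written in plain Python. The sketch below uses the gradient that this run reports for expr1 in its named output (2.0 for beta1 and about 5.8653 for beta2); the list-based representation is our own choice for illustration.

```python
gradient = [2.0, 5.865300402811843]  # gradient of expr1 reported by this run

# Outer product of the gradient with itself: entry (i, j) is g_i * g_j.
bhhh = [[gi * gj for gj in gradient] for gi in gradient]
for row in bhhh:
    print(row)
```

The resulting matrix is symmetric by construction, and its entries match the BHHH values reported for this expression.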
If the derivatives are not needed, their calculation can be skipped. Here, we calculate the gradient, but not the hessian.
expr1.get_value_and_derivatives(
    gradient=True, hessian=False, bhhh=False, prepare_ids=True
)
<biogeme.function_output.BiogemeFunctionOutputSmartOutputProxy object at 0x12486ce90>
the_output: NamedBiogemeFunctionOutput = expr1.get_value_and_derivatives(
    prepare_ids=True, named_results=True
)
print(the_output.gradient)
{'beta1': np.float64(2.0), 'beta2': np.float64(5.865300402811843)}
print(the_output.hessian)
{'beta1': {'beta1': np.float64(0.0), 'beta2': np.float64(0.0)}, 'beta2': {'beta1': np.float64(0.0), 'beta2': np.float64(-31.002302129148312)}}
print(the_output.bhhh)
{'beta1': {'beta1': np.float64(4.0), 'beta2': np.float64(11.730600805623686)}, 'beta2': {'beta1': np.float64(11.730600805623686), 'beta2': np.float64(34.401748815224764)}}
It can also generate a function that takes the value of the parameters as argument, and provides a tuple with the value of the expression and its derivatives. By default, it returns the value of the function, its gradient and its hessian.
the_function = expr1.create_function()
We evaluate it at one point…
the_function([1, 2])
… and at another point.
the_function([10, -2])
We can use it to check the derivatives.
biogeme.tools.derivatives.check_derivatives(the_function, np.array([1, 2]), logg=True)
[INFO] 2024-08-05 19:59:53,821 x Gradient FinDiff Difference <derivatives.py:149>
[INFO] 2024-08-05 19:59:53,821 x[0] +2.000000E+00 +2.000000E+00 -1.167734E-09 <derivatives.py:151>
[INFO] 2024-08-05 19:59:53,822 x[1] +1.015015E-01 +1.015014E-01 +1.629049E-08 <derivatives.py:151>
[INFO] 2024-08-05 19:59:53,822 Row Col Hessian FinDiff Difference <derivatives.py:158>
[INFO] 2024-08-05 19:59:53,822 x[0] x[0] +0.000000E+00 +0.000000E+00 +0.000000E+00 <derivatives.py:161>
[INFO] 2024-08-05 19:59:53,823 x[0] x[1] +0.000000E+00 +0.000000E+00 +0.000000E+00 <derivatives.py:161>
[INFO] 2024-08-05 19:59:53,823 x[1] x[0] +0.000000E+00 +0.000000E+00 +0.000000E+00 <derivatives.py:161>
[INFO] 2024-08-05 19:59:53,823 x[1] x[1] -1.691691E-01 -1.691691E-01 -3.203118E-08 <derivatives.py:161>
(np.float64(1.9323323583816936), array([2. , 0.10150146]), array([[ 0. , 0. ],
[ 0. , -0.1691691]]), array([-1.16773435e-09, 1.62904950e-08]), array([[ 0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, -3.20311803e-08]]))
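The idea behind such a check can be sketched in plain Python: compare an analytical gradient with central finite differences. The functions below are our own illustration and are not the implementation of check_derivatives. They assume, as in this example, that beta3 is at least beta4, so that the denominator of the expression reduces to beta2.

```python
import math


def function(beta):
    """Expression 2*beta1 - exp(-beta2)/beta2 (case beta3 >= beta4)."""
    b1, b2 = beta
    return 2 * b1 - math.exp(-b2) / b2


def analytical_gradient(beta):
    """Gradient of the expression above, derived by hand."""
    b1, b2 = beta
    return [2.0, math.exp(-b2) / b2 + math.exp(-b2) / b2**2]


def finite_difference_gradient(func, point, step=1e-6):
    """Central finite differences, one coordinate at a time."""
    gradient = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += step
        backward[i] -= step
        gradient.append((func(forward) - func(backward)) / (2 * step))
    return gradient


point = [1.0, 2.0]
exact = analytical_gradient(point)
approximate = finite_difference_gradient(function, point)
differences = [e - a for e, a in zip(exact, approximate)]
print(function(point), exact, differences)
```

Small differences between the two gradients, as in the log above, indicate that the analytical derivatives are consistent with the function.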
And it is possible to also obtain the BHHH matrix.
the_function = expr1.create_function(bhhh=True)
the_function([1, 2])
It can take a database as input, and evaluate the expression and its derivatives for each entry in the database. In the following example, as no variable of the database is involved in the expression, the output of the expression is the same for each entry.
results: BiogemeDisaggregateFunctionOutput = expr1.get_value_and_derivatives(
    database=my_data, aggregation=False
)
print(len(results.functions))
5
f_array = results.functions
g_array = results.gradients
h_array = results.hessians
bhhh_array = results.bhhhs
for f, g, h, bhhh in zip(f_array, g_array, h_array, bhhh_array):
    print('******')
    print(f'{f=}')
    print(f'{g=}')
    print(f'{h=}')
    print(f'{bhhh=}')
******
f=np.float64(1.9323323583816936)
g=array([2. , 0.10150146])
h=array([[ 0. , 0. ],
[ 0. , -0.1691691]])
bhhh=array([[4. , 0.20300292],
[0.20300292, 0.01030255]])
******
f=np.float64(1.9323323583816936)
g=array([2. , 0.10150146])
h=array([[ 0. , 0. ],
[ 0. , -0.1691691]])
bhhh=array([[4. , 0.20300292],
[0.20300292, 0.01030255]])
******
f=np.float64(1.9323323583816936)
g=array([2. , 0.10150146])
h=array([[ 0. , 0. ],
[ 0. , -0.1691691]])
bhhh=array([[4. , 0.20300292],
[0.20300292, 0.01030255]])
******
f=np.float64(1.9323323583816936)
g=array([2. , 0.10150146])
h=array([[ 0. , 0. ],
[ 0. , -0.1691691]])
bhhh=array([[4. , 0.20300292],
[0.20300292, 0.01030255]])
******
f=np.float64(1.9323323583816936)
g=array([2. , 0.10150146])
h=array([[ 0. , 0. ],
[ 0. , -0.1691691]])
bhhh=array([[4. , 0.20300292],
[0.20300292, 0.01030255]])
results: NamedBiogemeDisaggregateFunctionOutput = expr1.get_value_and_derivatives(
    database=my_data, aggregation=False, named_results=True
)
print(len(results.functions))
5
f_array = results.functions
g_array = results.gradients
h_array = results.hessians
bhhh_array = results.bhhhs
for f, g, h, bhhh in zip(f_array, g_array, h_array, bhhh_array):
    print('******')
    print(f'{f=}')
    print(f'{g=}')
    print(f'{h=}')
    print(f'{bhhh=}')
******
f=np.float64(1.9323323583816936)
g={'beta1': np.float64(2.0), 'beta2': np.float64(0.10150146242745953)}
h={'beta1': {'beta1': np.float64(0.0), 'beta2': np.float64(0.0)}, 'beta2': {'beta1': np.float64(0.0), 'beta2': np.float64(-0.16916910404576588)}}
bhhh={'beta1': {'beta1': np.float64(4.0), 'beta2': np.float64(0.20300292485491905)}, 'beta2': {'beta1': np.float64(0.20300292485491905), 'beta2': np.float64(0.010302546874912978)}}
******
f=np.float64(1.9323323583816936)
g={'beta1': np.float64(2.0), 'beta2': np.float64(0.10150146242745953)}
h={'beta1': {'beta1': np.float64(0.0), 'beta2': np.float64(0.0)}, 'beta2': {'beta1': np.float64(0.0), 'beta2': np.float64(-0.16916910404576588)}}
bhhh={'beta1': {'beta1': np.float64(4.0), 'beta2': np.float64(0.20300292485491905)}, 'beta2': {'beta1': np.float64(0.20300292485491905), 'beta2': np.float64(0.010302546874912978)}}
******
f=np.float64(1.9323323583816936)
g={'beta1': np.float64(2.0), 'beta2': np.float64(0.10150146242745953)}
h={'beta1': {'beta1': np.float64(0.0), 'beta2': np.float64(0.0)}, 'beta2': {'beta1': np.float64(0.0), 'beta2': np.float64(-0.16916910404576588)}}
bhhh={'beta1': {'beta1': np.float64(4.0), 'beta2': np.float64(0.20300292485491905)}, 'beta2': {'beta1': np.float64(0.20300292485491905), 'beta2': np.float64(0.010302546874912978)}}
******
f=np.float64(1.9323323583816936)
g={'beta1': np.float64(2.0), 'beta2': np.float64(0.10150146242745953)}
h={'beta1': {'beta1': np.float64(0.0), 'beta2': np.float64(0.0)}, 'beta2': {'beta1': np.float64(0.0), 'beta2': np.float64(-0.16916910404576588)}}
bhhh={'beta1': {'beta1': np.float64(4.0), 'beta2': np.float64(0.20300292485491905)}, 'beta2': {'beta1': np.float64(0.20300292485491905), 'beta2': np.float64(0.010302546874912978)}}
******
f=np.float64(1.9323323583816936)
g={'beta1': np.float64(2.0), 'beta2': np.float64(0.10150146242745953)}
h={'beta1': {'beta1': np.float64(0.0), 'beta2': np.float64(0.0)}, 'beta2': {'beta1': np.float64(0.0), 'beta2': np.float64(-0.16916910404576588)}}
bhhh={'beta1': {'beta1': np.float64(4.0), 'beta2': np.float64(0.20300292485491905)}, 'beta2': {'beta1': np.float64(0.20300292485491905), 'beta2': np.float64(0.010302546874912978)}}
If aggregation is set to True, the results are accumulated as a sum.
the_output: BiogemeFunctionOutput = expr1.get_value_and_derivatives(
    database=my_data, aggregation=True
)
print(f'{the_output.function=}')
print(f'{the_output.gradient=}')
print(f'{the_output.hessian=}')
print(f'{the_output.bhhh=}')
print('B3')
the_output.function=np.float64(9.661661791908468)
the_output.gradient=array([10. , 0.50750731])
the_output.hessian=array([[ 0. , 0. ],
[ 0. , -0.84584552]])
the_output.bhhh=array([[20. , 1.01501462],
[ 1.01501462, 0.05151273]])
B3
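The aggregated numbers can be related to the per-entry ones with a few lines of plain Python. The sketch below assumes five identical entries, as in the disaggregate output above, and uses the per-entry gradient reported there. It also shows that the aggregated BHHH matrix is the sum of the per-entry outer products, which differs from the outer product of the aggregated gradient.

```python
number_of_entries = 5
gradient = [2.0, 0.10150146242745953]  # per-entry gradient reported above

# Per-entry BHHH: outer product of the per-entry gradient with itself.
per_entry_bhhh = [[gi * gj for gj in gradient] for gi in gradient]

# Aggregation sums the per-entry quantities over the entries.
aggregated_gradient = [sum(g for _ in range(number_of_entries)) for g in gradient]
aggregated_bhhh = [
    [sum(cell for _ in range(number_of_entries)) for cell in row]
    for row in per_entry_bhhh
]
print(aggregated_gradient)
print(aggregated_bhhh)
```

The first diagonal entry of the aggregated BHHH is 20, that is, five times 4, whereas squaring the aggregated gradient component 10 would give 100.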
The following function scans the expression and extracts the set of names of all free parameters.
expr1.set_of_elementary_expression(TypeOfElementaryExpression.FREE_BETA)
{'beta1', 'beta2'}
Options can be set to extract free parameters, fixed parameters, or both.
expr1.set_of_elementary_expression(TypeOfElementaryExpression.FIXED_BETA)
{'beta4', 'beta3'}
expr1.set_of_elementary_expression(TypeOfElementaryExpression.BETA)
{'beta4', 'beta3', 'beta1', 'beta2'}
It is possible also to extract an elementary expression from its name.
expr1.get_elementary_expression('beta2')
Beta('beta2', 0.4, None, None, 0)
Let’s consider an expression involving two variables \(V_1\) and \(V_2\):
\[2 \beta_1 V_1 - \frac{\exp(-\beta_2 V_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)},\]
where \((\beta_3 \geq \beta_4)\) equals 1 if \(\beta_3 \geq \beta_4\) and 0 otherwise, and similarly for \((\beta_3 < \beta_4)\). Note that, in our example, the second term is numerically negligible with respect to the first one.
expr2 = 2 * beta1 * Variable1 - ex.exp(-beta2 * Variable2) / (
    beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4)
)
print(expr2)
print('B4')
(((`2.0` * Beta('beta1', 0.2, None, None, 0)) * Variable1) - (exp(((-Beta('beta2', 0.4, None, None, 0)) * Variable2)) / ((Beta('beta2', 0.4, None, None, 0) * (Beta('beta3', 3, None, None, 1) >= Beta('beta4', 2, None, None, 1))) + (Beta('beta1', 0.2, None, None, 0) * (Beta('beta3', 3, None, None, 1) < Beta('beta4', 2, None, None, 1))))))
B4
It is not a simple expression anymore, and only the function get_value_c can be invoked. If we try the get_value function, it raises an exception.
try:
    expr2.get_value()
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')
Exception raised: getValue method undefined at this level: <class 'biogeme.expressions.elementary_expressions.Variable'>. Each expression must implement it.
As the expression is called out of a specific context, it should be instructed to prepare its IDs. Note that if no database is provided, an exception is raised when the formula contains variables. Indeed, the values of these variables cannot be found anywhere.
try:
    expr2.get_value_c(prepare_ids=True)
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')
print('B5')
Exception raised: No database is provided and an expression contains variables: {'Variable2', 'Variable1'}
B5
expr2.get_value_c(database=my_data, aggregation=False, prepare_ids=True)
array([ 4., 8., 12., 16., 20.])
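These values can be reproduced by hand, which also shows how small the second term is. The sketch below, in plain Python, copies the two database columns and uses 0.2 and 0.4 for beta1 and beta2, and 3 and 2 for the fixed beta3 and beta4; it does not call Biogeme.

```python
import math

beta1, beta2, beta3, beta4 = 0.2, 0.4, 3, 2
variable1 = [10, 20, 30, 40, 50]
variable2 = [100, 200, 300, 400, 500]

denominator = beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4)
first_terms = [2 * beta1 * v1 for v1 in variable1]
second_terms = [math.exp(-beta2 * v2) / denominator for v2 in variable2]
values = [first - second for first, second in zip(first_terms, second_terms)]

print(values)
print(max(second_terms))  # largest contribution of the second term
```

The second term is many orders of magnitude below the first one for every row, so the printed values coincide with the first term at the displayed precision.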
The following function extracts the names of the parameters appearing in the expression.
expr2.set_of_elementary_expression(TypeOfElementaryExpression.BETA)
{'beta4', 'beta3', 'beta1', 'beta2'}
The list of parameters can also be obtained in the form of a dictionary.
expr2.dict_of_elementary_expression(TypeOfElementaryExpression.BETA)
{'beta1': Beta('beta1', 0.2, None, None, 0), 'beta2': Beta('beta2', 0.4, None, None, 0), 'beta3': Beta('beta3', 3, None, None, 1), 'beta4': Beta('beta4', 2, None, None, 1)}
The list of variables can also be obtained in the form of a dictionary.
expr2.dict_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)
{'Variable1': Variable1, 'Variable2': Variable2}
or a set…
expr2.set_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)
{'Variable2', 'Variable1'}
Expressions are defined recursively, using a tree representation. The following function returns the type of the uppermost node of the tree.
expr2.get_class_name()
'Minus'
The signature is a formal representation of the expression, assigning identifiers to each node of the tree, and representing them starting from the leaves. It is easy to parse, and is passed to the C++ implementation.
As the expression is used out of a specific context, it must be prepared before using it.
expr2.prepare(database=my_data, number_of_draws=0)
expr2.get_status_id_manager()
print(expr2)
(((`2.0` * Beta('beta1', 0.2, None, None, 0)) * Variable1) - (exp(((-Beta('beta2', 0.4, None, None, 0)) * Variable2)) / ((Beta('beta2', 0.4, None, None, 0) * (Beta('beta3', 3, None, None, 1) >= Beta('beta4', 2, None, None, 1))) + (Beta('beta1', 0.2, None, None, 0) * (Beta('beta3', 3, None, None, 1) < Beta('beta4', 2, None, None, 1))))))
expr2.get_signature()
[b'<Numeric>{4907792336},2.0', b'<Beta>{4852715824}"beta1"[0],0,0', b'<Times>{4907791952}(2),4907792336,4852715824', b'<Variable>{5158109008}"Variable1",6,2', b'<Times>{4907784416}(2),4907791952,5158109008', b'<Beta>{4852723072}"beta2"[0],1,1', b'<UnaryMinus>{4907790896}(1),4852723072', b'<Variable>{5158099264}"Variable2",7,3', b'<Times>{4907783984}(2),4907790896,5158099264', b'<exp>{4907786432}(1),4907783984', b'<Beta>{4852723072}"beta2"[0],1,1', b'<Beta>{4852723504}"beta3"[1],2,0', b'<Beta>{4852716160}"beta4"[1],3,1', b'<GreaterOrEqual>{4907790608}(2),4852723504,4852716160', b'<Times>{4907791520}(2),4852723072,4907790608', b'<Beta>{4852715824}"beta1"[0],0,0', b'<Beta>{4852723504}"beta3"[1],2,0', b'<Beta>{4852716160}"beta4"[1],3,1', b'<Less>{4907791616}(2),4852723504,4852716160', b'<Times>{4907794208}(2),4852715824,4907791616', b'<Plus>{4907782832}(2),4907791520,4907794208', b'<Divide>{4907792912}(2),4907786432,4907782832', b'<Minus>{4907780432}(2),4907784416,4907792912']
The elementary expressions are
- free parameters,
- fixed parameters,
- random variables (for numerical integration),
- draws (for Monte-Carlo integration), and
- variables from the database.
The following function extracts all elementary expressions from a list of formulas, gives them a unique numbering, and returns them organized by group, as defined above (with the exception of the variables, which are directly available in the database).
collection_of_formulas = [expr1, expr2]
formulas = IdManager(collection_of_formulas, my_data, None)
Unique numbering for all elementary expressions.
formulas.elementary_expressions.indices
{'beta1': 0, 'beta2': 1, 'beta3': 2, 'beta4': 3, 'Person': 4, 'Exclude': 5, 'Variable1': 6, 'Variable2': 7, 'Choice': 8, 'Av1': 9, 'Av2': 10, 'Av3': 11, 'newvar_b': 12, 'newvar_p': 13, 'chaining_p': 14}
formulas.free_betas
ElementsTuple(expressions={'beta1': Beta('beta1', 0.2, None, None, 0), 'beta2': Beta('beta2', 0.4, None, None, 0)}, indices={'beta1': 0, 'beta2': 1}, names=['beta1', 'beta2'])
Each elementary expression has two ids: one index that is unique across all elementary expressions, and one that is unique within its specific group.
[(i.elementaryIndex, i.betaId) for k, i in formulas.free_betas.expressions.items()]
[(0, 0), (1, 1)]
formulas.free_betas.names
['beta1', 'beta2']
formulas.fixed_betas
ElementsTuple(expressions={'beta3': Beta('beta3', 3, None, None, 1), 'beta4': Beta('beta4', 2, None, None, 1)}, indices={'beta3': 0, 'beta4': 1}, names=['beta3', 'beta4'])
[(i.elementaryIndex, i.betaId) for k, i in formulas.fixed_betas.expressions.items()]
[(2, 0), (3, 1)]
formulas.fixed_betas.names
['beta3', 'beta4']
formulas.random_variables
ElementsTuple(expressions={}, indices={}, names=[])
Monte Carlo integration is based on draws.
my_draws = ex.bioDraws('my_draws', 'UNIFORM')
expr3 = ex.MonteCarlo(my_draws * my_draws)
print(expr3)
MonteCarlo((bioDraws("my_draws", "UNIFORM") * bioDraws("my_draws", "UNIFORM")))
Note that draws are different from random variables, used for numerical integration.
expr3.set_of_elementary_expression(TypeOfElementaryExpression.RANDOM_VARIABLE)
set()
The following function reports the draws involved in an expression.
expr3.set_of_elementary_expression(TypeOfElementaryExpression.DRAWS)
{'my_draws'}
The following function checks if draws are defined outside MonteCarlo, and returns their names.
wrong_expression = my_draws + ex.MonteCarlo(my_draws * my_draws)
wrong_expression.check_draws()
{'my_draws'}
Checking the correct expression returns an empty set.
expr3.check_draws()
set()
The expression is a Monte-Carlo integration.
expr3.get_class_name()
'MonteCarlo'
Note that the draws are associated with a database. Therefore, the evaluation of expressions involving Monte Carlo integration can only be done on a database. If none is provided, an exception is raised.
try:
    expr3.get_value_c(number_of_draws=NUMBER_OF_DRAWS)
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')
Exception raised: An expression involving MonteCarlo integration must be associated with a database.
Here is its value. It is an approximation of \(\int_0^1 x^2 \, dx = \frac{1}{3}\).
expr3.get_value_c(database=my_data, number_of_draws=NUMBER_OF_DRAWS, prepare_ids=True)
array([0.34528706, 0.32486975, 0.32311505, 0.31370743, 0.40247583])
Here is its signature.
expr3.prepare(database=my_data, number_of_draws=NUMBER_OF_DRAWS)
expr3.get_signature()
[b'<bioDraws>{4907793248}"my_draws",0,0', b'<bioDraws>{4907793248}"my_draws",0,0', b'<Times>{4907793872}(2),4907793248,4907793248', b'<MonteCarlo>{4907779760}(1),4907793872']
The same integral can be calculated using numerical integration, declaring a random variable.
omega = ex.RandomVariable('omega')
Numerical integration calculates integrals between \(-\infty\) and \(+\infty\). Here, the interval being \([0,1]\), a change of variables is required.
a = 0
b = 1
x = a + (b - a) / (1 + ex.exp(-omega))
dx = (b - a) * ex.exp(-omega) * (1 + ex.exp(-omega)) ** (-2)
integrand = x * x
expr4 = ex.Integrate(integrand * dx / (b - a), 'omega')
In this case, omega is a random variable.
expr4.dict_of_elementary_expression(TypeOfElementaryExpression.RANDOM_VARIABLE)
{'omega': omega}
print(expr4)
Integrate(((((`0.0` + (`1.0` / (`1.0` + exp((-omega))))) * (`0.0` + (`1.0` / (`1.0` + exp((-omega)))))) * ((`1.0` * exp((-omega))) * (`1.0` + exp((-omega)))**-2.0)) / `1.0`), "omega")
The following function checks if random variables are defined outside an Integrate statement.
wrong_expression = x * x
wrong_expression.check_rv()
{'omega'}
The same function called from the correct expression returns an empty set.
expr4.check_rv()
set()
Calculating its value requires the C++ implementation.
expr4.get_value_c(my_data, prepare_ids=True)
array([0.33333231, 0.33333231, 0.33333231, 0.33333231, 0.33333231])
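We now illustrate the Elem function. It takes two arguments: a dictionary, and a formula for the key. For each entry in the database, the formula is evaluated, and its result identifies which formula in the dictionary should be evaluated. Here, if ‘Person’ is 1, the expression is
\[2 \beta_1 - \frac{\exp(-\beta_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)},\]
and if ‘Person’ is 2, the expression is
\[2 \beta_1 V_1 - \frac{\exp(-\beta_2 V_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)}.\]
As it is an ordinary Biogeme expression, it can be included in any formula. Here, we illustrate it by dividing the result by 10.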
elemExpr = ex.Elem({1: expr1, 2: expr2}, Person)
expr5 = elemExpr / 10
print(expr5)
({{1:((`2.0` * Beta('beta1', 0.2, None, None, 0)) - (exp((-Beta('beta2', 0.4, None, None, 0))) / ((Beta('beta2', 0.4, None, None, 0) * (Beta('beta3', 3, None, None, 1) >= Beta('beta4', 2, None, None, 1))) + (Beta('beta1', 0.2, None, None, 0) * (Beta('beta3', 3, None, None, 1) < Beta('beta4', 2, None, None, 1)))))), 2:(((`2.0` * Beta('beta1', 0.2, None, None, 0)) * Variable1) - (exp(((-Beta('beta2', 0.4, None, None, 0)) * Variable2)) / ((Beta('beta2', 0.4, None, None, 0) * (Beta('beta3', 3, None, None, 1) >= Beta('beta4', 2, None, None, 1))) + (Beta('beta1', 0.2, None, None, 0) * (Beta('beta3', 3, None, None, 1) < Beta('beta4', 2, None, None, 1))))))}[Person] / `10.0`)
expr5.dict_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)
{'Person': Person, 'Variable1': Variable1, 'Variable2': Variable2}
Note that Variable1 and Variable2 have previously been involved in another formula. Therefore, they have been numbered according to this formula, and this numbering is invalid for the new expression expr5. An error is triggered:
try:
    expr5.get_value_c(database=my_data)
except excep.BiogemeError as e:
    print(e)
Expression evaluated out of context. Set prepare_ids to True.
expr5.get_value_c(database=my_data, prepare_ids=True)
array([-0.12758001, -0.12758001, -0.12758001, 1.6 , 2. ])
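The row-by-row selection can be mimicked in plain Python with a dictionary of functions indexed by the value of Person. The function and variable names below are ours, chosen for illustration; the parameter values are those in use at this point of the script, and only the database columns needed by the two formulas are copied.

```python
import math

beta1, beta2, beta3, beta4 = 0.2, 0.4, 3, 2
denominator = beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4)


def first_formula(row: dict) -> float:
    """Plain-Python counterpart of expr1 (it does not depend on the row)."""
    return 2 * beta1 - math.exp(-beta2) / denominator


def second_formula(row: dict) -> float:
    """Plain-Python counterpart of expr2."""
    return (
        2 * beta1 * row['Variable1']
        - math.exp(-beta2 * row['Variable2']) / denominator
    )


formulas_by_key = {1: first_formula, 2: second_formula}
rows = [
    {'Person': 1, 'Variable1': 10, 'Variable2': 100},
    {'Person': 1, 'Variable1': 20, 'Variable2': 200},
    {'Person': 1, 'Variable1': 30, 'Variable2': 300},
    {'Person': 2, 'Variable1': 40, 'Variable2': 400},
    {'Person': 2, 'Variable1': 50, 'Variable2': 500},
]

# For each row, the key selects the formula; the result is divided by 10.
values = [formulas_by_key[row['Person']](row) / 10 for row in rows]
print(values)
```

The first three rows use the first formula and the last two use the second one, which reproduces the pattern of the array above.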
testElem = ex.MonteCarlo(ex.Elem({1: my_draws * my_draws}, 1))
testElem.audit()
([], [])
The next expression is simply the sum of multiple expressions. The argument is a list of expressions.
expr6 = ex.bioMultSum([expr1, expr2, expr4])
print(expr6)
bioMultSum(((`2.0` * Beta('beta1', 0.2, None, None, 0)) - (exp((-Beta('beta2', 0.4, None, None, 0))) / ((Beta('beta2', 0.4, None, None, 0) * (Beta('beta3', 3, None, None, 1) >= Beta('beta4', 2, None, None, 1))) + (Beta('beta1', 0.2, None, None, 0) * (Beta('beta3', 3, None, None, 1) < Beta('beta4', 2, None, None, 1)))))), (((`2.0` * Beta('beta1', 0.2, None, None, 0)) * Variable1) - (exp(((-Beta('beta2', 0.4, None, None, 0)) * Variable2)) / ((Beta('beta2', 0.4, None, None, 0) * (Beta('beta3', 3, None, None, 1) >= Beta('beta4', 2, None, None, 1))) + (Beta('beta1', 0.2, None, None, 0) * (Beta('beta3', 3, None, None, 1) < Beta('beta4', 2, None, None, 1)))))), Integrate(((((`0.0` + (`1.0` / (`1.0` + exp((-omega))))) * (`0.0` + (`1.0` / (`1.0` + exp((-omega)))))) * ((`1.0` * exp((-omega))) * (`1.0` + exp((-omega)))**-2.0)) / `1.0`), "omega"))
expr6.get_value_c(database=my_data, number_of_draws=NUMBER_OF_DRAWS, prepare_ids=True)
array([ 3.0575322, 7.0575322, 11.0575322, 15.0575322, 19.0575322])
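We now illustrate how to calculate a logit model. The expression below is the logarithm of the choice probability of alternative 1, that is
\[\log \frac{y_1 e^{V_1}}{y_0 e^{V_0} + y_1 e^{V_1} + y_2 e^{V_2}},\]
where \(V_0=-\beta_1\), \(V_1=-\beta_2\) and \(V_2=-\beta_1\), and \(y_i = 1\), \(i=0,1,2\).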
V = {0: -beta1, 1: -beta2, 2: -beta1}
av = {0: 1, 1: 1, 2: 1}
expr7 = ex._bioLogLogit(V, av, 1)
expr7.get_value()
np.float64(-1.2362866960692134)
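The value can be checked by hand. The sketch below uses only the standard library and the parameter values 0.2 and 0.4 that appear in the printed expressions. It is a plain re-computation, not the way Biogeme evaluates the expression.

```python
import math

beta1, beta2 = 0.2, 0.4  # parameter values used in this example
V = {0: -beta1, 1: -beta2, 2: -beta1}
av = {0: 1, 1: 1, 2: 1}
chosen = 1

# Log of the logit probability: V_chosen - log(sum_i av_i * exp(V_i))
denominator = sum(av[i] * math.exp(V[i]) for i in V)
log_probability = V[chosen] - math.log(denominator)
print(log_probability)
```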
If the alternative is not in the choice set, an exception is raised.
expr7_wrong = ex.LogLogit(V, av, 3)
try:
    expr7_wrong.get_value()
except excep.BiogemeError as e:
    print(f'Exception: {e}')
Exception: Alternative 3 does not appear in the list of utility functions: dict_keys([0, 1, 2])
It is actually better to use the C++ implementation, available in the module models.
expr8 = models.loglogit(V, av, 1)
expr8.get_value_c(database=my_data, prepare_ids=True)
array([-1.2362867, -1.2362867, -1.2362867, -1.2362867, -1.2362867])
As the result is a numpy array, it can be used for any calculation. Here, we show how to calculate the logsum.
for v in V.values():
    print(v.get_value_c(database=my_data, prepare_ids=True))
[-0.2 -0.2 -0.2 -0.2 -0.2]
[-0.4 -0.4 -0.4 -0.4 -0.4]
[-0.2 -0.2 -0.2 -0.2 -0.2]
logsum = np.log(
    np.sum(
        [np.exp(v.get_value_c(database=my_data, prepare_ids=True)) for v in V.values()],
        axis=1,
    )
)
logsum
array([1.40943791, 1.20943791, 1.40943791])
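In the call above, the list holds one array per alternative, so that axis=1 sums over the five rows for each alternative, which gives three values. If a logsum per observation is wanted instead, the sum must run over the alternatives within each row. A minimal sketch with the standard library, using the utilities printed above:

```python
import math

# One list per alternative, one entry per row (values printed above).
utilities = [
    [-0.2] * 5,
    [-0.4] * 5,
    [-0.2] * 5,
]

# zip(*utilities) iterates over the rows; each item contains the
# utilities of all alternatives for that row.
logsum_per_row = [
    math.log(sum(math.exp(v) for v in row)) for row in zip(*utilities)
]
print(logsum_per_row)
```

With NumPy arrays organized as above, the same per-row quantity would be obtained with axis=0.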
It is possible to calculate the derivative of a formula with respect to a literal:
expr9 = ex.Derive(expr8, 'beta2')
expr9.get_value_c(database=my_data, prepare_ids=True)
array([-0.70953921, -0.70953921, -0.70953921, -0.70953921, -0.70953921])
expr9.elementaryName
'beta2'
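The derivative reported above can be verified independently. The sketch below re-implements the log of the logit probability in plain Python (the function `loglogit` is ours, not the one from the module models) and compares a central finite difference with the analytical derivative, which is the probability of the chosen alternative minus one.

```python
import math


def loglogit(beta1, beta2):
    # Log probability of alternative 1, with V = (-beta1, -beta2, -beta1)
    utilities = [-beta1, -beta2, -beta1]
    return utilities[1] - math.log(sum(math.exp(v) for v in utilities))


beta1, beta2 = 0.2, 0.4
step = 1e-6

# Central finite difference with respect to beta2
numerical = (loglogit(beta1, beta2 + step) - loglogit(beta1, beta2 - step)) / (
    2 * step
)

# Analytical derivative: -1 + P(alternative 1)
analytical = -1 + math.exp(loglogit(beta1, beta2))
print(numerical, analytical)
```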
Biogeme also provides an approximation of the CDF of the normal distribution:
expr10 = ex.bioNormalCdf(Variable1 / 10 - 1)
expr10.get_value_c(database=my_data, prepare_ids=True)
array([0.5 , 0.84134475, 0.97724987, 0.9986501 , 0.99996833])
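For comparison, the CDF of the standard normal distribution can be written with the error function of the standard library. In the sketch below, the values of Variable1 are an assumption inferred from the output above, where the arguments of the CDF appear to be 0, 1, 2, 3 and 4.

```python
import math


def normal_cdf(x):
    # Standard normal CDF expressed with the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


variable1 = [10, 20, 30, 40, 50]  # assumed values, inferred from the output
reference = [normal_cdf(v / 10 - 1) for v in variable1]
print(reference)
```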
Min and max operators are also available. To avoid any ambiguity with the Python built-in functions min and max, they are called bioMin and bioMax.
expr11 = ex.bioMin(expr5, expr10)
expr11.get_value_c(database=my_data, prepare_ids=True)
array([-0.12758001, -0.12758001, -0.12758001, 0.9986501 , 0.99996833])
expr12 = ex.bioMax(expr5, expr10)
expr12.get_value_c(database=my_data, prepare_ids=True)
array([0.5 , 0.84134475, 0.97724987, 1.6 , 2. ])
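Both operators work row by row. The sketch below applies the Python built-in functions to the values printed earlier for expr5 and expr10. It only reproduces the arithmetic and does not involve Biogeme.

```python
# Values printed above for expr5 and expr10
expr5_values = [-0.12758001, -0.12758001, -0.12758001, 1.6, 2.0]
expr10_values = [0.5, 0.84134475, 0.97724987, 0.9986501, 0.99996833]

# The comparison is made separately for each row of the database.
row_min = [min(a, b) for a, b in zip(expr5_values, expr10_values)]
row_max = [max(a, b) for a, b in zip(expr5_values, expr10_values)]
print(row_min)
print(row_max)
```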
For the sake of efficiency, it is possible to specify explicitly a linear function, where each term is the product of a parameter and a variable.
terms = [
    LinearTermTuple(beta=beta1, x=ex.Variable('Variable1')),
    LinearTermTuple(beta=beta2, x=ex.Variable('Variable2')),
    LinearTermTuple(beta=beta3, x=ex.Variable('newvar_b')),
]
expr13 = ex.bioLinearUtility(terms)
expr13.get_value_c(database=my_data, prepare_ids=True)
array([ 372., 744., 1116., 1488., 1860.])
In terms of specification, it is equivalent to the expression below. But the calculation of the derivatives is more efficient, as the linear structure of the specification is exploited.
expr13bis = beta1 * Variable1 + beta2 * Variable2 + beta3 * newvar_b
expr13bis.get_value_c(database=my_data, prepare_ids=True)
array([ 372., 744., 1116., 1488., 1860.])
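To see why the linear structure helps, note that the derivative of a linear utility with respect to each parameter is simply the associated variable, so that no expression tree has to be differentiated. A minimal sketch in plain Python follows. The values for one row are assumptions chosen for illustration. They are not read from the database.

```python
# Assumed values for one row: (parameter value, variable value) pairs
terms = [(0.2, 10.0), (0.4, 100.0), (3.0, 110.0)]

# Value of the linear utility: sum of beta * x over the terms
value = sum(beta * x for beta, x in terms)

# Gradient with respect to the parameters: the variables themselves
gradient = [x for _, x in terms]
print(value, gradient)
```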
A Pythonic way to write a linear utility function.
variables = ['v1', 'v2', 'v3', 'cost', 'time', 'headway']
coefficients = {
    v: ex.Beta(f'beta_{v}', 0, None, None, 0)
    for v in variables
}
terms = [coefficients[v] * ex.Variable(v) for v in variables]
util = sum(terms)
print(util)
((((((`0.0` + (Beta('beta_v1', 0, None, None, 0) * v1)) + (Beta('beta_v2', 0, None, None, 0) * v2)) + (Beta('beta_v3', 0, None, None, 0) * v3)) + (Beta('beta_cost', 0, None, None, 0) * cost)) + (Beta('beta_time', 0, None, None, 0) * time)) + (Beta('beta_headway', 0, None, None, 0) * headway))
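The leading `0.0` in the printed formula comes from the Python function sum, which starts the summation with the integer 0. The toy class below is not the Biogeme implementation. It is a hypothetical stand-in that only records the text of the formula, in order to show the mechanism and how functools.reduce avoids the initial term.

```python
import operator
from functools import reduce


class Term:
    """Toy expression that only records the text of the formula."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Term(f'({self.text} + {as_text(other)})')

    def __radd__(self, other):
        # Called for 0 + Term, which is the first operation of sum()
        return Term(f'({as_text(other)} + {self.text})')


def as_text(item):
    # Numbers are displayed between backquotes, as in the output above.
    return item.text if isinstance(item, Term) else f'`{float(item)}`'


terms = [Term('a'), Term('b'), Term('c')]
with_sum = sum(terms).text
with_reduce = reduce(operator.add, terms).text
print(with_sum)
print(with_reduce)
```

Both formulas have the same value. The version built with reduce is simply shorter by one term.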
If the data is organized as panel data, several rows correspond to the same individual. The expression PanelLikelihoodTrajectory calculates the product of the expression evaluated for each row of an individual. If Monte Carlo integration is involved, the same draws are used for each of these rows.
Our database contains 5 observations.
my_data.get_sample_size()
5
my_data.panel('Person')
Once the data has been labeled as “panel”, the sample is considered to consist of only two observations, one for each person. Each of them is associated with several rows of the database.
my_data.get_sample_size()
2
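The product calculated for each individual can be sketched in plain Python. The identifiers of the persons follow the database of this example (three rows for person 1, two rows for person 2), while the probabilities associated with each row are hypothetical values chosen for illustration.

```python
# Person identifier of each row, and hypothetical per-row probabilities
person = [1, 1, 1, 2, 2]
row_probability = [0.5, 0.4, 0.9, 0.8, 0.25]

# Product of the row values for each individual (the "trajectory")
trajectory = {}
for individual, probability in zip(person, row_probability):
    trajectory[individual] = trajectory.get(individual, 1.0) * probability

print(len(trajectory))
print(trajectory)
```

The dictionary contains one entry per individual, which is consistent with the sample size of 2 reported above.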
If we try to evaluate again the integral \(\int_0^1 x^2 dx=\frac{1}{3}\), an exception is raised.
try:
    expr3.get_value_c(database=my_data)
except excep.BiogemeError as e:
    print(f'Exception: {e}')
[WARNING] 2024-08-05 19:59:53,843 As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("my_draws", "UNIFORM") * bioDraws("my_draws", "UNIFORM"))) <base_expressions.py:1064>
Exception: As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("my_draws", "UNIFORM") * bioDraws("my_draws", "UNIFORM")))
This is detected by the audit function, called before the expression is evaluated.
expr3.audit(database=my_data)
(['As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("my_draws", "UNIFORM") * bioDraws("my_draws", "UNIFORM")))'], [])
We now evaluate an expression for panel data.
c1 = ex.bioDraws('draws1', 'NORMAL_HALTON2')
c2 = ex.bioDraws('draws2', 'NORMAL_HALTON2')
U1 = ex.Beta('beta1', 0, None, None, 0) * Variable1 + 10 * c1
U2 = ex.Beta('beta2', 0, None, None, 0) * Variable2 + 10 * c2
U3 = 0
U = {1: U1, 2: U2, 3: U3}
av = {1: Av1, 2: Av2, 3: Av3}
expr14 = ex.log(
    ex.MonteCarlo(ex.PanelLikelihoodTrajectory(models.logit(U, av, Choice)))
)
expr14.prepare(database=my_data, number_of_draws=NUMBER_OF_DRAWS)
expr14
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, None, None, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('beta2', 0, None, None, 0) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))
expr14.get_value_c(database=my_data, number_of_draws=NUMBER_OF_DRAWS, prepare_ids=True)
array([-3.91914292, -2.11209896])
expr14.get_value_and_derivatives(
    database=my_data,
    number_of_draws=NUMBER_OF_DRAWS,
    gradient=True,
    hessian=True,
    aggregation=False,
)
<biogeme.function_output.BiogemeDisaggregateFunctionOutputSmartOutputProxy object at 0x1213e9a90>
expr14.get_value_and_derivatives(
    database=my_data,
    number_of_draws=NUMBER_OF_DRAWS,
    gradient=True,
    hessian=True,
    aggregation=True,
)
<biogeme.function_output.BiogemeFunctionOutputSmartOutputProxy object at 0x1337f5970>
A Python function can also be obtained for this expression. Note that it is available only for the aggregated version, summing over the database.
the_function = expr14.create_function(
    database=my_data, number_of_draws=NUMBER_OF_DRAWS, gradient=True, hessian=True
)
the_function([0, 0])
the_function([0.1, 0.1])
It is possible to fix the value of some (or all) Beta parameters. Once fixed, a parameter takes the specified value and its status changes from 0 to 1.
print(expr14)
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, None, None, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('beta2', 0, None, None, 0) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))
expr14.fix_betas({'beta2': 0.123})
print(expr14)
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, None, None, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('beta2', 0.123, None, None, 1) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))
The name of the parameter can also be changed while fixing its value.
expr14.fix_betas({'beta2': 123}, prefix='prefix_', suffix='_suffix')
print(expr14)
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, None, None, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('prefix_beta2_suffix', 123, None, None, 1) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))
It can also be renamed using the following function.
expr14.rename_elementary(['prefix_beta2_suffix'], prefix='PREFIX_', suffix='_SUFFIX')
print(expr14)
log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((Beta('beta1', 0, None, None, 0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((Beta('PREFIX_prefix_beta2_suffix_SUFFIX', 123, None, None, 1) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))
Signatures
The Python library communicates the expressions to the C++ library using a syntax called a “signature”. We describe and illustrate now the signature for each expression. Each expression is identified by an identifier provided by Python using the function ‘id’.
id(expr1)
4852727728
Numerical expression
<Numeric>{identifier},0.0
ex.Numeric(0).get_signature()
[b'<Numeric>{5158946768},0.0']
Beta parameters
<Beta>{identifier}"name"[status],uniqueId,betaId where
status is 0 for free parameters, and non zero for fixed parameters,
uniqueId is a unique index given by Biogeme to all elementary expressions,
betaId is an index given by Biogeme to the parameter. Free parameters and fixed parameters are numbered separately, so that the index is unique among the free parameters, and unique among the fixed parameters.
As the signature requires an Id, we need to prepare the expression first.
beta1.prepare(database=my_data, number_of_draws=0)
beta1.get_signature()
[b'<Beta>{4852715824}"beta1"[0],0,0']
beta3.prepare(database=my_data, number_of_draws=0)
beta3.get_signature()
[b'<Beta>{4852723504}"beta3"[1],0,0']
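The fields of such a signature can be extracted with a regular expression. The pattern below is a sketch written from the description above. The group names are ours and are not part of Biogeme.

```python
import re

# Pattern following the description above; the group names are our own.
BETA_PATTERN = re.compile(
    rb'<Beta>\{(?P<identifier>\d+)\}"(?P<name>[^"]+)"'
    rb'\[(?P<status>\d+)\],(?P<unique_id>\d+),(?P<beta_id>\d+)'
)

signature = b'<Beta>{4852723504}"beta3"[1],0,0'
match = BETA_PATTERN.fullmatch(signature)

# The signature is a bytes object, so each field is decoded to text.
fields = {key: value.decode() for key, value in match.groupdict().items()}
print(fields)
```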
Variables
<Variable>{identifier}"name",uniqueId,variableId where
uniqueId is a unique index given by Biogeme to all elementary expressions,
variableId is a unique index given by Biogeme to all variables.
Variable1.get_signature()
[b'<Variable>{5158109008}"Variable1",6,2']
Random variables
<RandomVariable>{identifier}"name",uniqueId,randomVariableId where
uniqueId is a unique index given by Biogeme to all elementary expressions,
randomVariableId is a unique index given by Biogeme to all random variables.
omega.prepare(database=my_data, number_of_draws=0)
omega.get_signature()
[b'<RandomVariable>{4907788688}"omega",0,0']
Draws
<bioDraws>{identifier}"name",uniqueId,drawId where
uniqueId is a unique index given by Biogeme to all elementary expressions,
drawId is a unique index given by Biogeme to all draws.
my_draws.prepare(database=my_data, number_of_draws=NUMBER_OF_DRAWS)
my_draws.get_signature()
[b'<bioDraws>{4907793248}"my_draws",0,0']
General expression
<operator>{identifier}(numberOfChildren),idFirstChild,idSecondChild,idThirdChild, etc…
where the number of identifiers given after the comma matches the reported number of children.
Specific examples are reported below.
Binary operator
<operator>{identifier}(2),idFirstChild,idSecondChild where operator is one of:
Plus
Minus
Times
Divide
Power
bioMin
bioMax
And
Or
Equal
NotEqual
LessOrEqual
GreaterOrEqual
Less
Greater
the_sum = beta1 + Variable1
the_sum.get_signature()
[b'<Beta>{4852715824}"beta1"[0],0,0', b'<Variable>{5158109008}"Variable1",6,2', b'<Plus>{5158954976}(2),4852715824,5158109008']
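The list above shows that the children appear before their parent, so that the formula can be rebuilt in a single pass. The following sketch does so for the signature of the_sum. It relies only on the formats described in this section, and handles only elementary expressions with a quoted name and general expressions with children.

```python
import re

signatures = [
    b'<Beta>{4852715824}"beta1"[0],0,0',
    b'<Variable>{5158109008}"Variable1",6,2',
    b'<Plus>{5158954976}(2),4852715824,5158109008',
]

names = {}  # identifier -> readable description
for signature in signatures:
    text = signature.decode()
    operator_name, identifier = re.match(r'<(\w+)>\{(\d+)\}', text).groups()
    elementary = re.search(r'"([^"]+)"', text)
    if elementary:
        # Elementary expression: keep its name
        names[identifier] = elementary.group(1)
    else:
        # General expression: "(n),child1,...,childn" follows the identifier
        children = text.split(')', 1)[1].lstrip(',').split(',')
        arguments = ', '.join(names[child] for child in children)
        names[identifier] = f'{operator_name}({arguments})'

print(names['5158954976'])
```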
Unary operator
<operator>{identifier}(1),idChild, where operator is one of:
UnaryMinus
MonteCarlo
bioNormalCdf
PanelLikelihoodTrajectory
exp
log
m = -beta1
m.get_signature()
[b'<Beta>{4852715824}"beta1"[0],0,0', b'<UnaryMinus>{5158952480}(1),4852715824']
LogLogit
<LogLogit>{identifier}(nbrOfAlternatives), chosenAlt,altNumber,utility,availability,altNumber,utility,availability, etc.
expr7.prepare(database=my_data, number_of_draws=NUMBER_OF_DRAWS)
expr7.get_signature()
[b'<Numeric>{5158250704},1.0', b'<Beta>{4852715824}"beta1"[0],0,0', b'<UnaryMinus>{5158253968}(1),4852715824', b'<Beta>{4852723072}"beta2"[0],1,1', b'<UnaryMinus>{5158249600}(1),4852723072', b'<Beta>{4852715824}"beta1"[0],0,0', b'<UnaryMinus>{5158248736}(1),4852715824', b'<Numeric>{5158256224},1.0', b'<Numeric>{5158255120},1.0', b'<Numeric>{5158244800},1.0', b'<_bioLogLogit>{5158254496}(3),5158250704,0,5158253968,5158256224,1,5158249600,5158255120,2,5158248736,5158244800']
Derive
<Derive>{identifier},id of expression to derive,unique index of elementary expression
expr9.prepare(database=my_data, number_of_draws=NUMBER_OF_DRAWS)
expr9.get_signature()
[b'<Numeric>{5158253584},1.0', b'<Beta>{4852715824}"beta1"[0],0,0', b'<UnaryMinus>{5158253968}(1),4852715824', b'<Beta>{4852723072}"beta2"[0],1,1', b'<UnaryMinus>{5158249600}(1),4852723072', b'<Beta>{4852715824}"beta1"[0],0,0', b'<UnaryMinus>{5158248736}(1),4852715824', b'<Numeric>{5158254928},1.0', b'<Numeric>{5158244848},1.0', b'<Numeric>{5158255024},1.0', b'<_bioLogLogit>{5158253728}(3),5158253584,0,5158253968,5158254928,1,5158249600,5158244848,2,5158248736,5158255024', b'<Derive>{5158788352},5158253728,1']
Integrate
<Integrate>{identifier},id of expression to integrate,index of random variable
expr4.prepare(database=my_data, number_of_draws=NUMBER_OF_DRAWS)
expr4.get_signature()
[b'<Numeric>{5158245184},0.0', b'<Numeric>{5158249072},1.0', b'<Numeric>{5158246864},1.0', b'<RandomVariable>{4907788688}"omega",0,0', b'<UnaryMinus>{5158246240}(1),4907788688', b'<exp>{5158243840}(1),5158246240', b'<Plus>{5158250992}(2),5158246864,5158243840', b'<Divide>{5158251232}(2),5158249072,5158250992', b'<Plus>{5158249552}(2),5158245184,5158251232', b'<Numeric>{5158245184},0.0', b'<Numeric>{5158249072},1.0', b'<Numeric>{5158246864},1.0', b'<RandomVariable>{4907788688}"omega",0,0', b'<UnaryMinus>{5158246240}(1),4907788688', b'<exp>{5158243840}(1),5158246240', b'<Plus>{5158250992}(2),5158246864,5158243840', b'<Divide>{5158251232}(2),5158249072,5158250992', b'<Plus>{5158249552}(2),5158245184,5158251232', b'<Times>{5158253008}(2),5158249552,5158249552', b'<Numeric>{5158254304},1.0', b'<RandomVariable>{4907788688}"omega",0,0', b'<UnaryMinus>{5158250368}(1),4907788688', b'<exp>{5158255552}(1),5158250368', b'<Times>{5158253872}(2),5158254304,5158255552', b'<Numeric>{5158245760},1.0', b'<RandomVariable>{4907788688}"omega",0,0', b'<UnaryMinus>{5158252672}(1),4907788688', b'<exp>{5158251472}(1),5158252672', b'<Plus>{5158247392}(2),5158245760,5158251472', b'<PowerConstant>{5158248256},5158247392,-2.0', b'<Times>{5158256176}(2),5158253872,5158248256', b'<Times>{5159599616}(2),5158253008,5158256176', b'<Numeric>{5158245232},1.0', b'<Divide>{5158245328}(2),5159599616,5158245232', b'<Integrate>{5158250752},5158245328,0']
Elem
<Elem>{identifier}(number_of_expressions),keyId,value1,expression1,value2,expression2, etc…
where
keyId is the identifier of the expression calculating the key,
the number of pairs valuex,expressionx must correspond to the value of number_of_expressions
elemExpr.prepare(database=my_data, number_of_draws=NUMBER_OF_DRAWS)
elemExpr.get_signature()
[b'<Variable>{4907717680}"Person",4,0', b'<Numeric>{4852712224},2.0', b'<Beta>{4852715824}"beta1"[0],0,0', b'<Times>{4852724368}(2),4852712224,4852715824', b'<Beta>{4852723072}"beta2"[0],1,1', b'<UnaryMinus>{4852724272}(1),4852723072', b'<exp>{4852720432}(1),4852724272', b'<Beta>{4852723072}"beta2"[0],1,1', b'<Beta>{4852723504}"beta3"[1],2,0', b'<Beta>{4852716160}"beta4"[1],3,1', b'<GreaterOrEqual>{4852724464}(2),4852723504,4852716160', b'<Times>{4852714528}(2),4852723072,4852724464', b'<Beta>{4852715824}"beta1"[0],0,0', b'<Beta>{4852723504}"beta3"[1],2,0', b'<Beta>{4852716160}"beta4"[1],3,1', b'<Less>{4852727152}(2),4852723504,4852716160', b'<Times>{4852713520}(2),4852715824,4852727152', b'<Plus>{4852720192}(2),4852714528,4852713520', b'<Divide>{4852714816}(2),4852720432,4852720192', b'<Minus>{4852727728}(2),4852724368,4852714816', b'<Numeric>{4907792336},2.0', b'<Beta>{4852715824}"beta1"[0],0,0', b'<Times>{4907791952}(2),4907792336,4852715824', b'<Variable>{5158109008}"Variable1",6,2', b'<Times>{4907784416}(2),4907791952,5158109008', b'<Beta>{4852723072}"beta2"[0],1,1', b'<UnaryMinus>{4907790896}(1),4852723072', b'<Variable>{5158099264}"Variable2",7,3', b'<Times>{4907783984}(2),4907790896,5158099264', b'<exp>{4907786432}(1),4907783984', b'<Beta>{4852723072}"beta2"[0],1,1', b'<Beta>{4852723504}"beta3"[1],2,0', b'<Beta>{4852716160}"beta4"[1],3,1', b'<GreaterOrEqual>{4907790608}(2),4852723504,4852716160', b'<Times>{4907791520}(2),4852723072,4907790608', b'<Beta>{4852715824}"beta1"[0],0,0', b'<Beta>{4852723504}"beta3"[1],2,0', b'<Beta>{4852716160}"beta4"[1],3,1', b'<Less>{4907791616}(2),4852723504,4852716160', b'<Times>{4907794208}(2),4852715824,4907791616', b'<Plus>{4907782832}(2),4907791520,4907794208', b'<Divide>{4907792912}(2),4907786432,4907782832', b'<Minus>{4907780432}(2),4907784416,4907792912', b'<Elem>{4907788640}(2),4907717680,1,4852727728,2,4907780432']
bioLinearUtility
<bioLinearUtility>{identifier}(numberOfTerms), beta1_exprId, beta1_uniqueId, beta1_name, variable1_exprId, variable1_uniqueId, variable1_name, etc…
where 6 entries are provided for each term:
beta1_exprId is the expression id of the Beta parameter
beta1_uniqueId is the unique id of the Beta parameter
beta1_name is the name of the parameter
variable1_exprId is the expression id of the variable
variable1_uniqueId is the unique id of the variable
variable1_name is the name of the variable
expr13.prepare(database=my_data, number_of_draws=NUMBER_OF_DRAWS)
expr13.get_signature()
[b'<Beta>{4852715824}"beta1"[0],0,0', b'<Beta>{4852723072}"beta2"[0],1,1', b'<Beta>{4852723504}"beta3"[1],2,0', b'<Variable>{5158243504}"Variable1",5,2', b'<Variable>{5158243168}"Variable2",6,3', b'<Variable>{5158244176}"newvar_b",11,8', b'<bioLinearUtility>{5158247008}(3),4852715824,0,beta1,5158243504,5,Variable1,4852723072,1,beta2,5158243168,6,Variable2,4852723504,2,beta3,5158244176,11,newvar_b']
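The last entry of this list can be split into its terms, using the fact that six entries are provided for each of them. This is a sketch based on the format described above, applied to the signature printed by the example.

```python
signature = (
    b'<bioLinearUtility>{5158247008}(3),4852715824,0,beta1,5158243504,5,Variable1,'
    b'4852723072,1,beta2,5158243168,6,Variable2,'
    b'4852723504,2,beta3,5158244176,11,newvar_b'
)

# The header ends with "(numberOfTerms)"; the entries follow the comma.
header, _, body = signature.decode().partition('),')
number_of_terms = int(header.rsplit('(', 1)[1])
entries = body.split(',')

# Six entries per term: three for the parameter, three for the variable
terms = [entries[i : i + 6] for i in range(0, len(entries), 6)]
for beta_expr_id, beta_unique_id, beta_name, x_expr_id, x_unique_id, x_name in terms:
    print(f'{beta_name} * {x_name}')
```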