Note

Go to the end to download the full example code.

biogeme.results

Examples of use of several functions.

This is designed for programmers who need examples of use of the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model.

author:: Michel Bierlaire
date:: Wed Nov 29 09:44:41 2023

import pandas as pd
from biogeme.version import get_text
import biogeme.biogeme as bio
import biogeme.database as db
import biogeme.results as res
from biogeme.expressions import Beta, Variable, exp

Version of Biogeme.

print(get_text())

biogeme 3.2.14 [2024-08-05]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

Definition of a database

df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [1, 2, 3, 4, 5],
        'Variable2': [10, 20, 30, 40, 50],
        'Choice': [1, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
df

	Person	Exclude	Variable1	Variable2	Choice	Av1	Av2	Av3
0	1	0	1	10	1	0	1	0
1	1	0	2	20	2	1	1	1
2	1	1	3	30	3	1	1	1
3	2	0	4	40	1	1	1	1
4	2	1	5	50	2	1	1	1

my_data = db.Database('test', df)

Definition of various expressions

Variable1 = Variable('Variable1')
Variable2 = Variable('Variable2')
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
likelihood = -(beta1**2) * Variable1 - exp(beta2 * beta1) * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dict_of_expressions = {'log_like': likelihood, 'beta1': beta1, 'simul': simul}

Creation of the BIOGEME object

my_biogeme = bio.BIOGEME(my_data, dict_of_expressions)
my_biogeme.modelName = 'simple_example'
my_biogeme.bootstrap_samples = 10
results = my_biogeme.estimate(run_bootstrap=True)
print(results)

  0%|          | 0/10 [00:00<?, ?it/s]
100%|██████████| 10/10 [00:00<00:00, 440.33it/s]

Results for model simple_example
Output file (HTML):                     simple_example~00.html
Nbr of parameters:              2
Sample size:                    5
Excluded data:                  0
Init log likelihood:            -67.08858
Final log likelihood:           -67.06549
Likelihood ratio test (init):           0.04618125
Rho square (init):                      0.000344
Rho bar square (init):                  -0.0295
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
Final gradient norm:            3.900312e-07
beta1          : -1.27[0.115 -11.1 0][0.0137 -92.8 0][0.0111 -115 0]
beta2          : 1.25[0.0848 14.7 0][0.0591 21.1 0][0.0478 26.1 0]
('beta2', 'beta1'):     0.00167 0.171   19.3    0       0.000811        1       55.6    0

Dump results on a file

the_pickle_file = results.write_pickle()
print(the_pickle_file)

simple_example~01.pickle

Results can be imported from a file previously generated

read_results = res.bioResults(pickle_file=the_pickle_file)
print(read_results)

Results for model simple_example
Output file (HTML):                     simple_example~00.html
Nbr of parameters:              2
Sample size:                    5
Excluded data:                  0
Init log likelihood:            -67.08858
Final log likelihood:           -67.06549
Likelihood ratio test (init):           0.04618125
Rho square (init):                      0.000344
Rho bar square (init):                  -0.0295
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
Final gradient norm:            3.900312e-07
beta1          : -1.27[0.115 -11.1 0][0.0137 -92.8 0][0.0111 -115 0]
beta2          : 1.25[0.0848 14.7 0][0.0591 21.1 0][0.0478 26.1 0]
('beta2', 'beta1'):     0.00167 0.171   19.3    0       0.000811        1       55.6    0

Results can be formatted in LaTeX

print(read_results.get_latex())

%% This file is designed to be included into a LaTeX document
%% See http://www.latex-project.org for information about LaTeX
%% simple_example - Report from biogeme 3.2.14 [2024-08-05]
%% biogeme 3.2.14 [2024-08-05]
%% Version entirely written in Python
%% Home page: http://biogeme.epfl.ch
%% Submit questions to https://groups.google.com/d/forum/biogeme
%% Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

%% This file has automatically been generated on 2024-08-05 20:01:27.541772</p>

%%Database name: test

%% General statistics
\section{General statistics}
\begin{tabular}{ll}
Number of estimated parameters & 2 \\
Sample size & 5 \\
Excluded observations & 0 \\
Init log likelihood & -67.08858 \\
Final log likelihood & -67.06549 \\
Likelihood ratio test for the init. model & 0.04618125 \\
Rho-square for the init. model & 0.000344 \\
Rho-square-bar for the init. model & -0.0295 \\
Akaike Information Criterion & 138.131 \\
Bayesian Information Criterion & 137.3499 \\
Final gradient norm & 3.9003E-07 \\
Bootstrapping time & 0:00:00.023199 \\
Nbr of threads & 12 \\
Relative gradient & \verb$7.1957442139436635e-09$ \\
Cause of termination & \verb$Relative gradient = 7.2e-09 <= 0.00012$ \\
Number of function evaluations & \verb$3$ \\
Number of gradient evaluations & \verb$3$ \\
Number of hessian evaluations & \verb$2$ \\
Algorithm & \verb$Newton with trust region for simple bound constraints$ \\
Number of iterations & \verb$2$ \\
Proportion of Hessian calculation & \verb$2/2 = 100.0%$ \\
Optimization time & \verb$0:00:00.003449$ \\
\end{tabular}

%%Parameter estimates
\section{Parameter estimates}
\begin{tabular}{lrrrr}
 & Value & Rob. Std err & Rob. t-test & Rob. p-value \\
beta1 & -1.27 & 0.0137 & -92.8 & 0.0 \\
beta2 & 1.25 & 0.0591 & 21.1 & 0.0 \\
\end{tabular}

%%Correlation
\section{Correlation}
\begin{tabular}{lrrrrrrrrrrrr}
 & Covariance & Correlation & t-test & p-value & Rob. cov. & Rob. corr. & Rob. t-test & Rob. p-value & Boot. cov. & Boot. corr. & Boot. t-test & Boot. p-value \\
beta2-beta1 & 0.00167 & 0.171 & 19.3 & 0.0 & 0.000811 & 1.0 & 55.6 & 0.0 & 0.000531 & 1.0 & 68.6 & 0.0 \\
\end{tabular}

Results can be formatted in HTML

print(read_results.get_html())

<html>
<head>
<script src="http://transp-or.epfl.ch/biogeme/sorttable.js"></script>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>simple_example - Report from biogeme 3.2.14 [2024-08-05]</title>
<meta name="keywords" content="biogeme, discrete choice, random utility">
<meta name="description" content="Report from biogeme 3.2.14 [2024-08-05]">
<meta name="author" content="{bv.author}">
<style type=text/css>
.biostyle
        {font-size:10.0pt;
        font-weight:400;
        font-style:normal;
        font-family:Courier;}
.boundstyle
        {font-size:10.0pt;
        font-weight:400;
        font-style:normal;
        font-family:Courier;
        color:red}
</style>
</head>
<body bgcolor="#ffffff">
<p>biogeme 3.2.14 [2024-08-05]</p>
<p><a href="https://www.python.org/" target="_blank">Python</a> package</p>
<p>Home page: <a href="http://biogeme.epfl.ch" target="_blank">http://biogeme.epfl.ch</a></p>
<p>Submit questions to <a href="https://groups.google.com/d/forum/biogeme" target="_blank">https://groups.google.com/d/forum/biogeme</a></p>
<p><a href="http://people.epfl.ch/michel.bierlaire">Michel Bierlaire</a>, <a href="http://transp-or.epfl.ch">Transport and Mobility Laboratory</a>, <a href="http://www.epfl.ch">Ecole Polytechnique F&#233;d&#233;rale de Lausanne (EPFL)</a></p>
<p>This file has automatically been generated on 2024-08-05 20:01:27.544784</p>
<table>
<tr class=biostyle><td align=right><strong>Report file</strong>:        </td><td>simple_example~00.html</td></tr>
<tr class=biostyle><td align=right><strong>Database name</strong>:      </td><td>test</td></tr>
</table>
<h1>Estimation report</h1>
<table border="0">
<tr class=biostyle><td align=right ><strong>Number of estimated parameters</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Sample size</strong>: </td> <td>5</td></tr>
<tr class=biostyle><td align=right ><strong>Excluded observations</strong>: </td> <td>0</td></tr>
<tr class=biostyle><td align=right ><strong>Init log likelihood</strong>: </td> <td>-67.08858</td></tr>
<tr class=biostyle><td align=right ><strong>Final log likelihood</strong>: </td> <td>-67.06549</td></tr>
<tr class=biostyle><td align=right ><strong>Likelihood ratio test for the init. model</strong>: </td> <td>0.04618125</td></tr>
<tr class=biostyle><td align=right ><strong>Rho-square for the init. model</strong>: </td> <td>0.000344</td></tr>
<tr class=biostyle><td align=right ><strong>Rho-square-bar for the init. model</strong>: </td> <td>-0.0295</td></tr>
<tr class=biostyle><td align=right ><strong>Akaike Information Criterion</strong>: </td> <td>138.131</td></tr>
<tr class=biostyle><td align=right ><strong>Bayesian Information Criterion</strong>: </td> <td>137.3499</td></tr>
<tr class=biostyle><td align=right ><strong>Final gradient norm</strong>: </td> <td>3.9003E-07</td></tr>
<tr class=biostyle><td align=right ><strong>Bootstrapping time</strong>: </td> <td>0:00:00.023199</td></tr>
<tr class=biostyle><td align=right ><strong>Nbr of threads</strong>: </td> <td>12</td></tr>
<tr class=biostyle><td align=right ><strong>Relative gradient</strong>: </td> <td>7.1957442139436635e-09</td></tr>
<tr class=biostyle><td align=right ><strong>Cause of termination</strong>: </td> <td>Relative gradient = 7.2e-09 <= 0.00012</td></tr>
<tr class=biostyle><td align=right ><strong>Number of function evaluations</strong>: </td> <td>3</td></tr>
<tr class=biostyle><td align=right ><strong>Number of gradient evaluations</strong>: </td> <td>3</td></tr>
<tr class=biostyle><td align=right ><strong>Number of hessian evaluations</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Algorithm</strong>: </td> <td>Newton with trust region for simple bound constraints</td></tr>
<tr class=biostyle><td align=right ><strong>Number of iterations</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Proportion of Hessian calculation</strong>: </td> <td>2/2 = 100.0%</td></tr>
<tr class=biostyle><td align=right ><strong>Optimization time</strong>: </td> <td>0:00:00.003449</td></tr>
</table>
<h1>Estimated parameters</h1>
<table border="1">
<tr class=biostyle><th>Name</th><th>Value</th><th>Rob. Std err</th><th>Rob. t-test</th><th>Rob. p-value</th></tr>
<tr class=biostyle><td>beta1</td><td>-1.27</td><td>0.0137</td><td>-92.8</td><td>0</td></tr>
<tr class=biostyle><td>beta2</td><td>1.25</td><td>0.0591</td><td>21.1</td><td>0</td></tr>
</table>
<h2>Correlation of coefficients</h2>
<table border="1">
<tr class=biostyle><th>Coefficient1</th><th>Coefficient2</th><th>Covariance</th><th>Correlation</th><th>t-test</th><th>p-value</th><th>Rob. cov.</th><th>Rob. corr.</th><th>Rob. t-test</th><th>Rob. p-value</th><th>Boot. cov.</th><th>Boot. corr.</th><th>Boot. t-test</th><th>Boot. p-value</th></tr>
<tr class=biostyle><td>beta2</td><td>beta1</td><td>0.00167</td><td>0.171</td><td>19.3</td><td>0</td><td>0.000811</td><td>1</td><td>55.6</td><td>0</td><td>0.000531</td><td>1</td><td>68.6</td><td>0</td></tr>
</table>
<p>Smallest eigenvalue: 73.054</p>
<p>Largest eigenvalue: 147.802</p>
<p>Condition number: 2.02318</p>
</html>

General statistics, including a suggested format.

statistics = read_results.get_general_statistics()
statistics

{'Number of estimated parameters': GeneralStatistic(value=2, format=''), 'Sample size': GeneralStatistic(value=5, format=''), 'Excluded observations': GeneralStatistic(value=0, format=''), 'Init log likelihood': GeneralStatistic(value=-67.08858110351106, format='.7g'), 'Final log likelihood': GeneralStatistic(value=-67.06549047946356, format='.7g'), 'Likelihood ratio test for the init. model': GeneralStatistic(value=0.04618124809499591, format='.7g'), 'Rho-square for the init. model': GeneralStatistic(value=np.float64(0.00034418113586087706), format='.3g'), 'Rho-square-bar for the init. model': GeneralStatistic(value=np.float64(-0.029467151390522472), format='.3g'), 'Akaike Information Criterion': GeneralStatistic(value=138.13098095892713, format='.7g'), 'Bayesian Information Criterion': GeneralStatistic(value=np.float64(137.34985678379533), format='.7g'), 'Final gradient norm': GeneralStatistic(value=3.9003122659206894e-07, format='.4E'), 'Bootstrapping time': GeneralStatistic(value=datetime.timedelta(microseconds=23199), format=''), 'Nbr of threads': GeneralStatistic(value=12, format='')}

The suggested format can be used as follows

for k, (v, p) in statistics.items():
    print(f'{k}:\t{v:{p}}')

Number of estimated parameters: 2
Sample size:    5
Excluded observations:  0
Init log likelihood:    -67.08858
Final log likelihood:   -67.06549
Likelihood ratio test for the init. model:      0.04618125
Rho-square for the init. model: 0.000344
Rho-square-bar for the init. model:     -0.0295
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
Final gradient norm:    3.9003E-07
Bootstrapping time:     0:00:00.023199
Nbr of threads: 12

This result can be generated directly with the following function

print(results.print_general_statistics())

Number of estimated parameters: 2
Sample size:    5
Excluded observations:  0
Init log likelihood:    -67.08858
Final log likelihood:   -67.06549
Likelihood ratio test for the init. model:      0.04618125
Rho-square for the init. model: 0.000344
Rho-square-bar for the init. model:     -0.0295
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
Final gradient norm:    3.9003E-07
Bootstrapping time:     0:00:00.023199
Nbr of threads: 12

Estimated parameters as pandas dataframe

read_results.get_estimated_parameters()

	Value	Rob. Std err	Rob. t-test	Rob. p-value
beta1	-1.273264	0.013724	-92.776663	0.0
beta2	1.248769	0.059086	21.134794	0.0

Correlation results

read_results.get_correlation_results()

	Covariance	Correlation	t-test	p-value	Rob. cov.	Rob. corr.	Rob. t-test	Rob. p-value	Boot. cov.	Boot. corr.	Boot. t-test	Boot. p-value
beta2-beta1	0.001671	0.171121	19.280039	0.0	0.000811	1.0	55.597975	0.0	0.000531	0.999983	68.644336	0.0

Obtain the values of the parameters

read_results.get_beta_values()

{'beta1': np.float64(-1.2732639874991436), 'beta2': np.float64(1.2487688117902658)}

Obtain the value of one or several specific parameters

read_results.get_beta_values(my_betas=['beta2'])

{'beta2': np.float64(1.2487688117902658)}

Variance-covariance matrix (Rao-Cramer)

read_results.get_var_covar()

	beta1	beta2
beta1	0.013258	0.001671
beta2	0.001671	0.007196

Variance-covariance matrix (robust)

read_results.get_robust_var_covar()

	beta1	beta2
beta1	0.000188	0.000811
beta2	0.000811	0.003491

Variance-covaraince matrix (bootstrap)

read_results.get_bootstrap_var_covar()

	beta1	beta2
beta1	0.000123	0.000531
beta2	0.000531	0.002289

Draws for sensitivity analysis are generated using bootstrapping. Any indicator can be generated by the model for each draw, and its empirical distribution can be investigated.

read_results.get_betas_for_sensitivity_analysis(['beta1', 'beta2'], size=10)

[{'beta1': np.float64(-1.2873325336200956), 'beta2': np.float64(1.1875198317199467)}, {'beta1': np.float64(-1.2823937362836413), 'beta2': np.float64(1.209195603860152)}, {'beta1': np.float64(-1.2690260405244642), 'beta2': np.float64(1.2669668838392414)}, {'beta1': np.float64(-1.2823937362836415), 'beta2': np.float64(1.209195603860152)}, {'beta1': np.float64(-1.257397879951302), 'beta2': np.float64(1.31651208100832)}, {'beta1': np.float64(-1.2873325336200956), 'beta2': np.float64(1.1875198317199467)}, {'beta1': np.float64(-1.2690260405244642), 'beta2': np.float64(1.2669668838392414)}, {'beta1': np.float64(-1.2777126116342759), 'beta2': np.float64(1.229556814073192)}, {'beta1': np.float64(-1.257397879951302), 'beta2': np.float64(1.31651208100832)}, {'beta1': np.float64(-1.2732639874991436), 'beta2': np.float64(1.2487688117902658)}]

Results can be produced in the ALOGIT F12 format

print(read_results.get_f12())

                                                                 simple_example
From biogeme 3.2.14                                     2024-08-05 20:01:27
END
   0      beta1 F  -1.273263987499e+00 +1.372396830767e-02
   0      beta2 F  +1.248768811790e+00 +5.908592227624e-02
  -1
       5                  0                   0 -6.706549047946e+01
   0   0  2024-08-05 20:01:27
 100000

Miscellaneous functions

Likelihood ratio test. Let’s first estimate a constrained model

beta2_constrained = Beta('beta2_constrained', 2.0, -3, 10, 1)
likelihood_constrained = (
    -(beta1**2) * Variable1
    - exp(beta2_constrained * beta1) * Variable2
    - beta2_constrained**4
)
my_biogeme_constrained = bio.BIOGEME(my_data, likelihood_constrained)
my_biogeme_constrained.modelName = 'simple_example_constrained'
results_constrained = my_biogeme_constrained.estimate()
print(results_constrained.short_summary())

Results for model simple_example_constrained
Nbr of parameters:              1
Sample size:                    5
Excluded data:                  0
Final log likelihood:           -114.7702
Akaike Information Criterion:   231.5403
Bayesian Information Criterion: 231.1498

We can now perform a likelihood ratio test.

test_results = results.likelihood_ratio_test(results_constrained, 0.95)
print(test_results.message)
print(f'Statistic: {test_results.statistic}')
print(f'Threshold: {test_results.threshold}')

H0 can be rejected at level 95.0%
Statistic: 95.40936413216042
Threshold: 0.003932140000019531

Calculation of the \(p\)-value

res.calc_p_value(1.96)

np.float64(0.04999579029644097)

Compilation of results

dict_of_results = {'Model A': read_results, 'Model B': the_pickle_file}

df = res.compile_estimation_results(dict_of_results)
df

(                                       Model A         Model B
Number of estimated parameters               2               2
Sample size                                  5               5
Final log likelihood                 -67.06549       -67.06549
Akaike Information Criterion        138.130981      138.130981
Bayesian Information Criterion      137.349857      137.349857
beta1 (t-test)                  -1.27  (-92.8)  -1.27  (-92.8)
beta2 (t-test)                    1.25  (21.1)    1.25  (21.1), {'Model A': 'Model A', 'Model B': 'Model B'})

Total running time of the script: (0 minutes 0.066 seconds)

Gallery generated by Sphinx-Gallery