biogeme.results

Examples of use of several functions.

This is designed for programmers who need examples of use of the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model.

author:

Michel Bierlaire

date:

Wed Nov 29 09:44:41 2023

import pandas as pd
from biogeme.version import getText
import biogeme.biogeme as bio
import biogeme.database as db
import biogeme.results as res
from biogeme.expressions import Beta, Variable, exp

Version of Biogeme.

print(getText())
biogeme 3.2.13 [2023-12-23]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

Definition of a database

df = pd.DataFrame(
    {
        'Person': [1, 1, 1, 2, 2],
        'Exclude': [0, 0, 1, 0, 1],
        'Variable1': [1, 2, 3, 4, 5],
        'Variable2': [10, 20, 30, 40, 50],
        'Choice': [1, 2, 3, 1, 2],
        'Av1': [0, 1, 1, 1, 1],
        'Av2': [1, 1, 1, 1, 1],
        'Av3': [0, 1, 1, 1, 1],
    }
)
df
Person Exclude Variable1 Variable2 Choice Av1 Av2 Av3
0 1 0 1 10 1 0 1 0
1 1 0 2 20 2 1 1 1
2 1 1 3 30 3 1 1 1
3 2 0 4 40 1 1 1 1
4 2 1 5 50 2 1 1 1


my_data = db.Database('test', df)

Definition of various expressions

Variable1 = Variable('Variable1')
Variable2 = Variable('Variable2')
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
likelihood = -(beta1**2) * Variable1 - exp(beta2 * beta1) * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dict_of_expressions = {'loglike': likelihood, 'beta1': beta1, 'simul': simul}

Creation of the BIOGEME object

my_biogeme = bio.BIOGEME(my_data, dict_of_expressions)
my_biogeme.modelName = 'simple_example'
my_biogeme.bootstrap_samples = 10
results = my_biogeme.estimate(run_bootstrap=True)
print(results)
  0%|          | 0/10 [00:00<?, ?it/s]
100%|██████████| 10/10 [00:00<00:00, 629.72it/s]

Results for model simple_example
Output file (HTML):                     simple_example~00.html
Nbr of parameters:              2
Sample size:                    5
Excluded data:                  0
Init log likelihood:            -67.38866
Final log likelihood:           -67.06549
Likelihood ratio test (init):           0.64633
Rho square (init):                      0.0048
Rho bar square (init):                  -0.0249
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
Final gradient norm:            8.367978e-05
beta1          : -1.27[0.115 -11.1 0][0.0137 -92.8 0][0.0111 -115 0]
beta2          : 1.25[0.0848 14.7 0][0.0591 21.1 0][0.0478 26.1 0]
('beta2', 'beta1'):     0.00167 0.171   19.3    0       0.000811        1       55.6    0

Dump results on a file

the_pickle_file = results.writePickle()
print(the_pickle_file)
simple_example~01.pickle

Results can be imported from a file previously generated

read_results = res.bioResults(pickleFile=the_pickle_file)
print(read_results)
Results for model simple_example
Output file (HTML):                     simple_example~00.html
Nbr of parameters:              2
Sample size:                    5
Excluded data:                  0
Init log likelihood:            -67.38866
Final log likelihood:           -67.06549
Likelihood ratio test (init):           0.64633
Rho square (init):                      0.0048
Rho bar square (init):                  -0.0249
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
Final gradient norm:            8.367978e-05
beta1          : -1.27[0.115 -11.1 0][0.0137 -92.8 0][0.0111 -115 0]
beta2          : 1.25[0.0848 14.7 0][0.0591 21.1 0][0.0478 26.1 0]
('beta2', 'beta1'):     0.00167 0.171   19.3    0       0.000811        1       55.6    0

Results can be formatted in LaTeX

print(read_results.getLaTeX())
%% This file is designed to be included into a LaTeX document
%% See http://www.latex-project.org for information about LaTeX
%% simple_example - Report from biogeme 3.2.13 [2023-12-23]
%% biogeme 3.2.13 [2023-12-23]
%% Version entirely written in Python
%% Home page: http://biogeme.epfl.ch
%% Submit questions to https://groups.google.com/d/forum/biogeme
%% Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)

%% This file has automatically been generated on 2023-12-23 19:11:07.608823</p>

%%Database name: test

%% General statistics
\section{General statistics}
\begin{tabular}{ll}
Number of estimated parameters & 2 \\
Sample size & 5 \\
Excluded observations & 0 \\
Init log likelihood & -67.38866 \\
Final log likelihood & -67.06549 \\
Likelihood ratio test for the init. model & 0.64633 \\
Rho-square for the init. model & 0.0048 \\
Rho-square-bar for the init. model & -0.0249 \\
Akaike Information Criterion & 138.131 \\
Bayesian Information Criterion & 137.3499 \\
Final gradient norm & 8.3680E-05 \\
Bootstrapping time & 0:00:00.016320 \\
Nbr of threads & 12 \\
Relative gradient & \verb$1.5409993756401287e-06$ \\
Cause of termination & \verb$Relative gradient = 1.5e-06 <= 6.1e-06$ \\
Number of function evaluations & \verb$3$ \\
Number of gradient evaluations & \verb$3$ \\
Number of hessian evaluations & \verb$2$ \\
Algorithm & \verb$Newton with trust region for simple bound constraints$ \\
Number of iterations & \verb$2$ \\
Proportion of Hessian calculation & \verb$2/2 = 100.0%$ \\
Optimization time & \verb$0:00:00.003161$ \\
\end{tabular}

%%Parameter estimates
\section{Parameter estimates}
\begin{tabular}{lrrrr}
 & Value & Rob. Std err & Rob. t-test & Rob. p-value \\
beta1 & -1.27 & 0.0137 & -92.8 & 0.0 \\
beta2 & 1.25 & 0.0591 & 21.1 & 0.0 \\
\end{tabular}

%%Correlation
\section{Correlation}
\begin{tabular}{lrrrrrrrrrrrr}
 & Covariance & Correlation & t-test & p-value & Rob. cov. & Rob. corr. & Rob. t-test & Rob. p-value & Boot. cov. & Boot. corr. & Boot. t-test & Boot. p-value \\
beta2-beta1 & 0.00167 & 0.171 & 19.3 & 0.0 & 0.000811 & 1.0 & 55.6 & 0.0 & 0.000531 & 1.0 & 68.6 & 0.0 \\
\end{tabular}

Results can be formatted in HTML

print(read_results.getHtml())
<html>
<head>
<script src="http://transp-or.epfl.ch/biogeme/sorttable.js"></script>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>simple_example - Report from biogeme 3.2.13 [2023-12-23]</title>
<meta name="keywords" content="biogeme, discrete choice, random utility">
<meta name="description" content="Report from biogeme 3.2.13 [2023-12-23]">
<meta name="author" content="{bv.author}">
<style type=text/css>
.biostyle
        {font-size:10.0pt;
        font-weight:400;
        font-style:normal;
        font-family:Courier;}
.boundstyle
        {font-size:10.0pt;
        font-weight:400;
        font-style:normal;
        font-family:Courier;
        color:red}
</style>
</head>
<body bgcolor="#ffffff">
<p>biogeme 3.2.13 [2023-12-23]</p>
<p><a href="https://www.python.org/" target="_blank">Python</a> package</p>
<p>Home page: <a href="http://biogeme.epfl.ch" target="_blank">http://biogeme.epfl.ch</a></p>
<p>Submit questions to <a href="https://groups.google.com/d/forum/biogeme" target="_blank">https://groups.google.com/d/forum/biogeme</a></p>
<p><a href="http://people.epfl.ch/michel.bierlaire">Michel Bierlaire</a>, <a href="http://transp-or.epfl.ch">Transport and Mobility Laboratory</a>, <a href="http://www.epfl.ch">Ecole Polytechnique F&#233;d&#233;rale de Lausanne (EPFL)</a></p>
<p>This file has automatically been generated on 2023-12-23 19:11:07.630521</p>
<table>
<tr class=biostyle><td align=right><strong>Report file</strong>:        </td><td>simple_example~00.html</td></tr>
<tr class=biostyle><td align=right><strong>Database name</strong>:      </td><td>test</td></tr>
</table>
<h1>Estimation report</h1>
<table border="0">
<tr class=biostyle><td align=right ><strong>Number of estimated parameters</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Sample size</strong>: </td> <td>5</td></tr>
<tr class=biostyle><td align=right ><strong>Excluded observations</strong>: </td> <td>0</td></tr>
<tr class=biostyle><td align=right ><strong>Init log likelihood</strong>: </td> <td>-67.38866</td></tr>
<tr class=biostyle><td align=right ><strong>Final log likelihood</strong>: </td> <td>-67.06549</td></tr>
<tr class=biostyle><td align=right ><strong>Likelihood ratio test for the init. model</strong>: </td> <td>0.64633</td></tr>
<tr class=biostyle><td align=right ><strong>Rho-square for the init. model</strong>: </td> <td>0.0048</td></tr>
<tr class=biostyle><td align=right ><strong>Rho-square-bar for the init. model</strong>: </td> <td>-0.0249</td></tr>
<tr class=biostyle><td align=right ><strong>Akaike Information Criterion</strong>: </td> <td>138.131</td></tr>
<tr class=biostyle><td align=right ><strong>Bayesian Information Criterion</strong>: </td> <td>137.3499</td></tr>
<tr class=biostyle><td align=right ><strong>Final gradient norm</strong>: </td> <td>8.3680E-05</td></tr>
<tr class=biostyle><td align=right ><strong>Bootstrapping time</strong>: </td> <td>0:00:00.016320</td></tr>
<tr class=biostyle><td align=right ><strong>Nbr of threads</strong>: </td> <td>12</td></tr>
<tr class=biostyle><td align=right ><strong>Relative gradient</strong>: </td> <td>1.5409993756401287e-06</td></tr>
<tr class=biostyle><td align=right ><strong>Cause of termination</strong>: </td> <td>Relative gradient = 1.5e-06 <= 6.1e-06</td></tr>
<tr class=biostyle><td align=right ><strong>Number of function evaluations</strong>: </td> <td>3</td></tr>
<tr class=biostyle><td align=right ><strong>Number of gradient evaluations</strong>: </td> <td>3</td></tr>
<tr class=biostyle><td align=right ><strong>Number of hessian evaluations</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Algorithm</strong>: </td> <td>Newton with trust region for simple bound constraints</td></tr>
<tr class=biostyle><td align=right ><strong>Number of iterations</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Proportion of Hessian calculation</strong>: </td> <td>2/2 = 100.0%</td></tr>
<tr class=biostyle><td align=right ><strong>Optimization time</strong>: </td> <td>0:00:00.003161</td></tr>
</table>
<h1>Estimated parameters</h1>
<table border="1">
<tr class=biostyle><th>Name</th><th>Value</th><th>Rob. Std err</th><th>Rob. t-test</th><th>Rob. p-value</th></tr>
<tr class=biostyle><td>beta1</td><td>-1.27</td><td>0.0137</td><td>-92.8</td><td>0</td></tr>
<tr class=biostyle><td>beta2</td><td>1.25</td><td>0.0591</td><td>21.1</td><td>0</td></tr>
</table>
<h2>Correlation of coefficients</h2>
<table border="1">
<tr class=biostyle><th>Coefficient1</th><th>Coefficient2</th><th>Covariance</th><th>Correlation</th><th>t-test</th><th>p-value</th><th>Rob. cov.</th><th>Rob. corr.</th><th>Rob. t-test</th><th>Rob. p-value</th><th>Boot. cov.</th><th>Boot. corr.</th><th>Boot. t-test</th><th>Boot. p-value</th></tr>
<tr class=biostyle><td>beta2</td><td>beta1</td><td>0.00167</td><td>0.171</td><td>19.3</td><td>0</td><td>0.000811</td><td>1</td><td>55.6</td><td>0</td><td>0.000531</td><td>1</td><td>68.6</td><td>0</td></tr>
</table>
<p>Smallest eigenvalue: 73.054</p>
<p>Largest eigenvalue: 147.802</p>
<p>Condition number: 2.02319</p>
</html>

General statistics, including a suggested format.

statistics = read_results.getGeneralStatistics()
statistics
{'Number of estimated parameters': GeneralStatistic(value=2, format=''), 'Sample size': GeneralStatistic(value=5, format=''), 'Excluded observations': GeneralStatistic(value=0, format=''), 'Init log likelihood': GeneralStatistic(value=-67.38865550201763, format='.7g'), 'Final log likelihood': GeneralStatistic(value=-67.0654904794877, format='.7g'), 'Likelihood ratio test for the init. model': GeneralStatistic(value=0.646330045059841, format='.7g'), 'Rho-square for the init. model': GeneralStatistic(value=0.004795540438111923, format='.3g'), 'Rho-square-bar for the init. model': GeneralStatistic(value=-0.024883045447017027, format='.3g'), 'Akaike Information Criterion': GeneralStatistic(value=138.1309809589754, format='.7g'), 'Bayesian Information Criterion': GeneralStatistic(value=137.34985678384362, format='.7g'), 'Final gradient norm': GeneralStatistic(value=8.367978491514103e-05, format='.4E'), 'Bootstrapping time': GeneralStatistic(value=datetime.timedelta(microseconds=16320), format=''), 'Nbr of threads': GeneralStatistic(value=12, format='')}

The suggested format can be used as follows

for k, (v, p) in statistics.items():
    print(f'{k}:\t{v:{p}}')
Number of estimated parameters: 2
Sample size:    5
Excluded observations:  0
Init log likelihood:    -67.38866
Final log likelihood:   -67.06549
Likelihood ratio test for the init. model:      0.64633
Rho-square for the init. model: 0.0048
Rho-square-bar for the init. model:     -0.0249
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
Final gradient norm:    8.3680E-05
Bootstrapping time:     0:00:00.016320
Nbr of threads: 12

This result can be generated directly with the following function

print(results.printGeneralStatistics())
Number of estimated parameters: 2
Sample size:    5
Excluded observations:  0
Init log likelihood:    -67.38866
Final log likelihood:   -67.06549
Likelihood ratio test for the init. model:      0.64633
Rho-square for the init. model: 0.0048
Rho-square-bar for the init. model:     -0.0249
Akaike Information Criterion:   138.131
Bayesian Information Criterion: 137.3499
Final gradient norm:    8.3680E-05
Bootstrapping time:     0:00:00.016320
Nbr of threads: 12

Estimated parameters as pandas dataframe

read_results.getEstimatedParameters()
Value Rob. Std err Rob. t-test Rob. p-value
beta1 -1.273264 0.013724 -92.776551 0.0
beta2 1.248769 0.059086 21.134825 0.0


Correlation results

read_results.getCorrelationResults()
Covariance Correlation t-test p-value Rob. cov. Rob. corr. Rob. t-test Rob. p-value Boot. cov. Boot. corr. Boot. t-test Boot. p-value
beta2-beta1 0.001671 0.171121 19.280045 0.0 0.000811 1.0 55.598081 0.0 0.000531 0.999983 68.644339 0.0


Obtain the values of the parameters

read_results.getBetaValues()
{'beta1': -1.2732639720980743, 'beta2': 1.2487693919282536}

Obtain the value of one or several specific parameters

read_results.getBetaValues(myBetas=['beta2'])
{'beta2': 1.2487693919282536}

Variance-covariance matrix (Rao-Cramer)

read_results.getVarCovar()
beta1 beta2
beta1 0.013258 0.001671
beta2 0.001671 0.007196


Variance-covariance matrix (robust)

read_results.getRobustVarCovar()
beta1 beta2
beta1 0.000188 0.000811
beta2 0.000811 0.003491


Variance-covaraince matrix (bootstrap)

read_results.getBootstrapVarCovar()
beta1 beta2
beta1 0.000123 0.000531
beta2 0.000531 0.002289


Draws for sensitivity analysis are generated using bootstrapping. Any indicator can be generated by the model for each draw, and its empirical distribution can be investigate .

read_results.getBetasForSensitivityAnalysis(['beta1', 'beta2'], size=10)
[{'beta1': -1.2873325336225045, 'beta2': 1.1875198317410949}, {'beta1': -1.2823937362841489, 'beta2': 1.2091956038649867}, {'beta1': -1.2690260405244391, 'beta2': 1.2669668838389574}, {'beta1': -1.2823937362841489, 'beta2': 1.2091956038649867}, {'beta1': -1.2573978799505356, 'beta2': 1.3165120809997202}, {'beta1': -1.2873325336225045, 'beta2': 1.1875198317410949}, {'beta1': -1.269026040524439, 'beta2': 1.2669668838389574}, {'beta1': -1.2777126116343218, 'beta2': 1.2295568140736606}, {'beta1': -1.2573978799505356, 'beta2': 1.3165120809997202}, {'beta1': -1.2732639720980743, 'beta2': 1.2487693919282536}]

Results can be produced in the ALOGIT F12 format

print(read_results.getF12())
                                                                 simple_example
From biogeme 3.2.13                                     2023-12-23 19:11:07
END
   0      beta1 F  -1.273263972098e+00 +1.372398469903e-02
   0      beta2 F  +1.248769391928e+00 +5.908586258958e-02
  -1
       5                  0                   0 -6.706549047949e+01
   0   0  2023-12-23 19:11:07
  99999

Miscellaneous functions

Likelihood ratio test. Let’s first estimate a constrained model

beta2_constrained = Beta('beta2_constrained', 2.0, -3, 10, 1)
likelihood_constrained = (
    -(beta1**2) * Variable1
    - exp(beta2_constrained * beta1) * Variable2
    - beta2_constrained**4
)
my_biogeme_constrained = bio.BIOGEME(my_data, likelihood_constrained)
my_biogeme_constrained.modelName = 'simple_example_constrained'
results_constrained = my_biogeme_constrained.estimate()
print(results_constrained.short_summary())
Results for model simple_example_constrained
Nbr of parameters:              1
Sample size:                    5
Excluded data:                  0
Final log likelihood:           -114.7702
Akaike Information Criterion:   231.5403
Bayesian Information Criterion: 231.1498

We can now perform a likelihood ratio test.

test_results = results.likelihood_ratio_test(results_constrained, 0.95)
print(test_results.message)
print(f'Statistic: {test_results.statistic}')
print(f'Threshold: {test_results.threshold}')
H0 can be rejected at level 95.0%
Statistic: 95.40936413211196
Threshold: 0.003932140000019531

Calculation of the \(p\)-value

res.calcPValue(1.96)
0.04999579029644097

Compilation of results

dict_of_results = {'Model A': read_results, 'Model B': the_pickle_file}
df = res.compile_estimation_results(dict_of_results)
df
(                                       Model A         Model B
Number of estimated parameters               2               2
Sample size                                  5               5
Final log likelihood                 -67.06549       -67.06549
Akaike Information Criterion        138.130981      138.130981
Bayesian Information Criterion      137.349857      137.349857
beta1 (t-test)                  -1.27  (-92.8)  -1.27  (-92.8)
beta2 (t-test)                    1.25  (21.1)    1.25  (21.1), {'Model A': 'Model A', 'Model B': 'Model B'})

Total running time of the script: (0 minutes 0.081 seconds)

Gallery generated by Sphinx-Gallery