Note
Go to the end to download the full example code.
biogeme.results
Examples of use of several functions.
This is designed for programmers who need examples of use of the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model.
- author:
Michel Bierlaire
- date:
Wed Nov 29 09:44:41 2023
import pandas as pd
from biogeme.version import get_text
import biogeme.biogeme as bio
import biogeme.database as db
import biogeme.results as res
from biogeme.expressions import Beta, Variable, exp
Version of Biogeme.
print(get_text())
biogeme 3.2.14 [2024-08-05]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)
Definition of a database
df = pd.DataFrame(
{
'Person': [1, 1, 1, 2, 2],
'Exclude': [0, 0, 1, 0, 1],
'Variable1': [1, 2, 3, 4, 5],
'Variable2': [10, 20, 30, 40, 50],
'Choice': [1, 2, 3, 1, 2],
'Av1': [0, 1, 1, 1, 1],
'Av2': [1, 1, 1, 1, 1],
'Av3': [0, 1, 1, 1, 1],
}
)
df
my_data = db.Database('test', df)
Definition of various expressions
Variable1 = Variable('Variable1')
Variable2 = Variable('Variable2')
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
likelihood = -(beta1**2) * Variable1 - exp(beta2 * beta1) * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dict_of_expressions = {'log_like': likelihood, 'beta1': beta1, 'simul': simul}
Creation of the BIOGEME object
my_biogeme = bio.BIOGEME(my_data, dict_of_expressions)
my_biogeme.modelName = 'simple_example'
my_biogeme.bootstrap_samples = 10
results = my_biogeme.estimate(run_bootstrap=True)
print(results)
0%| | 0/10 [00:00<?, ?it/s]
100%|██████████| 10/10 [00:00<00:00, 440.33it/s]
Results for model simple_example
Output file (HTML): simple_example~00.html
Nbr of parameters: 2
Sample size: 5
Excluded data: 0
Init log likelihood: -67.08858
Final log likelihood: -67.06549
Likelihood ratio test (init): 0.04618125
Rho square (init): 0.000344
Rho bar square (init): -0.0295
Akaike Information Criterion: 138.131
Bayesian Information Criterion: 137.3499
Final gradient norm: 3.900312e-07
beta1 : -1.27[0.115 -11.1 0][0.0137 -92.8 0][0.0111 -115 0]
beta2 : 1.25[0.0848 14.7 0][0.0591 21.1 0][0.0478 26.1 0]
('beta2', 'beta1'): 0.00167 0.171 19.3 0 0.000811 1 55.6 0
Dump results on a file
the_pickle_file = results.write_pickle()
print(the_pickle_file)
simple_example~01.pickle
Results can be imported from a file previously generated
read_results = res.bioResults(pickle_file=the_pickle_file)
print(read_results)
Results for model simple_example
Output file (HTML): simple_example~00.html
Nbr of parameters: 2
Sample size: 5
Excluded data: 0
Init log likelihood: -67.08858
Final log likelihood: -67.06549
Likelihood ratio test (init): 0.04618125
Rho square (init): 0.000344
Rho bar square (init): -0.0295
Akaike Information Criterion: 138.131
Bayesian Information Criterion: 137.3499
Final gradient norm: 3.900312e-07
beta1 : -1.27[0.115 -11.1 0][0.0137 -92.8 0][0.0111 -115 0]
beta2 : 1.25[0.0848 14.7 0][0.0591 21.1 0][0.0478 26.1 0]
('beta2', 'beta1'): 0.00167 0.171 19.3 0 0.000811 1 55.6 0
Results can be formatted in LaTeX
print(read_results.get_latex())
%% This file is designed to be included into a LaTeX document
%% See http://www.latex-project.org for information about LaTeX
%% simple_example - Report from biogeme 3.2.14 [2024-08-05]
%% biogeme 3.2.14 [2024-08-05]
%% Version entirely written in Python
%% Home page: http://biogeme.epfl.ch
%% Submit questions to https://groups.google.com/d/forum/biogeme
%% Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)
%% This file has automatically been generated on 2024-08-05 20:01:27.541772</p>
%%Database name: test
%% General statistics
\section{General statistics}
\begin{tabular}{ll}
Number of estimated parameters & 2 \\
Sample size & 5 \\
Excluded observations & 0 \\
Init log likelihood & -67.08858 \\
Final log likelihood & -67.06549 \\
Likelihood ratio test for the init. model & 0.04618125 \\
Rho-square for the init. model & 0.000344 \\
Rho-square-bar for the init. model & -0.0295 \\
Akaike Information Criterion & 138.131 \\
Bayesian Information Criterion & 137.3499 \\
Final gradient norm & 3.9003E-07 \\
Bootstrapping time & 0:00:00.023199 \\
Nbr of threads & 12 \\
Relative gradient & \verb$7.1957442139436635e-09$ \\
Cause of termination & \verb$Relative gradient = 7.2e-09 <= 0.00012$ \\
Number of function evaluations & \verb$3$ \\
Number of gradient evaluations & \verb$3$ \\
Number of hessian evaluations & \verb$2$ \\
Algorithm & \verb$Newton with trust region for simple bound constraints$ \\
Number of iterations & \verb$2$ \\
Proportion of Hessian calculation & \verb$2/2 = 100.0%$ \\
Optimization time & \verb$0:00:00.003449$ \\
\end{tabular}
%%Parameter estimates
\section{Parameter estimates}
\begin{tabular}{lrrrr}
& Value & Rob. Std err & Rob. t-test & Rob. p-value \\
beta1 & -1.27 & 0.0137 & -92.8 & 0.0 \\
beta2 & 1.25 & 0.0591 & 21.1 & 0.0 \\
\end{tabular}
%%Correlation
\section{Correlation}
\begin{tabular}{lrrrrrrrrrrrr}
& Covariance & Correlation & t-test & p-value & Rob. cov. & Rob. corr. & Rob. t-test & Rob. p-value & Boot. cov. & Boot. corr. & Boot. t-test & Boot. p-value \\
beta2-beta1 & 0.00167 & 0.171 & 19.3 & 0.0 & 0.000811 & 1.0 & 55.6 & 0.0 & 0.000531 & 1.0 & 68.6 & 0.0 \\
\end{tabular}
Results can be formatted in HTML
print(read_results.get_html())
<html>
<head>
<script src="http://transp-or.epfl.ch/biogeme/sorttable.js"></script>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>simple_example - Report from biogeme 3.2.14 [2024-08-05]</title>
<meta name="keywords" content="biogeme, discrete choice, random utility">
<meta name="description" content="Report from biogeme 3.2.14 [2024-08-05]">
<meta name="author" content="{bv.author}">
<style type=text/css>
.biostyle
{font-size:10.0pt;
font-weight:400;
font-style:normal;
font-family:Courier;}
.boundstyle
{font-size:10.0pt;
font-weight:400;
font-style:normal;
font-family:Courier;
color:red}
</style>
</head>
<body bgcolor="#ffffff">
<p>biogeme 3.2.14 [2024-08-05]</p>
<p><a href="https://www.python.org/" target="_blank">Python</a> package</p>
<p>Home page: <a href="http://biogeme.epfl.ch" target="_blank">http://biogeme.epfl.ch</a></p>
<p>Submit questions to <a href="https://groups.google.com/d/forum/biogeme" target="_blank">https://groups.google.com/d/forum/biogeme</a></p>
<p><a href="http://people.epfl.ch/michel.bierlaire">Michel Bierlaire</a>, <a href="http://transp-or.epfl.ch">Transport and Mobility Laboratory</a>, <a href="http://www.epfl.ch">Ecole Polytechnique Fédérale de Lausanne (EPFL)</a></p>
<p>This file has automatically been generated on 2024-08-05 20:01:27.544784</p>
<table>
<tr class=biostyle><td align=right><strong>Report file</strong>: </td><td>simple_example~00.html</td></tr>
<tr class=biostyle><td align=right><strong>Database name</strong>: </td><td>test</td></tr>
</table>
<h1>Estimation report</h1>
<table border="0">
<tr class=biostyle><td align=right ><strong>Number of estimated parameters</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Sample size</strong>: </td> <td>5</td></tr>
<tr class=biostyle><td align=right ><strong>Excluded observations</strong>: </td> <td>0</td></tr>
<tr class=biostyle><td align=right ><strong>Init log likelihood</strong>: </td> <td>-67.08858</td></tr>
<tr class=biostyle><td align=right ><strong>Final log likelihood</strong>: </td> <td>-67.06549</td></tr>
<tr class=biostyle><td align=right ><strong>Likelihood ratio test for the init. model</strong>: </td> <td>0.04618125</td></tr>
<tr class=biostyle><td align=right ><strong>Rho-square for the init. model</strong>: </td> <td>0.000344</td></tr>
<tr class=biostyle><td align=right ><strong>Rho-square-bar for the init. model</strong>: </td> <td>-0.0295</td></tr>
<tr class=biostyle><td align=right ><strong>Akaike Information Criterion</strong>: </td> <td>138.131</td></tr>
<tr class=biostyle><td align=right ><strong>Bayesian Information Criterion</strong>: </td> <td>137.3499</td></tr>
<tr class=biostyle><td align=right ><strong>Final gradient norm</strong>: </td> <td>3.9003E-07</td></tr>
<tr class=biostyle><td align=right ><strong>Bootstrapping time</strong>: </td> <td>0:00:00.023199</td></tr>
<tr class=biostyle><td align=right ><strong>Nbr of threads</strong>: </td> <td>12</td></tr>
<tr class=biostyle><td align=right ><strong>Relative gradient</strong>: </td> <td>7.1957442139436635e-09</td></tr>
<tr class=biostyle><td align=right ><strong>Cause of termination</strong>: </td> <td>Relative gradient = 7.2e-09 <= 0.00012</td></tr>
<tr class=biostyle><td align=right ><strong>Number of function evaluations</strong>: </td> <td>3</td></tr>
<tr class=biostyle><td align=right ><strong>Number of gradient evaluations</strong>: </td> <td>3</td></tr>
<tr class=biostyle><td align=right ><strong>Number of hessian evaluations</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Algorithm</strong>: </td> <td>Newton with trust region for simple bound constraints</td></tr>
<tr class=biostyle><td align=right ><strong>Number of iterations</strong>: </td> <td>2</td></tr>
<tr class=biostyle><td align=right ><strong>Proportion of Hessian calculation</strong>: </td> <td>2/2 = 100.0%</td></tr>
<tr class=biostyle><td align=right ><strong>Optimization time</strong>: </td> <td>0:00:00.003449</td></tr>
</table>
<h1>Estimated parameters</h1>
<table border="1">
<tr class=biostyle><th>Name</th><th>Value</th><th>Rob. Std err</th><th>Rob. t-test</th><th>Rob. p-value</th></tr>
<tr class=biostyle><td>beta1</td><td>-1.27</td><td>0.0137</td><td>-92.8</td><td>0</td></tr>
<tr class=biostyle><td>beta2</td><td>1.25</td><td>0.0591</td><td>21.1</td><td>0</td></tr>
</table>
<h2>Correlation of coefficients</h2>
<table border="1">
<tr class=biostyle><th>Coefficient1</th><th>Coefficient2</th><th>Covariance</th><th>Correlation</th><th>t-test</th><th>p-value</th><th>Rob. cov.</th><th>Rob. corr.</th><th>Rob. t-test</th><th>Rob. p-value</th><th>Boot. cov.</th><th>Boot. corr.</th><th>Boot. t-test</th><th>Boot. p-value</th></tr>
<tr class=biostyle><td>beta2</td><td>beta1</td><td>0.00167</td><td>0.171</td><td>19.3</td><td>0</td><td>0.000811</td><td>1</td><td>55.6</td><td>0</td><td>0.000531</td><td>1</td><td>68.6</td><td>0</td></tr>
</table>
<p>Smallest eigenvalue: 73.054</p>
<p>Largest eigenvalue: 147.802</p>
<p>Condition number: 2.02318</p>
</html>
General statistics, including a suggested format.
statistics = read_results.get_general_statistics()
statistics
{'Number of estimated parameters': GeneralStatistic(value=2, format=''), 'Sample size': GeneralStatistic(value=5, format=''), 'Excluded observations': GeneralStatistic(value=0, format=''), 'Init log likelihood': GeneralStatistic(value=-67.08858110351106, format='.7g'), 'Final log likelihood': GeneralStatistic(value=-67.06549047946356, format='.7g'), 'Likelihood ratio test for the init. model': GeneralStatistic(value=0.04618124809499591, format='.7g'), 'Rho-square for the init. model': GeneralStatistic(value=np.float64(0.00034418113586087706), format='.3g'), 'Rho-square-bar for the init. model': GeneralStatistic(value=np.float64(-0.029467151390522472), format='.3g'), 'Akaike Information Criterion': GeneralStatistic(value=138.13098095892713, format='.7g'), 'Bayesian Information Criterion': GeneralStatistic(value=np.float64(137.34985678379533), format='.7g'), 'Final gradient norm': GeneralStatistic(value=3.9003122659206894e-07, format='.4E'), 'Bootstrapping time': GeneralStatistic(value=datetime.timedelta(microseconds=23199), format=''), 'Nbr of threads': GeneralStatistic(value=12, format='')}
The suggested format can be used as follows
for k, (v, p) in statistics.items():
print(f'{k}:\t{v:{p}}')
Number of estimated parameters: 2
Sample size: 5
Excluded observations: 0
Init log likelihood: -67.08858
Final log likelihood: -67.06549
Likelihood ratio test for the init. model: 0.04618125
Rho-square for the init. model: 0.000344
Rho-square-bar for the init. model: -0.0295
Akaike Information Criterion: 138.131
Bayesian Information Criterion: 137.3499
Final gradient norm: 3.9003E-07
Bootstrapping time: 0:00:00.023199
Nbr of threads: 12
This result can be generated directly with the following function
print(results.print_general_statistics())
Number of estimated parameters: 2
Sample size: 5
Excluded observations: 0
Init log likelihood: -67.08858
Final log likelihood: -67.06549
Likelihood ratio test for the init. model: 0.04618125
Rho-square for the init. model: 0.000344
Rho-square-bar for the init. model: -0.0295
Akaike Information Criterion: 138.131
Bayesian Information Criterion: 137.3499
Final gradient norm: 3.9003E-07
Bootstrapping time: 0:00:00.023199
Nbr of threads: 12
Estimated parameters as pandas dataframe
read_results.get_estimated_parameters()
Correlation results
read_results.get_correlation_results()
Obtain the values of the parameters
read_results.get_beta_values()
{'beta1': np.float64(-1.2732639874991436), 'beta2': np.float64(1.2487688117902658)}
Obtain the value of one or several specific parameters
read_results.get_beta_values(my_betas=['beta2'])
{'beta2': np.float64(1.2487688117902658)}
Variance-covariance matrix (Rao-Cramer)
read_results.get_var_covar()
Variance-covariance matrix (robust)
read_results.get_robust_var_covar()
Variance-covaraince matrix (bootstrap)
read_results.get_bootstrap_var_covar()
Draws for sensitivity analysis are generated using bootstrapping. Any indicator can be generated by the model for each draw, and its empirical distribution can be investigated.
read_results.get_betas_for_sensitivity_analysis(['beta1', 'beta2'], size=10)
[{'beta1': np.float64(-1.2873325336200956), 'beta2': np.float64(1.1875198317199467)}, {'beta1': np.float64(-1.2823937362836413), 'beta2': np.float64(1.209195603860152)}, {'beta1': np.float64(-1.2690260405244642), 'beta2': np.float64(1.2669668838392414)}, {'beta1': np.float64(-1.2823937362836415), 'beta2': np.float64(1.209195603860152)}, {'beta1': np.float64(-1.257397879951302), 'beta2': np.float64(1.31651208100832)}, {'beta1': np.float64(-1.2873325336200956), 'beta2': np.float64(1.1875198317199467)}, {'beta1': np.float64(-1.2690260405244642), 'beta2': np.float64(1.2669668838392414)}, {'beta1': np.float64(-1.2777126116342759), 'beta2': np.float64(1.229556814073192)}, {'beta1': np.float64(-1.257397879951302), 'beta2': np.float64(1.31651208100832)}, {'beta1': np.float64(-1.2732639874991436), 'beta2': np.float64(1.2487688117902658)}]
Results can be produced in the ALOGIT F12 format
print(read_results.get_f12())
simple_example
From biogeme 3.2.14 2024-08-05 20:01:27
END
0 beta1 F -1.273263987499e+00 +1.372396830767e-02
0 beta2 F +1.248768811790e+00 +5.908592227624e-02
-1
5 0 0 -6.706549047946e+01
0 0 2024-08-05 20:01:27
100000
Miscellaneous functions
Likelihood ratio test. Let’s first estimate a constrained model
beta2_constrained = Beta('beta2_constrained', 2.0, -3, 10, 1)
likelihood_constrained = (
-(beta1**2) * Variable1
- exp(beta2_constrained * beta1) * Variable2
- beta2_constrained**4
)
my_biogeme_constrained = bio.BIOGEME(my_data, likelihood_constrained)
my_biogeme_constrained.modelName = 'simple_example_constrained'
results_constrained = my_biogeme_constrained.estimate()
print(results_constrained.short_summary())
Results for model simple_example_constrained
Nbr of parameters: 1
Sample size: 5
Excluded data: 0
Final log likelihood: -114.7702
Akaike Information Criterion: 231.5403
Bayesian Information Criterion: 231.1498
We can now perform a likelihood ratio test.
test_results = results.likelihood_ratio_test(results_constrained, 0.95)
print(test_results.message)
print(f'Statistic: {test_results.statistic}')
print(f'Threshold: {test_results.threshold}')
H0 can be rejected at level 95.0%
Statistic: 95.40936413216042
Threshold: 0.003932140000019531
Calculation of the \(p\)-value
res.calc_p_value(1.96)
np.float64(0.04999579029644097)
Compilation of results
dict_of_results = {'Model A': read_results, 'Model B': the_pickle_file}
df = res.compile_estimation_results(dict_of_results)
df
( Model A Model B
Number of estimated parameters 2 2
Sample size 5 5
Final log likelihood -67.06549 -67.06549
Akaike Information Criterion 138.130981 138.130981
Bayesian Information Criterion 137.349857 137.349857
beta1 (t-test) -1.27 (-92.8) -1.27 (-92.8)
beta2 (t-test) 1.25 (21.1) 1.25 (21.1), {'Model A': 'Model A', 'Model B': 'Model B'})
Total running time of the script: (0 minutes 0.066 seconds)