Read the primer.
Watch the video on data.
Watch the video on model specification.
Use the Biogeme Assistant from ChatGPT.
Biogeme is an open source Python package designed for the maximum likelihood estimation of parametric models in general, with a special emphasis on discrete choice models. It relies on Pandas, the Python Data Analysis Library.
It is developed and maintained by Prof. Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne, Switzerland.
Biogeme used to be a standalone software package, written in C++. All the material related to the previous versions of Biogeme is available on the old webpage.
In this release, various improvements have been made, including code reorganization and documentation, bug fixes, and new functionalities. In particular, the names of several objects and functions have been modified for better compliance with the Python recommendations. The old syntax has been maintained, but is tagged as deprecated.
biogeme.toml file, or directly when constructing the BIOGEME object.
BIOGEME is distributed free of charge. We ask each user
Disclaimer: This software is provided free of charge and "AS IS" WITHOUT ANY WARRANTY of any kind. The implied warranties of merchantability, fitness for a particular purpose and non-infringement are expressly disclaimed. In no event will the author (Michel Bierlaire) or his employer (EPFL) be liable to any party for any direct, indirect, special or other consequential damages for any use of the code including, without limitation, any lost profits, business interruption, loss of programs or other data on your information handling system or otherwise, even if we are expressly advised of the possibility of such damages.
Biogeme has been developed by Michel Bierlaire, Ecole Polytechnique Fédérale de Lausanne, Switzerland.
I would like to thank the following persons who played various roles in the development of Biogeme over the years. The list is certainly not complete, and I apologize to those who are omitted: Alexandre Alahi, Nicolas Antille, Gianluca Antonini, Cristian Arteaga, Kay Axhausen, John Bates, Denis Bolduc, David Bunch, Pedro Camargo, Andrew Daly, Nicolas Dubois, Anna Fernandez Antolin, Mamy Fetiarison, Mogens Fosgerau, Emma Frejinger, Carmine Gioia, Marie-Hélène Godbout, Jason Hawkins, Stephane Hess, Tim Hillel, Richard Hurni, Eva Kazagli, Jasper Knockaert, Xinjun Lai, Gael Lederrey, Virginie Lurkin, Yousef Maknoon, Nicholas Molyneaux, Nicola Ortelli, Carolina Osorio, Meritxell Pacheco Paneque, Thomas Robin, Pascal Scheiben, Matteo Sorci, Ewout ter Hoeven, Michael Thémans, Joan Walker, Mengyi Wang.
I would like to give special thanks to Moshe Ben-Akiva and Daniel McFadden for their friendship, and for the immense influence that they had and still have on my work.
Biogeme is an open source Python package that relies on version 3 of Python. Make sure that Python 3.x is installed on your computer. If you have never used Python before, you may want to consider a complete platform such as Anaconda.
If Python is already installed on your computer, verify the version. Two versions of Python are distributed: version 2 and version 3. Biogeme works only with version 3.
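One way to verify the version from within Python itself is sketched below. The helper name is_supported_python is hypothetical and only encodes the requirement stated above (major version 3); it does not check the narrower range for which binaries are distributed.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is a Python 3 interpreter.

    Hypothetical helper: it only checks the major version number.
    """
    return version_info[0] == 3


# Show the full version string of the running interpreter.
print(sys.version)
print("Python 3 interpreter:", is_supported_python())
```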
A significant part of Biogeme is coded in C++ for the sake of computational efficiency. Since version 3.2.11, this part of the code has been isolated in a separate package called cythonbiogeme. Binaries for Mac OSX and Windows are available for versions of Python ranging from 3.10 to 3.12. If, for some reason, the binary distribution for your system is not available, pip will attempt to compile the package from source. In that case, it requires a proper environment to compile C++ code. In general, it is readily available on Linux, and MacOSX (if Xcode has been installed). It may be more complicated on Windows.
The source code of CythonBiogeme is available on GitHub. There are several tutorials available on the internet such as this one or this one.
The command to install CythonBiogeme from source is
pip install .
that must be executed in the root directory, containing the file setup.py.
Note that it requires a proper environment to compile C++ code. In general, it is readily available on Linux, and MacOSX (if Xcode has been installed). On Windows, it is possible to compile cythonbiogeme with Microsoft Visual C++. See the Python documentation.
The source code of Biogeme is available on GitHub. There are several tutorials available on the internet such as this one or this one.
The command to install Biogeme from source is
pip install .
that must be executed in the root directory containing the pyproject.toml file.
Note that it does not require compiling C++ code (thanks to CythonBiogeme) and should work in any environment where Python and CythonBiogeme are properly installed.
To verify that Biogeme is correctly installed, you can print the version of Biogeme. To do so, execute the following commands in Python:
from biogeme.version import get_text
print(get_text())
Python 3.12.4 (v3.12.4:8e8a4baf65, Jun 6 2024, 17:33:18) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from biogeme.version import get_text
>>> print(get_text())
biogeme 3.2.14 [2024-08-05]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)
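If importing Biogeme fails, it can help to first check whether pip registered the packages at all. The following is a minimal sketch using only the standard library; installed_version is a hypothetical helper, and the distribution names biogeme and cythonbiogeme are the ones used with pip elsewhere on this page.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report both packages; neither is imported, so a broken binary cannot crash this check.
for name in ("biogeme", "cythonbiogeme"):
    print(name, installed_version(name) or "not installed")
```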
It is still experimental, but there is an instance of ChatGPT dedicated to Biogeme:
If you need help, submit your questions to the users' group:
groups.google.com/d/forum/biogeme
The forum is moderated. Please keep the following in mind before posting a question:
biogeme.logging into biogeme.biogeme_logging.py. It was necessary because of the ambiguity with the logging module from Python.
This release mainly implements some reorganization of the code and bug fixes. In particular, the generic optimization algorithms are now distributed in a different package, called biogeme_optimization.
Note: versions 3.2.9 and 3.2.10 are identical. Therefore, version 3.2.9 has been removed from the official distribution platform.
DefineVariable
DefineVariable actually defines a new column in the database. The old syntax was:
myvar = DefineVariable('myvar', x * y + 2, database)
The new syntax is:
myvar = database.DefineVariable('myvar', x * y + 2)
01logitBis.py.
13panel_simul.py.
recycle=True. See the online documentation [here].
removeUnusedVariables and displayUsedVariables in the BIOGEME constructor have been removed.
NamedTuple to make the code more readable. Refer to the examples, such as optima.py.
Note that versions 3.2.7 and 3.2.8 are almost identical. The description below compares to version 3.2.6.
algorithms.py contains generic optimization algorithms. The module optimization.py contains the functions that can be called directly by Biogeme [Click here for the documentation of the estimate function]. [Click here for an example.]
.iter. If the file exists, Biogeme will initialize the parameters from this file, and ignore the starting values provided. To turn this feature off, set biogeme.saveIterations=False
split function of the database object.
estimate function, and the optimization module. See also an example.
user_notes parameter of the biogeme object. See documentation. See example.
suggestScales parameter of the biogeme object. See documentation.
quickEstimate performs the estimation of the parameters, and skips the calculation of the statistics. See documentation.
database module allows splitting the database in order to prepare an estimation set and a validation set, for out-of-sample validation. See documentation. It is used by the new function validate in the biogeme module. See documentation. See example.
temporarySilence and resume.
In order to comply better with good programming practice in Python, the syntax to import the variable names from the data file has been modified since version 3.2.5. The file headers.py is not generated anymore.
The best practice is to declare every variable explicitly:
from biogeme.expressions import Variable
PURPOSE = Variable('PURPOSE')
CHOICE = Variable('CHOICE')
GA = Variable('GA')
TRAIN_CO = Variable('TRAIN_CO')
CAR_AV = Variable('CAR_AV')
SP = Variable('SP')
TRAIN_AV = Variable('TRAIN_AV')
TRAIN_TT = Variable('TRAIN_TT')
If, for any reason, this explicit declaration is not desired, it is possible to replace the statement
from headers import *
by
globals().update(database.variables)
where database is the object containing the database, created as follows:
import pandas as pd
import biogeme.database as db
df = pd.read_csv('swissmetro.dat', sep='\t')
database = db.Database('swissmetro', df)
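To see what the globals().update(...) shortcut does, and why explicit declarations are easier to follow, here is a minimal sketch that does not use Biogeme at all. The Variable class and the variables dictionary are hypothetical stand-ins for biogeme.expressions.Variable and database.variables; the only assumption is that the latter maps column names to expression objects.

```python
class Variable:
    """Hypothetical stand-in for biogeme.expressions.Variable."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Variable({self.name!r})"


# Hypothetical stand-in for database.variables: column name -> expression object.
variables = {name: Variable(name) for name in ("CHOICE", "TRAIN_TT", "TRAIN_CO")}

# Creates one module-level name per dictionary key. The names appear at run time
# only, so editors and linters cannot see where TRAIN_TT comes from.
globals().update(variables)

print(TRAIN_TT)
```

With explicit declarations, each name is visible in the script itself, which is the main argument for the recommended practice.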
Also, in order to avoid any ambiguity, the operators used by Biogeme must be explicitly imported. For instance:
from biogeme.expressions import Beta, bioDraws, PanelLikelihoodTrajectory, MonteCarlo, log
Note that it is also possible to import all of them using the following syntax
from biogeme.expressions import *
although this is not recommended.
__mymodel.iter exists, where mymodel is the name of the model to be estimated, the initial values of the parameters are read from this file.
Yes. It is actually the default behavior. At each iteration, Biogeme creates a file __myModel.iter. This file will be read the next time Biogeme tries to estimate the same model. If you want to turn this feature off, set the parameter save_iterations to False in the biogeme.toml file. See the documentation for details.
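If the feature is left on but a single estimation should start from the values given in the script, one option is to delete the saved file beforehand. The sketch below is an illustration, not part of Biogeme: the helper names are hypothetical, and it assumes the file sits in the working directory and follows the __<model name>.iter pattern described above.

```python
import tempfile
from pathlib import Path


def iteration_file(model_name, directory="."):
    """Build the path of the saved-iterations file for a model (assumed naming)."""
    return Path(directory) / f"__{model_name}.iter"


def discard_saved_iterations(model_name, directory="."):
    """Delete the saved-iterations file if present; return True if one was removed."""
    path = iteration_file(model_name, directory)
    if path.exists():
        path.unlink()
        return True
    return False


# Demonstration in a throwaway directory, with a dummy file standing in for a real one.
with tempfile.TemporaryDirectory() as folder:
    iteration_file("myModel", folder).write_text("dummy content")
    print("Removed:", discard_saved_iterations("myModel", folder))
```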
Yes. See this example.
If the model returns a probability 0 for the chosen alternative for at least one observation in the sample, then the likelihood is 0, and the log likelihood is minus infinity.
A possible reason is when the initial value of a scale parameter is too close to zero.
There may be several other reasons for the issue. The most effective method to identify the problem’s source is to use Biogeme in simulation mode and report the probability of the chosen alternative for each observation. Once the problematic entries are identified, it becomes easier to investigate why the model returns a probability of zero.
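The arithmetic behind this diagnosis can be sketched in a few lines. The example does not call Biogeme: it assumes the probabilities of the chosen alternatives have already been obtained (for instance from a simulation run) and are available as a plain list. The function names and the numbers are hypothetical.

```python
import math


def log_likelihood(chosen_probabilities):
    """Sum of log probabilities; minus infinity as soon as one probability is zero."""
    total = 0.0
    for probability in chosen_probabilities:
        if probability <= 0.0:
            return -math.inf
        total += math.log(probability)
    return total


def zero_probability_rows(chosen_probabilities):
    """Positions of the observations whose chosen alternative has probability zero."""
    return [row for row, p in enumerate(chosen_probabilities) if p <= 0.0]


# Made-up probabilities of the chosen alternative for four observations.
probabilities = [0.62, 0.0, 0.35, 0.91]
print("Log likelihood:", log_likelihood(probabilities))
print("Problematic observations:", zero_probability_rows(probabilities))
```

A single zero is enough to drive the whole sum to minus infinity, which is why listing the offending rows is usually the quickest first step.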
C:\Users\[USER_NAME]\anaconda3\DLLs or C:\ProgramData\Anaconda3\DLLs.
C:\Users\[USER_NAME]\anaconda3\DLLs or C:\ProgramData\Anaconda3\DLLs.
ImportError: dlopen(/Users/~/anaconda3/lib/python3.6/site-packages/biogeme/cbiogeme.cpython-36m-darwin.so, 2): Symbol not found: __ZNSt15__exception_ptr13exception_ptrD1Ev
It is likely to be due to a conflict of versions of Python packages. The best way to deal with it is to reinstall Biogeme in a clean environment using the following steps:
deactivate.
virtualenv -p python3.12 env_biogeme
source env_biogeme/bin/activate
On Windows:
.\env_biogeme\Scripts\activate
pip install --upgrade pip
or
python -m pip install --upgrade pip
pip install biogeme
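Before running the pip commands, it may be worth confirming that the interpreter really is the one from the new environment. The following sketch relies only on the standard sys module; the function name is hypothetical, and the real_prefix check is there for older releases of virtualenv.

```python
import sys


def in_virtual_environment():
    """Return True when Python runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and recent virtualenv
    # releases make sys.prefix differ from sys.base_prefix instead.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```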
Running setup.py install for biogeme ... error
Complete output from command c:\users\willi\anaconda3\python.exe -u -c "import setuptools, tokenize; __file__='C:\Users\willi\AppData\Local\Temp\pip-install-iaflhasr\biogeme\setup.py'; f=getattr(tokenize, 'open', open)(__file__); code=f.read().replace('\r\n', '\n'); f.close(); exec(compile(code, __file__, 'exec'))" install --record C:\Users\willi\AppData\Local\Temp\pip-record-v6_zn0ff\install-record.txt --single-version-externally-managed --compile:
Using Cython
Please put "# distutils: language=c++" in your .pyx or .pxd file(s)
running install
It means that there are no binaries available for your version of Python. To check which versions are supported, go to the repository
pypi.org/project/cythonbiogeme/
For instance, the following files are available for CythonBiogeme 1.0.4:
cythonbiogeme-1.0.4.tar.gz
cythonbiogeme-1.0.4-cp312-cp312-win_amd64.whl
cythonbiogeme-1.0.4-cp312-cp312-macosx_10_9_universal2.whl
cythonbiogeme-1.0.4-cp311-cp311-win_amd64.whl
cythonbiogeme-1.0.4-cp311-cp311-macosx_10_9_universal2.whl
cythonbiogeme-1.0.4-cp310-cp310-win_amd64.whl
cythonbiogeme-1.0.4-cp310-cp310-macosx_10_9_universal2.whl
It means that you can use Python 3.10, 3.11 and 3.12 on both platforms.
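The supported versions can be read off the cpXY tag in each wheel file name. As an illustration, the sketch below extracts them from the file names listed above and compares them with the running interpreter. The helper name is hypothetical, and the check deliberately ignores the platform part of the file name.

```python
import re
import sys

# Wheel file names copied from the list above (CythonBiogeme 1.0.4).
WHEEL_FILES = [
    "cythonbiogeme-1.0.4-cp312-cp312-win_amd64.whl",
    "cythonbiogeme-1.0.4-cp312-cp312-macosx_10_9_universal2.whl",
    "cythonbiogeme-1.0.4-cp311-cp311-win_amd64.whl",
    "cythonbiogeme-1.0.4-cp311-cp311-macosx_10_9_universal2.whl",
    "cythonbiogeme-1.0.4-cp310-cp310-win_amd64.whl",
    "cythonbiogeme-1.0.4-cp310-cp310-macosx_10_9_universal2.whl",
]


def supported_python_versions(wheel_files):
    """Collect the (major, minor) CPython versions named in the wheel tags."""
    versions = set()
    for name in wheel_files:
        match = re.search(r"-cp(\d)(\d+)-", name)
        if match:
            versions.add((int(match.group(1)), int(match.group(2))))
    return sorted(versions)


supported = supported_python_versions(WHEEL_FILES)
current = sys.version_info[:2]
print("Wheels cover:", ", ".join(f"{major}.{minor}" for major, minor in supported))
print("Current interpreter covered:", current in supported)
```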
Click here for the online documentation. It has been generated with the Python Documentation Generator Sphinx.
The following technical reports walk through concrete examples to help you become familiar with the software.
Click here for information about the course
EPFL offers a 5-day short course entitled "Discrete Choice Analysis: Predicting Individual Behavior and Market Demand". It is organized every year in March (occasionally in February).
Content:
Lecturers:
Prof. Moshe Ben-Akiva | Massachusetts Institute of Technology, Cambridge, Ma (USA)
Prof. Daniel McFadden | University of Southern California [Nobel Prize Laureate, 2000]
Prof. Michel Bierlaire | Ecole Polytechnique Fédérale de Lausanne, Switzerland
An online course entitled "Introduction to Discrete Choice Models" is available on the following platforms:
Click here for information about the course
MIT offers a 5-day short course entitled "Discrete Choice Analysis: Predicting demand and market shares". It is organized every year in June.
Lecturer: Prof. Moshe Ben-Akiva, Massachusetts Institute of Technology, Cambridge, Ma (USA)
Click here for information about the course
The University of Sydney Business School offers a course taught by Prof. David Hensher, Prof. Michiel Bliemer, Prof. John Rose and Dr. Andrew Collins.
The releases of PandasBiogeme are available on the Python Package Index repository.
Previous webpages:
Around 1990, Michel Bierlaire wrote a software package called HieLoW: Hierarchical Logit for Windows. It was written in Borland C++, and was the first discrete choice estimation software with a graphical user interface. It was designed for the estimation of logit and nested logit models. The user had to specify the models through a graphical user interface. This software was distributed by Stratec SA, Brussels.
Around 2000, the first version of Biogeme was released. Written in GNU C++, it was the first open source discrete choice software. It was designed to estimate the parameters of a list of predetermined discrete choice models such as logit, binary probit, nested logit, cross-nested logit, multivariate extreme value models, discrete and continuous mixtures of multivariate extreme value models, models with nonlinear utility functions, models designed for panel data, and heteroscedastic models. The modeling language was designed to be simple, and was developed using a general-purpose parser generator called GNU Bison. It was later referred to as BisonBiogeme. The distributions can be found here.
Around 2010, a more flexible version was designed for general purpose parametric models. The modeling language was extended, and based on the Python language. A series of discrete choice models were precoded for ease of use. This version was also written in GNU C++; the distributions can be found here.
In 2018, a completely new version of the software was released. It was no longer a standalone executable, but a Python package. The package is written in Python, with the exception of the core calculations of the models, which are written in C++ for the sake of efficiency. The motivation was to combine simplicity of use (especially for teaching purposes) with the sophistication provided by Python (for research and applications purposes). Moreover, the management of the data relies on the package Pandas, which has become the workhorse of data scientists. Therefore, the name PandasBiogeme has been adopted. It is distributed on the Python Package Index repository.