# Utilities

- mod2py: transforms a model specification for BisonBiogeme into a specification for PythonBiogeme.
- Excel sheet for likelihood ratio tests.
- Matlab code to compute the correlation structure of a cross-nested logit model.
- Excel sheet to calculate the variance of a ratio of estimates.
- Script to check the compatibility of the data set with the Biogeme data format.
- Script to prepare a dataset compliant with the Biogeme data format.
- Script to extract a random subsample from a data file.
- Script to organize data into histograms.
- Sven Mueller's utilities.

The `mod2py` utility reads a model description file written for the Bison
version of Biogeme (`.mod`) and transforms it into a model
description file for the Python version of Biogeme (`.py`).

Syntax: `mod2py mymodel`

transforms the file `mymodel.mod` into `mymodel.py`.

Example `.mod` file:

    [Choice]
    CHOICE

    [Beta]
    // Name Value LowerBound UpperBound status (0=variable, 1=fixed)
    ASC_CAR   0 -10 10 0
    ASC_TRAIN 0 -10 10 0
    ASC_SM    0 -10 10 1
    B_TIME    0 -10 10 0
    B_COST    0 -10 10 0

    [Utilities]
    // Id Name Avail linear-in-parameter expression (beta1*x1 + beta2*x2 + ... )
    1 A1_TRAIN TRAIN_AV_SP ASC_TRAIN * one + B_TIME * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
    2 A2_SM    SM_AV       ASC_SM * one + B_TIME * SM_TT_SCALED + B_COST * SM_COST_SCALED
    3 A3_Car   CAR_AV_SP   ASC_CAR * one + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED

    [Expressions]
    one = 1
    CAR_AV_SP = CAR_AV * ( SP != 0 )
    TRAIN_AV_SP = TRAIN_AV * ( SP != 0 )
    SM_COST = SM_CO * ( GA == 0 )
    TRAIN_COST = TRAIN_CO * ( GA == 0 )
    TRAIN_TT_SCALED = TRAIN_TT / 100.0
    TRAIN_COST_SCALED = TRAIN_COST / 100
    SM_TT_SCALED = SM_TT / 100.0
    SM_COST_SCALED = SM_COST / 100
    CAR_TT_SCALED = CAR_TT / 100
    CAR_CO_SCALED = CAR_CO / 100

    [Exclude]
    (( PURPOSE != 1 ) * ( PURPOSE != 3 ) + ( CHOICE == 0 ))

    [Model]
    $MNL

Generated `.py` file:

    # This file has automatically been generated.
    # Tue Apr 21 09:36:59 2015
    # Michel Bierlaire, EPFL
    # biogeme 2.4 [Lun 19 jan 2015 18:40:42 CET]
    # Michel Bierlaire, EPFL
    #####################################################
    # This file complies with the syntax of pythonbiogeme
    # In general, it may require to be edited by hand before being operational
    # It is meant to help users translating their models from the previous version of biogeme to the python version.
    #####################################################

    from biogeme import *
    from headers import *
    from loglikelihood import *
    from statistics import *

    # [Choice]
    __chosenAlternative = CHOICE

    # [Weight]
    # NONE

    #[Beta]
    #Parameters to be estimated
    # Arguments:
    #   1  Name for report. Typically, the same as the variable
    #   2  Starting value
    #   3  Lower bound
    #   4  Upper bound
    #   5  0: estimate the parameter, 1: keep it fixed
    ASC_CAR = Beta('ASC_CAR',0,-10,10,0)
    ASC_SM = Beta('ASC_SM',0,-10,10,1)
    ASC_TRAIN = Beta('ASC_TRAIN',0,-10,10,0)
    B_COST = Beta('B_COST',0,-10,10,0)
    B_TIME = Beta('B_TIME',0,-10,10,0)

    # [Expressions]
    # Define here arithmetic expressions for name that are not directly
    # available from the data
    one = DefineVariable('one',1)
    CAR_AV_SP = DefineVariable('CAR_AV_SP', CAR_AV * ( SP != 0 ))
    TRAIN_AV_SP = DefineVariable('TRAIN_AV_SP', TRAIN_AV * ( SP != 0 ))
    SM_COST = DefineVariable('SM_COST', SM_CO * ( GA == 0 ))
    TRAIN_COST = DefineVariable('TRAIN_COST', TRAIN_CO * ( GA == 0 ))
    TRAIN_TT_SCALED = DefineVariable('TRAIN_TT_SCALED', TRAIN_TT / 100 )
    TRAIN_COST_SCALED = DefineVariable('TRAIN_COST_SCALED', TRAIN_COST / 100 )
    SM_TT_SCALED = DefineVariable('SM_TT_SCALED', SM_TT / 100 )
    SM_COST_SCALED = DefineVariable('SM_COST_SCALED', SM_COST / 100 )
    CAR_TT_SCALED = DefineVariable('CAR_TT_SCALED', CAR_TT / 100 )
    CAR_CO_SCALED = DefineVariable('CAR_CO_SCALED', CAR_CO / 100 )

    #[Group]

    #[Utilities]
    __A1_TRAIN = ASC_TRAIN * one + B_TIME * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
    __A2_SM = ASC_SM * one + B_TIME * SM_TT_SCALED + B_COST * SM_COST_SCALED
    __A3_Car = ASC_CAR * one + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED
    __V = {1: __A1_TRAIN,2: __A2_SM,3: __A3_Car}
    __av = {1: TRAIN_AV_SP,2: SM_AV,3: CAR_AV_SP}

    #[Draws]
    BIOGEME_OBJECT.PARAMETERS['NbrOfDraws'] = "150"

    #[Exclude]
    BIOGEME_OBJECT.EXCLUDE = ( ( PURPOSE != 1 ) * ( PURPOSE != 3 ) ) + ( CHOICE == 0 )

    #[Model]
    # MNL // Logit Model
    # The choice model is a logit, with availability conditions
    prob = bioLogit(__V,__av,__chosenAlternative)
    __l = log(prob)

    # Defines an itertor on the data
    rowIterator('obsIter')

    # Define the likelihood function for the estimation
    BIOGEME_OBJECT.ESTIMATE = Sum(__l,'obsIter')

The Excel sheet for likelihood ratio tests lets the user apply the likelihood ratio test. The final log likelihood and the number of estimated parameters must be provided for both the restricted and the unrestricted models.
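
For readers who prefer a script to a spreadsheet, the same test can be sketched in Python. This is a minimal illustration of the standard likelihood ratio test, not a transcription of the Excel sheet. The function names and the numerical inputs are hypothetical. The statistic is -2 (LL_restricted - LL_unrestricted), compared with a chi-square distribution whose degrees of freedom equal the difference in the number of estimated parameters.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    z = x / 2.0
    # Closed-form starting points: Q(1/2, z) = erfc(sqrt(z)) and Q(1, z) = exp(-z).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), up to a = df / 2.
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(ll_restricted, k_restricted, ll_unrestricted, k_unrestricted):
    """Return (statistic, degrees of freedom, p-value) of the likelihood ratio test."""
    statistic = -2.0 * (ll_restricted - ll_unrestricted)
    df = k_unrestricted - k_restricted
    if df <= 0 or statistic < 0:
        raise ValueError("the unrestricted model must have more parameters and a higher log likelihood")
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical final log likelihoods and parameter counts, for illustration only.
statistic, df, p_value = likelihood_ratio_test(-5331.25, 4, -5315.39, 6)
print(f"LR = {statistic:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The restricted model is rejected at a given significance level when the p-value falls below that level.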

The Matlab code computes the correlation structure of a cross-nested logit model.

The Excel sheet for the variance of estimates computes the variance of the difference and of the ratio of two estimators. A typical application in the context of discrete choice is the computation of the standard error for quantities like the value of time. The approximation computed here comes from the Taylor series, where terms higher than second order are ignored. Source: MVA et al. (1998) "Value of Travel Time Savings".
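
As an illustration of the kind of calculation involved, here is a minimal Python sketch. It uses the common first-order (delta method) approximation for the ratio, which is an assumption on my part and may differ in detail from the formula implemented in the spreadsheet. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def variance_of_difference(var_a, var_b, cov_ab):
    """Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b); this identity is exact."""
    return var_a + var_b - 2.0 * cov_ab


def variance_of_ratio(a, b, var_a, var_b, cov_ab):
    """First-order Taylor (delta method) approximation of Var(a / b).

    Var(a / b) ~ (Var(a) - 2 r Cov(a, b) + r^2 Var(b)) / b^2, with r = a / b.
    """
    r = a / b
    return (var_a - 2.0 * r * cov_ab + r * r * var_b) / (b * b)


# Hypothetical estimates of a time coefficient and a cost coefficient.
b_time, b_cost = -1.28, -1.08
var_time, var_cost, cov_time_cost = 0.011, 0.0027, 0.0008

value_of_time = b_time / b_cost
std_err = math.sqrt(variance_of_ratio(b_time, b_cost, var_time, var_cost, cov_time_cost))
print(f"value of time = {value_of_time:.3f}, approximate standard error = {std_err:.3f}")
```

The variances and the covariance are read from the variance-covariance matrix of the estimates reported by the estimation software.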

If Biogeme has been installed from source, the `biocheckdata` script has been installed on your system. If not, you need to download the Python script and run it from Python 3.0.

The syntax is: `biocheckdata mydata.dat`

The script checks whether a data file
complies with the requirements of Biogeme. In
particular, it checks that the number of elements in
each row matches the number of headers in the first
row. It also detects entries that are not
numeric.
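
The two checks described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code of `biocheckdata`. It assumes that columns are separated by whitespace (blanks or tabs) and that the first non-empty line contains the headers. The function name `check_data` is hypothetical.

```python
def check_data(lines):
    """Return a list of human-readable problems found in the lines of a data file."""
    problems = []
    n_headers = None
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # ignore empty lines
        if n_headers is None:
            n_headers = len(fields)  # the first non-empty line holds the headers
            continue
        if len(fields) != n_headers:
            problems.append(f"line {line_no}: {len(fields)} entries, expected {n_headers}")
        for column, entry in enumerate(fields, start=1):
            try:
                float(entry)
            except ValueError:
                problems.append(f"line {line_no}, column {column}: '{entry}' is not numeric")
    return problems


sample = ["ID CHOICE TT", "1 2 3.5", "2 1", "3 x 4.0"]
for problem in check_data(sample):
    print(problem)
```

To check a file on disk, pass the open file object, for instance `check_data(open("mydata.dat"))`.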

`/usr/local/bin/biocheckdata`.

If Biogeme has been installed from source, the `biopreparedata` script has been installed on your system. If not, you need to download the Python script and run it from Python 3.0.

The syntax is: `biopreparedata mydata.csv`

The script converts a CSV data file into the format required by Biogeme. Each column containing strings is coded with numbers. The following conventions are adopted:

- Strings in the CSV file are delimited with double quotes (`"`).
- Each blank in the name of a header is replaced by an underscore (`_`).
- Entries in the CSV file are separated by a comma [`,`]. Some versions of Excel export CSV files with semicolons [`;`]. The script guesses which variant of CSV is used by counting the number of commas and the number of semicolons, and takes the one with the highest number of occurrences as the delimiter.
- If an entry of the first row is numeric, the corresponding column is supposed to contain only numerical values. If a non-numerical value is detected in another row, it is replaced by `99999`. Edit `/usr/local/share/biogeme/biopreparedata.py` in order to change these conventions.
- If an entry of the first row is a string, the script associates each string in the corresponding column with a numeric value.
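
Three of these conventions are simple enough to sketch in Python. The helper names below are hypothetical and the code is an illustration, not the content of `biopreparedata.py`. In particular, the behavior when commas and semicolons occur equally often is my own assumption.

```python
def guess_delimiter(text):
    """Pick the more frequent of ',' and ';' (assumption: a tie goes to the comma)."""
    return ";" if text.count(";") > text.count(",") else ","


def clean_header(name):
    """Replace each blank in a header name by an underscore."""
    return name.replace(" ", "_")


def numeric_or_missing(entry, missing=99999):
    """Return the entry as a number, or the missing-value code if it is not numeric."""
    try:
        return float(entry)
    except ValueError:
        return missing


text = "Id;The name;The rank\n1;2,5;2\n2;n/a;3\n"
delimiter = guess_delimiter(text)
rows = [line.split(delimiter) for line in text.splitlines()]
headers = [clean_header(h) for h in rows[0]]
print(delimiter, headers, numeric_or_missing(rows[2][1]))
```

For real files, the `csv` module of the Python standard library handles quoted strings that contain the delimiter, which a plain `split` does not.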

Two files are generated:

- `biogeme_mydata.csv` is the data file complying with the Biogeme requirements.
- `legend_mydata.csv` describes the codings that have been used for non-numeric data.

Suppose that the `mydata.csv` file contains

    Id, The name, The rank
    1, "Me", 2
    2, "You", 3
    3, "Him", 3

Then, the generated file `biogeme_mydata.csv` will contain

    Id ___The_name _The_rank
    1 2 2
    2 0 3
    3 1 3

and the generated file `legend_mydata.csv` will contain

    +++++++++++++++++++++++++
    Legend for column ___The_name
    +++++++++++++++++++++++++
    0 : "You"
    1 : "Him"
    2 : "Me"

If Biogeme has been installed from source, the `biosampledata` script has been installed on your system. If not, you need to download the Python script and run it from Python 3.0.

The syntax is `biosampledata dataFile samplingRate`, where
`dataFile` is the name of a data file in Biogeme format,
and `samplingRate` is a number between 1 and 100 representing (in percentage)
the probability for each row to be inserted in the sample.

Two output files are generated:

- A file with the sampled data (with the same headers as the original file).
- A file with the data that have not been sampled (also with the same headers as the original file).
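
The sampling rule described above, where each row is kept independently with the given probability, can be sketched as follows. The function name `split_sample` is hypothetical and the sketch works on a list of lines rather than on files. It is not the code of `biosampledata`.

```python
import random


def split_sample(lines, sampling_rate, rng=None):
    """Split data lines into (sampled, not sampled); both keep the header line.

    sampling_rate is a percentage: each row is sampled with probability sampling_rate / 100.
    """
    if rng is None:
        rng = random.Random()
    header, rows = lines[0], lines[1:]
    sampled, not_sampled = [header], [header]
    for row in rows:
        if rng.random() * 100.0 < sampling_rate:
            sampled.append(row)
        else:
            not_sampled.append(row)
    return sampled, not_sampled


lines = ["ID CHOICE"] + [f"{i} {i % 3}" for i in range(1, 11)]
sampled, not_sampled = split_sample(lines, 30, random.Random(2015))
print(len(sampled) - 1, "rows sampled,", len(not_sampled) - 1, "rows left out")
```

Because rows are drawn independently, the size of the sample is itself random. A rate of 30 yields about 30% of the rows on average, not exactly 30%.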

If Biogeme has been installed from source, the histogram scripts have
been installed on your system. If not, you need to download
the Python scripts `generateHistogram.py` and `histogram.py` and
run `generateHistogram.py` from Python 3.0.

The tool organizes a list of raw numbers into categories in order to plot a
histogram using bins of a given size. The raw data file should contain only a list of numbers. The
syntax is `histogram filename binSize`.

For example, if the raw data file is

    0.2748861875
    1.2194215178
    -0.1088626369
    1.887765541
    0.143842688
    0.5121648584
    0.9323810467
    0.1969901739
    -0.2501622963
    0.6349579371
    -0.6544964817
    0.1684235135
    1.0532380188
    0.9794024028
    0.8001565071
    -0.7680558349
    -2.5749162274
    0.8619355768
    -0.0267481139
    -0.9200574846

the command is `histogram test.dat 1.0`, and the following files are
generated. The file `_hist_test.dat` contains

    Value Frequency
    -3.0 1
    -1.0 6
    0.0 10
    1.0 3

meaning that there is 1 value between -3 and -1, 6 values between -1 and 0, 10 values between 0 and 1, and 3 values larger than 1. A Gnuplot file `_hist.gp` is also generated:

    set style data histogram
    set style histogram cluster gap 0
    set style fill solid 1.0
    plot '_hist_test.dat' using 1:2 ti col smooth frequency with boxes
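
The binning in this example is consistent with labelling each bin by its lower bound, that is `floor(value / binSize) * binSize`, and omitting empty bins. The following Python sketch reproduces the counts of the example under that assumption. It is an illustration with a hypothetical function name, not the code of `histogram.py`.

```python
import math
from collections import Counter


def histogram(values, bin_size):
    """Count the values per bin; each bin is labelled by its lower bound."""
    counts = Counter(math.floor(v / bin_size) * bin_size for v in values)
    return sorted(counts.items())


values = [
    0.2748861875, 1.2194215178, -0.1088626369, 1.887765541, 0.143842688,
    0.5121648584, 0.9323810467, 0.1969901739, -0.2501622963, 0.6349579371,
    -0.6544964817, 0.1684235135, 1.0532380188, 0.9794024028, 0.8001565071,
    -0.7680558349, -2.5749162274, 0.8619355768, -0.0267481139, -0.9200574846,
]

print("Value Frequency")
for lower_bound, count in histogram(values, 1.0):
    print(lower_bound, count)
# Prints the bins -3.0, -1.0, 0.0 and 1.0 with frequencies 1, 6, 10 and 3.
```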

A weighted version of the tool exists. Instead of counting the
number of values in each bin, it adds the weights of these
values. The raw data file must contain two values on each row: the
value, and its weight. The
syntax is `weightedhistogram filename binSize`.

For example, if the raw data file is

    0.2748861875 0.9
    1.2194215178 0.2
    -0.1088626369 1.0
    1.887765541 1.0
    0.143842688 0.6
    0.5121648584 0.1
    0.9323810467 0.1
    0.1969901739 0.4
    -0.2501622963 0.3
    0.6349579371 0.2
    -0.6544964817 0.4
    0.1684235135 0.1
    1.0532380188 0.8
    0.9794024028 0.9
    0.8001565071 0.3
    -0.7680558349 0.8
    -2.5749162274 0.4
    0.8619355768 0.1
    -0.0267481139 0.3
    -0.9200574846 1.0

the command is `weightedhistogram test.dat 1.0`, and the following files are
generated. The file `_hist_test.dat` contains

    Value Frequency
    -3.0 0.4
    -1.0 3.8
    0.0 3.7
    1.0 2.0

meaning that the total weight of values between -3 and -1 is 0.4, the total weight for values between -1 and 0 is 3.8, the total weight for values between 0 and 1 is 3.7, and the total weight for values larger than 1 is 2. A Gnuplot file `_hist.gp` is also generated:

    set style data histogram
    set style histogram cluster gap 0
    set style fill solid 1.0
    plot '_hist_test.dat' using 2:xtic(1) t '_hist_test.dat'
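
The weighted variant differs from plain counting only in what is accumulated per bin. A minimal Python sketch follows, again assuming that each bin is labelled by its lower bound. The function name is hypothetical and this is not the code of the `weightedhistogram` tool.

```python
import math
from collections import defaultdict


def weighted_histogram(pairs, bin_size):
    """Sum the weights per bin for an iterable of (value, weight) pairs."""
    totals = defaultdict(float)
    for value, weight in pairs:
        totals[math.floor(value / bin_size) * bin_size] += weight
    return sorted(totals.items())


# First four rows of the example data file above.
pairs = [
    (0.2748861875, 0.9),
    (1.2194215178, 0.2),
    (-0.1088626369, 1.0),
    (1.887765541, 1.0),
]

for lower_bound, total in weighted_histogram(pairs, 1.0):
    print(lower_bound, round(total, 6))
```

Since the totals are sums of floating-point numbers, rounding them before printing avoids artifacts such as `3.8000000000000003`.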

Sven Mueller's utilities comprise an XML file for code highlighting of Biogeme `.mod` files in notepad, and an MS Excel sheet for the Horowitz test (non-nested hypotheses).