Triangular mixture with panel data
Example of a mixture of logit models, estimated using Monte-Carlo integration. The mixing distribution is user-defined (here, triangular). The data file is organized as panel data.
- author: Michel Bierlaire, EPFL
- date: Tue Dec 6 18:30:44 2022
import numpy as np
import biogeme.biogeme as bio
from biogeme import models
import biogeme.biogeme_logging as blog
from biogeme.expressions import (
    Beta,
    bioDraws,
    MonteCarlo,
    PanelLikelihoodTrajectory,
    log,
)
See the data processing script: Panel data preparation for Swissmetro.
from swissmetro_panel import (
    database,
    CHOICE,
    CAR_AV_SP,
    TRAIN_AV_SP,
    TRAIN_TT_SCALED,
    TRAIN_COST_SCALED,
    SM_TT_SCALED,
    SM_COST_SCALED,
    CAR_TT_SCALED,
    CAR_CO_SCALED,
    SM_AV,
)
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Example b26triangular_panel_mixture.py')
Example b26triangular_panel_mixture.py
Function generating the draws.
def the_triangular_generator(sample_size: int, number_of_draws: int) -> np.ndarray:
    """
    Provide my own random number generator to the database.
    See the numpy.random documentation to obtain a list of other distributions.
    """
    return np.random.triangular(-1, 0, 1, (sample_size, number_of_draws))
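The generator above can be checked on its own, without Biogeme. The following is a minimal sketch that copies the function and inspects what it returns: one row per unit of the sample, one column per draw, with every value in the interval [-1, 1]. The sizes passed in the call are arbitrary values chosen for illustration, not the sizes Biogeme uses.

```python
import numpy as np


def the_triangular_generator(sample_size: int, number_of_draws: int) -> np.ndarray:
    """Copy of the generator from this example, reproduced for a standalone check."""
    # Arguments of np.random.triangular: left bound, mode, right bound, output shape.
    return np.random.triangular(-1, 0, 1, (sample_size, number_of_draws))


# Arbitrary illustrative sizes (assumption, not taken from the example).
draws = the_triangular_generator(sample_size=5, number_of_draws=4)

print(draws.shape)  # one row per unit of the sample, one column per draw
print(bool(draws.min() >= -1), bool(draws.max() <= 1))  # draws stay within [-1, 1]
```

Any other function with the same two arguments and the same output shape could presumably be registered in the same way.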
Associate the function with a name.
myRandomNumberGenerators = {
    'TRIANGULAR': (
        the_triangular_generator,
        'Draws from a triangular distribution',
    )
}
Submit the generator to the database.
database.setRandomNumberGenerators(myRandomNumberGenerators)
Parameters to be estimated.
B_COST = Beta('B_COST', 0, None, None, 0)
Define a random parameter that follows a triangular distribution across individuals, designed to be used for Monte-Carlo simulation.
Mean of the distribution.
B_TIME = Beta('B_TIME', 0, None, None, 0)
Scale of the distribution. It is advised not to use 0 as starting value for the following parameter.
B_TIME_S = Beta('B_TIME_S', 1, None, None, 0)
B_TIME_RND = B_TIME + B_TIME_S * bioDraws('B_TIME_RND', 'TRIANGULAR')
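Because the draws come from a triangular distribution on [-1, 1] with mode 0, their variance is 1/6, so `B_TIME_S` acts as a scale rather than as a standard deviation. The sketch below checks this by simulation using only the standard library, then derives the standard deviation and the support implied by a given mean and scale. The numerical values of `b_time` and `b_time_s` are the rounded values shown in the last iteration of the optimization log of this example; they are used here only as an illustration.

```python
import math
import random

random.seed(0)  # fixed seed so that the check is reproducible

# Standard-library counterpart of np.random.triangular(-1, 0, 1).
# Note the argument order of random.triangular: (low, high, mode).
n = 200_000
draws = [random.triangular(-1.0, 1.0, 0.0) for _ in range(n)]

sample_mean = sum(draws) / n
sample_variance = sum((d - sample_mean) ** 2 for d in draws) / n
theoretical_variance = 1.0 / 6.0  # variance of a triangular(-1, 0, 1) variable

# Rounded values read from the final iteration of the optimization log.
b_time = -6.1
b_time_s = 8.1

# Standard deviation and support of B_TIME + B_TIME_S * draw.
implied_std = abs(b_time_s) / math.sqrt(6.0)
support = (b_time - abs(b_time_s), b_time + abs(b_time_s))

print(f'sample mean: {sample_mean:.4f}')
print(f'sample variance: {sample_variance:.4f} (theory: {theoretical_variance:.4f})')
print(f'implied standard deviation: {implied_std:.3f}')
print(f'support: [{support[0]:.1f}, {support[1]:.1f}]')
```

A consequence of the bounded support is that the random coefficient cannot take values outside the interval centered on the mean with half-width equal to the absolute value of the scale.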
We do the same for the constants, to address serial correlation.
ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_CAR_S = Beta('ASC_CAR_S', 1, None, None, 0)
ASC_CAR_RND = ASC_CAR + ASC_CAR_S * bioDraws('ASC_CAR_RND', 'TRIANGULAR')
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
ASC_TRAIN_S = Beta('ASC_TRAIN_S', 1, None, None, 0)
ASC_TRAIN_RND = ASC_TRAIN + ASC_TRAIN_S * bioDraws('ASC_TRAIN_RND', 'TRIANGULAR')
ASC_SM = Beta('ASC_SM', 0, None, None, 1)
ASC_SM_S = Beta('ASC_SM_S', 1, None, None, 0)
ASC_SM_RND = ASC_SM + ASC_SM_S * bioDraws('ASC_SM_RND', 'TRIANGULAR')
Definition of the utility functions.
V1 = ASC_TRAIN_RND + B_TIME_RND * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
V2 = ASC_SM_RND + B_TIME_RND * SM_TT_SCALED + B_COST * SM_COST_SCALED
V3 = ASC_CAR_RND + B_TIME_RND * CAR_TT_SCALED + B_COST * CAR_CO_SCALED
Associate utility functions with the numbering of alternatives.
V = {1: V1, 2: V2, 3: V3}
Associate the availability conditions with the alternatives.
av = {1: TRAIN_AV_SP, 2: SM_AV, 3: CAR_AV_SP}
Conditional on the random parameters, the likelihood of one observation is given by the logit model (called the kernel).
obsprob = models.logit(V, av, CHOICE)
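To make the kernel concrete, here is a plain-Python sketch of the logit probability of the chosen alternative, restricted to the available alternatives. It is not Biogeme's implementation: the function name `logit_probability`, the dictionary-based interface, and the decision to raise an error when the chosen alternative is unavailable are assumptions made for this illustration.

```python
import math


def logit_probability(utilities: dict, availability: dict, chosen: int) -> float:
    """Logit probability of `chosen` among the available alternatives.

    Hypothetical helper: `utilities` and `availability` map alternative
    ids to numbers, mirroring the dictionaries V and av of the example.
    """
    if not availability[chosen]:
        # Assumption of this sketch: an unavailable chosen alternative is an error.
        raise ValueError('The chosen alternative is not available.')
    available = [alt for alt in utilities if availability[alt]]
    # Subtract the largest utility before exponentiating, for numerical stability.
    v_max = max(utilities[alt] for alt in available)
    exp_v = {alt: math.exp(utilities[alt] - v_max) for alt in available}
    return exp_v[chosen] / sum(exp_v.values())


# Illustrative utilities for train (1), Swissmetro (2) and car (3).
example_utilities = {1: -1.0, 2: 0.5, 3: 0.0}
all_available = {1: 1, 2: 1, 3: 1}
no_car = {1: 1, 2: 1, 3: 0}

print(logit_probability(example_utilities, all_available, chosen=2))
print(logit_probability(example_utilities, no_car, chosen=2))
```

Removing an alternative from the choice set increases the probability of each remaining alternative, which is why the second value is larger than the first.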
Conditional on the random parameters, the likelihood of all observations for one individual (the trajectory) is the product of the likelihood of each observation.
condprobIndiv = PanelLikelihoodTrajectory(obsprob)
We integrate over the random parameters using Monte-Carlo simulation.
logprob = log(MonteCarlo(condprobIndiv))
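The three steps above (kernel, trajectory, Monte-Carlo integration) can be sketched in plain Python for a single individual. This is a simplified illustration, not what Biogeme executes: it keeps only a random time coefficient and a fixed cost coefficient, omits the alternative specific constants and the availability conditions, and all names and data values are hypothetical. The key point is that one draw is generated per individual and reused for every observation of that individual, and that the average over draws is taken before the logarithm.

```python
import math
import random


def logit_probability(utilities: dict, chosen: int) -> float:
    """Logit probability of the chosen alternative (all alternatives available)."""
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    return exp_v[chosen] / sum(exp_v.values())


def simulated_log_likelihood(observations, b_time, b_time_s, b_cost, n_draws, rng):
    """Log of the Monte-Carlo average of the trajectory likelihood of one individual."""
    total = 0.0
    for _ in range(n_draws):
        # One triangular draw per individual, shared by all observations.
        # Argument order of Random.triangular: (low, high, mode).
        b_time_rnd = b_time + b_time_s * rng.triangular(-1.0, 1.0, 0.0)
        trajectory = 1.0
        for obs in observations:
            utilities = {
                alt: b_time_rnd * time + b_cost * cost
                for alt, (time, cost) in obs['attributes'].items()
            }
            # Conditional on the draw, the trajectory likelihood is a product.
            trajectory *= logit_probability(utilities, obs['chosen'])
        total += trajectory
    return math.log(total / n_draws)


# Hypothetical individual with two observations; attributes are (time, cost).
individual = [
    {'attributes': {1: (1.2, 0.5), 2: (0.6, 0.8), 3: (1.0, 0.7)}, 'chosen': 2},
    {'attributes': {1: (1.5, 0.4), 2: (0.7, 0.9), 3: (0.9, 0.6)}, 'chosen': 3},
]

value = simulated_log_likelihood(
    individual, b_time=-1.0, b_time_s=1.0, b_cost=-1.0,
    n_draws=100, rng=random.Random(0),
)
print(value)
```

Summing this quantity over all individuals would give the simulated log likelihood of the sample under the simplifications stated above.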
Create the Biogeme object. As the objective is to illustrate the syntax, we calculate the Monte-Carlo approximation with a small number of draws. To achieve that, we provide a parameter file different from the default one.
the_biogeme = bio.BIOGEME(database, logprob, parameter_file='few_draws.toml')
the_biogeme.modelName = 'b26triangular_panel_mixture'
File few_draws.toml has been parsed.
Estimate the parameters.
results = the_biogeme.estimate()
*** Initial values of the parameters are obtained from the file __b26triangular_panel_mixture.iter
Cannot read file __b26triangular_panel_mixture.iter. Statement is ignored.
Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds]
** Optimization: Newton with trust region for simple bounds
Iter. ASC_CAR ASC_CAR_S ASC_SM_S ASC_TRAIN ASC_TRAIN_S B_COST B_TIME B_TIME_S Function Relgrad Radius Rho
0 -0.11 1.4 1.4 -0.55 1.1 -0.79 -1 1.1 4.6e+03 0.047 10 1.2 ++
1 -0.11 1.4 1.4 -0.55 1.1 -0.79 -1 1.1 4.6e+03 0.047 5 -0.15 -
2 0.65 3.6 -0.046 0.22 6.1 0.56 -2.3 4 4.2e+03 0.056 5 0.42 +
3 0.65 3.6 -0.046 0.22 6.1 0.56 -2.3 4 4.2e+03 0.056 2.5 -0.79 -
4 0.65 3.6 -0.046 0.22 6.1 0.56 -2.3 4 4.2e+03 0.056 1.2 -0.13 -
5 0.41 4.6 -0.046 -1 6 -0.69 -3.6 3.9 3.8e+03 0.059 1.2 0.71 +
6 0.098 4.8 0.06 -0.99 6.2 -1.9 -3.4 4.5 3.7e+03 0.027 1.2 0.84 +
7 0.098 4.8 0.06 -0.99 6.2 -1.9 -3.4 4.5 3.7e+03 0.027 0.62 -0.2 -
8 0.098 4.8 0.06 -0.99 6.2 -1.9 -3.4 4.5 3.7e+03 0.027 0.31 -0.2 -
9 0.098 4.8 0.06 -0.99 6.2 -1.9 -3.4 4.5 3.7e+03 0.027 0.16 -0.097 -
10 0.098 4.8 0.06 -0.99 6.2 -1.9 -3.4 4.5 3.7e+03 0.027 0.078 0.093 -
11 0.02 4.8 -0.018 -1 6.2 -1.9 -3.5 4.5 3.7e+03 0.026 0.078 0.45 +
12 0.098 4.9 -0.096 -0.94 6.3 -1.8 -3.5 4.5 3.7e+03 0.023 0.78 1 ++
13 0.14 5.7 -0.16 -0.94 6.5 -1.9 -3.8 4.9 3.7e+03 0.021 7.8 1 ++
14 0.14 5.7 -0.16 -0.94 6.5 -1.9 -3.8 4.9 3.7e+03 0.021 3.9 0.054 -
15 0.37 7.9 3.7 0.089 4.9 -2.6 -5.9 8 3.7e+03 0.039 3.9 0.69 +
16 0.37 7.9 3.7 0.089 4.9 -2.6 -5.9 8 3.7e+03 0.039 2 -0.15 -
17 0.37 7.9 3.7 0.089 4.9 -2.6 -5.9 8 3.7e+03 0.039 0.98 -0.15 -
18 0.37 7.9 3.7 0.089 4.9 -2.6 -5.9 8 3.7e+03 0.039 0.49 -0.15 -
19 0.37 7.9 3.7 0.089 4.9 -2.6 -5.9 8 3.7e+03 0.039 0.24 -0.029 -
20 0.37 7.9 3.7 0.089 4.9 -2.6 -5.9 8 3.7e+03 0.039 0.12 0.018 -
21 0.48 8 3.7 -0.029 4.9 -2.7 -6.1 7.9 3.7e+03 0.012 0.12 0.37 +
22 0.58 8.1 3.8 0.017 4.8 -2.7 -6 7.8 3.7e+03 0.012 1.2 0.99 ++
23 0.47 9.3 4.2 -0.016 3.9 -2.9 -5.5 7.3 3.7e+03 0.0087 1.2 0.74 +
24 0.9 9.1 5.3 0.4 2.7 -2.7 -6.2 8.1 3.7e+03 0.0068 1.2 0.56 +
25 0.46 9.5 4.7 0.34 1.5 -2.9 -5.9 7.7 3.6e+03 0.0027 1.2 0.83 +
26 0.65 9.4 5.1 0.42 0.71 -3 -6.1 8 3.6e+03 0.00054 12 0.95 ++
27 0.65 9.5 5.1 0.42 0.69 -3 -6.1 8.1 3.6e+03 8.5e-06 1.2e+02 1 ++
28 0.65 9.5 5.1 0.42 0.69 -3 -6.1 8.1 3.6e+03 2.6e-09 1.2e+02 1 ++
Results saved in file b26triangular_panel_mixture.html
Results saved in file b26triangular_panel_mixture.pickle
print(results.short_summary())
Results for model b26triangular_panel_mixture
Nbr of parameters: 8
Sample size: 752
Observations: 6768
Excluded data: 3960
Final log likelihood: -3645.824
Akaike Information Criterion: 7307.648
Bayesian Information Criterion: 7344.63
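The information criteria in the summary can be recomputed from the other reported quantities. The short check below uses the standard definitions, AIC = 2K - 2 ln L and BIC = K ln(n) - 2 ln L, and only the numbers printed above. The reported BIC is reproduced when n is the sample size (752 individuals), not the number of observations (6768, that is, 9 per individual).

```python
import math

# Values copied from the summary above.
n_parameters = 8
sample_size = 752
n_observations = 6768
final_log_likelihood = -3645.824

aic = 2 * n_parameters - 2 * final_log_likelihood
bic = n_parameters * math.log(sample_size) - 2 * final_log_likelihood

print(f'AIC = {aic:.3f}')  # AIC = 7307.648
print(f'BIC = {bic:.2f}')  # BIC = 7344.63
print(n_observations // sample_size)  # 9 observations per individual
```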
pandas_results = results.getEstimatedParameters()
pandas_results
Total running time of the script: (0 minutes 34.863 seconds)