Factor analysis

Preliminary analysis of the indicators using factor analysis

author: Michel Bierlaire, EPFL

date: Thu Apr 13 16:39:54 2023

import sys
import pandas as pd
import numpy as np
from IPython.core.display_functions import display
from biogeme.data.optima import data_file_path

The following package can be installed using

pip install factor_analyzer

See https://github.com/EducationalTestingService/factor_analyzer

try:
    from factor_analyzer import FactorAnalyzer
except ModuleNotFoundError:
    print('Use "pip install factor_analyzer" to install the required package')
    sys.exit()

We first extract the columns containing the indicators

variables = [
    'Envir01',
    'Envir02',
    'Envir03',
    'Envir04',
    'Envir05',
    'Envir06',
    'Mobil01',
    'Mobil02',
    'Mobil03',
    'Mobil04',
    'Mobil05',
    'Mobil06',
    'Mobil07',
    'Mobil08',
    'Mobil09',
    'Mobil10',
    'Mobil11',
    'Mobil12',
    'Mobil13',
    'Mobil14',
    'Mobil15',
    'Mobil16',
    'Mobil17',
    'Mobil18',
    'Mobil19',
    'Mobil20',
    'Mobil21',
    'Mobil22',
    'Mobil23',
    'Mobil24',
    'Mobil25',
    'Mobil26',
    'Mobil27',
    'ResidCh01',
    'ResidCh02',
    'ResidCh03',
    'ResidCh04',
    'ResidCh05',
    'ResidCh06',
    'ResidCh07',
    'LifSty01',
    'LifSty02',
    'LifSty03',
    'LifSty04',
    'LifSty05',
    'LifSty06',
    'LifSty07',
    'LifSty08',
    'LifSty09',
    'LifSty10',
    'LifSty11',
    'LifSty12',
    'LifSty13',
    'LifSty14',
]

indicators = pd.read_csv(data_file_path, sep='\t', usecols=variables)
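Note that `usecols` selects columns but does not reorder them: the data frame keeps the column order of the file. The sketch below illustrates this on a small in-memory tab-separated table. The column names `a`, `b`, `c` and the values are made up for the illustration and have nothing to do with the Optima data.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for a data file.
raw = 'a\tb\tc\n1\t2\t3\n4\t5\t6\n'

requested = ['c', 'a']
subset = pd.read_csv(io.StringIO(raw), sep='\t', usecols=requested)

# The columns come back in file order, not in the order of `requested`.
file_order = list(subset.columns)
print(file_order)

# Selecting the columns explicitly restores the requested order if needed.
reordered = subset[requested]
print(list(reordered.columns))
```

This matters whenever results are later labeled with the list passed to `usecols` rather than with the columns of the data frame.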

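Negative values are codes for missing answers. The code below treats every value that is zero or negative as missing, and drops each respondent with at least one missing indicator.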

indicators[indicators <= 0] = np.nan
indicators = indicators.dropna(axis=0, how='any')
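As an illustration of this cleaning step, here is a minimal sketch on a toy data frame. The answers are invented, and the codes -1 and -2 are only assumed to stand for missing values. The frame is converted to floats first so that storing NaN does not rely on implicit upcasting.

```python
import numpy as np
import pandas as pd

# Made-up answers of three respondents; -1 and -2 stand for missing codes.
toy = pd.DataFrame(
    {'Envir01': [4, -1, 2], 'Envir02': [5, 3, -2], 'Envir03': [1, 2, 3]}
)

# Work on floats so that NaN can be stored without changing the dtype.
toy = toy.astype(float)
toy[toy <= 0] = np.nan

# Keep only the respondents who answered every indicator.
complete = toy.dropna(axis=0, how='any')
print(f'{len(toy)} rows before, {len(complete)} rows after')
```

Dropping incomplete rows is simple but can discard a large share of the sample when there are many indicators, so it is worth comparing the number of rows before and after.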

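We perform the factor analysis with a varimax rotation. As the number of factors is not specified, the default value of FactorAnalyzer (three factors) is used.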

fa = FactorAnalyzer(rotation='varimax')
fa.fit(indicators)
FactorAnalyzer(rotation='varimax', rotation_kwargs={})
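The script relies on the default number of factors. One common heuristic for choosing that number, which is not used in the script above, is the Kaiser criterion: retain as many factors as there are eigenvalues of the correlation matrix greater than one. The sketch below applies it with NumPy only, on synthetic data generated from two latent factors. The sample size, the weights 0.8 and 0.6, and all variable names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 2000

# Two independent latent factors, each driving three synthetic indicators.
factors = rng.standard_normal((n_obs, 2))
noise = rng.standard_normal((n_obs, 6))
pattern = np.array([0, 0, 0, 1, 1, 1])  # index of the factor behind each indicator
data = 0.8 * factors[:, pattern] + 0.6 * noise

# Eigenvalues of the correlation matrix, sorted in decreasing order.
correlation = np.corrcoef(data, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(correlation))[::-1]

# Kaiser criterion: count the eigenvalues above 1.
n_retained = int((eigenvalues > 1.0).sum())
print(eigenvalues.round(2), n_retained)
```

The criterion is only a rule of thumb. On real survey data it is usually complemented by a scree plot and by the interpretability of the resulting factors.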


We obtain the factor loadings and label them

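# The rows of fa.loadings_ follow the column order of the fitted data frame,
# which read_csv takes from the file and not from the usecols list.
labeled_results = pd.DataFrame(fa.loadings_, index=indicators.columns)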
display(labeled_results)
                  0         1         2
Envir01   -0.564729 -0.002045  0.111001
Envir02   -0.407355  0.047089  0.105001
Envir03    0.413744  0.135357  0.089509
Envir04    0.339893  0.164578  0.027584
Envir05   -0.293116  0.082449  0.117659
Envir06   -0.397525 -0.000947  0.142948
Mobil01    0.070801  0.094246  0.348101
Mobil02    0.161354  0.221123  0.350501
Mobil03   -0.034115  0.120752  0.377315
Mobil04   -0.158243 -0.018616  0.160603
Mobil05   -0.174380  0.146278  0.352953
Mobil06    0.274974  0.163948  0.116831
Mobil07    0.230149 -0.018872  0.140380
Mobil08    0.380445  0.242195  0.165503
Mobil09   -0.286631 -0.006629  0.233548
Mobil10    0.341252  0.126686  0.191877
Mobil11    0.482423  0.061773  0.101396
Mobil12    0.291986  0.345614 -0.043070
Mobil13    0.293047 -0.005639  0.040887
Mobil14    0.476652  0.171167  0.070976
Mobil15   -0.165495 -0.042412  0.151499
Mobil16    0.459362  0.066581  0.077292
Mobil17    0.431262  0.098147  0.208418
Mobil18    0.333607  0.214557  0.060210
Mobil19   -0.235192 -0.035251  0.368719
Mobil20    0.025723 -0.060406  0.412330
Mobil21    0.147848  0.092530  0.295888
Mobil22    0.178539  0.182377  0.078155
Mobil23   -0.070000 -0.026870  0.243105
Mobil24   -0.310479  0.065048  0.299017
Mobil25    0.193832  0.355736  0.188833
Mobil26    0.045875  0.082315  0.417643
Mobil27    0.083807  0.096861  0.368456
ResidCh01 -0.021483  0.565421 -0.028667
ResidCh02 -0.345073  0.325007  0.170992
ResidCh03  0.151949  0.141772  0.103544
ResidCh04  0.026371  0.413732  0.115635
ResidCh05 -0.088138  0.606071 -0.058620
ResidCh06 -0.024728  0.441251 -0.100628
ResidCh07  0.117831 -0.375275  0.263202
LifSty01  -0.111344  0.116332  0.148828
LifSty02   0.182615  0.224647  0.108381
LifSty03  -0.015983  0.033632  0.223094
LifSty04   0.061664  0.064185  0.161211
LifSty05   0.017626  0.193170  0.276527
LifSty06   0.046335  0.310471  0.258672
LifSty07   0.157569  0.447161  0.108234
LifSty08  -0.066643 -0.151222  0.250315
LifSty09   0.105715  0.302478  0.283061
LifSty10   0.122540  0.403303  0.160987
LifSty11   0.084352  0.244044  0.185830
LifSty12  -0.231626  0.178191  0.071661
LifSty13  -0.188736  0.014129  0.063625
LifSty14   0.035538 -0.002001  0.181145
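With an orthogonal rotation such as varimax, the sum of the squared loadings of an indicator is its communality, that is, the share of its variance accounted for by the common factors. The following sketch computes it for a few rows copied from the table above. The choice of rows is arbitrary, and the variable names are introduced only for the illustration.

```python
import pandas as pd

# A few rows copied from the loadings table above.
loadings = pd.DataFrame(
    [
        [-0.564729, -0.002045, 0.111001],
        [0.025723, -0.060406, 0.412330],
        [-0.088138, 0.606071, -0.058620],
        [0.035538, -0.002001, 0.181145],
    ],
    index=['Envir01', 'Mobil20', 'ResidCh05', 'LifSty14'],
)

# Communality: sum of the squared loadings across the factors.
communalities = (loadings**2).sum(axis=1)
print(communalities.round(3))
```

An indicator with a very low communality, such as LifSty14 here, is barely explained by any of the three factors.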

When we print the results, we keep only the loadings that are strictly greater than 0.4 in absolute value.

labeled_results = labeled_results.astype('object')
labeled_results[(labeled_results <= 0.4) & (labeled_results >= -0.4)] = ''
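An alternative, sketched below, keeps the table numeric by replacing the small loadings with NaN instead of empty strings, which makes it possible to count the retained loadings per factor. The rows are copied from the loadings table, and the names `threshold`, `salient` and `counts` are introduced only for this illustration.

```python
import pandas as pd

# A few rows copied from the loadings table.
loadings = pd.DataFrame(
    [
        [-0.564729, -0.002045, 0.111001],
        [-0.407355, 0.047089, 0.105001],
        [0.339893, 0.164578, 0.027584],
        [0.025723, -0.060406, 0.412330],
        [-0.088138, 0.606071, -0.058620],
    ],
    index=['Envir01', 'Envir02', 'Envir04', 'Mobil20', 'ResidCh05'],
)

threshold = 0.4

# Entries that fail the condition become NaN; the frame stays numeric.
salient = loadings.where(loadings.abs() > threshold)

# Number of retained loadings for each factor.
counts = salient.notna().sum()

# Blank out NaN only at display time.
print(salient.to_string(na_rep=''))
print(counts)
```

The empty-string version used in the script is convenient for display, while the NaN version is preferable if the filtered loadings are processed further.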

Results.

display(labeled_results)
                  0         1         2
Envir01   -0.564729
Envir02   -0.407355
Envir03    0.413744
Envir04
Envir05
Envir06
Mobil01
Mobil02
Mobil03
Mobil04
Mobil05
Mobil06
Mobil07
Mobil08
Mobil09
Mobil10
Mobil11    0.482423
Mobil12
Mobil13
Mobil14    0.476652
Mobil15
Mobil16    0.459362
Mobil17    0.431262
Mobil18
Mobil19
Mobil20                         0.41233
Mobil21
Mobil22
Mobil23
Mobil24
Mobil25
Mobil26                        0.417643
Mobil27
ResidCh01            0.565421
ResidCh02
ResidCh03
ResidCh04            0.413732
ResidCh05            0.606071
ResidCh06            0.441251
ResidCh07
LifSty01
LifSty02
LifSty03
LifSty04
LifSty05
LifSty06
LifSty07             0.447161
LifSty08
LifSty09
LifSty10             0.403303
LifSty11
LifSty12
LifSty13
LifSty14
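To help read the filtered table, the sketch below groups the retained loadings by factor and ranks the indicators by absolute loading. The values are copied from the table above. The dictionary layout and the names `retained` and `ranked` are assumptions made for the illustration.

```python
# Retained loadings copied from the filtered table above, grouped by factor.
retained = {
    0: {
        'Envir01': -0.564729,
        'Envir02': -0.407355,
        'Envir03': 0.413744,
        'Mobil11': 0.482423,
        'Mobil14': 0.476652,
        'Mobil16': 0.459362,
        'Mobil17': 0.431262,
    },
    1: {
        'ResidCh01': 0.565421,
        'ResidCh04': 0.413732,
        'ResidCh05': 0.606071,
        'ResidCh06': 0.441251,
        'LifSty07': 0.447161,
        'LifSty10': 0.403303,
    },
    2: {'Mobil20': 0.412330, 'Mobil26': 0.417643},
}

# For each factor, sort the indicators by decreasing absolute loading.
ranked = {
    factor: sorted(items, key=lambda name: abs(items[name]), reverse=True)
    for factor, items in retained.items()
}

for factor, names in ranked.items():
    print(factor, names)
```

The signs remain informative: on the first factor, Envir01 and Envir02 load negatively while Envir03 and the Mobil indicators load positively, so they move in opposite directions along that factor.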

