16.3 EFA with lavaan | Appunti di Costruzione e validazione di strumenti di misura dell’efficacia dell’intervento psicologico in neuropsicologia

16.3 EFA with `lavaan`

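A still-experimental feature of lavaan makes it possible to run an exploratory factor analysis directly in lavaan. In the model syntax, the `efa("...")` modifier marks a block of factors on which every indicator loads freely and whose solution is then rotated. The model is fitted with `cfa()` together with the `rotation` argument (recent versions of lavaan also provide a standalone `efa()` function). Let us return to the data of Brown (2015): eight personality measures (N1-N4 and E1-E4) collected on a sample of 250 patients who had completed a psychotherapy program.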

We define a model with a single common factor.

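library(lavaan)

# 1-factor model: the efa("efa") modifier declares an EFA block,
# so all loadings are freely estimated
f1 <- '
  efa("efa")*f1 =~ N1 + N2 + N3 + N4 + E1 + E2 + E3 + E4
'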

We define a model with two common factors.

# 2-factor model: f1 and f2 belong to the same EFA block "efa",
# so all eight indicators load on both factors
f2 <- '
  efa("efa")*f1 +
  efa("efa")*f2 =~ N1 + N2 + N3 + N4 + E1 + E2 + E3 + E4
'
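The degrees of freedom lavaan reports below (20 and 13) and the number of free parameters (16 and 23) can be anticipated by simple counting. The following base-R sketch assumes an oblique EFA fitted to a correlation matrix without a mean structure; the helper functions `n_moments()`, `efa_npar()` and `efa_df()` are hypothetical, written only for this illustration.

```r
# Number of distinct variances and covariances in a p x p correlation matrix
# (no mean structure, as when fitting via sample.cov)
n_moments <- function(p) p * (p + 1) / 2

# Free parameters of an oblique EFA model with p indicators and m factors:
# p * m loadings + p unique variances + m(m - 1)/2 factor correlations,
# minus the m(m - 1) constraints that fix the rotation
efa_npar <- function(p, m) p * m + p + m * (m - 1) / 2 - m * (m - 1)

# Degrees of freedom = moments - free parameters
efa_df <- function(p, m) n_moments(p) - efa_npar(p, m)

p <- 8                           # N1-N4 and E1-E4
baseline_df <- n_moments(p) - p  # independence model: only the p variances are free

data.frame(factors = 1:2, npar = efa_npar(p, 1:2), df = efa_df(p, 1:2))
baseline_df
```

The same count gives the familiar closed form ((p - m)^2 - (p + m)) / 2 for the degrees of freedom of an m-factor EFA.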

We fit the one-factor model to the data.

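# psychot_cor_mat (correlation matrix of Brown's data) is assumed to exist already
efa_f1 <-
  cfa(
    model = f1,
    sample.cov = psychot_cor_mat,  # input correlation matrix
    sample.nobs = 250,             # sample size
    rotation = "oblimin"           # oblique rotation (no effect with a single factor)
  )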

Let us examine the resulting solution.

summary(
  efa_f1,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE
)
#> lavaan 0.6.15 ended normally after 2 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        16
#> 
#>   Rotation method                      OBLIMIN OBLIQUE
#>   Oblimin gamma                                      0
#>   Rotation algorithm (rstarts)                GPA (30)
#>   Standardized metric                             TRUE
#>   Row weights                                     None
#> 
#>   Number of observations                           250
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                               375.327
#>   Degrees of freedom                                20
#>   P-value (Chi-square)                           0.000
#> 
#> Model Test Baseline Model:
#> 
#>   Test statistic                              1253.791
#>   Degrees of freedom                                28
#>   P-value                                        0.000
#> 
#> User Model versus Baseline Model:
#> 
#>   Comparative Fit Index (CFI)                    0.710
#>   Tucker-Lewis Index (TLI)                       0.594
#> 
#> Loglikelihood and Information Criteria:
#> 
#>   Loglikelihood user model (H0)              -2394.637
#>   Loglikelihood unrestricted model (H1)      -2206.974
#>                                                       
#>   Akaike (AIC)                                4821.275
#>   Bayesian (BIC)                              4877.618
#>   Sample-size adjusted Bayesian (SABIC)       4826.897
#> 
#> Root Mean Square Error of Approximation:
#> 
#>   RMSEA                                          0.267
#>   90 Percent confidence interval - lower         0.243
#>   90 Percent confidence interval - upper         0.291
#>   P-value H_0: RMSEA <= 0.050                    0.000
#>   P-value H_0: RMSEA >= 0.080                    1.000
#> 
#> Standardized Root Mean Square Residual:
#> 
#>   SRMR                                           0.187
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   f1 =~ efa                                                             
#>     N1                0.879    0.051   17.333    0.000    0.879    0.880
#>     N2                0.841    0.052   16.154    0.000    0.841    0.842
#>     N3                0.841    0.052   16.175    0.000    0.841    0.843
#>     N4                0.870    0.051   17.065    0.000    0.870    0.872
#>     E1               -0.438    0.062   -7.041    0.000   -0.438   -0.439
#>     E2               -0.398    0.063   -6.327    0.000   -0.398   -0.398
#>     E3               -0.398    0.063   -6.342    0.000   -0.398   -0.399
#>     E4               -0.364    0.063   -5.746    0.000   -0.364   -0.364
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .N1                0.224    0.028    7.915    0.000    0.224    0.225
#>    .N2                0.289    0.033    8.880    0.000    0.289    0.290
#>    .N3                0.288    0.032    8.866    0.000    0.288    0.289
#>    .N4                0.239    0.029    8.174    0.000    0.239    0.240
#>    .E1                0.804    0.073   10.963    0.000    0.804    0.807
#>    .E2                0.838    0.076   11.008    0.000    0.838    0.841
#>    .E3                0.837    0.076   11.007    0.000    0.837    0.841
#>    .E4                0.864    0.078   11.041    0.000    0.864    0.867
#>     f1                1.000                               1.000    1.000
#> 
#> R-Square:
#>                    Estimate
#>     N1                0.775
#>     N2                0.710
#>     N3                0.711
#>     N4                0.760
#>     E1                0.193
#>     E2                0.159
#>     E3                0.159
#>     E4                0.133
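In the one-factor solution each standardized loading fully determines the item's R-square (the squared loading) and its standardized residual variance (one minus the squared loading). The sketch below recomputes both from the rounded Std.all loadings copied from the output above, so the results may differ from lavaan's in the third decimal.

```r
# Standardized loadings (Std.all column) of the one-factor solution, copied from the output
lambda_1f <- c(N1 = 0.880, N2 = 0.842, N3 = 0.843, N4 = 0.872,
               E1 = -0.439, E2 = -0.398, E3 = -0.399, E4 = -0.364)

# One factor with variance 1: R-square (communality) is the squared loading...
r2_1f <- lambda_1f^2
# ...and the standardized residual variance (uniqueness) is its complement
uniq_1f <- 1 - r2_1f

round(cbind(R2 = r2_1f, uniqueness = uniq_1f), 3)
```

The E items have communalities of only about 0.13 to 0.19: a single factor essentially reproduces N1-N4 and leaves most of the variance of E1-E4 unexplained.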

We fit the two-factor model to the data.

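# Same data and options as before; only the model changes
efa_f2 <-
  cfa(
    model = f2,
    sample.cov = psychot_cor_mat,
    sample.nobs = 250,
    rotation = "oblimin"  # oblique rotation: the two factors may correlate
  )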

Let us examine the resulting solution.

summary(
  efa_f2,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE
)
#> lavaan 0.6.15 ended normally after 1 iteration
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        23
#> 
#>   Rotation method                      OBLIMIN OBLIQUE
#>   Oblimin gamma                                      0
#>   Rotation algorithm (rstarts)                GPA (30)
#>   Standardized metric                             TRUE
#>   Row weights                                     None
#> 
#>   Number of observations                           250
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                 9.811
#>   Degrees of freedom                                13
#>   P-value (Chi-square)                           0.709
#> 
#> Model Test Baseline Model:
#> 
#>   Test statistic                              1253.791
#>   Degrees of freedom                                28
#>   P-value                                        0.000
#> 
#> User Model versus Baseline Model:
#> 
#>   Comparative Fit Index (CFI)                    1.000
#>   Tucker-Lewis Index (TLI)                       1.006
#> 
#> Loglikelihood and Information Criteria:
#> 
#>   Loglikelihood user model (H0)              -2211.879
#>   Loglikelihood unrestricted model (H1)      -2206.974
#>                                                       
#>   Akaike (AIC)                                4469.758
#>   Bayesian (BIC)                              4550.752
#>   Sample-size adjusted Bayesian (SABIC)       4477.840
#> 
#> Root Mean Square Error of Approximation:
#> 
#>   RMSEA                                          0.000
#>   90 Percent confidence interval - lower         0.000
#>   90 Percent confidence interval - upper         0.048
#>   P-value H_0: RMSEA <= 0.050                    0.957
#>   P-value H_0: RMSEA >= 0.080                    0.001
#> 
#> Standardized Root Mean Square Residual:
#> 
#>   SRMR                                           0.010
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   f1 =~ efa                                                             
#>     N1                0.874    0.053   16.592    0.000    0.874    0.876
#>     N2                0.851    0.055   15.551    0.000    0.851    0.853
#>     N3                0.826    0.054   15.179    0.000    0.826    0.828
#>     N4                0.896    0.053   16.802    0.000    0.896    0.898
#>     E1               -0.046    0.040   -1.138    0.255   -0.046   -0.046
#>     E2                0.035    0.034    1.030    0.303    0.035    0.035
#>     E3                0.000    0.040    0.010    0.992    0.000    0.000
#>     E4               -0.006    0.049   -0.131    0.896   -0.006   -0.006
#>   f2 =~ efa                                                             
#>     N1               -0.017    0.032   -0.539    0.590   -0.017   -0.017
#>     N2                0.011    0.035    0.322    0.748    0.011    0.011
#>     N3               -0.035    0.036   -0.949    0.343   -0.035   -0.035
#>     N4                0.031    0.031    0.994    0.320    0.031    0.031
#>     E1                0.776    0.059   13.125    0.000    0.776    0.778
#>     E2                0.854    0.058   14.677    0.000    0.854    0.855
#>     E3                0.785    0.060   13.106    0.000    0.785    0.787
#>     E4                0.695    0.063   10.955    0.000    0.695    0.697
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   f1 ~~                                                                 
#>     f2               -0.432    0.059   -7.345    0.000   -0.432   -0.432
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .N1                0.218    0.028    7.790    0.000    0.218    0.219
#>    .N2                0.279    0.032    8.693    0.000    0.279    0.280
#>    .N3                0.287    0.032    8.907    0.000    0.287    0.289
#>    .N4                0.216    0.029    7.578    0.000    0.216    0.217
#>    .E1                0.361    0.044    8.226    0.000    0.361    0.362
#>    .E2                0.292    0.043    6.787    0.000    0.292    0.293
#>    .E3                0.379    0.046    8.315    0.000    0.379    0.381
#>    .E4                0.509    0.053    9.554    0.000    0.509    0.511
#>     f1                1.000                               1.000    1.000
#>     f2                1.000                               1.000    1.000
#> 
#> R-Square:
#>                    Estimate
#>     N1                0.781
#>     N2                0.720
#>     N3                0.711
#>     N4                0.783
#>     E1                0.638
#>     E2                0.707
#>     E3                0.619
#>     E4                0.489
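With correlated factors an item's R-square is no longer the sum of its squared loadings: the factor correlation phi adds the term 2 * lambda1 * lambda2 * phi. The sketch below recomputes the communalities from the rounded Std.all loadings and phi = -0.432 reported above; small discrepancies with lavaan's R-Square column are due to rounding.

```r
# Standardized pattern loadings (Std.all) of the two-factor solution, from the output above
Lambda <- matrix(
  c( 0.876, -0.017,
     0.853,  0.011,
     0.828, -0.035,
     0.898,  0.031,
    -0.046,  0.778,
     0.035,  0.855,
     0.000,  0.787,
    -0.006,  0.697),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("N1", "N2", "N3", "N4", "E1", "E2", "E3", "E4"), c("f1", "f2"))
)
phi <- -0.432  # correlation between f1 and f2

# Communality under an oblique two-factor model:
# h2 = l1^2 + l2^2 + 2 * l1 * l2 * phi
h2 <- Lambda[, "f1"]^2 + Lambda[, "f2"]^2 + 2 * Lambda[, "f1"] * Lambda[, "f2"] * phi

round(h2, 3)
```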

Although so far we have introduced only the chi-square goodness-of-fit statistic, here we also compute other fit measures that will be discussed later.

# define the fit measures to extract
# (standard ML statistics, despite the "robust" in the variable name)
fit_measures_robust <- c(
  "chisq", "df", "pvalue",
  "cfi", "rmsea", "srmr"
)
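Anticipating that discussion, CFI, TLI and RMSEA are simple functions of the chi-square statistics of the user and baseline models, so they can be recomputed from the summaries printed above. The base-R sketch below uses the standard formulas; writing RMSEA with N in the denominator is an assumption (some programs use N - 1, which here makes no difference at three decimals). SRMR is left out because it requires the residual correlations, which the summaries do not show.

```r
# Chi-square statistics reported by lavaan above
chisq_m <- c(one_factor = 375.327, two_factor = 9.811)
df_m    <- c(one_factor = 20,      two_factor = 13)
chisq_b <- 1253.791  # baseline (independence) model
df_b    <- 28
n       <- 250

# CFI: 1 - (noncentrality of the user model) / (noncentrality of the baseline model)
cfi <- 1 - pmax(chisq_m - df_m, 0) / pmax(chisq_b - df_b, chisq_m - df_m, 0)

# TLI: compares chi-square/df ratios; unlike CFI it is not capped at 1
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# RMSEA: noncentrality per degree of freedom, rescaled by the sample size
rmsea <- sqrt(pmax(chisq_m - df_m, 0) / (df_m * n))

round(cbind(cfi, tli, rmsea), 3)
```

The TLI of 1.006 for the two-factor model simply reflects a chi-square smaller than its degrees of freedom.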

We compare the fit measures of the model with a single common factor with those of the model with two common factors.

library(dplyr)

# collect them for each model
rbind(
  fitmeasures(efa_f1, fit_measures_robust),
  fitmeasures(efa_f2, fit_measures_robust)
) %>%
  # convert the 2 x 6 matrix to a data frame
  data.frame() %>%
  mutate(
    chisq = round(chisq, digits = 0),
    df = as.integer(df),
    # report very small p-values as "< .001", the others with three decimals
    pvalue = ifelse(pvalue < .001, "< .001", formatC(pvalue, format = "f", digits = 3))
  ) %>%
  mutate(across(cfi:srmr, ~ round(.x, digits = 3)))
#>   chisq df pvalue  cfi rmsea  srmr
#> 1   375 20 < .001 0.71 0.267 0.187
#> 2    10 13  0.709 1.00 0.000 0.010

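The empirical evidence clearly favors the two-factor model over the one-factor model. The one-factor model fits poorly (chi-square(20) = 375.3, p < .001; CFI = 0.71; RMSEA = 0.267; SRMR = 0.187), whereas the two-factor model reproduces the observed correlations almost perfectly (chi-square(13) = 9.8, p = .71; CFI = 1.00; RMSEA = 0.000; SRMR = 0.010). The rotated two-factor solution is also easy to interpret: N1-N4 load strongly on f1 (0.83 to 0.90), E1-E4 load strongly on f2 (0.70 to 0.86), all cross-loadings are below 0.05 in absolute value, and the two factors correlate at -0.43. The exploratory factor analysis carried out with the `efa()` modifier therefore describes the structure of the data adequately and separates the two hypothesized factors in a meaningful way.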
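The same comparison can be expressed in terms of likelihoods. The sketch below recomputes each model's chi-square from the log-likelihoods printed by `summary()`, together with AIC and BIC, and the chi-square difference between the two nested models (the one-factor model is the two-factor model with all f2 loadings fixed to zero). The p-value of the difference test should be read as indicative only, because comparing m and m + 1 exploratory factors does not strictly satisfy the regularity conditions of the likelihood-ratio test; here the difference is so large that the conclusion does not depend on it.

```r
# Log-likelihoods and numbers of free parameters reported by lavaan above
ll_h1 <- -2206.974                                   # unrestricted (saturated) model
ll_h0 <- c(one_factor = -2394.637, two_factor = -2211.879)
k     <- c(one_factor = 16,        two_factor = 23)
n     <- 250

# Each model's chi-square is its likelihood-ratio statistic against the saturated model
chisq <- 2 * (ll_h1 - ll_h0)

# Information criteria: smaller is better
aic <- -2 * ll_h0 + 2 * k
bic <- -2 * ll_h0 + k * log(n)

# Chi-square difference between the nested models
delta_chisq <- unname(chisq["one_factor"] - chisq["two_factor"])
delta_df    <- unname(k["two_factor"] - k["one_factor"])
p_delta     <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

round(cbind(chisq, aic, bic), 3)
c(delta_chisq = delta_chisq, delta_df = delta_df, p = p_delta)
```

With the fitted objects, `lavTestLRT(efa_f1, efa_f2)` should give the same chi-square difference without copying numbers by hand.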

Exercise 16.5 Use the data in dass21.txt, which correspond to the administration of the DASS-21 test to 334 participants. The coding scheme can be found by following this link. Fit a three-factor model to the data using exploratory factor analysis with the lavaan::efa() function. Using the factor loadings and the matrix of factor intercorrelations, find the correlation matrix reproduced by the model. Without using matrix algebra, find the predicted correlation between the indicators DASS-1 and DASS-2.
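As a sketch of the kind of computation the exercise asks for, here it is applied to the two-factor solution for the Brown (2015) data rather than to the DASS-21 data. In matrix form the model-implied correlations are Lambda Phi Lambda' off the diagonal, with ones on the diagonal (communality plus uniqueness); for a single pair of items the same quantity can be written without matrix algebra. The loadings are the rounded Std.all values from the output above, so the results are approximate.

```r
# Two-factor solution from above: standardized pattern loadings and factor correlation
Lambda <- matrix(
  c( 0.876, -0.017,
     0.853,  0.011,
     0.828, -0.035,
     0.898,  0.031,
    -0.046,  0.778,
     0.035,  0.855,
     0.000,  0.787,
    -0.006,  0.697),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("N1", "N2", "N3", "N4", "E1", "E2", "E3", "E4"), c("f1", "f2"))
)
phi <- -0.432
Phi <- matrix(c(1, phi, phi, 1), nrow = 2, dimnames = list(c("f1", "f2"), c("f1", "f2")))

# Matrix form: common part Lambda Phi Lambda'; adding the uniquenesses (1 - h2)
# on the diagonal amounts to setting the diagonal to 1
R_hat <- Lambda %*% Phi %*% t(Lambda)
diag(R_hat) <- 1

# Scalar form for one pair of items (here N1 and E1), no matrix algebra:
# r = a1*b1 + a2*b2 + (a1*b2 + a2*b1) * phi
a <- Lambda["N1", ]
b <- Lambda["E1", ]
r_N1_E1 <- unname(a[1] * b[1] + a[2] * b[2] + (a[1] * b[2] + a[2] * b[1]) * phi)

round(R_hat, 3)
r_N1_E1
```

With the fitted object, `fitted(efa_f2)$cov` should give a very similar matrix, although lavaan works with unrounded estimates and treats the input correlation matrix as a covariance matrix, so its diagonal may not be exactly 1.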

References

Brown, Timothy A. 2015. Confirmatory Factor Analysis for Applied Research. Guilford Publications.