Capitolo 34 Covariate indipendenti dal tempo

Partendo dai modelli di crescita lineare descritti nei capitoli precedenti, ci chiediamo se le differenze individuali nelle traiettorie di cambiamento siano associate ad altre variabili. Questo capitolo introduce modelli che esaminano se la variabilità nell’intercetta e nella pendenza del modello di crescita lineare può essere spiegata da covariate invarianti nel tempo. Queste sono variabili a livello di persona che non cambiano nel tempo, come il genere, le condizioni sperimentali e lo stato socio-economico. Nel contesto dei modelli a crescita latente, tali variabili vengono considerate come variabili indipendenti in un modello di regressione multipla in cui l’intercetta e la pendenza del modello di crescita lineare sono le variabili dipendenti. Le covariate invarianti nel tempo possono essere continue, ordinali o categoriche.

Studiando come le differenze individuali nell’intercetta e nella pendenza sono correlate ad altre variabili a livello della persona, possiamo comprendere le possibili ragioni per cui gli individui cambiano in modi diversi. Tuttavia, i risultati di questi modelli non forniscono ulteriori inferenze sulla causalità rispetto ai tipici modelli di regressione. Gli utenti dovrebbero essere cauti quando discutono tali effetti e tenere presente se la covariata invariante nel tempo è stata assegnata in maniera casuale nel contesto di un protocollo sperimentale, oppure se i dati provengono da ricerche osservazionali.

Per questo esempio considereremo i dati di prestazione matematica dal data set NLSY-CYA Long Data (si veda Grimm, Ram, and Estabrook 2016). Iniziamo a leggere i dati.

# set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_long_R.dat"
# read in the text data file using the url() function
dat <- read.table(
  file = url(filepath),
  na.strings = "."
) # indicates the missing data designator
# copy data with new name
nlsy_math_long <- dat

# Add names the columns of the data set
names(nlsy_math_long) <- c(
  "id", "female", "lb_wght",
  "anti_k1", "math", "grade",
  "occ", "age", "men",
  "spring", "anti"
)

# view the first few observations in the data set
head(nlsy_math_long, 10)
#>      id female lb_wght anti_k1 math grade occ age men spring anti
#> 1   201      1       0       0   38     3   2 111   0      1    0
#> 2   201      1       0       0   55     5   3 135   1      1    0
#> 3   303      1       0       1   26     2   2 121   0      1    2
#> 4   303      1       0       1   33     5   3 145   0      1    2
#> 5  2702      0       0       0   56     2   2 100  NA      1    0
#> 6  2702      0       0       0   58     4   3 125  NA      1    2
#> 7  2702      0       0       0   80     8   4 173  NA      1    2
#> 8  4303      1       0       0   41     3   2 115   0      0    1
#> 9  4303      1       0       0   58     4   3 135   0      1    2
#> 10 5002      0       0       4   46     4   2 117  NA      1    4
# intraindividual change trajetories
ggplot(
  data = nlsy_math_long, # data set
  aes(x = grade, y = math, group = id)
) + # setting variables
  geom_point(size = .5) + # adding points to plot
  geom_line(aes(linetype = factor(lb_wght), alpha = anti_k1)) + # adding lines to plot
  theme_bw() + # changing style/background
  # setting the x-axis with breaks and labels
  scale_x_continuous(
    limits = c(2, 8),
    breaks = c(2, 3, 4, 5, 6, 7, 8),
    name = "Grade at Testing"
  ) +
  # setting the y-axis with limits breaks and labels
  scale_y_continuous(
    limits = c(10, 90),
    breaks = c(10, 30, 50, 70, 90),
    name = "PIAT Mathematics"
  )

Per semplicità, leggiamo gli stessi dati in formato wide da un file.

# set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_wide_R.dat"
# read in the text data file using the url() function
dat <- read.table(
  file = url(filepath),
  na.strings = "."
) # indicates the missing data designator
# copy data with new name
nlsy_math_wide <- dat

# Give the variable names
names(nlsy_math_wide) <- c(
  "id", "female", "lb_wght", "anti_k1",
  "math2", "math3", "math4", "math5", "math6", "math7", "math8",
  "age2", "age3", "age4", "age5", "age6", "age7", "age8",
  "men2", "men3", "men4", "men5", "men6", "men7", "men8",
  "spring2", "spring3", "spring4", "spring5", "spring6", "spring7", "spring8",
  "anti2", "anti3", "anti4", "anti5", "anti6", "anti7", "anti8"
)


# view the first few observations (and columns) in the data set
head(nlsy_math_wide[, 1:11], 10)
#>      id female lb_wght anti_k1 math2 math3 math4 math5 math6 math7 math8
#> 1   201      1       0       0    NA    38    NA    55    NA    NA    NA
#> 2   303      1       0       1    26    NA    NA    33    NA    NA    NA
#> 3  2702      0       0       0    56    NA    58    NA    NA    NA    80
#> 4  4303      1       0       0    NA    41    58    NA    NA    NA    NA
#> 5  5002      0       0       4    NA    NA    46    NA    54    NA    66
#> 6  5005      1       0       0    35    NA    50    NA    60    NA    59
#> 7  5701      0       0       2    NA    62    61    NA    NA    NA    NA
#> 8  6102      0       0       0    NA    NA    55    67    NA    81    NA
#> 9  6801      1       0       0    NA    54    NA    62    NA    66    NA
#> 10 6802      0       0       0    NA    55    NA    66    NA    68    NA

Specifichiamo il modello SEM.

# writing out linear growth model with tic in full SEM way
lg_math_tic_lavaan_model <- "
    #latent variable definitions
            #intercept
              eta1 =~ 1*math2+
                      1*math3+
                      1*math4+
                      1*math5+
                      1*math6+
                      1*math7+
                      1*math8
            #linear slope
              eta2 =~ 0*math2+
                      1*math3+
                      2*math4+
                      3*math5+
                      4*math6+
                      5*math7+
                      6*math8

          #factor variances
            eta1 ~~ eta1
            eta2 ~~ eta2

          #factor covariance
            eta1 ~~ eta2

          #manifest variances (set equal by naming theta)
            math2 ~~ theta*math2
            math3 ~~ theta*math3
            math4 ~~ theta*math4
            math5 ~~ theta*math5
            math6 ~~ theta*math6
            math7 ~~ theta*math7
            math8 ~~ theta*math8

          #latent means (freely estimated)
            eta1 ~ 1
            eta2 ~ 1

          #manifest means (fixed to zero)
            math2 ~ 0*1
            math3 ~ 0*1
            math4 ~ 0*1
            math5 ~ 0*1
            math6 ~ 0*1
            math7 ~ 0*1
            math8 ~ 0*1

        #Time invariant covaraite
          #regression of time-invariant covariate on intercept and slope factors
            eta1~lb_wght+anti_k1
            eta2~lb_wght+anti_k1

        #variance of TIV covariates
            lb_wght ~~ lb_wght
            anti_k1 ~~ anti_k1

        #covariance of TIV covaraites
            lb_wght ~~ anti_k1

        #means of TIV covariates (freely estimated)
            lb_wght ~ 1
            anti_k1 ~ 1
" # end of model definition

Adattiamo il modello ai dati.

lg_math_tic_lavaan_fit <- sem(lg_math_tic_lavaan_model,
  data = nlsy_math_wide,
  meanstructure = TRUE,
  estimator = "ML",
  missing = "fiml"
)

Esaminiamo la soluzione ottenuta.

summary(lg_math_tic_lavaan_fit, fit.measures = TRUE)
#> lavaan 0.6.15 ended normally after 117 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        21
#>   Number of equality constraints                     6
#> 
#>   Number of observations                           933
#>   Number of missing patterns                        61
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                               220.221
#>   Degrees of freedom                                39
#>   P-value (Chi-square)                           0.000
#> 
#> Model Test Baseline Model:
#> 
#>   Test statistic                               892.616
#>   Degrees of freedom                                36
#>   P-value                                        0.000
#> 
#> User Model versus Baseline Model:
#> 
#>   Comparative Fit Index (CFI)                    0.788
#>   Tucker-Lewis Index (TLI)                       0.805
#>                                                       
#>   Robust Comparative Fit Index (CFI)             0.920
#>   Robust Tucker-Lewis Index (TLI)                0.926
#> 
#> Loglikelihood and Information Criteria:
#> 
#>   Loglikelihood user model (H0)              -9785.085
#>   Loglikelihood unrestricted model (H1)      -9674.975
#>                                                       
#>   Akaike (AIC)                               19600.171
#>   Bayesian (BIC)                             19672.747
#>   Sample-size adjusted Bayesian (SABIC)      19625.108
#> 
#> Root Mean Square Error of Approximation:
#> 
#>   RMSEA                                          0.071
#>   90 Percent confidence interval - lower         0.062
#>   90 Percent confidence interval - upper         0.080
#>   P-value H_0: RMSEA <= 0.050                    0.000
#>   P-value H_0: RMSEA >= 0.080                    0.046
#>                                                       
#>   Robust RMSEA                                   0.100
#>   90 Percent confidence interval - lower         0.000
#>   90 Percent confidence interval - upper         0.183
#>   P-value H_0: Robust RMSEA <= 0.050             0.218
#>   P-value H_0: Robust RMSEA >= 0.080             0.654
#> 
#> Standardized Root Mean Square Residual:
#> 
#>   SRMR                                           0.097
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Observed
#>   Observed information based on                Hessian
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   eta1 =~                                             
#>     math2             1.000                           
#>     math3             1.000                           
#>     math4             1.000                           
#>     math5             1.000                           
#>     math6             1.000                           
#>     math7             1.000                           
#>     math8             1.000                           
#>   eta2 =~                                             
#>     math2             0.000                           
#>     math3             1.000                           
#>     math4             2.000                           
#>     math5             3.000                           
#>     math6             4.000                           
#>     math7             5.000                           
#>     math8             6.000                           
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   eta1 ~                                              
#>     lb_wght          -2.716    1.294   -2.099    0.036
#>     anti_k1          -0.551    0.232   -2.369    0.018
#>   eta2 ~                                              
#>     lb_wght           0.625    0.333    1.873    0.061
#>     anti_k1          -0.019    0.059   -0.327    0.743
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>  .eta1 ~~                                             
#>    .eta2             -0.078    1.145   -0.068    0.945
#>   lb_wght ~~                                          
#>     anti_k1           0.007    0.014    0.548    0.584
#> 
#> Intercepts:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .eta1             36.290    0.497   73.052    0.000
#>    .eta2              4.315    0.122   35.420    0.000
#>    .math2             0.000                           
#>    .math3             0.000                           
#>    .math4             0.000                           
#>    .math5             0.000                           
#>    .math6             0.000                           
#>    .math7             0.000                           
#>    .math8             0.000                           
#>     lb_wght           0.080    0.009    9.031    0.000
#>     anti_k1           1.454    0.050   29.216    0.000
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .eta1             63.064    5.609   11.242    0.000
#>    .eta2              0.713    0.326    2.185    0.029
#>    .math2   (thet)   36.257    1.868   19.406    0.000
#>    .math3   (thet)   36.257    1.868   19.406    0.000
#>    .math4   (thet)   36.257    1.868   19.406    0.000
#>    .math5   (thet)   36.257    1.868   19.406    0.000
#>    .math6   (thet)   36.257    1.868   19.406    0.000
#>    .math7   (thet)   36.257    1.868   19.406    0.000
#>    .math8   (thet)   36.257    1.868   19.406    0.000
#>     lb_wght           0.074    0.003   21.599    0.000
#>     anti_k1           2.312    0.107   21.599    0.000

Creiamo un diagramma di percorso.

semPaths(lg_math_tic_lavaan_fit, what = "path", whatLabels = "par")

34.0.1 Valutare il contributo delle covariate

Per valutare se le covariate considerate forniscono un contributo sostanziale al modello confronteremo il modello precedente con un modello vincolato in cui l’effetto delle covariate viene fissato come uguale a zero.

# writing out linear growth model with tic in full SEM way
lg_math_ticZERO_lavaan_model <- "
    #latent variable definitions
            #intercept
              eta1 =~ 1*math2+
                      1*math3+
                      1*math4+
                      1*math5+
                      1*math6+
                      1*math7+
                      1*math8
            #linear slope
              eta2 =~ 0*math2+
                      1*math3+
                      2*math4+
                      3*math5+
                      4*math6+
                      5*math7+
                      6*math8

          #factor variances
            eta1 ~~ eta1
            eta2 ~~ eta2

          #factor covariance
            eta1 ~~ eta2

          #manifest variances (set equal by naming theta)
            math2 ~~ theta*math2
            math3 ~~ theta*math3
            math4 ~~ theta*math4
            math5 ~~ theta*math5
            math6 ~~ theta*math6
            math7 ~~ theta*math7
            math8 ~~ theta*math8

          #latent means (freely estimated)
            eta1 ~ 1
            eta2 ~ 1

          #manifest means (fixed to zero)
            math2 ~ 0*1
            math3 ~ 0*1
            math4 ~ 0*1
            math5 ~ 0*1
            math6 ~ 0*1
            math7 ~ 0*1
            math8 ~ 0*1

        #Time invariant covaraite
          #regression of time-invariant covariate on intercept and slope factors
          #FIXED to 0
            eta1~0*lb_wght+0*anti_k1
            eta2~0*lb_wght+0*anti_k1

        #variance of TIV covariates
            lb_wght ~~ lb_wght
            anti_k1 ~~ anti_k1

        #covariance of TIV covaraites
            lb_wght ~~ anti_k1

        #means of TIV covariates (freely estimated)
            lb_wght ~ 1
            anti_k1 ~ 1
" # end of model definition

Adattiamo il modello ai dati.

lg_math_ticZERO_lavaan_fit <- sem(lg_math_ticZERO_lavaan_model,
  data = nlsy_math_wide,
  meanstructure = TRUE,
  estimator = "ML",
  missing = "fiml"
)

Esaminiamo il risultato ottenuto.

summary(lg_math_ticZERO_lavaan_fit, fit.measures = TRUE)
#> lavaan 0.6.15 ended normally after 102 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        17
#>   Number of equality constraints                     6
#> 
#>   Number of observations                           933
#>   Number of missing patterns                        61
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                               234.467
#>   Degrees of freedom                                43
#>   P-value (Chi-square)                           0.000
#> 
#> Model Test Baseline Model:
#> 
#>   Test statistic                               892.616
#>   Degrees of freedom                                36
#>   P-value                                        0.000
#> 
#> User Model versus Baseline Model:
#> 
#>   Comparative Fit Index (CFI)                    0.776
#>   Tucker-Lewis Index (TLI)                       0.813
#>                                                       
#>   Robust Comparative Fit Index (CFI)             0.917
#>   Robust Tucker-Lewis Index (TLI)                0.931
#> 
#> Loglikelihood and Information Criteria:
#> 
#>   Loglikelihood user model (H0)              -9792.208
#>   Loglikelihood unrestricted model (H1)      -9674.975
#>                                                       
#>   Akaike (AIC)                               19606.416
#>   Bayesian (BIC)                             19659.638
#>   Sample-size adjusted Bayesian (SABIC)      19624.703
#> 
#> Root Mean Square Error of Approximation:
#> 
#>   RMSEA                                          0.069
#>   90 Percent confidence interval - lower         0.061
#>   90 Percent confidence interval - upper         0.078
#>   P-value H_0: RMSEA <= 0.050                    0.000
#>   P-value H_0: RMSEA >= 0.080                    0.020
#>                                                       
#>   Robust RMSEA                                   0.097
#>   90 Percent confidence interval - lower         0.000
#>   90 Percent confidence interval - upper         0.173
#>   P-value H_0: Robust RMSEA <= 0.050             0.208
#>   P-value H_0: Robust RMSEA >= 0.080             0.646
#> 
#> Standardized Root Mean Square Residual:
#> 
#>   SRMR                                           0.103
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Observed
#>   Observed information based on                Hessian
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   eta1 =~                                             
#>     math2             1.000                           
#>     math3             1.000                           
#>     math4             1.000                           
#>     math5             1.000                           
#>     math6             1.000                           
#>     math7             1.000                           
#>     math8             1.000                           
#>   eta2 =~                                             
#>     math2             0.000                           
#>     math3             1.000                           
#>     math4             2.000                           
#>     math5             3.000                           
#>     math6             4.000                           
#>     math7             5.000                           
#>     math8             6.000                           
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   eta1 ~                                              
#>     lb_wght           0.000                           
#>     anti_k1           0.000                           
#>   eta2 ~                                              
#>     lb_wght           0.000                           
#>     anti_k1           0.000                           
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>  .eta1 ~~                                             
#>    .eta2             -0.181    1.150   -0.158    0.875
#>   lb_wght ~~                                          
#>     anti_k1           0.007    0.014    0.548    0.584
#> 
#> Intercepts:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .eta1             35.267    0.355   99.229    0.000
#>    .eta2              4.339    0.088   49.136    0.000
#>    .math2             0.000                           
#>    .math3             0.000                           
#>    .math4             0.000                           
#>    .math5             0.000                           
#>    .math6             0.000                           
#>    .math7             0.000                           
#>    .math8             0.000                           
#>     lb_wght           0.080    0.009    9.031    0.000
#>     anti_k1           1.454    0.050   29.216    0.000
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .eta1             64.562    5.659   11.408    0.000
#>    .eta2              0.733    0.327    2.238    0.025
#>    .math2   (thet)   36.230    1.867   19.410    0.000
#>    .math3   (thet)   36.230    1.867   19.410    0.000
#>    .math4   (thet)   36.230    1.867   19.410    0.000
#>    .math5   (thet)   36.230    1.867   19.410    0.000
#>    .math6   (thet)   36.230    1.867   19.410    0.000
#>    .math7   (thet)   36.230    1.867   19.410    0.000
#>    .math8   (thet)   36.230    1.867   19.410    0.000
#>     lb_wght           0.074    0.003   21.599    0.000
#>     anti_k1           2.312    0.107   21.599    0.000

Generiamo il diagramma di percorso.

semPaths(lg_math_ticZERO_lavaan_fit, what = "path", whatLabels = "par")

Eseguiamo il confronto tra i due modelli mediante il test del rapporto tra verosimiglianze.

anova(lg_math_tic_lavaan_fit, lg_math_ticZERO_lavaan_fit)
#> 
#> Chi-Squared Difference Test
#> 
#>                            Df   AIC   BIC  Chisq Chisq diff    RMSEA Df diff
#> lg_math_tic_lavaan_fit     39 19600 19673 220.22                            
#> lg_math_ticZERO_lavaan_fit 43 19606 19660 234.47     14.245 0.052395       4
#>                            Pr(>Chisq)   
#> lg_math_tic_lavaan_fit                  
#> lg_math_ticZERO_lavaan_fit   0.006552 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Il valore-p associato alla statistica chi-quadrato relativa alla differenza tra i due modelli è piccolo. Questo significa che i vincoli aggiuntivi hanno impattato in maniera sostanziale la bontà di adattamento del modello ai dati. Da ciò concludiamo che le covariate forniscono un contributo utile al modello.

References

Grimm, Kevin J, Nilam Ram, and Ryne Estabrook. 2016. Growth Modeling: Structural Equation and Multilevel Modeling Approaches. Guilford Publications.