December 1, 2023
We use the HSB dataset from the merTools package (Knowles & Frederick, 2024):
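A minimal sketch of how the data might be loaded (we assume the data frame is named dat, as in the code chunks below):
library(merTools)  # provides the hsb (High School and Beyond) data

data("hsb", package = "merTools")
dat <- hsb
head(dat)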
  schid minority female    ses mathach size schtype meanses
1  1224        0      1 -1.528   5.876  842       0  -0.428
2  1224        0      1 -0.588  19.708  842       0  -0.428
3  1224        0      0 -0.528  20.349  842       0  -0.428
4  1224        0      0 -0.668   8.781  842       0  -0.428
5  1224        0      0 -0.158  17.898  842       0  -0.428
6  1224        0      0  0.022   4.583  842       0  -0.428
The environment (often) shows natural hierarchical structures (e.g., students nested within classes/schools, time points nested within persons)
These hierarchical structures may (!) lead to dependencies in the data
Hierarchical structures as a research object/topic
Example: The relation between socio-economic status and achievement across all schools (actually a subsample)
dat |>
  (\(d) d[1:550,])() |>
  ggplot(aes(x = ses, y = mathach)) + 
    geom_point(shape=21,
              fill="white",
               color="black",
               size=2) + 
    geom_smooth(method='lm',
                formula=y~x,
                color = "blue",
                se = F,
                linewidth = 1.5) +
    geom_smooth(method='loess',
                formula=y~x,
                color = "red",
                se = F,
                linewidth = 1.5) +
    labs(x = "SES",
         y = "Math Achievemenet") +
    theme_classic() 
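The regression summary below was presumably produced by fitting a pooled simple regression to the same subsample; an equivalent sketch (the object name fit_pooled is illustrative):
# Pooled OLS regression of math achievement on SES for the first 550 rows;
# summary() of this fit is shown below
fit_pooled <- lm(mathach ~ ses, data = dat[1:550, ])
summary(fit_pooled)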
Call:
lm(formula = mathach ~ ses, data = (function(d) d[1:550, ])(dat))
Residuals:
     Min       1Q   Median       3Q      Max 
-18.0224  -4.5656   0.1607   4.8815  17.8936 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  12.3956     0.2806   44.17   <2e-16 ***
ses           4.1711     0.3551   11.74   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.58 on 548 degrees of freedom
Multiple R-squared:  0.2011,    Adjusted R-squared:  0.1996 
F-statistic: 137.9 on 1 and 548 DF,  p-value: < 2.2e-16
Example: The relation between socio-economic status and achievement in two selected schools
dat |>
  subset(subset = schid == 1224 | schid == 1462) |>
  ggplot(aes(x = ses, y = mathach)) +
    geom_point(shape=21,
               fill="white",
               color="black",
               size=2) +
    geom_smooth(method='lm',
                formula=y~x, color = "blue", se = F) +
    geom_smooth(method='loess',
                formula=y~x, color = "red", se = F) +
    facet_wrap(~schid, nrow = 1) +
    labs(x = "SES",
         y = "Math Achievement") +
    theme_classic()
Fitted school-specific regression lines:
\(E(Y \mid X) = 10.81 + 2.51X\)
\(E(Y \mid X) = 9.994 - 0.82X\)
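The two equations above come from separate per-school regressions; one way to reproduce such school-specific fits is lmList() from the lme4 package (a sketch; the object name fits_two is illustrative):
library(lme4)

# Separate OLS regression of mathach on ses within each of the two schools
fits_two <- lmList(mathach ~ ses | schid,
                   data = subset(dat, schid == 1224 | schid == 1462))
coef(fits_two)  # school-specific intercepts and slopes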
Example: The relation between socio-economic status and achievement estimated separately for J (here 14) schools
ggplot(dat[1:550,],
       aes(x = ses, y = mathach)) + 
  geom_point(shape=21,
             fill="white",
             color="black",
             size=2) +
  geom_smooth(method='lm',
              formula=y~x, 
              se = F, 
              color = "blue") +
  geom_smooth(method='loess',
              formula=y~x, 
              se = F,
              color = "red") +
  facet_wrap(~schid, nrow = 2) +
  labs(x = "SES",
       y = "Math Achievemenet") +
  theme_classic()One way to address the hierarchical data structure is to include indicator variables (i.e., dummy variables) for the clusters in the model → fixed effects approach
Using reference coding, we construct \((k-1)\) indicator variables \(I_2, \dots, I_k\) (see the small illustration after Equation 1)
Model equation (see Equation 1):
\[ Y = \beta_0 + \beta_1X + \beta_2I_2 + \beta_3I_3 + \dots + \beta_kI_k \qquad(1)\]
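As a small illustration of reference coding (the toy factor cl is made up for this example), model.matrix() shows how R builds the \(k-1\) indicator columns:
# Toy factor with k = 3 clusters; the first level ("A") is the reference category
cl <- factor(c("A", "A", "B", "C"))
model.matrix(~ cl)  # intercept column plus indicator columns clB (= I_2) and clC (= I_3)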
Example: The relation between socio-economic status \((X)\) and achievement \((Y)\) in J (160!) schools
Model equation (see Equation 2):
\[ Y = \beta_0 + \beta_1X + \sum\limits_{k=2}^{160} \beta_kI_k \qquad(2)\]
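Applied to the HSB data, the fixed effects approach amounts to adding the school ID as a factor to the regression; a minimal sketch (the object name fit_fe is illustrative):
# One indicator per school (reference coding via factor()), common SES slope
fit_fe <- lm(mathach ~ ses + factor(schid), data = dat)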
Instead of estimating a huge number of regression coefficients, it is possible to estimate only the distribution parameters (i.e., expected value and variance) of these coefficients across clusters
In doing so, there are additional assumptions:
Level 1 residuals within all clusters follow a (multivariate) normal distribution (when the outcome is continuous): \(r_{ij} \sim N(0,\sigma^2)\)
Level 2 residuals (i.e., random effects) follow a multivariate normal distribution
\[ \left( \begin{matrix}u_{0j} \\ u_{1j}\end{matrix}\right) \sim N\left(\begin{bmatrix} 0 \\ 0\end{bmatrix}, \begin{bmatrix} VAR(u_{0j}) & COV(u_{0j}, u_{1j}) \\ COV(u_{1j}, u_{0j}) & VAR(u_{1j})\end{bmatrix}\right) \]
The relationship within any school \(j\) can be described by the following equations
Level-specific equations: \[ \text{Level 1: } Y_{ij} = \beta_{0j} + \beta_{1j}X_{ij} + r_{ij} \qquad(3)\]
\[ \begin{aligned} \text{Level 2 (intercept): }& \beta_{0j} = \gamma_{00} + u_{0j} \end{aligned} \qquad(4)\]
\[ \begin{aligned} \text{Level 2 (slope): } &\beta_{1j} = \gamma_{01} + u_{1j} \end{aligned} \qquad(5)\]
Combined equation: Substituting Equation 4 and Equation 5 into Equation 3 yields the combined model equation (see Equation 6):
\[ Y_{ij} = \gamma_{00} + u_{0j} + (\gamma_{01} + u_{1j})X_{ij} + r_{ij} \qquad(6)\]
Random intercept: \(\beta_{0j} = \gamma_{00} + u_{0j}\)
Random slope: \(\beta_{1j} = \gamma_{01} + u_{1j}\)
Covariance between slopes and intercepts: \(COV(\beta_{0j}, \beta_{1j}) = \tau_{01}\)
Fixed effects: \(\gamma_{00}, \gamma_{01}\)
Random effects/residuals: \(u_{0j}, u_{1j}, r_{ij}\)
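In R, a random-intercept, random-slope model corresponding to Equation 6 can be fitted with lmer() from the lme4 package (on which merTools builds); a minimal sketch, assuming the data frame dat from above (the object name fit_ri_rs is illustrative):
library(lme4)

# Random intercept and random slope for ses, varying across schools
fit_ri_rs <- lmer(mathach ~ ses + (1 + ses | schid), data = dat)

fixef(fit_ri_rs)    # fixed effects: gamma_00 (intercept), gamma_01 (SES slope)
VarCorr(fit_ri_rs)  # VAR(u_0j), VAR(u_1j), COV(u_0j, u_1j) and sigma^2 of r_ij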
Random effects / level 2 residuals:
\[ \begin{aligned} & \beta_{0j} = \gamma_{00} + u_{0j} \quad \text{(rearrange equation)} \\ & u_{0j} = \beta_{0j} - \gamma_{00} \end{aligned} \]
→ deviation of the cluster-specific intercept (\(\beta_{0j}\)) from the average intercept across clusters (\(\gamma_{00}\))
\(\beta_{1j} = \gamma_{01} + u_{1j}\) (rearrange equation)
\(u_{1j} = \beta_{1j} - \gamma_{01}\) → deviation of the cluster-specific slope (\(\beta_{1j}\)) from the average slope across clusters (\(\gamma_{01}\))
Random effects / level 1 residual:
Recall Equation 6:
\(Y_{ij} = \gamma_{00} + u_{0j} + (\gamma_{01} + u_{1j})X_{ij} + r_{ij}\) (rearrange equation)
\(r_{ij} = Y_{ij} - \gamma_{00} - u_{0j} - \gamma_{01}X_{ij} - u_{1j}X_{ij}\) → deviation of the observed value (\(Y_{ij}\)) from the conditional expected value, given the predictor (\(X_{ij}\)) in cluster \(j\)
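If the model has been fitted with lme4 as in the sketch above, the estimated random effects and residuals can be inspected directly (the object name fit_ri_rs is carried over from that sketch):
ranef(fit_ri_rs)$schid  # predicted u_0j and u_1j (conditional modes) per school
coef(fit_ri_rs)$schid   # cluster-specific beta_0j and beta_1j (gamma + u)
resid(fit_ri_rs)        # level-1 residuals r_ij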