Linear mixed models

Emi Tanaka

Australian National University

2025-07-14

Linear mixed model

A general form for the linear mixed model is given by

\boldsymbol{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{b} + \boldsymbol{e},

where \boldsymbol{y} is the vector of n observations, \mathbf{X} and \mathbf{Z} are the design matrices of the fixed effects \boldsymbol{\beta} and the random effects \boldsymbol{b}, respectively, \boldsymbol{e} is the vector of errors, \text{var}(\boldsymbol{b}) = \mathbf{G} depends on parameters \boldsymbol{\sigma}_g, and \text{var}(\boldsymbol{e}) = \mathbf{R} depends on parameters \boldsymbol{\sigma}_r.

  • We let \boldsymbol{\sigma} = (\boldsymbol{\sigma}^\top_g, \boldsymbol{\sigma}^\top_r)^\top denote the complete vector of variance parameters.
  • We need to estimate the fixed effects \boldsymbol{\beta} and the variance parameters \boldsymbol{\sigma}.

Variance structures of random effects

  • The q\times 1 vector of random effects is often composed of {n_b} sub-vectors \boldsymbol{b} = (\boldsymbol{b}_1^\top,\boldsymbol{b}_2^\top,\dots,\boldsymbol{b}_{n_b}^\top)^\top where the sub-vectors \boldsymbol{b}_i are of length q_i and \sum_{i=1}^{n_b}q_i = q.
  • We let \text{var}(\boldsymbol{b}_i) = \mathbf{G}_i. Assuming the \boldsymbol{b}_i are mutually independent, then

\mathbf{G} = \oplus^{n_b}_{i=1} \mathbf{G}_i = \begin{bmatrix} \mathbf{G}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{G}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} &\cdots & \mathbf{G}_{n_b} \\ \end{bmatrix} .

  • \mathbf{Z} = \begin{bmatrix} \mathbf{Z}_1 & \mathbf{Z}_2 & \cdots & \mathbf{Z}_{n_b} \\ \end{bmatrix} and \mathbf{Z}\boldsymbol{b} = \mathbf{Z}_1\boldsymbol{b}_1 + \cdots + \mathbf{Z}_{n_b}\boldsymbol{b}_{n_b} = \sum_{i=1}^{n_b}\mathbf{Z}_i\boldsymbol{b}_i.

Variance structures of errors

  • The classical assumption is that the errors are independent and identically distributed (i.i.d.), so \mathbf{R} = \sigma^2 \mathbf{I}_n.
  • In some situations, \boldsymbol{e} will be sub-divided into sections, e.g. multi-clinic trials or multi-environment variety trials.
  • We let \boldsymbol{e} = (\boldsymbol{e}_1^\top, \boldsymbol{e}_2^\top, \dots, \boldsymbol{e}_{n_e}^\top)^\top so that \boldsymbol{e}_j represents the vector of errors of the jth section of the data.

\mathbf{R} = \oplus_{j=1}^{n_e} \mathbf{R}_j = \begin{bmatrix} \mathbf{R}_1 & \mathbf{0} & \dots & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_2 & \dots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \dots & \mathbf{R}_{n_e} \end{bmatrix}

where \oplus is the direct sum operator.
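
The direct sum simply stacks the component matrices along the diagonal. A minimal sketch in R (using the Matrix package, with made-up component matrices that are not from the examples below):

library(Matrix)
# hypothetical component variance matrices G_1 (2 x 2) and G_2 (3 x 3)
G1 <- 0.5 * diag(2)
G2 <- matrix(c(1.0, 0.3, 0.3,
               0.3, 1.0, 0.3,
               0.3, 0.3, 1.0), nrow = 3)
# direct sum: block-diagonal matrix with G_1 and G_2 on the diagonal, zeros elsewhere
bdiag(G1, G2)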

Variance models

  • There are three types of variance model for \mathbf{R}_j and \mathbf{G}_i:


Correlation

  • diagonal elements are all 1
  • off-diagonal elements are between -1 and 1
  • If \mathbf{C} is an n\times n correlation model, then there are \frac{1}{2}n(n - 1) parameters.

Homogeneous

  • diagonal elements all have the same positive value \sigma^2 (say)
  • \sigma^2\mathbf{C} is a homogeneous model, so it has \frac{1}{2}n(n - 1) + 1 parameters.

Heterogeneous

  • diagonal elements are positive but can differ (\sigma_1^2, \sigma^2_2, \cdots, \sigma^2_n)
  • \mathbf{D}\mathbf{C}\mathbf{D} where \mathbf{D} = \oplus_{i=1}^n \sigma_i is a heterogeneous model, so it has \frac{1}{2}n(n - 1) + n parameters.
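
A small numerical illustration of the three model types (made-up numbers, n = 3):

n <- 3
# correlation model: unit diagonal, n(n - 1)/2 = 3 off-diagonal parameters
C <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.4,
              0.2, 0.4, 1.0), nrow = n)
# homogeneous model: one extra parameter sigma^2, so 3 + 1 = 4 parameters
sigma2 <- 2
sigma2 * C
# heterogeneous model: n variance parameters, so 3 + 3 = 6 parameters
D <- diag(c(1, 1.5, 2))   # D = diag(sigma_1, ..., sigma_n)
D %*% C %*% D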

Variance model functions in asreml

Default

id() = \mathbf{I}_n

idv() = \sigma^2\mathbf{I}_n

idh() = \begin{bmatrix}\sigma^2_1 & 0 & \cdots & 0 \\ 0 & \sigma^2_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2_n \end{bmatrix}

Time series


ar1(), ar2(), ar3(), sar(), sar2(), ma1(), ma3(), arma()

  • Suffix name with v or h to convert correlation model to homogeneous or heterogeneous models.
  • E.g. ar1v() or ar1h()

Metric-based


exp(), gau(), lvr(), iexp(), igau(), ieuc(), sph(), cir(), aexp(), agau(), mtrn()

  • Suffix name with v or h to convert correlation model to homogeneous or heterogeneous models.

General structures


cor(), corb(), corg(), diag(), us(), chol(), cholc(), ante(), sfa(), fa(), facv(), rr()

  • For cor(), corb(), and corg(), suffix name with v or h to convert correlation model to homogeneous or heterogeneous models.
  • Some combinations are equivalent, e.g. idh() \equiv diag() and corgh() \equiv us(), but default starting values or computation under the hood may differ.
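
For instance, a sketch of how the suffix changes the residual model (hypothetical data frame dat with factors column and row; not run here):

# correlation model in both directions
fit_c <- asreml(yield ~ 1, random =~ Variety,
                residual =~ ar1(column):ar1(row), data = dat)
# homogeneous variance: suffix the column term with v
fit_v <- asreml(yield ~ 1, random =~ Variety,
                residual =~ ar1v(column):ar1(row), data = dat)
# heterogeneous variances across columns: suffix with h
fit_h <- asreml(yield ~ 1, random =~ Variety,
                residual =~ ar1h(column):ar1(row), data = dat)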

Aliasing of variance parameters

  • Variance model parameters may not be estimable:
    • the variance model may not be identifiable if it is over-parameterised, or
    • there may be insufficient data to estimate the parameters of the chosen variance model.
  • Note that if you have \mathbf{V}_1 \otimes \mathbf{V}_2, where \mathbf{V}_1 and \mathbf{V}_2 are variance models, then this is over-parameterised. E.g. idv(col):idv(row).
  • \mathbf{C} \otimes \mathbf{V} or \mathbf{V} \otimes \mathbf{C}, where \mathbf{V} is a variance model and \mathbf{C} is a correlation model, may be acceptable. This is referred to as the sigma parameterization in asreml. E.g. idv(col):id(row).
  • \mathbf{C} \otimes \mathbf{C} is generally converted to \sigma^2 \mathbf{C} \otimes \mathbf{C} by asreml. This is referred to as the gamma parameterization in asreml. E.g. id(col):id(row).

Example 🌾 different variance models

library(asreml)
# gamma parameterization.
fiti <- asreml(yield ~ 1,
               random =~ Variety,
               residual =~ id(column):id(row),
               data = naf)

# sigma parameterization
fitv <- asreml(yield ~ 1,
               random =~ Variety,
               residual =~ idv(column):id(row),
               data = naf)

# sigma parameterization
fith <- asreml(yield ~ 1,
               random =~ Variety,
               residual =~ idh(column):id(row),
               data = naf)

The model below cannot be fitted because idv(column):idv(row) is over-parameterised (both terms are variance models).

fitn <- asreml(yield ~ 1,
               random =~ Variety,
               residual =~ idv(column):idv(row),
               data = naf)
library(broom.asreml)

tidy(fiti, "varcomp")
# A tibble: 2 × 5
  term         estimate std.error statistic constraint
  <chr>           <dbl>     <dbl>     <dbl> <chr>     
1 Variety        0.0860    0.0303      2.84 P         
2 column:row!R   0.102     0.0220      4.63 P         
tidy(fitv, "varcomp")
# A tibble: 3 × 5
  term              estimate std.error statistic constraint
  <chr>                <dbl>     <dbl>     <dbl> <chr>     
1 Variety             0.0860    0.0303      2.84 P         
2 column:row!column   0.102     0.0220      4.63 P         
3 column:row!R        1        NA          NA    F         
tidy(fith, "varcomp")
# A tibble: 14 × 5
   term                 estimate std.error statistic constraint
   <chr>                   <dbl>     <dbl>     <dbl> <chr>     
 1 Variety                0.0543   0.0181       3.00 P         
 2 column:row!R           1       NA           NA    F         
 3 column:row!column_1    0.0225   0.0199       1.13 P         
 4 column:row!column_2    0.0153   0.00982      1.56 P         
 5 column:row!column_3    0.0267   0.0201       1.32 P         
 6 column:row!column_4    0.146    0.0738       1.97 P         
 7 column:row!column_5    0.226    0.100        2.25 P         
 8 column:row!column_6    0.0240   0.0179       1.34 P         
 9 column:row!column_7    0.326    0.142        2.30 P         
10 column:row!column_8    0.0939   0.0503       1.87 P         
11 column:row!column_9    0.220    0.102        2.15 P         
12 column:row!column_10   0.0788   0.0435       1.81 P         
13 column:row!column_11   0.116    0.0639       1.82 P         
14 column:row!column_12   0.292    0.131        2.23 P         

BLUEs and BLUPs

  • Suppose that \text{var}(\boldsymbol{y}) = \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}^\top + \mathbf{R}.
  • Also assume \boldsymbol{\sigma} is known, \mathbf{V} is non-singular and \mathbf{X} is full rank.
  • Let \mathbf{P} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}(\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}

Best linear unbiased estimates (BLUEs)

\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}\boldsymbol{y}

Best linear unbiased predictions (BLUPs)

\tilde{\boldsymbol{b}} = \mathbf{G}\mathbf{Z}^\top\mathbf{P}\boldsymbol{y} = \mathbf{G}\mathbf{Z}^\top\mathbf{V}^{-1}(\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{\beta}})

  • But since \mathbf{V} is often unknown, it is estimated from the data.
  • Correspondingly, we plug the estimate of \mathbf{V} into the above to get the E-BLUEs and E-BLUPs (where E = empirical).

Example 🌾 BLUEs and BLUPs by hand

By asreml

fits  <- asreml(yield ~ 1,
               random =~ Variety,
               residual =~ units,
               data = shf)
tidy(fits, "fixed")
# A tibble: 1 × 3
  term        estimate std.error
  <chr>          <dbl>     <dbl>
1 (Intercept)    1470.    0.0161
tidy(fits, "random")
# A tibble: 25 × 3
   term       estimate std.error
   <chr>         <dbl>     <dbl>
 1 Variety_1   -156.       0.103
 2 Variety_2     20.8      0.103
 3 Variety_3     -7.78     0.103
 4 Variety_4    -26.3      0.103
 5 Variety_5    -18.2      0.103
 6 Variety_6      4.43     0.103
 7 Variety_7    -42.2      0.103
 8 Variety_8     45.6      0.103
 9 Variety_9    -92.2      0.103
10 Variety_10  -161.       0.103
# ℹ 15 more rows

By “hand”

vars <- tidy(fits, "varcomp")$estimate
G <- vars[1] * diag(nlevels(shf$Variety))
R <- vars[2] * diag(nrow(shf))
Z <- model.matrix(~Variety - 1, data = shf)
X <- matrix(1, nrow = nrow(shf))
V <- Z %*% G %*% t(Z) + R
y <- shf$yield
(BLUE <- solve(t(X) %*% solve(V) %*% X) %*% t(X) %*% solve(V) %*% y)
        [,1]
[1,] 1470.44
(BLUP <- G %*% t(Z) %*% solve(V) %*% (y - X %*% BLUE))
             [,1]
 [1,] -156.452173
 [2,]   20.841535
 [3,]   -7.779433
 [4,]  -26.339105
 [5,]  -18.231459
 [6,]    4.430877
 [7,]  -42.163667
 [8,]   45.652885
 [9,]  -92.177099
[10,] -161.238614
[11,]  -66.388923
[12,]    7.849764
[13,]   92.247430
[14,]  -54.569343
[15,]  -14.714889
[16,]   -6.997973
[17,]   22.892867
[18,]   87.167941
[19,]  101.038854
[20,]  144.116829
[21,]  -27.804342
[22,]  106.118343
[23,]  -78.696916
[24,]   43.113141
[25,]   78.083470

Best linear unbiased estimator

  • A statistic \hat{\boldsymbol{\beta}} is the best linear unbiased estimator of \boldsymbol{\beta} if
    • \hat{\boldsymbol{\beta}} = \mathbf{A}\boldsymbol{y} for some \mathbf{A},
    • E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}, and
    • \text{var}(\hat{\boldsymbol{\beta}}) is the smallest of all linear unbiased estimators of \boldsymbol{\beta}.
  • Homework: prove \boldsymbol{c}_1^\top\hat{\boldsymbol{\beta}} + \boldsymbol{c}_2^\top\tilde{\boldsymbol{b}} is BLUE for \boldsymbol{c}_1^\top\boldsymbol{\beta} + \boldsymbol{c}_2^\top\boldsymbol{b} for any \boldsymbol{c}_1 \in\mathbb{R}^p and \boldsymbol{c}_2 \in\mathbb{R}^q.
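
As a partial step toward this, note that \hat{\boldsymbol{\beta}} itself is linear in \boldsymbol{y}, with \mathbf{A} = (\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}, and unbiased since

\begin{align*} E(\hat{\boldsymbol{\beta}}) &= (\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}E(\boldsymbol{y}) = (\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}. \end{align*}

The remaining (and harder) part is to show the variance is minimal among all linear unbiased estimators.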

Objective function to derive BLUEs and BLUPs

  • Find (\boldsymbol{\beta}, \boldsymbol{b}) which maximises the log-density function of the joint distribution of \boldsymbol{y} and \boldsymbol{b} assuming \boldsymbol{\sigma} is known.
  • Note \boldsymbol{y} \sim N(\mathbf{X}\boldsymbol{\beta}, \mathbf{V}), \boldsymbol{b} \sim N(\boldsymbol{0}, \mathbf{G}) and \boldsymbol{y}|\boldsymbol{b} \sim N(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{b}, \mathbf{R}).

\begin{align*} \ell &= \log f(\boldsymbol{y}, \boldsymbol{b} ; \boldsymbol{\sigma})\\ &= \log f(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\sigma}_r) + \log f(\boldsymbol{b};\boldsymbol{\sigma}_g) \\ &= -\frac{1}{2}\log |\mathbf{R}| - \frac{1}{2}(\boldsymbol{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\boldsymbol{b})^\top \mathbf{R}^{-1}(\boldsymbol{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\boldsymbol{b}) \\ & \qquad - \frac{1}{2}\log|\mathbf{G}| - \frac{1}{2} \boldsymbol{b}^\top\mathbf{G}^{-1}\boldsymbol{b} - \frac{n+q}{2}\log 2\pi. \end{align*}

Mixed model equations

\begin{align*} \frac{\partial \ell}{\partial\boldsymbol{\beta}} &= \mathbf{X}^\top \mathbf{R}^{-1}(\boldsymbol{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{b}) = \boldsymbol{0} \\ \frac{\partial \ell}{\partial\boldsymbol{b}} &= \mathbf{Z}^\top \mathbf{R}^{-1}(\boldsymbol{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{b}) - \mathbf{G}^{-1} \boldsymbol{b} = \boldsymbol{0}. \end{align*}

Rearranging the above results in the mixed model equations:

\underbrace{\begin{bmatrix} \mathbf{X}^\top \mathbf{R}^{-1}\mathbf{X} & \mathbf{X}^\top \mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}^\top \mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}^\top \mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1} \\ \end{bmatrix}}_{\normalsize\mathbf{C}} \begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \tilde{\boldsymbol{b}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}^\top\mathbf{R}^{-1}\boldsymbol{y}\\ \mathbf{Z}^\top\mathbf{R}^{-1}\boldsymbol{y} \end{bmatrix}

Solving this results in the same BLUE and BLUP as before.
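
Continuing the earlier by-hand example (this sketch reuses the X, Z, G, R and y defined there), the mixed model equations can be assembled and solved directly:

# coefficient matrix C of the mixed model equations
Rinv <- solve(R)
Ginv <- solve(G)
C <- rbind(cbind(t(X) %*% Rinv %*% X, t(X) %*% Rinv %*% Z),
           cbind(t(Z) %*% Rinv %*% X, t(Z) %*% Rinv %*% Z + Ginv))
# right-hand side and solution: the first element is the BLUE of the intercept,
# the remaining elements are the BLUPs of the Variety effects
rhs <- rbind(t(X) %*% Rinv %*% y, t(Z) %*% Rinv %*% y)
sol <- solve(C, rhs)
head(sol)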

Prediction error variance

Variance of prediction errors

\text{var}\left(\begin{bmatrix}\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\\\tilde{\boldsymbol{b}} - \boldsymbol{b}\end{bmatrix}\right) = \mathbf{C}^{-1} = {\scriptsize\begin{bmatrix}(\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1} & -(\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{Z}\mathbf{G}\\ -\mathbf{G}\mathbf{Z}^\top\mathbf{V}^{-1}\mathbf{X}(\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1} & \mathbf{G} - \mathbf{G}\mathbf{Z}^\top\mathbf{P}\mathbf{Z}\mathbf{G} \end{bmatrix}}

  • Note \text{var}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \text{var}(\hat{\boldsymbol{\beta}}).


fitc <- asreml(yield ~ 1,
               random =~ Variety,
               residual =~ units,
               data = shf,
               options = asreml.options(
                 Cfixed = TRUE,
                 Csparse =~ Variety
               ))
fitc$Cfixed
[1] 707.637
sp2mat(fitc$Csparse)[1:5, 1:5]
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 4534.929    0.000    0.000    0.000    0.000
[2,]    0.000 4534.929    0.000    0.000    0.000
[3,]    0.000    0.000 4534.929    0.000    0.000
[4,]    0.000    0.000    0.000 4534.929    0.000
[5,]    0.000    0.000    0.000    0.000 4534.929

By “hand”

solve(t(X) %*% solve(V) %*% X)
         [,1]
[1,] 707.8299
P <- solve(V) - solve(V) %*% X %*% solve(t(X) %*% solve(V) %*% X) %*% t(X) %*% solve(V)
CZZ <- G - G %*% t(Z) %*% P %*% Z %*% G
CZZ[1:5, 1:5]
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 4535.9150  243.1447  243.1447  243.1447  243.1447
[2,]  243.1447 4535.9150  243.1447  243.1447  243.1447
[3,]  243.1447  243.1447 4535.9150  243.1447  243.1447
[4,]  243.1447  243.1447  243.1447 4535.9150  243.1447
[5,]  243.1447  243.1447  243.1447  243.1447 4535.9150

Estimation of variance parameters

  • Variance parameters \boldsymbol{\sigma} can be estimated using the maximum likelihood (ML) approach or the residual (or restricted) maximum likelihood (REML) approach.
  • REML estimates take into account the loss of degrees of freedom associated with the estimation of fixed effects, so they are less biased than ML estimates.

ML

\begin{align*} \hat{\boldsymbol{\sigma}}_{\text{ML}} &= \operatorname*{arg\,max}_{\boldsymbol{\sigma}} f(\boldsymbol{y}; \boldsymbol{\sigma})\\ &= \operatorname*{arg\,max}_{\boldsymbol{\sigma}}~ \log f(\boldsymbol{y}; \boldsymbol{\sigma})\\ &= \operatorname*{arg\,max}_{\boldsymbol{\sigma}}~ -\frac{1}{2}\left( \log |\mathbf{V}| + (\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\top\mathbf{V}^{-1}(\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{\beta}})\right) \end{align*}

REML

\begin{align*} \hat{\boldsymbol{\sigma}}_{\text{REML}} &= \operatorname*{arg\,max}_{\boldsymbol{\sigma}} \ell_R(\boldsymbol{\sigma}) \quad(\textbf{residual log-likelihood})\\ &= \operatorname*{arg\,max}_{\boldsymbol{\sigma}}~ -\frac{1}{2}\left( \log |\mathbf{V}| + {\color{RoyalBlue} \log|\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X}|} + (\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\top\mathbf{V}^{-1}(\boldsymbol{y} - \mathbf{X}\hat{\boldsymbol{\beta}})\right)\\ &= \operatorname*{arg\,max}_{\boldsymbol{\sigma}}~ -\frac{1}{2}\left(\log |\mathbf{G}| + \log |\mathbf{R}| + \log |\mathbf{C}| + \boldsymbol{y}^\top\mathbf{P}\boldsymbol{y}\right) \end{align*}
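
The last equality can be checked numerically for the simple Variety model, reusing V, G, R, X, y, BLUE and P from the by-hand computations above and the mixed-model-equation matrix C from the sketch above (constants involving 2\pi are dropped in both expressions):

# residual log-likelihood (up to constants) written in terms of V
r <- y - X %*% BLUE
llR_V <- -0.5 * (determinant(V)$modulus +
                 determinant(t(X) %*% solve(V) %*% X)$modulus +
                 t(r) %*% solve(V) %*% r)
# the same quantity written in terms of G, R, C and P
llR_C <- -0.5 * (determinant(G)$modulus + determinant(R)$modulus +
                 determinant(C)$modulus + t(y) %*% P %*% y)
c(llR_V, llR_C)   # the two expressions agree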

REML score equations

REML estimates for \boldsymbol{\sigma}_{m\times1} = (\sigma_1, \dots, \sigma_m)^\top are obtained by solving s_i(\boldsymbol{\sigma}) = \frac{\partial\ell_R(\boldsymbol{\sigma})}{\partial \sigma_i} = 0 \quad\text{for } i = 1, \dots, m.

  • We solve numerically using a Newton-Raphson algorithm: \boldsymbol{\sigma}^{(t + 1)} = \boldsymbol{\sigma}^{(t)} + \left({\color{RoyalBlue}\mathcal{J}(\boldsymbol{\sigma}^{(t)}})\right)^{-1}\boldsymbol{s}(\boldsymbol{\sigma}^{(t)}) where

    • \boldsymbol{s}(\boldsymbol{\sigma}^{(t)}) = (s_1(\boldsymbol{\sigma}^{(t)}), s_2(\boldsymbol{\sigma}^{(t)}), \dots, s_m(\boldsymbol{\sigma}^{(t)}))^\top is the m\times 1 scores at iteration t, and
    • \mathcal{J}(\boldsymbol{\sigma}^{(t)}) is the m\times m observed information matrix at iteration t where the (i,j)-th entry is given as \mathcal{J}_{ij}(\boldsymbol{\sigma}^{(t)}) = -\dfrac{\partial s_i(\boldsymbol{\sigma}^{(t)})}{\partial \sigma_j} = -\frac{\partial^2\ell_R(\boldsymbol{\sigma}^{(t)})}{\partial\sigma_i\partial\sigma_j}.

Information matrices

Let \mathbf{V}'_i = \frac{\partial \mathbf{V}}{\partial \sigma_i} and \mathbf{V}''_{ij} = \frac{\partial^2 \mathbf{V}}{\partial \sigma_i\partial \sigma_j}.

Observed information matrix

\mathcal{J}_{ij} = \frac{1}{2}\text{tr}(\mathbf{P}\mathbf{V}''_{ij}) - \frac{1}{2}\text{tr}(\mathbf{P}\mathbf{V}_{i}'\mathbf{P}\mathbf{V}_j') + \boldsymbol{y}^\top\mathbf{P}\mathbf{V}'_i\mathbf{P}\mathbf{V}_j'\mathbf{P}\boldsymbol{y}-\frac{1}{2}\boldsymbol{y}^\top\mathbf{P}\mathbf{V}''_{ij}\mathbf{P}\boldsymbol{y}

Expected information matrix (Fisher information)

\mathcal{I}_{ij} = E(\mathcal{J}_{ij}) = \frac{1}{2}\text{tr}(\mathbf{P}\mathbf{V}_{i}'\mathbf{P}\mathbf{V}_j')

Average information matrix

\mathcal{A}_{ij} = \frac{1}{2}\boldsymbol{y}^\top\mathbf{P}\mathbf{V}'_i\mathbf{P}\mathbf{V}_j'\mathbf{P}\boldsymbol{y} = \frac{1}{2}(\mathcal{J}_{ij} + \mathcal{I}_{ij})\quad\text{assuming }\boldsymbol{y}^\top\mathbf{P}\mathbf{V}''_{ij}\mathbf{P}\boldsymbol{y} = \text{tr}(\mathbf{P}\mathbf{V}''_{ij})
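
For the simple Variety model, the variance parameters enter \mathbf{V} linearly, so \mathbf{V}''_{ij} = \mathbf{0} and the assumption above holds exactly. A sketch comparing the three matrices, reusing Z, y and P from the by-hand computations above:

dV <- list(Z %*% t(Z), diag(nrow(Z)))   # V'_1 = Z Z^T, V'_2 = I_n
Jmat <- Imat <- Amat <- matrix(0, 2, 2)
for (i in 1:2) for (j in 1:2) {
  tr   <- sum(diag(P %*% dV[[i]] %*% P %*% dV[[j]]))
  quad <- drop(t(y) %*% P %*% dV[[i]] %*% P %*% dV[[j]] %*% P %*% y)
  Jmat[i, j] <- -0.5 * tr + quad   # observed information (V''_ij = 0 here)
  Imat[i, j] <-  0.5 * tr          # expected (Fisher) information
  Amat[i, j] <-  0.5 * quad        # average information
}
all.equal(Amat, (Jmat + Imat) / 2)   # TRUE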

Algorithm overview

Start with initial estimate of \boldsymbol{\sigma} = \boldsymbol{\sigma}^{(0)}.

For t = 1, \dots, \texttt{maxit},

  1. Calculate \hat{\boldsymbol{\beta}}^{(t - 1)} = (\mathbf{X}^\top(\mathbf{V}^{(t -1)})^{-1}\mathbf{X})^{-1}\mathbf{X}^\top(\mathbf{V}^{(t -1)})^{-1}\boldsymbol{y} and \tilde{\boldsymbol{b}}^{(t - 1)} = \mathbf{G}\mathbf{Z}^\top\mathbf{P}^{(t - 1)}\boldsymbol{y}.
  2. Calculate residual likelihood \ell_R^{(t- 1)} = -\frac{1}{2}\left(\log |\mathbf{G}^{(t-1)}| + \log |\mathbf{R}^{(t-1)}| + \log |\mathbf{C}^{(t-1)}| + \boldsymbol{y}^\top\mathbf{P}^{(t-1)}\boldsymbol{y}\right)
  3. Calculate the scores \boldsymbol{s}(\boldsymbol{\sigma}^{(t - 1)}).
  4. Calculate the average information matrix \mathcal{A}_{ij}^{(t - 1)} = \frac{1}{2}\boldsymbol{y}^\top\mathbf{P}^{(t-1)}\mathbf{V}'^{(t-1)}_i\mathbf{P}^{(t-1)}\mathbf{V}_j'^{(t-1)}\mathbf{P}^{(t-1)}\boldsymbol{y}.
  5. Update \boldsymbol{\sigma}^{(t)} = \boldsymbol{\sigma}^{(t - 1)} + \left(\mathcal{A}^{(t - 1)}\right)^{-1}\boldsymbol{s}(\boldsymbol{\sigma}^{(t-1)})
  6. Repeat steps 1-2; if \ell_R^{(t)} - \ell_R^{(t-1)} < \text{threshold} then stop, otherwise continue from step 3.
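
A minimal sketch of this algorithm in R for the simple Variety model, reusing X, Z, y and the list dV of derivative matrices from the sketch above (here \boldsymbol{\sigma} = (\sigma^2_g, \sigma^2_e)^\top so \mathbf{V} = \sigma^2_g\mathbf{Z}\mathbf{Z}^\top + \sigma^2_e\mathbf{I}_n; no step-halving, constraints or convergence check, for simplicity):

sigma <- rep(var(y) / 2, 2)   # crude initial values sigma^(0)
for (iter in 1:20) {
  Vt <- sigma[1] * dV[[1]] + sigma[2] * dV[[2]]
  Vtinv <- solve(Vt)
  Pt <- Vtinv - Vtinv %*% X %*% solve(t(X) %*% Vtinv %*% X) %*% t(X) %*% Vtinv
  # scores: s_i = -1/2 tr(P V'_i) + 1/2 y^T P V'_i P y
  s <- sapply(dV, function(Vi)
    -0.5 * sum(diag(Pt %*% Vi)) + 0.5 * drop(t(y) %*% Pt %*% Vi %*% Pt %*% y))
  # average information: A_ij = 1/2 y^T P V'_i P V'_j P y
  A <- matrix(0, 2, 2)
  for (i in 1:2) for (j in 1:2)
    A[i, j] <- 0.5 * drop(t(y) %*% Pt %*% dV[[i]] %*% Pt %*% dV[[j]] %*% Pt %*% y)
  sigma <- sigma + solve(A, s)   # update sigma^(t)
}
sigma   # compare with tidy(fits, "varcomp")$estimate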
