Analysis of a multi-environmental field trial

Emi Tanaka

Australian National University

2025-07-10

Multi-environment trials

Multi-environment trials (METs) are field experiments conducted in multiple locations and/or seasons to evaluate the performance of different crop genotypes.
Trial location and season/year is used synonymously with “environment”

library(cropscan)
plot_field(met2_pheno)

G = 191 genotypes tested across all E = 5 environments with n = 288 observations and B = 2 blocks per trial.

concurrence_matrix(met2_pheno)

     
      E02 E05 E07 E08 E10
  E02 191 191 191 191 191
  E05 191 191 191 191 191
  E07 191 191 191 191 191
  E08 191 191 191 191 191
  E10 191 191 191 191 191

Approach 1: Fit separate models to each environment

For i = 1, ..., E = 5, assuming observations are ordered by row then column,

\boldsymbol{y}_i = \boldsymbol{1}_{n}{\color{orange}\mu_i} + \mathbf{Z}_{g_i}\boldsymbol{u}_{g_i} + \mathbf{Z}_{b}\boldsymbol{u}_{b_i} + \boldsymbol{e}_i, \boldsymbol{u}_{g_i} \sim N(\boldsymbol{0}, {\color{orange}\sigma_{g_i}^2}\mathbf{I}_G), \boldsymbol{u}_{b_i} \sim N(\boldsymbol{0}, {\color{orange}\sigma_{b_i}^2}\mathbf{I}_B) and \boldsymbol{e}_{i} \sim N(\boldsymbol{0}, {\color{orange}\sigma_{i}^2}\mathbf{I}_{n})

library(tidyverse)
library(asreml)
library(broom.asreml)
fit_singles <- list()
for(trial in c("E02", "E05", "E07", "E08", "E10")) {
  fit_singles[[trial]] <- asreml(yield ~ 1, 
                                 random =~ gen + block, 
                                 data = met2_pheno |> 
                                   filter(env == trial))
}

Approach 2: Fit a DIAG model

\boldsymbol{y} = \begin{bmatrix}\boldsymbol{y}_1\\\boldsymbol{y}_2\\\boldsymbol{y}_3\\\boldsymbol{y}_4\\\boldsymbol{y}_5\end{bmatrix} = (\mathbf{I}_E\otimes\boldsymbol{1}_{n})\boldsymbol{\mu} + \mathbf{Z}_{ge}{\color{orange}\boldsymbol{u}_{ge}} + (\mathbf{I}_E\otimes\mathbf{Z}_b)\boldsymbol{u}_b + \boldsymbol{e}

\boldsymbol{u}_{ge}\sim N\left(\boldsymbol{0}, {\color{orange}\begin{bmatrix}\sigma^2_{g_1}\mathbf{I}_G & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0}\\\mathbf{0} & \sigma^2_{g_2}\mathbf{I}_G & \mathbf{0} & \mathbf{0} & \mathbf{0}\\\mathbf{0} & \mathbf{0} & \sigma^2_{g_3}\mathbf{I}_G & \mathbf{0} & \mathbf{0}\\\mathbf{0} & \mathbf{0} & \mathbf{0} & \sigma^2_{g_4}\mathbf{I}_G & \mathbf{0}\\\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \sigma^2_{g_5}\mathbf{I}_G\end{bmatrix}}\right), \boldsymbol{u}_{b}\sim N\left(\boldsymbol{0}, \text{diag}(\sigma^2_{b_1}, \sigma^2_{b_2}, \sigma^2_{b_3}, \sigma^2_{b_4}, \sigma^2_{b_5})\otimes \mathbf{I}_B\right), \boldsymbol{e}\sim N\left(\boldsymbol{0}, \text{diag}(\sigma^2_{1}, \sigma^2_{2}, \sigma^2_{3}, \sigma^2_{4}, \sigma^2_{5})\otimes \mathbf{I}_n\right)

fit_diag <- asreml(yield ~ env, 
                   random =~ diag(env):gen + diag(env):block, 
                   residual =~ dsum(~units|env),
                   data = met2_pheno)

Single site analysis vs MET analysis

What do you notice from below?

tidy(fit_diag, "varcomp")

# A tibble: 15 × 5
   term                 estimate std.error statistic constraint
   <chr>                   <dbl>     <dbl>     <dbl> <chr>     
 1 env:block!env_E02 0.0276        0.0415      0.666 P         
 2 env:block!env_E05 0.0257        0.0372      0.691 P         
 3 env:block!env_E07 0.000000264  NA          NA     B         
 4 env:block!env_E08 0.00691       0.0101      0.685 P         
 5 env:block!env_E10 0.00127       0.00207     0.611 P         
 6 env:gen!env_E02   0.0648        0.0303      2.14  P         
 7 env:gen!env_E05   0.0654        0.0140      4.66  P         
 8 env:gen!env_E07   0.151         0.0228      6.64  P         
 9 env:gen!env_E08   0.0249        0.00514     4.85  P         
10 env:gen!env_E10   0.0271        0.00495     5.47  P         
11 env_E02!R         0.231         0.0317      7.28  P         
12 env_E05!R         0.0747        0.0105      7.09  P         
13 env_E07!R         0.0781        0.0112      7.01  P         
14 env_E08!R         0.0277        0.00385     7.19  P         
15 env_E10!R         0.0232        0.00325     7.12  P

tidy(fit_singles[["E02"]], "varcomp")

# A tibble: 3 × 5
  term    estimate std.error statistic constraint
  <chr>      <dbl>     <dbl>     <dbl> <chr>     
1 block     0.0276    0.0415     0.665 P         
2 gen       0.0648    0.0303     2.14  P         
3 units!R   0.231     0.0317     7.28  P

tidy(fit_singles[["E05"]], "varcomp")

# A tibble: 3 × 5
  term    estimate std.error statistic constraint
  <chr>      <dbl>     <dbl>     <dbl> <chr>     
1 block     0.0257    0.0372     0.692 P         
2 gen       0.0654    0.0140     4.66  P         
3 units!R   0.0747    0.0105     7.09  P

tidy(fit_singles[["E07"]], "varcomp")

# A tibble: 3 × 5
  term         estimate std.error statistic constraint
  <chr>           <dbl>     <dbl>     <dbl> <chr>     
1 block   0.00000000791   NA          NA    B         
2 gen     0.151            0.0228      6.64 P         
3 units!R 0.0782           0.0112      7.01 P

tidy(fit_singles[["E08"]], "varcomp")

# A tibble: 3 × 5
  term    estimate std.error statistic constraint
  <chr>      <dbl>     <dbl>     <dbl> <chr>     
1 block    0.00692   0.0101      0.684 P         
2 gen      0.0249    0.00514     4.85  P         
3 units!R  0.0277    0.00385     7.18  P

tidy(fit_singles[["E10"]], "varcomp")

# A tibble: 3 × 5
  term    estimate std.error statistic constraint
  <chr>      <dbl>     <dbl>     <dbl> <chr>     
1 block    0.00127   0.00200     0.633 P         
2 gen      0.0271    0.00495     5.47  P         
3 units!R  0.0232    0.00326     7.11  P

Approach 3: Fit a CS model

\boldsymbol{y} = (\mathbf{I}_E\otimes\boldsymbol{1}_{n})\boldsymbol{\mu} + \mathbf{Z}_g{\color{orange}\boldsymbol{u}_g} + \mathbf{Z}_{ge}{\color{orange}\boldsymbol{u}_{ge}} + (\mathbf{I}_E\otimes\mathbf{Z}_b)\boldsymbol{u}_b + \boldsymbol{e}

where \boldsymbol{u}_g \sim N(\boldsymbol{0}, {\color{orange}\sigma^2_g\mathbf{I}_G}), \boldsymbol{u}_{ge} \sim N(\boldsymbol{0}, {\color{orange}\sigma^2_{ge}\mathbf{I}_{GE}}) and other random effects distributed as Approach 2.

fit_cs <- asreml(yield ~ env, 
                 random =~ gen + env:gen + diag(env):block, 
                 residual =~ dsum(~units|env),
                 data = met2_pheno)

\boldsymbol{1}_E \otimes \boldsymbol{u}_g + \boldsymbol{u}_{ge} \sim N\left(\boldsymbol{0}, \underbrace{{\color{orange}\begin{bmatrix}\small\sigma^2_g + \sigma^2_{ge} & \small\sigma^2_g & \small\sigma^2_g & \small\sigma^2_g & \small\sigma^2_g \\\small\sigma^2_g &\small\sigma^2_g + \sigma^2_{ge} &\small\sigma^2_g & \sigma^2_g & \small\sigma^2_g\\\sigma^2_g & \small\sigma^2_g &\small\sigma^2_g + \sigma^2_{ge} & \small\sigma^2_g & \small\sigma^2_g\\\small\sigma^2_g &\small \sigma^2_g &\small\sigma^2_g & \small\sigma^2_g + \sigma^2_{ge} & \small \sigma^2_g\\\small\sigma^2_g &\small\sigma^2_g & \small\sigma^2_g &\small \sigma^2_g & \small\sigma^2_g+ \sigma^2_{ge}\end{bmatrix}}}_{{\color{orange}\text{compound symmetry}}}\otimes \mathbf{I}_G \right)

Note: in this forumlation \sigma^2_g > 0 so covariance/correlation between same genotype across environment is always positive.

Yield distribution across environment

Approach 4: Fit an US model

\boldsymbol{y} = (\mathbf{I}_E\otimes\boldsymbol{1}_{n})\boldsymbol{\mu} + \mathbf{Z}_{ge}{\color{orange}\boldsymbol{u}_{ge}} + (\mathbf{I}_E\otimes\mathbf{Z}_b)\boldsymbol{u}_b + \boldsymbol{e}

\boldsymbol{u}_{ge}\sim N\left(\boldsymbol{0}, \underbrace{{\color{orange}\begin{bmatrix}\small\sigma^2_{g_1} & \small\sigma_{g_{12}} & \small\sigma_{g_{13}} & \small\sigma_{g_{14}} & \small\sigma_{g_{15}}\\\small\sigma_{g_{12}} & \small\sigma_{g_{2}}^2 & \small\sigma_{g_{23}} & \small\sigma_{g_{24}} & \small\sigma_{g_{25}}\\ \small\sigma_{g_{13}} & \small\sigma_{g_{23}} & \small\sigma_{g_{3}}^2 & \small\sigma_{g_{34}} & \small\sigma_{g_{35}}\\ \small\sigma_{g_{14}} & \small\sigma_{g_{24}} & \small\sigma_{g_{34}} & \small\sigma_{g_{4}}^2 & \small\sigma_{g_{45}}\\ \small\sigma_{g_{15}} & \small\sigma_{g_{25}} & \small\sigma_{g_{35}} & \small\sigma_{g_{45}} & \small\sigma_{g_{5}}^2 \end{bmatrix}}}_{\color{orange}\text{unstructured}} \otimes \mathbf{I}_G\right)

fit_us <- asreml(yield ~ env, 
                 random =~ us(env):gen + diag(env):block, 
                 residual =~ dsum(~units|env),
                 data = met2_pheno)

Flexible model that “borrows strength” across environments for more accurate prediction of \boldsymbol{u}_{ge}.

Number of parameters to estimate in an US model

# Environment	# Parameters
2	3
3	6
5	15
10	55
25	325
50	1275
100	5050
E	E(E + 1)/2

\mathbf{G}_{e} = \begin{bmatrix}\small\sigma^2_{g_1} & \small\sigma_{g_{12}} & \small\sigma_{g_{13}} & \small\sigma_{g_{14}} & \small\sigma_{g_{15}}\\\small\color{grey}\sigma_{g_{12}} & \small\sigma_{g_{2}}^2 & \small\sigma_{g_{23}} & \small\sigma_{g_{24}} & \small\sigma_{g_{25}}\\ \color{grey}\small\sigma_{g_{13}} & \color{grey}\small\sigma_{g_{23}} & \small\sigma_{g_{3}}^2 & \small\sigma_{g_{34}} & \sigma_{g_{35}}\\ \color{grey}\small\sigma_{g_{14}} & \color{grey}\small\sigma_{g_{24}} & \color{grey}\small\sigma_{g_{34}} & \small\sigma_{g_{4}}^2 & \small\sigma_{g_{45}}\\ \color{grey}\small\sigma_{g_{15}} & \color{grey}\small\sigma_{g_{25}} & \color{grey}\small\sigma_{g_{35}} & \color{grey}\small\sigma_{g_{45}} & \small\sigma_{g_{5}}^2 \end{bmatrix}

The number of parameters to be estimated grows quadratically with the number of environments so it quickly becomes too many parameters to estimate.

Approach 5: Fit a FA model

For some order K, estimate the unstructured covariance with the factor analytic form:

\mathbf{G}_e \approx \mathbf{\Lambda}\mathbf{\Lambda}^\top + \mathbf{\Psi}

where

\mathbf{\Lambda} = \underbrace{\begin{bmatrix}\lambda_{11} & 0 & \cdots & 0\\\lambda_{21} & \lambda_{22} & \ddots &0\\\lambda_{31} & \lambda_{32} & \cdots &0\\\lambda_{41} & \lambda_{42} & \cdots &0\\\lambda_{51} & \lambda_{52} & \cdots &\lambda_{5K}\end{bmatrix}}_{\text{loading matrix (with corner constraints)}}\quad\text{and}\quad \mathbf{\Psi} = \underbrace{\begin{bmatrix}\psi_1 & 0 & 0 & 0 & 0\\0 & \psi_2 & 0 & 0& 0\\0& 0 & \psi_3 & 0 & 0\\0 & 0 &0 & \psi_4 & 0\\0 & 0& 0&0& \psi_5\end{bmatrix}}_{\text{specific variances}}.

Number of parameters to estimate in a FA model

# Env	US	FA1	FA2	FA3	FA4
2	3	4	5
3	6	6	8	9
5	15	10	14	17	19
10	55	20	29	37	44
25	325	50	74	97	119
50	1275	100	149	197	244
100	5050	200	299	397	494

FA model of order K has (K+1)E - K(K - 1)/2 parameters to estimate.

The number of variance parameters to estimate for FA model grows linearly with the number of environments.
The FA model can be thought of as low-order approximation of the US model.

Latent variable model

FA model is a special case of latent variable model when the responses are conditionally normally distributed

\boldsymbol{u}_{ge} = (\mathbf{\Lambda} \otimes \mathbf{I}_G) \boldsymbol{f} + \boldsymbol{\delta} assuming \boldsymbol{f} \sim N(\boldsymbol{0}, \mathbf{I}_{GK}) \quad\text{and}\quad\boldsymbol{\delta}\sim N(\boldsymbol{0}, \mathbf{\Psi}\otimes \mathbf{I}_G).

Under above assumption:

\boldsymbol{u}_{ge} \sim N(\boldsymbol{0}, (\mathbf{\Lambda}\mathbf{\Lambda}^\top + \mathbf{\Psi})\otimes \mathbf{I}_G).

Interpreting the latent variable model

\boldsymbol{u}_{ge} = \underbrace{(\mathbf{\Lambda} \otimes \mathbf{I}_G) \boldsymbol{f}}_{\text{G}\times\text{E Regression}} + \underbrace{\boldsymbol{\delta}}_{\text{Residual G}\times\text{E}}

Notice that this is like a linear regression model except the covariates (\mathbf{\Lambda}) are estimated from the data.
The \mathbf{\Lambda} represent some shared latent environment covariate.
\boldsymbol{f}, referred to as common factor, represent how the genotype responds to that covariate.
\boldsymbol{\delta}, referred to as specific factor, represent an effect specific to that environment.

Fitting FA models

fit_fa2 <- asreml(yield ~ env, 
                  random =~ fa(env, 2):gen + diag(env):block, 
                  residual =~ dsum(~units|env),
                  data = met2_pheno)

Or equivalently:

fit_rr2 <- asreml(yield ~ env, 
                  random =~ rr(env, 2):gen + diag(env):gen + diag(env):block, 
                  residual =~ dsum(~units|env),
                  data = met2_pheno)

Key takeaways

When you have a MET data, think of the covariance structure in the model.
DIAG models can be equivalent to fitting separate models to each environment.
CS models can be too restrictive to realistically model the data.
US models are flexible and borrow strength across environments but the number of parameters to estimate can become prohitively large.
FA models are low-order approximation to US models.
FA models can also have possible interpretations useful for breeding.

Reference

Smith et al (2015) Factor analytic mixed models for the provision of grower information from national crop variety testing programs. Theor Appl Genet.