Solving mixed model equations with unknown parent groups (UPG)

1) Direct solving (R)
2) Iterative solving (BLUPF90+)

Author: Eula Carrara ()
Created on: 22-Feb-2025
Modified on: 15-Mar-2025
Theory: lecture notes Dr. Daniela Lourenco, UGA, 2025
Numerical example taken from tutorial_blupf90.pdf
Required R packages: igraph, MASS, optiSel, tidyverse

install.packages(c("igraph", "MASS", "optiSel", "tidyverse"))  

Considering an individual a, progeny of s and d,

\[\begin{array}{c} s_i \quad \quad d_i \\ \backslash \quad \, / \\ \, a_i \end{array}\]

its expected breeding value will be:

  • if both parents known: \(a_i = \frac{a_{s_i} + a_{d_i}}{2}\)

  • if only sire known: \(a_i = \frac{a_{s_i} + 0}{2}\)

  • if only dam known: \(a_i = \frac{0 + a_{d_i}}{2}\)

  • if both parents unknown: \(a_i = \frac{0 + 0}{2}\)

That is, the breeding value of an unknown sire/dam is zero. This does not correspond to reality.
One way to fill these gaps in the pedigree is to build Unknown Parent Groups (UPG; or phantom parents, or genetic group):

  • only known sire: \(a_i = \frac{a_{s_i} + \boldsymbol{\mathit{UPG_i}}}{2}\)

  • only known dam: \(a_i = \frac{\boldsymbol{\mathit{UPG_i}} + a_{d_i}}{2}\)

  • both unknown parents: \(a_i = \frac{\boldsymbol{\mathit{UPG_i}} + \boldsymbol{\mathit{UPG_i}}}{2}\)


How ​​to create UPGs?

Using information about breed, year of birth, sex, country, etc…
UPGs are not animals! They are most often treated as a fixed effect in the model.


Animal mixed model

The mixed animal model can be described as:
\(y = Xb + Za + e\)

in which:
- y is the vector of phenotypes,
- b is the vector of fixed effects,
- a is the vector of direct additive genetic effects (breeding values),
- X and Z relate b and a to y,
- e is the vector of residuals.


Animal mixed model, but with UPG:

\(y = Xb + \mathbf{\mathit{ZQg}} + Za + e\)

in which:
- Q is the matrix that relates animals to UPGs,
- g is the effect of UPGs.

So we need to build Q. How?
The Q matrix is ​​constructed with the contribution fractions of each UPG in the expected breeding values ​​of the individuals related to the UPG.


Example of constructing the Q matrix:

Pedigree:

##   ID Sire Dam
## 1 a1   g2  a4
## 2 a2   g3  a1
## 3 a3   g3  a1
## 4 a4   g1  g1
## 5 a5   a3  a6
## 6 a6   g3  g3

First, we need to put all the expected breeding values as a function of the UPGs:

Defining the expected breeding values depending on the UPG
\[\begin{aligned} E(a_4) &= \frac{1}{2} \cdot g_1 + \frac{1}{2} \cdot g_1 = g_1 \\ \\[0.1cm] E(a_1) &= \frac{1}{2} \cdot g_2 + \frac{1}{2} \cdot E(a_4) = \frac{1}{2} g_1 + \frac{1}{2} g_2 \\ \\[0.1cm] E(a_3) &= \frac{1}{2} \cdot g_3 + \frac{1}{2} \cdot E(a_1) = \frac{1}{4} g_1 + \frac{1}{4} g_2 + \frac{1}{2} g_3 \\ \\[0.1cm] E(a_6) &= \frac{1}{2} \cdot g_3 + \frac{1}{2} \cdot g_3 = g_3 \\ \\[0.1cm] E(a_5) &= \frac{1}{2} \cdot E(a_3) + \frac{1}{2} \cdot E(a_6) = \frac{1}{8} g_1 + \frac{1}{8} g_2 + \frac{3}{4} g_3 \\ \\[0.1cm] E(a_2) &= \frac{1}{2} \cdot g_3 + \frac{1}{2} \cdot E(a_1) = \frac{1}{4} g_1 + \frac{1}{4} g_2 + \frac{1}{2} g_3 \end{aligned}\] Rearranging from a1 to a6 (just for better visualization)
\[\begin{aligned} E(a_1) &= \frac{1}{2} \cdot g_1 + \frac{1}{2} \cdot g_2 + 0 \cdot g_3 \\ \\[0.1cm] E(a_2) &= \frac{1}{4} \cdot g_1 + \frac{1}{4} \cdot g_2 + \frac{1}{2} \cdot g_3 \\ \\[0.1cm] E(a_3) &= \frac{1}{4} \cdot g_1 + \frac{1}{4} \cdot g_2 + \frac{1}{2} \cdot g_3 \\ \\[0.1cm] E(a_4) &= 1 \cdot g_1 + 0 \cdot g_2 + 0 \cdot g_3 \\ \\[0.1cm] E(a_5) &= \frac{1}{8} \cdot g_1 + \frac{1}{8} \cdot g_2 + \frac{3}{4} \cdot g_3 \\ \\[0.1cm] E(a_6) &= 0 \cdot g_1 + 0 \cdot g_2 + 1 \cdot g_3 \end{aligned}\]

Q matrix with the fractions of each UPG (cols) for each animal (rows)
\[ \begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/4 & 1/4 & 1/2 \\ 1/4 & 1/4 & 1/2 \\ 1 & 0 & 0 \\ 1/8 & 1/8 & 3/4 \\ 0 & 0 & 1 \end{bmatrix} \]

This is the Q matrix that will be incorporated into the mixed model equations.


Example of solving mixed model equations considering UPG

Solving via R program, step by step

Data

data1 <- data.frame(  
ID = c('ID006', 'ID009', 'ID012', 'ID007', 'ID010', 'ID013', 'ID008', 'ID011', 'ID014', 'ID015'),  
A = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'),  
S = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),  
cov = c(1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0),  
obs = c(3.0, 2.0, 4.0, 6.0, 3.0, 6.0, 6.0, 6.0, 8.0, 4.0))  
print(data1)  
##       ID A S cov obs
## 1  ID006 A 1   1   3
## 2  ID009 A 2   1   2
## 3  ID012 A 1   2   4
## 4  ID007 B 2   2   6
## 5  ID010 B 1   1   3
## 6  ID013 B 2   2   6
## 7  ID008 C 1   2   6
## 8  ID011 C 2   1   6
## 9  ID014 C 1   1   8
## 10 ID015 C 2   2   4


Pedigree without UPG

pedigree <- data.frame(  
ID = c('ID001', 'ID002', 'ID003', 'ID004', 'ID005', 'ID006', 'ID007', 'ID008', 'ID009', 'ID010', 'ID011', 'ID012', 'ID013', 'ID014', 'ID015'),  
Sire = c(NA, NA, NA, NA, NA, NA, 'ID002', 'ID001', 'ID002', 'ID007', 'ID007', 'ID011', 'ID011', 'ID009', 'ID011'),  
Dam = c(NA, NA, NA, NA, NA, NA, 'ID005', 'ID004', 'ID003', 'ID006', 'ID004', 'ID008', 'ID010', 'ID013', 'ID010')  
)  
print(pedigree)  
##       ID  Sire   Dam
## 1  ID001  <NA>  <NA>
## 2  ID002  <NA>  <NA>
## 3  ID003  <NA>  <NA>
## 4  ID004  <NA>  <NA>
## 5  ID005  <NA>  <NA>
## 6  ID006  <NA>  <NA>
## 7  ID007 ID002 ID005
## 8  ID008 ID001 ID004
## 9  ID009 ID002 ID003
## 10 ID010 ID007 ID006
## 11 ID011 ID007 ID004
## 12 ID012 ID011 ID008
## 13 ID013 ID011 ID010
## 14 ID014 ID009 ID013
## 15 ID015 ID011 ID010


Pedigree with UPG

pedigree_upg <- data.frame(  
ID = c('ID001', 'ID002', 'ID003', 'ID004', 'ID005', 'ID006', 'ID007', 'ID008', 'ID009', 'ID010', 'ID011', 'ID012', 'ID013', 'ID014', 'ID015'),  
Sire = c('g1', 'g2', 'g1', 'g2', 'g2', 'g1', 'ID002', 'ID001', 'ID002', 'ID007', 'ID007', 'ID011', 'ID011', 'ID009', 'ID011'),  
Dam = c('g4', 'g3', 'g3', 'g3', 'g4', 'g3', 'ID005', 'ID004', 'ID003', 'ID006', 'ID004', 'ID008', 'ID010', 'ID013', 'ID010')  
)  
print(pedigree_upg)  
##       ID  Sire   Dam
## 1  ID001    g1    g4
## 2  ID002    g2    g3
## 3  ID003    g1    g3
## 4  ID004    g2    g3
## 5  ID005    g2    g4
## 6  ID006    g1    g3
## 7  ID007 ID002 ID005
## 8  ID008 ID001 ID004
## 9  ID009 ID002 ID003
## 10 ID010 ID007 ID006
## 11 ID011 ID007 ID004
## 12 ID012 ID011 ID008
## 13 ID013 ID011 ID010
## 14 ID014 ID009 ID013
## 15 ID015 ID011 ID010
# Ordering pedigree without UPG by ID  
pedigree <- pedigree[order(pedigree$ID), ]  
  
# Ordering data by ID  
data1 <- data1[order(data1$ID), ]  


Model

\(y = Xb + ZQg + Za + e\)


Mixed Model Equations

\[ \begin{bmatrix} X'X & 0 & X'Z \\ 0 & Q'A^{-1}Q\alpha & -Q'A^{-1}\alpha \\ Z'X & -A^{-1}Q\alpha & Z'Z + A^{-1}\alpha \end{bmatrix} * \begin{bmatrix} b\\ g\\ Qg + a \end{bmatrix} = \begin{bmatrix} X'y \\ 0 \\ Z'y \end{bmatrix} \]


Elements of mixed model equations

y vector

y <- as.matrix(data1$obs)  
print(y)  
##       [,1]
##  [1,]    3
##  [2,]    6
##  [3,]    6
##  [4,]    2
##  [5,]    3
##  [6,]    6
##  [7,]    4
##  [8,]    6
##  [9,]    8
## [10,]    4


X matrix

library(dplyr)  
X <- data1 %>%  
mutate(A_A = ifelse(A == "A", 1, 0),  
 A_B = ifelse(A == "B", 1, 0),  
 A_C = ifelse(A == "C", 1, 0),  
 S_1 = ifelse(S == 1, 1, 0),  
 S_2 = ifelse(S == 2, 1, 0)) %>%  
select(A_A, A_B, A_C, S_1, S_2, cov)  
  
X <- as.matrix(X)  
print(X)  
##    A_A A_B A_C S_1 S_2 cov
## 1    1   0   0   1   0   1
## 4    0   1   0   0   1   2
## 7    0   0   1   1   0   2
## 2    1   0   0   0   1   1
## 5    0   1   0   1   0   1
## 8    0   0   1   0   1   1
## 3    1   0   0   1   0   2
## 6    0   1   0   0   1   2
## 9    0   0   1   1   0   1
## 10   0   0   1   0   1   2


Z matrix
(expanding Z for all animals to sum with the inverse of numerator relationship matrix, \(A^{-1}\))

animals <- pedigree$ID  
N <- nrow(data1)  
Np <- length(animals)  
  
Z <- matrix(0, nrow = N, ncol = Np)  
for (i in 1:N) {  
animal_index <- match(data1$ID[i], animals)  
Z[i, animal_index] <- 1  
}  
print(Z)  
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
##  [1,]    0    0    0    0    0    1    0    0    0     0     0     0     0     0     0
##  [2,]    0    0    0    0    0    0    1    0    0     0     0     0     0     0     0
##  [3,]    0    0    0    0    0    0    0    1    0     0     0     0     0     0     0
##  [4,]    0    0    0    0    0    0    0    0    1     0     0     0     0     0     0
##  [5,]    0    0    0    0    0    0    0    0    0     1     0     0     0     0     0
##  [6,]    0    0    0    0    0    0    0    0    0     0     1     0     0     0     0
##  [7,]    0    0    0    0    0    0    0    0    0     0     0     1     0     0     0
##  [8,]    0    0    0    0    0    0    0    0    0     0     0     0     1     0     0
##  [9,]    0    0    0    0    0    0    0    0    0     0     0     0     0     1     0
## [10,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0     1


Constructing matrices A and inverse A
(note that the inverse A is constructed using the pedigree without UPG)

library(optiSel)  
# A matrix  
A <- makeA(pedigree)  
# Show first 5 col/row  
print(A[1:5, 1:5], digits = 4)  
##       ID001 ID002 ID003 ID004 ID005
## ID001     1     0     0     0     0
## ID002     0     1     0     0     0
## ID003     0     0     1     0     0
## ID004     0     0     0     1     0
## ID005     0     0     0     0     1
# Inverse matrix A  
Ainv <- solve(A)  
# Show first 5 col/row  
print(Ainv[1:5, 1:5], digits = 4)  
##       ID001 ID002 ID003 ID004 ID005
## ID001   1.5   0.0   0.0   0.5   0.0
## ID002   0.0   2.0   0.5   0.0   0.5
## ID003   0.0   0.5   1.5   0.0   0.0
## ID004   0.5   0.0   0.0   2.0   0.0
## ID005   0.0   0.5   0.0   0.0   1.5


Q matrix
- Function to compute Q matrix:

n_UPG = 4  
  
compute_expectation <- function(pedigree, N) {  
expectations <- list()  
  
base_groups <- unique(c(pedigree$Sire, pedigree$Dam))  
base_groups <- base_groups[grepl("^g[0-9]+$", base_groups)]  
  
for (group in base_groups) {  
expectations[[group]] <- setNames(as.list(rep(0, N)), paste0("g", 1:N))  
expectations[[group]][[group]] <- 1  
}  
  
get_expectation <- function(id) {  
if (!is.null(expectations[[id]])) {  
return(expectations[[id]])  
}  
  
row <- pedigree[pedigree$ID == id, ]  
if (nrow(row) == 0) {  
return(setNames(as.list(rep(0, N)), paste0("g", 1:N)))  
}  
  
sire_exp <- get_expectation(row$Sire)  
dam_exp <- get_expectation(row$Dam)  
  
expectations[[id]] <- setNames(lapply(1:N, function(i) {  
(1/2) * sire_exp[[paste0("g", i)]] + (1/2) * dam_exp[[paste0("g", i)]]  
}), paste0("g", 1:N))  
  
return(expectations[[id]])  
}  
  
animals <- pedigree$ID[grepl("^ID[0-9]+$", pedigree$ID)]  
for (animal in animals) {  
expectations[[animal]] <- get_expectation(animal)  
}  
  
return(expectations)  
}  


Applying the function in pedigree with UPG
(format: ID, Sire, Dam, Sire/Dam with UPG codes)

expectations <- compute_expectation(pedigree_upg, n_UPG)  
  
coefficients <- data.frame(  
ID = names(expectations),  
g1 = sapply(expectations, function(x) x$g1),  
g2 = sapply(expectations, function(x) x$g2),  
g3 = sapply(expectations, function(x) x$g3),  
g4 = sapply(expectations, function(x) x$g4))  

Final Q matrix:

Q <- as.matrix(coefficients[-(1:4), -1])  
print(Q)  
##           g1     g2     g3     g4
## ID001 0.5000 0.0000 0.0000 0.5000
## ID002 0.0000 0.5000 0.5000 0.0000
## ID003 0.5000 0.0000 0.5000 0.0000
## ID004 0.0000 0.5000 0.5000 0.0000
## ID005 0.0000 0.5000 0.0000 0.5000
## ID006 0.5000 0.0000 0.5000 0.0000
## ID007 0.0000 0.5000 0.2500 0.2500
## ID008 0.2500 0.2500 0.2500 0.2500
## ID009 0.2500 0.2500 0.5000 0.0000
## ID010 0.2500 0.2500 0.3750 0.1250
## ID011 0.0000 0.5000 0.3750 0.1250
## ID012 0.1250 0.3750 0.3125 0.1875
## ID013 0.1250 0.3750 0.3750 0.1250
## ID014 0.1875 0.3125 0.4375 0.0625
## ID015 0.1250 0.3750 0.3750 0.1250


Let’s assume that the variance components are known:

Direct additive variance: \(0.5\)

vara <- 0.5  

Residual variance: \(2.0\)

vare <- 2.0  

Therefore, alpha is equal to:

alpha <- vare / vara  
print(alpha)  
## [1] 4


We have the individual elements of the mixed model equations, namely \(X\), \(Z\), \(Q\), \(A^{-1}\), \(\alpha\), and \(y\).
Now, we need to calculate each “block”:

\[ \begin{bmatrix} X'X & 0 & X'Z \\ 0 & Q'A^{-1}Q\alpha & -Q'A^{-1}\alpha \\ Z'X & -A^{-1}Q\alpha & Z'Z + A^{-1}\alpha \end{bmatrix} * \begin{bmatrix} b \\ g \\ Qg + a \end{bmatrix} = \begin{bmatrix} X'y \\ 0 \\ Z'y \end{bmatrix} \]

\[ \begin{bmatrix} \text{block11} & \text{block12} & \text{block13} \\ \text{block21} & \text{block22} & \text{block23} \\ \text{block31} & \text{block32} & \text{block33} \end{bmatrix} * \begin{bmatrix} b \\ g \\ Qg + a \end{bmatrix} = \begin{bmatrix} \text{block1} \\ \text{block2} \\ \text{block3} \end{bmatrix} \]


Remember: \(LHS * sol = RHS\)


LHS - left-hand side of mixed model equations

block_11 <- t(X) %*% X  
block_12 <- matrix(0, nrow = nrow(block_11), ncol = ncol(Q))  
block_13 <- t(X) %*% Z  
  
block_21 <- matrix(0, nrow = ncol(Q), ncol = nrow(block_11))  
block_22 <- t(Q) %*% Ainv %*% Q * alpha  
block_23 <- -t(Q) %*% Ainv * alpha  
  
block_31 <- t(Z) %*% X  
block_32 <- -Ainv %*% Q * alpha  
block_33 <- t(Z) %*% Z + Ainv * alpha  
  
LHS <- rbind(  
cbind(block_11, block_12, block_13),  
cbind(block_21, block_22, block_23),  
cbind(block_31, block_32, block_33))  
dimnames(LHS) <- NULL  
# Show first 5 col/row 
print(LHS[1:5, 1:5], digits = 4)  
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    3    0    0    2    1
## [2,]    0    3    0    1    2
## [3,]    0    0    4    2    2
## [4,]    2    1    2    5    0
## [5,]    1    2    2    0    5

RHS - right-hand side of mixed model equations

block_1 <- t(X) %*% y  
block_2 <- matrix(0, nrow = ncol(Q), ncol = ncol(y))  
block_3 <- t(Z) %*% y  
  
RHS <- rbind(block_1, block_2, block_3)  
dimnames(RHS) <- NULL  
# Show first 5 col/row  
print(RHS[1:5, 1], digits = 4)  
## [1]  9 15 24 24 24


We need to invert the LHS, because:
\(\text{sol} = \text{LHS}^{-1} \times \text{RHS}\)

solve(LHS)  

! system is computationally singular: reciprocal condition number = 1.58711e-18


Notice that the LHS is not a positive definite matrix (system is computationally singular), so it does not have an inverse.
We need to calculate the generalized inverse…

We will use the ginv() function from the MASS package, which calculates the generalized Moore-Penrose inverse (one of the most common ginv)

library(MASS)  
LHSinv <- ginv(LHS)  
# Show first 5 col/row  
print(LHSinv[1:5, 1:5], digits = 4)  
##        [,1]   [,2]   [,3]   [,4]  [,5]
## [1,] 1.4778 0.4631 0.6591 0.9670 1.633
## [2,] 0.4631 0.7888 0.4141 0.6595 1.007
## [3,] 0.6591 0.4141 0.8001 0.6504 1.223
## [4,] 0.9670 0.6595 0.6504 1.0605 1.216
## [5,] 1.6330 1.0065 1.2230 1.2164 2.646

We multiply the inverse of the LHS by the RHS and we will have our vector of solutions.

Final solutions (R):

solutions_R <- LHSinv %*% RHS  
print(solutions_R)  
##              [,1]
##  [1,] -0.53552906
##  [2,]  1.38930712
##  [3,]  2.13580021
##  [4,]  2.38873040
##  [5,]  0.60084788
##  [6,]  1.02099858
##  [7,] -4.60399744
##  [8,]  2.90828845
##  [9,]  3.30986193
## [10,] -5.00557093
## [11,] -4.79790276
## [12,]  3.12605649
## [13,] -0.62320503
## [14,]  3.09897532
## [15,] -1.05552267
## [16,] -0.67781191
## [17,]  1.02838548
## [18,] -0.84258229
## [19,]  1.27528846
## [20,] -0.05636038
## [21,]  2.04669910
## [22,]  0.54680764
## [23,]  1.16463449
## [24,]  1.34945546
## [25,]  0.79807885

The solutions order is:
\[ \begin{bmatrix} b \\ g \\ Qg + a \end{bmatrix} \]
in which:
- b is the vector of solutions for the fixed effects,
- g is the vector of solutions for the UPG effects (also fixed here),
- Qg + a is the vector of breeding values.
Note that the breeding value of the animal will be the effect of the UPG (Qg) + breeding value (a).

that is:


Vector of fixed effects (b)

beta_hat <- solutions_R[1:ncol(X)]  
b <- as.matrix(beta_hat)  
print(b)  
##            [,1]
## [1,] -0.5355291
## [2,]  1.3893071
## [3,]  2.1358002
## [4,]  2.3887304
## [5,]  0.6008479
## [6,]  1.0209986


Vector of UPG effects (g)

upg_hat <- solutions_R[(ncol(X) + 1):(ncol(X) + ncol(Q))]  
g <- as.matrix(upg_hat)  
print(g)  
##           [,1]
## [1,] -4.603997
## [2,]  2.908288
## [3,]  3.309862
## [4,] -5.005571


Vector of direct additive genetic effects - breeding value (Qg+a)

animal_hat <- solutions_R[(ncol(X) + ncol(Q) + 1):(ncol(X) + ncol(Q) + ncol(Z))]  
Qg_a <- as.matrix(animal_hat)  
print(Qg_a)  
##              [,1]
##  [1,] -4.79790276
##  [2,]  3.12605649
##  [3,] -0.62320503
##  [4,]  3.09897532
##  [5,] -1.05552267
##  [6,] -0.67781191
##  [7,]  1.02838548
##  [8,] -0.84258229
##  [9,]  1.27528846
## [10,] -0.05636038
## [11,]  2.04669910
## [12,]  0.54680764
## [13,]  1.16463449
## [14,]  1.34945546
## [15,]  0.79807885

And those were our final solutions!


y predicted

Now, let’s calculate the vector y (predicted y) from the solutions we found.
To do this, we will use the first and third rows of the mixed model equations, since the second row of the RHS is equal to zero.

From first row:
\[ X'y = X'Xb + X'Z(Qg+a) \]

rhs1 <- t(X) %*% X %*% b + t(X) %*% Z %*% Qg_a  

From third row:
\[ Z'y = Z'Xb - A^{-1}Q\alpha g + (Z'Z + A^{-1} \alpha)(Qg + a) \]

rhs3 <- t(Z) %*% X %*% b - Ainv %*% Q %*% (alpha * g) + (t(Z) %*% Z + Ainv * alpha) %*% Qg_a  

Let’s create a system with these two equations, to solve for y
\[ \left\{ \begin{aligned} X'y &= X'bX + X'Z(Qg+a) \\ Z'y &= Z'Xb - A^{-1}Q\alpha g + (Z'Z + A^{-1} \alpha)(Qg + a) \end{aligned} \right. \]

Whatever the animal model, the part that multiplies y will always be an mxn matrix. Let’s call it M.
Similarly, the whole part after the equal sign will be an nx1 vector. Let’s call it c.

So, we have that \(My=c\).


M matrix

M <- rbind(t(X), t(Z))  


c vector

c <- rbind(rhs1, rhs3)  


Solving for y…
If we try to invert the matrix M, we see that it is also not positive definite, so we need a generalized inverse: ginv().

y_hat <- ginv(M) %*% c  
  
final <- data.frame(y_predicted = y_hat, y_original = y)  
print(final)  
##    y_predicted y_original
## 1            3          3
## 2            6          6
## 3            6          6
## 4            2          2
## 5            3          3
## 6            6          6
## 7            4          4
## 8            6          6
## 9            8          8
## 10           4          4

The predicted y is identical to the original y.


This is easy for 15 animals. But if we have 1 million of them?
We need a more efficient program.


Solving via program BLUPF90+

Let’s solve this same example using the BLUPF90 family of programs
(Misztal I., Tsuruta S., Lourenco D.A.L., Aguilar I., Legarra A., and Vitezica Z. 2014. Manual for BLUPF90 family of programs.).


Data
ID006 A 1 1.0 3.0
ID009 A 2 1.0 2.0
ID012 A 1 2.0 4.0
ID007 B 2 2.0 6.0
ID010 B 1 1.0 3.0
ID013 B 2 2.0 6.0
ID008 C 1 2.0 6.0
ID011 C 2 1.0 6.0
ID014 C 1 1.0 8.0
ID015 C 2 2.0 4.0

Pedigree with UPG
ID001 0 0 -1 -4
ID002 0 0 -2 -3
ID003 0 0 -1 -3
ID004 0 0 -2 -3
ID005 0 0 -2 -4
ID006 0 0 -1 -3
ID007 ID002 ID005 ID002 ID005
ID008 ID001 ID004 ID001 ID004
ID009 ID002 ID003 ID002 ID003
ID010 ID007 ID006 ID007 ID006
ID011 ID007 ID004 ID007 ID004
ID012 ID011 ID008 ID011 ID008
ID013 ID011 ID010 ID011 ID010
ID014 ID009 ID013 ID009 ID013
ID015 ID011 ID010 ID011 ID010

Here we encode the UPGs with negative numbers instead of letters. This is a requirement of the BLUPF90+ program!

Parameter file
DATAFILE
data1.txt
TRAITS
5
FIELDS_PASSED TO OUTPUT

WEIGHT(S)

RESIDUAL_VARIANCE
2.0
EFFECT
2 cross alpha
EFFECT
3 cross alpha
EFFECT
4 cov
EFFECT
1 cross alpha
RANDOM
animal
FILE
ped2.txt
FILE_POS
1 4 5 0 0 # id, sire, dam - with upg code
UPG_TYPE
in_ped
INBREEDING
no-inbreeding # no inbreeding
(CO)VARIANCES
0.5


We run the program RENUMF90 and then the program BLUPF90+.
The solutions (solutions file) will be:

sol_blup <- data.frame(  
trait = rep(1, 25),  
effect = c(rep(1, 3), rep(2, 2), 3, rep(4, 19)),  
level = c(1,2,3, 1,2, 1, 1:19),  
solution = c(  
-0.09929111, 1.82312339, 2.56875029, 2.43588849, 0.65341111,   
1.01898764, 0.31554213, -1.15906490, 0.54803039, -1.31841690,   
-0.53693713, 0.78918582, 1.56194108, 0.06661379, 0.68188768,   
0.87340751, -1.52607734, 2.63594054, 2.60934675, -5.25998297,  
-1.10486630, -4.38182541, 3.11359067, 2.12479542, -6.15194294  
))  
print(sol_blup)  
##    trait effect level    solution
## 1      1      1     1 -0.09929111
## 2      1      1     2  1.82312339
## 3      1      1     3  2.56875029
## 4      1      2     1  2.43588849
## 5      1      2     2  0.65341111
## 6      1      3     1  1.01898764
## 7      1      4     1  0.31554213
## 8      1      4     2 -1.15906490
## 9      1      4     3  0.54803039
## 10     1      4     4 -1.31841690
## 11     1      4     5 -0.53693713
## 12     1      4     6  0.78918582
## 13     1      4     7  1.56194108
## 14     1      4     8  0.06661379
## 15     1      4     9  0.68188768
## 16     1      4    10  0.87340751
## 17     1      4    11 -1.52607734
## 18     1      4    12  2.63594054
## 19     1      4    13  2.60934675
## 20     1      4    14 -5.25998297
## 21     1      4    15 -1.10486630
## 22     1      4    16 -4.38182541
## 23     1      4    17  3.11359067
## 24     1      4    18  2.12479542
## 25     1      4    19 -6.15194294


The solutions of the BLUPF90+ program have the order:
[fixed effects], [breeding values], [UPG effects].
Let’s put them in the same order as the solutions we obtained in the R program, for comparison.
You can find the original animal ID in the renadd04.ped file and match it with the solutions file to identify the order.

sol_blup$order <- c("A", "B", "C", "S1", "S2", "cov",   
"ID015", "ID006", "ID007", "ID008", "ID010", "ID009", "ID011", "ID012",  
"ID013", "ID014", "ID005", "ID002", "ID004", "ID001", "ID003",   
"g1", "g2", "g3", "g4")  
  
R_blup_order <- c("A", "B", "C", "S1", "S2", "cov",   
 "g1", "g2", "g3", "g4",   
 "ID001", "ID002", "ID003", "ID004", "ID005", "ID006", "ID007", "ID008",  
 "ID009", "ID010", "ID011", "ID012", "ID013", "ID014", "ID015")  

Reordering

sol_blup$order <- factor(sol_blup$order, levels = R_blup_order)  
sol_blup2 <- sol_blup %>% arrange(order)  

Final solutions (BLUPF90+)

solutions_blup <- as.matrix(sol_blup2$solution)  
print(solutions_blup)  
##              [,1]
##  [1,] -0.09929111
##  [2,]  1.82312339
##  [3,]  2.56875029
##  [4,]  2.43588849
##  [5,]  0.65341111
##  [6,]  1.01898764
##  [7,] -4.38182541
##  [8,]  3.11359067
##  [9,]  2.12479542
## [10,] -6.15194294
## [11,] -5.25998297
## [12,]  2.63594054
## [13,] -1.10486630
## [14,]  2.60934675
## [15,] -1.52607734
## [16,] -1.15906490
## [17,]  0.54803039
## [18,] -1.31841690
## [19,]  0.78918582
## [20,] -0.53693713
## [21,]  1.56194108
## [22,]  0.06661379
## [23,]  0.68188768
## [24,]  0.87340751
## [25,]  0.31554213

Split the effects

# Fixed effects  
beta_hat <- solutions_blup[1:ncol(X)]; b <- as.matrix(beta_hat)  
print(beta_hat)  
## [1] -0.09929111  1.82312339  2.56875029  2.43588849  0.65341111  1.01898764
# UPG effects  
upg_hat <- solutions_blup[(ncol(X) + 1):(ncol(X) + ncol(Q))]; g <- as.matrix(upg_hat)  
print(upg_hat)  
## [1] -4.381825  3.113591  2.124795 -6.151943
# Breeding values  
animal_hat <- solutions_blup[(ncol(X) + ncol(Q) + 1):(ncol(X) + ncol(Q) + ncol(Z))]; Qg_a <- as.matrix(animal_hat)  
print(animal_hat)  
##  [1] -5.25998297  2.63594054 -1.10486630  2.60934675 -1.52607734 -1.15906490  0.54803039 -1.31841690  0.78918582 -0.53693713  1.56194108  0.06661379  0.68188768  0.87340751  0.31554213


The solutions are different from those obtained directly, since we used the generalized inverse.
In other words, we will have different solutions, depending on which inverse we use.
However, our predicted y must be the same in both predictions, since estimable functions are invariant to the choice of the generalized inverse (this is a topic for another post).

Let’s get the predicted y vector, just like we did for the R solutions.

rhs1 <- t(X) %*% X %*% b + t(X) %*% Z %*% Qg_a  
rhs3 <- t(Z) %*% X %*% b - Ainv %*% Q %*% (alpha * g) + (t(Z) %*% Z + Ainv * alpha) %*% Qg_a  

\(My=c\)

M matrix

M <- rbind(t(X), t(Z))  

c vector

c <- rbind(rhs1, rhs3)  

Solving y…

y_hat <- ginv(M) %*% c  
  
final_blup <- data.frame(y_predicted = y_hat, y_original = y)  
print(final_blup)  
##    y_predicted y_original
## 1     2.999027          3
## 2     6.021592          6
## 3     5.976333          6
## 4     1.987161          2
## 5     2.991506          3
## 6     5.996045          6
## 7     4.004215          4
## 8     5.984827          6
## 9     8.044676          8
## 10    4.001232          4

Let’s compare the results via R and via blupf90+!

final_solutions <- data.frame(  
 ID = data1$ID,  
 y_original = data1$obs,  
 y_predicted_R = final$y_predicted,  
 y_predicted_blupf90 = round(final_blup$y_predicted,3),  
 residual_blupf90 = round((final_blup$y_predicted - data1$obs),4))  
print(final_solutions)  
##       ID y_original y_predicted_R y_predicted_blupf90 residual_blupf90
## 1  ID006          3             3               2.999          -0.0010
## 2  ID007          6             6               6.022           0.0216
## 3  ID008          6             6               5.976          -0.0237
## 4  ID009          2             2               1.987          -0.0128
## 5  ID010          3             3               2.992          -0.0085
## 6  ID011          6             6               5.996          -0.0040
## 7  ID012          4             4               4.004           0.0042
## 8  ID013          6             6               5.985          -0.0152
## 9  ID014          8             8               8.045           0.0447
## 10 ID015          4             4               4.001           0.0012

Note that the predicted y obtained either directly (R) or iteratively (BLUPF90+) matches the original y.
The solutions obtained via BLUPF90+ are not exact, as they were generated by an iterative process. However, the residual is minimal.
R is limited in many aspects, while BLUPF90+ is much more efficient.