Author: Eula Carrara (eulacarrara@gmail.com)
Created on: 22-Feb-2025
Modified on: 15-Mar-2025
Theory: lecture notes Dr. Daniela Lourenco, UGA,
2025
Numerical example taken from tutorial_blupf90.pdf
Required R packages: igraph, MASS, optiSel,
tidyverse
install.packages(c("igraph", "MASS", "optiSel", "tidyverse"))
Considering an individual a, progeny of s and d,
\[\begin{array}{c} s_i \quad \quad d_i \\ \backslash \quad \, / \\ \, a_i \end{array}\]its expected breeding value will be:
if both parents known: \(a_i = \frac{a_{s_i} + a_{d_i}}{2}\)
if only sire known: \(a_i = \frac{a_{s_i} + 0}{2}\)
if only dam known: \(a_i = \frac{0 + a_{d_i}}{2}\)
if both parents unknown: \(a_i = \frac{0 + 0}{2}\)
That is, the breeding value of an unknown sire/dam is zero. This does
not correspond to reality.
One way to fill these gaps in the pedigree is to build Unknown Parent
Groups (UPG; or phantom parents, or genetic group):
only known sire: \(a_i = \frac{a_{s_i} + \boldsymbol{\mathit{UPG_i}}}{2}\)
only known dam: \(a_i = \frac{\boldsymbol{\mathit{UPG_i}} + a_{d_i}}{2}\)
both unknown parents: \(a_i = \frac{\boldsymbol{\mathit{UPG_i}} + \boldsymbol{\mathit{UPG_i}}}{2}\)
How to create UPGs?
Using information about breed, year of birth, sex, country,
etc…
UPGs are not animals! They are most often treated as a fixed effect in
the model.
The mixed animal model can be described as:
\(y = Xb + Za + e\)
in which:
- y is the vector of phenotypes,
- b is the vector of fixed effects,
- a is the vector of direct additive genetic effects
(breeding values),
- X and Z relate b and
a to y,
- e is the vector of residuals.
\(y = Xb + \mathbf{\mathit{ZQg}} + Za + e\)
in which:
- Q is the matrix that relates animals to UPGs,
- g is the effect of UPGs.
So we need to build Q. How?
The Q matrix is constructed with the contribution
fractions of each UPG in the expected breeding values of the
individuals related to the UPG.
Pedigree:
## ID Sire Dam
## 1 a1 g2 a4
## 2 a2 g3 a1
## 3 a3 g3 a1
## 4 a4 g1 g1
## 5 a5 a3 a6
## 6 a6 g3 g3
First, we need to put all the expected breeding values as a function of the UPGs:
Defining the expected breeding values depending on the UPGQ matrix with the fractions of each UPG (cols) for each
animal (rows)
\[
\begin{bmatrix}
1/2 & 1/2 & 0 \\
1/4 & 1/4 & 1/2 \\
1/4 & 1/4 & 1/2 \\
1 & 0 & 0 \\
1/8 & 1/8 & 3/4 \\
0 & 0 & 1
\end{bmatrix}
\]
This is the Q matrix that will be incorporated into the
mixed model equations.
Data
data1 <- data.frame(
ID = c('ID006', 'ID009', 'ID012', 'ID007', 'ID010', 'ID013', 'ID008', 'ID011', 'ID014', 'ID015'),
A = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'),
S = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
cov = c(1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0),
obs = c(3.0, 2.0, 4.0, 6.0, 3.0, 6.0, 6.0, 6.0, 8.0, 4.0))
print(data1)
## ID A S cov obs
## 1 ID006 A 1 1 3
## 2 ID009 A 2 1 2
## 3 ID012 A 1 2 4
## 4 ID007 B 2 2 6
## 5 ID010 B 1 1 3
## 6 ID013 B 2 2 6
## 7 ID008 C 1 2 6
## 8 ID011 C 2 1 6
## 9 ID014 C 1 1 8
## 10 ID015 C 2 2 4
Pedigree without UPG
pedigree <- data.frame(
ID = c('ID001', 'ID002', 'ID003', 'ID004', 'ID005', 'ID006', 'ID007', 'ID008', 'ID009', 'ID010', 'ID011', 'ID012', 'ID013', 'ID014', 'ID015'),
Sire = c(NA, NA, NA, NA, NA, NA, 'ID002', 'ID001', 'ID002', 'ID007', 'ID007', 'ID011', 'ID011', 'ID009', 'ID011'),
Dam = c(NA, NA, NA, NA, NA, NA, 'ID005', 'ID004', 'ID003', 'ID006', 'ID004', 'ID008', 'ID010', 'ID013', 'ID010')
)
print(pedigree)
## ID Sire Dam
## 1 ID001 <NA> <NA>
## 2 ID002 <NA> <NA>
## 3 ID003 <NA> <NA>
## 4 ID004 <NA> <NA>
## 5 ID005 <NA> <NA>
## 6 ID006 <NA> <NA>
## 7 ID007 ID002 ID005
## 8 ID008 ID001 ID004
## 9 ID009 ID002 ID003
## 10 ID010 ID007 ID006
## 11 ID011 ID007 ID004
## 12 ID012 ID011 ID008
## 13 ID013 ID011 ID010
## 14 ID014 ID009 ID013
## 15 ID015 ID011 ID010
Pedigree with UPG
pedigree_upg <- data.frame(
ID = c('ID001', 'ID002', 'ID003', 'ID004', 'ID005', 'ID006', 'ID007', 'ID008', 'ID009', 'ID010', 'ID011', 'ID012', 'ID013', 'ID014', 'ID015'),
Sire = c('g1', 'g2', 'g1', 'g2', 'g2', 'g1', 'ID002', 'ID001', 'ID002', 'ID007', 'ID007', 'ID011', 'ID011', 'ID009', 'ID011'),
Dam = c('g4', 'g3', 'g3', 'g3', 'g4', 'g3', 'ID005', 'ID004', 'ID003', 'ID006', 'ID004', 'ID008', 'ID010', 'ID013', 'ID010')
)
print(pedigree_upg)
## ID Sire Dam
## 1 ID001 g1 g4
## 2 ID002 g2 g3
## 3 ID003 g1 g3
## 4 ID004 g2 g3
## 5 ID005 g2 g4
## 6 ID006 g1 g3
## 7 ID007 ID002 ID005
## 8 ID008 ID001 ID004
## 9 ID009 ID002 ID003
## 10 ID010 ID007 ID006
## 11 ID011 ID007 ID004
## 12 ID012 ID011 ID008
## 13 ID013 ID011 ID010
## 14 ID014 ID009 ID013
## 15 ID015 ID011 ID010
# Ordering pedigree without UPG by ID
pedigree <- pedigree[order(pedigree$ID), ]
# Ordering data by ID
data1 <- data1[order(data1$ID), ]
\(y = Xb + ZQg + Za + e\)
\[ \begin{bmatrix} X'X & 0 & X'Z \\ 0 & Q'A^{-1}Q\alpha & -Q'A^{-1}\alpha \\ Z'X & -A^{-1}Q\alpha & Z'Z + A^{-1}\alpha \end{bmatrix} * \begin{bmatrix} b\\ g\\ Qg + a \end{bmatrix} = \begin{bmatrix} X'y \\ 0 \\ Z'y \end{bmatrix} \]
y vector
y <- as.matrix(data1$obs)
print(y)
## [,1]
## [1,] 3
## [2,] 6
## [3,] 6
## [4,] 2
## [5,] 3
## [6,] 6
## [7,] 4
## [8,] 6
## [9,] 8
## [10,] 4
X matrix
library(dplyr)
X <- data1 %>%
mutate(A_A = ifelse(A == "A", 1, 0),
A_B = ifelse(A == "B", 1, 0),
A_C = ifelse(A == "C", 1, 0),
S_1 = ifelse(S == 1, 1, 0),
S_2 = ifelse(S == 2, 1, 0)) %>%
select(A_A, A_B, A_C, S_1, S_2, cov)
X <- as.matrix(X)
print(X)
## A_A A_B A_C S_1 S_2 cov
## 1 1 0 0 1 0 1
## 4 0 1 0 0 1 2
## 7 0 0 1 1 0 2
## 2 1 0 0 0 1 1
## 5 0 1 0 1 0 1
## 8 0 0 1 0 1 1
## 3 1 0 0 1 0 2
## 6 0 1 0 0 1 2
## 9 0 0 1 1 0 1
## 10 0 0 1 0 1 2
Z matrix
(expanding Z for all animals to sum with the inverse of numerator
relationship matrix, \(A^{-1}\))
animals <- pedigree$ID
N <- nrow(data1)
Np <- length(animals)
Z <- matrix(0, nrow = N, ncol = Np)
for (i in 1:N) {
animal_index <- match(data1$ID[i], animals)
Z[i, animal_index] <- 1
}
print(Z)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
## [1,] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## [9,] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [10,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Constructing matrices A and inverse A
(note that the inverse A is constructed using the pedigree without
UPG)
library(optiSel)
# A matrix
A <- makeA(pedigree)
# Show first 5 col/row
print(A[1:5, 1:5], digits = 4)
## ID001 ID002 ID003 ID004 ID005
## ID001 1 0 0 0 0
## ID002 0 1 0 0 0
## ID003 0 0 1 0 0
## ID004 0 0 0 1 0
## ID005 0 0 0 0 1
# Inverse matrix A
Ainv <- solve(A)
# Show first 5 col/row
print(Ainv[1:5, 1:5], digits = 4)
## ID001 ID002 ID003 ID004 ID005
## ID001 1.5 0.0 0.0 0.5 0.0
## ID002 0.0 2.0 0.5 0.0 0.5
## ID003 0.0 0.5 1.5 0.0 0.0
## ID004 0.5 0.0 0.0 2.0 0.0
## ID005 0.0 0.5 0.0 0.0 1.5
Q matrix
- Function to compute Q matrix:
n_UPG = 4
compute_expectation <- function(pedigree, N) {
expectations <- list()
base_groups <- unique(c(pedigree$Sire, pedigree$Dam))
base_groups <- base_groups[grepl("^g[0-9]+$", base_groups)]
for (group in base_groups) {
expectations[[group]] <- setNames(as.list(rep(0, N)), paste0("g", 1:N))
expectations[[group]][[group]] <- 1
}
get_expectation <- function(id) {
if (!is.null(expectations[[id]])) {
return(expectations[[id]])
}
row <- pedigree[pedigree$ID == id, ]
if (nrow(row) == 0) {
return(setNames(as.list(rep(0, N)), paste0("g", 1:N)))
}
sire_exp <- get_expectation(row$Sire)
dam_exp <- get_expectation(row$Dam)
expectations[[id]] <- setNames(lapply(1:N, function(i) {
(1/2) * sire_exp[[paste0("g", i)]] + (1/2) * dam_exp[[paste0("g", i)]]
}), paste0("g", 1:N))
return(expectations[[id]])
}
animals <- pedigree$ID[grepl("^ID[0-9]+$", pedigree$ID)]
for (animal in animals) {
expectations[[animal]] <- get_expectation(animal)
}
return(expectations)
}
Applying the function in pedigree with UPG
(format: ID, Sire, Dam, Sire/Dam with UPG codes)
expectations <- compute_expectation(pedigree_upg, n_UPG)
coefficients <- data.frame(
ID = names(expectations),
g1 = sapply(expectations, function(x) x$g1),
g2 = sapply(expectations, function(x) x$g2),
g3 = sapply(expectations, function(x) x$g3),
g4 = sapply(expectations, function(x) x$g4))
Final Q matrix:
Q <- as.matrix(coefficients[-(1:4), -1])
print(Q)
## g1 g2 g3 g4
## ID001 0.5000 0.0000 0.0000 0.5000
## ID002 0.0000 0.5000 0.5000 0.0000
## ID003 0.5000 0.0000 0.5000 0.0000
## ID004 0.0000 0.5000 0.5000 0.0000
## ID005 0.0000 0.5000 0.0000 0.5000
## ID006 0.5000 0.0000 0.5000 0.0000
## ID007 0.0000 0.5000 0.2500 0.2500
## ID008 0.2500 0.2500 0.2500 0.2500
## ID009 0.2500 0.2500 0.5000 0.0000
## ID010 0.2500 0.2500 0.3750 0.1250
## ID011 0.0000 0.5000 0.3750 0.1250
## ID012 0.1250 0.3750 0.3125 0.1875
## ID013 0.1250 0.3750 0.3750 0.1250
## ID014 0.1875 0.3125 0.4375 0.0625
## ID015 0.1250 0.3750 0.3750 0.1250
Let’s assume that the variance components are known:
Direct additive variance: \(0.5\)
vara <- 0.5
Residual variance: \(2.0\)
vare <- 2.0
Therefore, alpha is equal to:
alpha <- vare / vara
print(alpha)
## [1] 4
We have the individual elements of the mixed model equations, namely
\(X\), \(Z\), \(Q\), \(A^{-1}\), \(\alpha\), and \(y\).
Now, we need to calculate each “block”:
\[ \begin{bmatrix} X'X & 0 & X'Z \\ 0 & Q'A^{-1}Q\alpha & -Q'A^{-1}\alpha \\ Z'X & -A^{-1}Q\alpha & Z'Z + A^{-1}\alpha \end{bmatrix} * \begin{bmatrix} b \\ g \\ Qg + a \end{bmatrix} = \begin{bmatrix} X'y \\ 0 \\ Z'y \end{bmatrix} \]
\[ \begin{bmatrix} \text{block11} & \text{block12} & \text{block13} \\ \text{block21} & \text{block22} & \text{block23} \\ \text{block31} & \text{block32} & \text{block33} \end{bmatrix} * \begin{bmatrix} b \\ g \\ Qg + a \end{bmatrix} = \begin{bmatrix} \text{block1} \\ \text{block2} \\ \text{block3} \end{bmatrix} \]
Remember: \(LHS * sol = RHS\)
LHS - left-hand side of mixed model equations
block_11 <- t(X) %*% X
block_12 <- matrix(0, nrow = nrow(block_11), ncol = ncol(Q))
block_13 <- t(X) %*% Z
block_21 <- matrix(0, nrow = ncol(Q), ncol = nrow(block_11))
block_22 <- t(Q) %*% Ainv %*% Q * alpha
block_23 <- -t(Q) %*% Ainv * alpha
block_31 <- t(Z) %*% X
block_32 <- -Ainv %*% Q * alpha
block_33 <- t(Z) %*% Z + Ainv * alpha
LHS <- rbind(
cbind(block_11, block_12, block_13),
cbind(block_21, block_22, block_23),
cbind(block_31, block_32, block_33))
dimnames(LHS) <- NULL
# Show first 5 col/row
print(LHS[1:5, 1:5], digits = 4)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 3 0 0 2 1
## [2,] 0 3 0 1 2
## [3,] 0 0 4 2 2
## [4,] 2 1 2 5 0
## [5,] 1 2 2 0 5
RHS - right-hand side of mixed model equations
block_1 <- t(X) %*% y
block_2 <- matrix(0, nrow = ncol(Q), ncol = ncol(y))
block_3 <- t(Z) %*% y
RHS <- rbind(block_1, block_2, block_3)
dimnames(RHS) <- NULL
# Show first 5 col/row
print(RHS[1:5, 1], digits = 4)
## [1] 9 15 24 24 24
We need to invert the LHS, because:
\(\text{sol} = \text{LHS}^{-1} \times
\text{RHS}\)
solve(LHS)
! system is computationally singular: reciprocal condition number = 1.58711e-18
Notice that the LHS is not a positive definite
matrix (system is computationally singular), so it
does not have an inverse.
We need to calculate the generalized inverse…
We will use the ginv() function from the
MASS package, which calculates the generalized
Moore-Penrose inverse (one of the most common ginv)
library(MASS)
LHSinv <- ginv(LHS)
# Show first 5 col/row
print(LHSinv[1:5, 1:5], digits = 4)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.4778 0.4631 0.6591 0.9670 1.633
## [2,] 0.4631 0.7888 0.4141 0.6595 1.007
## [3,] 0.6591 0.4141 0.8001 0.6504 1.223
## [4,] 0.9670 0.6595 0.6504 1.0605 1.216
## [5,] 1.6330 1.0065 1.2230 1.2164 2.646
We multiply the inverse of the LHS by the RHS and we will have our vector of solutions.
Final solutions (R):
solutions_R <- LHSinv %*% RHS
print(solutions_R)
## [,1]
## [1,] -0.53552906
## [2,] 1.38930712
## [3,] 2.13580021
## [4,] 2.38873040
## [5,] 0.60084788
## [6,] 1.02099858
## [7,] -4.60399744
## [8,] 2.90828845
## [9,] 3.30986193
## [10,] -5.00557093
## [11,] -4.79790276
## [12,] 3.12605649
## [13,] -0.62320503
## [14,] 3.09897532
## [15,] -1.05552267
## [16,] -0.67781191
## [17,] 1.02838548
## [18,] -0.84258229
## [19,] 1.27528846
## [20,] -0.05636038
## [21,] 2.04669910
## [22,] 0.54680764
## [23,] 1.16463449
## [24,] 1.34945546
## [25,] 0.79807885
The solutions order is:
\[
\begin{bmatrix}
b \\
g \\
Qg + a
\end{bmatrix}
\]
in which:
- b is the vector of solutions for the fixed effects,
- g is the vector of solutions for the UPG effects (also
fixed here),
- Qg + a is the vector of breeding values.
Note that the breeding value of the animal will be the
effect of the UPG (Qg) + breeding value (a).
that is:
Vector of fixed effects (b)
beta_hat <- solutions_R[1:ncol(X)]
b <- as.matrix(beta_hat)
print(b)
## [,1]
## [1,] -0.5355291
## [2,] 1.3893071
## [3,] 2.1358002
## [4,] 2.3887304
## [5,] 0.6008479
## [6,] 1.0209986
Vector of UPG effects (g)
upg_hat <- solutions_R[(ncol(X) + 1):(ncol(X) + ncol(Q))]
g <- as.matrix(upg_hat)
print(g)
## [,1]
## [1,] -4.603997
## [2,] 2.908288
## [3,] 3.309862
## [4,] -5.005571
Vector of direct additive genetic effects - breeding value (Qg+a)
animal_hat <- solutions_R[(ncol(X) + ncol(Q) + 1):(ncol(X) + ncol(Q) + ncol(Z))]
Qg_a <- as.matrix(animal_hat)
print(Qg_a)
## [,1]
## [1,] -4.79790276
## [2,] 3.12605649
## [3,] -0.62320503
## [4,] 3.09897532
## [5,] -1.05552267
## [6,] -0.67781191
## [7,] 1.02838548
## [8,] -0.84258229
## [9,] 1.27528846
## [10,] -0.05636038
## [11,] 2.04669910
## [12,] 0.54680764
## [13,] 1.16463449
## [14,] 1.34945546
## [15,] 0.79807885
And those were our final solutions!
Now, let’s calculate the vector y (predicted y) from the
solutions we found.
To do this, we will use the first and third rows of the mixed model
equations, since the second row of the RHS is equal to zero.
From first row:
\[
X'y = X'Xb + X'Z(Qg+a)
\]
rhs1 <- t(X) %*% X %*% b + t(X) %*% Z %*% Qg_a
From third row:
\[
Z'y = Z'Xb - A^{-1}Q\alpha g + (Z'Z + A^{-1} \alpha)(Qg +
a)
\]
rhs3 <- t(Z) %*% X %*% b - Ainv %*% Q %*% (alpha * g) + (t(Z) %*% Z + Ainv * alpha) %*% Qg_a
Let’s create a system with these two equations, to solve for
y
\[
\left\{
\begin{aligned}
X'y &= X'bX + X'Z(Qg+a) \\
Z'y &= Z'Xb - A^{-1}Q\alpha g + (Z'Z + A^{-1} \alpha)(Qg
+ a)
\end{aligned}
\right.
\]
Whatever the animal model, the part that multiplies y
will always be an mxn matrix. Let’s call it
M.
Similarly, the whole part after the equal sign will be an
nx1 vector. Let’s call it c.
So, we have that \(My=c\).
M matrix
M <- rbind(t(X), t(Z))
c vector
c <- rbind(rhs1, rhs3)
Solving for y…
If we try to invert the matrix M, we see that it is also
not positive definite, so we need a generalized inverse:
ginv().
y_hat <- ginv(M) %*% c
final <- data.frame(y_predicted = y_hat, y_original = y)
print(final)
## y_predicted y_original
## 1 3 3
## 2 6 6
## 3 6 6
## 4 2 2
## 5 3 3
## 6 6 6
## 7 4 4
## 8 6 6
## 9 8 8
## 10 4 4
The predicted y is identical to the
original y.
This is easy for 15 animals. But if we have 1 million of them?
We need a more efficient program.
BLUPF90+Let’s solve this same example using the BLUPF90 family of
programs
(Misztal I., Tsuruta S., Lourenco D.A.L., Aguilar I., Legarra A.,
and Vitezica Z. 2014. Manual for BLUPF90 family of programs.).
Data
ID006 A 1 1.0 3.0
ID009 A 2 1.0 2.0
ID012 A 1 2.0 4.0
ID007 B 2 2.0 6.0
ID010 B 1 1.0 3.0
ID013 B 2 2.0 6.0
ID008 C 1 2.0 6.0
ID011 C 2 1.0 6.0
ID014 C 1 1.0 8.0
ID015 C 2 2.0 4.0
Pedigree with UPG
ID001 0 0 -1 -4
ID002 0 0 -2 -3
ID003 0 0 -1 -3
ID004 0 0 -2 -3
ID005 0 0 -2 -4
ID006 0 0 -1 -3
ID007 ID002 ID005 ID002 ID005
ID008 ID001 ID004 ID001 ID004
ID009 ID002 ID003 ID002 ID003
ID010 ID007 ID006 ID007 ID006
ID011 ID007 ID004 ID007 ID004
ID012 ID011 ID008 ID011 ID008
ID013 ID011 ID010 ID011 ID010
ID014 ID009 ID013 ID009 ID013
ID015 ID011 ID010 ID011 ID010
Here we encode the UPGs with negative numbers instead of letters. This is a requirement of the BLUPF90+ program!
Parameter file
DATAFILE
data1.txt
TRAITS
5
FIELDS_PASSED TO OUTPUT
WEIGHT(S)
RESIDUAL_VARIANCE
2.0
EFFECT
2 cross alpha
EFFECT
3 cross alpha
EFFECT
4 cov
EFFECT
1 cross alpha
RANDOM
animal
FILE
ped2.txt
FILE_POS
1 4 5 0 0 # id, sire, dam - with upg code
UPG_TYPE
in_ped
INBREEDING
no-inbreeding # no inbreeding
(CO)VARIANCES
0.5
We run the program RENUMF90
and then the program BLUPF90+.
The solutions (solutions file) will be:
sol_blup <- data.frame(
trait = rep(1, 25),
effect = c(rep(1, 3), rep(2, 2), 3, rep(4, 19)),
level = c(1,2,3, 1,2, 1, 1:19),
solution = c(
-0.09929111, 1.82312339, 2.56875029, 2.43588849, 0.65341111,
1.01898764, 0.31554213, -1.15906490, 0.54803039, -1.31841690,
-0.53693713, 0.78918582, 1.56194108, 0.06661379, 0.68188768,
0.87340751, -1.52607734, 2.63594054, 2.60934675, -5.25998297,
-1.10486630, -4.38182541, 3.11359067, 2.12479542, -6.15194294
))
print(sol_blup)
## trait effect level solution
## 1 1 1 1 -0.09929111
## 2 1 1 2 1.82312339
## 3 1 1 3 2.56875029
## 4 1 2 1 2.43588849
## 5 1 2 2 0.65341111
## 6 1 3 1 1.01898764
## 7 1 4 1 0.31554213
## 8 1 4 2 -1.15906490
## 9 1 4 3 0.54803039
## 10 1 4 4 -1.31841690
## 11 1 4 5 -0.53693713
## 12 1 4 6 0.78918582
## 13 1 4 7 1.56194108
## 14 1 4 8 0.06661379
## 15 1 4 9 0.68188768
## 16 1 4 10 0.87340751
## 17 1 4 11 -1.52607734
## 18 1 4 12 2.63594054
## 19 1 4 13 2.60934675
## 20 1 4 14 -5.25998297
## 21 1 4 15 -1.10486630
## 22 1 4 16 -4.38182541
## 23 1 4 17 3.11359067
## 24 1 4 18 2.12479542
## 25 1 4 19 -6.15194294
The solutions of the BLUPF90+ program have the
order:
[fixed effects], [breeding values],
[UPG effects].
Let’s put them in the same order as the solutions we obtained in the R
program, for comparison.
You can find the original animal ID in the renadd04.ped
file and match it with the solutions file to identify the
order.
sol_blup$order <- c("A", "B", "C", "S1", "S2", "cov",
"ID015", "ID006", "ID007", "ID008", "ID010", "ID009", "ID011", "ID012",
"ID013", "ID014", "ID005", "ID002", "ID004", "ID001", "ID003",
"g1", "g2", "g3", "g4")
R_blup_order <- c("A", "B", "C", "S1", "S2", "cov",
"g1", "g2", "g3", "g4",
"ID001", "ID002", "ID003", "ID004", "ID005", "ID006", "ID007", "ID008",
"ID009", "ID010", "ID011", "ID012", "ID013", "ID014", "ID015")
Reordering
sol_blup$order <- factor(sol_blup$order, levels = R_blup_order)
sol_blup2 <- sol_blup %>% arrange(order)
Final solutions (BLUPF90+)
solutions_blup <- as.matrix(sol_blup2$solution)
print(solutions_blup)
## [,1]
## [1,] -0.09929111
## [2,] 1.82312339
## [3,] 2.56875029
## [4,] 2.43588849
## [5,] 0.65341111
## [6,] 1.01898764
## [7,] -4.38182541
## [8,] 3.11359067
## [9,] 2.12479542
## [10,] -6.15194294
## [11,] -5.25998297
## [12,] 2.63594054
## [13,] -1.10486630
## [14,] 2.60934675
## [15,] -1.52607734
## [16,] -1.15906490
## [17,] 0.54803039
## [18,] -1.31841690
## [19,] 0.78918582
## [20,] -0.53693713
## [21,] 1.56194108
## [22,] 0.06661379
## [23,] 0.68188768
## [24,] 0.87340751
## [25,] 0.31554213
Split the effects
# Fixed effects
beta_hat <- solutions_blup[1:ncol(X)]; b <- as.matrix(beta_hat)
print(beta_hat)
## [1] -0.09929111 1.82312339 2.56875029 2.43588849 0.65341111 1.01898764
# UPG effects
upg_hat <- solutions_blup[(ncol(X) + 1):(ncol(X) + ncol(Q))]; g <- as.matrix(upg_hat)
print(upg_hat)
## [1] -4.381825 3.113591 2.124795 -6.151943
# Breeding values
animal_hat <- solutions_blup[(ncol(X) + ncol(Q) + 1):(ncol(X) + ncol(Q) + ncol(Z))]; Qg_a <- as.matrix(animal_hat)
print(animal_hat)
## [1] -5.25998297 2.63594054 -1.10486630 2.60934675 -1.52607734 -1.15906490 0.54803039 -1.31841690 0.78918582 -0.53693713 1.56194108 0.06661379 0.68188768 0.87340751 0.31554213
The solutions are different from those obtained
directly, since we used the generalized inverse.
In other words, we will have different solutions, depending on which
inverse we use.
However, our predicted y must be the same in both
predictions, since estimable functions are invariant to the choice of
the generalized inverse (this is a topic for another post).
Let’s get the predicted y vector, just like we did for
the R solutions.
rhs1 <- t(X) %*% X %*% b + t(X) %*% Z %*% Qg_a
rhs3 <- t(Z) %*% X %*% b - Ainv %*% Q %*% (alpha * g) + (t(Z) %*% Z + Ainv * alpha) %*% Qg_a
\(My=c\)
M matrix
M <- rbind(t(X), t(Z))
c vector
c <- rbind(rhs1, rhs3)
Solving y…
y_hat <- ginv(M) %*% c
final_blup <- data.frame(y_predicted = y_hat, y_original = y)
print(final_blup)
## y_predicted y_original
## 1 2.999027 3
## 2 6.021592 6
## 3 5.976333 6
## 4 1.987161 2
## 5 2.991506 3
## 6 5.996045 6
## 7 4.004215 4
## 8 5.984827 6
## 9 8.044676 8
## 10 4.001232 4
final_solutions <- data.frame(
ID = data1$ID,
y_original = data1$obs,
y_predicted_R = final$y_predicted,
y_predicted_blupf90 = round(final_blup$y_predicted,3),
residual_blupf90 = round((final_blup$y_predicted - data1$obs),4))
print(final_solutions)
## ID y_original y_predicted_R y_predicted_blupf90 residual_blupf90
## 1 ID006 3 3 2.999 -0.0010
## 2 ID007 6 6 6.022 0.0216
## 3 ID008 6 6 5.976 -0.0237
## 4 ID009 2 2 1.987 -0.0128
## 5 ID010 3 3 2.992 -0.0085
## 6 ID011 6 6 5.996 -0.0040
## 7 ID012 4 4 4.004 0.0042
## 8 ID013 6 6 5.985 -0.0152
## 9 ID014 8 8 8.045 0.0447
## 10 ID015 4 4 4.001 0.0012
Note that the predicted y obtained either directly
(R) or iteratively (BLUPF90+) matches the
original y.
The solutions obtained via BLUPF90+ are not exact, as they
were generated by an iterative process. However, the residual is
minimal.
R is limited in many aspects, while BLUPF90+ is much more efficient.