|
|
||||||||
ANIMAL GENETICS |
Danish Institute of Agricultural Sciences, Department of Genetics and Biotechnology, DK-8830, Tjele, Denmark
| Abstract |
|---|
|
|
|---|
Key Words: environmental sensitivity environmental value genotype x environment interaction genetic parameter Gibbs sampler reaction norm model
| INTRODUCTION |
|---|
|
|
|---|
Including a function of the data as a covariable in the sampling model for the data is clearly unsatisfactory. Apart from the understatement of uncertainty due to treating phenotypic means as known parameters, one can imagine situations that would result in misleading representations of environmental values using this approach. An example would be the presence of a genetic trend. Because in the reaction norm model a breeding value is defined as a function of the environmental gradient, inappropriate inferences about environmental values may result in incorrect ranking based on predicted genetic values.
It is therefore important to find more adequate methods to account for unknown covariates in a reaction norm model. An appealing alternative is to infer environmental values simultaneously with the other parameters of the model. The objectives of this study were 1) to describe a method and its Bayesian Markov Chain Monte Carlo (MCMC) implementation that makes this possible, and 2) using a simulation study, to test the expectation that the proposed method leads to more satisfactory inferences about genetic parameters than the approximate method mentioned previously.
| MODEL AND METHODS |
|---|
|
|
|---|
When genetic and nongenetic environmental sensitivities are taken into consideration, a reaction norm model can be written as
![]() | (1) |
where y is the data vector (order n), b is the vector of fixed effects (order nb), h is the vector of environmental values (order nh), u0 is the vector of intercepts (order nu), uh is the vector of slopes (order nu) of reaction norms for nongenetic random effects (e.g., permanent effects), a0 is the vector of intercepts (order ng), ah is the vector of slopes (order ng) of additive genetic reaction norms, and e is the vector of residual effects (order n). X, E, Zu, Hu, Za, and Ha are incidence matrices. When the covariate associated with the reaction norm is treated as unknown, the row k of matrices Hu and Ha has exactly one element equal to the effect of the environment (hi or a function of hi) where the observation is recorded, and the others equal to zero.
In principle h can be treated as a fixed or a random vector. Here it is treated as random to better meet identifiability requirements. In the present model identifiability is a complex topic. We limit ourselves to making the statement that the functions of the parameters that are estimated and reported later are identifiable.
The conditional distribution of y is assumed to be normal having the form
![]() |
where R is the matrix (order n) of random residual covariances. Without loss of generality, it is assumed that residuals are homoscedastic and independent of each other, so that R = I
2e, where I is the identity matrix and
2e is the residual variance.
Prior Distribution of Location Parameters
The prior distribution of vector b is assumed to be improper uniform. The random vectors h, (u0', uh')', and (a0', ah')' are assumed to have normal, mutually independent prior distributions.
The prior distributions of the variance of hi (
2h) and the residual variance (
2e) are assumed to be scaled inverse
2 distributions. The (co) variances of u0i and
and the (co) variances of a0i and
are assumed to follow inverse Wishart distributions.
Joint Posterior Distribution of All the Parameters
Let
be the vector of all location parameters except h; i.e.,
= (b', u0', uh', a0', ah')'. Then the joint posterior distribution of all the parameters is
![]() | (2) |
Fully Conditional Posterior Distribution of the Location Parameters 
The fully conditional posterior distribution of
can be directly derived from (2) by extracting terms involving
. This results in
![]() | (3) |
Further, assuming h is known, define
![]() | (4) |
Because p(y |
, h,
2e) = p(y
|
, h,
2e), the fully conditional posterior distribution of
is
![]() | (5) |
Using results in Lindley and Smith (1972)
, Gianola and Fernando (1986)
, and Sorensen and Gianola (2002)
, it is easy to show that the posterior distribution of location parameters (
), given dispersion parameters and h, is multivariate normal. Now, write the mixed model equations associated with (4) as C
= r
. Then
| U0, G0,
2e, h, y
is normally distributed with mean equal to
and variance equal to C1
, i.e.,
![]() | (6) |
Let
i denote an arbitrary element (or set of elements) of
, and let
i denote the vector
with
i excluded. From standard multivariate normal theory, it can readily be established that, if the distribution of
| U0, G0,
2e, h, y
is normal, the distribution of
i |
i, U0, G0,
2e, h, y
is
![]() | (7) |
Fully Conditional Posterior Distribution of h
From (2), the density of the fully conditional posterior distribution of h is
![]() | (8) |
Based on (1), an observation y can be described as
![]() |
Therefore, an alternative formulation of the reaction-norm model (1) is
![]() | (9) |
where E* is the coefficient matrix obtained by replacing the nonzero element in the row k of matrix E with (1 + zuk'uh + zak'ah).
Assuming
is known, define
![]() | (10) |
Because p(y |
, h,
2e) = p(yh |
, h,
2e), the fully conditional posterior distribution of h is
![]() | (11) |
Now, write the mixed model equations associated with (10) as Chĥ = rh. Then
![]() | (12) |
and for the element i, the fully conditional posterior distribution is
![]() | (13) |
Fully Conditional Posterior Distribution of Dispersion Parameters
The fully conditional posterior distribution of dispersion parameters is deduced from (2). Let H be the vector of all of the location parameters, and let W = (X: E: Zu:Hu: Za: Ha). For the residual variance one obtains
![]() | (14) |
which is recognized as a scaled inverse
2 distribution with degrees of freedom
e + n and scale parameter [(y W
) (y W
) +
es2e]/(
e + n), where
e is the degrees of freedom for the prior distribution of
2e, and n is the number of observations.
The fully conditional posterior distribution of the variance of environmental values is
![]() | (15) |
which is again recognized as a scaled inverse
2 distribution with degrees of freedom
h + nh and scale parameter (h'h +
hs2h)/(
h + nh), where
h is the degrees of freedom for the prior distribution of
2h, and nh is the order of h.
The fully conditional posterior distribution of the covariance matrix of the reaction norm of the nongenetic random effects is
![]() | (16) |
This is an inverse Wishart distribution of dimension ku = 2, with degrees of freedom
u + nu and scale matrix (S2u + V1u)1, where
Vu is the scale matrix,
u is the degrees of freedom for the prior distribution of U0, and nu is the order of u0 or uh.
The fully conditional posterior distribution of the covariance matrix of the additive genetic reaction norm is
![]() | (17) |
which is an inverse Wishart distribution of dimension kg = 2 with degrees of freedom
g + ng and scale matrix (S2g + V1g)1, where
Vg is the scale matrix,
g is the degrees of freedom for the prior distribution of G0, and ng is the order of a0 or ah.
Implementation of the Gibbs Sampler
The Gibbs sampler is a Monte Carlo method for obtaining samples from joint or marginal posterior distributions of all parameters in the model, by repeated sampling from fully conditional posterior distributions. The algorithm for the proposed model is as follows:
, C
, and r
and draw
i from N(C1
(i, i)(r
i C
(i, i)
i),C1
(i, i)).
from Inv X2(
h + nh, (h'h +
hs2h)/(
h + nh)).
u + nu).
g + ng).
2e from Inv X2[
e + n, ((y W
)'(y W
) +
es2e)/(
e + n)].
The Gibbs sampling algorithm was implemented using the DMU package (Madsen and Jensen, 2004
). This package uses the iteration on data technique to avoid storing C
and Ch.
Simulation Studies
Data Generation. The proposed method was evaluated using simulated data. Observations were generated with the model y = 1µ + Eh + Za0 + Hah + e, where h was the vector of environmental values (herd-year effects), a0 was the vector of levels, ah was the vector of slopes of additive genetic reaction norms, and e was the vector of random residuals. Vectors h, (a0', ah')' and e were assumed to be mutually independent and were sampled from
![]() |
To simplify the simulation procedure, but without loss of generality, data were generated according to a structure mimicking swine. Thus, 5 generations (5 yr) of data were simulated and distributed over 50 herds. In each generation, 50 sires were mated to 1,000 dams, and each dam produced 5 offspring with records. Sires and dams were chosen randomly. Sires were used across herds, and each sire was mated to 20 dams from 5 herds. Dams were used within herds. Consequently, there were 100 individuals from 5 sires and 20 dams in each herd each generation.
The parameters used in the simulation were:
2h = 80,
2a0 = 100,
2ah = 1, ra0,ah = 0.5, and
2e = 300. This corresponds to a G x E variance as follows: Var(ahh) =
2ah·
2h = 80; and a marginal variance of a datum (phenotypic variance across herd-years) as follows:
2P =
2h +
+
2ah
2h +
2e = 560.
Statistical Analysis.
The simulated data were analyzed using the following models:
The model with unknown covariate of the reaction norm, treating herd-years as random effects (the proposed approach),
![]() | (M1) |
the model using true herd-year effects as covariate (Ht) of the reaction norm and including herd-years as fixed effects,
![]() | (M2) |
the model using phenotypic means of herd-years as proxies for the unknown covariates (Hm) of the reaction norm and including herd-years as fixed effects,
![]() | (M3) |
Note that in models M2 and M3, the covariates of the reaction norm (Ht, Hm) are not necessarily equivalent to the corresponding elements of h, whereas in M1, the nonzero elements of H are equivalent to those of h.
The additive genetic variance (
) and heritability (h2a) in a particular herd-year were calculated as
2a =
2a0 +
2ahh2 + 2
a0ahh and
Because the covariate features in the additive genetic and phenotypic variances, for ease of comparison of heritabilities among methods (Figure 1
) the covariate was expressed in units of the appropriate standard deviation (h* = h/
h). On defining
2ah* =
2ah
2h and
a0ah* =
a0ah
h, then,
2a =
2a0 +
2ahh2 + 2
a0ahh =
2a0 +
2ah*h*2 + 2
a0ah*h*, where, using M1,
2h was the empirical variance of the estimated herd-year effect; using M2,
2h was the variance of true herd-year effects; and using M3,
2h was the variance of herd-year averages.
|
| RESULTS |
|---|
|
|
|---|
As shown in Table 1
, the proposed method (M1) yielded estimates (based on posterior means) of variance components with no detectable bias, whereas using herd-year averages as proxies for herd-year effects (M3) resulted in biased estimates. Averaged over the 20 replicates, the variance components estimated from the proposed method and from the model using true herd-year effects as covariates in the reaction norm (M2) resulted in similar inferences. These estimates agreed well with the true values. On the other hand, using herd-year averages as covariates in the reaction norm caused an overestimation of the variance component associated with level (
2a0) and an underestimation of the variance components associated with the slope (
2ah, and
a0,ah). These biases were statistically significant. After scaling herd-year averages using the ratio of the standard deviation of herd-year averages to the standard deviation of true herd-year effects, the estimates were 0.78 for
2ah and 4.28 for
a0,ah, whereas the realized values in the simulated data were equal to 1.01 and 5.11, respectively. The sampling standard deviation of the estimates of
2ah was largest using M1, lowest using M2, and intermediate using M3, whereas the standard deviation of the estimates of
2a0 was largest using M3. Mean squared errors favored M1 over M3 in all cases.
|
2ah. The effect of underestimation of
2h on the total additive genetic variance was partly compensated by the larger variation of herd-year averages (relative to the variance of true herd-year effects). Despite this, the bias was still considerable. As can be seen from Figure 1| DISCUSSION |
|---|
|
|
|---|
The approximate procedure has some shortcomings. The variance among phenotypic means of production environments includes a genetic component. This results in an overestimation of the variation of environmental values. Even in a random mating population, as is the case in the current study, the variance among phenotypic means was 35% larger than the variance of true herd-year effects. Further, the correlation between herd-year means and true herd-year effects was 0.901, obviously lower than the correlation of 0.970 obtained using the method proposed in this paper. Because in the reaction norm model a breeding value is defined as a function of the environmental gradient, improper estimates of environmental values may result in incorrect ranking based on predicted genetic values.
The approximated method also results in biased estimation of variance components. Thus, the variance component associated with the slope (
2ah) was underestimated by 42%, and that associated with level (
2a0) was overestimated by 11%. The underestimation of the variance of the slope (
2ah) and the covariance between the level and the slope (
a0,ah) was partly due to the fact that the variation of herd-year averages was larger than that of true herd-year effects. After scaling herd-year averages using the ratio of the standard deviation of herd-year averages to the standard deviation of true herd-year effects, the estimates were 0.78 for
2ah and 4.28 for
a0,ah, whereas the realized values in the simulated data were equal to 1.01 and 5.11, respectively. Thus, the bias persisted despite the fact that the scaling involved the use of the standard deviation of true herd-year effects. Underestimation of the variance of the slope understates the importance of G x E interaction. Further, inadequate inferences about production environmental effects and the resulting biased estimates of other genetic parameters reduce the accuracy of prediction of breeding values. This would be expected to have an unfavorable effect on genetic progress by selection.
The amount and sign of the bias associated with the approximate method depend on the data structure. To illustrate, an additional study was carried out with data simulated from the same sampling model as reported earlier, but with the difference that from generation 1 onward individuals were selected on the basis of their predicted additive genetic values for level. The results showed that the correlation between herd-year averages and true herd-year effects was approximately 0.80 and the variance of herd-year averages was approximately 5 times larger than the variance of the true herd-year effects. Using the herd-year average as a covariate of the reaction norm,
2a0 was overestimated by 50%, whereas
2ah was underestimated by 88%.
Many approximations and ad-hoc procedures have been reported in previous studies to account for unknown covariates in reaction norm models. In a study of production and fertility traits in dairy cattle, Kolmodin et al. (2002)
estimated herd-year values using herd-year means computed from data that had been preadjusted for fixed effects other than herd-years. In addition, herd-year values were estimated using herd-year means that were computed from data including animals with records in the appropriate herd-year, whereas dispersion parameters and breeding values were inferred from data that only included individuals whose sires were to be evaluated. The adequacy of this approximation could not be tested because it was applied using real (as opposed to simulated) data. Kolmodin et al. (2002)
made a plea in their conclusion for the development of alternative procedures that avoid using functions of the data in the sampling model for the data.
Calus et al. (2004)
proposed to estimate environmental values via an iterative procedure whereby the estimated environmental effect in a given iteration replaces the value of the covariate in the next. Using computer simulation, the authors observed a negligible reduction in bias of estimates of variance components using this approach when compared with the standard procedure of replacing covariates with phenotypic averages. They suggested replacing environmental values with estimates of herd effects obtained from a large number of animals per herd, instead of from herd-years, at the cost of losing information on G x E interaction.
The overall picture that emerges is that the conventional approximations do not always produce reliable results, and it is difficult to decide a priori how they will behave in any given data set/modeling scenario. In contrast, the method that we propose here avoids ad-hoc constructs, is theoretically coherent, easy to implement, and leads to adequate inferences. An important caveat associated with the reaction norm model with unknown covariates is that of identifiability of parameters in the likelihood. This is a technically elaborate problem that is presently under investigation and hopefully will be reported elsewhere.
| IMPLICATIONS |
|---|
|
|
|---|
| Footnotes |
|---|
2 Corresponding author: guosheng.su{at}agrsci.dk
Received for publication September 14, 2005. Accepted for publication January 31, 2006.
| LITERATURE CITED |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
E. Strandberg, S. Brotherstone, E. Wall, and M. P. Coffey Genotype by environment interaction for first-lactation female fertility traits in UK dairy cattle J Dairy Sci, July 1, 2009; 92(7): 3437 - 3446. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. Su, P. Madsen, and M. S. Lund Reaction norm model with unknown environmental covariate to analyze heterosis by environment interaction J Dairy Sci, May 1, 2009; 92(5): 2204 - 2213. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. M. Shariati, G. Su, P. Madsen, and D. Sorensen Analysis of Milk Production Traits in Early Lactation Using a Reaction Norm Model with Unknown Covariates J Dairy Sci, December 1, 2007; 90(12): 5759 - 5766. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Piles, M. Garcia-Tomas, O. Rafel, J. Ramon, N. Ibanez-Escriche, and L. Varona Individual efficiency for the use of feed resources in rabbits J Anim Sci, November 1, 2007; 85(11): 2846 - 2853. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |