|
|
||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
,2
* Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, PO Box 50, 8830 Tjele, Denmark; and
and
Department of Natural Sciences, Royal Veterinary and Agricultural University, Thorvaldsensvej 40, 1870 Frederiksberg C, Denmark
| Abstract |
|---|
|
|
|---|
Key Words: Bayesian inference Winbugs genetics
| INTRODUCTION |
|---|
|
|
|---|
Although well-documented software often exists for the fixed analog of the additive genetic model, and in some cases also a simple mixed model, the applicability of such software in animal breeding is often limited. The reason is that it is impossible to set up the additive genetic relationship matrix for models with a pedigree structure more complex than that of a sire model. Hence, the limiting factors for applying new models suggested in the statistical literature to analyze quantitative genetic data are the time and resources that necessarily must be allocated to the development of new software.
In this paper the goal was first to develop a computationally efficient method for updating additive genetic effects, and second to show how this method could be used to analyze genetic data using Bayesian probability models with the flexible computer package Winbugs (Spiegelhalter et al., 2006
). The idea was first proposed by Mrode and Thompson (1989)
, in which they parameterized the mixed linear model in terms of the transformed additive genetic effects based on the usual partition of the additive genetic relationship matrix (Henderson, 1976
). Finally, the approach was illustrated in a simulation study, in which the data were generated according to a linear model, with i.i.d errors having Students t distributions.
| MATERIALS AND METHODS |
|---|
|
|
|---|
The Bayesian Additive Genetic Model
Bayesian probability models are constructed by specifying the joint distribution, say p(y,
), of the observable y, which is the data, and the unobservable
, which comprises the model parameters, missing data, and so on. As soon as the data are observed, model inferences are based on the posterior distribution p(y |
). Using Bayess theorem, the posterior distribution is, up to proportionality, given by the product of the sampling distribution (likelihood) p(y |
) and the prior p(
):
![]() |
For the underlying genetic model, we assume an additive genetic infinitesimal model (Bulmer, 1980
). We will classify the parameter vector into a vector of additive genetic effects, a, and associated covariance matrix, G, and the remaining parameters,
. Moreover, we adapt the often-used assumption of conditional independence, which implies that the data are independent given the model parameters. A priori, we assume that p(
) = p(
)p(a | G)p(G). According to quantitative genetic theory, a | G is assumed to be multivariate normally distributed, with a mean vector 0 and a covariance matrix G
A, where A is the additive genetic relationship matrix. The prior specification for
and G will depend on the problem at hand and will be left unspecified until, in the last part of this paper, an example is considered. Under the above assumption, a Bayesian additive genetic model may be represented by
![]() | [1] |
Reparameterized Model
In the following model, the additive genetic effects are reparameterized according to a = (L
TD1/2)
, where L is the lower triangular Cholesky decomposition of G (G = LL'), and T and D correspond to the factorization A = TDT' (Henderson, 1976
). This implies that the vector
is standard, multivariate normally distributed. By the rules for the multidimensional change of random variables, the posterior distribution for the reparameterized model is, up to proportionality, given by
![]() | [2] |
where a has been exchanged by (L
TD1/2)
, and the prior distribution for L is given by the prior distribution for G and the transformation G = LL'. For G to be positive definite, the diagonal elements of L must be greater than zero.
The parameter vector
is given by (
,
,L). Therefore, if Markov chain Monte Carlo methods are used for the inferences, then posterior samples are obtained of
and L (Metropolis et al., 1953
; Hastings, 1970
). These are, however, easily transformed to posterior samples of a and G, and Bayesian inferences including posterior point estimates and credibility intervals are done as usual.
It is important to note that the prior distribution is not generally invariant to the reparameterization. As an example, consider the situation in which we have only a direct additive genetic effect, set G =
2a, from which it follows that
. If here we assume a priori a bounded uniform prior for L on the interval (a, b) with 0 < a < b, then the prior distribution for
2a is given by
(
2a)1/2/(b2 a2) on the interval (a2,b2), which is bounded but not uniform.
Gibbs Updating
In a Gibbs sampler, posterior samples are obtained by repeatedly updating from the full conditionals (Sorensen and Gianola, 2002
). These are easily derived, at least up to proportionality, by retaining the parameters of interest from the posterior distribution.
From the reparameterized posterior distribution (equation [2]), it follows that updating from the full conditional for
requires the computation of (L
TD1/2)
. Because the element Tij of T is the fraction of genes of animal i inherited by animal j, the average number of nonzero elements in one row of T is the average number of ancestors per animal. The T matrix is therefore relatively sparse, and it would be an inefficient approach to form T and evaluate (L
TD1/2)
explicitly. Fortunately, Quaas (1989)
presented a recursive algorithm for calculating v = Ts (where s is an arbitrary vector), working directly from a list of sires and dams and therefore with cost independent of the pedigree depth. Thus, the matrix T is never formed. Assuming that animals have been sorted from the oldest to the youngest and renumbered from 1 to q (parents precede offspring), the recursive formula is
![]() | [3] |
where the subscripts si and di are the sire and dam index of animal i. If a sire (dam) is unknown, then vsi = 0 (vdi = 0). This result easily extends to the calculation of a = (L
TD1/2)
.
Clearly, this result is useful for updating the elements of
= (
1, ...,
q), irrespective of whether it is done jointly or univariately. We will focus on the latter, because this allows us to use the standard computer package Winbugs. In Table 1
, the general idea of how to update the elements
1, ...,
q for the univariate case is sketched. A specific example of how to fit an animal model in Winbugs is given in the next section. For ease of representation, one trait with a direct additive genetic effect is considered, in which case L is a scalar and equal to
. Extension to several additive genetic effects and multivariate models is straightforward but notationally cumbersome and will not be considered here.
|
| EXAMPLE |
|---|
|
|
|---|
2,
). Here,
2 is a positive scale parameter and
is a degrees of freedom parameter. If
is
2, then the variance of the Students t process
is infinite. When
goes toward infinity, the Students t process tends toward a normal process (e.g., t(0,
2,
)
N(0,
2) for
). Compared with the normal distribution, the Students t distribution has thicker tails and is therefore less sensitive to outliers, which may have a marked impact on inferences in the normal model (Zellner, 1976Let Y = (Y1,...,Yn) denote the random vector of observations. The sampling distribution is given by
![]() | [4] |
where xi' is the ith row of the n x p design matrix X, ß is the p x 1 vector of fixed effects, zi' is the ith row of the n x q design matrix Z, and a is the q x 1 vector of additive genetic effects that, according to quantitative genetic theory, is assumed to be multivariate normally distributed with mean vector 0 and covariance matrix A
2a. The matrix A is the additive genetic relationship matrix and
2a is the additive genetic variance in the base population
is the complete parameter vector, and a priori we assume that ß1, ..., ßp, (a,
2a),
2, and
are independent.
Data Simulation
The simulation study was designed to mimic a pedigree structure from a potential study in pig breeding. More specifically, the pedigree was generated by selecting a multiplier herd from Danbreds database, which is the major pig breeding organization in Denmark, and tracing the sows with first farrowing in the period from January 1, 2002, to December 31, 2002, back 3 generations. Altogether, the pedigree included 2,284 animals and the records were simulated for the 2,084 oldest animals according to model [4]. In addition to
2 = 100 and
= 10, the model included a fixed effect with 2 levels equally assigned to the animals (ß1 = 100, ß2 = 120), and an animal effect with additive genetic variance (
2a = 30).
Data Analysis
The data were analyzed with the same model as used to generate the data. The assumptions were vague, normally distributed priors for fixed effects [N(0,1000)], gamma distributed priors for the additive genetic variance component and the scale parameter [gamma(0.001,0.001)], and a bounded uniform prior for the degrees of freedom parameter [uniform(2,100)].
By studying the instructive examples given in the manual for Winbugs (Spiegelhalter et al., 2006
), it follows that data and model specification can be done in various ways. Data for the present analysis are given in 2 files (Table 2
). The first data file included the actual data for animals without and with records, in which a zero indicated missing data or an unknown parent. Because the "if-then" statement does not exist in Win-bugs and animals without and with records are handled differently, the second data file included 2 vectors that pointed to the position of the animals without and with records in the first data file.
|
|
|
In the post-Gibbs analysis, one may take advantage of the tools available in Winbugs. Alternatively, one can use the facility for outputting Gibbs samples in a format that is compatible with the post-Gibbs analysis computer package, CODA (Best et al., 1997
).
Results
The posterior mean values and 95% highest posterior density regions given in Table 4
show that the parameters can be correctly inferred using the proposed approach in Winbugs. In all cases, the 95% highest posterior density regions assign relatively high probability mass to values of the parameters in the neighborhood of the true values.
| DISCUSSION |
|---|
|
|
|---|
The idea sketched here for updating additive genetic effects is computationally efficient and relevant not only for fitting models in Winbugs. The reason is that the cost associated with updating additive genetic effects is independent of the pedigree depth and increases linearly only with the size of the pedigree. The idea of parameterizing an animal model in terms of transformed additive genetic effects is not new (Mrode and Thompson, 1989
). In a Bayesian implementation of a genetically structured residual variance model, Sorensen and Waagepetersen (2003)
used Langevin-Hastings (Besag, 1994
) to jointly update transformed additive genetic effects. According to these authors, it is easier to obtain a well-mixing Markov chain Monte Carlo algorithm for the posterior distribution of the transformed variables than the original genetic effects. We have had a similar experience in a Bayesian implementation of the semiparametric Cox model with time varying genetic effects (Damgaard, 2006
).
The flexibility of Winbugs is best illustrated by looking at the long list of examples provided with the documentation. Among many others, these examples include models for continuous data, counting data, binary and categorical data, and survival data. Following the example in this paper, it should be straightforward to extend these models to animal models and use Winbugs to make additive genetic inferences. In addition to the possibility of fitting a wide range of standard Bayesian models, it is also possible for the user to specify his or her own distribution at each step of Bayesian model building.
As soon as data have been set up for Winbugs, it is very easy to fit alternative models. This feature enables the analyst to consider a large group of competing models and use Bayesian model selection criteria to select the best one for drawing inferences (e.g., Sorensen and Gianola, 2002
). This is clearly an advantage in that it will decrease the risk of using an incorrect or inadequate model to interpret data. The reason is that a larger group of models will tend to be considered, compared with a situation in which resources must be allocated to implement new software to fit alternative models.
Unfortunately, the use of Winbugs in quantitative genetics is limited to relatively small data sets, beginning from a few thousand records. The analysis performed in this paper took about 1 h on a Pentium computer (1.6 GHz, 512 MB RAM). Hence, it is unlikely that we will see Winbugs used for genetic evaluation. Its applicability is limited to inference in small populations and experiments designed with the number of records, for example, limited by a high recording cost, such as in many behavioral studies. In addition, Winbugs may in turn be useful for educational purposes.
| Footnotes |
|---|
2 Corresponding author: lars.damgaard{at}agrsci.dk
Received for publication August 8, 2006. Accepted for publication December 21, 2006.
| LITERATURE CITED |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
P. Waldmann, J. Hallander, F. Hoti, and M. J. Sillanpaa Efficient Markov Chain Monte Carlo Implementation of Bayesian Analysis of Additive and Dominance Genetic Variances in Noninbred Pedigrees Genetics, June 1, 2008; 179(2): 1101 - 1112. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |