ANIMAL GENETICS
Department of Animal and Dairy Science, University of Georgia, Athens 30602
| Abstract |
|---|
Key Words: genetic evaluation, genetic marker, molecular information
| INTRODUCTION |
|---|
With a large number of EDQ fit as covariables, the model contains a large number of effects. Consequently, the computing time can be long, especially with multiple-trait models, and convergence problems may appear. Assume that the system of equations is solved by using iteration on data with a preconditioned conjugate gradient algorithm (PCG; Strandén and Lidauer, 1999). Decreased computing time and increased stability can be obtained by using a block-diagonal preconditioner (Strandén and Lidauer, 1999). Two types of blocks are of interest: those attributable to EDQ and those attributable to traits. The block preconditioning transforms the corresponding blocks on the left-hand side of the system of equations to identity matrices. Consequently, the convergence rate with the QTL effects should be close to that without those effects, and the convergence rate in the multiple-trait model should be similar to that with a single-trait model. However, both block preconditioners increase the time per round and the amount of memory required. The purpose of this technical note was to examine the impact of both types of block preconditioners on computing requirements of models with a large number of EDQ fit as covariables on simulated and commercial data sets.
| MATERIALS AND METHODS |
|---|
Data Sets
The simulation assumed a 10-trait model with contemporary group, 160 covariables on 80 QTL, and animal genetic and residual effects. The total number of animals was 24,000 in 10 generations. All animals were assumed to have records. The values for each contemporary group effect and QTL effect were simulated from normal distributions N(0,10) and N(0,2), respectively; the first covariable for QTL i was generated as $q_{i1} \sim UN(0,1)$, and the second covariable as $q_{i2} = 1 - q_{i1}$. The QTL effects were simulated without imposing a realistic structure, because such a structure was considered unimportant from the computational point of view; this assumption could be indirectly evaluated by the results obtained with the commercial data set. Two data sets were simulated, according to the variances shown in Table 1. The first set had low correlations among the traits, whereas the second set had high correlations.
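The covariable scheme described above can be sketched in a few lines. This is a minimal illustration and not the simulation program used in the study: the function names are hypothetical, the second argument of N(.,.) is assumed to be a variance, and only a handful of animals are generated.

```python
import random


def simulate_qtl_covariables(n_animals, n_qtl, seed=0):
    """Return one row of 2 * n_qtl covariables per animal.

    For QTL i the first covariable is drawn from UN(0,1) and the second
    is its complement, so each pair sums to 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_animals):
        row = []
        for _ in range(n_qtl):
            q1 = rng.random()            # q_i1 ~ UN(0,1)
            row.extend((q1, 1.0 - q1))   # q_i2 = 1 - q_i1
        rows.append(row)
    return rows


def simulate_effects(n_levels, variance, seed=0):
    """Draw n_levels effects from N(0, variance).

    ASSUMPTION: the second argument of N(.,.) in the text is a variance,
    so the standard deviation is its square root.
    """
    rng = random.Random(seed)
    sd = variance ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n_levels)]


covariables = simulate_qtl_covariables(n_animals=5, n_qtl=80)
qtl_effects = simulate_effects(n_levels=160, variance=2.0, seed=1)
print(len(covariables[0]), len(qtl_effects))
```

Because the two covariables of each QTL sum to 1, the pair is linearly dependent on a constant column, which is one reason such covariables add little independent information per equation.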
Preconditioner
Let the system of equations be
$$A x = b.$$
In PCG, one solves the system
$$M^{-1} A x = M^{-1} b,$$
where M is a preconditioner. It is desired that M be close to A but easily inverted. The simplest preconditioner is a diagonal preconditioner M = diag(A), which seems to converge for many models used in animal breeding, although the convergence for more complicated models can be slow (Tsuruta et al., 2001). Let $A = \{A_{ij}\}$, where $A_{ij}$ is the block corresponding to the ith set of rows and jth set of columns. The block preconditioner is
$$M = \begin{bmatrix} A_{11} & & & \\ & A_{22} & & \\ & & \ddots & \\ & & & A_{nn} \end{bmatrix}.$$
Blocks can be due to traits, resulting in dense blocks of t x t matrices, where t is the number of traits, or they can be due to sets of several effects; for example, all fixed effects with a low number of levels (Strandén et al., 2002).
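A block-diagonal preconditioner inside a PCG loop can be sketched as follows. This is a toy dense-matrix illustration under stated assumptions, not the BLUP90IOD implementation: all function names are hypothetical, the coefficient matrix is held in memory rather than built by iteration on data, and the stopping rule (relative squared residual norm below the tolerance) is an assumed definition of the convergence criterion. Setting `size=1` gives the diagonal preconditioner.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def invert(B):
    """Gauss-Jordan inverse of a small symmetric positive definite block."""
    n = len(B)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(B)]
    for c in range(n):
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def block_preconditioner(A, size):
    """Inverses of the consecutive size x size diagonal blocks of A."""
    return [invert([row[s:s + size] for row in A[s:s + size]])
            for s in range(0, len(A), size)]


def apply_preconditioner(inv_blocks, r, size):
    """Compute z = M^{-1} r one diagonal block at a time."""
    z = []
    for k, block_inv in enumerate(inv_blocks):
        z.extend(matvec(block_inv, r[k * size:(k + 1) * size]))
    return z


def pcg(A, b, size, tol=1e-12, max_rounds=1000):
    """Solve A x = b; returns the solution and the number of rounds used."""
    inv_blocks = block_preconditioner(A, size)
    x = [0.0] * len(b)
    r = list(b)                      # residual for the starting value x = 0
    z = apply_preconditioner(inv_blocks, r, size)
    p = list(z)
    rz = dot(r, z)
    bb = dot(b, b)
    for rounds in range(1, max_rounds + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) / bb < tol:     # ASSUMED form of the convergence criterion
            return x, rounds
        z = apply_preconditioner(inv_blocks, r, size)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_rounds


# Two strongly correlated 2 x 2 "trait" blocks with weak coupling between them.
A = [[4.0, 3.0, 0.5, 0.0],
     [3.0, 4.0, 0.0, 0.5],
     [0.5, 0.0, 4.0, 3.0],
     [0.0, 0.5, 3.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
x_diag, rounds_diag = pcg(A, b, size=1)
x_block, rounds_block = pcg(A, b, size=2)
print(rounds_diag, rounds_block)
```

On a system this small both preconditioners finish within a few rounds, so the example only shows the mechanics; differences in the number of rounds become meaningful on large systems such as those analyzed here.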
Assume a system of equations resulting from multiple-trait models with t traits. A PCG program with a diagonal preconditioner requires 5 variables per equation, including one for the preconditioner. A PCG program with t x t blocks would use 4 variables per equation plus t(t + 1)/2 variables per t equations (assuming half-storage), for a total of 4 + (t + 1)/2 variables per equation. The ratio of this memory to that with the diagonal preconditioner is [4 + (t + 1)/2]/5, which is an increase of 20% for a 3-trait model or 2 times for an 11-trait model. Numerical stability in the PCG requires the 4 variables to be in double precision; however, the preconditioner in simpler models can be in single precision, resulting in a further decrease in the memory requirements.
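The memory comparison can be checked with a short calculation. The sketch below assumes only what the paragraph states: 5 stored values per equation with a diagonal preconditioner, and 4 values per equation plus a half-stored t x t block shared by t equations with the block preconditioner. The function name is hypothetical.

```python
from fractions import Fraction


def memory_ratio(t):
    """Memory with t x t blocks relative to a diagonal preconditioner.

    Diagonal preconditioner: 5 stored values per equation.
    Block preconditioner: 4 values per equation plus t(t + 1)/2 values
    per t equations (half-stored block), i.e. (t + 1)/2 per equation.
    """
    per_equation = 4 + Fraction(t * (t + 1), 2) / t
    return per_equation / 5


for t in (1, 3, 11):
    print(t, float(memory_ratio(t)))
```

With t = 1 the block is a single diagonal element and the ratio is 1, which is a convenient sanity check on the per-equation accounting.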
The preconditioner in PCG is not used directly, as above, but indirectly in a multiplication:
$$z = M^{-1} r,$$
where r and z are vectors as in Tsuruta et al. (2001). The vector z can also be obtained indirectly by solving
$$M z = r.$$
The first form requires the inversion to be done just once, but may be less accurate numerically. The second form does not require an inversion and may involve fewer computations and thus be numerically more stable; for example, allowing one to use single rather than double precision for M. If some diagonal blocks of M are large but sparse, the second form can use sparse storage and solving by either a sparse factorization (e.g., Misztal and Perez-Enciso, 1998) or an iterative method. However, the PCG algorithm is sensitive to numerical errors, and incomplete convergence in the second form can result in divergence.
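The two ways of obtaining z can be compared on a single dense block. This is a minimal sketch with hypothetical function names: the first form stores an explicit inverse and multiplies, and the second form keeps a Cholesky factor and solves two triangular systems. A dense factorization is used here for brevity; it stands in for the sparse factorization mentioned in the text and is not the FSPAK90 interface.

```python
import math


def cholesky(M):
    """Lower-triangular L with L L' = M for a symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, r):
    """Second form: solve M z = r by forward and backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):                       # forward: L y = r
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):             # backward: L' z = y
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def inverse_from_cholesky(L):
    """Explicit inverse, built column by column from unit vectors."""
    n = len(L)
    cols = [solve_cholesky(L, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


M = [[4.0, 3.0, 0.0],
     [3.0, 4.0, 1.0],
     [0.0, 1.0, 2.0]]
r = [1.0, 2.0, 3.0]

L = cholesky(M)
M_inv = inverse_from_cholesky(L)                  # done once, then reused
z_first = [sum(a * ri for a, ri in zip(row, r)) for row in M_inv]   # z = M^{-1} r
z_second = solve_cholesky(L, r)                   # solve M z = r
print(max(abs(u - v) for u, v in zip(z_first, z_second)))
```

In the first form only `M_inv` needs to be kept for each round; in the second form the factor `L` is kept, which preserves sparsity much better than an explicit inverse when the block is large and sparse.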
Preconditioning requires extra computations. For finite solving or dense matrix inversion, the cost is cubic with the size of the block. Therefore, the relative cost of preconditioning is likely to be negligible for small blocks but can increase rapidly with large blocks. For example, if for a block size of m the inversion takes 20% of the time of one round of PCG, for a block size of 2m the inversion would take 66% of that time, and the time per round would increase to 2.4 times the original. Similarly, for a block size of m/2, the inversion would take only about 3% of the time of one round of PCG, and the time per round would decrease by about 17%. For block preconditioning to be worthwhile, the improvement in the convergence rate must exceed the increase in computations per round.
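The doubling example can be reproduced with a small helper. This is a sketch under the paragraph's own assumption that the cost of handling one block grows with the cube of its size while the rest of the round is unchanged; the function name and the `exponent` parameter are hypothetical.

```python
def round_time_factor(inversion_share, size_ratio, exponent=3):
    """Relative time per round after scaling the block size by size_ratio.

    inversion_share: fraction of one round spent on the block at the
    reference size. exponent=3 encodes the cubic cost of a dense block.
    Returns (new time per round / old time per round, new inversion share).
    """
    other = 1.0 - inversion_share
    inversion = inversion_share * size_ratio ** exponent
    total = other + inversion
    return total, inversion / total


factor, share = round_time_factor(0.20, 2)
print(round(factor, 2), round(share, 2))   # prints "2.4 0.67"
```

Calling the helper with `size_ratio=0.5` gives the corresponding figures for a halved block size.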
Models of Analysis
Computing was performed with a modified BLUP90IOD program (Misztal et al., 2002). Modifications included a block preconditioner for traits (BT) and a block preconditioner for EDQ (BQ). The BT and BQ were implemented by using the first form (inversion). The data sets were analyzed in several combinations. Changes to the models included analyzing the first trait only or all traits, and including or excluding EDQ. Preconditioners included diagonal, BQ, BT, and BT + BQ. Computing was on a 32-bit processor with a clock speed of 2.8 GHz. The convergence criterion was set at $10^{-12}$. Values recorded were the amount of memory needed, the number of iterations, and the computing time.
| RESULTS AND DISCUSSION |
|---|
Results with the commercial data sets are shown in Table 3. In general, they are relatively similar to those with the simulated data set with low correlations. However, the numbers of rounds were several times greater and the computing times and memory requirements were less affected by EDQ. These differences were due to a more complicated model and to a smaller number of EDQ per trait; consequently, the ratio of EDQ effects to non-EDQ effects was lower. The block in BQ in the commercial data set was sparse, because only fractions of EDQ were fit for each trait. Large reductions in the number of rounds and computing time with BQ could be obtained by using the second form of the preconditioning with a sparse Cholesky decomposition, for example, by FSPAK90 (Misztal and Perez-Enciso, 1998).
In the simulation, the covariables generated for EDQ had a random, nonrealistic structure. With the commercial data set, the increase in the number of rounds when EDQ were fit was very similar to that with the simulated data set in multiple-trait models, although there were differences in single-trait models where the computing time was trivial. One explanation is that, with the commercial data set, the animal effect took over variation from some EDQ, reducing EDQ to nearly random variation, similar to that in the simulated data set. Therefore, the method of simulation most likely did not have an important impact on conclusions from this study.
In conclusion, the computing resources in a genetic evaluation involving a large number of EDQ fit as covariables seemed to be reasonable with a procedure using the iteration on data and a diagonal preconditioner. Using the block preconditioner for EDQ seemed to have a limited impact on computing time. The block preconditioner for traits had a dramatic influence on computing time when correlations among traits were very high, and a smaller but noticeable influence otherwise. The increase in memory requirements with both preconditioners was moderate.
Models as discussed here are being replaced by models of "genomic selection" in which tens of thousands of SNP or haplotype effects are considered (Meuwissen et al., 2001). In such models, the number of effects is very large, the number of equations is smaller, and the system of equations is fairly dense. Legarra and Misztal (2007) evaluated several computing methodologies useful in genomic selection and found that the PCG algorithm with the diagonal preconditioner was both efficient and stable. The conclusions from this paper may apply to genomic selection if the number of useful SNP or haplotypes is small (e.g., 100) and fitting of the polygenic effects is justified, at least for some traits.
1 Corresponding author: shogo{at}uga.edu
Received for publication June 4, 2007. Accepted for publication February 19, 2008.
| LITERATURE CITED |
|---|