J. Anim Sci.
J. Anim Sci. 2008. 86:1514-1518. doi:10.2527/jas.2007-0324
© 2008 American Society of Animal Science


ANIMAL GENETICS

Technical note: Computing options for genetic evaluation with a large number of genetic markers

S. Tsuruta1 and I. Misztal

Department of Animal and Dairy Science, University of Georgia, Athens 30602


    Abstract
 
Two simulated data sets and one commercial data set were used to evaluate computing options for models in which the effects attributable to QTL were fit as covariables. The simulated data sets included records on 24,000 animals for 10 traits. Data sets 1 and 2 were simulated with low and high correlations among traits, respectively. The model included an overall mean, 160 covariables as effects attributable to QTL, the random animal genetic effect, and the random residual effect. The commercial data set included records on approximately 110,000 animals for 11 growth, reproduction, and other traits. The model included the effects usually fitted for these traits as well as 116 covariables as effects attributable to QTL; the number of covariables included in the model varied by trait. Initial computing was by the BLUP90IOD program, which applies iteration on data by using a preconditioned conjugate gradient algorithm with a diagonal preconditioner. Modifications included adding block preconditioners for effects attributable to QTL (BQ) and for traits (BT). With the simulated data sets and the original program, one-trait analyses without the covariables took 7 s, whereas the 10-trait analyses with the covariables took 15 min for the data set with low correlations and 1 h 40 min for the data set with high correlations. The BQ improved the convergence rate but increased the computing time. The BT decreased the computing time by 1.5 times (low correlations) to 7 times (high correlations) at a cost of greater memory requirements. For the commercial data and the complete model, computing took 10.3 h with the unmodified program and was reduced to 6 h with BT. Relative changes in computing time and convergence rate with the commercial data set were close to those of the simulated data set with low correlations among the traits. The BQ decreased the number of rounds by less than expected. Genetic evaluation with a large number of effects attributable to QTL fit as covariables is feasible.

Key Words: genetic evaluation • genetic marker • molecular information


    INTRODUCTION
 
The large number of genetic markers available for several traits and species creates an opportunity to use those markers in genetic evaluation. This can be accomplished by using 2 basic methods. In the first method, effects attributable to QTL (EDQ) are accounted for through fixed regressions, for example, as in Haley and Knott (1992) and Kerr and Kinghorn (1996) in half-sib, full-sib, or crossbred populations (e.g., F2, BC). In the second method, one constructs an identical-by-descent matrix (Fernando and Grossman, 1989). In both methods, one or more covariables are fitted per QTL; in the second method, a relationship matrix is used. The first method is simpler and computationally less expensive, but only the second one is useful in general pedigrees (Weller, 2001; Dekkers, 2004).

With a large number of EDQ fit as covariables, the model contains a large number of effects. Consequently, the computing time can be long, especially with multiple-trait models, and convergence problems may appear. Assume that the system of equations is solved by using iteration on data with a preconditioned conjugate gradient algorithm (PCG; Strandén and Lidauer, 1999). Decreased computing time and increased stability can be obtained by using a block-diagonal preconditioner (Strandén and Lidauer, 1999). Two types of blocks are of interest: those attributable to EDQ and those attributable to traits. The block preconditioning transforms the corresponding blocks on the left-hand side of the system of equations to identity matrices. Consequently, the convergence rate with the QTL effects should be close to that without those effects, and the convergence rate in the multiple-trait model should be similar to that with a single-trait model. However, both block preconditioners increase the time per round and the amount of memory required. The purpose of this technical note was to examine the impact of both types of block preconditioners on the computing requirements of models with a large number of EDQ fit as covariables, using simulated and commercial data sets.


    MATERIALS AND METHODS
 
Animal Care and Use Committee approval was not obtained for this study because the data were from simulated data sets or from an existing database in a commercial company.

Data Sets

The simulation assumed a 10-trait model with contemporary group, 160 covariables on 80 QTL, and animal genetic and residual effects. The total number of animals was 24,000 in 10 generations. All animals were assumed to have records. The values for each contemporary group effect and QTL effect were simulated from normal distributions N(0,10) and N(0,2), respectively; the first covariable for QTL i was generated as qi1 ~ UN(0,1), and the second covariable as qi2 = 1 – qi1. The QTL effects were simulated without imposing a realistic structure, because such a structure was considered unimportant from the computational point of view; this assumption could be indirectly evaluated by the results obtained with the commercial data set. Two data sets were simulated, according to the variances shown in Table 1. The first set had low correlations among the traits, whereas the second set had high correlations.
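The covariable scheme described above can be illustrated with a short sketch. It is not the simulation program used in the study: the function names and the seed are hypothetical, UN(0,1) is assumed to denote a uniform distribution on the unit interval, and N(0,2) is assumed to mean a variance of 2.

```python
import random


def simulate_qtl_covariables(n_animals, n_qtl, seed=1):
    """Return (covariables, effects) for a single trait.

    covariables[a] holds 2 * n_qtl values for animal a: for QTL i the first
    value q_i1 is drawn uniformly from [0, 1) and the second is
    q_i2 = 1 - q_i1.
    effects holds one simulated regression coefficient per covariable.
    """
    rng = random.Random(seed)
    # Assumption: N(0,2) is read as mean 0 and variance 2 (SD = sqrt(2)).
    sd = 2.0 ** 0.5
    effects = [rng.gauss(0.0, sd) for _ in range(2 * n_qtl)]
    covariables = []
    for _ in range(n_animals):
        row = []
        for _ in range(n_qtl):
            q1 = rng.random()
            row.extend((q1, 1.0 - q1))
        covariables.append(row)
    return covariables, effects


def qtl_contribution(row, effects):
    """Sum of covariable * effect over all covariables of one animal."""
    return sum(q * e for q, e in zip(row, effects))


# 80 QTL give the 160 covariables per animal mentioned in the text.
covariables, effects = simulate_qtl_covariables(n_animals=5, n_qtl=80)
```

Each animal's contribution from the QTL covariables is then obtained with `qtl_contribution(covariables[a], effects)`.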


Table 1. (Co)variance structures of the 2 simulated data sets
 
The commercial data set included records on approximately 110,000 animals for 11 growth, reproduction, and other traits. The model included the effects usually fit for these traits, for a total of 57, as well as 116 covariables attributable to EDQ; on average, only 38 covariables were fit for one trait, with a minimum of 32 and a maximum of 44. The EDQ were fitted as individual random covariates for additive and dominance effects, with priors based on estimated marker effects and their standard errors, from analyses by Kerr and Kinghorn (1996). Many covariables were used for multiple traits. The correlations among the traits varied from –0.3 to 0.8.

Preconditioner

Let the system of equations be


$$\mathbf{A}\mathbf{x} = \mathbf{b}.$$

In PCG, one solves the system


$$\mathbf{M}^{-1}\mathbf{A}\mathbf{x} = \mathbf{M}^{-1}\mathbf{b},$$

where M is a preconditioner. It is desired that M be close to A but easy to invert. The simplest preconditioner is a diagonal preconditioner M = diag(A), with which PCG seems to converge for many models used in animal breeding, although the convergence for more complicated models can be slow (Tsuruta et al., 2001). Let A = {Aij}, where Aij is the block corresponding to the ith set of rows and jth set of columns. The block preconditioner is


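$$\mathbf{M} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_{22} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{A}_{nn} \end{bmatrix}.$$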

Blocks can be due to traits, resulting in dense blocks of t × t matrices, where t is the number of traits, or they can be due to sets of several effects, for example, all fixed effects with a low number of levels (Strandén et al., 2002).
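A minimal NumPy sketch of a block-diagonal preconditioner follows. It is only an illustration on a small dense matrix, not the BLUP90IOD implementation (which iterates on data and never forms A explicitly); the function name, the test matrix, and the block sizes are hypothetical. The sketch also checks that the diagonal blocks of the preconditioned matrix M⁻¹A become identity matrices.

```python
import numpy as np


def block_diagonal_preconditioner(A, block_sizes):
    """Return M = blockdiag(A_11, ..., A_nn) for consecutive diagonal blocks."""
    if sum(block_sizes) != A.shape[0]:
        raise ValueError("block sizes must sum to the order of A")
    M = np.zeros_like(A)
    start = 0
    for size in block_sizes:
        stop = start + size
        # Copy the diagonal block; off-diagonal blocks stay zero.
        M[start:stop, start:stop] = A[start:stop, start:stop]
        start = stop
    return M


# Small symmetric positive definite matrix standing in for a left-hand side.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
A = X.T @ X + np.eye(6)

block_sizes = [2, 4]  # hypothetical: one 2 x 2 block and one 4 x 4 block
M = block_diagonal_preconditioner(A, block_sizes)

# Preconditioned matrix M^{-1} A, formed explicitly only for inspection.
P = np.linalg.solve(M, A)
```

Here `P[:2, :2]` and `P[2:, 2:]` equal identity matrices up to rounding, whereas the off-diagonal blocks of `P` are generally nonzero.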

Assume a system of equations resulting from multiple-trait models with t traits. A PCG program with a diagonal preconditioner requires 5 variables per equation, including one for the preconditioner. A PCG program with t × t blocks would use 4 variables per equation plus t(t + 1)/2 variables per t equations (assuming half-storage), for a total of 4 + (t + 1)/2 variables per equation. The memory requirement relative to the diagonal preconditioner is [4 + (t + 1)/2]/5, which is an increase of 20% for a 3-trait model or 2 times for an 11-trait model. Numerical stability in the PCG requires that the 4 variables be in double precision; however, the preconditioner in simpler models can be in single precision, resulting in a further decrease in the memory requirements.
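The storage count can be checked with a few lines of arithmetic. In this sketch the t(t + 1)/2 stored elements of each half-stored t × t block are spread over the t equations that share the block, which gives (t + 1)/2 preconditioner elements per equation; the function names are hypothetical.

```python
def variables_per_equation(t, block=True):
    """Working variables per equation in a PCG solver, using the counts in the text."""
    if not block:
        # 4 work vectors plus 1 diagonal preconditioner element.
        return 5.0
    # 4 work vectors plus the half-stored t x t block shared by t equations.
    return 4.0 + (t * (t + 1) / 2.0) / t


def memory_ratio(t):
    """Memory with t x t blocks relative to the diagonal preconditioner."""
    return variables_per_equation(t) / variables_per_equation(t, block=False)
```

For t = 11 the ratio is 2.0, and for t = 10 it is 1.9, in line with the nearly doubled memory reported for BT in the 10-trait analyses.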

The preconditioner in PCG is not used directly, as above, but indirectly in a multiplication:


$$\mathbf{z} = \mathbf{M}^{-1}\mathbf{r},$$

where r and z are vectors as in Tsuruta et al. (2001). The vector z can also be obtained indirectly by solving


$$\mathbf{M}\mathbf{z} = \mathbf{r}.$$

The first form requires the inversion to be done just once, but may be less accurate numerically. The second form does not require an inversion and may involve fewer computations and thus be numerically more stable, for example, allowing one to use single rather than double precision for M. If some diagonal blocks of M are large but sparse, the second form can use sparse storage and solve by either a sparse factorization (e.g., Misztal and Perez-Enciso, 1998) or an iterative method. However, the PCG algorithm is sensitive to numerical errors, and incomplete convergence of an iterative solution in the second form can result in divergence.
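The two ways of applying the preconditioner can be compared in a small dense PCG sketch. This is an illustration under stated assumptions, not the BLUP90IOD code: the function names are hypothetical, the convergence criterion (squared norm of the residual relative to the squared norm of the right-hand side) is assumed, and a dense Cholesky factor stands in for the sparse factorization mentioned in the text.

```python
import numpy as np


def pcg(A, b, apply_preconditioner, tol=1e-12, max_rounds=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    apply_preconditioner(r) must return z such that M z = r.
    Assumed criterion: stop when ||b - A x||^2 / ||b||^2 < tol.
    Returns the solution and the number of rounds used.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_preconditioner(r)
    p = z.copy()
    rz = r @ z
    bb = b @ b
    for round_no in range(1, max_rounds + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if (r @ r) / bb < tol:
            return x, round_no
        z = apply_preconditioner(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_rounds


def first_form(M):
    """Invert M once; each round computes the product z = M^{-1} r."""
    M_inv = np.linalg.inv(M)
    return lambda r: M_inv @ r


def second_form(M):
    """Factor M = L L' once; each round solves M z = r without forming M^{-1}."""
    L = np.linalg.cholesky(M)
    # np.linalg.solve is a general solver; a dedicated triangular or sparse
    # solver would normally be used for these two steps.
    return lambda r: np.linalg.solve(L.T, np.linalg.solve(L, r))


# Small symmetric positive definite example with two 4 x 4 diagonal blocks.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
A = X.T @ X + np.eye(8)
b = rng.normal(size=8)
M = np.zeros_like(A)
M[:4, :4] = A[:4, :4]
M[4:, 4:] = A[4:, 4:]

x1, rounds1 = pcg(A, b, first_form(M))
x2, rounds2 = pcg(A, b, second_form(M))
x_direct = np.linalg.solve(A, b)
```

Both forms should reach essentially the same solution; they differ in the work done per round and in how rounding errors enter the computation of z.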

Preconditioning requires extra computations. For direct (finite) solving or dense matrix inversion, the cost is cubic in the size of the block. Therefore, the relative cost of preconditioning is likely to be negligible for small blocks but can increase rapidly with large blocks. For example, if for a block size of m the inversion takes 20% of the time of one round of PCG, for a block size of 2m the inversion would take approximately 67% of that time, and the time per round would increase by 2.4 times. Similarly, for a block size of m/2, the inversion would take only approximately 3% of the time of one round of PCG, and the time per round would decrease by approximately 18%. For block preconditioning to be worthwhile, the improvement in the convergence rate must exceed the increase in computations per round.
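The scaling argument can be reproduced with a small calculation. The sketch assumes that only the preconditioning part of a round changes with block size, that it scales strictly with the cube of the block size, and that the rest of the round is unchanged; the function name and parameters are hypothetical.

```python
def time_per_round(scale, precond_share=0.20, exponent=3):
    """Relative time per round when the block size is multiplied by `scale`.

    The baseline round (block size m) costs 1.0, of which `precond_share`
    is preconditioning. That part is assumed to scale as scale**exponent,
    and the remainder is assumed to stay constant.
    Returns (new time per round, preconditioning share of the new round).
    """
    precond = precond_share * scale ** exponent
    rest = 1.0 - precond_share
    total = precond + rest
    return total, precond / total


doubled_time, doubled_share = time_per_round(2.0)
halved_time, halved_share = time_per_round(0.5)
```

Under these assumptions, doubling the block size gives 2.4 times the baseline time per round, with two-thirds of the round spent on preconditioning, whereas halving the block size gives 0.825 of the baseline time, with about 3% spent on preconditioning.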

Models of Analysis

Computing was by the modified BLUP90IOD program (Misztal et al., 2002). Modifications included a block preconditioner for traits (BT) and a block preconditioner for EDQ (BQ). The BT and BQ were implemented by using the first form (inversion). The data sets were analyzed in several combinations. Changes to the models included analyzing the first trait only or all traits, and including or excluding EDQ. Preconditioners included diagonal, BQ, BT, and BT + BQ. Computing was on a 32-bit processor with a clock speed of 2.8 GHz. The convergence criterion was set at 10^-12. Values recorded were the amount of memory needed, the number of iterations, and the computing time.


    RESULTS AND DISCUSSION
 
Numbers of iterations, computing times, and memory requirements for the simulated data sets with low and high correlations among the traits are presented in Table 2. Compared with a single-trait model and no EDQ in the model, the increase to 10 traits resulted in approximately doubling the number of rounds for the data set with low correlations and increasing the number of rounds by more than 25 times for the data set with high correlations. The computing time per round was approximately 6 times greater for either data set.


Table 2. Number of iterations (upper number), processing time (middle number), and memory use (lower number) when using simulated data with effects attributable to QTL for 1 and 10 traits
 
Fitting EDQ in the 10-trait model resulted in approximately doubling the number of rounds and increasing the computing time by up to 25 times. Applying BQ alone resulted in a decrease in the number of rounds by approximately 3 times; however, the computing times were greater than without BQ because of a greater cost per round. That greater cost and the more than doubled memory requirement were due to a large block size of 1,600. Applying BT alone resulted in a decrease in the number of rounds by approximately 30% for the data set with low correlations and by approximately 7 times for the data set with high correlations; however, the memory requirements almost doubled. Applying both BT and BQ resulted in the timing and the number of rounds being approximately the same as if the effect of these 2 modifications were compounded.

Results with the commercial data set are shown in Table 3. In general, they are relatively similar to those with the simulated data set with low correlations. However, the numbers of rounds were several times greater, and the computing times and memory requirements were less affected by EDQ. These differences were due to a more complicated model and to a smaller number of EDQ per trait; consequently, the ratio of EDQ effects to non-EDQ effects was lower. The block in BQ in the commercial data set was sparse, because only a fraction of the EDQ was fit for each trait. Large reductions in the number of rounds and computing time with BQ could be obtained by using the second form of the preconditioning with a sparse Cholesky decomposition, for example, by FSPAK90 (Misztal and Perez-Enciso, 1998).


Table 3. Number of iterations (upper number), processing time (middle number), and memory use (lower number) when using real data with effects attributable to QTL for 11 traits
 
In general, the convergence rate with BT should be similar to, but not better than, that for a single-trait model, and the convergence rate with BQ should be no better than that with the model without EDQ. In several cases, this seemed not to be so; for example, the number of rounds with EDQ and BQ was lower than for the model with no EDQ. These anomalies could partly be due to specific confounding of EDQ for that specific trait with the genetic effect and could partly be due to properties of the convergence criterion, which in PCG exhibits large fluctuations from round to round. Strandén and Lidauer (1999) used quasi-true EBV to assess properties of convergence criteria in PCG. Weller et al. (2003) found that the estimated effect of a major gene modeled as a covariable decreased close to 0 when the additive effect was in the model. In fact, the true additive polygenic effect is the sum of all additive effects of all genes, and if polygenic effects have high accuracy, they could take over variations of single-gene effects.

In the simulation, the covariables generated for EDQ had a random, nonrealistic structure. With the commercial data set, the increase in the number of rounds when EDQ were fit was very similar to that with the simulated data set in multiple-trait models, although there were differences in single-trait models where the computing time was trivial. One explanation is that, with the commercial data set, the animal effect took over variation from some EDQ, reducing EDQ to nearly random variation, similar to that in the simulated data set. Therefore, the method of simulation most likely did not have an important impact on conclusions from this study.

In conclusion, the computing resources in a genetic evaluation involving a large number of EDQ fit as covariables seemed to be reasonable with a procedure using the iteration on data and a diagonal preconditioner. Using the block preconditioner for EDQ seemed to have a limited impact on computing time. The block preconditioner for traits had a dramatic influence on computing time when correlations among traits were very high, and a smaller but noticeable influence otherwise. The increase in memory requirements with both preconditioners was moderate.

Models as discussed here are being replaced by models of "genomic selection" in which tens of thousands of SNP or haplotype effects are considered (Meuwissen et al., 2001). In such models, the number of effects is very large, the number of equations is smaller, and the system of equations is fairly dense. Legarra and Misztal (2008) evaluated several computing methodologies useful in genomic selection and found that the PCG algorithm with the diagonal preconditioner was both efficient and stable. The conclusions from this paper may apply to genomic selection if the number of useful SNP or haplotypes is small (e.g., 100) and fitting of the polygenic effects is justified, at least for some traits.

1 Corresponding author: shogo{at}uga.edu

Received for publication June 4, 2007. Accepted for publication February 19, 2008.


    LITERATURE CITED
 


Dekkers, J. C. M. 2004. Commercial application of marker- and gene-assisted selection in livestock: Strategies and lessons. J. Anim. Sci. 82(E. Suppl.):E313–E328.

Fernando, R. L., and M. Grossman. 1989. Marker-assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467–477.

Haley, C. S., and S. A. Knott. 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315–324.

Kerr, R. J., and B. P. Kinghorn. 1996. An efficient algorithm for segregation analysis in large populations. J. Anim. Breed. Genet. 113:457–469.

Legarra, A., and I. Misztal. 2008. Computing strategies in genome-wide selection. J. Dairy Sci. 91:360–366.

Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829.

Misztal, I., and M. Perez-Enciso. 1998. FSPAK90—A Fortran 90 interface to sparse-matrix package FSPAK with dynamic memory allocation and sparse matrix structure. Pages 467–468 in Proc. 6th World Congr. Genet. Appl. Livest. Prod., Armidale, Australia. Vol. 27.

Misztal, I., S. Tsuruta, T. Strabel, B. Auvray, T. Druet, and D. H. Lee. 2002. BLUPF90 and related programs (BGF90). In Proc. 7th World Congr. Genet. Appl. Livest. Prod., Montpellier, France. CD-ROM Commun. 28:07.

Strandén, I., and M. Lidauer. 1999. Solving large mixed linear models using preconditioned conjugate gradient iteration. J. Dairy Sci. 82:2779–2787.

Strandén, I., S. Tsuruta, and I. Misztal. 2002. Simple preconditioners for the conjugate gradient method: Experience with test day models. J. Anim. Breed. Genet. 119:166–174.

Tsuruta, S., I. Misztal, and I. Strandén. 2001. Use of the preconditioned conjugate gradient algorithm as a generic solver for mixed-model equations in animal breeding applications. J. Anim. Sci. 79:1166–1172.

Weller, J. I. 2001. Quantitative Trait Loci Analysis in Animals. CABI, New York, NY.

Weller, J. I., M. Golik, E. Seroussi, E. Ezra, and M. Ron. 2003. Population-wide analysis of a QTL affecting milk-fat production in the Israeli Holstein population. J. Dairy Sci. 86:2219–2227.

