ANIMAL GENETICS
* Animal Breeding and Genomics Centre, Animal Sciences Group, PO Box 65, 8200 AB, Lelystad, the Netherlands;
Grange Beef Research Centre, Teagasc, Dunsany, Co. Meath, Ireland;
School of Agriculture, Food and Veterinary Medicine, College of Life Sciences, University College Dublin, Belfield, Dublin 4, Ireland; and
Irish Cattle Breeding Federation, Shinagh House, Bandon, Co. Cork, Ireland
| Abstract |
|---|
Key Words: accuracy, bias, data quality, genetic groups, multiple-breed genetic evaluations
| INTRODUCTION |
|---|
To account for heterogeneous means in founders, genetic groups in the relationship matrix are used in BLUP models (Westell et al., 1988). Such models are common in within-breed (Phocas and Laloe, 2004), across-breed (Sullivan et al., 1999), and international (Schaeffer, 1994) evaluations. In these circumstances the accuracy of an EBV across genetic group depends on the accuracy of the estimated genetic group effects, the accuracy of the EBV within genetic group, and any sampling covariances among these effects (Van Vleck et al., 1992). Bias in estimated genetic group effects or in EBV within genetic group may bias EBV across genetic group, and such bias may arise from sampling covariances among these effects.
Ideally the exact method would be used to calculate the accuracy of EBV within or across genetic group. Because this is infeasible in most national data sets, several methods to approximate accuracy within genetic group have been developed. These give biased approximations for certain data structures (e.g., Tier and Meyer, 2004). Unbiased approximations of accuracy within genetic group can be calculated using sampling (Garcia-Cortes et al., 1995; Fouilloux and Laloe, 2001). No method exists to approximate accuracy across genetic group, and the methods that exist for bias calculate it only retrospectively (Reverter et al., 1994).
The aim of this work was to extend the sampling method to evaluate genetic group effects and EBV within and across genetic groups, in terms of accuracy and bias, and to use this method to evaluate the Irish multiple-breed beef cattle data set.
| MATERIALS AND METHODS |
|---|
Accuracy of an EBV is calculated as the correlation between the EBV and its corresponding true breeding value. Bias is calculated as the difference between the EBV and its corresponding true breeding value. For animal i,
$$r_{wg_i} = \frac{\operatorname{cov}(\hat{u}_{wg_i}, u_{wg_i})}{\sqrt{\operatorname{var}(\hat{u}_{wg_i})\,\operatorname{var}(u_{wg_i})}}$$
where $r_{wg_i}$ is the accuracy of the EBV within genetic group ($\hat{u}_{wg_i}$) and $u_{wg_i}$ is the true breeding value within genetic group. Using a sampling process, the variances and covariances required for $r_{wg_i}$ can be estimated from the empirical distribution of $\hat{u}_{wg_i}$ and $u_{wg_i}$ (Fouilloux and Laloe, 2001), and bias can be detected using the mean error across all samples.
When genetic groups are used in the relationship matrix, $u_{ag_i} = b_i + u_{wg_i}$ and $\hat{u}_{ag_i} = \hat{b}_i + \hat{u}_{wg_i}$, where $u_{ag_i}$ is the true breeding value across genetic group and $b_i$ is the true genetic group effect for animal i. The sampling method can be extended to account for this, so that the accuracy and bias of $\hat{b}_i$ and $\hat{u}_{ag_i}$, and of comparisons among different $\hat{b}_i$ and $\hat{u}_{ag_i}$, can be calculated.
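The definitions above can be sketched in a few lines. The following is a minimal illustration, not the authors' Fortran program: it assumes that true and estimated values have been stored as arrays with one row per replicate and one column per animal, and the function name `accuracy_and_bias` and the toy numbers are hypothetical.

```python
import numpy as np

def accuracy_and_bias(true_values, estimates):
    """Per-animal accuracy (correlation) and bias (mean error) across replicates.

    Both inputs have shape (n_replicates, n_animals). This layout is an
    assumption made for the sketch, not something taken from the paper.
    """
    true_values = np.asarray(true_values, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    t = true_values - true_values.mean(axis=0)   # centred true values
    e = estimates - estimates.mean(axis=0)       # centred estimates
    cov = (t * e).sum(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (e ** 2).sum(axis=0))
    accuracy = cov / denom                       # correlation over replicates
    bias = (estimates - true_values).mean(axis=0)  # mean of (estimate - true)
    return accuracy, bias

# Toy data only: invented numbers, not results from the Irish data set.
rng = np.random.default_rng(1)
n_rep, n_animals = 350, 5
b = rng.normal(0.0, 1.0, size=(n_rep, n_animals))      # true group effect
u_wg = rng.normal(0.0, 1.0, size=(n_rep, n_animals))   # true within-group BV
b_hat = b + rng.normal(0.0, 0.5, size=b.shape)         # noisy group estimate
u_wg_hat = 0.5 * u_wg + rng.normal(0.0, 0.3, size=u_wg.shape)

acc_wg, bias_wg = accuracy_and_bias(u_wg, u_wg_hat)
acc_ag, bias_ag = accuracy_and_bias(b + u_wg, b_hat + u_wg_hat)
print(acc_wg.round(2), acc_ag.round(2))
```

The same function serves for the group effect, the within-group EBV, and the across-group EBV, because each is simply a different pair of true and estimated arrays.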
Methodology
Simulation of True Genetic Values. In a genetic evaluation model with genetic groups, all animals in the pedigree trace to a founder genetic group. A matrix B of dimension g × n is simulated, containing the founder mean value of each of the g genetic groups represented in the pedigree for each of the n traits. For each trait, the founder mean values for each genetic group are simulated from a normal distribution with a mean of zero and a variance equal to the breed variance for that trait in the population under analysis.
For each animal i in the pedigree, a vector $u_{ag_i} = b_i + u_{wg_i}$ of true breeding values for each of the n traits is simulated, depending on the status of its parents j and k. If both j and k are unknown, then each element of $b_i$ is set to the average of the values of its founder genetic groups and $u_{wg_i}$ is simulated as $L_G z$, where $L_G$ is obtained by Cholesky decomposition of $V_G$, the genetic covariance matrix of the traits, and z is a randomly sampled multivariate vector with a mean of zero and a covariance matrix I.
If one parent, say j, is known, then $b_i$ is the average of the genetic group value of the known parent, $b_j$, and the founder genetic group value for the unknown parent, $B_g$, and $u_{wg_i}$ is simulated conditional on the within-group breeding value of the known parent. If both parents j and k are known, then $b_i$ is the average of $b_j$ and $b_k$, and $u_{wg_i}$ is simulated conditional on the within-group breeding values of both parents. This results in a matrix of true breeding values with a distribution $N(Qm, A \otimes V_G)$, where Q is an incidence matrix relating animals to m, the means of the founder genetic groups of which they are composed, and A is the relationship matrix between all animals in the pedigree.
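The recursion through the pedigree can be sketched as follows. This is a hedged illustration rather than the authors' program: the Mendelian sampling scaling (1, 0.75, or 0.5 of $V_G$ for zero, one, or two known parents) is the textbook form for non-inbred parents and is an assumption here, as are the function name, the `-1` coding for unknown parents, and the toy pedigree.

```python
import numpy as np

def simulate_true_values(sire, dam, unknown_group, V_G, breed_var, rng):
    """Simulate group contributions b and within-group breeding values u_wg.

    sire, dam      : parent indices, -1 if unknown; parents precede offspring.
    unknown_group  : (n_animals, 2) founder group for an unknown sire/dam slot
                     (ignored where the parent is known).
    Assumes no inbreeding when scaling the Mendelian sampling term.
    """
    n_animals, n_traits = len(sire), V_G.shape[0]
    n_groups = int(unknown_group.max()) + 1
    # B: founder mean of each group for each trait, N(0, breed variance).
    B = rng.normal(0.0, np.sqrt(breed_var), size=(n_groups, n_traits))
    L_G = np.linalg.cholesky(V_G)
    b = np.zeros((n_animals, n_traits))
    u_wg = np.zeros((n_animals, n_traits))
    for i in range(n_animals):
        b_sum = np.zeros(n_traits)
        u_sum = np.zeros(n_traits)
        n_known = 0
        for slot, parent in enumerate((sire[i], dam[i])):
            if parent >= 0:                      # known parent
                b_sum += b[parent]
                u_sum += u_wg[parent]
                n_known += 1
            else:                                # unknown parent: founder group mean
                b_sum += B[unknown_group[i, slot]]
        b[i] = 0.5 * b_sum
        z = rng.standard_normal(n_traits)
        # Mendelian sampling variance: (1 - 0.25 * n_known) * V_G, no inbreeding.
        u_wg[i] = 0.5 * u_sum + np.sqrt(1.0 - 0.25 * n_known) * (L_G @ z)
    return B, b, u_wg

# Toy pedigree: animals 0 and 1 are founders of groups 0 and 1.
sire = np.array([-1, -1, 0, 0, 2])
dam = np.array([-1, -1, 1, -1, 3])
unknown_group = np.array([[0, 0], [1, 1], [-1, -1], [-1, 1], [-1, -1]])
V_G = np.array([[1.0, 0.3], [0.3, 2.0]])
breed_var = np.array([0.5, 1.0])
B, b, u_wg = simulate_true_values(sire, dam, unknown_group, V_G, breed_var,
                                  np.random.default_rng(7))
u_ag = b + u_wg   # true breeding values across genetic group
```

Processing animals in birth order keeps the loop to a single pass, since each parent's values exist before its offspring are reached.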
Simulation of Phenotypic Values. A vector $y_i$ of phenotypic values for each trait is generated for each animal i as $y_i = u_{ag_i} + e_i$, where $e_i = L_E z$ is a vector of random residual values for each trait, and $L_E$ is obtained by Cholesky decomposition of $V_E$, the residual covariance matrix for the traits. Values of fixed effects do not affect the distribution of random variables (Garcia-Cortes et al., 1995) and are simulated with values of zero.
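A minimal sketch of the phenotype step, under the same hedges as any illustration here: the function name and the optional `recorded` mask are hypothetical. The mask is included only because animals in the real data have records on subsets of traits, and the toy values are invented.

```python
import numpy as np

def simulate_phenotypes(u_ag, V_E, rng, recorded=None):
    """Return y = u_ag + e with e ~ N(0, V_E) per animal; fixed effects are zero.

    u_ag     : (n_animals, n_traits) true breeding values across genetic group.
    recorded : optional boolean array of the same shape; unrecorded traits
               are set to NaN (an assumption of this sketch).
    """
    L_E = np.linalg.cholesky(V_E)
    z = rng.standard_normal(u_ag.shape)
    e = z @ L_E.T                 # row i equals (L_E z_i) transposed
    y = u_ag + e
    if recorded is not None:
        y = np.where(recorded, y, np.nan)
    return y

# Toy example with invented values.
u_ag_demo = np.array([[0.4, -1.2], [1.1, 0.3], [-0.7, 0.9]])
V_E = np.array([[2.0, 0.5], [0.5, 3.0]])
recorded = np.array([[True, True], [True, False], [False, True]])
y = simulate_phenotypes(u_ag_demo, V_E, np.random.default_rng(3), recorded)
print(y)
```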
Estimation of Genetic Group Values and Breeding Values. Mixed model equations are set up using genetic groups in the relationship matrix and appropriate fixed effects and solved to obtain estimated genetic group values and EBV across genetic group. The EBV within genetic group are obtained by subtracting estimated genetic group values, weighted by the appropriate breed composition, from the EBV across genetic group.
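The subtraction in the last sentence can be written compactly. The sketch below assumes a breed composition matrix Q with one row per animal and one column per genetic group; the function name and all numbers are illustrative and are not taken from the evaluation.

```python
import numpy as np

def ebv_within_group(ebv_across, Q, group_estimates):
    """EBV within group = EBV across group minus Q times the group estimates.

    ebv_across      : (n_animals, n_traits)
    Q               : (n_animals, n_groups) breed composition, rows sum to 1
    group_estimates : (n_groups, n_traits)
    """
    return ebv_across - Q @ group_estimates

# Invented toy values: two groups, one trait, three animals.
Q = np.array([[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]])
g_hat = np.array([[10.0], [20.0]])
ebv_ag = np.array([[11.0], [14.0], [18.0]])
ebv_wg = ebv_within_group(ebv_ag, Q, g_hat)
print(ebv_wg.ravel())   # 1.0, -1.0, 0.5
```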
Sampling Process and Calculation of True Accuracies. The whole process is repeated several times, and the accuracy of the genetic group effects, the accuracy of the EBV within genetic group, the accuracy of the EBV across genetic group, and the accuracies of comparisons among the different genetic groups are calculated as the correlations between the true and estimated values across all of the replicates. Bias in these estimates is calculated as the average of the differences in the true and estimated values across all of the replicates. As the number of replications increases, estimates of accuracy and bias converge to their true values. To determine the number of replicates required to negate the effects of sampling error, confidence intervals for different values of true accuracy for different numbers of replicates were calculated using Fisher z transformations (Figure 1). As the true accuracy decreased, the confidence interval widened. As a compromise between computing time and sampling error, we deemed 350 replicates to be sufficient, because the confidence interval varied between 0.002 and 0.104.
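The Fisher z calculation can be sketched as below. This is the standard textbook interval for a correlation estimated from n independent pairs; the 95% level and the function name are assumptions of the sketch, and how the interval was summarized as a single number in Figure 1 is not specified in the text, so no attempt is made to reproduce the quoted values.

```python
import math

def fisher_z_interval(r, n, z_crit=1.959964):
    """Approximate confidence interval for a correlation r based on n replicates.

    Uses z = atanh(r) with standard error 1 / sqrt(n - 3), then back-transforms.
    z_crit defaults to the two-sided 95% normal quantile.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

for n in (200, 350):
    for r in (0.10, 0.50, 0.90, 0.99):
        lower, upper = fisher_z_interval(r, n)
        print(f"n={n:3d}  r={r:.2f}  interval=({lower:.3f}, {upper:.3f})")
```

Because the interval is symmetric on the z scale but not on the correlation scale, it narrows sharply as the true accuracy approaches 1, which matches the pattern described above.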
This method was applied to the Irish multiple-breed beef cattle data set used for the January 2007 routine national genetic evaluation, which evaluated the direct genetic effects for 15 traits using data on purebred and crossbred animals of 35 breeds, of which 8 breeds dominated. Maternal effects are currently not included in the Irish beef genetic evaluation due to data limitations. Of the 15 traits, 8 (carcass weight, carcass conformation, carcass fatness, cull cow BW, weaning weight, live BW, feed intake, and calf quality) were breeding goal traits, and 7 were correlated linear type traits included as predictors. Most of the 493,092 animals with records on at least one trait had information only on subsets of traits (Table 1), with different breeds tending to have records on particular subsets of traits. Most of the information for some breeds and traits came from crossbreds; for others, it came from purebreds. For example, data on carcass weight, conformation, and fatness were dominated by Holsteins. Data for these traits in most of the other breeds were from the offspring of sires of these other breeds and Holstein dams. Feed intake was primarily recorded on purebred sires of terminal sire breeds at a central performance test station under carefully controlled data recording and environmental conditions. Linear type traits were primarily recorded on purebred Charolais and Limousin. With the exception of the feed intake and some weaning and live BW records, all data were recorded under field conditions.
Computation was carried out on several computers, one of which was a 64-bit PC with a 2.40-GHz AMD Opteron dual-core processor and 8 gigabytes of RAM. A program was written in Fortran 90 to simulate the true breed effects, true breeding values, and phenotypes. Mixed model equations were solved using a version of PEST (Groeneveld, 1990) compiled for a 32-bit PC.
| RESULTS |
|---|
On the 64-bit PC, 4 replicates could be run simultaneously in 462 min, with the solving of the mixed model equations taking 98.6% of this time. All 350 replicates could have been completed on this machine in 674 h.
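The 674 h figure follows directly from the batch time. A short check of the arithmetic, assuming run time scales in proportion to the number of batches of 4 replicates (the variable names are illustrative):

```python
replicates = 350
parallel = 4                 # replicates run simultaneously
minutes_per_batch = 462
solver_share = 0.986         # fraction of the time spent solving the equations

batches = replicates / parallel                    # 87.5 batches
total_hours = batches * minutes_per_batch / 60     # 673.75 h, about 674 h
solver_minutes = solver_share * minutes_per_batch  # about 455.5 min per batch

print(f"{batches} batches, {total_hours:.2f} h in total, "
      f"{solver_minutes:.1f} min per batch in the solver")
```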
Bias and Accuracy of Estimated Effects
Results presented here cover the 7 numerically most important breeds in the data set for 6 breeding goal traits. Results are not presented for the minor breeds, for the linear type traits, or for carcass weight and carcass fatness. Carcass weight and carcass fatness have data structure almost identical to carcass conformation, with any differences in accuracy and bias being due to differences in relevant variance components and correlations. Accuracy and bias of estimated genetic group values, of the comparisons between different estimated genetic group values, and of EBV within genetic group and across genetic group of AI sires are presented.
Bias and Accuracy of Genetic Group Effects. Small biases in the estimated genetic group effects were observed for most traits (Table 4). When expressed as a percentage of the phenotypic SD of the traits involved, most of these were not important (Table 2) and only in 3 cases were the biases significantly different from zero (P < 0.05): carcass conformation for the Simmental genetic group, and calf quality for the Belgian Blue and Hereford genetic groups. For carcass conformation the bias ranged from 0.00 to 1.63% of the phenotypic SD. Calf quality was an exception. It displayed much more bias than the other traits, ranging from –10.31 to 5.85% of the phenotypic SD across the different estimated genetic group effects.
Average accuracies of EBV across genetic group ranged from 0.20 for calf quality in Simmental to 0.93 for carcass conformation in Holstein and were dependent upon the accuracy of the estimated genetic group effect and accuracy of EBV within genetic group. The accuracy of EBV across genetic group may benefit or suffer from having high or low accuracy of the estimated genetic group effects or from having high or low accuracy of EBV within genetic group. For example, average accuracy of EBV within genetic group for carcass conformation (0.77) and weaning weight (0.79) in Limousin have similar values, but the large differences in the values of the accuracy of the estimated genetic group effect for these traits (carcass conformation: 0.99, weaning weight: 0.57) create large differences in average accuracy of EBV across genetic group: 0.90 and 0.75.
| DISCUSSION |
|---|
Accuracy of EBV Across Genetic Group
The accuracy of EBV across genetic group of an animal was determined by the accuracy of the estimated genetic group effects of the genetic groups that constitute the animal and the animal's accuracy of EBV within genetic group. The relative importance of these 2 components depended on the ratio of breed to genetic variance, the accuracy with which these components are estimated, and the sampling covariances among the genetic group effects and between the animal's individual genetic effect and the genetic group effects. Where there is poor partitioning of these effects, the sampling covariances among them may become important.
The accuracy of the estimated genetic group effect and the accuracy of EBV across genetic group provide information about how well these effects are estimated for individuals. However, as selection decisions involve comparing alternatives, the accuracy of comparisons among these effects may be more important. Publishing the accuracies of every pair-wise comparison would be cumbersome under the current system, in which an active bull list of the top 75 sires for total beef merit is published in Ireland. Web-based delivery of sire breeding advice as part of customized selection indices (Garrick, 2005) would make the delivery of these accuracies feasible.
Advantages of the Methodology
The quality of genetic evaluations depends on the quality of the data used in the analysis and on the quality of the model used to analyze it. The quality of models can be assessed with goodness-of-fit measures such as R², the Akaike information criterion (Akaike, 1973), and the Bayesian information criterion (Schwarz, 1978), or based on the predictive ability of models by estimating effects on a randomly selected portion of the data and then using these estimated effects to predict records for the remaining data (e.g., Stone, 1974; Perez-Enciso et al., 1993; Olesen et al., 1994; Urioste et al., 2003).
Assessing the quality of the data involved in genetic evaluations has received less attention than the assessment of the quality of models, and the methods developed do not provide complete assessment. Current methods to check data quality use basic statistics, determine the consistency of results and sire variances from consecutive evaluations, check for trends in Mendelian sampling (Klei et al., 2002), or approximate the accuracy of individual EBV within genetic group (e.g., Tier and Meyer, 2004). These generate summary statistics that are difficult to interpret, or give incomplete assessment, or fail to account for all of the issues known to affect data quality. Data mining techniques are being developed to assess data quality in genetic evaluations (Banos et al., 2003), but although these promise undoubtedly powerful insights, they require specialist training.
The sampling method used in this study assesses the data quality in genetic evaluations using statistical parameters and techniques with which practitioners of animal breeding are familiar. It takes full account of issues influencing data quality, such as effective numbers of direct and correlated records contributing to an animal's EBV, any number of fixed effects each with any number of levels, missing records, and connectedness among animals and sub-populations.
In addition, the sampling method can be used to assess a breeding program in terms of its potential to provide genetic improvement. Deterministic methods can be used to predict the genetic gain within a breeding program (Wray and Hill, 1989) and these have been expanded to situations involving crossbred as well as purebred data (e.g., Bijma and van Arendonk, 1998). However, deterministic methods have some drawbacks. The sampling approach used in this study can take full account of the structure existing in the data to determine the accuracy of EBV. The potential response to selection is proportional to the accuracy of EBV (Falconer and Mackay, 1996). In a breeding program involving multiple genetic groups, response to selection is affected by both the accuracy of EBV within genetic group and the accuracy of EBV across genetic groups. Although the accuracy of EBV within genetic group may be acceptable (e.g., for feed intake in Holstein due to correlated information) and genetic gain can be made within a breed, the accuracy of the estimated genetic group effect and, consequently, the accuracy of EBV across genetic groups may not be acceptable, and the efficiency of across-breed selection would be reduced.
The sampling method allows alternative data recording scenarios to be tested. Phenotypic records could be simulated for animals in the pedigree that do not have phenotypes in the real data to determine the effect that recording this information would have on the accuracy and bias of the estimated effects before embarking on potentially expensive data recording. By looking at areas of strength and weakness for the different traits, suggestions could be made about where the priorities lie for efforts to increase or decrease the effective numbers of records.
The sampling method calculates accuracy and bias assuming that the true genetic model is used in the evaluations. The evaluation model is obviously a simplification of the real genetic model, and its parameters are only estimates. For example, in the Irish model no account is taken of heterogeneity of variances in the different genetic groups or of effects of nonadditive genetic variance. Modeling of these effects is difficult because of the number of parameters to be estimated and the quality of the data needed for these estimations (e.g., Lutaaya et al., 2002; Legarra et al., 2007). In addition, because the method is based on BLUP, it does not account for possible biases due to unrecorded selection or unrecorded preferential treatment of animals. The true accuracy and bias of genetic evaluations might therefore be somewhat different from the calculated values.
The computing required for this analysis was extensive, especially for the calculation of accuracies for routine genetic evaluations. However, several steps can be taken to reduce the computing time required and therefore allow routine application. Using more modern breeding value estimation software could reduce the overall time requirements. Using MiX99 (Lidauer et al., 2006) compiled for a 64-bit PC, the mixed model equations for each replicate could have been solved in 366 min instead of the 462 min required with PEST. To use this method to routinely assess the quality of genetic evaluations with genetic groups in the relationship matrix, the seed number used in the simulation and the solutions of the mixed model equations could be stored for each replicate. Then, for the subsequent genetic evaluation, one could update the files of simulated values with values simulated for new records included since the previous genetic evaluation and use the solutions of the matching replicate from the previous genetic evaluation as starting values for solving the mixed model equations. This study used 350 replicates to calculate accuracy and bias of estimated effects. Using fewer replicates would have reduced the computing time. For animals with a true accuracy of 0.50, reducing the number of replicates from 350 to 200 would have only increased the 95% confidence interval of their accuracy from 0.07 to 0.10 (Figure 1).
Bias and Accuracy of Estimated Effects in the Irish Beef Cattle Population
The Irish multiple-breed beef cattle data set is extremely unbalanced. Certain breeds are only mated with certain others, purebred data are unavailable for certain breeds, and the amount of data recorded on the different traits differs within breeds. In spite of such imbalances the genetic evaluation model applied to the Irish multiple-breed beef cattle population allows the estimation of breed effects and comparisons among breed effects without serious systematic bias. The large differences in accuracy of the estimated genetic group effects and in the accuracy of the comparisons among the genetic group effects reflect large differences in the quality and quantity of the information used to estimate these effects and comparisons. From the results it is clear that some feed intake records on Holstein animals are required to get a reasonable estimate of the breed effect. The EBV within breed are reasonably accurate due to correlated information; hence, there is no need for continuous recording of Holstein animals for feed intake. For the beef breeds the focus should be on increasing the effective numbers of records to improve the accuracy of estimated genetic group effects and consequently the accuracy of EBV across Limousin, Simmental, and Charolais breeds for cull cow BW, and Aberdeen Angus and Hereford for weaning weight.
The calf quality trait exhibited more bias than feed intake even though calf quality tended to have more records per sire in the different breeds. Feed intake was recorded on animals that were selected for performance testing based on their EBV for other traits. At the same time, performance testing was done by comparing animals from different breeds under equal circumstances. Calf quality was recorded on commercial animals sold for export. Although these were likely to be superior commercial animals, they were not selected on the basis of the other traits recorded. Therefore, better correction was likely for selective recording of feed intake compared with calf quality.
In conclusion, a method was developed to assess the quality of genetic evaluations in which animals from different genetic groups are compared, in terms of accuracy and bias of estimated effects and of comparisons among effects. Accuracy of EBV within genetic group and EBV across genetic group was very different in some instances, with the differences due to the accuracies of the estimated genetic group effects. Further theoretical work is required to quantify the effect of the ratio of breed group variance to genetic variance on the relative importance of the different components contributing to the accuracy of EBV across genetic group and to quantify the effect of sampling covariances among these components. This method can be used to calculate accuracy of multiple-breed EBV.
| Footnotes |
|---|
2 Corresponding author: John.Hickey{at}wur.nl
Received for publication October 12, 2007. Accepted for publication January 29, 2008.