J. Anim Sci.
J. Anim. Sci. 2004. 82:3447-3457
© 2004 American Society of Animal Science


ANIMAL GENETICS

Comparing alternative definitions of the contemporary group effect in Avileña Negra Ibérica beef cattle using classical and Bayesian criteria1

M. J. Carabaño*,2, A. Moreno*, P. López-Romero† and C. Díaz*

* Departamento de Mejora Genética Animal, INIA, Madrid, Spain; and † Unidad de Bioinformática, CBMSO-CSIC, Madrid, Spain


    Abstract
Data on weaning weight from 12,740 animals were used to compare different definitions of contemporary groups (CG) for the genetic evaluation of the Avileña Negra Ibérica beef cattle breed. Six alternative definitions of the CG effect were considered: herd-year-season of calving (HYS), with seasons defined according to the four natural seasons; herd-year-month of calving (HYM); herd clusters of 30 d (HC30-30) or 90 d (HC90-90); and adaptive herd clusters with two time limits, 30 and 90 d (HC30-90) and 30 and 180 d (HC30-180). A minimum of five observations in each CG class was required. This resulted in substantial differences in loss of information, ranging from 0.7% of the total number of records for HC30-180 to 14% for HYM. Several classical statistics and Bayesian criteria for statistical model comparison were used. The use of classical criteria, such as the between- and within-CG variation and the accuracy of prediction, can be controversial because of their dependence on the unknown variance components. Residual variance decreased as the time span associated with the definition of CG decreased. This was expected in this population because environmental conditions are highly variable throughout the year. However, estimates of the additive genetic variance for direct effects, which should not be affected by the definition of CG, were substantially larger for definitions involving longer time periods (HYS, HC90-90). When the parameters of the current evaluation procedure were used with all data sets, CG involving 30 d (HYM and HC30-30) were optimal in terms of providing the lowest within-CG and the largest between-CG variation. On the other hand, CG involving 90 d (HYS and HC90-90) yielded the poorest within-/between-CG variation, with only a slight improvement in accuracy of prediction of direct genetic values over the other definitions. Bayes factors and cross-validation predictive densities allowed for improved discrimination among models.
Models including CG spanning 30 d were more plausible and showed better predictive ability than models spanning 90 d. Adaptive CG showed intermediate results. Overall, the average time span resulting from the different definitions appears to have had a major effect on the ranking of models. However, from the breeder’s point of view, the loss of information associated with definitions involving shorter periods of time, such as HYM or HC30-30, might be unacceptable.

Key Words: Bayesian Analysis • Beef Cattle • Clusters • Contemporary Groups • Genetic Evaluation


    Introduction
Genetic evaluation models include the contemporary group (CG) effect to remove variation due to changes in herd environmental conditions over time. Establishing the period of time that identifies animals belonging to the same CG within a herd is controversial. Optimizing the definition of CG requires a balance between high accuracy and low bias. Genetic evaluations for weaning weight in the Avileña-Negra Ibérica population use predetermined seasons to form CG. A problem associated with this definition is that seasons are assigned arbitrarily and correspond neither to maximal accuracy nor to minimal bias. Alternative approaches have been proposed to address this problem (Wiggans et al., 1988; Schmitz et al., 1991; Crump et al., 1997). Those procedures require setting numerical values for the size and time span parameters of the CG that will result in an optimized definition.

Several ad hoc criteria have been used to compare alternative definitions of CG (Schmitz et al., 1991; Sivarajasingam, 1993; Crump et al., 1997; Van Bebber et al., 1998). These criteria include estimates of within-CG variance, residual variances, effective number of progeny, and accuracy of genetic evaluations. Criteria based on the likelihood have been explored to a lesser extent. In most situations, alternative models arising from different definitions of CG are not nested, so likelihood ratio tests are not appropriate. Akaike’s information criterion can be used in these cases, but it does not account for the amount of information available and tends to favor more complex models when the amount of data is sufficiently large. Bayesian analysis provides tools for model selection in a more general framework.

The goal of this study was to compare alternative definitions of CG for weaning weight of Avileña Negra Ibérica beef cattle using several criteria that include classical and Bayesian criteria for comparison of models.


    Materials and Methods
Data
Weaning weight records from 12,740 purebred animals of the Avileña-Negra Ibérica beef cattle breed, born between 1984 and 1999 in 83 herds, were used in this study. These records formed the genetic evaluation database for this breed. The Avileña Negra Ibérica is a Spanish local breed managed under very extensive conditions. Artificial insemination is not commonly used, except to generate genetic ties among herds for the genetic evaluations. In the data set used in this study, approximately 10% of sires of calves with records had progeny in more than one herd. Although additional genetic ties derive from grandparents and other relationships, a relatively low degree of connectedness among herds is expected in this data set. Herd size for this breed is relatively large, but a variable proportion of cows are bred to terminal sires from specialized beef breeds. The average size of herd-year classes in the data set, which only contains records from purebred animals, was 37 records. Five percent of the data were in herd-year classes with 10 or fewer records or in classes with 135 or more records. Calving distribution throughout the year is shown in Figure 1. Calvings are distributed relatively evenly over the year except in the summer months, when grazing conditions are very poor; fewer than 5% of calvings occur from June to August.



Figure 1. Pattern of calvings over the year in the data set.

 
The pedigree file included 21,483 animals. Totals of 325 bulls and 5,773 cows were sires and dams of animals with records. Of those, 77 bulls and 966 cows also had records as calves. One hundred seventy-eight bulls were sires and also maternal grandsires of animals with records, and 131 bulls were maternal grandsires of cows with their own record, which also were dams of calves with records. The pedigree structure and records were considered adequate for the estimation of maternal effects.

Models—Definition of CG
The following model, currently used in genetic evaluations for this breed, was used:


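$$y_{ijklno} = CG_i + sx_j + dage_k + sf_l + b \cdot age_n + ud_n + um_o + p_o + e_{ijklno} \qquad [1]$$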

where $y_{ijklno}$ is the weaning weight of animal $n$; $CG_i$ is the effect of the $i$th contemporary group; $sx_j$ is the effect of the sex of the animal ($j = 1, 2$); $dage_k$ is the effect of the $k$th class of age of the dam ($k = 1, \ldots, 6$), where the classes consisted of dams that were 2, 3, 4, 5 to 9, or >9 yr old at calving, or of unknown age; $sf_l$ is the effect of the $l$th class of supplementary feeding ($l = 1, 2$), i.e., animals receiving or not receiving supplementary feed before weaning; $b$ is the linear regression coefficient of weight on age of the calf at weaning ($age_n$); $ud_n$ is the direct additive genetic effect of animal $n$; $um_o$ is the maternal additive genetic effect of $o$, the dam of animal $n$; $p_o$ is the maternal permanent environmental effect of dam $o$; and $e_{ijklno}$ is the error term for record $y_{ijklno}$.

Six alternative definitions of CG were used. A minimum of five observations per CG class was required in all cases. Contemporary groups were formed within herd according to "conventional" seasons (spring, summer, autumn, and winter [HYS]) or months (HYM), or following the "natural" calving pattern. The CG classes for the "natural" clusters were obtained in two steps. In the first step, animals were sorted by birth date within herd. Starting from the first date of birth, a CG was formed if adding 30 or 90 d to the first birth date in the group resulted in a group that included five or more animals. Once the list of ordered birth dates had been read, some animals could not be assigned to any CG and were discarded. Two data sets resulted from this first step, HC30-30 and HC90-90, for natural fixed-period herd clusters spanning 30 or 90 d, respectively. In the second step, adaptive strategies involving two time limits were developed. Animals not assigned to any CG under the HC30-30 definition were assigned to the previous/subsequent CG if the difference between their date of birth and the date of birth of the first/last animal in that CG did not exceed 90 d, for the HC30-90 definition, or 180 d, for the HC30-180 definition. After this second step, some animals still remained unassigned and had to be discarded from the corresponding data sets.
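The two-step clustering described above can be sketched in Python for the birth dates of a single herd. This is a minimal illustration under stated assumptions, not the authors' Fortran 90 code. Birth dates are integer day numbers, and the window is inclusive, so a calf born exactly 30 or 90 d after the first one is included. An animal whose window holds fewer than five calves is left unassigned, and the window restarts at the next calf. When an unassigned calf qualifies for both neighboring CG, the previous CG is tried first. The function names are hypothetical.

```python
def fixed_period_clusters(days, span, min_size=5):
    """Fixed-period "natural" clusters within one herd (e.g., HC30-30, HC90-90).

    days: birth dates of one herd as day numbers, sorted ascending.
    Returns (groups, unassigned), both holding indices into `days`.
    Assumption: a window that is too small drops only its first animal.
    """
    groups, unassigned = [], []
    i, n = 0, len(days)
    while i < n:
        # Window opened by the first animal still to be placed (inclusive limit).
        j = i
        while j < n and days[j] <= days[i] + span:
            j += 1
        if j - i >= min_size:
            groups.append(list(range(i, j)))  # enough animals: form a CG
            i = j
        else:
            unassigned.append(i)              # leave this animal out, slide on
            i += 1
    return groups, unassigned


def adaptive_clusters(days, short_span=30, long_span=90, min_size=5):
    """Two-limit adaptive clusters (e.g., HC30-90 with long_span=90, HC30-180 with 180)."""
    groups, unassigned = fixed_period_clusters(days, short_span, min_size)
    # Boundaries of the step-1 groups decide which CG is "previous"/"subsequent".
    bounds = [(g[0], g[-1]) for g in groups]
    leftover = []
    for idx in unassigned:
        prev_k = max((k for k, (_, last) in enumerate(bounds) if last < idx), default=None)
        next_k = min((k for k, (first, _) in enumerate(bounds) if first > idx), default=None)
        if prev_k is not None and days[idx] - days[min(groups[prev_k])] <= long_span:
            groups[prev_k].append(idx)        # join previous CG: span from its first calf
        elif next_k is not None and days[max(groups[next_k])] - days[idx] <= long_span:
            groups[next_k].append(idx)        # join subsequent CG: span to its last calf
        else:
            leftover.append(idx)              # discarded from the data set
    return [sorted(g) for g in groups], leftover


days = [0, 1, 2, 3, 4, 50, 100, 101, 102, 103, 104, 300]
print(fixed_period_clusters(days, 30))   # ([[0, 1, 2, 3, 4], [6, 7, 8, 9, 10]], [5, 11])
print(adaptive_clusters(days, 30, 90))   # ([[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], [11])
```

Because each reassignment is checked against the current first (or last) calf of the receiving group, no adaptive CG in this sketch can span more than the second time limit.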

The six data sets with alternative definitions of CG were analyzed with Model [1]. Both classical approaches and Bayesian procedures were used. For the classical analyses, BLUP procedures were employed with the variance components used in the current evaluation system, which were obtained from the literature. These values were 111.7, 64.7, and −10.6 kg² for the additive genetic direct and maternal variances and the direct-maternal genetic covariance, respectively, 49.3 kg² for the maternal permanent environmental variance, and 363.3 kg² for the residual variance.
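As a quick check on what these current-evaluation parameters imply, the sketch below converts them into direct and maternal heritabilities and the direct-maternal genetic correlation. The text does not state how phenotypic variance was defined. Here it is assumed to be σ²_ad + σ²_am + σ_adm + σ²_p + σ²_e, the usual form for a model with direct and maternal effects; a different definition would shift the heritabilities slightly.

```python
import math

def maternal_model_ratios(var_ad, var_am, cov_adm, var_pe, var_e):
    """Heritabilities and genetic correlation for a direct-maternal animal model.

    Assumes phenotypic variance = var_ad + var_am + cov_adm + var_pe + var_e:
    the covariance enters once because 2 * cov(ud_calf, um_dam) = 2 * 0.5 * cov_adm
    for a non-inbred calf and its dam.
    """
    var_p = var_ad + var_am + cov_adm + var_pe + var_e
    h2_direct = var_ad / var_p
    h2_maternal = var_am / var_p
    r_dm = cov_adm / math.sqrt(var_ad * var_am)
    return var_p, h2_direct, h2_maternal, r_dm

# Current-evaluation parameters quoted in the text (kg^2).
var_p, h2d, h2m, r_dm = maternal_model_ratios(111.7, 64.7, -10.6, 49.3, 363.3)
print(f"sigma2_P={var_p:.1f}  h2_d={h2d:.3f}  h2_m={h2m:.3f}  r_dm={r_dm:.3f}")
# sigma2_P=578.4  h2_d=0.193  h2_m=0.112  r_dm=-0.125
```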

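For the Bayesian analyses, variances and other parameters involved in the model were considered unknown. Data were assumed to be generated from a multivariate normal distribution (MVN), according to the stochastic process:

$$\mathbf{y} \mid \mathbf{b}, \mathbf{u}_d, \mathbf{u}_m, \mathbf{p}, \sigma^2_e \sim \text{MVN}\left(\mathbf{X}\mathbf{b} + \mathbf{Z}_d\mathbf{u}_d + \mathbf{Z}_m\mathbf{u}_m + \mathbf{W}\mathbf{p},\ \mathbf{R}\right), \qquad \mathbf{R} = \mathbf{I}\sigma^2_e$$

where $\mathbf{b}$ is the vector containing the systematic effects previously defined ($CG$, $sx$, $dage$, $sf$, and $b$); $\mathbf{u}_d$ is the vector of direct genetic effects; $\mathbf{u}_m$ is the vector of maternal additive genetic effects; $\mathbf{p}$ is the vector of maternal permanent environmental effects; $\sigma^2_e$ is the residual variance; $\mathbf{X}$, $\mathbf{Z}_d$, $\mathbf{Z}_m$, and $\mathbf{W}$ are the corresponding incidence matrices; and $\mathbf{R}$ is the (co)variance matrix of residuals.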

Conjugate priors were used for location and scale parameters. Proper vague priors were employed to assign a low degree of belief to the prior information and to allow computation of the marginal density of the data and the Bayes factor (BF), which is not well defined for improper priors (Gelman et al., 1995). For the location parameters, MVN distributions were assumed:

$$\mathbf{b} \mid \zeta \sim \text{MVN}(\mathbf{0}, \mathbf{I}\zeta), \qquad \mathbf{u} \mid \mathbf{G} \sim \text{MVN}(\mathbf{0}, \mathbf{G}), \qquad \mathbf{p} \mid \sigma^2_p \sim \text{MVN}(\mathbf{0}, \mathbf{P})$$

where $\zeta$ is a positive scalar hyperparameter, large enough to give small weight to the prior information (specifically, $\zeta = 10^6$); $\mathbf{u}$ contains the vectors of direct and maternal additive genetic effects; $\mathbf{G} = \boldsymbol{\Sigma}_u \otimes \mathbf{A}$ and $\mathbf{P} = \mathbf{I}\sigma^2_p$ are the additive genetic and maternal permanent environmental (co)variance matrices, respectively; $\mathbf{A}$ is the relationship matrix; $\boldsymbol{\Sigma}_u$ is the matrix containing the (co)variances among the additive direct and maternal genetic effects; and $\sigma^2_p$ is the maternal permanent environmental variance.
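The Kronecker structure of G can be illustrated with NumPy on a hypothetical three-animal pedigree (an unrelated sire and dam and their calf), using the current-evaluation values of Σ_u quoted earlier. With u ordered as all direct effects followed by all maternal effects, each block of G is the corresponding element of Σ_u times A. The pedigree and the random draw are illustrative only.

```python
import numpy as np

# Hypothetical pedigree: unrelated sire (0) and dam (1), and their calf (2).
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])

# Sigma_u: direct-maternal genetic (co)variances (current-evaluation values, kg^2).
Sigma_u = np.array([[111.7, -10.6],
                    [-10.6,  64.7]])

# G = Sigma_u (x) A; rows/columns ordered as [u_d for all animals, u_m for all animals].
G = np.kron(Sigma_u, A)

# Succeeds because Sigma_u and A are both positive definite, hence so is G.
L = np.linalg.cholesky(G)

# One draw of u from its prior MVN(0, G), split into direct and maternal parts.
rng = np.random.default_rng(1)
u = rng.multivariate_normal(np.zeros(G.shape[0]), G)
u_d, u_m = u[:3], u[3:]
```

For example, G[0, 2] is the direct-direct covariance between sire and calf (0.5 × 111.7), and G[2, 5] is the covariance between the calf's own direct and maternal effects (−10.6).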

For the dispersion parameters, scaled inverse $\chi^2$ and inverse Wishart (IW) distributions were used:

$$\sigma^2_e \mid \nu_e, S^2_e \sim \nu_e S^2_e\,\chi^{-2}_{\nu_e}, \qquad \sigma^2_p \mid \nu_p, S^2_p \sim \nu_p S^2_p\,\chi^{-2}_{\nu_p}, \qquad \boldsymbol{\Sigma}_u \mid \nu_u, \mathbf{S}^2_u \sim \text{IW}\left(\nu_u, \nu_u\mathbf{S}^2_u\right)$$

where the parameters $\nu_i$ and $S^2_i$ are interpreted as degrees of belief and a priori values for the residual variance, the maternal permanent environmental variance, and the additive genetic (co)variance matrix. A value of 4 was assigned to the degrees of belief for the $\chi^{-2}$ distributions to give low weight to the prior information. Degrees of belief for the IW distribution were 12. In both cases, degrees of belief were larger than the respective matrix dimensions to avoid degenerate forms (Gelman et al., 1995). Numerical values for the scalars $S^2_e$ and $S^2_p$ and the matrix $\mathbf{S}^2_u$ were those currently used in genetic evaluations of this breed.
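A scaled inverse χ² variate with ν degrees of belief and scale S² can be drawn as νS²/X with X ~ χ²_ν. The sketch below shows this. It also shows the textbook conjugate full conditional of σ²_e used in Gibbs samplers of this kind: ν̃ = ν + n and S̃² = (e′e + νS²)/ν̃, where e are the current residuals. It is an illustration, not the authors' sampler. The text says the current-evaluation values served as prior values, so we assume S²_e = 363.3 kg². With ν = 4, the prior mean is 2S² and the prior variance is infinite, which is what makes the prior vague.

```python
import numpy as np

def draw_scaled_inv_chi2(nu, s2, rng, size=None):
    """Draw from nu * s2 * chi^{-2}_nu, i.e. sigma2 = nu * s2 / X with X ~ chi^2_nu."""
    return nu * s2 / rng.chisquare(nu, size=size)

def draw_residual_variance(residuals, nu, s2, rng):
    """Conjugate full conditional of sigma2_e under a scaled inverse chi^2 prior:
    nu_post = nu + n,  s2_post = (e'e + nu * s2) / nu_post."""
    e = np.asarray(residuals, dtype=float)
    nu_post = nu + e.size
    s2_post = (e @ e + nu * s2) / nu_post
    return draw_scaled_inv_chi2(nu_post, s2_post, rng)

rng = np.random.default_rng(2004)
prior_draws = draw_scaled_inv_chi2(4, 363.3, rng, size=5)  # nu = 4: very diffuse
```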

Posterior marginal inferences on parameters of interest were drawn from their corresponding conditional posterior distributions through a Gibbs sampling scheme. The specifications for the Gibbs sampling implementation were based on the Raftery and Lewis (1992) criterion, which was determined using the public domain program Gibbsit v.2.0 (http://lib.stat.cmu.edu/general/gibbsit). In addition, the coupling method (Johnson, 1996; García-Cortés et al., 1998) was applied. Figure 2 presents the convergence of two independent chains, run with the same seed and different starting values, for two components (the direct genetic variance and the direct-maternal genetic covariance) under two definitions of CG (HYM and HC30-90), to illustrate convergence of the parameters of interest. Other components of variance under other definitions showed similar patterns. As shown in this figure, convergence to the stationary distribution seems to be reached at approximately 4,000 iterations. The Raftery and Lewis (1992) criterion yielded a substantially lower number of iterations for convergence. Finally, a conservative burn-in period of 20,000 iterations was used for all analyses, and a total of 100,000 iterates were retained after burn-in. Posterior means and standard deviations of the variance components were obtained from all iterates after burn-in, whereas posterior distributions (not shown), including modes and high-density regions, were obtained using a thinning interval of 40 iterations. Effective chain size ranged from 27 for the genetic covariance in the HC30-30 model to 254 for the maternal permanent environmental variance in the HYS model.
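The burn-in, thinning, and effective chain size described above can be illustrated with a short NumPy sketch. The effective sample size uses the common estimator n/(1 + 2Σρ_k), with the autocorrelation sum truncated at the first non-positive lag. The source does not say which estimator the authors used, so this is an assumption, and the AR(1) chain is synthetic.

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the first
    non-positive autocorrelation (a simple, common truncation rule).
    Assumes the chain is not constant."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    c0 = x @ x / n                     # lag-0 autocovariance
    rho_sum = 0.0
    for lag in range(1, n):
        rho = (x[:-lag] @ x[lag:]) / (n * c0)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

def post_burn_in(chain, burn_in=20_000, thin=1):
    """Drop the burn-in iterates and keep every `thin`-th iterate afterwards."""
    return np.asarray(chain)[burn_in::thin]

# Synthetic, strongly autocorrelated AR(1) chain standing in for a variance component.
rng = np.random.default_rng(0)
phi, n = 0.9, 20_000
chain = np.zeros(n)
for t in range(1, n):
    chain[t] = phi * chain[t - 1] + rng.normal()
kept = post_burn_in(chain, burn_in=2_000)
print(round(effective_sample_size(kept)))  # AR(1) theory: about len(kept)*(1-phi)/(1+phi)
```

In this sketch, the study's settings would correspond to post_burn_in(chain, 20_000) for posterior means and post_burn_in(chain, 20_000, 40) for the thinned sample used for modes and HPD regions.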



Figure 2. Coupling chains to determine the burn-in period for the estimation of direct genetic variance (DGV) and direct-maternal genetic covariance (GCOV) with herd-year-month (HYM) and cluster 30 to 90 (HC30-90) definitions of contemporary groups (CG).

 
Comparison of Models
Classical Criteria.
Classical ad hoc statistics used in other studies dealing with the comparison of alternative definitions of CG were used. The between- and within-CG variances, the effective number of progeny for direct effects, and the accuracy of breeding values for groups of animals were computed.

The between- and within-CG variances were obtained using PROC VARCOMP of SAS (SAS Inst., Inc., Cary, NC) with the Type I option under the model

$$y^*_{ij} = CG_i + e_{ij} \qquad [2]$$

where $y^*_{ij}$ is the observation corrected by the corresponding BLUE and BLUP solutions for all effects other than CG in Model [1].
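For a one-way random model such as Model [2], the Type I (ANOVA) method equates observed and expected mean squares. A minimal NumPy sketch of that estimator for unbalanced CG sizes follows. It reproduces the textbook formulas rather than calling SAS, and the function name is hypothetical. The between-CG estimate can be negative when MS_between < MS_within.

```python
import numpy as np

def one_way_anova_components(y, cg):
    """ANOVA (Type I) estimates of between- and within-CG variance for
    y*_ij = CG_i + e_ij with unbalanced CG sizes.

    sigma2_w = MS_within
    sigma2_b = (MS_between - MS_within) / k0,  k0 = (N - sum(n_i^2) / N) / (a - 1)
    """
    y = np.asarray(y, dtype=float)
    _, inv, n_i = np.unique(np.asarray(cg), return_inverse=True, return_counts=True)
    inv = inv.ravel()                          # CG index of each record
    N, a = y.size, n_i.size
    means = np.bincount(inv, weights=y) / n_i  # CG means
    ms_between = np.sum(n_i * (means - y.mean()) ** 2) / (a - 1)
    ms_within = np.sum((y - means[inv]) ** 2) / (N - a)
    k0 = (N - np.sum(n_i ** 2) / N) / (a - 1)
    return (ms_between - ms_within) / k0, ms_within

var_b, var_w = one_way_anova_components([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"])
print(round(var_b, 4), var_w)  # 4.1667 1.0
```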

The effective number of progeny for the evaluation of sire $j$ for the direct genetic effect was obtained as:

$$N_{ed,j} = \sum_{i=1}^{c_j} \frac{n_{ij}\,(n_{i.} - n_{ij})}{n_{i.}}$$

where $n_{ij}$ is the number of progeny of sire $j$ in CG $i$, $n_{i.}$ is the total number of records in CG $i$, and $c_j$ is the number of CG in which sire $j$ has progeny.
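A sketch of the effective number of progeny, assuming the standard form N_ed,j = Σ_i n_ij(n_i. − n_ij)/n_i., summed over the c_j CG where sire j has progeny. The toy records are hypothetical. Progeny in a CG where no other sire is represented contribute nothing, because their comparison is confounded with the CG effect.

```python
from collections import Counter, defaultdict

def effective_progeny(records):
    """Effective number of progeny per sire, N_ed,j = sum_i n_ij (n_i. - n_ij) / n_i.

    records: iterable of (cg, sire) pairs, one per calf with a record.
    """
    records = list(records)
    cg_size = Counter(cg for cg, _ in records)   # n_i.
    n_ij = Counter(records)                      # progeny of sire j in CG i
    ne = defaultdict(float)
    for (cg, sire), n in n_ij.items():
        ne[sire] += n * (cg_size[cg] - n) / cg_size[cg]
    return dict(ne)

records = ([("cg1", "A")] * 2 + [("cg1", "B")] * 2 +   # shared CG: both sires gain
           [("cg2", "A")] * 3 +                         # A alone: contributes 0
           [("cg3", "A")] * 1 + [("cg3", "B")] * 3)
print(effective_progeny(records))  # {'A': 1.75, 'B': 1.75}
```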

The accuracy of prediction of breeding values was measured as reliability, computed with the well-known formula:

$$R_i = 1 - \frac{PEV_i}{(1 + F_i)\,\sigma^2_a}$$

where $R_i$ is the reliability of the predicted genetic value for animal $i$; $PEV_i$ is the corresponding prediction error variance; $\sigma^2_a$ is the direct or maternal additive genetic variance when considering prediction of direct or maternal genetic values, respectively; and $F_i$ is the inbreeding coefficient of animal $i$.
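The reliability formula is simple to compute once PEV is available; in BLUP, PEV is typically obtained from the diagonal of the inverse of the mixed-model coefficient matrix. The PEV value below is hypothetical. The example mainly shows that the same PEV yields a different reliability when the additive variance or the inbreeding coefficient changes. For that reason, accuracies computed with model-specific variance estimates are not directly comparable across models.

```python
def reliability(pev, var_a, inbreeding=0.0):
    """R_i = 1 - PEV_i / ((1 + F_i) * sigma2_a)."""
    return 1.0 - pev / ((1.0 + inbreeding) * var_a)

# Hypothetical PEV of 80 kg^2 for a direct breeding value, sigma2_ad = 111.7 kg^2.
print(round(reliability(80.0, 111.7), 3))          # 0.284
print(round(reliability(80.0, 111.7, 0.0625), 3))  # 0.326
```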

Variance components used in the routine genetic evaluation and estimates obtained from the Bayesian analysis were used to obtain the between and within CG variances and the accuracy of predicted breeding values for all models.

Bayesian Criteria.
The BF (Newton and Raftery, 1994; Kass and Raftery, 1995) and the cross-validation predictive densities of the data (Gelfand et al., 1992), which are common tools for model checking and sensitivity analysis in the Bayesian framework (e.g., Gelman et al., 1995), were computed to assess the performance of the alternative models based on the different definitions of CG. Both criteria have the advantage of being applicable in a general framework, without the requirements of other procedures such as the likelihood ratio test, which is applicable only to nested models.

The BF for two competing models, $M_1$ and $M_2$, was computed as the ratio of the marginal densities of the data (MD) under each model, $BF_{12} = MD_1/MD_2$. The MD for each model was obtained from the Newton and Raftery (1994) estimator:

$$\hat{f}(\mathbf{y}) = \left[\frac{1}{H}\sum_{i=1}^{H} \frac{1}{f(\mathbf{y} \mid \boldsymbol{\Phi}^{(i)})}\right]^{-1}$$

where $H$ is the total number of iterations after the burn-in period and $f(\mathbf{y} \mid \boldsymbol{\Phi}^{(i)})$ is the conditional distribution of the data given the parameters involved in each model at iteration $i$.
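Evaluating the harmonic-mean estimator directly on the likelihood scale would underflow, because f(y | Φ^(i)) for thousands of records is far smaller than the smallest representable floating-point number. A common remedy is to work in log space with a log-sum-exp. The sketch below assumes the sampler stores log f(y | Φ^(i)) at each post-burn-in iterate. The estimator itself can be numerically unstable, so LMD values are best interpreted with care.

```python
import numpy as np

def log_marginal_density_harmonic(loglik):
    """Newton-Raftery harmonic-mean estimate of log f(y).

    loglik[i] = log f(y | Phi^(i)) for each post-burn-in iterate i.
    Computed in log space: log f(y) = log H - logsumexp(-loglik).
    """
    neg = -np.asarray(loglik, dtype=float)
    m = neg.max()
    log_sum = m + np.log(np.sum(np.exp(neg - m)))   # stable logsumexp(-loglik)
    return np.log(neg.size) - log_sum

print(log_marginal_density_harmonic([-5000.0, -5001.0, -4999.5]))
```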

The cross-validation predictive densities were defined as the set of univariate densities (Gelfand et al., 1992; Gelfand, 1996)

$$f(y_r \mid \mathbf{y}_{(r)}) = \int f(y_r \mid \boldsymbol{\Phi}, \mathbf{y}_{(r)})\, p(\boldsymbol{\Phi} \mid \mathbf{y}_{(r)})\, d\boldsymbol{\Phi}, \qquad r = 1, \ldots, n$$

where $\mathbf{y}_{(r)}$ is the vector of observations with observation $y_r$ excluded, and $Y_r$ is a random variable with distribution $f(Y_r \mid \mathbf{y}_{(r)})$, which gives the values of the unobserved $Y_r$ that would most likely be observed when the model has been fitted to $\mathbf{y}_{(r)}$ (Gelfand, 1996). Models were compared by computing the expected value of the checking function $g = Y_r - y_r$ for every datum with respect to its univariate predictive density, $d_r = E_{Y_r \mid \mathbf{y}_{(r)}}[Y_r - y_r]$, the best model being the one with the minimum value of the D statistic, which summarizes the $d_r$ over all $n$ observations. Importance sampling, with the joint posterior density of the parameters given the data as the importance distribution, was implemented to evaluate every $d_r$. The log of the MD (LMD) and the D statistics were obtained for each datum within the Gibbs sampling process (details of the implementation can be found in López-Romero et al., 2003).
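The importance-sampling step can be sketched as follows. The sketch assumes a normal likelihood in which draw i provides a residual variance and the fitted mean μ_r^(i) = E(y_r | Φ^(i)) for every record. Because p(Φ | y_(r)) ∝ p(Φ | y)/f(y_r | Φ), each posterior draw is reweighted by 1/f(y_r | Φ^(i)). Summarizing the d_r as D = Σ d_r² is our assumption; the exact summary and implementation used by the authors are described in López-Romero et al. (2003).

```python
import numpy as np

def cross_validation_stats(y, mu_draws, s2_draws):
    """Importance-sampling estimates of d_r = E[Y_r - y_r | y_(r)] and log CPO_r.

    y: (n,) observations; mu_draws: (H, n) fitted means E(y_r | Phi^(i));
    s2_draws: (H,) residual-variance draws. Normal likelihood assumed.
    The full posterior is the importance distribution, so the weight of
    draw i for datum r is proportional to 1 / f(y_r | Phi^(i)).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu_draws, dtype=float)
    s2 = np.asarray(s2_draws, dtype=float)[:, None]
    loglik = -0.5 * (np.log(2.0 * np.pi * s2) + (y - mu) ** 2 / s2)  # (H, n)
    m = np.max(-loglik, axis=0)                  # per-datum stabilizer
    w = np.exp(-loglik - m)                      # unnormalized weights 1/f
    d = np.sum(w * (mu - y), axis=0) / w.sum(axis=0)   # E(Y_r | Phi) = mu_r
    # log f(y_r | y_(r)) = -log mean_i exp(-loglik_ir)  (harmonic mean per datum)
    log_cpo = -(m + np.log(np.mean(w, axis=0)))
    return d, log_cpo

y = np.array([1.0, 2.0, 3.0])
mu = np.tile([1.5, 2.0, 2.0], (4, 1))            # 4 hypothetical posterior draws
d, log_cpo = cross_validation_stats(y, mu, np.ones(4))
D = np.sum(d ** 2)                                # assumed summary statistic
```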

Because the value of MD varies with the size of the data set, the total data set containing all 12,740 initial observations (data in CG with fewer than five observations were not discarded) was used to compute the LMD for each alternative definition of CG. The D statistic was computed for both the total data set (DT) and the specific data sets of different size associated with the alternative CG models (Ds).

Computer programs employed to solve the BLUP equations and to obtain corresponding accuracies, as well as programs used to carry out the Bayesian analyses, were developed by the authors in Fortran90 language.


    Results
Table 1 presents information about the loss of data and weighted averages for size and time span of the CG under the six alternative definitions. Different definitions resulted in different CG sizes, time spans, and loss of information in terms of the number of records and number of herds involved. Definitions with 90-d periods (HYS and HC90-90) resulted in CG of the largest size (more than 20 observations per contemporary group, on average), but also in longer average time spans (approximately 60 d). Loss of information was low for these strategies (4.7 and 2.7% for HYS and HC90-90, respectively), but not the lowest. Definitions with 30 d, with one (HYM, HC30-30) or two time limits (HC30-90 and HC30-180), yielded CG of similar average size (ranging from 13.2 to 14.1 observations) but quite different average time spans (from 21.4 d for HYM to 43.0 d for HC30-180). The loss of information also varied among these strategies. The HYM definition resulted in the largest loss of information (14.2% of the data and 19.3% of the herds), and HC30-180 resulted in the lowest loss (0.7% of the data and 7.2% of the herds) over all definitions. The use of two time limits improved the distribution of CG size, in the sense of decreasing the percentage of small CG. The percentage of data in CG of size five and six decreased from 9.5% for the HYM strategy to 6.8% and 6.0% for the HC30-90 and HC30-180 strategies, respectively. The use of fixed periods with the "natural" grouping strategies, HC30-30 and HC90-90, had a minor effect on the average and distribution of CG size compared with the corresponding conventional HYM and HYS definitions. However, the fixed-period "natural" definitions resulted in longer time spans (approximately 4 d longer for HC30-30 vs. HYM and 12 d longer for HC90-90 vs. HYS), but yielded lower losses of information than HYM and HYS.


Table 1. Number of herds and records, with corresponding percentage of discarded information when forming contemporary groups (CG), number of CG, and weighted averages for CG size and time span for different definitions of CG
 
Table 2 shows the mean, mode, standard deviation, and 95% high posterior density intervals (HPD) of the posterior distributions of the variance components associated with Model [1] for the data sets corresponding to the six definitions of CG. Posterior distributions for the genetic components were not symmetrical, showing quite large discrepancies between mean and mode estimates, especially for the direct genetic variance. Estimated posterior modes for the direct genetic variance for the HYM and HC30-30 models were much smaller than the estimated posterior means. As a result, differences in posterior modes among HYM, HC30-30, and the adaptive definitions were smaller than the corresponding differences among posterior means. The direct genetic component also showed the largest standard deviation and the widest 95% HPD. The definition of CG had a larger effect on the direct genetic variance, the genetic covariance, and the residual variance than on the maternal components. The estimates of the direct genetic component were particularly large for the HYS and HC90-90 definitions, which was somewhat counterbalanced by larger negative estimates of the genetic covariance between direct and maternal effects. A possible explanation for the larger estimates of the direct genetic variance with models involving CG that span longer periods of time could be confounding between environmental and genetic variation, due to poor correction of the environmental trend over time together with the low degree of connectedness of the data. Estimates of direct heritability and genetic correlation, calculated with the posterior means of the variance components, ranged from 0.20 and −0.43 for the HC90-90 definition to 0.12 and −0.26 for the HC30-180 strategy. Changes in the maternal variances across models were somewhat smaller. Estimates of maternal heritability were low, ranging from 0.07 to 0.08.
A larger effect of the definition of CG was expected on the direct genetic component because direct effects are linked to the animal producing a phenotype in a CG, whereas maternal effects are linked to dams, whose progeny are recorded in different CG. Moreover, the prediction of maternal genetic values relies more heavily on the pedigree than on the phenotypic information because of the low maternal heritability. Estimates of residual variance followed the same pattern as the time span associated with the alternative definitions of CG. Conventional HYM and the natural calving strategy with a fixed 90-d time limit had the smallest and largest values, respectively, whereas the adaptive definitions with two time periods had intermediate results. The HC30-180 estimates of the residual variance were closer to the HYS estimates than to the HC30-90 values, in agreement with the observed differences in time spans. A deficient adjustment of the environmental variation associated with CG effects spanning longer periods of time could explain the observed results for the residual variance.


Table 2. Posterior means, modes, standard deviation, and 95% high posterior density intervals (HPD) for the direct (DGV) and maternal (MGV) genetic variances, genetic covariance (GCOV), maternal permanent environmental variance (MPEV) and residual variance (RV) for weaning weight with different definitions of contemporary groups (CG)
 
Table 3 shows the between-CG and within-CG variances for the CG effect in Model [2]. Results are shown when parameters from the current genetic evaluation were used for all data sets and when the corresponding estimates obtained in the Bayesian analysis for each data set were employed. The best clustering strategy should minimize the within-CG variance and maximize the variance among CG levels. With the same set of parameters, the within-CG variance became progressively larger as the period of time involved in the definition of the CG increased. As expected, HYS and HC90-90 performed worse than the other models. Models HYM and HC30-30 had the smallest within-CG variance, but the ratio of the within-CG to the total variance was very close for all models except HYS and HC90-90. For the between-CG variance, HYS and HC90-90 also had the worst performance; the between-CG variance was similar for the other models. When different parameters were used for each data set, the sum of the estimates of the between- and within-CG variances differed among the alternative definitions of CG, leading to different scales of measurement. Total variance of adjusted phenotypes was largest for the HYM and HC30-30 strategies and smallest for the HYS and HC90-90 strategies. This result was likely due to the adjustment for the genetic components, with small and large estimated direct genetic variances for CG involving 30 and 90 d, respectively. When the within-CG component was expressed as a ratio to the total variance, HYM and HC30-30 had the smallest ratios, and the other models were all similar.


Table 3. Between- and within-contemporary group (CG) variances, and the ratio of the between-CG variance to the total variance ($\sigma^2_T$), with different definitions of CG when equal parameters from the current evaluation are used for all models or when specific estimates for each data set from the Bayesian analysis are used
 
Table 4 presents the accuracy of prediction of direct and maternal genetic values for the groups of animals that should have the largest accuracies for each type of genetic value. These groups are sires of calves with records and calves with their own record for the direct genetic effect, and maternal grandsires of calves with records and dams of calves with records for the maternal genetic effect. Results for parameters from the current evaluation system and for posterior means of the variance components obtained from the Bayesian analyses are shown. In general, average accuracies of predicted genetic values were low, partly due to the relatively low degree of connectedness among herds in this population. As expected, accuracies were larger for sires of progeny with data (sires of recorded calves for the direct effect and maternal grandsires of recorded calves for the maternal effect). When the variance components used to compute accuracies were the same for all models (i.e., the current evaluation parameters), differences among accuracies should be due to differences in the amount of information available for each animal and/or differences in the structure of the data. Structure is related to the number and distribution of records in the CG levels and the connectedness among them. Accuracies were quite similar for all strategies within all groups of animals considered. The HYS and HC90-90 definitions had the largest accuracies, particularly for predicting direct genetic values of sires of calves with records. To investigate these results further, the effective amount of data relative to the direct effect for sires of calves with records is also presented in Table 4. The average effective number of progeny was approximately 15.6, and similar, for the definitions with shorter periods of time (HYM, HC30-30, HC30-90, and HC30-180), and approximately 17.6 for the other definitions.
As expected, given the smaller number of levels for the CG effect, the two definitions with 90-d periods (HYS and HC90-90) had a larger effective number of progeny, in agreement with the slightly larger accuracy observed for this group of animals. Even though the alternative definitions resulted in relatively large differences in the number of discarded records, the definitions did not substantially change the structure of the information available.


Table 4. Number of animals and means (SD in parentheses) of accuracies (ACC) for estimated direct and maternal genetic effects when accuracies are obtained with the same parameters from the current evaluation for all models or when specific estimates for each data set from the Bayesian analysis are used, for sires of animals with records (SIRES) and calves with records (CALVES) for direct effects, maternal grandsires of animals with records (MGS) and dams of calves with records (DAM) for maternal genetic effects, and effective number of progeny for the direct effect (Ned), for different definitions of contemporary groups (CG)
 
When parameters involved in the calculation of accuracies were replaced by the estimates obtained with each CG model, the estimated accuracies varied with the corresponding values of heritability. Accuracies were greater for data sets and corresponding models that had larger estimates of the heritability; HYS and HC90-90 for direct effects and HC30-30, HC30-90 and HC30-180 for maternal effects. These results indicate that, for these data and the associated models, calculated accuracy is mainly determined by the numerical value of the parameter estimates used to compute the genetic effects, with only a small impact of alternative definitions of CG observed on the structure of the data.

Criteria for comparison of models from the Bayesian analysis are shown in Table 5. The predictive abilities of the alternative models for excluded observations, measured in the total (DT) and specific (Ds) data sets, ranked the models in the same order. Low values of both statistics indicate better predictive ability of the model. The DT values were larger than the corresponding Ds values, to varying degrees, probably because in the total data set, where no observations were discarded, some CG had very few observations (fewer than five). The difference between the DT and Ds statistics was greatest for the HYM and HC30-30 definitions, for which the loss of data when forming the specific CG was largest. The HYS and HC90-90 models showed lower predictive ability for missing observations than the HYM and HC30-30 models or the models with two time limits. The ranking of models was the same when the LMD statistic was considered. The quantity 2 log BF, which is numerically more tractable than the BF and is on the same scale as the likelihood ratio test statistic, ranged from 185.0, when comparing more similar models such as HC30-90 and HC30-180, to 1,214.6, when comparing the most distant models, HYM and HC90-90. In all cases, the BF showed strong evidence favoring definitions with CG spanning shorter periods of time.
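The 2 log BF values can be read against the scale proposed by Kass and Raftery (1995), on which values above 10 are conventionally described as very strong evidence. A small sketch follows. It assumes natural-log marginal densities computed on the same (total) data set for both models, as done here, and the function names are hypothetical.

```python
def two_log_bf(lmd_1, lmd_2):
    """2 log BF_12 from natural-log marginal densities of the same data set."""
    return 2.0 * (lmd_1 - lmd_2)

def kass_raftery_evidence(two_log_b):
    """Evidence category for model 1 over model 2 on the Kass and Raftery (1995) scale."""
    if two_log_b <= 0:
        return "favors model 2 (or no preference)"
    if two_log_b <= 2:
        return "not worth more than a bare mention"
    if two_log_b <= 6:
        return "positive"
    if two_log_b <= 10:
        return "strong"
    return "very strong"

for value in (185.0, 1214.6):          # range of 2 log BF reported in the text
    print(value, kass_raftery_evidence(value))
```

On this scale, every comparison reported in Table 5 lies far beyond the threshold of 10.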


Table 5. Statistics measuring the ability to predict unobserved records, computed with the specific data set for each definition of CG (Ds) and with the total data set (DT), and log marginal densities of the data (LMDT) computed with the total data set, for different definitions of contemporary groups (CG)
 

    Discussion
Agreement on an optimal definition of contemporary grouping requires a compromise between the size and average time span of the CG, which is equivalent to balancing the accuracy and bias of estimates of CG effects and, consequently, of predicted genetic values. The amount of information discarded when forming the CG is also important if returns from investment in the recording system are to be maximized.

In this study, definitions involving 90 d (HYS and HC90-90) had the largest CG size but also the longest average time spans, whereas definitions involving 30 d (HYM and HC30-30) yielded the smallest CG size and the shortest average time spans, as expected. Adaptive definitions involving two time limits resulted in CG spanning intermediate time periods, without a substantial improvement in average CG size compared with CG of 30 d. Loss of information was minimal with these definitions.

The use of what we called classical, ad hoc criteria to identify the definition that is optimal in terms of bias and accuracy of prediction is not adequate, because these criteria depend on the variance components associated with each model. Using the same parameters in the BLUP analyses for all models allowed inferences about the effects of the amount and structure of the available information on the accuracy of genetic value prediction, and comparison of the between- and within-CG variation from different models on the same scale. With the same parameters, models with CG spanning 90 d (HYS and HC90-90) increased the effective number of progeny for predicting direct genetic values of sires, with a slight increase in accuracy. However, these strategies yielded a suboptimal partition of variation between and within CG, indicating that part of the environmental variation remains unadjusted by the CG effects when CG span longer periods of time. This result was expected, given the variable environmental conditions within and across years and the extensive production system of the Avileña Negra Ibérica breed. However, the comparison with the same parameters is not strictly correct, because the true variance components are not expected to be equal for all models. In particular, the residual variance would be expected to decrease as more environmental variation is accounted for by the CG effect (i.e., for CG spanning shorter periods of time), with no differences expected for the other components. In this study, estimates of the residual variance followed the expected trend, but the genetic components were substantially affected by the definition of CG: larger estimates of the direct genetic variance were found for the HYS and HC90-90 definitions.
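The expected fall in residual variance as the time span of CG shortens can be illustrated with a small simulation. The sketch below assumes a hypothetical herd in which weaning weight follows a smooth seasonal environmental curve plus independent noise, groups records into fixed calendar windows of 30, 90, or 360 d, and reports the pooled within-group variance. Genetic effects, CG size requirements, and real calving patterns are deliberately ignored, and all numbers are invented for illustration.

```python
import math
import random
from collections import defaultdict


def simulate_records(n: int, seed: int = 1):
    """Hypothetical (day of year, weaning weight) pairs with a seasonal environment."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        day = rng.randrange(360)
        seasonal = 20.0 * math.sin(2.0 * math.pi * day / 360.0)  # kg, smooth over the year
        records.append((day, 200.0 + seasonal + rng.gauss(0.0, 10.0)))  # noise SD 10 kg
    return records


def pooled_within_group_variance(records, span_days: int) -> float:
    """Pooled within-group variance after grouping records into fixed windows of span_days."""
    groups = defaultdict(list)
    for day, y in records:
        groups[day // span_days].append(y)
    ss, df = 0.0, 0
    for ys in groups.values():
        if len(ys) < 2:
            continue  # a single record contributes no within-group information
        mean = sum(ys) / len(ys)
        ss += sum((y - mean) ** 2 for y in ys)
        df += len(ys) - 1
    return ss / df


records = simulate_records(2000)
for span in (30, 90, 360):
    print(span, round(pooled_within_group_variance(records, span), 1))
```

Because the simulated environment changes continuously, longer windows leave more of the seasonal signal in the residual; the noise variance alone (100 kg² here) is the floor that very short windows approach.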
As a result, when the variance component estimates obtained for each alternative definition were used, the HYS and HC90-90 definitions gave substantially larger calculated accuracies for the predicted direct genetic effects, with no clear pattern for the maternal effects. However, the suspected bias in the estimated variance components makes this result questionable. Legarra et al. (2002), comparing different contemporary group definitions for dairy sheep in a Bayesian context, did not find large differences in the estimates of the variance components across definitions, except for the residual component. In this population, the low degree of connectedness may be affecting the estimated genetic components. A lack of genetic ties among herds would be expected to cause underestimation of the genetic components of variance, because only the within-unit variation is captured in the estimation process (e.g., Clément et al., 2001). This is likely the case in this study: for models including CG spanning 30 d or adaptive CG, genetic variances and the corresponding heritabilities appear smaller than the values usually reported in the literature for weaning weight in beef cattle. Adaptive CG definitions placed only a small proportion of the data in CG spanning more than 30 d. For definitions of CG spanning longer periods of time (HYS and HC90-90), the environmental variation due to changing conditions throughout the year may not be properly removed by the CG effect. Because of the lack of sufficient genetic ties, this unadjusted environmental variability might have been erroneously assigned to the direct genetic effect, resulting in inflated estimates of the direct genetic variance.
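Connectedness through genetic ties can be screened very crudely by asking which herds are linked, directly or through other herds, by sires with progeny in more than one herd. The union-find sketch below does this for hypothetical (herd, sire) pairs; it ignores maternal grandsires, dams, and more distant relationships, and it is not the connectedness measure used for this population. The function name `herd_components` is hypothetical.

```python
from collections import defaultdict


def herd_components(records):
    """Group herds into sets connected through shared sires.

    records: iterable of (herd, sire) pairs, one per calf with a record.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_herd_of_sire = {}
    for herd, sire in records:
        find(herd)  # register every herd, even if it shares no sire
        if sire in first_herd_of_sire:
            union(first_herd_of_sire[sire], herd)
        else:
            first_herd_of_sire[sire] = herd

    components = defaultdict(set)
    for herd in parent:
        components[find(herd)].add(herd)
    return sorted(components.values(), key=lambda s: sorted(s))


# Hypothetical data: S1 links H1-H2, S3 links H2-H4, S2 is used only in H3.
recs = [("H1", "S1"), ("H2", "S1"), ("H3", "S2"), ("H2", "S3"), ("H4", "S3")]
print(herd_components(recs))
```

Herds that fall in small, isolated components contribute mainly within-herd information, which is the situation in which the argument above expects the genetic variance components to be underestimated.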

In an attempt to provide a theoretically better approximation for the accuracy of genetic value estimation under the alternative models, the variance of the posterior distribution of genetic values, var(u | y), was also obtained (data not shown). This quantity measures the accuracy of the marginal estimates of the genetic effects while accounting for uncertainty about the other parameters of the model, including the variance components. Accuracies obtained from var(u | y) were nearly identical to those obtained in the BLUP analyses (Table 4), in which the posterior means from the Bayesian analyses were taken as the true values of the variance components. Therefore, accounting for the uncertainty of the variance component estimates in the Bayesian analyses leads to the same conclusions as the BLUP analyses, as would be expected when the amount of information is large enough.
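The link between the posterior variance of a genetic value and its accuracy follows the usual relation r = sqrt(1 − var(u | y)/σa²), with var(u | y) playing the role that the prediction error variance plays in BLUP. A minimal sketch, assuming a non-inbred animal, a set of MCMC draws for its genetic value, and a fixed σa² (e.g., a posterior mean), is given below; the function name and the draws are hypothetical.

```python
import math
import random
from statistics import variance


def accuracy_from_posterior(draws, sigma2_a: float) -> float:
    """Accuracy sqrt(1 - var(u | y) / sigma2_a) from MCMC draws of one genetic value.

    Assumes a non-inbred animal, so the prior variance of u is sigma2_a.
    """
    post_var = variance(draws)  # sample variance of the posterior draws
    return math.sqrt(max(0.0, 1.0 - post_var / sigma2_a))


# Hypothetical chain: 5,000 draws of one breeding value with posterior SD of 6 kg,
# and an additive variance of 100 kg^2.
rng = random.Random(7)
draws = [rng.gauss(12.0, 6.0) for _ in range(5000)]
print(round(accuracy_from_posterior(draws, 100.0), 2))
```

The `max(0.0, ...)` guard only protects against Monte Carlo noise pushing the ratio above 1 for very poorly informed animals.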

In contrast to using several ad hoc statistics to judge the effectiveness of alternative models, Bayesian model selection provides a procedure in which the evidence from the data, the prior odds, the model dimensionality, and the sample size are combined automatically in a single formula (Sorensen and Gianola, 2003). In this study, the Bayes factors and the cross-validation predictive statistics ranked the models in the same way. Both criteria identified the models with CG spanning 30 d as the most plausible and the best predictors of missing data, and the models with CG spanning 90 d as the worst, with adaptive definitions yielding intermediate values. These results again suggest that definitions of CG spanning longer periods of time might not properly account for environmental variation, which might bias the estimates of the unknown parameters.
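The cross-validation predictive density of a record, p(y_i | y_(-i)), can be estimated from a single MCMC run without refitting the model, using the harmonic mean of the likelihood of that record over the posterior draws (Gelfand et al., 1992). The sketch below assumes a normal likelihood in which each draw supplies the record's fitted mean and the residual variance, and works on the log scale to avoid underflow. It illustrates the general technique and is not necessarily the exact form of the DT and Ds statistics in Table 5 (for which lower values are better, whereas a larger summed log density is better here); the function names are hypothetical.

```python
import math


def normal_logpdf(y: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def log_cpo(y_i: float, means, variances) -> float:
    """Harmonic-mean estimate of log p(y_i | y_(-i)) from G posterior draws.

    CPO_i ~= [ (1/G) * sum_g 1 / p(y_i | theta_g) ]^(-1), evaluated with log-sum-exp.
    """
    neg_loglik = [-normal_logpdf(y_i, m, v) for m, v in zip(means, variances)]
    top = max(neg_loglik)
    log_sum = top + math.log(sum(math.exp(x - top) for x in neg_loglik))
    return math.log(len(neg_loglik)) - log_sum


def log_pseudo_marginal_likelihood(records, draws) -> float:
    """Sum of log CPO over records; larger values mean better cross-validation prediction.

    records: observed y_i values; draws: one (means, variances) pair of lists per record.
    """
    return sum(log_cpo(y, m, v) for y, (m, v) in zip(records, draws))


# Hypothetical draws for one record: fitted means and residual variances from 4 iterations.
print(log_cpo(210.0, [205.0, 207.0, 204.0, 206.0], [90.0, 95.0, 100.0, 92.0]))
```

The harmonic-mean estimator can be unstable when a few draws give a very small likelihood for a record, so long chains are advisable.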

Other checking functions could have been used to compare the predictive ability of the models. In particular, the discrepancy between the probability that the predicted record is smaller than the observed record and its expected value of 0.5 has also been used as a checking function in animal breeding (Varona et al., 1997; Legarra et al., 2002). This function measures systematic directional bias in the prediction of unobserved records under alternative models. Systematic over- or underestimation of observations due to alternative definitions of CG is not expected in this case. Nevertheless, this checking function was computed for three alternative definitions in this study (data not shown). Averaged over all observations, the checking function was nearly zero, and the average of its absolute value was approximately 0.20 for the three models analyzed. These results indicate that, as expected, no systematic under- or overestimation occurred for the alternative definitions of CG in these data, although individual predictions deviated from the observed records in either direction, with the probability of underestimation departing from 0.5 by approximately 0.20 on average.
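The checking function can be estimated for each record from draws of its posterior predictive distribution: the proportion of predicted values below the observed record, minus 0.5. A minimal sketch, with hypothetical function names and toy draws, also shows why the average can be near zero while the average absolute value is not: deviations in opposite directions cancel.

```python
def checking_value(y_obs: float, predictive_draws) -> float:
    """d_i = Pr(predicted record < observed record) - 0.5, estimated from draws."""
    below = sum(1 for y in predictive_draws if y < y_obs)
    return below / len(predictive_draws) - 0.5


def summarize_checks(observed, draws_per_record):
    """Return (mean d_i, mean |d_i|) over all records."""
    d = [checking_value(y, draws) for y, draws in zip(observed, draws_per_record)]
    return sum(d) / len(d), sum(abs(x) for x in d) / len(d)


# Toy example: one record under-predicted and one over-predicted by the same margin.
print(summarize_checks([10.0, 0.0], [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]))
# (0.0, 0.5)
```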

Schmitz et al. (1991), comparing different strategies for clustering animals into CG according to different periods, reached somewhat different conclusions from those of this study. They found that the size of contemporary groups was more important than the strategy used to form them for improving the accuracy of genetic evaluations. They also found no bias in genetic evaluations except for CG spanning several years, owing to a time trend in the milk production of Holstein cattle, and concluded that CG with 15 to 40 observations would be optimal. In our study, where environmental conditions are much more variable than in dairy cattle, the bias introduced by defining CG spanning longer periods of time seems to have a larger effect than the size of the CG.

With respect to defining CG given one or two time spans, Crump et al. (1997) argued that cluster procedures relying on the distances between consecutive birth dates within a herd allow the formation of "natural" contemporary groups, whereas with fixed herd-year-season groups, seasons may not reflect the calving pattern within herds. These authors also argued that using two time spans instead of one leads to more evenly sized groups within herds. In our study, the "natural" CG strategies with one time span (HC30-30 or HC90-90) did increase CG size and decrease the number of discarded records. However, definitions HC30-30 and HC90-90 did not outperform the corresponding strategies based on conventional months (HYM) or seasons (HYS), either in the plausibility of the model given the data and its predictive ability, or in noticeably improving the accuracy of prediction of genetic values. This might be explained by the fact that the relatively small increase in CG size with "natural" strategies came with longer time periods associated with those CG. The use of two time limits (HC30-90, HC30-180) had a larger effect on CG size than the use of only one time limit, but also resulted in longer time spans. In addition, conventional months and seasons may align reasonably well with the calving pattern; if so, the advantages of the "natural" grouping strategy may be reduced, as pointed out by Crump et al. (1997).
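Within-herd clustering on birth dates with one or two time limits can be sketched as follows. This is a hypothetical rule for illustration only, not the exact procedure of Crump et al. (1997) or the one used to build the HC definitions here: a group is opened at a birth date and absorbs later births that fall within `span` days of its first birth; with a second limit `max_span`, a group that still has fewer than `min_size` records may keep growing up to `max_span` days. Groups ending with fewer than `min_size` records are counted as discarded, mirroring the minimum of five records per CG required in this study.

```python
def cluster_birth_days(days, span, max_span=None, min_size=5):
    """Split one herd's birth days (integers) into contemporary groups.

    Hypothetical rule: a group spans less than `span` days from its first birth;
    if it holds fewer than `min_size` animals it may extend to `max_span` days.
    """
    if max_span is None:
        max_span = span  # single time limit
    ordered = sorted(days)
    if not ordered:
        return []
    groups, current = [], [ordered[0]]
    for d in ordered[1:]:
        width = d - current[0]
        if width < span or (len(current) < min_size and width < max_span):
            current.append(d)
        else:
            groups.append(current)
            current = [d]
    groups.append(current)
    return groups


def discarded_records(groups, min_size=5):
    """Number of records in groups smaller than min_size."""
    return sum(len(g) for g in groups if len(g) < min_size)


herd_days = [0, 10, 20, 50, 60, 70, 80, 90]  # hypothetical birth days in one herd
one_limit = cluster_birth_days(herd_days, span=30)
two_limits = cluster_birth_days(herd_days, span=30, max_span=90)
print(one_limit, discarded_records(one_limit))    # [[0, 10, 20], [50, 60, 70], [80, 90]] 8
print(two_limits, discarded_records(two_limits))  # [[0, 10, 20, 50, 60], [70, 80, 90]] 3
```

The second limit lets sparse periods borrow a wider window, which retains more records at the cost of a longer time span, the same trade-off discussed above.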

All the CG definitions proposed in this study imply a lack of continuity in the CG effect over time, given that animals born 1 d apart can be assigned to different CG. Several approaches have been used in animal breeding to avoid this problem: assigning observations to more than one CG according to fuzzy classification techniques (Strandberg and Grandinsson, 1997) or similar procedures (Sivarajasingam, 1993), and considering CG as random effects (Van Vleck, 1987; Ugarte et al., 1992; Visscher and Goddard, 1993), in some cases allowing for covariances among observations in the same CG (Chauhan and Thompson, 1986; Wade et al., 1990). However, no clear advantages were found in those studies. In our study, procedures that assign observations to more than one CG might not be very advantageous, particularly for CG spanning shorter periods of time, where the lack of continuity is less critical. Moreover, fuzzy classification has been found to produce unstable CG solutions when a class contains only a small proportion of the observations (Strandberg and Grandinsson, 1997). Considering CG as random effects might help to improve the low accuracy of prediction of genetic values. However, predicted genetic values may be biased if CG level is associated with the genetic level of the animals, which is difficult to evaluate in this population because of the connectedness problems. The assumption of a stationary mean required by simple autoregressive models may not be adequate in this population, where a strong environmental influence seems to be present over the year. Therefore, the increased computing cost and the difficulty of obtaining reliable estimates with procedures accounting for a continuous time trend would need to be offset by sufficiently large improvements in accuracy of prediction in this population. Such procedures might be less relevant if CG spanning shorter periods of time are used.
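The idea behind fuzzy classification can be sketched with triangular membership functions: instead of a hard cut at a group boundary, a calving day receives weights in the nearest groups that change smoothly with the date. The sketch below assumes hypothetical group centres every 30 d; the membership functions used by Strandberg and Grandinsson (1997) are not described here and may differ, and the function name is hypothetical.

```python
def fuzzy_memberships(day: float, centres, width: float) -> dict:
    """Triangular memberships of a calving day in groups centred at `centres`.

    Each group's weight falls linearly from 1 at its centre to 0 at `width` days away;
    weights are normalised to sum to 1 so each record is split, not duplicated.
    """
    raw = {c: max(0.0, 1.0 - abs(day - c) / width) for c in centres}
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("day lies outside every group's support")
    return {c: w / total for c, w in raw.items() if w > 0.0}


centres = [15, 45, 75]  # hypothetical 30-d groups centred mid-month
for day in (29, 30, 31):
    print(day, fuzzy_memberships(day, centres, width=30))
```

Animals born on days 29 and 31 now receive similar weights instead of falling into different groups, which is the continuity that hard CG definitions lack; with equally spaced centres, the two non-zero weights already sum to 1 before normalisation.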


    Implications
Overall, according to the evaluation criteria used in this study, the average time span of the different definitions of contemporary groups had a major effect on how the definitions were ranked. Thus, definitions involving shorter periods of time, such as herd-year-month or herd clusters spanning 30 d, would be preferred for this population, because they adjust better for environmental changes over time without a large loss of accuracy. However, from the breeder’s point of view, the loss of information associated with these definitions might be unacceptable. The herd-cluster definition with a minimum of 30 d and a maximum of 90 d, which gave intermediate results relatively close to those of the optimal definitions with a much smaller loss of information, might provide a compromise solution in this population. Bayesian criteria have proved to have advantages (in terms of providing measures that summarize model plausibility in a general framework) that are difficult to match with the classical approach.


    Footnotes
 
1 This work was funded by a grant from the MCYT of Spain (RTA01-054). The authors acknowledge the Avileña-Negra Ibérica Breed Association for providing the data.

2 Correspondence: Ctra. de la Coruña Km. 7.5; 28040 Madrid (phone: +34-91-3476742; fax: +34-91-3572293; e-mail: mjc@inia.es).

Received for publication April 7, 2004. Accepted for publication August 27, 2004.


    Literature Cited


Clément, V., B. Bibé, E. Verrier, J. M. Elsen, E. Manfredi, J. Bouix, and E. Hanocq. 2001. Simulation analysis to test the influence of model adequacy and data structure on the estimation of genetic parameters for traits with direct and maternal effects. Genet. Sel. Evol. 33:369–398.

Chauhan, V. P. S., and R. Thompson. 1986. Dairy sire evaluation using a ‘rolling months’ model. J. Anim. Breed. Genet. 103:321–333.

Crump, R. E., N. R. Wray, R. Thompson, and G. Simm. 1997. Assigning pedigree beef performance records to contemporary groups taking account of within-herd calving patterns. Anim. Sci. 65:193–198.

García-Cortés, L. A., M. Rico, and E. Groeneveld. 1998. Using coupling with the Gibbs sampler to assess convergence in animal models. J. Anim. Sci. 76:441–447.

Gelfand, A. E. 1996. Model determination using sampling-based methods. Pages 145–161 in Markov Chain Monte Carlo in Practice, W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, ed. Chapman and Hall, London, U.K.

Gelfand, A. E., D. K. Dey, and H. Chang. 1992. Model determination using predictive distributions with implementation via sampling-based methods. Pages 147–167 in Bayesian Statistics 4, J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, ed. Oxford Univ. Press, Oxford, U.K.

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 1995. Bayesian Data Analysis. Chapman and Hall, London, U.K.

Johnson, V. E. 1996. Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths. J. Am. Stat. Assoc. 91:154–166.

Kass, R. E., and A. E. Raftery. 1995. Bayes factors. J. Am. Stat. Assoc. 90:773–795.

Legarra, A., P. López-Romero, and E. Ugarte. 2002. Bayesian model selection: An application to genetic evaluation of the Latxa dairy sheep. Pages 521–523 in Proc. 7th World Cong. Genet. Appl. Livest. Prod., Montpellier, France.

López-Romero, P., R. Rekaya, and M. J. Carabaño. 2003. Assessment of homogeneity vs. heterogeneity of residual variance in random regression test day models in a Bayesian analysis. J. Dairy Sci. 86:3374–3385.

Newton, M. A., and A. E. Raftery. 1994. Approximate Bayesian inference with the weighted likelihood bootstrap. J. Roy. Stat. Soc. B 56:3–48.

Raftery, A., and S. M. Lewis. 1992. How many iterations in the Gibbs sampler? Pages 763–777 in Bayesian Statistics 4, J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, ed. Oxford Univ. Press, Oxford, U.K.

Schmitz, F., R. W. Everett, and R. L. Quaas. 1991. Herd-year-season clustering. J. Dairy Sci. 74:629–636.

Sivarajasingam, S. 1993. Comparison of alternative methods of handling contemporary group effects in animal model prediction. J. Anim. Breed. Genet. 110:401–411.

Sorensen, D., and D. Gianola. 2003. Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Statistics for Biology and Health. Springer-Verlag, New York, NY.

Strandberg, E., and K. Grandinsson. 1997. Adjusting for seasonal effects in an animal model using fuzzy classification. Proc. of the 1997 Interbull Meeting, Vienna, Austria. Interbull Bulletin 16:100–103.

Ugarte, E., R. Alenda, and M. J. Carabaño. 1992. Fixed or random groups in genetic evaluations. J. Dairy Sci. 75:269–278.

Van Bebber, J., N. Reinsch, W. Junge, and E. Kalm. 1998. A Kalman filter based procedure to cluster adjacent herd-test-days into comparison groups. Proc. 49th EAAP Mtg., Warsaw, Poland.

Van Vleck, L. D. 1987. Contemporary groups for genetic evaluations. J. Dairy Sci. 70:2456–2464.

Varona, L., C. Moreno, L. A. García-Cortés, and J. Altarriba. 1997. Model determination in a case of heterogeneity of variance using sampling techniques. J. Anim. Breed. Genet. 114:1–12.

Visscher, P. M., and M. E. Goddard. 1993. Fixed and random contemporary groups. J. Dairy Sci. 76:1444–1454.

Wade, K. M., R. L. Quaas, and L. D. Van Vleck. 1990. Mixed linear models with an autoregressive error structure. Pages 508–511 in Proc. 4th World Cong. Genet. Appl. Livest. Prod., Edinburgh, U.K.

Wiggans, G. R., I. Misztal, and L. D. Van Vleck. 1988. Implementation of an animal model for genetic evaluation of dairy cattle in the United States. Proc. of the Animal Model Workshop, Edmonton, Canada. J. Dairy Sci. 71(Suppl. 2):54–69.

