|
|
||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ANIMAL GENETICS |


* Laboratory of Animal Breeding and Genetics, Graduate School of Agriculture, Kyoto University, Kyoto, 606-8502, Japan; and
Department of Animal Research, Agura Farm, Nasushiobara, 325-0033, Japan
| Abstract |
|---|
|
|
|---|
Key Words: beef cattle carcass weight genetic evaluation heterogeneity of variance model selection statistical validation
| INTRODUCTION |
|---|
|
|
|---|
Japanese Black cattle are a main beef breed in Japan. Genetic evaluation of sires and dams has been carried out, and dramatic genetic gains for carcass traits have been achieved within prefectures (Sasaki et al., 2006
). Recently, a national genetic evaluation has provided the possibility of accelerating the genetic improvement. The management and environmental conditions of the fattening farms vary greatly from the north to the south of Japan; these differences may cause heterogeneity of variance and bias in the national cattle evaluation.
Several procedures for accounting for heterogeneous variances have been presented. Among them, the procedure using standardization of records before solving the mixed model equations (i.e., 2-step procedure) may work well when the size of the subclasses is relatively small and have applicability to BLUP with the animal model (Hill, 1984
). Modeling of the heterogeneity of variance seems to be important in determining the effectiveness of the 2-step procedure; therefore, a 2-step procedure that allows setting and comparing various methods of modeling the heterogeneity of variance is needed.
The objectives of this study were 1) to quantify the heterogeneity of variance in Japanese Black cattle, 2) to develop a 2-step procedure that allows setting and comparing various methods of modeling the heterogeneity of variance, and 3) to evaluate the effectiveness of the method developed using field data from Japanese Black cattle.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Data
Records of carcass weights of Japanese Black cattle were collected from 26 carcass markets from 1997 to 2005. All animals used for this study were fattened at 130 farms of Agura Kyosai Farm Ltd. across Japan. Odani et al. (2004)
demonstrated that the fattening farms of Agura Kyosai Farm Ltd. across Japan were genetically connected, and a mathematical model containing the combination of fattening farm, market, year, and sex provided the best fit. Accordingly, farm-market-year-sex (FMYS) subclasses were created. The FMYS subclasses with less than 5 observations were excluded from the analysis. After data editing, a total of 96,950 carcass weight records with 2,767 FMYS subclasses were available for this study. The sample size of the FMYS subclasses ranged from 5 to 553. The average number of observations per FMYS subclass was 35.4. The number of FMYS subclasses with less than or equal to 10 observations was 425. Of the 96,950 total records, 57,461 records collected from 1997 to 2002 with 1,591 FMYS subclasses were extracted and denoted as data set I, which was used to quantify the heterogeneity of variance and to evaluate the method developed for its adjustment. The whole data set was denoted as data set II and was used to evaluate the effectiveness of the method developed in terms of the ability to predict breeding values.
Degree of Heterogeneity of Phenotypic Variance in Japanese Black Cattle
Within-FMYS phenotypic variance was estimated by the following expression:
![]() | [1] |
where yi = the ni x 1 vector of carcass weights;
i = the mean of carcass weight; and ni = the number of animals in the ith FMYS subclass.
Heterogeneity of the within-FMYS phenotypic variances was measured by Bartletts test (Kendall and Stuart, 1979
).
BLUP Evaluation
Breeding values were predicted using the following animal model:
![]() | [2] |
where yij = the carcass weight of animal j in the ith FMYS subclass; FMYSi = the fixed effect of the ith FMYS subclass; b1 and b2 = linear and quadratic regression coefficients, respectively, on fattening period (i.e., period from start of fattening to shipping to market in number of days); tij and
= the fattening period of a particular animal and the arithmetic mean of the fattening period, respectively; uj = the breeding value of animal j; and eij = the random residual. Ancestors were traced back 2 generations from the fattened steers and heifers and added as pedigree animals. Consequently, the total number of animals included in the analysis was 128,505 and 197,883 for data set I and II, respectively. Number of sires of steers and heifers included in the pedigree file was 739 and 911 for data set I and II, respectively. Variance components and breeding values were estimated using the set of multiple-trait, derivative-free, REML (MTDFREML) programs (Boldman et al., 1993
).
Adjustment for Heterogeneity of Phenotypic Variance
Maximum Likelihood Estimation.
The estimates of variance obtained from a maximum likelihood method with a log-linear model were used for the adjustment. The maximum likelihood estimation discussed by Harvey (1976)
and Aitkin (1987)
was applied. The general principle is outlined below:
![]() | [3] |
where yi = the carcass weight of animal i; ß = the l x1 vector of parameters of fixed effects; and xi = the l x 1 vector of explanatory variables linking yi and ß. The error terms (ei) are independently distributed as N(0,
2i). The variance model is the log-linear form:
![]() | [4] |
where
= the m x 1 vector of parameters of fixed effects and zi = the m x 1 vector of explanatory variables linking log
2i and
. In addition, zi may contain some or all of the explanatory variables included in xi and other explanatory variables not included in xi.
Let y be the n x 1 vector of observations. The log-likelihood under models [3] and [4] is:
![]() | [5] |
where di = (yi – xi 'ß)2. The iterative equations for estimating ß and
are written separately as:
![]() | [6] |
![]() | [7] |
where d = the n x 1 vector whose ith element is di; X and Z = n x l and n x m matrices of the explanatory variables, respectively; and
= the diagonal variance matrix whose ith diagonal element is
2i . 1n = the n x 1 vector of unit elements; and t = the tth iteration.
The iteration begins by taking
2. The deviance (–2logL) is calculated for each iteration. Equations [6] and [7] are alternated until the deviance converges. Convergence was considered to have been reached when the absolute difference between the deviances of 2 consecutive iterations was less than 10–8.
Definition of the Mathematical Model in Maximum Likelihood Estimation. Only 1 mathematical model was formed as a mean model, Eq. [3], where ß contains fixed FMYS subclass effects and fixed linear and quadratic regression coefficients on fattening period of the fattened animal.
On the other hand, various mathematical models were considered as a variance model, Eq. [4]. Model 1 was for the homogeneous modeling. For the heterogeneous modeling, 2 alternative methods of modeling were considered.
First, management-group modeling was considered. According to previous studies on dairy populations (Hill et al., 1983
; Weigel et al., 1993
; Ibáñez et al., 1996
; Urioste et al., 2001
), production levels of herds largely affect the heterogeneity of variance due to differences in management and due to scale effects, and factors such as period, herd size, and management practices were also reported as sources of heterogeneity of variance. In this study, the management factors (farm, market, year, and sex), the size of FMYS subclass (classified into 4 levels: 5 to 12, 13 to 23, 24 to 46, and >46 animals), and the production level of FMYS subclass (classified into 4 levels: <378 kg, 378 to 405 kg, 405 to 432 kg, and > 432 kg) seemed to cause the heterogeneity of variance. Screening of these 6 factors as possible causes of heterogeneous phenotypic variances was performed using Levenes test (Levene, 1960
). As a result, significant heterogeneity was detected across levels of all factors (P < 0.001), except for the size of FMYS subclass. These 5 factors were used to form the fixed effect part in Eq. [4]. Combining some of the 5 factors might better describe the heterogeneity of variance. Therefore, 6 different combinations of the 5 factors were set up as follows:
![]() |
where F = farm; M = market; Y = year; S = sex; and P = production level of FMYS subclass.
Alternatively, cluster modeling was considered to reduce the number of parameters fitted. Cluster modeling assumes that the variance is homogeneous among the subclasses of the same cluster and is heterogenous between subclasses of different clusters. The FMYS subclasses were grouped into clusters based on the simultaneous similarity of the phenotypic mean and variance. For the determination of the clustering, cluster analysis was performed using the SAS FASTCLUS procedure (SAS Inst. Inc., Cary, NC). Phenotypic means and variances were standardized using the STANDARD procedure of SAS when the distances between subclasses were calculated. Each cluster had at least 2 subclasses. The 7 models (models 8, 9, 10, 11, 12, 13, and 14) differed in number of clusters. The maximum numbers of clusters allowed in the clustering step for these 7 models were 10, 30, 50, 100, 200, 300, and 500, respectively. The factor of the production level of FMYS subclass was included as a fixed effect. The fixed effect part in Eq. [4] was:
![]() |
where CLFMYS = cluster consisting of various numbers of FMYS subclasses and P = production level of the FMYS subclass (4 levels).
Model Comparison.
Generally, as the number of parameters in the model increases, or the model becomes more complex, the fit is improved (the deviance decreases). A balance between fit and complexity is needed. Therefore, 14 models were compared, taking into account the balance. Schwarzs Bayesian information criterion (BIC; Schwarz, 1998
) is a criterion based on parsimony and imposes a penalty on more complicated models. A smaller value of BIC indicates a better model. The choice of an optimal model was based on BIC. The BIC was calculated as:
![]() |
where LK = the maximum of the likelihood within model K; pK = the number of parameters in model K; and n = the number of observations.
Adjustment of Observations to Constant Phenotypic Variance. To account for the heterogeneity of the within-FMYS phenotypic variances, a 2-step procedure was adopted. Carcass weights were standardized to a baseline SD as follows,
![]() | [8] |
where y*ij = the adjusted observation of the fattened animal j in the ith FMYS subclass; FMYSMLi and
ML = maximum likelihood estimates of the fixed effect of the ith FMYS subclass and
, respectively; and
BASE = the baseline SD. In this study,
2BASE was defined as 1,788.4 kg2. After adjustment, within-FMYS phenotypic variances (S*2i) and were estimated using Eq. [1], with y*i
*i, which are the adjusted observation and its mean, respectively.
Assessment of the Adjustment Based on Reduction in Heterogeneity of Phenotypic Variance
The effect of adjusting the observation for heterogeneity of within-FMYS phenotypic variances was visualized by Lorenz curves (Marshall and Olkin, 1979
) and measured by 3 indices: SD of variances, CV of variances, and Gini coefficients (Urioste et al., 2001
). In Lorenz curves, the cumulative proportion of the ordered subclass (x-axis) was plotted against the cumulative proportion of their variances (y-axis). If a Lorenz curve downwardly separates from the line of perfect equality (y = x), then a large degree of heterogeneity exists.
Validation of the Adjustment Based on Predicted Breeding Value
The effect of the adjustment on BLUP evaluation was investigated by comparing the predicted breeding values obtained using the unadjusted and adjusted data from data set I. The breeding values with the adjustment for heterogeneity of variance were obtained, replacing yij by y*ij in Eq. [2]. Spearmans rank correlations between predicted breeding values of the adjusted and unadjusted data were calculated to assess the re-ranking of animals.
Furthermore, the effect of the adjustment on the ability to predict breeding values was investigated by comparing the results of the successive genetic evaluations with data set I and data set II. The parent average (PA) for each of the sires whose progeny had a carcass record only in data set II was calculated from the predicted breeding values of their parents in data set I. The number of sires with more than 10 progeny that had a carcass record only in data set II was 47. Average number of progeny of these sires was 87.4. The PA values for these sires were compared with their predicted breeding values in data set II (û). Parent average and û were considered as the expected values and actual predicted values of the genetic merit for the sires, respectively. Hence, the ability to predict breeding values was assessed by 3 statistics: 1) mean squared error (MSE) between PA and û , 2) correlation coefficient between PA and û (rûPA), and 3) regression coefficient of û on PA (bû | PA). A small value of MSE indicates consistency between the expected values as parental average and the actual predicted values with performance progeny records and also indicates the ability to predict breeding values of future animals. The correlation rû PA depends on both the precision and bias of the evaluations, and a value of unity is preferable. The deviation of bû| PA from 1 indicates that bias exists (Reverter et al., 1994
).
Least-squares ANOVA was performed using the GLM procedure of SAS to compare the unadjusted procedure and the heterogeneity-adjusted procedure based on MSE. The squared errors between PA and û for each of the sires derived from the 2 procedures were treated as dependent variables (94 observations), and the procedure (2 levels) and the individual sire (47 levels) were treated as fixed and random effects, respectively. The correlation coefficients and the regression coefficients were calculated using the CORR and REG procedures of SAS, respectively. The test of significance of the difference between 2 correlation coefficients and the t-test of 2 regression coefficients were performed (Snedecor and Cochran, 1980
).
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
2) were estimated, and 6 FMYS subclasses, for which
2 or sample size were extreme, were extracted and listed in Table 1
2 was about 85 times as large as the smallest one. Even when subclasses with more than 100 observations were considered, the
2 ranged from 786.8 to 3,496.6 kg2. The Bartletts test rejected homogeneity of the within-FMYS phenotypic variances (P < 0.001), indicating the necessity of accounting for heterogeneity of variance in genetic evaluation of Japanese Black cattle.
|
The sizes of fattening farms and carcass markets of Japanese Black cattle are generally small (Sasaki, 1992
, 2001
); as a result, the sizes of the FMYS subclasses became relatively small. Therefore, the 2-step procedure seems to meet the conditions of this study. Modeling of the heterogeneity of variance was of interest in some of the 2-step procedures (Weigel and Lawlor, 1994
; Urioste et al., 2001
). A model containing management factors was fitted to the phenotypic variances within subclass, and the resulting solutions were used to estimate the prior variance for each subclass. The prior variances and phenotypic variances within subclass were combined using Bayesian methods to obtain the posterior variances, and the posterior variances, in turn, were used to standardize the observations. In the series of steps, the method of modeling the heterogeneity to obtain the prior variance has not been fully discussed by previous authors. Therefore, a 2-step procedure that allows setting and comparing various methods of modeling the heterogeneity of variance was developed in this study.
Fourteen log-linear models were set and compared to obtain maximum likelihood estimates of the heterogeneous within-FMYS variances. The model selection criteria of the number of parameters, the deviance, and the BIC for each model are shown in Table 2
. With respect to management-group modeling, the deviance decreased (the fit was improved), as more factors were combined in the model. In addition, estimates of the within-FMYS phenotypic variances obtained with models 2 and 3 were largely different from the
2 for small subclasses (example A, B, and F in Table 1
), whereas those obtained with models 4 to 7, where a larger number of factors were combined, were comparatively close to the
2. Models 2 and 3 seem to be unsuitable to estimate the variances, especially for small subclasses. Models 4 to 7 fitted well, but a drawback of these models was that the extremely large number of parameters (overparameterization) might reduce the accuracy of the estimates. As it turned out, all of the models using management-group modeling were worse than homogeneous modeling in terms of BIC.
|
*2) were estimated with the adjusted data and included in Table 1
*2 were more homogeneous than the
2, at least in the case of the 6 subclasses shown in Table 1
The distributions of the estimates of the within-FMYS phenotypic variances for carcass weight obtained from the unadjusted (
2) and adjusted data (
*2) are displayed as box-and-whisker plots (Figure 1
). Box-and-whisker plots show the median, upper and lower quartiles, and minimum and maximum values of the estimates of the within-FMYS phenotypic variances. Although the distribution of the
2 had a long upper tail, that of the
*2 was more symmetric. The reduction in range and difference between the upper and lower quartiles were 55.0 and 69.0%, respectively, indicating that most of the subclasses moved closer to the median after the adjustment.
|
|
|
Spearmans rank correlation coefficients based on predicted breeding values obtained with the unadjusted and adjusted data were calculated within sex. The overall correlation was high: 0.997 for sires and 0.994 for dams. When only the top 1% of dams was used, however, the correlation dropped to 0.780, indicating reranking of the elite dams due to the adjustment. On the other hand, the correlation was still high (0.947) when only the top 5% of sires was used. Therefore, the reranking was greater for dams than for sires.
Sire ranking may also be affected when progeny are nonrandomly distributed between high and low variance environments (Vinson, 1987
). However, the sire evaluation seems to be robust against heterogeneity of variance, because progeny of a sire are distributed across many subclasses (an average of 28.5 subclasses in this study). On the other hand, progeny of dams existed in only 1.6 subclasses on average; therefore, the predictions of dams were more likely to be biased by heterogeneity of variance. It is suggested that the effect of the adjustment on bull-dam selection is large. This finding is in agreement with previously reported results (Vinson, 1987
; Meuwissen and Van der Werf, 1993
; Urioste et al., 2003
).
The ability to predict genetic values estimated by BLUP was compared between the unadjusted procedure and the heterogeneity-adjusted procedure based on the MSE between PA with data set I and their predicted breeding values obtained with data set II, the correlation coefficient (rûPA) between PA and û , and the regression coefficient (bû | PA) of û on PA. The resulting statistics of the 3 criteria are shown in Table 4
. The MSE obtained with the heterogeneity-adjusted procedure was smaller than that obtained with the unadjusted procedure, and the effect of the procedure on the MSE between PA and û was significant (P = 0.026). The values of rû PA and bû | PA obtained with the heterogeneity-adjusted procedure were slightly closer to 1.0 than those obtained with the unadjusted procedure, although differences between the procedures were not statistically significant. The significant reduction in MSE suggests that the genetic evaluation becomes more accurate using the adjustment for heterogeneity of variance. The consistency between the expected values as PA and the actual predicted values obtained with progeny performance data is an important property of genetic evaluation, because the genetic performance of future animals is of interest to breeders and farmers. Consequently, the heterogeneity-adjusted procedure developed in this study proved to be effective in accounting for heterogeneity of variance.
|
| Footnotes |
|---|
3 Present address: Department of Molecular Life Science, Division of Basic Medical Science and Molecular Medicine, Tokai University School of Medicine, Isehara, 259-1193, Japan. ![]()
2 Corresponding author: nak_1896{at}kais.kyoto-u.ac.jp
Received for publication January 24, 2007. Accepted for publication June 5, 2007.
| LITERATURE CITED |
|---|
|
|
|---|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |