ANIMAL GENETICS

* University of Guelph, Guelph, Ontario, N1G 2W1 Canada; and
GenSys Consultores Associados S/S Ltda, Porto Alegre, RS, Brazil
| Abstract |
|---|
Key Words: Crossbreeding, Dominance, Epistatic Loss, Genetic Evaluation, Ridge Regression
| Introduction |
|---|
One alternative way of dealing with multicollinearity is ridge regression (Hoerl and Kennard, 1970). The ridge estimator is obtained by solving the system of equations $(X'X + kI)\hat{b}_k = X'y$, to give $\hat{b}_k = (X'X + kI)^{-1}X'y$, where $k$ is the ridge parameter, with $k > 0$, and $I$ is an identity matrix. In a generalized form, $kI$ is replaced by a matrix $K$, where $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$, with $k_i \geq 0$. The ridge estimators are biased, but might be useful in providing estimates that are more precise and therefore more stable than least squares (LS) estimates when multicollinearity is of concern. With an optimal choice of the ridge parameter matrix $K$, the ridge estimators have smaller mean squared error than the LS estimators (Lowerre, 1974; Gruber, 1998).
The objectives of this study were to identify the sources and degree of multicollinearity in the genetic evaluation of a multibreed beef cattle population and to apply ridge regression to obtain estimates of direct and maternal breed additive, dominance, and epistatic loss genetic effects, and to compare them with estimates from the ordinary least squares method.
| Materials and Methods |
|---|
The method used to check for connectedness was the total number of direct genetic links between contemporary groups due to common sires and dams (Fries, 1998; Roso et al., 2004). Contemporary groups with more than 10 calves and with at least 10 direct genetic links and two classes of direct or maternal heterozygosities were considered connected and were retained for the analysis. There were nine classes of direct and maternal heterozygosities with an interval of 0.125, ranging from 0 to 1. The resulting dataset used in the analysis included 478,466 calves with a pedigree file of 714,220 animals.
Predictor Variables of Fixed Genetic Effects
Breed Additive Effects.
Coefficients for direct and maternal breed additive effects were equal to the proportion of each breed in the breed composition of the calf and in the breed composition of the dam (Rodríguez-Almeida et al., 1997), respectively.
Dominance Effects.
Coefficients of direct (HD) and maternal (HM) dominance effects were equal to expected direct and maternal breed heterozygosities (Rodríguez-Almeida et al., 1997), respectively. The HD and HM were calculated using the following equations:
$$\mathit{HD} = 1 - \sum_{i=1}^{nb} S_i D_i$$
$$\mathit{HM} = 1 - \sum_{i=1}^{nb} MGS_i \, MGD_i$$
where $nb$ is the number of breeds, and $S_i$, $D_i$, $MGS_i$, and $MGD_i$ are the fractions of the ith breed for the sire, dam, maternal grandsire, and maternal granddam breed composition, respectively.
Epistatic Loss Effects.
The assumption underlying the estimation of epistatic loss effects was that parents with larger heterozygosities produced more recombinant gametes than parents with smaller heterozygosities. Thus, the coefficients for direct (ED) and maternal (EM) epistatic loss effects were calculated as the average breed heterozygosities in uniting gametes that generated the individual (Fries et al., 2000). Epistatic loss was assumed to be proportional to the average heterozygosity observed in the parents of an individual, and it reaches its largest value when both parents are F1. The ED and EM were calculated as follows:
$$\mathit{ED} = \frac{H_{Sire} + H_{Dam}}{2}$$
$$\mathit{EM} = \frac{H_{MGS} + H_{MGD}}{2}$$
where $H_{Sire}$, $H_{Dam}$, $H_{MGS}$, and $H_{MGD}$ are the expected breed heterozygosities of the sire, dam, maternal grandsire, and maternal granddam, respectively. The average epistatic loss due to the breakdown of all kinds of gene interactions involving two or more loci, as deviation from the average additive and dominance effects, is estimated by ED and EM (Fries et al., 2002).
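As an illustration of how these coefficients behave, the following minimal sketch computes expected breed heterozygosity and epistatic loss coefficients for a two-breed example. It assumes the usual form of expected heterozygosity (one minus the sum over breeds of the product of the parental breed fractions) and the parental average for epistatic loss; the function names and the two-breed example are hypothetical and are not taken from the study.

```python
import numpy as np

def heterozygosity(parent1, parent2):
    """Expected breed heterozygosity of the offspring of two parents.

    parent1, parent2: breed fractions of each parent (each sums to 1).
    Assumed form: 1 - sum_i parent1_i * parent2_i.
    """
    p1 = np.asarray(parent1, dtype=float)
    p2 = np.asarray(parent2, dtype=float)
    return 1.0 - float(np.dot(p1, p2))

def epistatic_loss(h_parent1, h_parent2):
    """Average of the parents' own expected breed heterozygosities."""
    return 0.5 * (h_parent1 + h_parent2)

# Hypothetical two-breed example, breed order (A, B)
pure_a, pure_b, f1 = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]

h_f1 = heterozygosity(pure_a, pure_b)   # F1 calf from purebred A x purebred B
h_f2 = heterozygosity(f1, f1)           # F2 calf from F1 x F1
e_f1 = epistatic_loss(0.0, 0.0)         # purebred parents carry no heterozygosity
e_f2 = epistatic_loss(h_f1, h_f1)       # both parents are F1

print(h_f1, h_f2, e_f1, e_f2)           # 1.0 0.5 0.0 1.0
```

The F2 calf shows why the two covariates can be separated in principle: its heterozygosity is half that of the F1, whereas its epistatic loss coefficient is at the maximum.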
Multicollinearity Diagnostics
To identify possible linear dependencies among covariates included in the model, various measures of the degree of multicollinearity were obtained.
Variance Inflation Factor.
The variance inflation factor is the most common measure of multicollinearity. If $R_i^2$ is the coefficient of determination resulting when the predictor variable $X_i$ is regressed on all the remaining predictor variables, the variance inflation factor for $X_i$ ($VIF_i$) is given by:
$$VIF_i = \frac{1}{1 - R_i^2}$$
The VIF for ordinary LS are the diagonal elements of the inverse of the simple correlation matrix. The VIF indicate the inflation in the variance of each regression coefficient compared with a situation of orthogonality. The decision to consider a VIF to be large is essentially arbitrary. Usually, values larger than 10 suggest that multicollinearity may be causing estimation problems (Chatterjee et al., 2000).
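The statement that the VIF are the diagonal elements of the inverse correlation matrix can be sketched as follows. This is a minimal NumPy illustration on simulated data, not the SAS procedure used in the study; the function name and the simulated predictors are assumptions made for the example.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

# Simulated predictors: x2 is almost a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)

vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vif, 1))   # large values for x1 and x2, close to 1 for x3
```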
Condition Index.
In the presence of multicollinearity, the determinant of the correlation matrix among predictor variables is very small. Because the determinant also is equal to the product of the eigenvalues $\lambda_i$, the presence of one or more small eigenvalues results in a small determinant, thereby indicating multicollinearity. A measure of multicollinearity called condition index (CI) is obtained for each eigenvalue by computing:
$$CI_i = \sqrt{\frac{\lambda_{max}}{\lambda_i}}$$
where $\lambda_{max}$ is the largest eigenvalue, and $\lambda_i$ is the ith eigenvalue of the correlation matrix. A large $CI_i$ indicates dependencies among covariates because $\lambda_i$ will be close to zero. Belsley (1991) suggested that a CI between 10 and 30 would indicate possible problems of multicollinearity, and a CI larger than 30 suggests the presence of multicollinearity.
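A minimal sketch of the condition index computation follows, assuming the square-root form $CI_i = \sqrt{\lambda_{max}/\lambda_i}$ of Belsley. The function name and simulated data are hypothetical. The last lines check that the two eigenvalues and condition indices reported in the Results (0.00189 with 38.85, and 0.05078 with 7.50) imply the same largest eigenvalue under this form.

```python
import numpy as np

def condition_indices(X):
    """Condition index of each eigenvalue of the correlation matrix of X."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)          # ascending order
    return np.sqrt(eigvals.max() / eigvals)  # largest CI first, 1.0 last

# Simulated predictors with one near-dependency (x2 close to x1)
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
ci = condition_indices(np.column_stack([x1, x2, x3]))

# Consistency check on the values reported in the Results:
# lambda_max = CI^2 * lambda should agree for both reported pairs
lam_max_a = 38.85 ** 2 * 0.00189
lam_max_b = 7.50 ** 2 * 0.05078
print(np.round(ci, 2), round(lam_max_a, 3), round(lam_max_b, 3))
```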
Variance-Decomposition Proportions Associated with the Eigenvalues.
This statistic indicates variables that are involved in linear dependencies and how much of the variance of the parameter estimate is associated with each eigenvalue. Following Belsley (1991),
$$\mathrm{Var}(\hat{b}) = \hat{\sigma}^2 (X'X)^{-1} = \hat{\sigma}^2 V \Lambda^{-1} V'$$
where $\hat{\sigma}^2$ is the residual variance estimate, $V$ is a matrix containing the eigenvectors, and $\Lambda$ is a diagonal matrix of eigenvalues (i.e., $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$). Writing $V = (v_{ij})$, the variance of the ith element of $\hat{b}$, the vector of regression coefficients, can be decomposed into a sum of $p$ components, each associated with one eigenvalue, as follows:
$$\mathrm{Var}(\hat{b}_i) = \hat{\sigma}^2 \sum_{j=1}^{p} \frac{v_{ij}^2}{\lambda_j}$$
where $p$ is the number of predictor variables.
Because eigenvalues appear in the denominator, variance components associated with dependencies (small $\lambda_j$) will be relatively large compared to the other components. Thus, a high proportion of two or more coefficients associated with the same small eigenvalue provides evidence that the corresponding dependencies are causing problems.
Let $\phi_{ij} = v_{ij}^2 / \lambda_j$ and $\phi_i = \sum_{j=1}^{p} \phi_{ij}$, with $i = 1, \ldots, p$. The proportion of the variance of the ith regression coefficient associated with the jth component of its decomposition is obtained as follows:
$$\pi_{ji} = \frac{\phi_{ij}}{\phi_i}$$
with $i, j = 1, \ldots, p$.
An approach recommended by Belsley et al. (1980) is to identify eigenvalues $\lambda_j$ that have a CI greater than 30. Variables with variance-decomposition proportions $\pi_{ji}$ larger than 0.5 for each of these eigenvalues are candidates for linear dependencies. Measures of multicollinearity were obtained after standardization (centering and scaling) of predictor variables, as recommended by Freund and Littell (2000). The regression procedure, option COLLINOINT, of the SAS statistical software (SAS Inst., Inc., Cary, NC) was used to perform computations.
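The diagnostic described above (flag eigenvalues with a large condition index, then look for coefficients with proportions above 0.5) can be sketched in NumPy as follows. This is an illustrative reimplementation under the definitions $\phi_{ij} = v_{ij}^2/\lambda_j$ and $\pi_{ji} = \phi_{ij}/\sum_j \phi_{ij}$, not the SAS COLLINOINT output; the function name, the threshold used in the example, and the simulated data are assumptions.

```python
import numpy as np

def variance_decomposition(X):
    """Condition indices and variance-decomposition proportions.

    Returns (ci, props) where props[i, j] is the share of Var(b_i)
    associated with eigenvalue j of the correlation matrix of X.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, V = np.linalg.eigh(R)        # columns of V are eigenvectors
    phi = V ** 2 / eigvals                # phi[i, j] = v_ij^2 / lambda_j
    props = phi / phi.sum(axis=1, keepdims=True)
    ci = np.sqrt(eigvals.max() / eigvals)
    return ci, props

# Simulated predictors: x1 and x2 nearly collinear, x3 unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
ci, props = variance_decomposition(np.column_stack([x1, x2, x3]))

# Coefficients involved in the dependency tied to the largest condition index
worst = int(np.argmax(ci))
involved = np.where(props[:, worst] > 0.5)[0]
print(involved)   # indices of x1 and x2
```

A threshold of 10 rather than 30 would be needed to flag this simulated dependency by its condition index; the proportions themselves do not depend on that choice.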
Genetic Analysis
The genetic model for preweaning gain, in matrix notation, was:
$$y = Xb + Fv + Za + Wm + Sp + e \qquad [1]$$
where y = vector of observations; b = vector of fixed genetic effects, which included direct and maternal breed additive, dominance, and epistatic loss effects; v = vector of fixed environmental effects, which included age of the calf as a covariate (linear and quadratic effects), and age of the dam by sex of the calf and contemporary group (herd-year-season-management group) as classification variables; a = vector of random direct additive genetic effects; m = vector of random maternal additive genetic effects; p = vector of random maternal permanent environment effects; e = vector of random residual effects; and X, F, Z, W, and S are the appropriate incidence matrices relating records to fixed genetic, fixed environmental, direct genetic, maternal genetic, and permanent environment effects, respectively. Random direct additive genetic effects, random maternal additive genetic effects, random maternal permanent environment effects, and random residual effects were assumed to have variance matrices equal to $A\sigma_a^2$, $A\sigma_m^2$, $I\sigma_p^2$, and $I\sigma_e^2$, respectively, where A is the additive numerator relationship matrix among animals and I is an identity matrix. Covariance between a and m was assumed to be equal to $A\sigma_{am}$. The estimates of $\sigma_a^2$, $\sigma_m^2$, $\sigma_{am}$, $\sigma_p^2$, and $\sigma_e^2$ used in the analyses were 254.5, 161.2, 128.6, 94.1, and 408.2 kg², respectively. These estimates were obtained by restricted maximum likelihood, using a subset containing 300,002 records from randomly sampled herds, to overcome computational limitations. The genetic model assumed homogeneity of variances, the same dominance and epistatic loss effects for crosses of different pairs of breeds, and no interactions between genetic and environmental effects.
Solutions for Model [1] were obtained using the procedure described below.
Step 1) Obtain solutions for v, a, m, and p, using the following model:
$$y^{(t)} = Fv + Za + Wm + Sp + e$$
where t denotes the tth iteration and $y^{(t)} = y - X\hat{b}^{(t-1)}$. In the first iteration, $\hat{b}$ was set to the values obtained by LS. The DMU program (Madsen and Jensen, 2000) was used to perform computations.
Step 2) Using LS or ridge regression, obtain solutions for b, using the following model:
$$y^{(t)} = Xb + e$$
where, in this step, $y^{(t)} = y - F\hat{v}^{(t)} - Z\hat{a}^{(t)} - W\hat{m}^{(t)} - S\hat{p}^{(t)}$, and $\hat{v}$, $\hat{a}$, $\hat{m}$, and $\hat{p}$ are solutions obtained in the first step of the tth iteration. Programs used in Step 2 were developed using the Fortran language and the IML procedure of SAS statistical software (SAS Inst., Inc., Cary, NC).
Steps 1 and 2 were repeated until convergence. Convergence was attained when the largest absolute difference between the solutions for $\hat{b}$ in the current and in the previous iteration was smaller than $10^{-4}$.
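The alternation between Step 1 and Step 2 can be illustrated with a deliberately simplified sketch. It keeps only one block of other fixed effects (F, v) in Step 1 and omits the random effects a, m, and p, so it is not the mixed-model procedure run with DMU; the function name, the simulated data, and the ridge matrix K are all assumptions made for the example.

```python
import numpy as np

def two_step_solve(y, X, F, K, tol=1e-4, max_iter=1000):
    """Alternate between v (given b) and b (given v) until b stabilises."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]       # starting values by LS
    for _ in range(max_iter):
        # Step 1: adjust y for the current b, then solve for the other effects
        v = np.linalg.lstsq(F, y - X @ b, rcond=None)[0]
        # Step 2: adjust y for v, then solve for b by ridge (K = 0 gives LS)
        b_new = np.linalg.solve(X.T @ X + K, X.T @ (y - F @ v))
        if np.max(np.abs(b_new - b)) < tol:        # same criterion as the text
            return b_new, v
        b = b_new
    raise RuntimeError("two-step procedure did not converge")

# Simulated example (hypothetical dimensions and effects)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
F = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0, 0.5]) + F @ np.array([3.0, -1.0]) + rng.normal(size=200)
K = np.diag([0.5, 0.5, 0.5])

b_hat, v_hat = two_step_solve(y, X, F, K)
print(np.round(b_hat, 3))
```

In this reduced setting the alternation is a block-wise solution of a single system of equations, so the converged values of b agree with those obtained by solving for b and v jointly.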
Ridge Regression
The usual model for a multiple linear regression is:
$$y = Xb + \varepsilon$$
where y is a (n x 1) vector of observations, X is a (n x p) design matrix of rank p, and $\varepsilon$ is a (n x 1) vector of random residuals with assumptions $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = I\sigma^2$. The unknown parameter vector, b, using the least squares criterion, is estimated as $\hat{b} = (X'X)^{-1}X'y$; however, estimates and their variances could be unreliable in the presence of multicollinearity. The ridge regression estimator consists of adding a small positive number on the diagonal of the X'X matrix, causing a decrease in the variance of the estimates at the expense of introducing some bias. Thus, the ridge regression estimator of b takes the general form:
$$\hat{b}_k = (X'X + K)^{-1}X'y$$
where $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$, $k_i \geq 0$. When all $k_i$ elements are equal to zero, $\hat{b}_k$ reduces to the LS estimator. From a Bayesian viewpoint (Goldstein and Smith, 1974; Sorensen and Gianola, 2002), the ridge regression can be considered as an estimate of b from the data subject to prior knowledge about the parameter, which is supplied by the ridge parameter k. Given that $k = \sigma^2/\sigma_b^2$, where $\sigma^2$ is the residual variance and $\sigma_b^2$ is a measure of the spread of the elements of b, large values of k imply an a priori belief that more restricted values of b are more likely than larger values, whereas small values of k imply an a priori belief that quite a large range of values of b is not unreasonable. The ridge regression is consistent with b considered as a random effect, given that variances $\sigma^2$ and $\sigma_b^2$ are known.
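The general form $\hat{b}_k = (X'X + K)^{-1}X'y$ can be sketched in a few lines of NumPy. The function name and the simulated, nearly collinear predictors are assumptions for illustration; a scalar k gives ordinary ridge regression and a vector of $k_i$ gives the generalized form.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator (X'X + K)^-1 X'y with K = diag(k).

    k may be a scalar (same amount added to every diagonal element)
    or a vector with one non-negative value per predictor.
    """
    X = np.asarray(X, dtype=float)
    K = np.diag(np.zeros(X.shape[1]) + k)
    return np.linalg.solve(X.T @ X + K, X.T @ y)

# Simulated data with two nearly collinear predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=200)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=200)

b_ls = ridge(X, y, 0.0)              # k = 0 reproduces least squares
b_ridge = ridge(X, y, 1.0)           # ordinary ridge, k = 1
b_gen = ridge(X, y, [0.5, 2.0])      # generalized ridge, one k per predictor
print(np.round(b_ls, 2), np.round(b_ridge, 2), np.round(b_gen, 2))
```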
The variance-covariance matrix of $\hat{b}_k$ estimates is as follows:
$$\mathrm{Var}(\hat{b}_k) = \hat{\sigma}^2 (X'X + K)^{-1} X'X (X'X + K)^{-1}$$
where $\hat{\sigma}^2$ is the LS estimator of $\sigma^2$.
The mean square error (MSE), a measure of the expected squared distance between $\hat{b}_k$ and b, is as follows:
$$\mathrm{MSE}(\hat{b}_k) = E[(\hat{b}_k - b)'(\hat{b}_k - b)]$$
$$= \sigma^2 \, \mathrm{tr}[(X'X + K)^{-1} X'X (X'X + K)^{-1}] + b'(Z - I)'(Z - I)b$$
where $Z = (X'X + K)^{-1}X'X$. The ridge regression is advocated when the introduction of some bias in the estimates is compensated for by a substantial decrease in the estimation error variance, resulting in smaller MSE compared with LS (Hoerl and Kennard, 1970).
The variance inflation factors of the ridge regression coefficients are the diagonal elements of the matrix $(X'X + K)^{-1} X'X (X'X + K)^{-1}$.
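A minimal sketch of the ridge variance inflation factors follows, assuming X'X is in correlation form so that K = 0 recovers the LS VIF (the diagonal of the inverse correlation matrix). The function name and the 2 x 2 correlation matrix are hypothetical.

```python
import numpy as np

def ridge_vif(R, k):
    """Diagonal of (R + K)^-1 R (R + K)^-1, with R a correlation matrix."""
    R = np.asarray(R, dtype=float)
    K = np.diag(np.zeros(R.shape[0]) + k)
    A = np.linalg.inv(R + K)
    return np.diag(A @ R @ A)

# Two predictors with correlation 0.99 (hypothetical)
R = np.array([[1.00, 0.99],
              [0.99, 1.00]])

vif_ls = ridge_vif(R, 0.0)      # equals diag(R^-1), about 50 for each predictor
vif_ridge = ridge_vif(R, 0.05)  # a small ridge parameter lowers both markedly
print(np.round(vif_ls, 2), np.round(vif_ridge, 2))
```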
Ridge regression analyses were carried out in the standardized form of the model, using the correlation matrix. After estimation, the estimates were transformed to and presented on the original scale.
Objective Methods for Selecting the Ridge Parameter K
The theoretical optimal value of the ridge parameter K, which results in smaller MSE than that obtained with LS, depends on the unknown parameter vector b and the unknown error variance $\sigma^2$ (Hoerl and Kennard, 1970). Consequently, K must be determined empirically or estimated from the data, and there is no way to know whether the theoretical optimal value of the ridge parameter K was attained in a specific problem. Many methods have been proposed in the literature for selecting appropriate $k_i$ values, but there is no consensus on which method is the most adequate (Gruber, 1998). In general, the best method to estimate an optimal K depends on the data and model used. Here, the ridge parameter K was estimated through two objective methods.
Generalized Ridge Estimator of Hoerl and Kennard (R1).
In the Generalized Ridge Regression Estimator of Hoerl and Kennard (Hoerl and Kennard, 1970), an orthogonal transformation V is applied to reduce X'X to a diagonal matrix. We have that
$$V'X'XV = \Lambda$$
where V is a (p x p) orthogonal matrix, whose columns $v_1, v_2, \ldots, v_p$ are the eigenvectors of X'X and $\Lambda$ is a diagonal matrix of eigenvalues of X'X. Writing:
$$X^* = XV \quad \text{and} \quad \alpha = V'b$$
the model $y = Xb + \varepsilon$ can then be written as
$$y = X^*\alpha + \varepsilon$$
The generalized ridge regression procedure is then defined as follows:
$$\hat{\alpha}_k = (X^{*\prime}X^* + K)^{-1}X^{*\prime}y = (\Lambda + K)^{-1}X^{*\prime}y$$
where K is a diagonal matrix with nonnegative diagonal elements $k_1, k_2, \ldots, k_p$.
Hoerl and Kennard (1970) showed that theoretically optimal values for $k_i$, that minimize the MSE of the generalized ridge estimator, are given by $k_i = \sigma^2/\alpha_i^2$. These authors suggested an iterative procedure to estimate $k_i$. This procedure can be summarized as follows: 1) Reduce the system to canonical form; 2) Take the LS solutions as the starting point to compute $\hat{k}_{i(1)} = \hat{\sigma}^2/\hat{\alpha}_i^2$, $i = 1, 2, \ldots, p$, where $\hat{\sigma}^2$ and $\hat{\alpha}_i$ are the solutions from the LS equations; 3) Use the $\hat{k}_{i(j)}$ values in the ridge regression equation to obtain $\hat{\alpha}_{i(j+1)}$, where j denotes the jth iteration; and 4) Compute a new estimate for $k_i$ using $\hat{k}_{i(j+1)} = \hat{\sigma}^2/\hat{\alpha}_{i(j+1)}^2$.
Go to Step 3 until convergence of the estimates. Convergence was achieved when the maximum difference between estimates from two consecutive iterations was smaller than $10^{-7}$. After convergence the estimates $\hat{\alpha}_k$ were converted back to $\hat{b}_k$ through the equation $\hat{b}_k = V\hat{\alpha}_k$.
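The four steps above can be sketched as follows. This is a minimal illustration, not the Fortran or SAS IML programs used in the study: it works on X'X directly rather than on the standardized (correlation) form, monitors convergence on the canonical coefficients (an assumption), and uses a hypothetical function name and simulated data. A coefficient that shrinks to zero receives an infinite $k_i$, which NumPy handles as `inf`.

```python
import numpy as np

def generalized_ridge_hk(X, y, tol=1e-7, max_iter=10_000):
    """Iterative generalized ridge in canonical form, k_i = sigma2 / alpha_i^2."""
    n, p = X.shape
    lam, V = np.linalg.eigh(X.T @ X)      # step 1: canonical form, X*'X* = diag(lam)
    Xs = X @ V
    c = Xs.T @ y
    alpha_ls = c / lam                    # LS solutions in canonical form
    resid = y - Xs @ alpha_ls
    sigma2 = resid @ resid / (n - p)      # LS estimate of the residual variance
    alpha = alpha_ls.copy()
    k = np.zeros(p)
    for _ in range(max_iter):
        with np.errstate(divide="ignore"):
            k = sigma2 / alpha ** 2       # steps 2 and 4: update k_i
        alpha_new = c / (lam + k)         # step 3: ridge solution given k_i
        converged = np.max(np.abs(alpha_new - alpha)) < tol
        alpha = alpha_new
        if converged:
            break
    return V @ alpha, k                   # back-transform: b_k = V alpha_k

# Simulated data with two nearly collinear predictors (hypothetical)
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=300), rng.normal(size=300)])
y = X @ np.array([1.0, 1.0, 2.0]) + rng.normal(size=300)

b_gr, k_hat = generalized_ridge_hk(X, y)
b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(b_ls, 3), np.round(b_gr, 3))
```

Because every $k_i$ is nonnegative, each canonical coefficient is shrunk toward zero relative to LS, so the norm of the ridge vector never exceeds that of the LS vector.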
Bootstrap in Combination with Cross Validation (R2).
The bootstrap and cross validation for estimating the ridge parameter, originally suggested by Delaney and Chatterjee (1986), is based on minimization of the mean squared error of prediction (MSEP). This method was extended to consider the instability of each predictor variable, avoiding the introduction of unnecessary bias to those predictor variables not seriously involved in multicollinearity. The diagonal elements ($k_i$) of the ridge parameter K were estimated by
![]() |
where $VIF_i$ is the variance inflation factor of the ith predictor variable. A value of the constant has to be chosen to generate a matrix $\hat{K}$ that minimizes the MSEP. The magnitude of the elements $\hat{k}_i$ will be proportional to the variance inflation of each predictor variable.
The MSEP was estimated by combining bootstrap with cross validation. The bootstrap is a powerful re-sampling procedure originally proposed by Efron (1979). In the bootstrap procedure, a random sample of n observations is taken with replacement from a particular population. The sample obtained in this manner is known as a bootstrap sample. If a large number of bootstrap samples are drawn, the estimates of the parameters of interest will approach the true parameter.
A strategy using bootstrap in combination with cross-validation to estimate the ridge parameter matrix K can be summarized as follows: 1) Select a vector of candidate values of the constant between 0 and 1; 2) Choose a bootstrap sample of n observations with replacement; and 3) For each bootstrap sample and each value of the constant, obtain $\hat{K}$ and the ridge estimator vector $\hat{b}_k$, where $\hat{K} = \mathrm{diag}(\hat{k}_1, \hat{k}_2, \ldots, \hat{k}_p)$. Use the ridge estimator to predict the observations that were not selected in the bootstrap sample. If the prediction vector for the unselected observations is $\hat{y}_k$, the MSEP of the jth bootstrap sample, given the value of the constant and the corresponding ridge parameter $\hat{K}$, is
![]() |
where $N_j$ is the number of unselected observations (randomly determined) in the jth bootstrap sample. 4) Repeat Steps 2 and 3 for B bootstrap samples and obtain a final average of MSEP for each value of the constant as:
![]() |
The value of the constant that generates the matrix $\hat{K}$ of ridge parameters that minimizes the MSEP is then chosen. The MSEP were obtained for values of the constant ranging from 0 to 1, with increments of 0.001, on the basis of 100 bootstrap samples.
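The bootstrap and cross-validation loop can be sketched as below. The sketch assumes, purely for illustration, that the ridge parameters take the form $k_i = c \times VIF_i$ for a candidate constant c; the actual scaling used in R2 may differ. It also uses a coarse grid of constants instead of increments of 0.001, and the function name and simulated data are hypothetical.

```python
import numpy as np

def msep_by_bootstrap(X, y, constants, vif, n_boot=100, seed=0):
    """Average MSEP on out-of-sample records for each candidate constant."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    msep = np.zeros(len(constants))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)              # bootstrap sample (with replacement)
        out = np.setdiff1d(np.arange(n), idx)         # records not selected
        XtX = X[idx].T @ X[idx]
        Xty = X[idx].T @ y[idx]
        for m, c in enumerate(constants):
            K = np.diag(c * vif)                      # ASSUMED form: k_i = c * VIF_i
            b = np.linalg.solve(XtX + K, Xty)         # ridge estimate from the sample
            err = y[out] - X[out] @ b                 # prediction error, unselected records
            msep[m] += err @ err / len(out)
    return msep / n_boot

# Simulated, standardized predictors with one near-dependency
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=n), rng.normal(size=n)])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)
y = y - y.mean()

vif = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))
constants = np.linspace(0.0, 1.0, 11)
msep = msep_by_bootstrap(X, y, constants, vif)
best = constants[np.argmin(msep)]
print(best, np.round(msep.min(), 3))
```

On average about 37% of the records are left out of each bootstrap sample, which matches the proportion implied by the mean number of unselected observations reported in the Results (176,002 of 478,466).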
Alternative Analyses
Three alternative analyses were performed using Model [1]: 1) ADE-LS, an additive-dominance-epistatic (ADE) model with breed additive, dominance, and epistatic loss effects estimated by LS; 2) ADE-R1, the ADE model with breed additive, dominance, and epistatic loss effects estimated by ridge regression method R1; and 3) ADE-R2, the ADE model with breed additive, dominance, and epistatic loss effects estimated by ridge regression method R2.
Mean Squared Error of Prediction and Variance Inflation Factor
The performance of ridge regression methods has been generally evaluated in terms of the decrease in MSE compared with LS using computer simulation (Gruber, 1998). A given simulation cannot hope to cover a large range of practical situations, particularly when a large number of factors are involved. In this study, the performance of ridge regression methods was evaluated in terms of MSEP, as in Delaney and Chatterjee (1986) and Hébel et al. (1993), under the assumption that smaller MSE will result in smaller MSEP. A procedure combining bootstrap resampling and cross-validation was used to obtain the average MSEP over 100 bootstrap samples. This approach was deemed to be appropriate because sample statistics based on a large number of bootstrap samples tend to approach true parameter values (Delaney and Chatterjee, 1986). Average VIF of estimates computed by ridge regression methods and LS also were obtained and used to evaluate the performance of ridge regression methods. A model that results in lower VIF and smaller MSEP is desirable because these statistics indicate stability of estimates and ability of the model to predict future observations, respectively.
Bias Measurement
A known relationship between the ridge parameter and the variance and bias of ridge regression estimates is that, as the ridge parameter increases, the variance decreases and the bias increases. Given that $E(\hat{b}) = b$ and $E(\hat{b}_k) = (X'X + K)^{-1}X'Xb = Hb$, a measurement of the bias of the ridge regression vector $\hat{b}_k$ was computed as
, where || || denotes the Euclidean norm. Thus, a bias measurement closer to zero for a particular ridge regression method indicates smaller bias in the estimates.
Comparison of Across-Breed Estimated Breeding Values
Across-breed estimated breeding values (AB-EBV) from models that used LS and ridge regression methods for the estimation of fixed genetic effects were compared through correlations (Pearson and Spearman), and percentages of coincidence for different proportions of selected (top 1, 10, 20, and 40%) sires, dams, and calves. Across-breed estimated breeding values were calculated by adding EBV and estimates of direct breed additive effects, weighted by the breed composition of the animal. In addition to the analyses using the ADE model, three alternative additive-dominance (AD) models (AD-AH, AD-LS, and AD-R2) were considered in the AB-EBV comparisons. For AD-AH, the preweaning gain was pre-adjusted for expected heterosis based on averages from literature. A heterosis (direct and maternal) of 5% for an animal with heterozygosity of 100% was assumed. Breed additive effects were estimated by LS. For AD-LS, breed additive and dominance effects were estimated by LS. Model AD-R2 differed from Model AD-LS in that breed additive and dominance (heterosis) effects were estimated by R2 instead of LS. The AB-EBV from Model ADE-R2 were taken as the reference estimates for calculating Pearson and Spearman correlations, and percentages of coincidence with all other models.
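The comparison statistics can be sketched as follows. The percentage of coincidence is taken here to be the share of animals in the top fraction under the reference model that are also in the top fraction under the other model, which is an assumption about the exact definition; the function names and the ten-animal example are hypothetical. The Spearman helper ranks without handling ties.

```python
import numpy as np

def coincidence(ref, other, top_fraction):
    """Percentage of the top animals under `ref` also in the top under `other`."""
    ref, other = np.asarray(ref, dtype=float), np.asarray(other, dtype=float)
    n_top = max(1, int(round(top_fraction * len(ref))))
    top_ref = set(np.argsort(-ref)[:n_top].tolist())
    top_other = set(np.argsort(-other)[:n_top].tolist())
    return 100.0 * len(top_ref & top_other) / n_top

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Pearson correlation of ranks (no tie correction)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return pearson(rank_a, rank_b)

# Hypothetical AB-EBV for ten animals under a reference and an alternative model;
# the second and third animals swap places under the alternative model
ab_ebv_ref = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0])
ab_ebv_alt = np.array([5.0, 3.0, 4.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0])

top20 = coincidence(ab_ebv_ref, ab_ebv_alt, 0.20)   # 50.0
top40 = coincidence(ab_ebv_ref, ab_ebv_alt, 0.40)   # 100.0
rho = spearman(ab_ebv_ref, ab_ebv_alt)
print(top20, top40, round(rho, 4))
```

The example shows why both statistics are reported: a rank correlation close to one can coexist with a low coincidence when the reranking happens near the selection threshold.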
| Results and Discussion |
|---|
= 0.00189). This eigenvalue was associated with condition index 38.85, reflecting dependencies between predictor variables. The second smallest eigenvalue was equal to 0.05078, with corresponding condition index equal to 7.50. Variance inflation factors shown in Figure 1
Variance-decomposition proportions associated with the largest condition index (CI = 38.85) suggest that breed composition was the main candidate for the dependencies (Table 3). For nine direct and five maternal breed additive effects, a fraction of the variance of the estimated regression coefficients larger than 50% was associated with dependencies indicated by the largest condition index. Multicollinearity involving breed composition can be partially explained by the mathematical constraint among breeds because breed portions of the breed composition of an animal add to one, and the breed composition of a calf is equal to the average breed composition of the sire and of the dam. In practice, after fitting breeds that are more representative in the data, less new information is added by fitting the remaining breeds. Similarly, after fitting the breed of the dam, less new information is added by fitting the breed of the calf, and vice versa.
Combining information from Figure 1 with information from Table 3, breeds with smaller numbers of records (BD, GV, MA, SA, and SH) had lower VIF and a lower proportion of the variance of the estimates associated with linear dependencies among predictor variables. In contrast, breeds with larger numbers of records and lower SE for the estimated regression coefficients (AN, CH, HE, LM, and SM) had higher VIF and a higher proportion of the variance of the estimates associated with linear dependencies among predictor variables.
The second largest condition index (CI = 7.50) indicates possible dependencies involving maternal dominance and direct epistatic loss effects (Table 3). Proportions equal to 85 and 83% of the variances of estimated regression coefficients of maternal dominance and direct epistatic loss effects, respectively, were associated with linear dependencies between the corresponding predictor variables. This multicollinearity problem can be a consequence of the small proportion (10.7%) of crossbred sires in the data.
Ridge Parameter K and Bias Measurement
Ridge regression models that add the same amount to the diagonal of the matrix X'X are known in the literature as ordinary ridge regression (Gruber, 1998). Preliminary analyses using different ordinary ridge regression methods, however, resulted in a small reduction in the variance inflation factors and similar MSEP to LS (data not shown), in line with Delaney and Chatterjee (1986). These authors stated that the ordinary ridge regression model is not appropriate for multicollinearity caused by physical or mathematical constraints in the data. Because breed composition sums to one for each observation, a mathematical constraint was present in the data. Generalized ridge regression methods, such as those used in this investigation, are better suited to deal with this source of multicollinearity.
The ridge parameters obtained by the two objective methods are shown in Table 4. The selected constant for calculating the ridge parameter K in R2, which minimizes the MSEP, was equal to 0.04. The mean and the SD of the number of unselected observations over the bootstrap samples in the last iteration for solving the genetic model were equal to 176,002 and 257, respectively. The elements of the ridge parameter K obtained on the basis of R1 were generally smaller than those on the basis of R2. Consequently, smaller bias in the estimates of regression coefficients of R1, compared with R2, can be expected. Bias measurements of R1 and R2 were equal to 1.49 and 5.61%, respectively.
From Table 5, the two ridge regression methods provided a general improvement over the LS, when evaluated by MSEP and average VIF obtained over a large number of bootstrap samples. Additional information for comparing the ridge regression methods, based on the decrease in instability of each parameter estimate, is presented in Figure 2. When multicollinearity was of concern, both ridge regression methods caused a substantial decrease in the VIF, but VIF given by R2 were smaller than VIF given by R1 for most predictor variables.
Table 6 shows that estimates of direct and maternal breed additive effects of BD, GV, MA, SA, and SH still had relatively large SE under ridge regression methods compared with the remaining breeds. It was previously shown, however, that variance-decomposition proportions of maternal breed additive effects for BD, GV, MA, SA, and SH associated with the largest condition index were lower than 0.5 (Table 3). Thus, the large SE of the estimates of maternal breed effects for BD, GV, MA, SA, and SH were more likely a consequence of the relatively small number of observations in these breeds than of multicollinearity involving the corresponding predictor variables.
In addition to the statistical advantages of more stable solutions for breed differences, the practical implications of the observed differences in breed solutions across the three analyses should be carefully considered with respect to across breed selection using the AB-EBV. Results of this study indicate that estimation problems associated with multicollinearity among predictor variables, often seen in multibreed genetic evaluations, can be greatly minimized using ridge regression methods. Nevertheless, in the present study it was not possible to determine the correlation between estimated and true across breed breeding values (accuracy) for the alternative analyses. A simulation study would likely elucidate this question.
Dominance and Epistatic Loss Effects
Dominance effects indicate deviation from average dominance within breed due to differences in gene frequencies between breeds (breed heterozygosity). Epistatic loss effects express the recombination loss due to breed heterozygosity in relation to F2 calves and F2 dams, respectively. According to Koch et al. (1985), long-term selection within a breed can increase frequencies of favorable non-allelic combinations, which result in favorable effects on phenotype. When breeds are crossed, random recombination of loci in the progeny tends to decrease the frequencies of these parental breed combinations towards Hardy-Weinberg equilibrium, resulting in recombination loss.
Estimates of dominance and epistatic loss effects and respective SE obtained by LS and by ridge regression methods are presented in Table 7. Both direct and maternal dominance effects resulted in a favorable effect on preweaning gain, whereas direct and maternal epistatic loss decreased preweaning gain, as expected. The estimate of maternal epistatic loss, however, was not different from zero (P > 0.05). Because predictor variables HM and ED were involved in multicollinearity, ridge regression methods caused substantial changes in the estimates of maternal dominance and direct epistatic loss effects. A small decrease in the SE of estimates of maternal dominance and direct epistatic loss effects was obtained through ridge regression methods. This decrease was slightly more pronounced under the R2 method. Estimated maternal dominance and direct epistatic loss effects were of opposite sign and comparable magnitude, and had large SE under LS (Table 7). Both ridge regression methods R1 and R2 seemed to slightly alleviate the multicollinearity involving maternal dominance and direct epistatic loss effects. The estimates of maternal dominance and direct epistatic loss effects were decreased from 2.28 and 2.19% in LS to 1.72 and 1.04% in the R1, and to 1.55 and 0.66% in the R2, respectively.
Sampling Correlations
To obtain information on the degree of confounding between estimates given by LS, R1, and R2, sampling correlations among estimates of breed additive, dominance, and epistatic loss effects were calculated. Overall averages of absolute values for pairwise correlations among estimates under LS, R1, and R2 were equal to 0.49, 0.30, and 0.18, respectively. These correlations indicated a substantial decrease in the degree of overall association between estimates given by ridge regression methods, especially with R2, compared with LS. The decrease in the degree of association between estimates was more pronounced between estimates of direct and maternal breed effects involving different breeds than between direct and maternal breed effects for the same breed.
Figure 3 shows correlations between estimates of maternal dominance and direct epistatic loss effects and between direct and maternal breed additive effects for the same breed. Averages of these correlations were equal to 0.88, 0.79, and 0.74 under LS, R1, and R2, respectively. Under ridge regression methods, breeds with more multicollinearity showed a substantial decrease in the degree of confounding between estimates of direct and maternal breed additive effects, noticeably under R2. In contrast, estimates of maternal dominance and direct epistatic loss effects were still highly correlated under both ridge regression methods. The correlation between estimates of maternal dominance and direct epistatic loss was 0.94 in the LS and 0.93 in both ridge regression methods. These results suggest that the variety of crosses available in the data, aggravated by linear dependencies between HM and ED, caused by mainly purebred sires being used, did not comprise enough information to effectively separate maternal dominance and direct epistatic loss effects, regardless of the fact that both effects were statistically significant.
Models AD-LS and ADE-LS were similarly correlated with Model ADE-R2. Compared with Model AD-AH, Models AD-LS and ADE-LS had larger percentages of coincidence with Model ADE-R2; however, the difference with Model ADE-R2 was still substantial. Among the 1% highest AB-EBV, approximately 30% of selected animals, based on ADE-R2, would not be selected based on Models AD-LS and ADE-LS. Among the 40% highest AB-EBV under Model ADE-R2, approximately 20% of selected animals would not be selected based on Models AD-LS and ADE-LS. Model ADE-R1 showed a larger percentage of coincidence with Model ADE-R2 than Models AD-AH, AD-LS, and ADE-LS, but differences with ADE-R2 were still considerable. These results confirm that the choice of the method to estimate the ridge parameter has consequences for genetic selection, resulting in different ranking of animals on the basis of across-breed estimated breeding values.
The ridge regression estimator is a type of weighted average between the actual data and other data taken according to an orthogonal experiment (in Bayesian terms, the prior information), for which the response values are arbitrarily set to zero (Marquardt, 1970). An alternative to ridge regression is to combine the actual data with prior information from the literature. This Bayesian procedure was used to estimate breed additive and heterosis effects in a multibreed model (Pollak and Quaas, 1998). The choice of prior distributions is not simple in practice. Pollak and Quaas (1998) used prior distributions based on expected values from literature such that neither prior information nor data would dominate the solutions. For heterosis, however, because the available data did not provide reasonable estimates, a prior distribution was chosen such that most of the weight was on the prior information. A comparison of ridge regression with the Bayesian procedure used by Pollak and Quaas (1998) would be recommended.
| Implications |
|---|
| Footnotes |
|---|
2 Correspondence: Dept. of Anim. and Poultry Sci., Room 018 (phone: 1-519-824-4120, ext. 58650; fax: 1-519-767-0573; e-mail: Schenkel@uoguelph.ca).
Received for publication February 4, 2005. Accepted for publication April 22, 2005.
| Literature Cited |
|---|