|
|
||||||||
ANIMAL GENETICS |
,2
,
* Departamento de Zootecnia, Universidade Estadual Paulista, Campus Jaboticabal, Sao Paulo, Brazil;
and
GenSys Consultores Associados S/S Ltda., Porto Alegre, Rio Grande do Sul, Brazil;
and
Lagoa da Serra Ltda., Sertaozinho, Sao Paulo, Brazil
| Abstract |
|---|
|
|
|---|
Nelore) exhibited greater predicted PWG. In the tropics, predicted PWG increased linearly as genotype got closer to Nelore.
Key Words: crossbreeding epistasis genotype x environment interaction heterosis multicolinearity ridge regression
| INTRODUCTION |
|---|
|
|
|---|
Considering diverse genetic and environment x genetic interaction covariables in the model may result in severe multicolinearity problems (Pimentel et al., 2006
). Consequently, estimates of the regression coefficients are usually very unstable and may differ greatly from the parameters they are estimating, even to the point of having an incorrect sign. In this case, alternative estimation methods are recommended and ridge regression (RR; Hoerl and Kennard, 1970
) seems to be a good alternative (Schoeman et al., 2002
; Roso et al., 2005
).
The objectives of this study were 1) to estimate genetic effects on preweaning weight gain of a commercial crossbred population using different genetic models; 2) to investigate the presence of an interaction between environment (latitude) and genetic effects; and 3) to compare estimates obtained by ordinary least squares (OLS) and RR.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Data Set
Preweaning weight gain (PWG) records of Nelore-Hereford calves pertaining to the Brazilian breeding program Conexão Delta G were analyzed. The final data set consisted of 103,445 records collected from 1975 to 2000; it included purebred and crossbred calves distributed in 1,864 contemporary groups (CG), which were raised under pasture conditions on farms located in south, southeast and middle west Brazilian regions, between latitudes 14.0°S and 31.5°S. To define CG, identifiers for farm, year, season, sex, and management group/pasture were concatenated. Connectedness was guaranteed by retaining only CG with at least 10 direct genetic links in the final data set (Roso et al., 2004
).
Data from farms that produced only purebred animals were not retained in the final data set. With this procedure, a large amount of data (approximately 120,000 records) from Nelore calves were discarded. Table 1
presents the distribution of animals according to the genetic compositions of their parents. Most of the calves were produced by purebred and
Nelore dams, and by purebred (mainly Hereford) sires. Table 2
presents the distribution of the data according to the latitude at which the farms were located, including the average proportion of Nelore in the calfs breed composition, by latitude. Approximately 70% of the calves considered for the analysis were raised in the south. The average proportion of Nelore varied from 0.16 (31°S) to 0.44 (central region).
|
|
![]() |
where y is the vector of PWG; b is the vector of environmental effects (including CG, linear and quadratic effects of age of dam nested in sex of calf, linear and quadratic effects of age of calf, and spline functions of Julian date of birth); g is the vector of fixed genetic effects varying for each model (described later); a is the vector of random direct additive genetic effects; m is the vector of random maternal additive genetic effects; e is the vector of random residual effects; and X, F, Z, and W are the incidence/coefficient matrices relating y to b, g, a, and m, respectively. The following assumptions were made: E(y) = Xb + Fg, Var(y) = ZAZ'
+ WAW'
+ I
, where A is the numerator relationship matrix,
2a is the direct additive variance,
2m is the maternal additive variance,
2e is the residual variance, and I represents an identity matrix. The maternal permanent environmental effect was not included in the model, and the covariance between direct and maternal additive genetic effects was assumed to be zero because the data structure would not allow an adequate estimation of these parameters. Most dams had only 1 calf (57%) and had no data for their own performance (79%), and a large number of sires (50%) had no maternal grandprogeny records.
For the first 5 models studied, the effects contained in vector g were:
where Ad and Am are the direct and maternal breed additive effects; Hd and Hm are the direct and maternal dominance effects; Ed and Em, Dd and Dm, Kd and Km are the direct and maternal epistatic or recombination loss effects according to definitions of Fries et al. (2000)
, Dickerson (1973)
, and Kinghorn (1980)
, respectively; and Cd and Cm are the direct and maternal complementarity effects (Fries et al., 2000
).
Coefficients for Ad and Am were calculated as the proportion of Nelore in the breed composition of the calf and the dam, respectively. Coefficients for Hd and Hm were equal to expected direct and maternal breed heterozygosities, respectively (Rodríguez-Almeida et al., 1997
), calculated as Hd = 1 (SN x DN) + (SH x DH), and Hm = 1 (MGSN x MGDN) + (MGSH x MGDH), where Si, Di, MGSi, and MGDi are the fractions of the ith breed (N = Nelore, and H = Hereford) for the sire, dam, maternal grandsire, and maternal granddam breed composition, respectively. When the genetic composition of the parents of a giving ancestor was not available, it was assumed that the animal was produced by inter se mating. Sires, dams, maternal grandsires, and maternal granddams had complete information to calculate their heterozygosities, with the following frequencies: 0.76, 0.76, 0.74, and 0.73, respectively.
Coefficients for Ed and Em were calculated as the average breed heterozygosities of the parents that generated the calf and the dam, respectively (Fries et al., 2000
): Ed = 0.5 x (HdS + HdD), and Em = 0.5 x (HdMGS + HdMGD), where HdS, HdD, HdMGS, and HdMGD are the expected direct breed heterozygosities of the sire, dam, maternal grandsire, and maternal granddam, respectively. The assumption underlying Ed and Em is that parents with larger heterozygosities produce more recombinant gametes than parents with smaller heterozygosities. The average epistatic loss due to the breakdown of all types of gene interactions involving 2 or more loci, as a deviation from the average additive and dominance effects, will be estimated by Ed and Em (Fries et al., 2002
). Coefficients for Dd and Dm were calculated as: Dd = 4 x Ad x (1 Ad) Hd, and Dm = 4 x Am x (1 Am) Hm. The recombination loss of the Dickersons model intended to measure deviations from linear association of heterosis with the degree of heterozygosity, and describes the average fractions of independently segregating pairs of loci in gametes from both parents, which are expected to be nonparental combinations (Dickerson, 1973
). Coefficients for Kd and Km were calculated as Kd = 2 x Ad x (1 Ad), and Km = 2 x Am x (1 Am). The epistasis effects in this model are based on the assumption of additive x additive gene interactions (Kinghorn, 1980
) but can have other biological interpretations depending on the restrictions imposed (Wolf et al., 1995
).
Coefficients for Cd and Cm were calculated as Cd = Ad x (1 Ad), and Cm = Am x (1 Am). The Cd and Cm can also be viewed as a joint additive effect (interaction between the indicus and the taurus fraction of the genotype). The inclusion of a nonlinear term on additive effect into the model (Cd and Cm in the current study) could be justified wherever commercially important traits are the product of subtraits. When evaluating a trait like growth in a given environment, for example, it can be imagined that 2 sets of genes are at play: one governing feed consumption, basal metabolism, feed and metabolic pathways and efficiency, etc., and another set related to heat production and dissipation, tick and parasite resistance, roughage utilization, general adaptation, etc. These 2 conglomerates complement each other, and for each production environment a certain genetic composition optimum should exist. Kinghorn (1993)
called this profit heterosis and presented a hypothesized example for milk yield per lactation (MY) viewed as the product of daily yield (DY) and days lactating (DL). The 2 traits (yield and persistency) were assumed to have fully additive inheritance, such that the F1 cross is halfway between the parental breeds for these traits. Breed A and B exhibit, respectively, DY equal to 40 and 10 kg and DL equal to 100 and 400 d, resulting in 4,000 kg of MY for both breeds. Because of additive inheritance of the 2 subtraits and the multiplicative derivation of total milk yield, there is notable heterosis in F1 and F2 crosses, expressing total milk yield of 6,250 kg. This profit heterosis effect is an example of a theoretical basis for heterosis that does not depend on dominance or epistasis (Kinghorn, 1993
). Empirical evidence for such scale effects also exists with respect to lifetime production of crossbred cows. From the indicus side they bring adaptation, resistance, lower metabolic rate, and longevity. From taurus, the crossbred cows bring sexual precocity and greater fertility and milk production. These 2 evolutionary strategies from indicus and taurus act in combination to enable a crossbred cow to wean close to twice as many kilograms of calves as the average purebred during their lifetimes.
Effects considered in vector g for the other 5 models studied, namely AHlat, AHElat, AHDlat, AHKlat, and AHEClat, were those effects considered in models AH, AHE, AHD, AHK and AHEC, respectively, plus their interactions with linear and quadratic effects of latitude. Latitude was included in these models as a modifier effect and not as a main effect or as a single covariate because there was a complete confounding with farm location; i.e., with the CG effect.
Statistical Analysis
In a first step, parameters of interest (variance components, fixed environmental and genetic effects, and random direct and maternal additive genetic effects) were estimated using the ASREML program (Gilmour et al., 2000
) for all studied models.
In a second step, records in vector y were preadjusted (yc) for all effects estimated in step 1, except for those in vector g, and multicolinearity studies were carried out applying OLS under the model yc = Fg + e.
Ridge Regression was applied for models presenting multicolinearity problems [models with a maximum value of variance inflation factor (maxVIF) greater than 30] in an attempt to improve the efficiency of the estimates; i.e., to reduce their prediction error variance with a relatively small incorporation of bias. As in Roso et al. (2005)
, the ridge parameter adopted was ki =
x (VIFi/maxVIF), where ki is the ith element of the diagonal matrix K (added to the left-hand side of OLS equations to perform RR) associated to the ith regression coefficient of the vector g. This weighted RR procedure considers the instability of each predictor variable separately, avoiding the introduction of unnecessary bias to those predictor variables not seriously involved in multicolinearity (Roso et al., 2005
). Similar to the method proposed by Neter et al. (1983)
, the parameter
was defined as the minimum value between 0 and 1 (varying with increments of 0.0001) that provided average VIF and maxVIF values lower or equal to 10 and 30, respectively. The combination of bootstrap with cross validation for defining
suggested by Delaney and Chatterjee (1986)
, and adopted by Roso et al. (2005)
, was not adopted because with this procedure a
value that provided an evident advantage over the other
values, in terms of reduction of MSEP, was not found for the present data set and models studied.
Comparison Criteria
Results from the first step Animal Model analyses were compared based on Akaike (AIC) and Bayesian information criteria (BIC; Bozdogan, 2000
; Myung, 2000
). Second step OLS and RR estimates were compared based on multicolinearity parameters (average VIF and maxVIF), and on bias and estimated prediction error variances. The bias measurement adopted was that proposed by Roso et al. (2005)
:
![]() |
where gRR are the solutions of vector g, obtained by RR; ||H|| is the Euclidean norm of the H matrix [H = (F'F+K)1F'F]; and ||I|| is the Euclidean norm of an identity matrix of order equal to the number of parameters in the vector g. It is important to note that the bias measurement adopted is more like a measure of the difference between OLS and RR estimates than a measure of bias itself.
Similar to Wolf et al. (2005)
, the effects contained in the vector g were considered to be significant if they had their absolute values at least 10 times as large as their SE.
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
The main results from models including additive and nonadditive fixed genetic effects are shown in Table 3
. Results from models including additive and nonadditive fixed genetic effects and their interactions with latitude are shown in Table 4
.
|
|
Direct and maternal dominance estimates obtained from the 5 models were different in magnitude, but were all positive and significant. Differences in dominance estimates can be explained by the fact that recombination losses between uniting gametes are confounded with heterosis effects for models AHE, AHD, and AHEC (Wolf et al., 1995
; Fries et al., 2000
). The Hd and Hm estimates from model AHK corresponded, respectively, to +14 and +8% of the phenotypic mean.
The AIC and BIC model comparison criteria (Table 3
) reinforce results from literature (e.g., Arthur et al., 1999
; Demeke et al., 2003
) that breed additive and dominance effects are not sufficient to explain observed variability on preweaning traits of Bos taurus x Bos indicus calves. Models AHE, AHD, AHK, and AHEC presented lower AIC and BIC values than model AH. Direct epistatic or recombination loss effect was negative (models AHE, AHD, and AHK) or nonsignificant (model AHEC), and maternal epistatic or recombination loss effect was positive (models AHD and AHK) or nonsignificant (models AHE and AHEC). Considering that, if important, epistasis has a negative effect because of genes finding themselves having to interact or cooperate with other genes that they are not used to (Kinghorn, 1993
), maternal epistatic estimates from models AHD and AHK are biologically difficult to interpret.
Model AHEC presented a multicolinearity problem (Table 3
). Application of RR reduced drastically the sum of prediction error variances (88.5%) of AHEC estimates, with a relatively small incorporation of bias (11.1%). As a main consequence of the shrinkage on regression coefficient estimates, Hm became significant and positive, and the magnitudes of Cd and Cm effects were reduced. For both estimation methods, Ed and Em were nonsignificant, suggesting that the attempt to estimate epistasis and complementarity effects simultaneously was not entirely successful. Actually, covariables adopted trying to model complementarity effects were close to epistasis covariables proposed by Kinghorn (1980)
.
Results from Table 4
revealed that the data were ill-conditioned to estimate environment x genetic effects interactions. All models including both genetic effects and their interactions with latitude, presented unstable OLS estimates and extremely high VIF (in the thousands). Because of large SE (values not shown), significance of most of the effects could not be detected. This result can probably explain why there are so few studies (based on field trials or poorly designed experiments) reporting relevance of the interactions between environment and isolated additive and nonadditive genetic effects.
Compared with models AHElat and AHDlat, model AHKlat presented larger average and maximum VIF values and larger sum of prediction error variances (Table 4
), indicating a more serious multicolinearity problem. Multicolinearity parameters were even larger for AHEClat model, which had estimates that were more stable only under stronger shrinkage (larger ki values).
As observed by Schoeman et al. (2002)
and Roso et al. (2005)
, RR seems to be a powerful tool to obtain more plausible and stable estimates when evaluating ill-conditioned crossbreeding data. Estimated prediction error variances and VIF were drastically reduced, and many effects that were not significant under OLS became significant under RR. For example, considering model AHlat only direct genetic components (Ad and Hd) and their interactions with linear and quadratic effects of latitude were significant under OLS. Maternal genetic components (Am and Hm), and their interactions with latitude also became significant when RR was applied (Table 4
).
For all models (presented in Table 4
), RR estimates reinforce literature results (e.g., Barlow, 1981
; Arthur et al., 1999
) of the presence of environment by genotype interaction. Except in few cases, interactions between genetic effects with linear and quadratic effects of latitude were significant. The practical implication of these results is that crossbreeding programs should be planned in a way that genotypes are chosen to suit the environment in which the animals will be raised. Again, according to AIC and BIC, lack of fit of the additive-dominance model was observed in comparison to the other studied models. Models AHElat, AHDlat, AHKlat, and AHEClat presented lower AIC and BIC values than model AHlat.
Individual estimates of models presented in Table 4
are difficult to interpret. In an attempt to evaluate and better compare these models, predicted PWG surfaces from stabilized synthetic lines, ranging from pure Hereford (0) to pure Nelore (1), at different latitudes, were generated based on estimates provided by each model and estimation method (Figure 1
). The AHKlat surfaces were not presented as they were equal to those of AHDlat, which was expected, as they are equivalent models, i.e., both can be linearly transformed into one another (Wolf et al., 1995
).
|
Prediction surfaces based on RR estimates (figures on the right) were more acceptable from a biological perspective, highlighting that RR can be useful to overcome multicolinearity problems not only when the interest is on interpreting individual estimates, but also when regression analysis is made for prediction purposes. The prediction surfaces illustrated in Figure 1
based on RR estimates, for all models, showed a continuous decrease in predicted PWG of Nelore calves, as latitude got higher. Individuals of intermediate breed compositions overcome purebreds in performance, evidencing beneficial heterotic effects at all latitudes except at lower ones, where Nelore calves did better. Generally, at latitudes closer to 15°S, predicted PWG increased linearly as genotype got closer to Nelore. Barlow (1981)
, in a review, concluded that heterosis for growth among ruminants appears to be favored by a benign nutritional environment, and under low input conditions, genes for adaptation seem to have much more importance than genes for high metabolism and heterotic effects.
Figure 1
also shows that models considering different definitions of the epistatic or recombinant loss effect [AHElat and AHDlat (or AHKlat)] presented very similar surfaces, i.e., the choice of which epistatic covariables to be included in the model was not crucial for prediction purposes. In comparison to these models, RR surface of model AHEClat showed a bit larger advantage of Nelore calves at lower latitudes. The RR estimates of model AHlat provided a surface not as acceptable under a biological perspective as the others. Superiority of stabilized (synthetic)
Nelore calves in temperate and subtropical regions, in contrast to
Nelore calves in the tropical region was not as pronounced (0.5 to 2.0%), whereas Nelore calves at latitude 14°S were substantially superior (6.0%) to stabilized
Nelore calves in temperate and subtropical regions. Table 5
presents predicted PWG values based on estimates provided by the AHEClat model and RR estimation method, as auxiliary information to better visualize the corresponding surface of Figure 1
.
|
value and, consequently, the ridge parameter (ki), RR analysis were replicated using the ordinary RR method (a unique ki for all covariables) and ki were defined according to the index of stability of relative magnitude proposed by Vinod (1976)Instead of considering latitude, analyses were also performed including region (3 main regions were defined) and its interactions with the genetic effects in the model (data not shown). Multicolinearity was also observed for these models, and the RR results were not as informative as those obtained with models considering interactions with latitude effects.
In conclusion, results from the current study provided evidence that epistasis plays an important role in the performance of Bos indicus x Bos taurus crossbred calves. The inclusion of profit heterosis or complementarity on crossbred genetic evaluation models seems to be important but should be further investigated as we generally have little information on this phenomenon and on the appropriate manner to model it. The use of RR, or possibly another shrinkage method, seems to be mandatory to estimate environment by genetic effects interactions, especially with ill-conditioned data. The relevance of the latitude by genetic effects interaction estimates obtained suggested that the exploration of other geoclimatic indicators on well-distributed data is indicated to better explain the genotype x environment interaction effect.
| IMPLICATIONS |
|---|
|
|
|---|
| Footnotes |
|---|
2 Corresponding author: rcar{at}fcav.unesp.br
Received for publication April 5, 2006. Accepted for publication June 6, 2006.
| LITERATURE CITED |
|---|
|
|
|---|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |