ANIMAL GENETICS
Animal & Dairy Science Department, The University of Georgia, Athens 30602-2771
| Abstract |
|---|
Key Words: Cross Validation, Polynomial Regression, Two-Dimensional Spline
| Introduction |
|---|
One alternative to polynomial regressions is splines. Splines are a series of polynomial functions fit through control points, referred to as knots. Spline functions have been shown to be resistant to artifacts (Druet et al., 2003; Aarons et al., 2004). Unlike polynomial regressions, in which a small subset of data can affect the entire function, splines are defined by a series of polynomials that are affected only by their bounding knots (Molinari et al., 2002). Although one-dimensional splines provide a more robust model, they still require nesting when modeling age of dam (AOD) and age of animal. Two-dimensional splines can provide a generalized and robust model for fixed effects. The selected two-dimensional knots provide automatic nesting and implicit modeling of interactions and decrease the effects of outlying records.
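The local behavior of splines described above can be illustrated with a minimal sketch of a one-dimensional linear spline. The function name and the knot positions are hypothetical and chosen only for illustration; the sketch assumes linear interpolation between adjacent knots, so that a record contributes only to its two bounding knots.

```python
from bisect import bisect_right


def linear_spline_coefficients(value, knots):
    """Return one coefficient per knot for a linear spline evaluated at `value`.

    Only the two knots that bound `value` receive nonzero coefficients,
    and those two coefficients sum to 1 inside the knot range.
    `knots` must be sorted in increasing order.
    """
    if not knots[0] <= value <= knots[-1]:
        raise ValueError("value lies outside the knot range")
    coefs = [0.0] * len(knots)
    # Index of the left bounding knot (clamped so the last knot is handled).
    left = min(bisect_right(knots, value) - 1, len(knots) - 2)
    width = knots[left + 1] - knots[left]
    dist = (value - knots[left]) / width  # standardized distance in [0, 1]
    coefs[left] = 1.0 - dist
    coefs[left + 1] = dist
    return coefs


# Hypothetical age knots (days); a record at 285 d touches only 205 and 365.
print(linear_spline_coefficients(285, [0, 205, 365, 500]))  # [0.0, 0.5, 0.5, 0.0]
```

Because every other coefficient is zero, an outlying record at 285 d cannot change the estimated knot values at 0 or 500 d, which is the property that a global polynomial lacks.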
The purpose of this cross-validation study was to compare the fit of two methods for evaluating growth in beef cattle and to provide a basic methodology for fitting two-dimensional splines to biological data.
| Materials and Methods |
|---|
The equation for random effects in scalar notation was
![]() |
where randomhijkm = sum of random effects for trait t (growth records at birth, weaning, or yearling ages) and AOD for group j, dirdk and pedk = spline coefficients d for additive direct (dir) and permanent environmental (pe) effects for animal k, matdm and mpedm = spline coefficients d for maternal (mat) and maternal permanent environmental (mpe) effects for dam m, ehijkm = weighted heterogeneous random residual modeled by linear splines and implemented by weighting each observation, and sdh = coefficient d of the linear spline function for an observation taken at age h. This is the same equation used by Robbins et al. (2005).
The fixed-effect model using within-trait nested cubic polynomial regressions on age and within-trait nested AOD classes was
![]() |
where cgi = contemporary group i, consisting of animals of the same sex, percent Gelbvieh, and from the same breeder-defined management groups;
ht = linear, quadratic, and cubic regression coefficients at age h and nested in trait t; ageh = age h of animal; aget = reference age of trait t; and AODj = AOD class j nested within trait t. Age of dam classes were renumbered for Wwt and Ywt traits for nesting purposes.
A second model that contained the same within-trait fixed effects plus an additional AOD by age of animal interaction was fit and is described here:
![]() |
where AODj*ageh = the interaction of AOD class j by age of animal h.
The two-dimensional spline model can be written as
![]() |
where cf = the coefficient of an animal with age and AOD such that
![]() |
hj = the estimated knot value for age h and AODj.
The coefficients for the two-dimensional splines were determined as
![]() |
where x is 1 minus the distance of the age of animal from the knot for age of animal, and y is 1 minus the distance of the AOD from the knot for AOD, when 0 ≤ distance ≤ 1.
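The definitions of x and y above can be sketched in code. This is an illustrative sketch only: it assumes that the coefficient for each of the four bounding knots is the product x × y (a bilinear form) and that distances are standardized by the width of the knot interval. The function name and the AOD knot at 1,460 d are hypothetical.

```python
def two_d_spline_coefficients(age, aod, age_bounds, aod_bounds):
    """Coefficients for the four knots that bound the point (age, aod).

    ASSUMPTION: each coefficient is the product x * y, where x and y are
    1 minus the standardized distance to the knot in each dimension.
    """
    a_lo, a_hi = age_bounds
    d_lo, d_hi = aod_bounds
    if not (a_lo <= age <= a_hi and d_lo <= aod <= d_hi):
        raise ValueError("point lies outside the bounding knots")
    coefs = {}
    for a_knot in (a_lo, a_hi):
        # x: 1 minus the standardized distance from the age knot.
        x = 1.0 - abs(age - a_knot) / (a_hi - a_lo)
        for d_knot in (d_lo, d_hi):
            # y: 1 minus the standardized distance from the AOD knot.
            y = 1.0 - abs(aod - d_knot) / (d_hi - d_lo)
            coefs[(a_knot, d_knot)] = x * y
    return coefs


# Hypothetical record: 221.25 d of age, dam age 1,825 d.
coefs = two_d_spline_coefficients(221.25, 1825, (205, 270), (1460, 2190))
for knot, coef in sorted(coefs.items()):
    print(knot, round(coef, 3))
```

Under this assumption the four coefficients sum to one for any point inside the grid cell, which is consistent with the constraint on two-dimensional coefficients discussed in the results.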
Because the two-dimensional spline was poor at extrapolation beyond the two-dimensional grid, a model was run that used the weighted sum of one-dimensional splines for extrapolation beyond the grid knots. The equation was
![]() |
where wi = weighting factor for one-dimensional splinei; lcf = linear spline coefficient for one-dimensional spline extrapolation; knotad = two-dimensional spline knot for age of animal h and age of dam d; and nk = number of one-dimensional spline functions.
Evaluation Methods
Solutions obtained by program BLUP90IOD (Tsuruta et al., 2001) from the analysis of data set 1 were used to predict the records of animals in data set 2. Using actual and predicted records from data set 2, the R², average squared error (ASE), and percent bias were computed for each model and each trait: birth weight (Bwt), weaning weight (Wwt), and yearling weight (Ywt). The ASE, percent bias, R², and plots of fixed-effect solutions were used to evaluate each model.
Because the evaluation models were overparameterized, there were no degrees of freedom, and mean squared error could not be used. Therefore, the ASE was used to evaluate the fixed-effect models. The ASE was computed as
$$\mathrm{ASE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ = the weight of animal $i$, $\hat{y}_i$ = the predicted record of animal $i$, and $n$ = the number of records contained in the test data set. In addition to ASE, percent bias was calculated as
![]() |
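The two evaluation statistics can be sketched as follows. The ASE follows the definition given in the text (mean of squared differences between actual and predicted records). The percent bias formula is an assumption: it is written here as 100 times the sum of prediction errors divided by the sum of actual records, a form in which negative values indicate overprediction. The function names and the weights in the usage lines are hypothetical.

```python
def average_squared_error(actual, predicted):
    """ASE: mean of squared differences between actual and predicted records."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


def percent_bias(actual, predicted):
    """Percent bias of the predictions.

    ASSUMPTION: defined as 100 * sum(y - y_hat) / sum(y); the exact
    definition used in the study is not reproduced here.
    """
    total_error = sum(y - y_hat for y, y_hat in zip(actual, predicted))
    return 100.0 * total_error / sum(actual)


# Hypothetical weights (kg) for three animals in the test data set.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
print(average_squared_error(actual, predicted))  # 100.0
print(round(percent_bias(actual, predicted), 2))  # -1.67
```

Neither statistic divides by residual degrees of freedom, so both remain computable when the evaluation model is overparameterized.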
| Results and Discussion |
|---|
Creating an optimal two-dimensional spline model can be much more time-consuming than fitting polynomial regressions. Splines are approximations that depend heavily on the location of the knots. There must be enough knots to adequately model the shape of the function, and there must be enough records in each interval between knots to accurately estimate knot values. Unfortunately, there is no automatic procedure for the selection of knots, which can result in much trial and error; however, there are some general rules that can aid in this process. Wold (1974) suggested the use of as few knots as possible (no more than one extremum and one inflection point per interval) and the location of knots close to inflection points. In the case of the two-dimensional spline, the application of these rules to each variable separately can provide a good starting point. In addition, the inability of the two-dimensional spline to model data outside the two-dimensional grid necessitates the placement of knots at extreme values. However, if data are sparse around the extrema, the use of weighted spline extrapolation may give the best results.
Once a base model has been established, there are some generalized procedures for the addition of knots to the model. One procedure is to place an additional knot at the median of the existing interval (Rosenberg et al., 2003). This process could be useful when the data are continuously distributed. In the case of growth data in beef cattle, both age and AOD are clustered, thereby limiting the areas in which knots can be placed. In such a case, the median may not be the best place to add additional knots; however, the general principles of this procedure can be useful in expanding the base model. When dealing with disjointed data, placing knots at the end points of each cluster may be a good idea; however, if data are sparse at the endpoints, placing the knots closer to the center can provide better results.
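The median rule for adding a knot can be sketched as below. This is one possible reading of the procedure: the new knot is placed at the median of the observed values that fall strictly inside the chosen interval. The function name, the knot positions, and the ages are hypothetical.

```python
from statistics import median


def add_knot_at_median(knots, values, interval_index):
    """Return a new sorted knot list with one knot added inside an interval.

    ASSUMPTION: "median of the interval" is read as the median of the
    observed values lying strictly between the two knots of the interval.
    If no values fall inside the interval, the knots are returned unchanged.
    """
    lo, hi = knots[interval_index], knots[interval_index + 1]
    inside = [v for v in values if lo < v < hi]
    if not inside:
        return list(knots)
    return sorted(set(knots) | {median(inside)})


# Hypothetical ages (days) clustered near the lower end of the second interval.
ages = [110, 120, 130, 190, 250]
print(add_knot_at_median([0, 100, 200], ages, 1))  # [0, 100, 125.0, 200]
```

With clustered data, as in the example, the median follows the cluster rather than the midpoint of the interval, which is why the placement should still be checked against the number of records on either side of the new knot.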
When using two-dimensional splines, variables may behave differently depending on the value of another variable. In such a case, as with the nesting of AOD within age, placing knots based on each variable's curve alone may not be optimal. To account for possible interactions or nesting effects, conditional plots can be of value. Plotting a variable by each interval of the other variable can help in determining how the two variables interact. In such a case, it is best to place as few knots as possible while still allowing enough flexibility to model possible interactions and nesting effects. It is important to remember that although an optimized spline is robust against artifacts, the flexibility of the spline model makes it highly susceptible to artifacts when knots are poorly placed (Wold, 1974).
Cross-validation results in Table 2 show that M1, the nested polynomial model, performed well. The model containing AOD by age of animal interactions had lower R² values than models without the interaction effect, suggesting that no interaction is present in the data. The interaction model showed increases in ASE and negative biases for Wwt and Ywt. The relatively large and negative percent bias values show that the interaction model is overpredicting records; this is likely due to overfitting of the model to data set 1. It was expected that the nested polynomials would perform well given the disjointed nature of the age distributions. The distinct Bwt, Wwt, and Ywt groups, coupled with the high density of records within each group, make nested polynomials an appealing model choice for this particular data set.
As seen in Table 2, M2, the two-dimensional spline model, performed well with the extended grid and weighted spline extrapolation methods. The parity of M1 and M2 at birth would be expected, as there is no age variation; the modeling of AOD effects was the only difference between models. Although there are some differences in R², ASE, and percent bias at Wwt and Ywt, M1 and M2 had similar fits for these traits. These results suggest that M2 is capable of automatically nesting AOD within age of animal but does not provide a superior fit to the data. Although M1 has seven more fixed-effect parameters than M2, their effect on model complexity is negligible when weighed against more than 16,000 contemporary groups (CG).
The graph of M2 in Figure 3a suggests that using two-dimensional functions to extrapolate beyond the grid can result in large jumps in the estimated effects. This results from the fact that, unlike one-dimensional splines, two-dimensional coefficients must be forced to sum to a constant. Once outside the grid, this restriction is removed, and knot coefficients suddenly jump to values that no longer sum to this constant. As seen in Figure 3b, the use of weighted spline extrapolation greatly alleviates this problem. The weighted spline function allows the sum of knot coefficients to increase or decrease gradually from one. As well as giving smoother graphs, the use of weighted spline extrapolation gives lower ASE and percent bias, as shown in Table 2. Another solution to this problem is the extension of the two-dimensional grid to encompass all data. This method performs well in terms of ASE and percent bias, but graphs of solutions in Figure 3c show that it can be subject to artifacts. Some alternatives to the previously described M2 methods could involve the creation of a function with an asymptote such that knotad = one-dimensional spline knot at age "a," where age "a" is 1 d beyond the bound of the two-dimensional grid, and age of dam "d," where age of dam "d" is 1 d beyond the bound of the two-dimensional grid. This results in a discontinuous model with an increased number of knots. Additionally, the elimination of the bounding knot farthest from any given data point would leave only three knots for each observation, allowing for a linear formulation of knot coefficients. In addition to this simplified triangular methodology, the inclusion of both fixed and random interaction effects could be effective for modeling of data sets containing multiple growth curves.
The clustering of data around birth, 205 d, and 365 d makes the use of nested polynomials a relatively simple and effective way to model fixed effects in this application. However, if the use of random regression models (RRM) becomes a more standard practice and collection of data across ages becomes more continuous, the nesting of polynomials will become increasingly difficult. Because of the disjoint nature of the nested polynomials, evaluation of animals with records located between Wwt and Ywt age ranges could be problematic.
Given the current state of the industry in which records are clustered within predefined age ranges, M2 does not have an advantage over traditional polynomial regressions. Previous applications of two-dimensional splines have been in the form of thin-plate splines used in the context of engineering and graphical applications (Meinguet, 1979). In such instances, data are collected in a grid-like manner or such that observations are located at key points given a known three-dimensional shape (Bookstein, 1989). Under such conditions, thin-plate splines are very effective; however, in the present application, neither of these conditions is met. This does not mean that two-dimensional splines cannot be effective, but they might not provide optimal performance. Despite this, the potential susceptibility of polynomial regression to artifacts could make two-dimensional splines a more attractive choice.
In this study, only one set of curves was fit for all animals. In fact, curves for different management systems and different regions may vary as a function of year of birth. Modeling of these different curves could be done using combinations of fixed and random effects, as done in the random regression modeling of herd by year of calving (Druet et al., 2003). If these differences are ignored, the curves are averages over environments. If recording for Ywt is selective, as in this study where only 30% of animals with Bwt had records for Ywt, the curves for Ywt may be averages over mostly selected environments and may cause imperfect curves for Wwt when the age effect is not nested. Thus, it is possible that M2 could be closer or superior to M1 if recording for Ywt were more complete.
| Implications |
|---|
| Appendix 1 |
|---|
![]() |
![]() |
![]() |
Weighted spline extrapolation = 0.83 × 0.73 × knot(205, 2,190) + 0.17 × 0.73 × knot(270, 2,190), where knot(205, 2,190) is the two-dimensional spline knot, as estimated by the mixed-model equations, at 205 d of age and a dam age of 2,190 d, and knot(270, 2,190) is the two-dimensional spline knot at 270 d of age and a dam age of 2,190 d.
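The arithmetic of this example can be checked with a short sketch. The coefficients 0.83 and 0.17 and the factor 0.73 are taken from the expression above as given; the knot values (250.0 and 280.0) are hypothetical placeholders, because the estimated knot solutions are not listed here, and the function name is likewise illustrative.

```python
def weighted_spline_extrapolation(age_coefs, weight, aod_knot, knot_values):
    """Sum of weight * age coefficient * knot value over the age knots."""
    return sum(
        coef * weight * knot_values[(age_knot, aod_knot)]
        for age_knot, coef in age_coefs.items()
    )


# Coefficients and weight from the appendix expression.
age_coefs = {205: 0.83, 270: 0.17}
weight = 0.73
# HYPOTHETICAL knot solutions; real values come from the mixed-model equations.
knot_values = {(205, 2190): 250.0, (270, 2190): 280.0}

effect = weighted_spline_extrapolation(age_coefs, weight, 2190, knot_values)
coef_sum = sum(coef * weight for coef in age_coefs.values())
print(round(effect, 3))    # 186.223
print(round(coef_sum, 2))  # 0.73
```

The effective knot coefficients are 0.83 × 0.73 = 0.6059 and 0.17 × 0.73 = 0.1241, which sum to 0.73 rather than to one, in line with the statement that the weighted spline function lets the sum of coefficients drift gradually away from one outside the grid.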
1 Correspondence: Rhodes Center for Animal and Dairy Science (phone: 706-542-0965; fax: 706-583-0274; e-mail: krobbin1{at}uga.edu).
Received for publication May 12, 2005. Accepted for publication August 1, 2005.
| Literature Cited |
|---|