J. Anim Sci.
J. Anim. Sci. 2003. 81:1950-1958
© 2003 American Society of Animal Science

A comparison via simulation of least squares Lehmann-Scheffé estimators of two variances and heritability with those of restricted maximum likelihood

W. D. Slanger1 and J. K. Carlson

Office of Institutional Research and Analysis and Department of Animal and Range Sciences, North Dakota State University, Fargo 58105


    Abstract
The objective was to compare the performance of a recently derived method of estimating variances and covariances, applicable to any mixed linear model and any pattern of missing data, with that of restricted maximum likelihood (REML). For each of 96 combinations of six three-herd x four-sire unbalanced designs of 39 offspring each, four heritability values, two ratios of sire variance to interaction variance, and two distributions (multivariate normal and multivariate χ², 3 df), 15,000 vectors (n = 39) were generated. Least squares Lehmann-Scheffé (LSLS) estimators of sire variance, interaction variance, and heritability were compared to those of REML with the performance measures of percentage of estimates (of the 15,000) that were positive, mean square error, variance, percentage of estimates within ± 50% of the parameter, bias, maximum value, skewness, and kurtosis. The LSLS method vastly outperformed REML in almost all 96 combinations. Averaged over the 48 combinations with multivariate normal data, the average percentage that REML estimators of heritability performed relative to those of LSLS for the first five of the above listed eight performance measures was -100%. The number of times LSLS was better than REML was 235 out of 240. The analogous values for the 48 combinations with multivariate χ², 3-df data were -90% and 230 out of 240. The REML maximum values were always larger than the LSLS values. The LSLS skewness and kurtosis values were about the same as those for REML, with the exception of LSLS heritability kurtosis values, which were notably less than those for REML. The explicit expectations of the LSLS estimators showed that the LSLS estimators were surprisingly unbiased given the paucity of data. Explicit coefficients for calculating mean square errors, variances, and biases squared of the LSLS estimators of the three variances were obtained for each design. The LSLS advantage was not quite so large with the multivariate χ², 3-df data as with the multivariate normal data. Results with a symmetric multinomial distribution were the same as with the multivariate normal. The overall result was that the LSLS estimators produced substantially more non-zero estimates than REML estimators and these more abundant positive estimates were grouped substantially closer to their respective parameters. Results justify efforts to make the LSLS procedure computationally available.

Key Words: Estimation • Heritability • Least Squares • Linear Models • Variance Components


    Introduction
Slanger (1996) introduced a new method of estimating variances and covariances. The procedure mathematically detailed and initially scrutinized in that paper is completely general, because no a priori values are required and there is no need for equal numbers of observations in the case of multivariate data. The context is any mixed linear model. The new method provides explicit quadratic estimators of (co)variances that are uniformly minimal variance, unbiased to the maximal extent possible over the entire range of possible parameter values of the (co)variances being estimated. These estimators were derived by determining the restrictions on the elements of the quadratic-form matrix necessary to satisfy the Lehmann-Scheffé (1950) criterion for uniformly minimal variance, unbiased estimation and then solving the resulting linear equations via the principle of least squares. Thus, the new method is called least squares Lehmann-Scheffé (LSLS). Slanger (1996) pointed out that LSLS is the first noniterative procedure proposed for quadratic estimation of (co)variances and introduced potential advantages and opportunities the approach offers. The comparison in Slanger (1996) of LSLS with Henderson's Method 3 (Henderson, 1984) for a three-herd x four-sire design of 39 offspring considerably favored LSLS. A comparison of LSLS with the often-used REML procedure of Patterson and Thompson (1971) was needed. This article provides that comparison using 96 combinations of designs (all with 39 offspring), heritability values, ratios of sire variance to interaction variance, and data distributions.


    Materials and Methods
Variances of quadratic estimators of (co)variances are functions of the numeric values of the (co)variance parameters being estimated. This situation makes estimation of (co)variances problematic. Uniformly best quadratic, unbiased estimators exist for balanced designs but not for unbalanced designs. Slanger (1996) addresses this problem directly by providing explicit quadratic estimators of (co)variances that are uniformly minimal variance, unbiased to the maximal extent possible over the entire range of possible parameter values of the (co)variances being estimated.
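The dependence of the variance of a quadratic estimator on the unknown parameters can be illustrated numerically. The sketch below is only an illustration, not the LSLS procedure: it uses the standard results E(y'Ay) = tr(AV) and Var(y'Ay) = 2 tr(AVAV) for normally distributed y with zero mean, a hypothetical two-sire layout that is not one of the designs of this article, and an arbitrarily chosen between-sire quadratic form. The names `quadratic_form_moments` and `V_of` are assumptions introduced here.

```python
import numpy as np

def quadratic_form_moments(A, V):
    """Mean and variance of y'Ay for y ~ N(0, V) with symmetric A."""
    AV = A @ V
    mean = np.trace(AV)
    var = 2.0 * np.trace(AV @ AV)
    return mean, var

# Hypothetical toy layout: 2 sires with 3 and 2 offspring (not a design of the article).
Z = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
n = Z.shape[0]

# Illustrative quadratic form: between-sire sum of squares, corrected for the mean.
P_sire = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
P_mean = np.full((n, n), 1.0 / n)
A = P_sire - P_mean

def V_of(sire_var, error_var):
    """Covariance matrix of y for given sire and error variances."""
    return sire_var * Z @ Z.T + error_var * np.eye(n)

# Same quadratic form, two different true sire variances.
mean_lo, var_lo = quadratic_form_moments(A, V_of(2.0, 30.0))
mean_hi, var_hi = quadratic_form_moments(A, V_of(18.0, 30.0))
print(f"sire variance  2: mean {mean_lo:.1f}, variance {var_lo:.1f}")
print(f"sire variance 18: mean {mean_hi:.1f}, variance {var_hi:.1f}")
```

The same matrix A gives a different sampling variance for each value of the sire variance, which is why no single quadratic form can be best for all parameter values in an unbalanced layout.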

There have been two general approaches to estimating (co)variance components. The first is to equate expected values of quadratic functions to their respective sample numeric values and solve the resulting set of linear equations. This approach is iterative in the sense that the choosing of quadratic forms is arbitrary. The second is to iterate maximum likelihood expressions to solutions starting with a priori values of the (co)variance parameters being estimated (Slanger, 1996). Least squares Lehmann-Scheffé is not iterative; LSLS is explicitly defined with an explicit desirable property for any and all linear models with any and all (co)variance structures.

The statistical model for computer simulating (Monte Carlo) the data of this article is as follows:


where i = 1, 2, 3 treatments (herds); j = 1, 2, 3, 4 sires with assumptions of finite values for the sire variance, the variance of the interaction between herd and sire, and the error variance, and zero correlations among, and within, α, γ, and ε. This is the same model as that of Slanger (1996). The LSLS approach does not require the correlations among and/or within α, γ, and ε to be zero.
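A scalar model consistent with the definitions above can be written as follows. This is a sketch under stated assumptions: the symbols y, μ, t, the index k for offspring within a herd x sire subcell, and the σ² notation for the three variances are assumed here, because the text names only α (sire), γ (herd x sire interaction), and ε (error).

```latex
\begin{align*}
y_{ijk} &= \mu + t_i + \alpha_j + \gamma_{ij} + \varepsilon_{ijk},
  \qquad i = 1, 2, 3;\quad j = 1, 2, 3, 4, \\
\operatorname{Var}(\alpha_j) &= \sigma^2_{\alpha}, \qquad
\operatorname{Var}(\gamma_{ij}) = \sigma^2_{\gamma}, \qquad
\operatorname{Var}(\varepsilon_{ijk}) = \sigma^2_{\varepsilon}.
\end{align*}
```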

A matrix form expression of the above linear equation is as follows:


For this study, F was the identity matrix of order 39. The Z matrices were two incidence matrices determined by the six factorial designs of three herds x four sires and 39 observations each shown in Table 1. The Zγ matrices had as many columns as subcells with observations. Each Zα had four columns. The G matrices were identity matrices. Multivariate normal vectors of Y (39 x 1) were generated by U'n, where U' was the lower triangular matrix of the Cholesky decomposition of V into U'U by the ROOT function of PROC IML (SAS Inst., Inc., Cary, NC) and n was a vector of 39 values drawn at random from a univariate normal distribution with mean 0.0 and variance 1.0. Multivariate χ², 3-df vectors of Y (39 x 1) were generated by U'χ, where χ was a vector of 39 values resulting from first subtracting 3.0 from, and then dividing by the square root of 6, each of 39 values drawn at random from a univariate χ², 3-df distribution. For each n there was a corresponding χ generated with the same seed values as the n. The χ², 3-df distribution was chosen for two reasons. The first was to compare LSLS with REML when the distribution of Y is skewed (i.e., not symmetric as with the multivariate normal). The second was to compare LSLS with data that satisfy the only assumption of its derivation to LSLS with data that do not satisfy that assumption. The one assumption in the derivation of LSLS is that the kurtosis parameters of the distribution of Y are those of a normally distributed Y. Table 2 shows the eight parameter combinations of sire variance, interaction variance, and h2 for which 15,000 ns and 15,000 χs were generated for each of the six designs of Table 1. Independence of results among the 48 combinations was ensured by printing the 15,001st seed value at the end of simulating data for one of the 48 design by parameter set combinations and using this 15,001st seed value to start the simulation of the data of the next design by parameter set combination.

The eight parameter set combinations were the four h2 values of 0.05, 0.20, 0.50, and 0.70 by two ratios of sire variance to interaction variance. These ratios were 18:6 = 3 and 6:18 = 1/3. Values for h2, sire variance, and interaction variance determine the error variance parameter. The grand total number of estimates was 5,760,000 (96 combinations x 15,000 vectors x the 4 parameters of h2 and the three variances). With one exception (see Table 6), this paper presents results for only the positive estimates of the sire and interaction variances and the estimates of h2 when all three variance estimates were positive. Results of the error variance estimates are not presented, because the LSLS and REML estimates of error variance were always positive and practically the same numeric values.
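The data-generation step described above can be sketched in Python with NumPy rather than PROC IML. Several things in this sketch are assumptions and not taken from the article: the subcell counts are hypothetical (Table 1 is not reproduced here), heritability is assumed to be 4 x sire variance divided by the sum of the three variances, the seed is arbitrary, and the normal and χ² draws are made independently instead of from shared seed values. `numpy.linalg.cholesky` returns the lower triangular factor directly, which plays the role of U'.

```python
import numpy as np

def incidence(labels):
    """Incidence matrix with one column per distinct label."""
    levels = sorted(set(labels))
    Z = np.zeros((len(labels), len(levels)))
    for row, lab in enumerate(labels):
        Z[row, levels.index(lab)] = 1.0
    return Z

def error_variance(h2, sire_var, inter_var):
    """ASSUMPTION: h2 = 4*sire_var / (sire_var + inter_var + error_var)."""
    return 4.0 * sire_var / h2 - sire_var - inter_var

# Hypothetical subcell counts (3 herds x 4 sires, 39 offspring); not Table 1.
counts = np.array([[3, 4, 3, 3],
                   [4, 3, 3, 3],
                   [3, 3, 4, 3]])
sire = [j for i in range(3) for j in range(4) for _ in range(counts[i, j])]
cell = [(i, j) for i in range(3) for j in range(4) for _ in range(counts[i, j])]

Z_alpha = incidence(sire)  # 39 x 4, one column per sire
Z_gamma = incidence(cell)  # 39 x number of subcells with observations

sire_var, inter_var, h2 = 18.0, 6.0, 0.20
err_var = error_variance(h2, sire_var, inter_var)

# V with identity G matrices and F equal to the identity of order 39.
V = (sire_var * Z_alpha @ Z_alpha.T
     + inter_var * Z_gamma @ Z_gamma.T
     + err_var * np.eye(39))

L = np.linalg.cholesky(V)  # lower triangular, V = L L'
rng = np.random.default_rng(2003)  # arbitrary seed

y_normal = L @ rng.standard_normal(39)
# A chi-square variate with 3 df has mean 3 and variance 6, so this standardizes it.
chi = (rng.chisquare(3, size=39) - 3.0) / np.sqrt(6.0)
y_chi = L @ chi
```

Both vectors have covariance matrix V by construction; only the shape of the underlying distribution differs.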


Table 1. Six designs for which data were simulated
 

Table 2. Parameter combinations for which data were simulated for each of the six designs of Table 1
 

Table 6. Expected values of least squares Lehmann-Scheffé (LS) estimators of error, sire, and herd x sire interaction variances and the respective averages (n = 15,000) of restricted maximum likelihood (RE) estimates from multivariate normal data of the same estimators for each of 48 combinations of four heritabilities (h2), two ratios of sire variance to interaction variance, and six designs
 
The six designs of Table 1 warrant brief characterizations. Design 1 was taken from page 137 of Henderson (1984) and is the design of Slanger (1996). Design 2 is similar to design 1 but with more balance among the total numbers of progeny per sire. For design 3, one subcell of each of the four sires has no offspring, but the herd and sire totals are reasonably balanced. Design 4 is the most unbalanced in that there is one subcell of each of the four sires with no offspring, and herd and sire totals are quite unequal. Design 5 is as balanced as possible within the constraint of 39 total offspring. Design 6 is about as balanced as possible but with one subcell having no offspring.

The LSLS and REML estimates were obtained via PROC IML and PROC VARCOMP in SAS, respectively. The PROC VARCOMP procedure sets any REML negative estimate to 0.0.

The LSLS of this article is the LSLSb (LSLS biased) of Slanger (1996). Slanger (1996) shows how to obtain unbiased LSLS estimators. These unbiased estimators were not included in the comparisons of this research because REML estimators are biased and the results of Slanger (1996) suggest that insisting on unbiasedness can substantially increase mean square error.

The performance measures were percentage of the 15,000 estimates that were positive, mean square error, variance, percentage of estimates within ± 50% of the parameter, bias, maximal value, and skewness and kurtosis values. Averages (means), variances, and maximal values were obtained from PROC MEANS of SAS. Medians and skewness and kurtosis values were obtained from PROC UNIVARIATE of SAS. Biases were means minus parameter values. Mean square errors were the addition of biases squared and variances. The REML variances and mean square errors were expressed relative to LSLS variances and mean square errors.
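The measures defined above can be sketched as a small Python function. This is an illustration under stated assumptions, not the SAS code used for the article: the function name `performance` is hypothetical, the measures are computed from the positive estimates only, the variance uses the n - 1 divisor, and the percentage within ± 50% is taken relative to the number of positive estimates.

```python
import numpy as np

def performance(estimates, parameter):
    """Performance measures for one estimator at one true parameter value."""
    est = np.asarray(estimates, dtype=float)
    positive = est[est > 0]                    # ASSUMPTION: positive estimates only
    bias = positive.mean() - parameter         # mean minus parameter value
    variance = positive.var(ddof=1)            # ASSUMPTION: n - 1 divisor
    within = np.abs(positive - parameter) <= 0.5 * parameter
    return {
        "pct_positive": 100.0 * positive.size / est.size,
        "bias": bias,
        "variance": variance,
        "mse": bias ** 2 + variance,           # bias squared plus variance
        "pct_within_50": 100.0 * within.mean(),
        "max": positive.max(),
    }

# Toy input: four estimates of a parameter whose true value is 18.
result = performance([0.0, 10.0, 20.0, 30.0], parameter=18.0)
print(result)
```

For the toy input, three of the four estimates are positive, their mean is 20, so the bias is 2, the variance is 100, and the mean square error is 104.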

The coefficients for calculating the expected values of the LSLS estimators of the three variances (error, sire, and interaction) for each design were obtained. Expected values of the LSLS estimators of the three variances were compared to the averages of the 15,000 respective REML estimates. Coefficients for calculating mean square errors, variances, and biases squared of the LSLS estimators of the three variances were obtained for each design.

The performances of LSLS and REML with multivariate χ², 3-df data were compared to those with multivariate normal data. This was done by 1) counting the number of times LSLS outperformed REML more with multivariate χ², 3-df data than with multivariate normal data and 2) subtracting REML performance from LSLS performance with multivariate χ², 3-df data, subtracting REML performance from LSLS performance with multivariate normal data, and then subtracting the second of these differences from the first. Outperformed means a higher percentage of estimates within ± 50% of the parameter, smaller mean square error, smaller variance, higher percentage of positive estimates, or smaller bias.
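The second comparison is a difference of differences, which can be sketched as follows. The function names and the numbers in the example are hypothetical. One convention is assumed that the text does not state: for measures where smaller is better (mean square error, variance, bias) the sign of each difference is reversed so that a positive value always means an LSLS advantage.

```python
def advantage(lsls, reml, higher_is_better):
    """LSLS minus REML, oriented so that positive means LSLS performed better."""
    return lsls - reml if higher_is_better else reml - lsls

def chi_vs_normal(lsls_chi, reml_chi, lsls_norm, reml_norm, higher_is_better):
    """LSLS advantage with chi-square data minus LSLS advantage with normal data."""
    return (advantage(lsls_chi, reml_chi, higher_is_better)
            - advantage(lsls_norm, reml_norm, higher_is_better))

# Hypothetical percentages of estimates within +/- 50% of the parameter.
d_within = chi_vs_normal(40.0, 30.0, 45.0, 30.0, higher_is_better=True)

# Hypothetical mean square errors (smaller is better).
d_mse = chi_vs_normal(100.0, 150.0, 90.0, 200.0, higher_is_better=False)

print(d_within, d_mse)
```

A negative result means the LSLS advantage was smaller with the χ² data than with the normal data, even though LSLS outperformed REML in both cases.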


    Results and Discussion
Table 3 shows the percentages of estimates that were positive, means, medians, variances, biases squared, mean square errors, and percentages of estimates within ± 50% of the parameter for the LSLS and REML estimators for the eight combinations of four heritabilities and two ratio values for design 4, which was the most unbalanced design. Only eight of the total 144 comparisons of Table 3 were in favor of REML, and these few REML advantages were small except for the bias squared value for the REML estimator of [symbol missing] when h2 was 0.70 and the ratio of sire variance to interaction variance was 1:3. In general, across all designs and parameter combinations, the LSLS estimators of [symbol missing] were not as superior as LSLS estimators of [symbol missing] and h2. Most of the LSLS advantages shown in Table 3 are large, especially for variance and mean square error (e.g., the mean square error for the REML estimator of h2 when h2 = 0.20 and [ratio missing] was 3.47 times larger than the mean square error of the respective LSLS estimator). Of particular interest are the huge differences in percentages of positive h2 estimates for all parameter combinations. This advantage of LSLS estimators of h2 was true for all parameter combinations and all designs, except for the nearly balanced design 5 for which the LSLS and REML estimator performances were approximately equal.


Table 3. Performances of least squares Lehmann-Scheffé (LSLS) and REML estimators of sire and herd x sire interaction variances and heritability (h2) for design 4 from positive estimates of 15,000 simulations of multivariate normal vectors (n = 39)
 
The performance advantage of LSLS extended to the measures of maximal values of estimates and skewness and kurtosis values. As an example, the results of these performance measures for the eight parameter combinations with design 4 (same context as Table 3) are summarized in this paragraph. The values of the maximal REML estimates of sire variance, interaction variance, and h2 were always larger than the respective LSLS maximal values. The average ratios of REML maximums to LSLS maximums were 2.54, 1.33, and 1.64 for sire variance, interaction variance, and h2, respectively. There was little variation in these ratios among the eight combinations. The skewness values of the estimates of the sire and interaction variances were all approximately 2.0; the average of these 32 skewness values was 1.89, with a range of 2.2 to 1.5. Ten of the 16 REML skewness values were larger than the LSLS skewness values. The REML and LSLS heritability skewness values were approximately the same for each of the eight combinations; the average was 0.78. Skewness values decreased as heritability increased. The REML and LSLS kurtosis values were also approximately the same for both the sire and interaction variances at each of the eight combinations. The average of these 32 kurtosis values was 5.5, with a range of 7.6 to 3.1. The LSLS kurtosis values for heritability did separate themselves from those of REML. The average ratio of REML h2 kurtosis value to LSLS h2 kurtosis value was 1.71. The average REML (n = 8) h2 kurtosis value was 0.78; the respective LSLS average was 0.49.

Initial results with a symmetric multinomial distribution were so close to those of the multivariate normal (data not shown) that effort was focused on obtaining and contrasting the results from multivariate normal and multivariate χ², 3-df data.

Table 4 provides information for designs 1, 2, 3, 5, and 6 when h2 was 0.20 and the ratio of sire variance to interaction variance was 3:1. Design 1 is that of Slanger (1996). Not counting the nearly balanced design 5, REML performance was higher for only 12 of 60 comparisons, and as suggested earlier, most of the better REML performances were with [symbol missing] estimators, and even then the advantages were small. The more unbalanced the design, the better LSLS became compared to REML. Design 2 is similar to design 1 but with more balance among the total numbers of progeny per sire. Indeed, LSLS appears to have been more sensitive than REML in detecting that difference (i.e., the LSLS increases in percentages of estimates of sire variance within ± 50% of the true parameter value of 18 and positive estimates and reduction in bias from design 1 to design 2 were greater than those of REML). Table 4 captures the essential result of this article: LSLS performance exceeded, and often vastly exceeded, that of REML, and the few times REML performance was higher, its advantage was small. The amount of detail given in the tables is for the purpose of unambiguously documenting that there are no holes in these results (i.e., the essential result is true for all 96 combinations of variance parameters, designs, and data distributions).


Table 4. Performance measures of least squares Lehmann-Scheffé (LSLS) and REML estimators of sire and herd x sire interaction variances and heritability (h2) for designs 1, 2, 3, 5, and 6 from positive estimates of 15,000 simulations of multivariate normal vectors (n = 39) when h2 = 0.20 and the ratio of sire variance to interaction variance was 3:1
 
The results for design 5 of Table 4 deserve special mention and lead to an important statement. With this nearly balanced design, REML performance measures were slightly better than (for the sire and interaction variance estimators, not the h2 estimator) or equal to those of LSLS. This result contrasts nicely with the result stated in the previous paragraph that the more unbalanced the design, the better LSLS became. In other words, speaking anthropomorphically, as the estimation task became more difficult, LSLS applied itself more, but when the task was easy, LSLS relaxed to let the lesser estimator have the slight advantage. This result is attributed to the fact that the LSLS estimator has the precise, well-defined, explicit aim (which is totally unobscured by the data sample) of getting as close as possible to minimal-variance nonbias within the family of quadratic estimators and the linear model specified.

Table 5 gives the matrices of expectations of the LSLS estimators of the three variances (error, sire, and interaction) for each of the six designs. The expectation matrix for design 5 is nearly an identity. Given that there are only four sires with an average of 10 offspring each, the expectation matrices for the other, much more unbalanced designs are very encouraging (e.g., the absolute values of all error variance off-diagonals are 0.0061 or less, the [symbol missing] diagonals range from 0.3838 to 0.5374, the [symbol missing] diagonals range from 0.5490 to 0.8793, the sire by interaction off-diagonals range from 0.0851 to 0.2365, and the less unbalanced the design, the less an expectation of one variance is encumbered by the values of the other two variances).
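How an expectation matrix is read can be sketched with a small calculation: the vector of expected estimates is the matrix times the vector of true variances, so off-diagonal entries carry the other variances into an expectation. The matrix below is purely hypothetical (it is not an entry of Table 5), and the ordering error, sire, interaction is an assumption taken from the order used in the table captions. The parameter values 30, 2, and 5 are used only as an example.

```python
import numpy as np

# HYPOTHETICAL expectation matrix; rows and columns ordered error, sire, interaction.
M = np.array([[1.0, 0.0,   0.0],
              [0.0, 0.5,   0.125],
              [0.0, 0.125, 0.75]])

theta = np.array([30.0, 2.0, 5.0])  # example true variances: error, sire, interaction

expected = M @ theta        # expected values of the three estimators
bias = expected - theta     # bias of each estimator

# With an identity expectation matrix every estimator would be unbiased.
bias_identity = np.eye(3) @ theta - theta

print(expected)
print(bias)
```

In this made-up example the expected sire estimate is 0.5 x 2 + 0.125 x 5 = 1.625, so part of the interaction variance leaks into the sire expectation while the diagonal shrinks the sire variance itself.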


Table 5. Expectation matrices of least squares Lehmann-Scheffé (LSLS) estimators of the six designs
 
Table 6 provides another comparison of LSLS with REML by showing 1) the LSLS expected values for the three variance estimators for each of the 48 combinations of heritabilities, sire variance to interaction variance ratios, and designs and 2) the respective averages of 15,000 REML estimates from the multivariate normal data. The expected values of the LSLS estimators of the sire and interaction variances were closer to the parameter values than the respective averages of REML estimates 73% more times. This 73% advantage was equally distributed between the sire and interaction variance estimators. The REML biases were especially larger than LSLS biases when h2 was 0.05. Except for the nearly balanced design, the expectations of the LSLS estimators of the sire and interaction variances were approximately the same for each of the five designs; that was not the case for the REML averages. This result, plus the result that design 5 expectations round to the respective parameter values in all but one of the 16 combinations, suggests a stability and robustness for LSLS estimators. The LSLS estimators have been exactly the same as the ANOVA estimators for every balanced-design example; however, this equivalency has not been proved analytically.

The results of Table 7 also suggest stability, robustness, and reasonableness for LSLS estimators. Table 7 gives the coefficients for calculating mean square errors, variances, and biases squared of the LSLS estimators of the three variances for designs 1, 4, and 5. Mean square error, variance, and bias squared values for when the error, sire, and interaction variances are 30, 2, and 5, respectively, are also given. This provides a way to directly compare with Slanger (1996), since that article provides these coefficients and resulting mean square errors, variances, and biases squared for design 1. The coefficients, mean square errors, variances, and biases squared for designs 1 and 4 are approximately the same, but the mean square error values are actually less for the more unbalanced design 4, especially for the estimator of [symbol missing]. The LSLS biases squared were a small fraction of the respective mean square errors. The nearly zero LSLS biases associated with design 5 are due to all biases squared coefficients being small, not positively and negatively signed coefficients compensating for each other. Table 7 also gives the analogous coefficients and values for the ANOVA estimators for design 5. That these coefficients and values are similar to those of the LSLS estimators reinforces the point that the LSLS estimators appear to be stable, robust, and reasonable.


Table 7. Coefficients for calculating variances and biases squared of least squares Lehmann-Scheffé estimators of error, sire, and herd x sire interaction variances for designs 1, 4, and 5
 
Table 8 provides the results of two measurements of the extent to which LSLS outperformed REML more with the multivariate χ², 3-df data than with the multivariate normal data. In general, the LSLS advantage was not quite as large with the multivariate χ², 3-df data as with the multivariate normal data; however, the results suggest two interesting points about LSLS. First, the LSLS estimators of [symbol missing], which were occasionally challenged by respective REML estimators with multivariate normal data, more often outperformed REML estimators of [symbol missing] with multivariate χ², 3-df data than LSLS estimators for [symbol missing] and h2 with multivariate χ², 3-df data outperformed respective REML estimators. This first point suggests there was a compensation effect for LSLS estimators of [symbol missing]. The second point is that LSLS outperformed REML more often with multivariate χ², 3-df data when the design was nearly balanced. This second point hints that LSLS is less sensitive to its one distributional assumption about the data when applied to balanced data than REML is sensitive to its assumption of normality when applied to balanced data.


Table 8. Extent to which least squares Lehmann-Scheffé (LSLS) outperformed REML more with multivariate χ², 3-df data than with multivariate normal data
 

    Implications
A new method of estimating the variances and correlations from data of animal breeding and other studies has recently been discovered. The results of a detailed examination of the method with simulated data show the method to have great promise of much greater accuracy than any previous method, at least for situations of limited data and no selection, and justify efforts to make this method computationally feasible for larger data sets.

1 Correspondence: P.O. Box 5075 (phone: 701-231-7418; fax: 701-231-9419; E-mail: william.slanger@ndsu.nodak.edu).

Received for publication June 14, 2002. Accepted for publication May 14, 2003.


    Literature Cited


Henderson, C. R. 1984. Applications of Linear Models in Animal Breeding. Univ. of Guelph, Guelph, Ontario, Canada.

Lehmann, E. L., and H. Scheffé, 1950. Completeness, similar regions, and unbiased estimation—Part I. Sankhya Indian J. Stat. 10:305–340.

Patterson, H. D., and R. Thompson. 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554.

Slanger, W. D. 1996. Least Squares Lehmann-Scheffé estimation of variances and covariances with mixed linear models. J. Anim. Sci. 74:2577–2585.


