J. Anim Sci.
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


J. Anim Sci. 2008. 86:3358-3366. doi:10.2527/jas.2008-1021
© 2008 American Society of Animal Science

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow All Versions of this Article:
jas.2008-1021v1
86/12/3358    most recent
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Long, N.
Right arrow Articles by Avendaño, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Long, N.
Right arrow Articles by Avendaño, S.

ANIMAL GENETICS

Marker-assisted assessment of genotype by environment interaction: A case study of single nucleotide polymorphism-mortality association in broilers in two hygiene environments1

N. Long*,2, D. Gianola*,{dagger}, G. J. M. Rosa{dagger}, K. A. Weigel{dagger} and S. Avendaño{ddagger}

* Department of Animal Sciences, University of Wisconsin, Madison 53706; and {dagger} Department of Dairy Science, University of Wisconsin, Madison 53706; and {ddagger} Aviagen Ltd., Newbridge, Midlothian, EH28 8SZ, United Kingdom


    Abstract
 Top
 Abstract
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 LITERATURE CITED
 
Interplay between genetic and environmental factors, genotype x environment interactions (G x E), affect phenotypes of complex traits. A methodology for assessing G x E was investigated by detecting hygiene (low and high) environment-specific SNP subsets associated with broiler chicken mortality, followed by an examination of consistency between SNP subsets selected from the 2 environments. The trait was mean progeny mortality rate in 253 sire families, after adjusting records for nuisance effects affecting mortality at the individual bird level. Over 5,000 whole-genome SNP were narrowed down via a machine-learning (filter-wrapper) feature selection procedure applied to mortality rates in each of the 2 environments. For both early and late mortality, it was found that the selected SNP subsets differed across hygiene environments, in terms of either across-environment predictive ability or extent of linkage disequilibrium between the subsets. Reduction in predictive ability due to G x E was assessed by the ratio of 2 predicted residual sum of squares statistics, one associated with SNP selected from the same hygiene environment and the other associated with the SNP subset from a different environment. Reduction was 30 and 20% for early and late mortality, respectively. An extremely low level of linkage disequilibrium between SNP subsets selected under low and high hygiene also indicated G x E. Findings suggest that there may not be a universally optimal SNP subset for predicting mortality and that interactions between genome and environmental factors need to be considered in association analysis of complex traits.

Key Words: chicken • genetic association • genotype by environment interaction • machine learning • mortality • single nucleotide polymorphism


    INTRODUCTION
 Top
 Abstract
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 LITERATURE CITED
 
One consequence of genotype x environment interactions (G x E) is the existence of environment-specific genetic effects. For example, 2 metabolic genes (CY-P1A1 and GSTT1) were found to be associated with reduction in infant BW only in mothers who smoked during pregnancy (Wang et al., 2002Go). Genotype x environment interactions also result in different patterns of genetic associations across environments (Colhoun et al., 2003Go; Ye et al., 2006Go), and the best environment for one breed may not be best for other breeds (Steinheim et al., 2008Go).

In the context of SNP subset selection, G x E can be viewed in different ways. One perspective is that the best SNP subsets selected in different environments do not overlap fully; also, the level of allelic associations between the 2 sets of SNP is low, as measured by some linkage disequilibrium (LD) parameters. Another perspective is that a subset of well-chosen SNP selected in one environment may not predict equally well the same trait in a different environment. That is, across-environment predictions would be poorer. This would imply that, in addition to genetic background (SNP), environmental factors need to be taken into account in the development of genomic-assisted selection models.

The objective of this study was to examine G x E using machine-learning techniques in a SNP-mortality association case study. The filter-wrapper feature selection of Long et al. (2007)Go was applied to 4 age-hygiene environment specific strata, to test whether environment affects SNP subset selection for early and late mortality traits. Genotype x environment interactions would be present if the SNP subsets selected from 2 hygiene environments are not consistent, in terms of across-environment predictive ability and extent of LD between subsets.


    MATERIALS AND METHODS
 Top
 Abstract
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 LITERATURE CITED
 
Institutional Animal Care and Use Committee approval was not obtained for this study because no animals were used.

Data Set

Genotypic and phenotypic data came from the Genomics Initiative Project at Aviagen Ltd. (Newbridge, UK). Phenotypes consisted of early (0 to 14 d) and late (14 to 42 d) living status (dead or alive) of 333,483 birds. These chicks were raised in either a high (H, n = 251,539) or low (L, n = 81,944) hygiene condition. The high and low hygiene environments were representative of selection nucleus and commercial level conditions, respectively. Information included sire, dam, age of dam, hatch, and sex of each bird. There were 1,552 sires and 9,921 dams; age of dam was categorized into 18 levels (30 to 64 wk), and there were 232 hatch groups.

There were 5,523 SNP genotyped on 253 of the 1,552 sires. Each SNP was biallelic (e.g., A or G alleles), and genotypes were coded as 0 (AA), 1 (AG), or 2 (GG). Detailed description of these SNP is in Long et al. (2007)Go.

Sex-specific mortality was not studied, because sex was assigned post 35 d; therefore, sex-specific early mortality (0 to 14 d) could not be evaluated. Also, average mortality post 35 d was low for both sexes (data not shown), hence precluding assessment of sex-specific late mortality (14 to 42 d).

Calculation of Adjusted Sire Means and Formation of 4 Strata

To create a response variable for the sires genotyped, first, effects of factors (age of dam and hatch) potentially affecting individual bird survival were removed via a GLM model (GLMM) without sire and hygiene effects. For each bird, the residual derived from the GLMM was calculated. Then, birds were classified into 2 groups, corresponding to the hygiene environments (L and H) in which they had been raised. In each hygiene group, residuals of progeny of a sire were averaged, producing an adjusted progeny mortality mean as the response variable for each sire.

The individual record on each bird was binary (dead or alive), and the GLMM fitted was


Formula

where pijk = the death probability of bird k, progeny of a dam of age I, and born in hatch j. Here, DAi stands for the fixed effect of the ith level of age of dam (i = 1, 2,..., 18); Hj denotes the random effect of hatch j (j = 1, 2,..., 232), which was assumed normal, independent, and identically distributed as Hj ~ NIID(0, {sigma}2H), where {sigma}2H = the variance between hatches. Let yijk be the true binary status of bird ijk (0 = alive, 1 = dead) and pijk be the fitted death probability using model [1]; then, the residual for a given bird is rijk = yijkpijk, with a sampling space of [–1, 1]. The GLMM was implemented in SAS PROC GLIMMIX (SAS Institute Inc., Cary, NC).

Model [1] was fitted to both early and late mortality data. Subsequently, birds were divided into the 2 hygiene groups, and progeny residuals were averaged for each sire, as described above. Thus, 4 strata of age-hygiene combinations were formed, with each stratum containing the adjusted progeny mortality means calculated for: 1) birds of early age raised in low hygiene (EL), 2) birds of early age raised in high hygiene (EH), 3) birds of late age raised in low hygiene (LL), and 4) birds of late age raised in high hygiene (LH). After removing SNP with missing values (i.e., those that were not genotyped on some sires), number of sires and SNP used were as follows: EL and LL—222 sires, 5,119 SNP and EH and LH—232 sires, 5,166 SNP; 201 sires had progeny in both the L and H environments.

Discretization of Phenotypes and Filter-Wrapper Feature Selection

Methods were essentially those employed in Long et al. (2007)Go, designed to circumvent the difficulty encountered in standard regression models when fitting effects of a much greater number of SNP than the number of observations available. Filter-wrapper feature selection operates on sire group means and results in an optimal SNP subset. Because the procedure is tailored for discrete traits, the continuous adjusted mortality rates were discretized into 2 classes, namely, decreased and increased mortality. By means of a pair of cut-off values, c1 and c2, sire groups with intermediate means were discarded, to make the remaining sire samples (sires with mortality rates ≤ c1 or ≥ c2) more informative. Here, the cut-off pair (c1, c2) was such that c1 is the {alpha} quantile of the distribution of adjusted mortality rates over all sires and c2 is the 1-{alpha} quantile. Four pairs of (c1, c2) corresponding to 4 {alpha} values (0.05, 0.20, 0.35, and 0.50) were adopted to generate 4 groups of case-control sire samples (group 1 to group 4) for use in the filter-wrapper procedure.

An information-gain (Mitchell, 1997Go)-based filter was used to decrease the feature (SNP) space from thousands of SNP to a small number (e.g., the top 50 SNP), thus eliminating irrelevant features from further consideration. Subsequently, the wrapper optimized the performance of the 50 SNP selected in the filter step, by adopting a naïve Bayes (NB) classifier (Elkan, 1997Go), acting as a black box for scoring each of the searched subsets (relative to the full set of 50 SNP), with the score being classification accuracy. The filter procedure was coded in Java, and the wrapper procedure was carried out on the Weka platform (Witten and Frank, 2005Go).

For each of the 4 case-control groups, the filter-wrapper procedure selected an optimal subset of SNP (i.e., the subset with greatest NB classification accuracy). Because there were 4 {alpha} groups per stratum, 4 subsets of SNP were produced at each age-hygiene condition combination. These were validated as described below.

Validation of the 4 Subsets of SNP in Each Stratum

Each of the 4 SNP subsets in a stratum has an associated NB classification accuracy. These accuracies are not comparable with each other, because each subset is specific to a certain {alpha} value and, therefore, to a different number of sire families classified into either low or high mortality groups. To compare the 4 subsets within each stratum fairly, and to arrive at an optimal subset, cross-validation was used. A fixed-effects linear model was fitted in which predictors were SNP from the subset under consideration, and response variables were the adjusted mortality rates of all sire families studied in the stratum in question. The model was


Formula

where Mi = the adjusted mortality mean of sire i (after standardization, to achieve a zero mean and unit variance); SNPij = the fixed effect of the jth genotype of SNP (chosen from the fiter-wrapper process) in sire i; and ng = the number of SNP in the subset under consideration; this model has 2 ng df. The analysis was by weighted least squares, in which the weight assigned to sire mean i was Ti/T (Ti = the number of progeny in sire family i; T = the total number of progeny over all sire families). In addition, Q-Q plots of the fitted residuals (derived from model [2] assuming normal errors) versus either normal or t-distributions with 4, 6, and 8 df (t-4, t-6, and t-8, respectively) were examined, to determine a proper error distribution for this model.

Predicted residual sum of squares (PRESS) was used as a criterion for comparing the predictive ability of the SNP subsets. Predicted residual sum of squares, as described in Ruppert et al. (2003)Go, is a cross-validation-type statistic and predicts Mi using all sire means except the ith (i = 1, 2,...,N). Letting M(i) be the predicted value of the ith sire mean when the latter is not used to estimate parameters in model [2], PRESS is given by


Formula

A subset of SNP was considered best if it produced the smallest PRESS when employing them as predictors. A SAS macro was written to generate PRESS statistics for model [2], which was embedded in SAS PROC GLIMMIX (SAS Institute Inc.).

Across-Environment Prediction

Across-environment prediction assesses whether or not the predictive ability of SNP selected under one hygiene environment stands in a different hygiene environment. Variation in predictive ability of a certain SNP subset under different hygiene conditions is indicative of sensitivity of SNP to environments (i.e., G x E).

This step utilized the 4 SNP subsets derived above, in each stratum. It was performed for early and late mortality separately and in the same way. For example, there were 2 strata for early mortality: namely, EL and EH. Four subsets of SNP in low hygiene (EL) were then used to predict sire mortality means in high hygiene (EH), using model [2]. Similarly, 4 subsets in H were also used to predict sire mortality means in L. The PRESS statistics were computed to compare with the PRESS values obtained in the validation step, in which sire progeny mortality means were predicted using SNP selected from the hygiene environment in which the progeny of the sire had been raised.

Graphical Display of LD Between SNP

The magnitude of LD between 2 SNP loci is a measure of association between them, although existence of LD does not imply physical linkage (Ott, 1999Go). A group of SNP could be a proxy for another if there is strong LD between the 2 groups, regardless of differences in physical locations. Pairwise LD between the 2 best SNP subsets (each selected in a different environment, L or H) was summarized graphically. The LD measure adopted was r2, the squared correlation coefficient between 2 SNP loci (Thomas, 2004Go). Graphs were made with R (http://www.r-project.org/) package LDheatmap (Shin et al., 2006Go).


    RESULTS
 Top
 Abstract
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 LITERATURE CITED
 
Distribution of Adjusted Sire Mortality Means

The distribution of stratum-specific (EL, EH, LL, and LH) adjusted sire mortality means is shown in Figure 1Go. The fitted densities were skewed, suggesting that adjusted mortality rates were not well described by a normal distribution.


Figure 1
View larger version (27K):
[in this window]
[in a new window]

 
Figure 1. Distribution of adjusted sire mortality means in the 4 strata: EL (early mortality, low hygiene), EH (early mortality, high hygiene), LL (late mortality, low hygiene), and LH (late mortality, high hygiene).

 
Filter-Wrapper Feature Selection

The 4 case-control sire samples per stratum (from the 4 distinct {alpha} values) are shown in Table 1Go (left part). Sample size (number of sires) included in the filter-wrapper procedure ranged from 24 to 232 and increased with {alpha}, as the phenotypic difference between the case and control sires became smaller.


View this table:
[in this window]
[in a new window]

 
Table 1. Number of sires and size of SNP subsets (selected by wrapper from a full set of 50 SNP) in {alpha}-specific case-control sample from the 4 strata
 
The information gain-based filter ranked SNP according to their information gains (from large to small) and retained the top 50 for further analysis. Information gains for the top 50 SNP are shown in Figure 2Go. Each of the plots corresponds to a stratum, and the 4 lines in each plot correspond to the 4 case-control samples resulting from varying {alpha}. The maximum value for information gain is 1; hence, information gains of individual SNP were relatively low, except when {alpha} = 0.05 (maximum difference between 2 groups of sires among the 4 {alpha} values).


Figure 2
View larger version (32K):
[in this window]
[in a new window]

 
Figure 2. Information gains of the top 50 SNP in the 4 stata: EL (early mortality, low hygiene), EH (early mortality, high hygiene), LL (late mortality, low hygiene), and LH (late mortality, high hygiene). Each of the plots corresponds to a stratum, and the 4 lines in each plot correspond to the 4 case-control samples resulting from varying {alpha}. Groups 1 to 4 correspond to {alpha} = 0.05, 0.20, 0.35, and 0.50, respectively.

 
The wrapper optimized the performance of the 50 top-scoring SNP selected by the filter. Because combining the best individual features may not yield a best set of features, the wrapper led to an optimal SNP subset specific to a NB classifier, which was the learning procedure employed. According to results discussed in Long et al. (2007)Go, the search method of backward elimination worked better than forward selection, forward stepwise selection, and backward stepwise elimination, so it was adopted here.

Table 1Go (right part) gives the sizes of the SNP subsets selected accordingly. In the most extreme case-control samples ({alpha} = 0.05), the number of SNP ranged from 4 to 9. On the other hand, the number of SNP ranged from 33 to 46 when {alpha} = 0.50, much greater than in the case of {alpha} = 0.05. The difference in the size of SNP subsets over different {alpha} sample groups is due to the fact that redundancy between the top 50 SNP was much greater in small {alpha}-groups than in large {alpha}-groups (Long et al., 2007Go).

Figure 3Go shows the cross-validation classification accuracy of NB of SNP before (full set) and after feature selection (subset). Subset selection increased classification accuracy, except in group 1 ({alpha} = 0.05), in which sire families in the case-control groups were more extreme and therefore more informative; it was hard to improve classification accuracy in that case. Classification accuracies of SNP subsets were high (>90% in most cases) and decreased with increasing {alpha}.


Figure 3
View larger version (41K):
[in this window]
[in a new window]

 
Figure 3. Classification accuracy of naïve Bayes classifier of the top 50 SNP (full set) and of those in the subset in the 4 strata: EL (early mortality, low hygiene), EH (early mortality, high hygiene), LL (late mortality, low hygiene), and LH (late mortality, high hygiene). Each graph corresponds to a stratum, and groups 1 to 4 correspond to {alpha} = 0.05, 0.20, 0.35, and 0.50, respectively.

 
Validation of the 4 Subsets of SNP in Each Stratum

According to the Q-Q plots, a t-8 distribution fitted the standardized adjusted mortality rates best (data not shown). Hence, model [2] was fitted with a t-8 error distribution to generate PRESS statistics for the 4 SNP subsets (corresponding to {alpha} = 0.05, 0.20, 0.35, and 0.50, respectively) in each age-hygiene condition stratum as shown in Table 2Go. The smallest value in each stratum corresponds to the best SNP subset selected based on a case-control sire sample specific to an {alpha} value. Based on these results, the best subsets were as follows: a) early mortality: H05 (low hygiene, {alpha} = 0.35) containing 34 SNP and H05 (high hygiene, {alpha} = 0.35) containing 35 SNP; b) late mortality: L35 (low hygiene, {alpha} = 0.35) containing 29 SNP and H05 (high hygiene, {alpha} = 0.05) containing 4 SNP.


View this table:
[in this window]
[in a new window]

 
Table 2. Predicted residual sum of squares for the 4 SNP subsets in each age-hygiene condition stratum
 
Across-Environment Prediction of Sire Means

The PRESS statistics for across-environment predictive ability of hygiene condition-specific SNP are in Table 3Go. For comparison, PRESS values from the validation step discussed in the previous section are presented as well.


View this table:
[in this window]
[in a new window]

 
Table 3. Predicted residual sum of squares (PRESS) statistics of across-environment prediction1
 
The SNP selected under the hygiene environment where the response variable was measured tended to have better predictive ability (smaller PRESS) than those selected under a different environment. For example, SNP selected in L had an average PRESS of 0.484 when predicting mortality in H; the average PRESS of SNP selected under H was 0.364 when predicting mortality in H. Similarly, the average PRESS of SNP selected in L was 0.693 when predicting mortality in L; on the other hand, PRESS of SNP selected in H was 0.993 when predicting mortality in L. This suggests existence of genotype (SNP) x environment interaction. A rough measure of G x E in predictive cross-validation is the ratio of the 2 PRESS values produced by SNP selected under 2 environments. Specifically, let PRESSxy be the statistic when SNP selected under environment x are used to predict mortality in environment y; x = L, H and y = L, H. Using averages in Table 3Go, the ratios for early mortality were as follows: PRESSHH:PRESSLH = 0.364:0.484 = 0.75, PRESSLL:PRESSHL = 0.693:0.993 = 0.70, with the geometric mean (Formulaxy) of the 2 ratios being

Formula

. Similarly, for late mortality, Formulaxy = 0.77. These figures suggest that G x E decreased the efficiency of SNP-assisted prediction by 20 to 30%.

Optimal SNP subsets were not consistent over the 2 hygiene environments. For example, for early mortality, the SNP subset selected from the {alpha} = 0.20 sample group was best (smallest PRESS) among all low hygiene subsets for prediction of mortality in the high hygiene condition. However, for prediction of mortality in the low hygiene condition, the SNP subset selected from the {alpha} = 0.35 sample group was best among all low hygiene subsets.

LD Between SNP Selected from Low and High Hygiene Environments

The 2 best SNP subsets (L35 and H35 for early mortality, L35 and H05 for late mortality) spread along the genome distinctly, with no overlap in terms of physical locations (data not shown). Figure 4Go shows a map of pairwise LD among SNP from the 2 subsets. Each square represents a pair of SNP, and darkness indicates magnitude of LD. The LD measure, r2, ranges from 0 (no LD) to 1 (complete LD). The 2 triangular regions represent within-subset LD, whereas the rectangular region represents between-subset LD. For example, in the top graph (early mortality), the lower triangle corresponds to the L35 subset consisting of 34 SNP; the upper triangle corresponds to the H35 subset consisting of 35 SNP. Each square in the rectangular region denotes a pair of SNP in which one comes from the L35 subset and the other one from the H35 subset.


Figure 4
View larger version (22K):
[in this window]
[in a new window]

 
Figure 4. Pairwise linkage disequilibrium (LD), measured in r2, among SNP of L35 and H35 subsets associated with early mortality and of L35 and H05 subsets associated with late mortality. L35: SNP subset selected from low hygiene condition and based on sire sample group with {alpha} = 0.35; H35 (or H05): SNP subset selected from high hygiene condition and based on sire sample group with {alpha} = 0.35 (or {alpha} = 0.05). Each square represents a pair of SNP loci, and darkness indicates magnitude of LD. r2 ranges from 0 (no LD) to 1 (complete LD). In each graph, the triangular regions represent within-subset LD, whereas the rectangular region represents between-subset LD.

 
For early mortality, the overall LD between the 2 SNP subsets (rectangular region) was very weak. Most pairwise r2 values were almost 0, few had values around 0.1, and only one pair of SNP (the darkest one) had complete LD (r2 = 0.999). In the case of late mortality, there was no detectable LD between the 2 subsets of SNP by means of r2.


    DISCUSSION
 Top
 Abstract
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 LITERATURE CITED
 
The filter-wrapper feature selection procedure of Long et al. (2007)Go was used to select sets of SNP associated with bird mortality, out of thousands of SNP distributed across the whole chicken genome. Although focus was on the interplay between genetic and environmental factors, 2 methodological improvements were made relative to Long et al. (2007)Go. First, performance data at the individual bird level were used to fit a GLMM, accounting for nuisance (nongenetic and nonsystematic) effects. A sire family phenotype was the mean of the model residuals. Second, all birds were classified into 2 environmental groups (L and H, corresponding to low and high hygiene condition levels, respectively) before calculating sire means. Furthermore, 2 traits, early and late mortality, were considered in each hygiene group. The filter-wrapper SNP feature selection, coupled with discretization of continuous sire means, was implemented in each context so that hygiene environment-specific SNP subsets could be identified for mortality at both ages.

In feature selection, it was not surprising that information gain decreased with {alpha} in each age-hygiene condition stratum: an increase of {alpha} shortens the distance between the 2 cut-off values c1 and c2. The low information gain over all groups implies that highly discriminative or major SNP affecting mortality are unlikely to exist. The usefulness of subset selection in increasing classification accuracy was corroborated.

Conventional approaches for detecting G x E rely on the assumption that genotypes rank differently among environments or exhibit differential environmental variability. For example, multiplicative (joint effect of genetic and environmental factors is the product of each) or additive models (joint effect of genetic and environmental factors is the sum of each) provide specifications under the hypothesis of no G x E (Hunter, 2005Go). However, with more than 5,000 SNP as in this study, it is not feasible to examine all possible interactions between the 5,000 SNP and 2 hygiene environments. Therefore, it was reasonable to assess G x E by carrying out the SNP subset selection separately in each hygiene condition (L and H), to then evaluate the extent to which SNP subsets selected in different hygiene environments are distinct. Here, G x E was examined via across-environment prediction and by examination of the extent of SNP loci LD between L and H subsets.

The PRESS statistics measured the predictive ability of each SNP subset and ranked the 4 subset candidates (corresponding to {alpha} = 0.05, 0.20, 0.35, and 0.50) within the same hygiene environment. Likewise, prediction of mortality in 1 hygiene environment using the 4 SNP subsets selected in another hygiene environment was used to gauge G x E. It was found that, given a set of candidate SNP subsets, the best one for predicting phenotypes in one environment was not the best one in another environment. A statistic for measuring cross-prediction consistency over environments ({rho}xy) had mean values of 0.72 (for early mortality) and 0.77 (for late mortality), suggesting that SNP x environment interaction lowered predictive ability by 20 to 30%, at least when measured in terms of PRESS. This is analogous to the fact that ranking of genotypes might change from one environment to another, due to G x E (Mulder and Bijma, 2005Go). The implication is that there may not be a universally best SNP subset for predicting mortality, illustrating the interplay between genetic and environmental factors.

The idea that SNP selected under the 2 environments are different was reinforced by the extremely low level or absence of LD between the 2 best SNP subsets (L and H subsets) for early and late mortality rates. If LD was strong and frequent between 2 subsets of SNP, one would expect that a set of SNP can work as effectively as the other set in predicting a response variable. In the extreme case, in which each SNP in one subset is found to be in LD with a SNP in the other subset, these 2 subsets can be treated as tag for each other.

Due to incomplete sex information on birds, sex-specific SNP could not be investigated. When calculating residuals, there may be a need to remove other nuisance effects (e.g., contemporary groups) or half of the EBV of a dam of a bird, or its environmental effect. This was not done for 2 reasons: first, calculation of breeding value introduces the assumption of additive inheritance, which may not hold for a trait such as mortality. Second, removing dam effects under the assumption that these are entirely environmental could also remove genetic effects, and this was deemed undesirable. Naturally, if there are strong reasons to believe that a trait is strictly additive, one could introduce the additive genetic and environmental effects of the dam in the GLMM model, provided it is feasible to do so. Lastly, when individuals are tested in many different environments, the machine-learning-based feature selection method may not be able to capture G x E suitably. Instead, other statistical approaches could be used (e.g., the mixed model of Gogel et al., 1995Go), in which environments are taken as random effects. When using dense marker data, semiparametric procedures such as mixed models with kernel regressions (Gianola et al., 2006Go; Gianola and van Kaam, 2008Go) could be tailored to capture cryptic forms of interactions that are difficult to model.


    Footnotes
 
1 Support by the Wisconsin Agriculture Experiment Station and by grants NRICGP/USDA 2003-35205-12833, NSF DEB-0089742, and NSF DMS-044371 is acknowledged. Aviagen Ltd. (Newbridge, UK) is thanked for providing the data and for support. Back

2 Corresponding author: nlong{at}wisc.edu

Received for publication March 9, 2008. Accepted for publication August 7, 2008.


    LITERATURE CITED
 Top
 Abstract
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 LITERATURE CITED
 


Colhoun, H. M., P. M. McKeigue, and G. D. Smith. 2003. Problems of reporting genetic associations with complex outcomes. Lancet 361:865–872.[CrossRef][Medline]

Elkan, C. 1997. Boosting and naive Bayesian learning. Technical report. Department of Computer Science and Engineering, University of California, San Diego.

Gianola, D., R. L. Fernando, and A. Stella. 2006. Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics 173:1761–1776.[Abstract/Free Full Text]

Gianola, D., and J. B. C. H. M. van Kaam. 2008. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178:2289–2303.[Abstract/Free Full Text]

Gogel, B. J., B. R. Cullis, and A. P. Verbyla. 1995. REML estimation of multiplicative effects in multienvironment variety trials. Biometrics 51:744–749.

Hunter, D. J. 2005. Gene-environment interactions in human diseases. Nat. Rev. Genet. 6:287–298.[Medline]

Long, N., D. Gianola, G. J. M. Rosa, K. A. Weigel, and S. Avendano. 2007. Machine learning classification procedure for selecting SNPs in genomic selection: Application to early mortality in broilers. J. Anim. Breed. Genet. 124:377–389.[Medline]

Mitchell, T. M. 1997. Machine Learning. McGraw-Hill, Hightstown, NJ.

Mulder, H. A., and P. Bijma. 2005. Effects of genotype x environment interaction on genetic gain in breeding programs. J. Anim. Sci. 83:49–61.[Abstract/Free Full Text]

Ott, J. 1999. Analysis of Human Genetic Linkage. 3rd ed. The Johns Hopkins University Press, Baltimore, MD.

Ruppert, D., M. P. Wand, and R. J. Carroll. 2003. Semiparametric regression. Cambridge University Press, Cambridge, UK.

Shin, J.-H., S. Blay, B. McNeney, and J. Graham. 2006. LDheatmap: An R function for graphical display of pairwise linkage disequilibrium between single nucleotide polymorphisms. J. Stat. Softw. 16:1–10.

Steinheim, G., J. Odegard, T. Adnoy, and G. Klemetsdal. 2008. Genotype by environment interaction for lamb weaning weight in two Norwegian sheep breeds. J. Anim. Sci. 86:33–39.[Abstract/Free Full Text]

Thomas, D. C. 2004. Statistical Methods in Genetic Epidemiology. Oxford University Press, New York, NY.

Wang, X., B. Zuckerman, C. Pearson, G. Kaufman, C. Chen, G. Wang, T. Niu, P. H. Wise, H. Bauchner, and X. Xu. 2002. Maternal cigarette smoking, metabolic gene polymorphism, and infant birth weight. JAMA 287:195–202.[Abstract/Free Full Text]

Witten, I. H., and E. Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, CA.

Ye, X., S. Avendano, J. C. M. Dekkers, and S. J. Lamont. 2006. Association of twelve immune-related genes with performance of three broiler lines in two different hygiene environments. Poult. Sci. 85:1555–1569.[Abstract/Free Full Text]



This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow All Versions of this Article:
jas.2008-1021v1
86/12/3358    most recent
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Long, N.
Right arrow Articles by Avendaño, S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Long, N.
Right arrow Articles by Avendaño, S.


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS