J. Anim Sci.
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Reverter, A.
Right arrow Articles by Lehnert, S. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Reverter, A.
Right arrow Articles by Lehnert, S. A.
J. Anim. Sci. 2004. 82:3430-3439
© 2004 American Society of Animal Science


ANIMAL GENETICS

Joint analysis of multiple cDNA microarray studies via multivariate mixed models applied to genetic improvement of beef cattle1

A. Reverter2, Y. H. Wang, K. A. Byrne, S. H. Tan, G. S. Harper and S. A. Lehnert

The Cooperative Research Centre for Cattle and Beef Quality, CSIRO Livestock Industries, Queensland Bioscience Precinct, St. Lucia, Queensland 4067, Australia


    Abstract
 Top
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Implications
 Literature Cited
 
In functional genomic laboratories, it is common to use the same microarray slide across studies, each investigating a unique biological question, and each analyzed separately due to computational limitations and/or because there is no hybridization of samples from different studies on one slide. However, the question of analyzing data from multiple studies is a major current issue in microarray data analysis because there are gains to be made in the accuracy of estimated effects by exploiting a covariance structure between gene expression data across studies. We propose an approach for combining multiple studies using multivariate mixed models, with the assumption of a nonzero correlation among genes across experiments, while imposing a null residual covariance. We applied this method to jointly analyze three experiments in genetics of cattle with a total of 54 arrays, each with 19,200 spots and 7,638 elements. The resulting seven-variate model contains 752,476 equations and 56 covariances. To identify differentially expressed genes, we applied model-based clustering to a linear combination of the random gene x variety interaction effect. We enhanced the biological interpretation of the results by applying an iterative algorithm to identify the gene ontology classes that significantly changed in each experiment. We found 118 elements with coordinate expression that clustered into distinct biological functions such as adipogenesis and protein turnover. These results contribute to our understanding of the mechanistic processes involved in adipogenesis and nutrient partitioning.

Key Words: Beef • Complementary DNA • Gene Expression • Mixed Models


    Introduction
 Top
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Implications
 Literature Cited
 
The analysis and interpretation of the large data sets from microarray studies is an area of intense development (Tilstone, 2003Go). Often, the same microarray slide is used across studies; however, because of computational limitations and/or because there is no direct association of samples from different studies, currently used methods for selecting differentially expressed (DE) genes are mostly of univariate nature. Combining data from multiple studies is a major issue (Moreau et al., 2003Go). Recent attempts are based on meta-analysis and either ignoring (Rhodes et al., 2002Go, 2004Go; Ghosh et al., 2003Go) or accounting for interstudy variation (Choi et al., 2003Go). Also, an approach has been proposed to translate results across microarray platforms (Wright et al., 2003Go). However, these methods possess two major limitations: first, the information on correlation and hence the ability to explore interactions between the expression of the same gene across studies is ignored; and second, the condition being explored has to be the same across studies.

Multivariate mixed-model equations (MME) are a natural framework to tackle these problems because they allow for a general covariance structure. We propose a model that incorporates the simultaneous information from seemingly independent experiments by allowing a nonzero correlation among gene expressions across experiments, while imposing a null residual covariance. In this article, we highlight the complexity of the subject by presenting the heterogeneous designs of three experiments in beef cattle; second, we introduce the selected seven-variate model to perform the joint analysis of the previously mentioned experiments; third, we present how DE genes are identified using model-based clustering, and where emphasis is given to details of gene ontology classes and genes that are jointly expressed across experiments; and finally, the results are presented and discussed in a biological context.


    Materials and Methods
 Top
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Implications
 Literature Cited
 
Gene Expression Studies in Beef Cattle
The importance of genome research in livestock species, not only in agriculture but also in basic biology and human medicine, has been emphasized (Andersson and Georges, 2004Go). We developed a bovine cDNA microarray with 19,200 spots for the profiling of bovine muscle and fat tissue (Reverter et al., 2003Go; Lehnert et al., 2004Go). In brief, a total of 9,600 elements were printed in duplicate onto CMT GAPSII (Corning, Lindfield, Australia) glass slides using a Virtek microarrayer with the Stealth 48-pin head and micro spotting pins (Telechem Int., Inc., Sunnyvale, CA) at a spacing of 220 µM at the Institute of Molecular Bioscience of the University of Queensland, Brisbane, Australia. The array consisted of 9,222 cattle probes, comprised 7,291 anonymous cDNA from the bovine skeletal muscle and s.c. fat cDNA libraries and 1,915 bovine expressed sequence tags selected from the muscle and fat libraries, the CSIRO cattle skin library, the MARC 1-4BOV libraries (U.S. Meat Animal Research Center, Clay Center, NE), and contributing laboratories. In addition 16 expressed sequence tag-derived oligonucleotides and the Lucidea Microarray Scorecard V1.1 (Amersham Biosciences, Piscataway, NJ) were printed on the array. The DNA was covalently cross-linked to the slides using ultraviolet irradiation with Stratalinker (Stratagene, LaJolla, CA). Spots on the array were arranged in 48 printing blocks of 20 rows x 20 columns each.

To date, we have used this array in three experiments (Exp. 1, 2, and 3), each with different design structures and alternatives of biological and technical replication (dye-swaps). Powerful experimental design depends on a consideration of the statistical consequences of comparison structure (Yang and Speed, 2002Go). Practical considerations included in the development of our designs included transitivity (Townsend, 2003Go) and extendibility (Kerr, 2003Go). In this context, transitivity is defined by the ability to compare any two samples of interest directly from the same hybridization, as well as indirectly through transitive association across two or more hybridizations. The concept of extendibility implies that additional samples can be added to a given design in a sensible way.

In Exp. 1, we used 14 slides to contrast the gene expression profile of muscle in 10 steers fed varying quality diets, from high to medium to low. In Exp. 2, we compared the expression profile in muscle tissue between six animals, three of each of two breeds of cattle (Japanese Black and Holstein) at three time points corresponding to 11, 15, and 20 mo of age. The design of Exp. 2 comprised all possible dye-swaps within time points and across breed (12 slides), plus six slides providing information across time, within breed. Within time and breed, the RNA of the three animals was pooled, and two samples of this pool were used in the within-time–across-breed comparisons (dye-swaps). Finally, Exp. 3 contained 22 slides and studied the mechanisms underlying in vitro adipogenesis. Duplicate flasks of fibroblasts cell cultures were compared, with one (treated culture) induced to undergo adipogenic differentiation, while the other (untreated culture) received no treatment. The gene expression profiling of each culture was explored over six time points from 0 h to 14 d. Figure 1Go illustrates the configuration for the experimental design developed for each experiment. Note that the design for Exp. 1 and 3 contained two distinct configurations: a reference and a within-time pairwise comparison with dye-swap replicates.



View larger version (19K):
[in this window]
[in a new window]
 
Figure 1. Design of the three experiments using the same microarray. The direction of the arrows indicates the labeling from red to green. Experiment 1 studied muscle samples in 10 cattle fed varying quality diets, from HIGH (with three animals), to MEDIUM (with three animals), to LOW (with four animals). The RNA from the three animals in HIGH was pooled. A replicate from the third animal in MEDIUM was obtained and a reference design was initially performed with eight arrays. Later, the RNA samples from LOW and MEDIUM were pooled and dye-swaps undertaken. Experiment 2 contained 18 arrays and compared muscle samples in two breeds (BREED1 = Holstein; BREED2 = Japanese Black) with three animals each, and at three time points (11, 15 and 20 mo of age). Within time and breed, the RNA of the three animals was pooled, and two samples of this pool were used. Finally, Exp. 3, with 22 arrays, studied the mechanisms underlying adipogenesis in vitro. Six time points and a treatment vs. a control cell culture are explored in a two-component design: a central reference in the treated culture with seven arrays, and a multiple dye-swap comparing treated vs. untreated culture with 15 arrays.

 
For all the hybridizations, the commercial image analysis software GenePixPro 3.0 and a GenePix 4000 optical scanner (both from Axon Instruments Inc., Union City, CA) were used to quantify the gene expression fluorescent intensities (i.e., signals). Criteria for data acquisition included signal-to-noise ratio (computed by dividing the background-corrected intensities by the SD of the background pixels) greater than 1.0, and mean-to-median correlation greater than 0.85 (Tran et al., 2002Go). These criteria were applied separately for the red and green channels so that a different number of signals for each channel could be obtained. These resulted in 959,498 signals (457,391 red and 502,107 green) that were background corrected and base 2log transformed.

The evaluation of the peculiarities of the three experiments (Figure 1Go) resulted in the consideration of seven variables: y11 with 197,802 signals from the eight arrays of the reference design component of Exp. 1; y12 with 74,030 signals from the six arrays of the dye-swap component of Exp. 1; y21 with 110,308 signals from the four arrays of the dye-swaps in the breed comparison on the first time point plus the four arrays connecting the remaining time points within breed in Exp. 2; y22 with 116,409 signals from the four arrays of the dye-swaps in the breed comparison on the second time point plus the four arrays connecting the remaining time points within breed in Exp. 2; y23 with 117,687 signals from the four arrays of the dye-swaps in the breed comparison on the third time point plus the four arrays connecting the remaining time points within breed in Exp. 2; y31 with 106,591 signals from the seven arrays of the reference component of Exp. 3; and y32 with 236,671 signals from the 15 arrays of the time x fat treatment component of Exp. 3. Thus, in this notation, first and second subscripts indicate experiment of origin and design component within experiment, respectively.

Summary statistics for each of the seven variables are given in Table 1Go. Note that the maximal value for the observations was constant and equal to 15.99, indicating the presence of color saturation in all variables. Note also that the apparent heteroskedasticity in the gene expression intensity readings both between and within experiments anticipates the convenience of resorting to multivariate models for the analysis of these data. In particular, green intensities from y11 originate from individual RNA, whereas all intensities from y12 originate from pooled RNA. Smaller variances are expected when pooled RNA samples are used relative to individual RNA (Kendziorski et al., 2003Go; Peng et al., 2003Go).


View this table:
[in this window]
[in a new window]
 
Table 1. Summary statistics for the fluorescent intensity signals from each of the seven variables considered
 
The Seven-Variate MME
We used ANOVA models (Kerr et al., 2001Go; Wolfinger et al., 2001Go) because they provide a general framework when there are more than two conditions to compare. In particular, MME allow full use of the information available, with multiple factors and a hierarchy of sources of variation (Cui and Churchill, 2003Go). The proposed model is equivalent across experiments, although different gene variances across experiments, and across components within experiment, are allowed. In matrix notation, and for the ith experiment (i = 1, 2, 3) and the jth design component within experiment (j = 1, 2 for i = 1 and 3; and j = 1, 2, 3 for i = 2) the following model was fitted:


[1]

where yij is the (nij x1) vector of signals from the ith experiment and the jth design component within experiment; ni is the size of yij; Xij is an (nij xpij) incidence matrix relating the pij fixed comparison group (CG) effects in ßij with signals in yij; Gij is an (nij xg) matrix relating the random gene (or array elements) effects in gij with yij; Aij is an (nij xaij) matrix relating the random gene xarray-printing-block interaction effects in aij with yij; Vij is an (nij x vij) matrix relating the random gene x variety effects in vij with yij; and eij is the (nij x 1) vector of random errors.

Features of Model [1] include the following: 1) although both experiments use the same array (making the column rank of each Gij equal to g), each may contain a different number of signals, arrays, and varieties; 2) a single fixed effect is fitted, CG and defined as the group of intensity signals originating from the same array, printing block, and fluorescent dye channel. These effects lack biological importance and their fitting aims to normalize the data by accounting for systematic effects. Main effects and first- and higher-order interactions are accounted in this definition of CG; 3) the term aij models the spot effect, accounts for spot-to-spot variability, allows us to estimate the treatment effects, and obviates the need to form ratios (Wolfinger et al., 2001Go); 4) unlike the model of Chilingaryan et al. (2002)Go, our approach accommodates variation between studies, and the source of multidimensionality is the various studies and components within studies, as opposed to the individual genes.

We make standard stochastic assumptions about Model [1]. Further details about this model are given in Table 2Go. A total of 7,638 genes (or array elements) was included. However, not all of them were represented in each yij. The number of genes represented in yij averaged 5,856 and ranged from 3,580 for y12, to 6,749 for y11. The percentage of genes providing signals in only one, only two, and so on until all the seven variables were accounted for, was 7.0, 6.2, 6.0, 9.9, 13.0, 22.5, and 35.4%, respectively. Genes jointly evaluated across yij provided the basis for the estimation of 56 covari-ances. Covariance matrices are {sum}g, {sum}a, {sum}v, and {sum}e, for the g, a, v, and e, respectively. Their components are as follows:


View this table:
[in this window]
[in a new window]
 
Table 2. Model information for the seven-variate mixed model
 




where diag indicates a diagonal matrix. Note there is an implied identity matrix of appropriate dimension multiplying each (co)variance component. Note also that higher accuracies in the resulting BLUP will be obtained as the number and size of off-diagonal elements (covariances) increases.

Because no hybridization was performed across experiments, no off-diagonal elements are allowed in {sum}e. On the other extreme, {sum}g is fully dense due to gene covariances between experiments and between components within experiments. In Exp. 2, the six slides connecting samples across time allow for the estimation of the off-diagonal elements in {sum}a. Off-diagonal elements in {sum}v are covariances between varieties within experiments. These are nonzero for Exp. 1 and 2, but zero for Exp. 3, as v31 contains gene x age interaction and v32 contains the interaction of gene x age x fat treatment. Finally, the MME for the seven-variate Model [1] contained 752,476 equations arranged in a 28 x 28 block-matrix structure, and 6,587,868 nonzero elements. We obtained restricted maximum likelihood estimates of covariance components and BLUP using the VCE software (Groeneveld and García-Cortés, 1998Go).

Measures of Differential Expression
For each experiment, the following linear combination of the BLUP in 1j were calculated for each gene in k (k = 1 to 7,638):


[2]

For Exp. 1, large positive d1k values are likely to belong to genes whose expression is upregulated in animals being fed the high-quality diet. Accordingly, large negative d1k values are likely to belong to upregulated genes in animals under nutritional restriction. Similar reasoning applies for d2k when contrasting Holstein vs. Japanese Black in Exp. 2, and for d3k when comparing fat treatment vs. control across the 6 time points (t = 0 to 5) in Exp. 3. Note that although the statistics in d1k, d2k, and d3k have the standard form of the hypothesis to be tested in ANOVA approaches to microarray data analyses (Cui and Churchill, 2003Go), we did not conduct hypothesis testing because there is no evidence to support the assumptions of normality and homogeneity of variance, and there exists the problem of multiple comparisons. Also, this hypotheses testing would clash with the assumption of vij being random because the probability of a random variable being equal to any given value is zero.

Model-based clustering via a mixture of distributions has been proposed as a method of choice to identify DE genes (Yeung et al., 2001Go; McLachlan et al., 2002Go; Medvedovic et al., 2004Go). Similar to the approach of Moser et al. (2004)Go, our goal here was to apply model-based cluster analysis to the values in d1k, d2k, and d3k, and to determine which genes have relative levels far away from the majority. Each data point in d1k, d2k, and d3k are assumed to be independent observations from a mixture density with c (possible unknown but finite) components and with probability density function


where denotes the trivariate normal density function with mean µm and covariance matrix Vm, and the mixing proportions {pi}i are constrained to be nonnegative and sum to unity. All unknown parameters are represented in {phi}c for a c-component (or c-cluster) mixture model. In the current study, mixture models with up to five components (or clusters) were contemplated. Parameters of the mixture were estimated using the EMMIX software (McLachlan et al., 1999Go). This program has various criteria for model selection, including the Akaike information criterion (Akaike, 1969Go), the Bayesian information criterion (Schwartz, 1978Go), the likelihood ratio test (Stram and Lee, 1994Go), and the bootstrap approximation of the distribution of the likelihood ratio test (McLachlan, 1987Go). Once the model of choice has been identified, the probability of each data point belonging to each cluster is given by its posterior probability.

Finally, the biological interpretation of the results was facilitated using an iterative group analysis approach (Breitling et al., 2004Go) to identify functional classes of genes that are DE. Functional classes were obtained from gene ontology assignments (Ashburner et al., 2000Go; http://www.geneontology.org). There were 928 gene ontology classes containing an average of nine genes each. Following Breitling et al. (2004)Go, values listed in d1k, d2k, and d3k were sorted once by upregulation and once by downregulation. Moving along each sorted list, and for each gene ontology class, every time a new gene member of the class is encountered, the probability of having that many members that high up in the list by chance alone is given by the hypergeometric probability in


where n is the total number of genes (n = 7,638), x is the total number of class members in the gene ontology, and t is the rank of the zth class member (z being the current step in the iteration). The position in the list that yields the smallest P-value determines the probability of change by chance alone for the entire gene ontology class and all class members above that position are considered as DE.


    Results and Discussion
 Top
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Implications
 Literature Cited
 
Estimates of Covariance Components
At convergence, the following restricted maximum likelihood estimates were obtained:




and

e = diag{0.14 0.16 0.26 0.22 0.19 0.12 0.13}

The main effect of gene (g) accounted for the vast majority of this variation for all variables except y31 for which both g and a (for gene x array-printing-block) accounted each for approximately 45% of the total variation. Also apparent was the large gene correlation existing within and between experiments (with the exception of Exp. 3). These estimates revealed the strong goodness of fit of the model. The amount of total variation accounted by the model (given by the diagonal elements of (g + a + v)(g + a + v + e)– 1) ranged from 94.5% for y21 to 97.1% for y32.

Of interest is the large correlation (>0.95 in all cases) observed in Exp. 2 for a between time points (i.e., between y21, y22, and y23). Hence, the variation in size and quality of the spots for the same gene was basically identical across hybridizations. Also, the 0.76 (from 0.10/{surd} [0.09 x 0.19]) correlation for the gene x diet inter- action between y11 and y12 indicates that a similar set of DE genes is expected to be found in both components of Exp. 1. This same correlation was much smaller (averaging 0.37) between y21, y22, and y23 and was attributed to the age effect in Exp. 2.

Estimates in {nu} provide the amount of total variation that can be attributed to DE genes. They averaged 3.0% and ranged from 2.3 to 5.4%. These percentages provide an indication of the expected proportion of DE genes and serve to control the false discovery rate (FDR), a major concern in gene expression (Van den Oord and Sullivan, 2003Go; Meuwissen and Goddard, 2004Go) as well as in genome scan mapping studies (Fernando et al., 2004Go). The reasoning behind using these percentages to control the FDR can be found in Efron (2004)Go, in which FDR is defined as fdr(z) {equiv} f0(z)/f(z), where f(z) = p0f0(z) + p1f1(z) is the estimated density from the observed ensemble of z-values, f0(z) is the distribution of the z-values under the null hypothesis (i.e., nonsignificant or non-DE in our context), and f1(z) is the distribution under the alternative hypothesis (i.e., DE in our context).

Finally, and to assess whether this full model offers any improvement over the individual analysis of the different experiments, a reduced model was fitted by forcing to zero the 16 covariance components defining between-study gene expression variation and given by {sigma}gi,j for i < 3 and j > 2 (i.e., a total of 10) and {sigma}gi,j for 2 < i < 6 and j > 5 (i.e., a total of six). The likelihood ratio test (Stram and Lee, 1994Go) concluded strongly in favor of the full model (P < 0.001). This result is not surprising given the magnitude of the estimates of covariance components from the full model.

Measures of Differential Expression, Model-Based Clustering, and Gene Ontology Analyses
Values in d1k, d2k, and d3k averaged 0.00 and ranged from –1.85 to 3.54 for d1, from –1.79 to 1.71 for d2, and from –1.67 to 2.20 for d3. The fitting of mixtures of distributions identified three components (or clusters) as follows:


Figure 2Go illustrates the empirical marginal densities for d1k, d2k, and d3k, as well as the posterior probability (decision functions) of each value belonging to each cluster. Note that there is no y scale for the densities. This scale corresponds to probability 1 for the decision functions and densities are drawn to proportionality. The first cluster encapsulates the majority of genes (67%) for which no DE was observed. The second cluster, with a much larger variance for d1, identified 450 DE genes from Exp. 1 (of which 409 were upregulated). This cluster also contained 68 upregulated genes from Exp. 2. The third cluster contained 319 downregulated genes from Exp. 2 plus 361 DE genes from Exp. 3 (with 252 being upregulated).



View larger version (29K):
[in this window]
[in a new window]
 
Figure 2. Empirical densities (boxes) for measures of DE in d1, d2, and d3 for Exp. 1 to 3, and posterior probability (lines) of being in each cluster ({diamond} = Cluster 1; + = Cluster 2; {square} = Cluster 3).

 
Values in d1k, d2k, and d3k were sorted once by upregulation and once by downregulation and iterative group analysis (Breitling et al., 2004Go) performed separately. Table 3Go presents five GeneOntology classes with a probability of change by chance alone <1.0 x10–4, along with the number of total and DE elements per class. A total of 42 elements was jointly DE between Exp. 1 and 2, 52 between Exp. 1 and 3, and 24 between Exp. 2 and 3. Also, there were 10 elements jointly DE across the three experiments. After annotation analyses, these 118 elements were found to represent 23 unique genes. A selection of these genes is presented in Table 4Go.


View this table:
[in this window]
[in a new window]
 
Table 3. Description gene ontology (GO) classes with a probability of change by chance alone less than 1.0 x10–4 and for each experiment (Exp. 1, 2, and 3), number of elements (No.), and differentially expressed (DE) elements for each GO class
 

View this table:
[in this window]
[in a new window]
 
Table 4. Selected genes jointly differentially expressed across two or more experiments and for each experiment (Exp. 1, 2, and 3)
 
The extracellular matrix protein, type I collagens were highly expressed in the muscle samples from high quality diet and Holstein individuals. In other words, collagen is downregulated in low-quality diet and Japanese Black animals. The former is due to nutrition restriction (Laurent, 1987Go; Krupsky et al., 1997Go); the latter is due to early i.m. fat development in Japanese Black. Interestingly, a large number of jointly DE genes were common to Exp. 1 and 3, suggesting that genes associated with adipogenesis are heavily modulated under nutritional challenge. These genes are involved in transcription, signaling, energy metabolism, and the immune system.

Contrary to what was expected, the least number of comodulated genes occurred between Exp. 2 and 3, and included genes involved in cell proliferation and alpha B crystallin. The latter has been implicated in the ubiquitin pathway, where it promotes FBX-4-dependent ubiquitination (den Engelsman et al., 2003Go), as well as in osteoblastic differentiation (Furushima et al., 2002Go). The finding that alpha B crystallin is DE in both Exp. 2 and 3 may point to changes in protein turnover processes that accompany cell differentiation processes in both systems. The DE genes from Exp. 1 and 2 revealed genes associated with extracellular matrix, detoxification, and energy metabolism. In this case, these genes are modulated between different breeds and during different diets. Stearoyl-CoA desaturase and adipocyte enhancer binding protein 1 are examples of two genes that were found to be DE in all three experiments. These genes are involved in fatty acid synthesis, suggesting that genes modulated during in vitro adipogenesis are intrinsic to changes occurring during nutritional manipulation and breed differences.

An additional use of this analysis is to identify housekeeping genes from those elements having a consistent expression across experiments. These are likely to be included in a set of genes with a very large posterior probability of being clustered in the second components with near zero mean and small variance. Alternatively, the application of algorithms to identify housekeeping genes (Vandesompele et al., 2002Go; Wilson et al., 2003Go) to d1, d2, and d3 becomes straightforward.

Finally, because of the large number of genes, the statistical technique presented here still poses challenges to the control of FDR. First, many genes that agree between two experiments could be due to chance and would not pass a hypergeometric test (Man et al., 2000Go). Second, the linear combinations defined in Model [2] are only illustrative. Other combinations could also have a biological meaning (e.g., high- vs. medium-quality diet in Exp. 1, and time contrasts in Exp. 2 and 3). These new definitions could potentially identify a different set of DE genes. Applying the method of Reverter et al. (2004)Go to the number and distribution of DE genes found in this study (i.e., 450, 387, and 361 genes for Exp. 1, 2, and 3, respectively), yielded an estimate of the sensitivity equal to 100, 280, and 400 transcripts per million for Exp. 1, 2, and 3, respectively.

Finally, robustness to the model assumption is critical for the control of FDR. Significance analysis of microarrays (Tusher et al., 2001Go) is a powerful approach; however, it does not take care of the correlation and heteroskedastic structure within and across arrays, let alone within and across microarray studies. The mixed-model approach proposed here overcomes this drawback; no significance level needs to be specified before identifying DE genes, FDR is controlled from the proportion of total variance due to gene x variety interaction, and violations to the assumptions of the model can be easily ascertained by exploring residuals.


    Implications
 Top
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Implications
 Literature Cited
 
Given the number and diversity of microarray experiments that are now being performed across the globe, the potential exists for comparative analyses of broad scope. Comparative analyses among experiments, anchored to one gene, could provide evidence for unpredicted gene functions. Quantification of gene expression from limited experimental comparisons cannot provide this opportunity. We have illustrated the feasibility of multivariate mixed models as a natural and promising mechanism for the combination of multiple studies. We have applied this method to the joint analysis of three seemingly independent experiments from livestock improvement with three different objectives, different design structures, and various alternatives of technical and biological replication. We have identified genes with coordinate differential expression that are consistent with the biological mechanisms of adipogenesis and nutrient partitioning.


    Footnotes
 
1 This research was supported by the Cooperative Research Centre for Cattle and Beef Quality and its core partners: The Univ. of New England, New South Wales Agric., CSIRO, and Queensland Dept. of Primary Industries. The authors thank R. Hunter for providing the animals, H. L. Bruce for providing RNA samples from the nutrition experiment, P. Allingham for performing the biopsies, S. Tsuji at Kobe Univ., Japan, for providing RNA samples, and A. Day and B. van den Heuvel for technical assistance. Back

2 Correspondence: 306 Carmody Rd. (phone: +61 7 3214 2392; fax: +61 7 3214 2900; e-mail: Tony.Reverter-Gomez{at}csiro.au).

Received for publication June 12, 2004. Accepted for publication August 31, 2004.


    Literature Cited
 Top
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Implications
 Literature Cited
 


Akaike, H. 1969. Fitting autoregressive models for prediction. Ann. Inst. Stat. Math. 21:243–247.

Andersson, L., and M. Georges. 2004. Domestic-animal genomics: Deciphering the genetics of complex traits. Nature Rev. 5:202–212.

Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. 2000. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25:25–29.[Medline]

Benson, D. A., I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, B. A. Rapp, and D. L. Wheeler. 2002. GenBank. Nucleic Acids Res. 30:17–20.[Abstract/Free Full Text]

Breitling, R., A. Amtmann, and P. Herzyk. 2004. Iterative group analysis (iGA): A simple tool to enhance sensitivity and facilitate interpretation of microarray experiments. Available: http://www.biomedcentral.com/1471-2105/5/34. Accessed July 12, 2004.

Chilingaryan, A., N. Gevorgyan, A. Vardanyan, D. Jones, and A. Szabo. 2002. Multivariate approach for selecting sets of differentially expressed genes. Math. Biosci. 176:59–69.[Medline]

Choi, J. K., U. Yu, S. Kim, and O. J. Yoo. 2003. Combining multiple microarray studies and modeling interstudy variation. Bioinfor-matics 19(Suppl.1):i84–i90.[Abstract]

Cui, X., and G. A. Churchill. 2003. Statistical tests for differential expression in cDNA microarray experiments. Available: http:// genomebiology.com/2003/4/4/210. Accessed July 12, 2004.

den Engelsman, J., V. Keijsers, W. W. de Jong, and W. C. Boelens. 2003. The small heat-shock protein {alpha} ß-crystallin promotes FBX4-dependent ubiquitination. J. Biol. Chem. 278:4699–4704.[Abstract/Free Full Text]

Efron, B. 2004. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Am. Stat. Assoc. 99:96–104.

Fernando, R. L., D. Nettleton, B. R. Southey, J. C. M. Dekkers, M. F. Rothschild, and M. Soller. 2004. Controlling the proportion of false positives in multiple dependent tests. Genetics 166:611–619.[Abstract/Free Full Text]

Furushima, K., K. Shimo-Onoda, S. Maeda, T. Nobukuni, K. Ikari, H. Koga, S. Komiya, T. Nakajima, S. Harata, and I. Inoue. 2002. Large-scale screening for candidate genes of ossicification of the posterior longitudinal ligament of the spine. J. Bone Miner. Res. 17:128–137.[Medline]

Ghosh, D., T. R. Barrette, D. Rhodes, and A. M. Chinnaiyan. 2003. Statistical issues and methods for meta-analysis of microarray data: A case study in prostate cancer. Funct. Integr. Genom. 3:180–188.

Groeneveld, E., and L. A. García-Cortés. 1998. VCE 4.0, a (co)variance components package for frequentists and Bayesians. Pages 455–458 in Proc. 6th World Cong. Genet. Appl. Livest. Prod., Armidale, NSW, Australia.

Kendziorski, C. M., Y. Zhang, H. Lan, and A. D. Attie. 2003. The efficiency of pooling mRNA in microarray experiments. Biostatistics 4:465–477.[Abstract]

Kerr, M. K. 2003. Design considerations for efficient and effective microarray studies. Biometrics 59:822–828.[Medline]

Kerr, M. K., M. Martin, and G. A. Churchill. 2001. Analysis of variance for gene expression microarray data. J. Comput. Biol. 7:819–837.

Krupsky, M., P. P. Kuang, and R. H. Goldstein. 1997. Regulation of type I collagen mRNA by amino acid deprivation in human lung fibroblasts. J. Biol. Chem. 272:13864–13868.[Abstract/Free Full Text]

Laurent, G. J. 1987. Dynamic state of collagen: Pathways of collagen degradation in vivo and their possible role in regulation of collagen mass. Am. J. Physiol. 252:C1–C9.

Lehnert, S. A., K. A. Byrne, and Y. H. Wang. 2004. Development and application of a bovine cDNA microarray for expression profiling of muscle and adipose tissue. Aust. J. Exp. Agric.

Man, M. Z., X. Wang, and Y. Wang. 2000. POWER_SAGE: Comparing statistical tests for SAGE experiments. Bioinformatics 16:953–959.[Abstract/Free Full Text]

McLachlan, G. J. 1987. On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Appl. Stat. 36:318–324.

McLachlan, G. J., D. Peel, K. E. Basford, and P. Adams. 1999. The EMMIX software for the fitting of mixtures of normal and t-components. Available: http://www.jstatsoft.org/v04/i02. Accessed July 12, 2004.

McLachlan, G. J., R. W. Bean, and D. Peel. 2002. A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18:413–422.[Abstract/Free Full Text]

Medvedovic, M., K. Y. Yeung, and R. E. Bumgarner. 2004. Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20:1222–1232.[Abstract/Free Full Text]

Meuwissen, T. H. E., and M. E. Goddard. 2004. Bootstrapping of gene-expression data improves and controls the false discovery rate of differentially expressed genes. Genet. Sel. Evol. 36:191–206.[Medline]

Moreau, Y., S. Aerts, B. De Moor, B. De Strooper, and M. Dabrowski. 2003. Comparison of meta-analysis of microarray data: From the bench to the computer desk. Trends Genet. 19:570–577.[Medline]

Moser, R. J., A. Reverter, C. A. Kerr, K. J. Beh, and S. A. Lehnert. 2004. A mixed-model approach for the analysis of cDNA microarray gene expression data from extreme performing pigs after infection with Actinobacillus pleuropneumoniae. J. Anim. Sci. 82:1261–1271.[Abstract/Free Full Text]

Peng, X., C. L. Wood, E. M. Blalock, K. C. Chen, P. W. Landfield, and A. J. Stromberg. 2003. Statistical implications of pooling RNA samples for microarray experiments. Available: http://www.biomedcentral.com/1471-2105/4/26. Accessed July 12, 2004.

Reverter, A., K. A. Byrne, H. L. Bruce, Y. H. Wang, B. P. Dalrymple, and S. A. Lehnert. 2003. A mixture model-based cluster analysis of cDNA microarray gene expression data on Brahman and Brahman composite steers fed high, medium and low quality diets. J. Anim. Sci. 81:1900–1910.[Abstract/Free Full Text]

Reverter, A., S. M. McWilliam, W. Barris, and B. P. Dalrymple. 2004. A rapid method for computationally inferring transcriptome coverage and microarray sensitivity. Bioinformatics.

Rhodes, D. R., T. R. Barrette, M. A. Rubin, D. Ghosh, and A. M. Chinnaiyan. 2002. Meta-analysis of microarrays: Interstudy validation of gene expression profiles reveals pathways dysregulation in prostate cancer. Cancer Res. 62:4427–4433.[Abstract/Free Full Text]

Rhodes, D. R., J. Yu, K. Shanker, N. Deshpande, R. Varambally, D. Ghosh, T. Barrette, A. Pandey, and A. M. Chinnaiyan. 2004. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl. Acad. Sci. USA 101:9309–9314.[Abstract/Free Full Text]

Schwartz, G. 1978. Estimating the dimensions of a model. Ann. Stat. 6:461–464.

Stram, D. O., and J. W. Lee. 1994. Variance components testing in the longitudinal mixed effects model. Biometrics 50:1171–1177.[Medline]

Tilstone, C. 2003. Vital statistics. Nature 424:610–612.[Medline]

Tran, P. H., D. A. Peiffer, Y. Shin, L. M. Meek, J. P. Brody, and K. W. Y. Cho. 2002. Microarray optimisations: Increasing spot accuracy and automated identification of true microarray signals. Nucleic Acids Res. 30:e54.[Abstract/Free Full Text]

Townsend, J. P. 2003. Multifactorial experimental design and the transitivity of ratios with spotted DNA microarrays. BMC Genomics 4:41–49.[Medline]

Tusher, V. G., R. Tibshirani, and G. Chu. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98:5116–5121.[Abstract/Free Full Text]

Van den Oord, E. J. C. G., and P. F. Sullivan. 2003. False discoveries and models for gene discovery. Trends Genet. 19:537–542.[Medline]

Vandesompele, J., K. De Preter, F. Pattyn, B. Poppe, N. Van Roy, A. De Paepe, and F. Speleman. 2002. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Available: http://genomebiology.com/2002/3/7/research/0034. Accessed July 12, 2004.

Wilson, D. L., M. J. Buckley, C. A. Helliwell, and I. W. Wilson. 2003. New normalization methods for cDNA microarray data. Bioinformatics 19:1325–1332.[Abstract/Free Full Text]

Wolfinger, R. D., G. Gibson, E. D. Wolfinger, L. Bennett, H. Hamadeh, P. Bushel, C. Afshari, and R. S. Paules. 2001. Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol. 8:625–637.[Medline]

Wright, G., B. Tan, A. Rosenwald, H. E. Hurt, A. Wiestner, and L. M. Staudt. 2003. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc. Natl. Acad. Sci. USA 100:9991–9996.[Abstract/Free Full Text]

Yang, Y. H., and T. Speed. 2002. Design issues for cDNA microarray experiments. Nature Rev. 3:579–588.

Yeung, K. Y., C. Fraley, A. Murua, A. E. Raftery, and W. L. Ruzzo. 2001. Model-based clustering and data transformations for gene expression data. Bioinformatics 17:977–987.[Abstract/Free Full Text]


This article has been cited by other articles:


Home page
Physiol. GenomicsHome page
E. de la Vega, M. R. Hall, K. J. Wilson, A. Reverter, R. G. Woods, and B. M. Degnan
Stress-induced gene expression profiling in the black tiger shrimp Penaeus monodon
Physiol Genomics, September 11, 2007; 31(1): 126 - 138.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
T. Vuocolo, K. Byrne, J. White, S. McWilliam, A. Reverter, N. E. Cockett, and R. L. Tellam
Identification of a gene network contributing to hypertrophy in callipyge skeletal muscle
Physiol Genomics, February 12, 2007; 28(3): 253 - 272.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
A. Reverter, N. J. Hudson, Y. Wang, S.-H. Tan, W. Barris, K. A. Byrne, S. M. McWilliam, C. D. K. Bottema, A. Kister, P. L. Greenwood, et al.
A gene coexpression network for bovine skeletal muscle inferred from microarray data
Physiol Genomics, December 13, 2006; 28(1): 76 - 83.
[Abstract] [Full Text] [PDF]


Home page
J ANIM SCIHome page
S. A. Lehnert, K. A. Byrne, A. Reverter, G. S. Nattrass, P. L. Greenwood, Y. H. Wang, N. J. Hudson, and G. S. Harper
Gene expression profiling of bovine skeletal muscle in response to and during recovery from chronic and severe undernutrition
J Anim Sci, December 1, 2006; 84(12): 3239 - 3250.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Reverter, A. Ingham, S. A. Lehnert, S.-H. Tan, Y. Wang, A. Ratnakumar, and B. P. Dalrymple
Simultaneous identification of differential gene expression and connectivity in inflammation, adipogenesis and cancer
Bioinformatics, October 1, 2006; 22(19): 2396 - 2404.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Reverter, W. Barris, S. McWilliam, K. A. Byrne, Y. H. Wang, S. H. Tan, N. Hudson, and B. P. Dalrymple
Validation of alternative methods of data normalization in gene co-expression studies
Bioinformatics, April 1, 2005; 21(7): 1112 - 1120.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Reverter, A.
Right arrow Articles by Lehnert, S. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Reverter, A.
Right arrow Articles by Lehnert, S. A.


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS