J. Anim. Sci.


     


J. Anim. Sci. 2007. 85:E20-E23. doi:10.2527/jas.2006-479
© 2007 American Society of Animal Science


TRIENNIAL REPRODUCTION SYMPOSIUM

Interpretation of microarray data: Trudging out of the abyss towards elucidation of biological significance¹

G. W. Smith*,² and G. J. M. Rosa†

* Laboratory of Mammalian Reproductive Biology and Genomics, Departments of Animal Science and Physiology, Michigan State University, East Lansing 48824-1225; and † Department of Dairy Science, University of Wisconsin, Madison 53706


    Abstract
The recent development of tools for expression profiling in livestock has given reproductive biologists new opportunities to examine global changes in gene expression during key developmental events, in response to hormonal or other treatments, and as a tool for phenotyping or predicting developmental potential. Such experiments often yield lists of tens to thousands of modulated genes, transcripts of interest, or both. Some argue that such technological advances signal a move from hypothesis-driven research to descriptive discovery research, resulting in information overload at the expense of biological significance. One can easily spend hours staring into the abyss, wondering whether the results are real and what they mean. However, microarrays can be more than an expensive high-throughput screening tool. Many factors contribute to the success of expression profiling experiments and the yield of interpretable data, including the nature of the hypothesis or objective of the study, the microarray platform, the complexity of the tissue of interest, the experimental design, and the incorporation of the best available strategies for data analysis and interpretation of biological themes. Although annotation and ontology classification of genes in livestock species remain limited, functional categories of coregulated genes and gene pathways can be determined, and hypotheses about common regulatory elements or functional significance can be formulated. We have applied cDNA microarray technology to studies of follicular growth, oocyte quality, and the periovulatory period in cattle. Lessons learned from such experiments and a review of the available literature form the basis for the strategies described here to facilitate successful application of microarray technology to studies of the reproductive biology of livestock species.

Key Words: microarray • data analysis • false discovery rate • biological theme • cattle


    INTRODUCTION
The invited presentation at the 2006 Triennial Reproduction Symposium titled "Interpretation of microarray data: Trudging out of the abyss towards elucidation of biological significance" represented a compilation of insights into DNA microarray data analysis and strategies for interpretation of biological themes. These insights were gained, in part, during a recent series of published and unpublished gene expression profiling experiments in cattle in the areas of folliculogenesis, ovulation and oocyte quality, and early embryonic development, conducted at Michigan State University (Patel et al., 2005) or in collaboration with colleagues at Michigan State University and abroad (Evans et al., 2004; Corcoran et al., 2006; Mihm et al., 2006).

Key factors influencing the success of a microarray experiment include 1) formulation of an appropriate hypothesis and rationale for the experiments; 2) choice of the microarray platform (e.g., cDNA, oligonucleotide) based on the goals of the experiment and tissue/cell type and species of interest; 3) complexity of the samples of interest and use of whole tissue vs. purified cell types; 4) experimental design issues, including appropriate biological replication, the need for technical replication, and the appropriateness of pooling of samples; 5) control of the false discovery rate and avoidance of arbitrary data analysis procedures; and 6) use of gene classification procedures, pathway analysis, and other available tools to facilitate the interpretation of biological themes. The above factors are also addressed in detail in an excellent recent review by Allison et al. (2006), which is highly recommended and served as a valuable source of information during the preparation of this paper.

For the purpose of this paper, we use the term microarray to refer to DNA microarrays used as a high-throughput platform for quantifying differences in relative RNA transcript abundance for a very large number of genes simultaneously. The paper is presented from the perspective of a reproductive biologist (G. W. Smith) who has used microarray technology successfully, so the emphasis is on lessons learned and strategies found to be successful, not on the details of the statistical theory behind the approaches described.

The power of microarray technology lies in measuring transcript abundance for thousands of genes simultaneously, and its application is commonly termed gene expression profiling. However, as with other common methods for quantifying RNA abundance for genes of interest, one must exercise caution in attributing changes in RNA transcript abundance directly to differences in transcription (steady-state transcript abundance also reflects mRNA stability) or in inferring that such changes automatically result in differences in the biological activity of the encoded gene products.


    HYPOTHESIS AND RATIONALE
Microarrays can be more than an expensive high-throughput screening tool. Although microarray experiments can often be classified as discovery research, the chances of success are much greater when appropriate hypotheses have been formulated and the specific questions of interest have been clearly delineated. To address those questions, the experiments must be designed appropriately in terms of sample collection and treatments. The nature of the specific hypotheses can influence the experimental design and the strategies for microarray interrogation and data analysis. For relevant examples, see the recent review by Rosa et al. (2005).


    CHOICE OF THE MICROARRAY PLATFORM
The choice of microarray platform may depend on the species, the model system, or both; the tissue or cell type of interest; the nature of the question; and economics. For example, traditional platforms may not be optimal for exon-level expression profiling to uncover alternative splicing events. The experiments mentioned above that were conducted at Michigan State University used a high-density bovine cDNA microarray (Suchyta et al., 2003) containing expressed sequence tags (EST) representing >15,000 genes, as well as custom arrays (BOTL-4 and BOTL-5) containing EST from a bovine total leukocyte library and additional cDNA representing common growth factors, cytokines, and signaling molecules (Evans et al., 2004; Corcoran et al., 2006; Mihm et al., 2006).

In addition to the numerous cDNA microarrays available for livestock species, high-density oligonucleotide arrays for gene expression profiling of samples derived from cattle, pigs, and chickens are now commercially available. For farm species for which microarrays are less readily available (e.g., sheep), cross-species hybridization to existing arrays, particularly cDNA arrays, is possible but not ideal. Such approaches may yield a substantial proportion of false negatives when sequence similarity between a heterologous clone on the array and the corresponding transcript in the samples of interest is insufficient to allow robust hybridization. Generally, the availability of microarrays for farm animal species is no longer a limiting factor in expression profiling experiments, but construction of custom arrays may still be warranted when the goal is to examine the regulation of rare or tissue- or cell-specific transcripts that are unlikely to be well represented on existing arrays.


    COMPLEXITY OF THE SAMPLES
Another important consideration before executing microarray experiments is the complexity of the tissue of interest and whether to use RNA from whole tissue or from individual, purified cell populations. Although the cost of microarray experiments may affect the decision, the use of purified cell populations is highly recommended whenever possible. When whole-tissue RNA is used, changes in expression of a gene of interest may be diluted by the contribution of transcripts from other cell types and tissue components, or may be masked by divergent regulation of the gene in different cell populations. When confronted with this issue in microarray studies of bovine follicular development, for example, we chose to isolate and analyze thecal and granulosa cell RNA separately rather than use RNA isolated from whole follicles (Evans et al., 2004; Mihm et al., 2006).
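The dilution and masking effects described above can be made concrete with a little arithmetic. The sketch below assumes, hypothetically, that each cell type's share of total tissue RNA is unchanged by treatment and that array signal is proportional to total transcript abundance; the function name and all numbers are illustrative, not data from the cited studies.

```python
# Sketch: apparent whole-tissue fold-change when a gene changes in only some
# of the cell types that contribute RNA. All inputs are hypothetical.

def whole_tissue_fold_change(fractions, baseline, fold_changes):
    """Return the fold-change observed in whole-tissue RNA.

    fractions    -- share of total tissue RNA from each cell type (assumed
                    unchanged by treatment; must sum to 1)
    baseline     -- relative expression of the gene in each cell type (control)
    fold_changes -- true fold-change of the gene within each cell type
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    control = sum(f * b for f, b in zip(fractions, baseline))
    treated = sum(f * b * k for f, b, k in zip(fractions, baseline, fold_changes))
    return treated / control


# Dilution: a 4-fold increase confined to one cell type that supplies 30%
# of the RNA, with equal baseline expression in the remaining 70%.
diluted = whole_tissue_fold_change([0.3, 0.7], [1.0, 1.0], [4.0, 1.0])

# Masking: divergent regulation in two populations that each supply half
# of the RNA (1.5-fold up in one, 2-fold down in the other).
masked = whole_tissue_fold_change([0.5, 0.5], [1.0, 1.0], [1.5, 0.5])

print(f"diluted: {diluted:.2f}-fold")  # 0.3*4 + 0.7*1 = 1.90
print(f"masked:  {masked:.2f}-fold")   # 0.5*1.5 + 0.5*0.5 = 1.00
```

Under these assumptions, a 4-fold change confined to a cell type contributing 30% of the RNA appears as a 1.9-fold change in whole tissue, and opposing changes in two populations can cancel entirely.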


    EXPERIMENTAL DESIGN AND DATA ANALYSIS
Appropriate experimental design is the foundation of the success of any microarray experiment. Key issues associated with the design of microarray experiments have been reviewed previously (Rosa et al., 2005; Allison et al., 2006). The question of what constitutes appropriate biological replication remains one of the most troubling for researchers new to the technology. Unfortunately, decisions about biological replication are often driven by economics because of the substantial cost of microarray experiments. However, biological replication is paramount to a successful microarray experiment. Allison et al. (2006) recommended a minimum sample size of 5 per group for simple designs in which 2 experimental groups are compared and identification of differential gene expression is the goal.
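A rough power calculation shows why the number of biological replicates matters so much. The sketch below computes the power of an ordinary two-sample t-test for a single gene, assuming normally distributed log-scale expression with equal variances in both groups. The effect size (a difference of 2 SD) and the stringent per-gene significance level (0.001) are hypothetical choices standing in for the stricter thresholds that multiple testing demands; the sketch is not intended to reproduce the basis of the recommendation by Allison et al. (2006).

```python
import numpy as np
from scipy import stats


def two_sample_power(n_per_group, effect_size, alpha):
    """Power of a two-sided, equal-variance two-sample t-test for one gene.

    effect_size -- true difference in group means divided by the common SD
                   (e.g., on the log2 expression scale)
    alpha       -- per-gene significance level
    """
    df = 2 * n_per_group - 2
    ncp = effect_size * np.sqrt(n_per_group / 2.0)  # noncentrality parameter
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    # Probability that |T| exceeds the critical value under the alternative
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)


# Hypothetical scenario: 2-SD difference, per-gene alpha of 0.001
for n in (3, 5, 8, 12):
    power = two_sample_power(n, effect_size=2.0, alpha=0.001)
    print(f"n = {n:2d} per group: power = {round(float(power), 3)}")
```

Power rises steeply with the number of independent animals per group, which is why economizing on biological replication tends to be a false saving.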

Questions also arise about the necessity of technical replication; i.e., running multiple arrays with the same samples. Given the current status of DNA microarray technology and the available platforms, technical replication is not a prerequisite for successful microarray experiments. Technical replication only provides an estimate of the variability in measurement, whereas biological replication (analysis of multiple, independent samples per treatment group on separate slides) in essence accounts for biological variation between samples and variation in measurement.
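A small simulation makes the distinction concrete. It assumes, hypothetically, a simple random-effects model in which each log-scale measurement is the sum of an animal effect (SD `sigma_bio`) and an array measurement error (SD `sigma_tech`). Under that model, the variance estimated from technical replicates of one animal recovers only the measurement variance, whereas independent biological replicates capture both components. The variance values are illustrative, not estimates from any array.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical variance components on the log2 scale for one gene
sigma_bio = 0.8    # animal-to-animal (biological) SD
sigma_tech = 0.3   # array-to-array (measurement) SD
n_arrays = 6       # arrays per simulated experiment
n_sim = 20_000     # Monte Carlo repetitions

# Technical replication: one animal hybridized to n_arrays arrays
animal = rng.normal(0.0, sigma_bio, size=(n_sim, 1))
tech = animal + rng.normal(0.0, sigma_tech, size=(n_sim, n_arrays))

# Biological replication: n_arrays independent animals, one array each
bio = (rng.normal(0.0, sigma_bio, size=(n_sim, n_arrays))
       + rng.normal(0.0, sigma_tech, size=(n_sim, n_arrays)))

# Average sample variance (ddof=1, the unbiased estimator) across simulations
var_tech = tech.var(axis=1, ddof=1).mean()
var_bio = bio.var(axis=1, ddof=1).mean()

print(f"technical replicates:  {var_tech:.3f} "
      f"(sigma_tech^2 = {sigma_tech**2:.3f})")
print(f"biological replicates: {var_bio:.3f} "
      f"(sigma_bio^2 + sigma_tech^2 = {sigma_bio**2 + sigma_tech**2:.3f})")
```

A treatment comparison built on technical replicates would therefore judge differences against a variance that is far too small, whereas the biological-replicate variance is the one relevant to inference about animals.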

Another common question in the design of microarray experiments concerns the potential benefits and risks of pooling samples. For example, pooling may be considered when sample costs are low relative to the cost of the microarray procedures. Although pooling can reduce the variation between arrays, potential outliers may be masked or may compromise the entire pool. Pooling has also been proposed when the starting material (RNA) from individual samples is limited. Although this is a potential justification for pooling, there are alternatives.
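Both the benefit and the risk of pooling can be sketched numerically. The first part assumes, hypothetically, that pooling RNA from m animals averages their biological signal while leaving array measurement error unchanged, so that the between-array variance is approximately sigma_bio^2 / m + sigma_tech^2. The second part illustrates that, because a pool of equal RNA amounts averages transcript abundance on the linear scale, a single aberrant animal can pull the pool far from a typical animal. The function name, variance components, and signal values are illustrative only.

```python
import numpy as np


def pooled_array_variance(sigma_bio, sigma_tech, pool_size):
    """Approximate between-array variance when each array receives a pool.

    Assumes (hypothetically) that pooling acts like averaging the biological
    signal of pool_size animals, while the array's own measurement error is
    unaffected by pooling.
    """
    return sigma_bio**2 / pool_size + sigma_tech**2


# Benefit: only the biological component shrinks as pools get larger
for m in (1, 2, 4, 8):
    print(f"pool of {m}: variance = {pooled_array_variance(0.8, 0.3, m):.3f}")

# Risk: one outlying animal dominates a pool on the linear scale
individuals = np.array([100.0, 110.0, 95.0, 2000.0])  # hypothetical signals
pool = individuals.mean()         # what the pooled sample approximates
typical = np.median(individuals)  # what most animals actually look like
print(f"pool = {pool:.0f}, median animal = {typical:.0f}")
```

Note that each pool, not each animal, is then the unit of replication: the measurement term sigma_tech^2 never shrinks with pool size, so several independent pools per group are still required.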

We have validated linear amplification procedures for microarray experiments in which input RNA is limiting (Patel et al., 2004) and have applied such procedures in microarray studies of bovine oocytes (Patel et al., 2005). Using standard microarray procedures, such experiments would have required RNA from thousands of oocytes. Pooling of samples can be advantageous under certain circumstances, but not at the expense of biological replication. Independent biological replicates are always required, regardless of whether each replicate is derived from a pool or from an individual.

Avoidance of Arbitrary Data Analysis Procedures and Control of the False Discovery Rate
Size and complexity are major obstacles in the analysis of microarray data sets. For example, the output for a single replicate with the ~15,000-gene bovine cDNA microarray that we have used (Suchyta et al., 2003) contains more than 19,000 rows and 100 columns of data. This can be overwhelming, and it is tempting for investigators to employ simple, arbitrary methods of analysis, such as ranking genes by mean fold-change (differential expression) alone or applying individual t-tests. Such approaches are not advisable (Allison et al., 2006). Selecting genes of interest by mean fold-change yields arbitrary results with no associated degree of confidence. Likewise, independent, gene-specific t-tests (or ANOVA) may rely on unreliable variance estimates, especially when there are few data points for each gene. In such cases, small fold-changes may be declared statistically significant by chance because of spuriously small estimates of variability among samples, whereas important differential expression may be missed because of overestimated variances. To overcome this problem, significance tests that combine information across genes, for example by shrinkage estimation of variance components, are advised (e.g., Cui et al., 2005).
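The core idea of variance shrinkage can be sketched in a few lines. The function below replaces each gene's pooled variance with a weighted compromise between that variance and a common value, then refers the resulting moderated t-statistic to a t distribution with augmented degrees of freedom. It is a simplified illustration, not the estimator of Cui et al. (2005), LIMMA, or MAANOVA: the prior degrees of freedom `d0` and the use of the median gene-wise variance as the common value are hypothetical simplifications, whereas those packages estimate the amount of shrinkage from the data.

```python
import numpy as np
from scipy import stats


def moderated_t(x, y, d0=4.0):
    """Two-group t-tests with gene-wise variances shrunk toward a common value.

    x, y -- arrays of shape (n_genes, n_replicates) on the log2 scale
    d0   -- prior degrees of freedom (hypothetical fixed choice)
    """
    n1, n2 = x.shape[1], y.shape[1]
    d = n1 + n2 - 2  # residual degrees of freedom per gene
    # Pooled within-group variance for each gene (ordinary t-test denominator)
    ss = (((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
          + ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1))
    s2 = ss / d
    s0_2 = np.median(s2)                          # common (prior) variance
    s2_shrunk = (d0 * s0_2 + d * s2) / (d0 + d)   # weighted compromise
    se = np.sqrt(s2_shrunk * (1.0 / n1 + 1.0 / n2))
    t = (x.mean(axis=1) - y.mean(axis=1)) / se
    p = 2 * stats.t.sf(np.abs(t), df=d0 + d)      # augmented degrees of freedom
    return t, p, s2, s2_shrunk


# Hypothetical null data: 2,000 unchanged genes, 3 arrays per group
rng = np.random.default_rng(seed=7)
x = rng.normal(0.0, 0.5, size=(2000, 3))
y = rng.normal(0.0, 0.5, size=(2000, 3))

t_mod, p_mod, s2_raw, s2_mod = moderated_t(x, y)
t_ord = stats.ttest_ind(x, y, axis=1).statistic

print("largest |t|, ordinary :", round(float(np.abs(t_ord).max()), 1))
print("largest |t|, moderated:", round(float(np.abs(t_mod).max()), 1))
```

With only 3 arrays per group, some null genes have tiny sample variances by chance and produce very large ordinary t-statistics; borrowing information across genes keeps those denominators from collapsing.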

Another central issue in microarray data analysis is the multiple testing problem. Because thousands of genes are tested in a single experiment, large numbers of false positives are expected even if no gene is differentially expressed. For example, suppose that RNA isolated from a single bovine corpus luteum is divided into 2 aliquots, and each aliquot is subjected to microarray analysis using the bovine cDNA array described previously, which represents ~15,000 genes (Suchyta et al., 2003). Because both aliquots are derived from the same sample (i.e., self-self hybridization), there should be no true differences in gene expression. Nevertheless, with a gene-wise type I error rate of 0.05, one would expect about 0.05 × 15,000 = 750 false positives in this experiment.
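Because a single self-self hybridization offers no replication with which to run a test, the sketch below instead simulates a replicated null experiment: 15,000 hypothetical genes, 5 hypothetical arrays per group, and no true differences anywhere. Every gene declared significant at a per-gene level of 0.05 is therefore a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2006)
n_genes, n_per_group, alpha = 15_000, 5, 0.05

# Both groups are drawn from the same distribution for every gene,
# so every "discovery" below is a false positive.
a = rng.normal(size=(n_genes, n_per_group))
b = rng.normal(size=(n_genes, n_per_group))
p = stats.ttest_ind(a, b, axis=1).pvalue

false_positives = int((p < alpha).sum())
print(f"expected ~{alpha * n_genes:.0f}, observed {false_positives}")
```

The observed count fluctuates around 750 from run to run (its standard deviation is roughly 27), which is exactly the expected number of false positives stated in the text.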

To control the number of false positives when performing multiple tests, the significance level adopted for each test should be more stringent than the desired overall significance level. Traditional multiple testing corrections that control the experiment-wise error rate (i.e., the probability of at least one false positive) are generally too conservative: they reduce the power of the experiment to undesirable levels and consequently increase the number of false negatives. For large-scale multiple testing situations such as microarray experiments, it is more appropriate to control the false discovery rate (FDR), defined as the expected proportion of false positives among all tests declared significant (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003).
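The Benjamini and Hochberg (1995) step-up procedure is simple enough to sketch directly, alongside the Bonferroni correction as an example of experiment-wise control. The simulated data (14,000 null genes plus 1,000 genes with a hypothetical shift in their z-scores) are illustrative only, and the sketch does not implement the q-value approach of Storey and Tibshirani (2003).

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvalues, q=0.05):
    """Boolean mask of tests rejected by the Benjamini-Hochberg step-up rule.

    Sort the m p-values, find the largest rank k with p_(k) <= (k / m) * q,
    and reject the k hypotheses with the smallest p-values.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest qualifying rank (0-based)
        reject[order[: k + 1]] = True
    return reject


def bonferroni(pvalues, alpha=0.05):
    """Experiment-wise (family-wise) control: test each hypothesis at alpha / m."""
    p = np.asarray(pvalues, dtype=float)
    return p <= alpha / p.size


# Hypothetical mixture: 14,000 unchanged genes and 1,000 true effects
rng = np.random.default_rng(seed=3)
z = np.concatenate([rng.normal(0.0, 1.0, 14_000), rng.normal(3.5, 1.0, 1_000)])
p = stats.norm.sf(z)                   # one-sided p-values
truth = np.arange(z.size) >= 14_000    # True for genes with a real effect

for name, hits in (("Bonferroni", bonferroni(p)),
                   ("BH, FDR 5%", benjamini_hochberg(p))):
    fdp = (hits & ~truth).sum() / max(hits.sum(), 1)
    print(f"{name}: {hits.sum()} significant, "
          f"{(hits & truth).sum()} true, false discovery proportion {fdp:.3f}")
```

In a typical run, the FDR procedure recovers many more of the true effects than Bonferroni while keeping the proportion of false discoveries near the nominal 5%. Established implementations exist as well, for example `p.adjust(method = "BH")` in R or `multipletests(method="fdr_bh")` in the Python statsmodels package.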

Freely available software for the analysis of microarray experiments, including significance tests using shrinkage-based procedures (such as LIMMA and R/MAANOVA packages) and FDR approaches, can be found at the Bioconductor Web site (http://www.bioconductor.org/; last accessed Oct. 3, 2006).

Interpretation of Biological Themes
After successful design and execution of a microarray experiment, with appropriate biological replication and control of the FDR during statistical analysis, investigators still face the exciting yet overwhelming task of elucidating the biological significance of gene lists that may contain hundreds or even thousands of differentially expressed genes. Such gene lists represent a myriad of unorganized findings. It is tempting to spend months staring at a data set and performing individual PubMed searches in an attempt to interpret biological themes de novo by intuition. It is also tempting to organize genes merely by fold-change or degree of differential expression, but such rankings yield no information about gene function and, by themselves, do little to aid interpretation of the findings.

However, a logical and systematic data analysis strategy to delineate biological themes can relieve unnecessary anxiety and substantially reduce the time spent interpreting microarray data. Such gene classification approaches are geared to reveal commonality in function that might not readily be discerned from a laborious, manual, literature-based approach alone. We have used publicly available software (DAVID; Dennis et al., 2003) to group genes by common function (gene ontology) and to determine the frequency with which each functional category is represented. Such an approach can, in itself, help reveal major biological themes within microarray data sets. Furthermore, we have successfully used a program named EASE (Hosack et al., 2003) to identify gene ontology categories that are overrepresented in microarray gene lists, that is, categories whose genes appear in the list at a significantly greater frequency than would be expected from their frequency on the array.
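Overrepresentation of a category is commonly assessed with a one-sided hypergeometric (Fisher exact) test that compares the category's share of the gene list with its share of the array. The sketch below implements that test and, as an option, a conservative variant modeled on our understanding of the EASE score, which removes one gene from the category count in the list before testing; consult Hosack et al. (2003) for the exact definition. The function name and all counts are hypothetical.

```python
from scipy import stats


def overrepresentation_p(list_hits, list_size, category_size, array_size,
                         ease_like=False):
    """One-sided hypergeometric p-value for enrichment of one category.

    list_hits     -- genes in the significant list annotated to the category
    list_size     -- annotated genes in the significant list
    category_size -- genes on the whole array annotated to the category
    array_size    -- annotated genes on the whole array (the background)
    ease_like     -- if True, drop one gene from list_hits before testing
                     (a conservative penalty modeled on the EASE score)
    """
    k = list_hits - 1 if ease_like else list_hits
    if k <= 0:
        return 1.0
    # P(X >= k) with X ~ Hypergeom(M=array_size, n=category_size, N=list_size)
    return float(stats.hypergeom.sf(k - 1, array_size, category_size, list_size))


# Hypothetical case: 10,000 annotated genes on the array, 200 in the category;
# 20 of the 300 significant genes fall in the category.
expected = 300 * 200 / 10_000
p_plain = overrepresentation_p(20, 300, 200, 10_000)
p_ease = overrepresentation_p(20, 300, 200, 10_000, ease_like=True)
print(f"expected ~{expected:.0f} hits; p = {p_plain:.2e}; "
      f"EASE-like p = {p_ease:.2e}")
```

Because hundreds of categories are usually screened at once, the resulting p-values are themselves subject to the multiple testing problem described above and are best interpreted with an FDR-type correction.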

Pathway analysis represents another useful tool to facilitate interpretation of biological themes. We have used the KEGG pathway database (Kanehisa and Goto, 2000) to obtain information on the representation of genes from microarray data sets within their respective biological pathways and on potential gene interactions. These gene classification approaches have revealed novel biological themes in our microarray data sets, leading to new hypotheses and investigations in model systems of interest that would not otherwise have been pursued (Patel et al., 2005), and have shortened the time line from descriptive discovery studies to the formulation of specific hypotheses and the testing of gene function.


    SUMMARY
The widespread availability of platforms for DNA microarray experiments in livestock species has made the incorporation of such technology into studies of the reproductive biology of farm animals a reality. Although the wet laboratory component of microarray experiments is not technically demanding, data analysis and interpretation can present significant obstacles because of the sheer volume of data generated and the accompanying statistical challenges (e.g., multiple testing). However, a logical and systematic approach grounded in sound experimental design with sufficient biological replication, appropriate statistical analysis incorporating control of the FDR, and use of available tools to facilitate interpretation of biological themes can make such large experiments manageable and facilitate the generation of new, biologically significant data relevant to an individual's model system of interest. Microarrays now represent a viable platform for discovery in studies of the reproductive biology of farm animals.


    Footnotes
 
1 Presented at the ADSA-ASAS Joint Annual Meeting, Triennial Reproduction Symposium: Molecular Techniques and Statistics, Minneapolis, MN, July 2006.

2 Corresponding author: smithge7@msu.edu

Received for publication July 18, 2006. Accepted for publication September 17, 2006.


    LITERATURE CITED


Allison, D. B., X. Cui, G. P. Page, and M. Sabripour. 2006. Microarray data analysis: From disarray to consolidation and consensus. Nat. Rev. Genet. 7:55–65.

Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Stat. Soc. B. Stat. Meth. 57:289–300.

Corcoran, D., T. Fair, S. Park, D. Rizos, O. V. Patel, G. W. Smith, P. M. Coussens, J. J. Ireland, M. P. Boland, A. C. Evans, and P. Lonergan. 2006. Suppressed expression of genes involved in transcription and translation in in vitro compared with in vivo cultured bovine embryos. Reproduction 131:651–660.

Cui, X., J. T. Hwang, J. Qiu, N. J. Blades, and G. A. Churchill. 2005. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6:59–75.

Dennis, G., Jr., B. T. Sherman, D. A. Hosack, J. Yang, W. Gao, H. C. Lane, and R. A. Lempicki. 2003. DAVID: Database for annotation, visualization, and integrated discovery. Genome Biol. 4:R60.

Evans, A. C., J. L. Ireland, M. E. Winn, P. Lonergan, G. W. Smith, P. M. Coussens, and J. J. Ireland. 2004. Identification of genes involved in apoptosis and dominant follicle development during follicular waves in cattle. Biol. Reprod. 70:1475–1484.

Hosack, D. A., G. Dennis, Jr., B. T. Sherman, H. C. Lane, and R. A. Lempicki. 2003. Identifying biological themes within lists of genes with EASE. Genome Biol. 4:R70.

Kanehisa, M., and S. Goto. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28:27–30.

Mihm, M., P. J. Baker, J. L. Ireland, G. W. Smith, P. M. Coussens, A. C. Evans, and J. J. Ireland. 2006. Molecular evidence that growth of dominant follicles involves a reduction in follicle-stimulating hormone dependence and an increase in luteinizing hormone dependence in cattle. Biol. Reprod. 74:1051–1059.

Patel, O. V., A. Bettegowda, J. J. Ireland, P. M. Coussens, P. Lonergan, and G. W. Smith. 2005. Functional genomics studies of oocyte and early embryonic development: Potential association of follistatin transcript abundance with oocyte competence. Biol. Reprod. (Special issue):145.

Patel, O. V., A. Bettegowda, J. Yao, S. Suchyta, J. J. Ireland, P. M. Coussens, and G. W. Smith. 2004. Validation and application of linear amplification procedures for study of oocyte genomics: Gene expression profiling of bovine oocyte maturation. Biol. Reprod. 70(Special issue):161.

Rosa, G. J., J. P. Steibel, and R. J. Tempelman. 2005. Reassessing design and analysis of two colour microarray experiments using mixed effects models. Comp. Funct. Genomics 6:123–131.

Storey, J. D., and R. Tibshirani. 2003. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100:9440–9445.

Suchyta, S. P., S. Sipkovsky, R. Kruska, A. Jeffers, A. McNulty, M. J. Coussens, R. J. Tempelman, R. G. Halgren, P. M. Saama, D. E. Bauman, Y. R. Boisclair, J. L. Burton, R. J. Collier, E. J. DePeters, T. A. Ferris, M. C. Lucy, M. A. McGuire, J. F. Medrano, T. R. Overton, T. P. Smith, G. W. Smith, T. S. Sonstegard, J. N. Spain, D. E. Spiers, J. Yao, and P. M. Coussens. 2003. Development and testing of a high-density cDNA microarray resource for cattle. Physiol. Genomics 15:158–164.


