ANIMAL GENETICS

* Division of Animal Sciences, University of Missouri, Columbia 65211;
Pathology and Anatomical Sciences, University of Missouri, Columbia 65212
Abstract
Key Words: quantitative trait loci, expression quantitative trait loci, whole genome selection, RNA interference, epigenetics, marker assisted selection
INTRODUCTION
Strategies for the Identification of Quantitative Trait Nucleotides in Sequenced Species
The identification of genes as QTL and mutations (quantitative trait nucleotides, QTN) underlying economically important traits in livestock species has been hampered by several factors, including 1) the use of inappropriate and inadequate mapping population designs, 2) the inability to rapidly fine-map genomic regions harboring QTL, and 3) the inability to detect and implicate mutations within genomic regions harboring QTL as plausible candidates for QTN. In this section we shall discuss the consequences of these limitations, the impending technologies that will soon provide their remediation, and finally, we propose a strategy for the simultaneous identification of QTN that underlie variation in all phenotypes analyzed in a mapping study. Without loss of generality and for the sake of space we constrain discussion in this section to QTL mapping in beef cattle.
Many of the beef cattle QTL mapping populations assembled in the early 1990s were either half-sib families or backcross and F2 designs based upon Bos taurus x Bos indicus F1 (Stone et al., 1999; Kim et al., 2003). The logic supporting this cross was that the genetic and phenotypic divergence for meat quality traits between the subspecies was the greatest among the commercially relevant cattle breeds and that the cross would maximize the probability of detecting QTL of large effect, should they exist. Usually these experiments utilized very small numbers of parents and a small number of progeny (200 to 500), which has been characteristic of all QTL mapping studies in livestock to date, regardless of species. The small numbers of progeny frequently reflect the use of microsatellite markers for mapping. Although microsatellites are widely available, highly polymorphic, abundant, and relatively easy to score, they are not suited to automation, and DNA samples must be repetitively processed at considerable cost to produce even low density linkage maps. To ensure family sizes with at least a modest power to detect QTL, few parents were used, which limits the number of segregating QTL that are present within the experiment. If there were 50 QTL influencing a trait within a population (Hayes and Goddard, 2001
) with an average heterozygosity (say) of 20% and the average power for detection of these QTL was 40%, we might expect to detect 4 QTL within a single half-sib family. We might also expect that different QTL studies would tend to detect the same high minor allele frequency QTL, but different subsets of the 50 QTL with low minor allele frequency depending on which happened to be heterozygous in the sampled parents. This number of nonoverlapping QTL detected per study is close to historical reality (http://bovineqtl.tamu.edu; http://www.animalgenome.org/QTLdb/; last accessed April 25, 2007). Finally, whereas the use of B. indicus x B. taurus parents did allow the detection of QTL, it has hindered the identification of the underlying QTN. The divergence between B. indicus and B. taurus is about 500,000 yr (Miretti et al., 2002
), and consequently, mutations with fixed allelic differences have accumulated about every 2 kb within these genomes (Taylor et al., 2006
). Consequently, within any QTL critical region (QTLCR; the smallest genomic region believed to harbor a QTL) there are thousands of mutations that are consistent with a B. indicus vs. B. taurus QTN, and it is statistically impossible to differentiate between these within the experimental design. The same phenomenon occurs when mapping is performed within a single half-sib family in which all heterozygous loci within a QTLCR are candidates for the QTN and none can be statistically eliminated due to their complete confounding within the experimental design.
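The two back-of-envelope figures in this paragraph can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the 10-Mb QTLCR used in the second call is an assumed size chosen for the example, not a value from the studies cited.

```python
def expected_detected_qtl(n_qtl, heterozygosity, power):
    """Expected number of QTL detected in a single half-sib family.

    n_qtl: number of QTL influencing the trait in the population
    heterozygosity: average probability that the sire is heterozygous at a QTL
    power: average probability of detecting a QTL that is segregating
    """
    return n_qtl * heterozygosity * power


def fixed_differences(region_bp, spacing_bp=2_000):
    """Approximate count of fixed B. indicus vs. B. taurus differences
    in a region, assuming one difference about every spacing_bp bases."""
    return region_bp // spacing_bp


# 50 QTL, 20% heterozygosity, 40% power (the values used in the text)
print(expected_detected_qtl(50, 0.20, 0.40))

# Assumed 10-Mb QTLCR: thousands of candidate mutations
print(fixed_differences(10_000_000))
```

With the values in the text, the first call gives the 4 QTL per family quoted above; the second shows why a subspecies cross leaves thousands of indistinguishable candidates within even a moderately sized QTLCR.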
The traditional approach for the positional cloning of QTL has involved a low resolution, whole genome (WG) microsatellite QTL scan followed by fine-mapping to minimize the size of each individual QTLCR and then the identification and testing of candidate mutations within each QTLCR to find those that are most strongly associated with the QTL. This approach is complicated by the fact that each QTLCR must be tackled independently for both the fine-mapping and mutation detection steps, and until recently WGS for livestock species have not been available to assist in this process. However, the recent production of livestock species WGS has removed several major impediments. Because millions of putative SNP (many are actually sequencing errors) are generated and mapped to genome coordinates in the production of a WGS, these may be used to fine-map individual regions and the sequence within each QTLCR can be used to develop primers for resequencing within the mapping population parents for candidate mutation discovery. Of far greater significance, the genotyping of SNP can be automated and massively parallelized and several groups are in the process of developing WG SNP panels on the Illumina Infinium and Affymetrix platforms that will allow the simultaneous genotyping of tens to hundreds of thousands of SNP (unpublished results). Clearly, with the strategic selection of SNP according to their genomic location, these technologies will allow the combining of the QTL scan and fine-mapping steps of a QTL project. Furthermore, because the cost per sample will be much less than for performing a microsatellite-based QTL scan, limitations on the number of progeny that are genotyped will also be alleviated.
A consortium to which we belong has developed a bovine Illumina Infinium assay (BovineSNP50) with probes for 58,366 SNP (unpublished results; http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle&ID=899024&highlight=). The SNP in this assay were selected on several criteria, but a major objective was to produce an interSNP interval as close to 60 kb as possible. Because linkage disequilibrium (LD) in cattle disappears at a range of about 500 kb (McKay et al., 2007
), this assay will allow both linkage and association mapping and should allow the resolution of QTLCR to 1 to 2 Mb (unpublished results). Despite the enormous improvement over the size of QTLCR produced by linkage mapping, performing megabase-scale sequencing to identify candidate mutations remains a daunting task. However, the second generation sequencing technologies currently available from 454 Life Sciences, Illumina, and Applied Biosystems may provide solutions to this problem. The long-range PCR of 1 to 2 Mb genomic regions may be feasible and QTLCR amplicon libraries prepared from mapping population parents could be sequenced to simultaneously identify all of the mutations within the region. Alternatively, resequencing the entire genomes of these parents could simultaneously identify all of the mutations within all of the QTLCR, potentially allowing the identification of all economically important QTN within a livestock species. Theoretically, the generation of a 6x genome sequence from an individual is currently possible for less than $100,000 (E. Mardis, Washington University, St. Louis, MO; personal communication); however, the ability to assemble WG random shotgun reads from these short-read technologies is unproven, and because they have greater intrinsic error rates than conventional Sanger sequencing, the suitability of these technologies for this mutation detection strategy is not known.
We have devised a strategy for the positional cloning of QTL in beef cattle which capitalizes on the technologies made available by access to a WGS. The elements of this strategy are: 1) assemble a large number of paternal half-sib families across breeds of beef cattle with at least 50 progeny per family and which have been similarly phenotyped for traits of interest; 2) genotype these animals with the BovineSNP50 Illumina assay and conduct joint linkage and LD analyses to simultaneously identify the most likely genomic locations of QTL, their QTLCR, and the segregation status of each of the sires for every QTL; 3) sequence the whole genomes of a subset of the sires to identify the mutations within each of the QTLCR that are concordant with the QTL segregation status of the sequenced sires; and 4) genotype all of the candidate mutations detected in the sequencing analysis in all of the sires to identify those that are concordant with QTL segregation status in all of the sires of these families. Of course, if whole genome shotgun resequencing proves feasible and becomes inexpensive, all of the sires could be sequenced in step 3 and step 4 would become unnecessary. Because the families that primarily will be available for the implementation of this strategy will be paternally derived, it is not suitable for X-linked QTL. Despite the fact that the first bovine X chromosome linkage map was produced more than 10 yr ago (Yeh et al., 1996
) sex-linked QTL are under-researched and under-represented within the literature. One of the few exceptions is Sandor et al. (2006).
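The concordance filter in steps 3 and 4 of this strategy can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis pipeline itself: the data layout, the function name, and the sire and SNP identifiers are all hypothetical, and genotypes are assumed to be unphased two-character strings. A candidate is retained only if it is heterozygous in every sire that segregates for the QTL and homozygous in every sire that does not.

```python
def concordant_candidates(sire_qtl_status, candidate_genotypes):
    """Return candidate mutations concordant with QTL segregation status.

    sire_qtl_status: dict mapping sire id -> True if the sire segregates
        (is inferred heterozygous) for the QTL, False otherwise.
    candidate_genotypes: dict mapping candidate id -> dict mapping
        sire id -> two-character genotype string such as "AG".
    """
    retained = []
    for candidate, genotypes in candidate_genotypes.items():
        concordant = all(
            (genotypes[sire][0] != genotypes[sire][1]) == segregating
            for sire, segregating in sire_qtl_status.items()
        )
        if concordant:
            retained.append(candidate)
    return retained


# Hypothetical example: three sires, two candidate mutations
status = {"sire1": True, "sire2": False, "sire3": True}
candidates = {
    "snp_a": {"sire1": "AG", "sire2": "AA", "sire3": "AG"},
    "snp_b": {"sire1": "CT", "sire2": "CT", "sire3": "CT"},
}
print(concordant_candidates(status, candidates))
```

Here snp_b is rejected because sire2 is heterozygous for it but does not segregate for the QTL. The same logic explains why a type II error in QTL detection is costly: a single misclassified sire is enough to reject the causal mutation.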
There are several important concepts embodied within this strategy that lead to some important conclusions. First, there could be 1,000 QTL (20 traits each with 50 QTL) underlying the economically important traits of cattle (Hayes and Goddard, 2001
; Rothschild et al., 2007
), and therefore, strategies which address the simultaneous identification of these loci are needed. Second, because the single gene tests that are currently being commercialized in the cattle industry explain relatively little of the variation within a trait (Van Eenennaam et al., 2007
), they probably possess little economic value to producers. Third, whereas independent validation likely will be required for commercialization, we must avoid the need for functional confirmation of each and every candidate mutation to validate its identity as a QTN while being fairly certain that we have correctly identified the causal QTN. We have recently shown (Schnabel et al., 2005
) that support for a candidate polymorphism as a QTN can be assessed statistically from the number of sires that have concordant QTL and candidate polymorphism genotypes. Notably, we were unable to produce strong statistical support for a mutation in OPN for a milk protein percent QTN in North American Holsteins because the putatively causal OPN allele primarily resides on the same haplotype as the ABCG2 allele suggested by Cohen-Zinder et al. (2005)
. In fact, when we screened 144 Holstein sires the strength of LD between the 2 candidate QTN (r2 = 0.52) precluded us from identifying families that segregated for one of these mutations but not the other, and we were unable to independently test these mutations for concordance of genotype with QTL segregation status (unpublished data). For this reason, we now believe that families representing several breeds should be used within the mapping population. Because cattle breeds are diverged by no more than the 10,000 yr which have followed their domestication, we expect the vast majority of QTL in cattle to be segregating in all breeds; however, allele frequencies and haplotypes that segregate within any genomic region should differ because of the bottleneck events associated with breed formation and the subsequent effects of selection and drift. Thus, the final determination of whether it is the OPN or ABCG2 mutation, or perhaps neither of these (de Koning, 2006
), that underlies the protein percent QTL in dairy cattle may need to be resolved in Guernsey, Jersey, or Norwegian Red cattle. Fourth, by sampling a large number of half-sib families we maximize the likelihood that at least one family will be segregating for each QTL that underlies a trait; hence, we capture all of the genetic diversity that is present within cattle. However, the issue of family size is not trivial because for this strategy to succeed, care must be taken to avoid false negatives because a sire that is heterozygous but that is not detected to segregate for a QTL (a type II error) will produce a discordance between segregation status and QTN genotype, which may cause rejection of the causal QTN. Fifth, the use of a large number of half-sib families also allows the differentiation of pleiotropic and closely linked QTL. For a QTL to be pleiotropic, all families that segregate for a QTL that influences one trait must also segregate at the same genomic location for a QTL influencing the second trait. However, for closely linked QTL, it should be possible to identify families that segregate for only one of the QTL. Of course, the ability to distinguish between pleiotropic and closely linked QTL is influenced by family size and power for QTL detection. Finally, because so many QTL appear to affect the large number of recorded traits, any randomly selected marker has a strong likelihood of being associated with at least 1 trait. Therefore, candidate gene analyses, which are unable to localize polymorphisms relative to the detected QTL effects (using flanking markers and linkage or LD analysis) have little intrinsic value and should be strongly discouraged (Morsci et al., 2006
).
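The strength of LD between two candidate QTN, such as the r2 reported above for the OPN and ABCG2 mutations, is conventionally computed from haplotype and allele frequencies. The sketch below uses the standard definition of r2 for two biallelic loci; the function name is hypothetical and the frequencies in the example are invented for illustration and are not the Holstein data.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation (r2) between alleles at two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("r2 is undefined when either locus is monomorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denominator


# Alleles always on the same haplotype: complete LD
print(ld_r2(0.5, 0.5, 0.5))
# Haplotype frequency equal to the product of allele frequencies: no LD
print(ld_r2(0.25, 0.5, 0.5))
```

When r2 is high, few sires are heterozygous at one candidate and homozygous at the other, which is why the two mutations could not be tested independently within a single breed.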
Applications to Genetic Improvement and Improving Animal Management
The ability to predict the total genetic merit of livestock using molecular or biochemical markers would allow the opportunity to completely redesign animal breeding and management programs. For example, the need to progeny test dairy bulls for the milk production of their daughters or beef bulls for the meat tenderness of their steer progeny will evanesce and the ability to manage groups of feedlot steers according to their most profitable market opportunity will materialize. However, this can only be accomplished by technologies that are able to explain economically significant amounts of the (primarily additive) genetic variation that underlies each trait. Whereas several swine breeding companies have developed proprietary marker assisted selection programs (which must generate value or they would not persist), there are no published data describing these programs and the majority of publications on marker assisted selection in livestock are theoretical (e.g., Dekkers, 2004
). There appear to be several reasons for this; the 2 most significant are that insufficient markers have been identified to explain more than a few percent of the variation in any livestock trait (e.g., Van Eenennaam et al., 2007), and that few systematic approaches have been made toward the integration of molecular data into the genetic evaluation systems of any livestock species (Dekkers, 2004; Druet et al., 2006).
Kadarmideen et al. (2006) have suggested the use of transcript profiles, whereas Meuwissen et al. (2001) have suggested a marker-based approach, which has come to be known as whole genome selection (WGSL), for the prediction of genetic merit in the absence of phenotypes. In WGSL, animals with phenotypes or predicted additive genetic merits are genotyped at high density with 30,000 or more SNP that are evenly distributed throughout the genome. Either the SNP themselves, or (say) 1,000 chromosomal regions containing haplotypes formed from 30 SNP, are analyzed as independent random effects under a mixed linear model to simultaneously determine the genomic regions that contribute to phenotype or additive genetic merit and predict the additive values of each of the haplotypes within each region. From these predicted haplotype values, the phenotype or genetic merit of an animal can be predicted based solely upon its genotype. Meuwissen et al. (2001)
assumed equal variances associated with each chromosomal segment and independence between regions to avoid the problem of estimating a large number of variance components for regions and to make the approach statistically tractable. However, these assumptions are clearly violated by the existence of long range LD (McKay et al., 2007
) and because those regions closest to QTL will contribute much more variance to the trait than the rest. Of the 2 approaches, WGSL appears to have the greatest utility because the cost and logistical difficulties of producing gene expression profiles from appropriate tissues are far greater than are producing high-density SNP genotype profiles. Several groups have performed preliminary WGSL analyses in dairy cattle using an Affymetrix 10,000 SNP assay, and although these studies are not yet published, the results must have been promising because these groups and others have shown considerable interest in the Illumina BovineSNP50 Infinium assay that will be available late in 2007. In our opinion, WGSL using high-density SNP genotyping platforms is the most promising application of molecular genetics in livestock populations since work began almost 20 yr ago. If it can be demonstrated that additive genetic merits can be precisely predicted from molecular breeding values, the design of breeding programs will rapidly evolve and rates of genetic improvement will increase.
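Under the equal-variance and independence assumptions described above, predicting SNP effects as random effects is equivalent to ridge regression with a single shrinkage parameter equal to the ratio of residual to per-SNP variance. The sketch below illustrates that special case only. It is not the method of Meuwissen et al. (2001) as implemented by any group: the function names are hypothetical, genotypes are assumed to be coded as allele counts (0, 1, 2), the mean is handled by centering, and the shrinkage parameter lam is supplied by the user rather than estimated.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting (a is a square list of lists)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_snp_effects(genotypes, phenotypes, lam):
    """Estimate shrunken SNP effects from centered genotypes and phenotypes.

    genotypes: list of rows, one per animal, of allele counts per SNP
    phenotypes: list of trait values, one per animal
    lam: shrinkage parameter (assumed > 0), residual / per-SNP variance
    Returns (phenotype mean, per-SNP genotype means, SNP effects).
    """
    n, p = len(genotypes), len(genotypes[0])
    mu = sum(phenotypes) / n
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    y = [value - mu for value in phenotypes]
    # Left-hand side X'X + lam * I and right-hand side X'y
    lhs = [[sum(x[i][j] * x[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    return mu, means, solve(lhs, rhs)


def predict(genotype, mu, means, effects):
    """Predict merit for one animal from its genotype alone."""
    return mu + sum((g - m) * b for g, m, b in zip(genotype, means, effects))


# Toy example: 3 animals, 1 SNP
mu, means, effects = fit_snp_effects([[0], [1], [2]], [1.0, 3.0, 5.0], lam=2.0)
print(effects, predict([2], mu, means, effects))
```

Because every SNP receives the same shrinkage, regions close to QTL are shrunk as heavily as regions that contribute nothing, which is the limitation of the equal-variance assumption noted above.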
The statistical methodology that underlies WGSL is a novel and interesting extension of association analysis in which the association between phenotype and genotype is evaluated for all regions of the genome simultaneously rather than 1 locus at a time. This approach has obvious advantages for the identification of collinearity among markers when a large number of markers are scored in relatively few individuals and should increase the power for detecting associations just as composite interval mapping improves the power for detecting QTL over simple interval mapping. Whereas the approach should have great utility for the dissection of complex human diseases, it has not yet been recognized by the human mapping community. However, by increasing the rate of response to selection, WGSL also has the potential to dramatically increase the rate of inbreeding and loss of diversity among breeds of livestock, which appears to be becoming precarious in some species (Mc Parland et al., 2007). One might argue that this could be averted by constrained selection schemes in which individuals are selected to maximize response on QTL regions while maximizing diversity in nonQTL regions. However, if there are hundreds, or thousands, of QTL regions within the genome of a species, the effects of linkage almost preclude the existence of nonQTL regions. Finally, applications of WGSL must be developed to consider the chromosomal architectures of QTL within a population. If we consider 2 closely linked QTL that affect 2 different traits, individuals that are heterozygous for both QTL will have the same contributions to their molecular breeding values for each of the traits; however, the contribution to the breeding objective will differ for individuals that have alleles in coupling (desirable alleles on 1 chromosome and undesirable alleles on the other) rather than repulsion phase, because more of their progeny will inherit both of the desirable alleles and be available for selection. Therefore, the development and application of WGSL strategies will most benefit the livestock industries that use them within the context of selection indices rather than for the estimation of single trait breeding values.
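The difference between coupling and repulsion phase can be made concrete with the standard gamete frequencies for a double heterozygote. In the sketch below, the recombination fraction between the 2 QTL is a symbol introduced for illustration (it does not appear in the text), and the function name is hypothetical.

```python
def freq_both_desirable(recomb_fraction, phase):
    """Proportion of gametes from a double heterozygote that carry the
    desirable allele at both of 2 linked QTL.

    recomb_fraction: recombination fraction between the QTL (0 to 0.5)
    phase: "coupling" (desirable alleles on the same chromosome) or
           "repulsion" (desirable alleles on opposite chromosomes)
    """
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if phase == "coupling":
        # Nonrecombinant gametes carry both desirable alleles
        return (1.0 - recomb_fraction) / 2.0
    if phase == "repulsion":
        # Only recombinant gametes carry both desirable alleles
        return recomb_fraction / 2.0
    raise ValueError("phase must be 'coupling' or 'repulsion'")


for phase in ("coupling", "repulsion"):
    print(phase, freq_both_desirable(0.1, phase))
```

For tightly linked QTL the two phases differ several-fold in the proportion of progeny inheriting both desirable alleles, even though the 2 single-trait molecular breeding values are identical; at free recombination (0.5) the distinction disappears.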
Characterization of Expression QTL
The central paradigm of molecular biology states that DNA is transcribed into RNA and RNA is translated into the proteins that affect cellular function and ultimately determine phenotype. This simplistic model allows extension to numerous layers of complexity including variation in coding and regulatory DNA sequence, alternate splicing of RNA to mRNA, the regulation by small RNA of translation through transcript degradation and the posttranslational modification of proteins en route to the determination of phenotype. The extent to which these layers of genomic complexity underlie a phenotype might be measured by the narrow sense heritability. Traits with high heritabilities, such as growth, may have relatively few major pathways through the complex network of regulatory processes that connect genotype to phenotype, whereas traits with low heritability such as fertility may have many environmentally sensitive pathways, all of small effect. We already know from QTL mapping studies that genes of large effect underlie variation in most phenotypes, even for traits with low heritability (Cobanoglu et al., 2005
). The key question, therefore, is the extent to which QTL are explained by cis-acting mutations that regulate the expression level of nearby genes vs. mutations that act by alternative means, such as charged amino acid substitutions that may affect protein structure.
All of the methods used to examine the expression levels of genes possess an inherent measurement error. Furthermore, the level of expression of individual genes is dependent on both the environmental circumstances of the individual (tissue or cell line) and the character state of other genes within the genome. Therefore, measures of gene expression can be considered to be quantitative traits, much as we think about production traits in livestock agriculture. Just as there is a heritable component to weaning weight, the transcription level of an individual gene has a heritable component (Gibson and Weir, 2005
), which may be population- and environment-specific. Consequently, the mapping populations that are used to dissect the genetic architecture of production traits in livestock into additive and nonadditive QTL effects can also be used to dissect the cis- and trans-acting QTL that influence the expression of genes (collectively called expression QTL; eQTL) within specific tissues harvested from members of the mapping population (Doss et al., 2005
). Provided the appropriate tissues are profiled at the appropriate stages of development, eQTL mapping can be used both to identify candidate genes for production trait QTL and to elucidate the gene regulatory networks and the key regulators of these networks that lead to variation in phenotypic traits. Microarrays are currently the most comprehensive tools for high-throughput, WG assessment of gene expression and are just beginning to become widely available for transcription profiling in livestock (Tsai et al., 2006
; Misirlioglu et al., 2006), but to this point have not been used for eQTL detection.
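In practice, a mapped eQTL is usually labeled cis or trans according to its position relative to the gene whose expression it influences. The sketch below shows one simple rule of this kind. The 1-Mb window is an arbitrary assumption chosen for illustration and is not a threshold taken from the studies cited; the function name and coordinates are likewise hypothetical.

```python
def classify_eqtl(eqtl_chrom, eqtl_pos, gene_chrom, gene_pos,
                  cis_window_bp=1_000_000):
    """Label an eQTL as "cis" if it maps to the same chromosome as the
    regulated gene and within cis_window_bp of it, otherwise "trans"."""
    same_chromosome = eqtl_chrom == gene_chrom
    if same_chromosome and abs(eqtl_pos - gene_pos) <= cis_window_bp:
        return "cis"
    return "trans"


# Hypothetical coordinates
print(classify_eqtl("6", 38_000_000, "6", 38_400_000))   # near the gene
print(classify_eqtl("14", 1_800_000, "6", 38_400_000))   # different chromosome
```

Genes within a QTLCR that have a cis-acting eQTL coincident with the production trait QTL are the natural first candidates, whereas a locus that acts in trans on many genes points to a regulator of a network.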
Jansen and Nap (2001) were the first to propose the merging of genetic mapping and gene expression data and coined the term genetical genomics to describe the approach. Whereas QTL mapping is a powerful method for identifying the genomic regions which harbor loci that cause variation within a quantitative trait, the approach does not readily facilitate the identification of the causal gene or mutation. However, knowledge of the genes within a QTLCR for which expression is correlated with the trait phenotype can significantly reduce the candidates to be screened for potentially causal mutations (Walker et al., 2004). Wayne and McIntyre (2002) combined QTL mapping and microarray expression data to identify 34 candidate genes for ovariole number in Drosophila melanogaster. In the summer of 2006, Animal Genetics published the proceedings of the Iowa State University "Integration of Structural and Functional Genomics" symposium as a special issue, and 3 of the featured articles focused on the genetical genomics approach (Tuggle et al., 2006). Likewise, Mammalian Genome has recently dedicated an entire issue to eQTL research (Churchill, 2006).
Due to the availability of inbred lines and high-throughput genotyping and expression analysis platforms, mouse and rat have been among the first species in which eQTL mapping has been successfully employed (Walker et al., 2004; Doss et al., 2005; Schadt et al., 2005; Drake et al., 2006; Ghazalpour et al., 2006; Petretto et al., 2006). Ghazalpour et al. (2006) used liver expression profiles and a marker map scored in an F2 mouse intercross to construct modules of coregulated genes that were associated with 22 physiological traits. By routine mapping approaches, they were able to identify chromosomal locations harboring module QTL that perturbed gene expression within these modules. Next, by using the coexpression relationships among genes within a BW-related module, they were able to identify the network elements that determine the relationship between gene expression profiles and BW. This approach appears to offer great power for the elucidation of the key genes that modulate phenotypic variation in economically important traits that are peculiar to livestock species such as milk yield and meat tenderness. Industry populations such as large half-sib families with phenotypic data and DNA samples are relatively abundant within, e.g., the beef, dairy, and swine industries, and make livestock populations highly suitable for eQTL analysis (Haley and de Koning, 2006
). However, it would be extremely costly to procure multiple tissue samples for gene expression profiling from significant numbers of individuals in these populations. Consequently, eQTL studies have yet to be conducted in livestock, and the approach is unlikely to be as widely adopted as QTL mapping. However, we believe that the data produced by genetical genomics approaches are those that have been critically lacking to guide livestock transgenic research. Without knowledge of the identity of the key regulatory genes, it has not yet been possible to accurately predict target genes or engineer constructs that are likely to predictably alter the expression of production traits of livestock.
Other significant limitations to the application of genetical genomics are those that are inherent to all microarray experiments. Because mapping experiments require a relatively large number of individuals to provide sufficient power to detect QTL or eQTL (de Koning et al., 2005; Peirce et al., 2006), the cost, logistical, and technical difficulties of conducting hundreds of microarray hybridizations possibly involving several tissue types at several developmental stages are substantial. However, this might be alleviated if a selective expression profiling strategy, analogous to the selective genotyping approach of Darvasi and Soller (1994), is utilized and only those individuals with extreme phenotypes for a few traits are profiled. Of course, this approach would limit the ability of the experiment to detect pleiotropic, trans-acting master regulator genes, which influence the expression of genes in multiple networks that underlie different traits (Walsh and Henderson, 2004; Kendziorski and Wang, 2006). Furthermore, the choice of tissue and developmental stage at which to profile gene expression to elucidate the regulation of genes that lead to specific quantitative phenotypes is generally not well understood (Doerge, 2002). Finally, processes such as the cellular localization of mRNA, efficiency of translation, and posttranslational modifications all attenuate the ability to explain the relationships between gene expression and phenotype (Pomp et al., 2004).
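The selective profiling idea amounts to ranking animals on a phenotype and sending only the tails for expression analysis. The sketch below is a minimal illustration for a single trait; the function name, the animal identifiers, and the tail fraction are hypothetical, and the text does not prescribe any particular fraction.

```python
def select_extremes(phenotypes, fraction):
    """Choose animals from the low and high tails of a phenotype.

    phenotypes: dict mapping animal id -> phenotype value
    fraction: proportion of animals to take from each tail (0 < fraction <= 0.5)
    Returns (low_tail_ids, high_tail_ids), each ordered by ascending phenotype.
    """
    if not 0.0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(phenotypes, key=phenotypes.get)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Hypothetical tenderness scores for 8 animals; profile 25% from each tail
scores = {"a": 1, "b": 5, "c": 3, "d": 9, "e": 7, "f": 2, "g": 8, "h": 4}
low, high = select_extremes(scores, 0.25)
print(low, high)
```

Selecting on a few traits keeps the number of hybridizations manageable, but animals chosen as extreme for one trait are an essentially random sample for others, which is the source of the reduced ability to detect pleiotropic regulators.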
Understanding the Roles of Epigenetic Effects
"Epigenetic modifications at the DNA, nucleosomal and chromosomal level affect gene expression and ultimately phenotypes, by providing differential access to the underlying genetic information" (Richards, 2006
). Epigenetic modifications are potentially reversible and do not alter the underlying DNA sequence, but rather exert an influence on phenotypes by permitting differential access of the transcriptional machinery to the DNA by altering chromatin structure. Epigenetic mechanisms such as DNA and histone modifications influence numerous processes including parental imprinting, gene silencing, X-inactivation, and embryonic reprogramming. To make genetic progress in livestock, we must consider how epigenetic mechanisms influence economically important phenotypes. Breeding strategies that make use of imprinting information, diet adjustments that influence methylation patterns in offspring, and the elucidation of DNA and histone modification patterns to improve cloning rates could all impact livestock improvement and management.
The DNA can be methylated at the 5-position of cytosine residues in CpG dinucleotides to convert cytosine to 5-methylcytosine via the action of enzymes known as DNA methyltransferases (Dnmt). This epigenetic event occurs globally in the normal genome and affects 70 to 80% of all CpG dinucleotides in human cells. These dinucleotides are not uniformly distributed in the genome, but occur in clusters such as large repetitive sequences or in short CG-rich DNA stretches known as CpG islands (CGI). The majority of affected CpG dinucleotides are found in intragenic regions that harbor repetitive sequences such as satellite sequences and centromeric repeats while CGI, which are preferentially found in the promoter regions of genes, appear to be protected from this modification in somatic cells. Exceptions to this rule include CGI located on the inactive X chromosome in females and those associated with imprinted genes (where only the paternally or maternally inherited allele is expressed) that are methylated in the normal state. It is common for methylation status to vary among regions of the promoter CGI, and gene silencing has been attributed to both the density of CpG methylation (proportion of methylated CpG dinucleotides within a CGI) and with site-specific CpG methylation (methylation of specific CpG dinucleotides). Histones create a compact conformation with the methylated region that renders the DNA inaccessible to the transcriptional machinery. However, when these regions are unmethylated, the chromatin structure is relaxed and the gene can be expressed (Lewin, 1997
).
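The density of CpG methylation defined above (the proportion of methylated CpG dinucleotides within a CGI) is straightforward to compute once the methylated positions are known. The sketch below assumes a hypothetical representation in which methylated cytosines are given as 0-based positions in the sequence; the function names and the example sequence are invented for illustration.

```python
def cpg_positions(sequence):
    """Return the 0-based positions of the cytosine in every CpG dinucleotide."""
    sequence = sequence.upper()
    return [i for i in range(len(sequence) - 1) if sequence[i:i + 2] == "CG"]


def methylation_density(sequence, methylated_positions):
    """Proportion of CpG dinucleotides in the sequence whose cytosine
    is methylated. Returns 0.0 if the sequence contains no CpG."""
    sites = cpg_positions(sequence)
    if not sites:
        return 0.0
    methylated = set(methylated_positions)
    return sum(1 for site in sites if site in methylated) / len(sites)


island = "ACGTCGCGAA"
print(cpg_positions(island))
print(methylation_density(island, {1, 6}))
```

Site-specific methylation, the second mechanism mentioned above, would instead be assessed by asking whether particular positions returned by cpg_positions are present in the methylated set.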
Sequencing of DNA treated with bisulfite, which converts unmethylated cytosines to uracils, has been the gold standard for the determination of DNA methylation; however, the process is low-throughput, laborious, and expensive. Recent advances in second generation sequencing and microarray platforms have produced new high-throughput technologies for the WG analysis of cytosine methylation patterns. Promoter, CGI, or tiling arrays can detect target methylated sequences prepared for hybridization to the array by a number of procedures. Differential methylation hybridization uses methylation-sensitive endonucleases to enrich for the methylated fraction of DNA but is limited by the fact that restriction sites are not present in all genomic regions of interest (Huang et al., 1999). Methylated DNA immunoprecipitation captures the methylated fraction of DNA fragments produced by sonication using a monoclonal antibody to 5-methylcytosine (Weber et al., 2005) to produce the target population. Alternatively, the second generation sequencing strategy relies upon the massively parallel sequencing of bisulfite-treated DNA to identify known template cytosines, which are converted to uracils when unmethylated. Because bisulfite treatment causes considerable DNA damage, the short read-lengths of the 454 Life Sciences FLX system appear ideal for this purpose, and the utility of the approach has recently been demonstrated (Taylor et al., 2007a).
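The logic of bisulfite sequencing can be illustrated with an in silico conversion followed by a comparison of the read against the known template. The sketch below is a simplified model under several assumptions: conversion is complete, sequencing is error-free, the read is already aligned to the reference on the same strand, and converted cytosines (uracils) are read as T after amplification. The function names and the example sequence are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment and amplification: every unmethylated
    C is read as T, whereas methylated C (0-based positions) remains C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence.upper())
    )


def call_methylation(reference, read):
    """Compare an aligned bisulfite read with the reference template.

    Returns a dict mapping each reference cytosine position to True
    (read shows C: methylated) or False (read shows T: unmethylated).
    Positions with any other base in the read are left uncalled.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = True
            elif read_base == "T":
                calls[i] = False
    return calls


template = "ACGTCG"
read = bisulfite_convert(template, {1})
print(read)
print(call_methylation(template, read))
```

The need for a known template is apparent from the comparison step: a T in the read is informative only where the reference carries a C.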
A recent low resolution whole-genome differential methylation hybridization survey of aberrant methylation in the bone marrow samples of patients with acute lymphoblastic leukemia (ALL) has revealed that aberrant methylation is not random in ALL and that methylation hot-spots exist on human chromosomes 11, 18, and 19 (Taylor et al., 2007b). Several mechanisms may underlie the nonrandom genomic distribution of aberrant methylation in ALL. A genetic event responsible for the activation of an oncogene could trigger the initiation of tumor development that leads to the dysregulation of the DNA methylation machinery. Alternatively, some sequences may be unmethylatable and be protected by the presence of DNA binding factors that prevent the methylation machinery from gaining access, or conversely, methylatable sequences may contain specific DNA binding sequence motifs that aid in the recruitment of DNA methyltransferases. Finally, certain regions may be more prone to aberrant methylation due to their cellular localization within the nucleus at key developmental stages. Whatever the cause(s), the distribution and variability of whole genome methylation and its effects on gene transcription in livestock species are completely unknown and are likely to be fertile ground for future investigation.
Genomic imprinting involves differential allelic DNA methylation patterns in sex cell lineages (Kierszenbaum, 2002) and results in a gene being expressed only when inherited from the sire (maternal imprinting), or alternatively, from the dam (paternal imprinting). Although the oocyte and sperm contribute allelic DNA sequences to a developing zygote, epigenetic imprinting prevents one of the parental alleles from being expressed. A germ cell's genomic methylation pattern depends on Dnmt activity (Kierszenbaum, 2002). In the developing zygote, the paternal genome is actively demethylated soon after fertilization; in mice, this occurs within hours. The maternal genome is passively demethylated through a replication-dependent mechanism that is completed by the 8- to 16-cell stage. By the time the bovine or mouse embryo reaches the morula stage, the genome has been remethylated and, in a sense, reprogrammed (Reik et al., 2003). Dean et al. (2001) showed that this methylation reprogramming in the developing embryo is conserved in eutherian mammals, specifically the mouse, rat, cow, and pig.
Imprinting of the insulin-like growth factor 2 gene (IGF2) has been reported in sheep (McLaren and Montgomery, 1999), swine (Nezer et al., 1999), and cattle (Dindot et al., 2004). Whereas IGF2 is maternally imprinted, the insulin-like growth factor 2 receptor gene (IGF2R) is paternally imprinted. Wutz et al. (1997) discovered that the methylation pattern of IGF2R differs according to the parental origin of the allele. In the maternally inherited allele, the promoter region is unmethylated, allowing transcription, whereas CpG islands downstream in an antisense promoter are methylated. In the paternally inherited allele, transcription is inactivated due to promoter methylation, whereas the downstream antisense CpG islands are unmethylated. Transcription of the paternal antisense RNA further serves to repress IGF2R transcription (Wutz et al., 1997). The callipyge phenotype in sheep, which is characterized by large, heavily muscled buttocks with very little fat, is also due to imprinting. However, the imprinted genomic region that harbors the callipyge locus (CLPG) also harbors a large number of maternally expressed microRNA (miRNA) genes (Davis et al., 2005). Callipyge is an inherited muscular hypertrophy that affects only heterozygous individuals that inherit the CLPG allele from their sire (Georges et al., 2003). This novel mode of inheritance, called polar overdominance, has also been shown to operate within the orthologous region of the swine genome for a DLK1 polymorphism that is associated with variation in growth and fatness (Kim et al., 2004).
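The parent-of-origin rules described in this paragraph can be written down compactly. The sketch below is illustrative only: the function names and the allele labels `"CLPG"` and `"+"` (for the wild-type allele) are notation assumed here, and the rules encode nothing beyond what the text states.

```python
def expressed_parental_allele(imprint):
    """Return which parental copy of an imprinted gene is expressed.

    Follows the article's usage: a maternally imprinted gene is expressed only
    from the sire-derived copy, and a paternally imprinted gene only from the
    dam-derived copy.
    """
    if imprint == "maternal":
        return "paternal"
    if imprint == "paternal":
        return "maternal"
    raise ValueError("imprint must be 'maternal' or 'paternal'")


def is_callipyge(sire_allele, dam_allele):
    """Polar overdominance: only heterozygotes whose CLPG allele came from the sire are affected."""
    return sire_allele == "CLPG" and dam_allele == "+"
```

Under these rules the two reciprocal heterozygotes have different phenotypes, and the CLPG homozygote is unaffected, which is what distinguishes polar overdominance from simple imprinting.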
The natural or aberrant methylation of CpG dinucleotides by Dnmt may be encoded in the underlying sequence. However, the extent of DNA methylation is also influenced by the environment. Wolff et al. (1998) showed that feeding a methionine-supplemented diet to agouti viable yellow (Avy) mice influenced their progeny's coat color. Methylation of the intracisternal A particle located upstream of the Agouti gene suppresses ectopic expression of Agouti, resulting in the normal Agouti color pattern. When the diet is low in methionine, the intracisternal A particle is unmethylated and abnormal coloring results (Reik et al., 2003). Wolff et al. (1998) demonstrated that providing a high methionine diet to female mice before and during pregnancy skewed the coat color of their Avy/a offspring toward the pseudo-agouti phenotype. Waterland (2006) identified the preimplantation embryonic development period as the interval when the Avy locus is sensitive to environmentally induced methylation. Thus, dietary methionine supplementation during pregnancy has the potential to induce physiologically relevant changes in expression of epigenetically targeted genes via CpG methylation.
Histone modification influences transcription by altering chromatin conformation to determine the accessibility of genes to the transcription machinery. The 5 major classes of histones (H1, H2A, H2B, H3, and H4) all have a large proportion of positively charged residues, which interact with the negatively charged phosphate backbone of DNA (Voet et al., 1999) to package DNA into chromatin. Histones can be posttranslationally modified through methylation, acetylation, phosphorylation, or ubiquitination to alter their secondary structure (Berger, 2002). The modification of specific residues decreases their positive charge, which alters the interaction of the histone with DNA. Acetylated histones are generally associated with an active or open chromatin conformation and expressed genes, whereas methylated histones are associated with condensed or closed chromatin and transcriptional repression (Reik et al., 2003). Methylated histones are also usually deacetylated, resulting in a tight binding of the DNA and ionic and steric hindrance to transcription factors (Bestor, 1998).
Histone and DNA epigenetic patterns can affect animal cloning experiments. Suteevun et al. (2006) found that without properly demethylated and acetylated residues, the efficiency of cloning in swamp buffalo was decreased. Santos et al. (2003) compared the methylation patterns of bovine embryos that were produced by nuclear transfer (NT) from fetal fibroblast or granulosa cells to those of normal embryos. A significant number of the granulosa-derived NT embryos were found to be methylated at the lysine 9 residue of H3, whereas DNA methylation patterns were similar to those of the normal embryos. Significantly more granulosa-derived NT embryos survived to blastocyst than did the fetal fibroblast-derived NT embryos. This suggests that the epigenetic patterning of an NT or cloned embryo is necessary for, and indicative of, its viability.
Structures consisting of DNA wrapped around an octamer of histones are called nucleosomes and are fundamental units of chromatin structure. Recently, Segal et al. (2006) have shown that the positioning and stability of nucleosomes could be critical for gene regulation and other chromosome functions. Genes that are located in the regions of DNA between nucleosomes are expressed. When nucleosome positioning is not stable, the specific 146 bp of DNA organized on the 8-histone nucleosome complex can be shifted along the chromosome to allow the transcription of a previously inaccessible gene. The nucleosome code, which determines the genomic location for the formation of nucleosomes, may play an important role in determining which genes are expressed in which tissues and at which times.
As researchers begin to understand the complexity of epigenetic controls and their effects on phenotype, these data need to be presented to producers in a meaningful way. The parental imprinting of genes is generationally stable, and mutations within imprinted genes that influence phenotype could quite readily be incorporated into estimated breeding values, according to the gender of the individual. This approach would allow the optimization of the breeding of progeny for production purposes. However, it is less clear that this approach would be optimum for the selection of parents to breed sires or replacement females because the genotypes of the parents must now be optimized to maximize the production capability of their grandprogeny.
RNA Interference
Messenger RNA, which until recently has been the best-characterized RNA, serves as a transporter of information from DNA to a functional protein. However, there is also an extremely diverse repertoire of small RNA molecules that play an important role in the control of gene expression and cellular function. The RNA silencing pathway is initiated by 2 types of small RNA: small interfering RNA (siRNA) and miRNA. These siRNA and miRNA have been studied using many different methods in many different model systems, including livestock. The first characterized miRNA was a nonprotein coding transcript of the lin-4 gene identified in C. elegans (Lee et al., 1993).
RNA interference (RNAi) is the process by which double-stranded RNA (dsRNA) is cut by Dicer into short dsRNA molecules that recruit the RNA-induced silencing complex to cut, and thereby destroy, complementary RNA (Fire et al., 1998). Short RNA that are complementary to endogenous RNA can induce this pathway by binding to their complementary sequence and initiating the RNA-induced silencing complex process. In mammalian cells, the RNAi pathway can be induced by the addition of small, 21-nucleotide siRNA, which silence gene expression by knocking down the endogenous mRNA before it can be translated into a protein (Elbashir et al., 2001; Tuschl, 2001). RNAi studies in livestock species are in early stages, but because RNAi is a much simpler method for conducting gene knockout analyses than physically knocking the gene out of the genome, we predict that it will be widely used for the experimental control of gene expression. Because siRNA is effective in mammalian cells, RNAi can be applied for functional analysis in both mammalian cell culture and animal models. For animal studies, a siRNA library must be constructed and characterized for large-scale screening. The RNAi Consortium has constructed a mammalian RNAi library targeting 15,000 mouse and 15,000 human genes to facilitate the identification of human disease genes. This library can also be used for knock-down studies in livestock species because the sequences of most mRNA within the small regions targeted by the siRNA will often be identical among human, mouse, and livestock species. For livestock species with draft WGS, the adequacy of this assumption can be directly established from sequence data.
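The final point, that the conservation of siRNA target regions can be checked directly from sequence data, can be sketched as a simple exact-match search. This is an illustrative assumption about how such a check might be done, not a description of The RNAi Consortium's design process; the function name is hypothetical, and only perfect identity over the full window (21 nucleotides by default, as in the text) is tested.

```python
def conserved_target_sites(mrna_a, mrna_b, length=21):
    """List (position, sequence) windows of `mrna_a` that occur identically in `mrna_b`.

    A window that is present in both transcripts is a candidate siRNA target
    shared by the two species. Only exact matches are reported.
    """
    a = mrna_a.upper()
    b = mrna_b.upper()
    # Index every window of the second transcript for constant-time lookup
    windows_b = {b[i:i + length] for i in range(len(b) - length + 1)}
    return [
        (i, a[i:i + length])
        for i in range(len(a) - length + 1)
        if a[i:i + length] in windows_b
    ]
```

An siRNA from a human or mouse library would be expected to be usable in a livestock species only if its target window appears in this list for the orthologous livestock transcript.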
In bovine cells, the direct injection of dsRNA results in the transient ablation of gene expression. Paradis et al. (2005) found that knocking down cyclin B1 mRNA expression in the bovine oocyte by an injection of cyclin B1-targeted dsRNA led to reduced protein and activation in 10% of the oocytes. Knockdown of the DNMT1 transcript, a methyltransferase involved in the maintenance of DNA methylation that may be responsible for the aberrant methylation of in vitro produced embryos, by siRNA has also been accomplished in primary murine and bovine fibroblast cells (Adams et al., 2005). Finally, E-Cadherin gene expression was knocked down by dsRNA, as was protein expression, in bovine preimplantation embryos, resulting in a reduced rate of progression to blastocysts (Nganvongpanit et al., 2006). Despite these accomplishments, the direct injection of dsRNA is not feasible for accomplishing a persistent or a whole-animal ablation of gene expression. Historically, human and mouse U6 promoters have been used to drive the expression of short hairpin RNA (shRNA) that form a double-stranded RNA molecule when transcribed in custom-designed plasmid vectors. However, for bovine applications an RNA polymerase III promoter that is the putative homolog of the human U6 promoter has been identified and shown to efficiently knock down an exogenous reporter gene and an endogenous bovine gene (Lambeth et al., 2005). Novel bovine RNA polymerase III promoters continue to be developed and evaluated for use in bovine-specific RNAi research (Lambeth et al., 2006).
The efficiency of RNAi in swine has been evaluated in granulosa cells transfected with plasmids containing constructs for green and red fluorescent proteins (GFP and RFP). When these cells were injected with siRNA specific to GFP, GFP expression was decreased by nearly 70%, and similar results were observed for RFP (Hirano et al., 2004). He et al. (2007) were able to protect cells against replication of the porcine reproductive and respiratory syndrome virus by first transfecting them with a vector containing the mouse U6 promoter driving a targeted shRNA and then exposing them to the virus. Replication of the virus was knocked down, and the cells remained healthy. Miyagawa et al. (2005) targeted the p30 capsid protein common to porcine endogenous retroviruses (PERV) using shRNA vectors and successfully knocked down the expression of the mRNA and protein in PERV-infected human cell lines. This work is targeted at the development of a transgenic pig expressing the siRNA for PERV in the hope that the transmission of PERV infections might be eliminated in the xenotransplantation of pig organs into humans.
A point mutation within the 3' untranslated region of myostatin creates a target site for at least 2 endogenous miRNA that are highly expressed in the skeletal muscle of Texel sheep (Clop et al., 2006). As a consequence, the expression of myostatin mRNA and protein is knocked down, and there is uncontrolled muscular hypertrophy. Although controlled by a distinct mutation, it has been hypothesized but not yet demonstrated that RNAi may also be involved in producing the double-muscled phenotype in callipyge sheep (Bidwell et al., 2004). Finally, Pfeifer et al. (2006) generated lentiviral vectors expressing cellular prion protein-specific shRNA that efficiently suppressed the expression of the abnormally folded prion in scrapie-infected mouse primary neuronal cells. The success of this delivery system has profound implications not only for consumer safety and animal health, but also for human health by offering promise for treating Creutzfeldt-Jakob disease, which is related to scrapie.
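How a single point mutation in a 3' untranslated region can create a miRNA target site, as described for myostatin at the start of this paragraph, can be illustrated with a seed-match check. The sketch assumes the commonly used simplification that targeting depends on perfect complementarity to miRNA nucleotides 2 to 8 (the seed). The function names are hypothetical, and any sequences passed to them in examples are toy sequences, not the actual myostatin or Texel miRNA sequences.

```python
# RNA complement table; input DNA-style "T" is converted to "U" before matching
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the mRNA motif (5' to 3') complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    # Reverse complement, because the miRNA pairs antiparallel to the mRNA
    return seed.translate(COMPLEMENT)[::-1]


def has_seed_match(utr, mirna):
    """True if the 3' UTR contains a perfect match to the miRNA seed site."""
    return seed_site(mirna) in utr.upper().replace("T", "U")
```

With a toy miRNA `UACGGAUCCAGUUAGCAUGCA`, the seed site is `GAUCCGU`; a toy untranslated region `AAUGAUCCGCUUA` lacks it, whereas the single-base variant `AAUGAUCCGUUUA` contains it, so one substitution is enough to turn a transcript into a target.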
RNAi is at the forefront of genomics research. The technique promises to generate useful data in the fields of genetic and viral disease, immunology, tumor behavior, and drug target identification. Exploring the opportunities and limitations of RNAi in model systems, including livestock species, will become increasingly important not only for basic research but also for identifying potential therapeutics for human and livestock applications. The greatest impediment to RNAi is that the knock-down effect is only transient. Attempts are now underway to persistently express siRNA in mammalian systems by using vector delivery systems and constitutively active promoters to achieve a lasting knock-down of target mRNA.
| PROSPECTIVE |
|---|
|
|
|---|
| Footnotes |
|---|
2 List of abbreviations used: ALL = acute lymphoblastic leukemia; Avy = agouti viable yellow; CGI = CpG islands; dsRNA = double-stranded RNA; eQTL = expression QTL; GFP = green fluorescent protein; IGF2 = insulin-like growth factor 2 gene; IGF2R = insulin-like growth factor 2 receptor gene; LD = linkage disequilibrium; miRNA = microRNA; NT = nuclear transfer; PERV = porcine endogenous retroviruses; QTLCR = QTL critical region; QTN = quantitative trait nucleotides; RFP = red fluorescent protein; RNAi = RNA interference; shRNA = short hairpin RNA; siRNA = small interfering RNA; smRNA = small RNA; WG = whole genome; WGS = whole genome sequence; and WGSL = whole genome selection.
3 Corresponding author: taylorjerr{at}missouri.edu
Received for publication May 21, 2007. Accepted for publication August 13, 2007.
| LITERATURE CITED |
|---|
|
|
|---|