BEMP: Methodological Research for Genetics and Genomics

Genomics is transforming clinical and translational science research. New experimental technologies have generated high-dimensional, complex data sets, and analysis of and inference from these data are intrinsically statistical. The field of statistical genetics provides novel, efficient, and powerful statistical and computational methods for the analysis of genetics and genomics data. The creation of the Center for Genetics of Complex Traits within the CCEB, together with the recruitment of outstanding faculty specializing in statistical genetics, provides a resource crucial to the success of the CTSA. This group has been very active in methodological research directly related to collaborative applications, as discussed above. The need for support for this level and type of research is expected to increase as new laboratory methods become more widely available. Two areas of active research are briefly described below.

Statistical Methods for eQTL Analysis: It has become clear that gene expression levels vary among individuals and can be analyzed like other quantitative phenotypes such as height and serum glucose level. The genetics of gene expression, also known as genetical genomics or eQTL studies, is the study of the genetic basis of variation in gene expression. eQTL studies take advantage of this natural variation to investigate how gene expression is regulated, and they have already uncovered interesting and unexpected aspects of gene regulation. The key idea of eQTL studies is that the abundance of a gene transcript is directly modified by polymorphisms in regulatory elements. Consequently, transcript abundance (i.e., gene expression level) can be treated as a quantitative trait that can be mapped with genetic linkage or association studies. However, methods for effectively analyzing such data are still very limited, and most focus on single-SNP, single-transcript analysis. In addition, there is a need for methods that can link both genetic variants and gene expression data to disease phenotypes, and for methods that can help infer causal relationships among DNA, mRNA, and phenotypes.
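To make the single-SNP, single-transcript analysis concrete, the following is a minimal sketch of an association test that treats transcript abundance as a quantitative trait. It assumes an additive genotype coding (0, 1, or 2 copies of the minor allele) and ordinary least squares with no covariates; the function name `single_snp_eqtl`, the `EqtlFit` container, and the example values are hypothetical and for illustration only. A p-value would be obtained by comparing the t-statistic to a t distribution with n - 2 degrees of freedom, which is omitted here to keep the sketch dependency-free.

```python
from math import sqrt
from typing import NamedTuple, Sequence


class EqtlFit(NamedTuple):
    """Result of regressing one transcript on one SNP (hypothetical container)."""
    slope: float       # estimated change in expression per minor-allele copy
    intercept: float   # estimated mean expression at genotype 0
    t_stat: float      # slope divided by its standard error
    r_squared: float   # fraction of expression variance explained by the SNP


def single_snp_eqtl(genotypes: Sequence[int], expression: Sequence[float]) -> EqtlFit:
    """Simple linear regression of expression on additive genotype dosage.

    Assumption: genotypes are coded 0/1/2 and samples are unrelated;
    no covariates or multiple-testing correction are handled here.
    """
    n = len(genotypes)
    if n != len(expression):
        raise ValueError("genotypes and expression must have the same length")
    if n < 3:
        raise ValueError("at least three samples are needed")

    mean_g = sum(genotypes) / n
    mean_y = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, expression))

    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    if syy == 0:
        raise ValueError("expression is constant in this sample")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Residual sum of squares; clamp tiny negative values from rounding.
    rss = max(syy - slope * sxy, 0.0)
    if rss == 0:
        raise ValueError("perfect fit; the t-statistic is undefined")
    std_err = sqrt(rss / (n - 2) / sxx)
    return EqtlFit(slope, intercept, slope / std_err, 1.0 - rss / syy)


# Illustrative (made-up) data: six individuals, one SNP, one transcript.
example_genotypes = [0, 0, 1, 1, 2, 2]
example_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
fit = single_snp_eqtl(example_genotypes, example_expression)
print(f"slope={fit.slope:.3f}, t={fit.t_stat:.2f}, R^2={fit.r_squared:.3f}")
```

In a genome-wide setting this test would be repeated over every SNP-transcript pair, which is one reason the one-at-a-time approach is limited: it ignores correlation among SNPs and among transcripts and incurs a heavy multiple-testing burden.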

Statistical and Computational Methods for Next-Generation Sequencing Data: A new generation of deep sequencing technologies, including Applied Biosystems' SOLiD, Helicos BioSciences' HeliScope, Illumina's Solexa, and Roche's 454 Life Sciences sequencing systems, has delivered on promises of sequencing DNA at unprecedented speed, thereby enabling impressive scientific achievements and novel biological applications. These high-throughput sequencing technologies have already been applied to study genome-wide transcription levels, transcription factor binding sites, chromatin structure, and DNA methylation status, as well as in metagenomics. While sequencing-based technologies provide high-resolution measurements of various biological quantities, they also raise novel statistical and computational challenges in areas such as image analysis, base-calling, and read-mapping in initial analysis; peak finding and differential comparisons in comparative experiments; and mixture modeling in metagenomics. For example, there is a need for eQTL analysis methods for settings in which mRNA-seq is used to measure transcript levels, in order to study the genetics of alternative splicing and isoform-specific variants. There is also a great need for methods for integrative analysis of multiple types of sequencing data in order to understand the mechanisms of gene regulation at both the genomic and epigenomic levels.