Statistical and Computational Methods for the Analysis of Massive Genomic Datasets

Areas of Investigation
Our group develops statistical and computational methods for the analysis of massive genomic datasets. We are interested in genome evolution, in particular in identifying genome sequences that differ significantly between or within species and in relating those differences to biomedical traits of interest. Many of these sequences are non-coding, such as regulatory signals, structural sites, and RNA genes. One of our aims is to identify specific DNA alterations that are responsible for variation in gene expression. Current projects focus on two major areas: (1) fast-evolving regions of the human genome and (2) adaptive evolution in microbial communities.

Approaches
Our human research draws on the growing number of vertebrate whole-genome sequences as well as human population genomic data.

Our microbial work uses whole genomes and metagenomic data (DNA sequenced directly from an ecosystem, representing a pool of genome fragments from multiple species). In both cases, we develop and apply probabilistic models of molecular evolution to detect sequences that evolve uniquely in one lineage (clade, species, or subpopulation). We then use statistical modeling, bioinformatics, and experimental validation to associate these changes in the mode or tempo of evolution with changes in biological function.
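
As a rough illustration of what testing for a lineage-specific change in the tempo of evolution can look like, the sketch below runs a simple likelihood ratio test on substitution counts. It is a toy example under stated assumptions, not the group's actual models: substitutions on the lineage of interest are assumed to follow a Poisson distribution, the null per-site rate is assumed to be known from the rest of the tree, and the names `acceleration_lrt`, `n_sites`, and `null_rate` are hypothetical.

```python
import math


def poisson_loglik(k, lam):
    """Log-likelihood of observing k events under a Poisson(lam) model."""
    if lam == 0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def acceleration_lrt(k, n_sites, null_rate):
    """One-sided likelihood ratio test for lineage-specific acceleration.

    k         : substitutions observed in the element on the lineage of interest
    n_sites   : length of the element in sites
    null_rate : expected substitutions per site on that lineage if the element
                evolved as it does elsewhere in the tree (assumed known here)

    Returns (test statistic, approximate p-value).
    """
    expected = n_sites * null_rate
    # Under the alternative the rate is free but constrained to be at least
    # the null rate, so the maximum-likelihood expected count is max(k, expected).
    mle = max(k, expected)
    stat = 2.0 * (poisson_loglik(k, mle) - poisson_loglik(k, expected))
    if stat <= 0:
        return 0.0, 1.0
    # The rate sits on a boundary under the null, so the reference distribution
    # is a 50:50 mixture of a point mass at zero and a chi-square with 1 df.
    # The chi-square (1 df) survival function is erfc(sqrt(x / 2)).
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical element: 100 sites, 2 substitutions expected, 12 observed.
stat, p = acceleration_lrt(k=12, n_sites=100, null_rate=0.02)
print(f"LRT statistic = {stat:.2f}, p = {p:.2e}")
```

Realistic analyses would replace the Poisson counts with a phylogenetic substitution model fitted to a multiple alignment and would correct for the many elements tested across a genome; the sketch only shows the shape of the comparison between a null rate and a lineage-specific rate.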

Significance
Understanding the genetic basis of the human-specific aspects of our biology and health is of fundamental interest. Cardiovascular disease is the leading cause of death in humans, but not in other primates. Furthermore, many human diseases are unique to our species (e.g., progression of HIV to AIDS, plaque formation in Alzheimer's) or vary significantly in prevalence among human populations. Metagenomics of the human microbiome promises to shed light on the significant role that adaptation to microbial flora has played and continues to play in shaping our genome. Approaches developed in the lab can also be used to understand the evolving health of our planet.