Title: PhyloGibbs-MP: Detecting cis-regulatory modules by Gibbs sampling


Cis-regulatory modules (CRMs) in higher eukaryotes are regions of
non-coding DNA that contain binding sites for proteins that regulate
nearby genes.  CRMs are usually located upstream of their target
genes but are also found in introns or downstream of genes, and are
typically about 1-2 kilobases in length.  Predicting CRMs amid tens
of kilobases (or more) of intergenic sequence is an important
computational problem.  Most approaches (e.g., Stubb, Cis-Analyst)
consist of clustering predicted binding sites, which requires prior
knowledge of transcription factors (TFs) that are likely to regulate
the gene in question.  After briefly reviewing these approaches, we present
PhyloGibbs-MP, an extension to our recent motif-finder PhyloGibbs,
that is capable of predicting CRMs ab initio, as well as predicting
binding sites within those CRMs.  Essentially, PhyloGibbs-MP localises
predictions to short regions of a pre-specified length but varying
position.  We examine its performance in predicting well-known
enhancers (primarily in the early development/segmentation genes) in
Drosophila melanogaster, and also discuss new predictions (primarily
in myoblast development).  We also briefly consider some other
advances in PhyloGibbs-MP.  If prior information in the form of weight
matrices for already-characterised TFs is available, PhyloGibbs-MP
makes use of that information.  Additionally, when groups of genes are
believed to be differently regulated, PhyloGibbs-MP can find motifs
that occur ``differentially'' in one group of genes over another.
PhyloGibbs-MP retains most of the features of PhyloGibbs, with several
algorithmic improvements including a considerable speed-up, and can
still be used as a regular motif-finder.
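
The site-clustering idea attributed above to earlier approaches can be
illustrated with a minimal sketch.  This is not the algorithm of Stubb,
Cis-Analyst, or PhyloGibbs-MP: the weight matrix, the threshold, and the
function names (log_odds, predict_sites, densest_window) are all
hypothetical and chosen only for illustration.  The sketch scores each
position of a sequence against a known weight matrix, keeps positions
above a threshold as predicted sites, and then reports the window of
pre-specified length that contains the most predicted sites.

```python
import math

# Hypothetical weight matrix for a 4-bp motif "AGCT" (made-up probabilities).
TOY_PWM = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(site, pwm):
    """Log-odds score of a candidate site against the weight matrix."""
    return sum(math.log(col[base] / BACKGROUND) for col, base in zip(pwm, site))


def predict_sites(seq, pwm, threshold):
    """Start positions whose log-odds score meets the threshold."""
    width = len(pwm)
    return [
        i
        for i in range(len(seq) - width + 1)
        if log_odds(seq[i:i + width], pwm) >= threshold
    ]


def densest_window(site_starts, seq_len, window_len, motif_width):
    """(start, count) of the first fixed-length window holding the most sites.

    A site counts only if it lies entirely inside the window.
    """
    best_start, best_count = 0, -1
    for start in range(max(1, seq_len - window_len + 1)):
        end = start + window_len
        count = sum(start <= s and s + motif_width <= end for s in site_starts)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Toy sequence with two copies of the motif embedded in poly-T background.
seq = "TTTTTTTTTT" + "AGCT" + "TT" + "AGCT" + "TTTTTTTTTT"
sites = predict_sites(seq, TOY_PWM, threshold=4.0)
window = densest_window(sites, len(seq), window_len=12, motif_width=len(TOY_PWM))
print(sites, window)
```

Such a scheme depends on having weight matrices in advance, which is the
limitation noted above for clustering-based methods; an ab initio
method must infer the motifs and the window position together.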