[BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome
deepti anand
anand.deepti at outlook.com
Mon Sep 8 20:13:01 CEST 2014
Hi Robert,
Thank you for the example code. I am able to extract all 528 Mmusculus motifs from MotifDb by running the code you sent. However, the code below gives me an error when I try to get the A,C,G,T counts using getBackgroundFrequencies():
> prior = getBackgroundFrequencies("mm9")
Error in pickGenome(organism) : Please pick one of the valid organisms: "dm3" or provide a BSgenome object of the target genome.
I have the updated version of PWMEnrich (3.6.1) installed. Could you please suggest how to proceed with this error? I appreciate your help.
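The error message lists "dm3" as the only valid organism name and otherwise asks for a BSgenome object. So a likely workaround, not verified here, is to pass the genome object itself rather than a name string: run `library(BSgenome.Mmusculus.UCSC.mm10)` and then `getBackgroundFrequencies(BSgenome.Mmusculus.UCSC.mm10)`. To show what such a prior is (one background frequency each for A, C, G and T), here is a small base-R sketch. The helper `acgt_frequencies` and the toy sequences are hypothetical, and PWMEnrich's own computation may differ in details such as the pseudocount.

```r
# Hypothetical helper, NOT part of PWMEnrich: estimate A/C/G/T background
# frequencies from a character vector of DNA sequences.
acgt_frequencies <- function(seqs, pseudo.count = 1) {
  # split every sequence into single upper-case characters
  bases <- unlist(strsplit(toupper(seqs), ""))
  # count only the four unambiguous nucleotides (N and others become NA
  # and are dropped by table())
  counts <- table(factor(bases, levels = c("A", "C", "G", "T")))
  counts <- as.numeric(counts) + pseudo.count
  names(counts) <- c("A", "C", "G", "T")
  counts / sum(counts)
}

# toy input; in practice the sequences would come from the target genome
toy <- c("ACGTACGTNN", "GGGCCCAATT")
prior <- acgt_frequencies(toy)
print(round(prior, 3))
```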
-Dips
Date: Mon, 8 Sep 2014 18:22:28 +0100
From: rainmansr at gmail.com
To: anand.deepti at outlook.com
CC: bioconductor at r-project.org
Subject: Re: [BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome
Hi Dips,
If you haven't already done so, please first update to the latest
version of PWMEnrich (in release this is 3.6.1). I would recommend
converting the MotifDb motifs directly into the position frequency
matrices (PFMs) that PWMEnrich expects. The only issue here is that
MotifDb motifs come from different sources and are not always in the
same format (e.g. sometimes they are probabilities, sometimes count
matrices). Here is some example code to extract the motifs from MotifDb:
# load the packages
library(MotifDb)
library(PWMEnrich)
# extract mouse motifs
d = values(MotifDb)
dm.sel = which(d$organism == "Mmusculus")
# output list of motifs
motifs = list()
for(i in dm.sel){
  # scale each matrix to integer counts; assume 100 sequences if unknown
  seq.count = d$sequenceCount[i]
  if(is.na(seq.count))
    seq.count = 100
  motifs[[length(motifs)+1]] = apply(round(MotifDb[[i]] * seq.count), 1:2, as.integer)
}
motif.names = d$geneSymbol[dm.sel]
motif.ids = d$providerName[dm.sel]
# fall back to the provider name when the gene symbol is missing
motif.names[is.na(motif.names)] = motif.ids[is.na(motif.names)]
names(motifs) = motif.ids
# get A,C,G,T counts
prior = getBackgroundFrequencies("mm9")
# convert to PWMEnrich PWM format
pwms = PFMtoPWM(motifs, id=motif.ids, name=motif.names,
  prior.params=prior)
# create background distributions
bg = makeBackground(pwms, "mm9")
The last line uses the mm9 promoters that are built into
PWMEnrich as the genomic background. If you want to use a different
set of promoter sequences (e.g. mm10), you will have to extract
them yourself into a DNAStringSet object and pass them like this:
bg = makeBackground(pwms, bg.seq=your_DNAStringSet_object)
Cheers, Robert
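The loop above multiplies every MotifDb matrix by its sequence count. Since some MotifDb matrices are probabilities and some are counts, one possible variant is to check the column sums first. The sketch below is an illustration under assumptions, not MotifDb or PWMEnrich code: the helper `as_count_matrix` is hypothetical, the 1e-3 tolerance is arbitrary, and the default of 100 sequences is borrowed from the loop above.

```r
# Hypothetical helper (not from MotifDb or PWMEnrich): return an integer count
# matrix whatever the input representation is. A matrix whose columns each sum
# to ~1 is treated as probabilities and scaled by seq.count; anything else is
# assumed to already hold counts and is only rounded.
as_count_matrix <- function(m, seq.count = NA) {
  if (is.na(seq.count)) seq.count <- 100
  is.prob <- all(abs(colSums(m) - 1) < 1e-3)
  scaled <- if (is.prob) m * seq.count else m
  out <- round(scaled)
  storage.mode(out) <- "integer"  # keeps dim and dimnames
  out
}

# toy 4 x 3 probability matrix (rows A, C, G, T; columns = motif positions)
p <- matrix(c(0.7, 0.1, 0.1, 0.1,
              0.25, 0.25, 0.25, 0.25,
              0.0, 0.0, 1.0, 0.0),
            nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
counts <- as_count_matrix(p, seq.count = 20)
print(counts)
```

A matrix that already holds counts (columns not summing to 1) passes through unscaled, which avoids multiplying counts by the sequence count a second time.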
On 07/09/14 23:17, deepti anand wrote:
Hi Robert,
Thank you for the suggestion. The backgrounds available in
PWMEnrich for mouse are for the mm9 assembly (the current one is mm10).
Also, I found that it has 329 PWMs, which is fewer than the current
MotifDb (528 motifs). That is why I want to create a background with
the current mouse genome and use all 528 motifs for enrichment
analysis of my gene list.
Could you please tell me how I can export the motifs in
'transfac' format and get the background frequencies from
'BSgenome.Mmusculus.UCSC.mm10'? I would appreciate it.
Dips
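For the TRANSFAC question, one way to sidestep MotifDb's `export()` is to write the records by hand. The sketch below is hypothetical and is not MotifDb's exporter: `write_transfac` writes a simplified TRANSFAC-style matrix entry (AC and ID lines, a P0 header, one numbered row per position, and the `//` terminator) and omits the optional consensus column, so check that the downstream tool accepts this subset of the format.

```r
# Hypothetical TRANSFAC-style writer; NOT MotifDb's export() method.
# motifs: named list of 4 x L count matrices with rows A, C, G, T
write_transfac <- function(motifs, file) {
  lines <- character(0)
  for (id in names(motifs)) {
    m <- motifs[[id]][c("A", "C", "G", "T"), , drop = FALSE]
    lines <- c(lines, paste0("AC  ", id), "XX", paste0("ID  ", id), "XX",
               "P0      A      C      G      T")
    for (j in seq_len(ncol(m))) {
      v <- as.integer(round(m[, j]))  # counts as whole numbers
      # position index is zero-padded to two digits ("01", "02", ...)
      lines <- c(lines, sprintf("%02d %6d %6d %6d %6d", j, v[1], v[2], v[3], v[4]))
    }
    lines <- c(lines, "XX", "//")     # end of this matrix record
  }
  writeLines(lines, file)
  invisible(lines)
}

# toy usage with a single two-position motif
toy <- list(M0001 = matrix(c(14L, 2L, 2L, 2L,
                             0L, 0L, 20L, 0L),
                           nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL)))
f <- tempfile(fileext = ".transfac")
out <- write_transfac(toy, f)
cat(out, sep = "\n")
```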
> Date: Sun, 7 Sep 2014 19:38:43 +0100
> From: rainmansr at gmail.com
> To: anand.deepti at outlook.com
> CC: bioconductor at r-project.org
> Subject: Re: [BioC] Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome
>
>
> Dear Deepti,
>
> If you want to use the mouse MotifDb motifs, you can retrieve them in the
> correct format for PWMEnrich here:
>
> http://bioconductor.org/packages/2.14/data/experiment/html/PWMEnrich.Mmusculus.background.html
>
> Cheers, Robert
>
> On 07/09/14 16:47, deepti anand wrote:
> > Hi all,
> > I am scanning a gene set for all the Mmusculus motifs and comparing
> > their enrichment to the genomic background. I am using the MotifDb
> > package to retrieve motifs and PWMEnrich for motif enrichment. I am
> > getting errors in the code below:
> >
> > 1) Get all Mmusculus motifs from MotifDb in TRANSFAC format:
> > In this step, I get an error when exporting the motifs in TRANSFAC format. Here is my code:
> >
> >
> >> motifs.denovo = query(MotifDb, 'Mmusculus')
> >> export(motifs.denovo, con='MotifDBFile', format='transfac')
> > Error in cat(list(...), file, sep, fill, labels, append) :
> >   argument 1 (type 'closure') cannot be handled by 'cat'
> >
> >
> >
> > 2) Convert count matrices into PWMs: in this step, the error occurs when
> > getting the background frequencies from the Mmusculus BSgenome. Here is my code:
> >
> >
> >> library(BSgenome.Mmusculus.UCSC.mm10)
> >> genome = BSgenome.Mmusculus.UCSC.mm10
> >> genomic.acgt = getBackgroundFrequencies("BSgenome.Mmusculus.UCSC.mm10")
> > Error in pickGenome(organism) :
> >   Please pick one of the valid organisms: "dm3" or provide a BSgenome object of the target genome.
> >
> >
> > I would appreciate any help.
> >
> >
> > Dips
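For step 2, it may help to see where the background frequencies enter a count-to-PWM conversion. The sketch below uses a generic textbook log-odds formula (a pseudocount distributed according to the prior, column normalisation, then log2 of probability over background). It is not necessarily the exact computation inside PWMEnrich's `PFMtoPWM`, and the helper name `counts_to_logodds` and the uniform toy prior are illustrative assumptions.

```r
# Generic log-odds PWM (textbook formula; NOT guaranteed to match PFMtoPWM).
# counts: 4 x L count matrix with rows named A, C, G, T
# prior:  named background frequencies for A, C, G, T (summing to 1)
counts_to_logodds <- function(counts, prior, pseudo.count = 1) {
  prior <- prior[rownames(counts)]           # align prior with the row order
  adj <- counts + pseudo.count * prior       # prior recycles down each column
  probs <- sweep(adj, 2, colSums(adj), "/")  # each column now sums to 1
  log2(probs / prior)                        # log-odds versus background
}

# toy example: uniform background, two motif positions
prior <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
counts <- matrix(c(14, 2, 2, 2,
                   5, 5, 5, 5),
                 nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
pwm <- counts_to_logodds(counts, prior)
print(round(pwm, 2))
```

A position whose counts match the background (the second column here) scores 0 for every base, which is the intended behaviour of a log-odds score.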