[BioC] ChIPpeakAnno
Bahar
baharulchoudhury at gmail.com
Tue Mar 11 15:24:06 CET 2014
Paul Shannon <pshannon at ...> writes:
>
> I offer below a short example of using rGADEM with MotifDb which you may
find useful.
>
> The example motivates and then demonstrates "seeded" search, in which a
candidate TF is provided to
> rGADEM. (The rGADEM vignette illustrates the simpler task of de novo
motif search.
>
> - Paul
>
> On Feb 27, 2013, at 9:03 AM, Zhu, Lihua (Julie) wrote:
>
> > Jose,
> >
> > Thanks for your kind word!
> >
> > You might want to check out rGADEM package for de nova motif discovery.
> >
> > Best regards,
> >
> > Julie
> >
>
> library(rGADEM)
> library(MotifDb)
> library(BSgenome.Hsapiens.UCSC.hg19)
> path <- system.file("extdata/Test_100.fasta", package="rGADEM")
> sequences <- readDNAStringSet(path, "fasta")
> length(sequences) # 49
>
> # you can use the UCSC genome browser to BLAT the first 4 sequences,
one at a time,
> # against human hg19, and see what TFs bind there
> # http://genome.ucsc.edu/cgi-bin/hgBlat?command=start
>
> as.character(substr(sequences[1:4], 1, 80))
>
> # Once you blat, click through to view the blat match in the genome
browser,
> # and enable the "Transcription Factor ChIP-seq from ENCODE" track.
> # The FOXA1 Tf has a high score in this region of chr1
> # for most of these sequences
>
> # obtain the reported motif for this TF from the Bioc MotifDb package
>
> query(MotifDb, 'foxa1')
>
> # MotifDb object of length 1
> # | Created from downloaded public sources: 2012-Nov-01
> # | 1 position frequency matrices from 1 source:
> # | JASPAR_CORE: 1
> # | 1 organism/s
> # | Hsapiens: 1
> # Hsapiens-JASPAR_CORE-FOXA1-MA0148.1
>
> # now use rGADEM to do a seeded search:
>
> pwm.foxa1 <- as.list(MotifDb["Hsapiens-JASPAR_CORE-FOXA1-MA0148.1"])
> gadem.foxa1 <-GADEM(sequences,verbose=1,genome=Hsapiens,Spwm=pwm.foxa1,
seed=TRUE)
> consensus(gadem.foxa1)
>
> # [1] "nTGTTTACwyw" "GmwGrrrsswGGvAGn"
>
> # how do the 49 sequences match to these two motifs?
> mean(sapply(gadem.foxa1 <at> motifList[[1]] <at> alignList, function(x) x
<at> pval))
> mean(sapply(gadem.foxa1 <at> motifList[[2]] <at> alignList, function(x) x
<at> pval))
>
> On Feb 27, 2013, at 9:03 AM, Zhu, Lihua (Julie) wrote:
>
> > Jose,
> >
> > Thanks for your kind word!
> >
> > You might want to check out rGADEM package for de nova motif discovery.
> >
> > Best regards,
> >
> > Julie
> >
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at ...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
Hi Paul,
Thank you very much for your example rGADEM and motIV analysis.
I am working with a set of sequences in fasta format and looking for
potential motif denovo. Once identified, I wanted to verify if those match
with available databases (JASPAR etc.) However I fail to get already
reported motifs using the same sequences. I copied the script below for your
input and suggestions please.
library(rGADEM)
pwd<-"" #INPUT FILES- BedFiles, FASTA, etc.
path<- system.file("extdata/mysequences.fasta",package="rGADEM")
FastaFile<-paste(pwd,path,sep="")
Sequences <- readDNAStringSet(FastaFile, "fasta")
gadem<-GADEM(Sequences,
seed=1,
verbose=1,
numWordGroup=3,
numTop3mer=20,
numTop4mer=40,
numTop5mer=60,
numGeneration=5,
pValue=0.0002,
eValue=0.0,
maskR=1,
maxSpaceWidth=10,
useChIPscore=0,
numEM=40,
fEM=0.5,
fullScan=1,
slideWinPWM=6,
stopCriterion=1,
numBackgSets=10,
weightType=0,
bFileName="NULL",
nmotifs=25,
extTrim=1,
minSpaceWidth=0,
minSites =-1)
motifs <- getPWM(gadem)
print(motifs)
Thank you very much for your comments.
Bahar
More information about the Bioconductor
mailing list