[BioC] heatmap.2 - change column & row locations; angle / rotate

Fri Jul 23 15:16:09 CEST 2010

Hi Karl,
The only way I know to rotate the labels is pretty crude. You will have to reconstitute the labels using the text() function.
The caveat here is you'll have to play around to get this right.

Try something like this:

library(gplots)
x <- matrix(rnorm(25), 5)
heatmap.2(x, labRow = "", labCol = "")  # remove the labels
# plot the text; perhaps someone can think of a smarter way of getting the labels in position...
xaxp <- par("xaxp")
text(seq(xaxp[1] + xaxp[2] / xaxp[3], xaxp[2], by = 0.8 * (xaxp[2] / xaxp[3])),
     par("usr")[3], labels = c("first", "second", "third", "fourth", "fifth"),
     srt = 45, pos = 1, xpd = TRUE)

Unfortunately the heatmap is laid out in a 2x2 matrix, with the dendrograms and key in the first three cells and the heatmap in the bottom right. I'm not sure whether it is possible to access the axes of this element independently. If one could, positioning the labels for the heatmap part of the plot might be simple.

Amos
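
The text() approach above is easier to follow on a single-panel plot, where the axis coordinates are known in advance. The following is a minimal sketch in base graphics only, not heatmap.2 itself; the label names, the margin sizes and the 0.05 vertical offset are illustrative assumptions that would need adjusting for a real figure.

```r
# Sketch: rotated labels under a plain image() plot (base graphics only).
# Label names, margins and the 0.05 offset are assumptions for illustration.
set.seed(1)
x <- matrix(rnorm(25), 5)
labs <- c("first", "second", "third", "fourth", "fifth")

pdf(NULL)                       # null device; drop this (and dev.off()) to draw on screen
op <- par(mar = c(6, 2, 2, 2))  # leave room below the plot for the labels
image(x, axes = FALSE)          # suppress the default axes and labels

# With no x supplied, image() centres the cells at equally spaced points on [0, 1]
xpos <- seq(0, 1, length.out = nrow(x))

# Place the text just below the plot region; xpd = TRUE allows drawing in the margin
text(xpos, par("usr")[3] - 0.05, labels = labs, srt = 45, adj = 1, xpd = TRUE)
par(op)
dev.off()
```

Because image() draws a single panel, par("usr") here refers to the panel that holds the cells, which is exactly what is hard to guarantee after a multi-panel heatmap.2() call.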

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of bioconductor-request at stat.math.ethz.ch
Sent: 23 July 2010 11:00
To: bioconductor at stat.math.ethz.ch
Subject: Bioconductor Digest, Vol 89, Issue 22

Send Bioconductor mailing list submissions to
	bioconductor at stat.math.ethz.ch

To subscribe or unsubscribe via the World Wide Web, visit
	https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
	bioconductor-request at stat.math.ethz.ch

You can reach the person managing the list at
	bioconductor-owner at stat.math.ethz.ch

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."

Today's Topics:

   1. heatmap.2 - change column & row locations; angle  / rotate
      (Karl Brand)
   2. In limma, how to set quility weight for each spot. (Jinyan Huang)
   3. Re: In limma, how to set quility weight for each spot.
      (Sean Davis)
   4. Re: exonmap/xmapcore error (Crispin Miller)
   5. Heatmap.2 scale problems: Sacling inside the function gives
      different results than scaling outside!!! (Elmer Fernández)
   6. Re: exonmap/xmapcore error (Crispin Miller)
   7. Re: Heatmap.2 scale problems: Sacling inside the function
      gives different results than scaling outside!!! (Sean Davis)
   8. ShortRead QA (Alex Gutteridge)
   9. Re: Heatmap.2 scale problems: Sacling inside the	function
      gives different results than scaling outside!!! (Bazeley, Peter)
  10. Re: Heatmap.2 scale problems: Sacling inside the	function
      gives different results than scaling outside!!! (Benjamin Otto)
  11. Biostrings - vcountPattern optimization (Erik Wright)
  12. Re: Biostrings - vcountPattern optimization (Steve Lianoglou)
  13. problem about hgu133plus2 annotation (Gina Liao)
  14. Re: Heatmap.2 scale problems: Sacling inside the function
      gives different results than scaling outside!!! (Elmer Fernández)
  15. Re: problem about hgu133plus2 annotation (Marc Carlson)
  16. Re: problem about hgu133plus2 annotation (James W. MacDonald)
  17. Re: Biostrings - vcountPattern optimization (Patrick Aboyoun)
  18. Re: feature request - pairwiseAlignment() in Biostrings
      (Patrick Aboyoun)
  19. Re: Biostrings - vcountPattern optimization (Erik Wright)
  20. Re: feature request - pairwiseAlignment() in Biostrings
      (Michael Lawrence)
  21. Re: Heatmap.2 scale problems: Sacling inside the function
      gives different results than scaling outside!!! (Steve Lianoglou)
  22. Re: Biostrings - vcountPattern optimization (Hervé Pagès)
  23. Re: Heatmap.2 scale problems: Sacling inside the function
      gives different results than scaling outside!!! (Elmer Fernández)
  24. Re: Heatmap.2 scale problems: Sacling inside the function
      gives different results than scaling outside!!! (Sean Davis)
  25.  the design matrix again (Gordon K Smyth)
  26. Open Postdoc Positions (Thomas Girke)
  27. Re: htQPCR (Heidi Dvinge)
  28. Re: Problem with function limmaCtData in HTqPCR package:
      "leading minor of order 2 is not positive definite" (Heidi Dvinge)
  29. building a refseq-based transcriptDb: warnings of interest?
      (Vincent Carey)

----------------------------------------------------------------------

Message: 1
Date: Thu, 22 Jul 2010 12:18:16 +0200
From: Karl Brand <k.brand at erasmusmc.nl>
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] heatmap.2 - change column & row locations; angle  /
	rotate
Message-ID: <4C481AE8.7060701 at erasmusmc.nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

<Reposting from "r-help at r-project.org">

Esteemed BioC users,

I'm struggling to achieve some details of a heatmap using heatmap.2():

1. Change label locations, for both rows & columns, from the default
right & bottom to left & top.
Can this be done within heatmap.2()? Or do I need to suppress this
default behavior (how?) and call a new function to relabel (which?),
specifying locations?

2. Change the angle of the labels.
By default column labels are 90 degrees anticlockwise from horizontal. How
can I bring them back to horizontal? Or better, rotate them 45 degrees clockwise
from horizontal (i.e., 135 degrees clockwise from the default)?

Any suggestions or pointers to helpful resources greatly appreciated,

Karl

-- 
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 |F +31 (0)10 704 4743 |M +31 (0)642 777 268

------------------------------

Message: 2
Date: Thu, 22 Jul 2010 13:39:46 +0200
From: Jinyan Huang <jhuang.ceph at gmail.com>
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] In limma, how to set quility weight for each spot.
Message-ID:
	<AANLkTilvAvDQrbcp-lBFA8Pct7SUT2vguxMOv5L4dzUn at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi all,
My data is from GoldenGate Methylation Cancer Panel I. For each spot,
there is a p-value for quality. I want to use limma to analyze the
data. How can I set the quality weight for each spot? According to the
limma manual, it can be set by read.maimages. But my data was not imported
with read.maimages.

Thanks.

------------------------------

Message: 3
Date: Thu, 22 Jul 2010 06:02:28 -0600
From: Sean Davis <sdavis2 at mail.nih.gov>
To: Jinyan Huang <jhuang.ceph at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] In limma, how to set quility weight for each spot.
Message-ID:
	<AANLkTin2pna5TERLtX53tLQIw0Za5rzqlnLtEKidC8hD at mail.gmail.com>
Content-Type: text/plain

On Thu, Jul 22, 2010 at 5:39 AM, Jinyan Huang <jhuang.ceph at gmail.com> wrote:

> Hi all,
> My data is from GoldenGate Methylation Cancer Panel I. For each spot,
> there is a p-value for quality. I want to use limma to analyze the
> data. How can I set the quality weight for each spot? According to the
> limma manual, it can be set by read.maimages. But my data was not imported
> with read.maimages.
>
>
Hi, Jinyan.  You'll want to read the help for lmFit().

Sean
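
As a pointer to what the lmFit() help covers: lmFit() takes a weights argument, which can be a matrix of spot-level weights with the same dimensions as the data. The following is a rough sketch under stated assumptions, not a recommended analysis: M and detP are simulated stand-ins for the real data and its per-spot quality p-values, and both the 0.05 cutoff and the 0.1 down-weight are arbitrary illustrative choices.

```r
# Sketch only: 'M' (expression-like values) and 'detP' (per-spot quality
# p-values) are simulated; the 0.05 cutoff and 0.1 weight are assumptions.
set.seed(1)
M    <- matrix(rnorm(60), nrow = 10, ncol = 6)
detP <- matrix(runif(60, 0, 0.1), nrow = 10, ncol = 6)

group  <- factor(rep(c("a", "b"), each = 3))
design <- model.matrix(~ group)

# Full weight for spots passing the cutoff, reduced weight otherwise;
# the weight matrix must have the same dimensions as M
w <- ifelse(detP < 0.05, 1, 0.1)

# Only fit if limma is installed
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(M, design, weights = w)
  print(dim(fit$coefficients))
}
```

How p-values should be mapped to weights is a modelling decision; a hard cutoff is only one option.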


------------------------------

Message: 4
Date: Thu, 22 Jul 2010 13:58:04 +0100
From: "Crispin Miller" <cmiller at picr.man.ac.uk>
To: "Bioconductor" <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] exonmap/xmapcore error
Message-ID: <C86DFEEC.CC8D%cmiller at picr.man.ac.uk>
Content-Type: text/plain

Dear Anupam,

Since we published exonmap, we've released a newer package, xmapcore. This
focuses on the core database connectivity and has a significant amount of
work done behind the API to make certain bits of it much much quicker. We'll
put a note in the exonmap vignette to point people to the new package, since
it's obviously causing a bit of confusion.

One thing that xmapcore does is use a smaller database that's been optimised
for some of the queries that were slower in exonmap than we would have liked
- this also means that you no longer have to install Ensembl - the xmapcore
database, on its own, will do the job.

Have a look at the documentation for the xmapcore package (especially
INSTALL.pdf) that provides step-by-step installation instructions.

As we mention in the exonmap vignette, there were some basic utility
functions to help people load and begin to explore exon array data. As
you'll see from the vignette, we've not duplicated these in xmapcore.

Crispin

On 20/07/2010 17:00, "anupam sinha" <anupam.contact at gmail.com> wrote:

> Dear all,
>                 I have been learning to use exonmap/xmapcore from the
> tutorial ""Comprehensive analysis of Affymetrix Exon arrays Using
> BioConductor" .
> But I have run into some problems. I have installed
> "xmapcore_homo_sapiens_58" on my system as per instructions .
> Do I also have to install ensemble and old exonmap databases? Can
> someone help me out ? Thanks in advance for any suggestions.
> 
> 
>> > library(xmapcore)
>> > library(exonmap)
> Loading required package: affy
> Loading required package: Biobase
> 
> Welcome to Bioconductor
> 
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
> 
> 
> Attaching package: 'Biobase'
> 
> The following object(s) are masked from 'package:IRanges':
> 
>     updateObject
> 
> Loading required package: genefilter
> Loading required package: RColorBrewer
> 
> Attaching package: 'exonmap'
> 
> The following object(s) are masked from 'package:xmapcore':
> 
>     exon.details, exon.to.gene, exon.to.probeset, exon.to.transcript,
>     exonic, exons.in.range, gene.details, gene.to.exon,
>     gene.to.probeset, gene.to.transcript, genes.in.range, intergenic,
>     intronic, is.exonic, is.intergenic, is.intronic, probes.in.range,
>     probeset.to.exon, probeset.to.gene, probeset.to.probe,
>     probeset.to.transcript, probesets.in.range, symbol.to.gene,
>     transcript.details, transcript.to.exon, transcript.to.gene,
>     transcript.to.probeset, transcripts.in.range
> 
> 
>> > setwd("/home/aragorn/R_Workspace/ExonarraysMCF7andMCF10Adata_cel/")
>> > raw.data<-read.exon()
>> > raw.data at cdfName<-"exon.pmcdf"
>> > x.rma<-rma(raw.data)
> Background correcting
> Normalizing
> Calculating Expression
>> > pc.rma<-pc(x.rma,"group",c("a","b"))
>> > keep<-(abs(fc(pc.rma))>1)&tt(pc.rma)< 1e-4
>> > sigs<-featureNames(x.rma)[keep]
>> > xmapConnect()
> Select a database to connect to:
> 
> 1: Hman ('xmapcore_homo_sapiens_58')
> 
> Selection: 1
> password:
> Warning message:
> In .xmap.load.config() :
>   Environment 'R_XMAP_CONF_DIR' not set. Please refer to INSTALL.TXT for
> information on how to set this up.
> 
> Trying '.exonmap'.
> 
>> > probeset.to.exon(sigs[1:5])
> *Error in mysqlExecStatement(conn, statement, ...) :
>   RS-DBI driver: (could not run statement: PROCEDURE
> xmapcore_homo_sapiens_58.xmap_probesetToExon does not exist)*
>> > xmapConnect()
> Select a database to connect to:
> 
> 1: Hman ('xmapcore_homo_sapiens_58')
> 
> Selection: 1
> 
>> > probeset.to.exon(sigs[1:5])
> Error in mysqlExecStatement(conn, statement, ...) :
>   RS-DBI driver: (could not run statement: PROCEDURE
> xmapcore_homo_sapiens_58.xmap_probesetToExon does not exist)
> 
>> > xmap.connect()
> password:
> Disconnecting from xmapcore_homo_sapiens_58 (localhost)
> Connected to xmapcore_homo_sapiens_58 (localhost)
> Selected array 'HuEx-1_0' as a default.
>> > probeset.to.exon(sigs[1:5])
> *Error in mysqlExecStatement(conn, statement, ...) :
>   RS-DBI driver: (could not run statement: PROCEDURE
> xmapcore_homo_sapiens_58.xmap_probesetToExon does not exist)*
>> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-redhat-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
>  [1] exon.pmcdf_1.1     exonmap_2.6.0      RColorBrewer_1.0-2
> genefilter_1.30.0
>  [5] affy_1.26.1        Biobase_2.8.0      xmapcore_1.2.5
> digest_0.4.2
>  [9] IRanges_1.6.8      RMySQL_0.7-4       DBI_0.2-5
> 
> loaded via a namespace (and not attached):
>  [1] affyio_1.16.0         annotate_1.26.1       AnnotationDbi_1.10.2
>  [4] preprocessCore_1.10.0 RSQLite_0.9-1         splines_2.11.0
>  [7] survival_2.35-8       tcltk_2.11.0          tools_2.11.0
> [10] xtable_1.5-6
> 
> Regards,
> 
> Anupam
> --
> Graduate Student,
> Center For DNA Fingerprinting And Diagnostics,
> 4-1-714 to 725/2, Tuljaguda complex
> Mozamzahi Road, Nampally,
> Hyderabad-500001
> 
>         [[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

------------------------------

Message: 5
Date: Thu, 22 Jul 2010 10:05:39 -0300
From: Elmer Fernández <elmerfer at gmail.com>
To: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: [BioC] Heatmap.2 scale problems: Sacling inside the function
	gives	different results than scaling outside!!!
Message-ID:
	<AANLkTilQKsufWajTT9SKcSCaV0dUTqie7iL2mmwxQDYP at mail.gmail.com>
Content-Type: text/plain

Dear Users
I'm working with the heatmap.2 function and I have noticed that using the
scale input parameter gives different results from applying the scale() function
outside and feeding heatmap.2 with the scaled matrix. I attached
the results of the two approaches and the data matrix used (M.csv).
So, what am I doing wrong?

R Code

library(gplots)
M=matrix(c(rnorm(10*3,1,2),rnorm(10*2,-0.5,1)),ncol=5)
heatmap.2(M,scale="column",trace="none",main="scaled inside")
x11();heatmap.2(scale(M),scale="none",trace="none",main="scaled outside")

> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
base

other attached packages:
[1] gplots_2.7.4   caTools_1.10   bitops_1.0-4.1 gdata_2.7.1
gtools_2.6.1   rkward_0.5.1

loaded via a namespace (and not attached):
[1] tools_2.10.0

-- 
Elmer A. Fernández (Bioing. PhD)
Investigador Asistente CONICET - Research Assistant CONICET
Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence @ UCC
tel: +54-(0)351-4938000 int 145
Fax: +54-(0)351-4938081
web page : http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
http://sites.google.com/site/biologicaldatamininggroup/Home/
mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina


------------------------------

Message: 6
Date: Thu, 22 Jul 2010 14:09:55 +0100
From: "Crispin Miller" <cmiller at picr.man.ac.uk>
To: "Bioconductor" <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] exonmap/xmapcore error
Message-ID: <C86E01B3.CC91%cmiller at picr.man.ac.uk>
Content-Type: text/plain

Hi Paul,

Hopefully it's simpler now - with xmapcore, you need to install just the
xmapcore database into a working MySQL instance (and the package itself, of
course).

There's also a pretty detailed walk through in the INSTALL.pdf document that
forms part of the xmapcore package.

Crispin

> 
> Yeah, originally they did a pretty poor job of describing how to do
> that; it was the largest impediment to otherwise using a very nice
> package. They threw you to the wolves by pointing to a section that
> describes how to install the entire Ensembl DB and web interface. I
> notice they have the new xmapcore databases; are those the ones you are
> using?
> 
> http://xmap.picr.man.ac.uk/download/index#hsxmapcore
> 
> I have NOT used those,
> 
> but at least at the beginning of the year you only needed MySQL to
> install; you did not need to install Ensembl, just the "core"
> database.
> As I recall, you need to go into MySQL and create the database,
> then run the script that makes the tables.
> These are then filled (by a second script, as far as I can recall).
> 
> My notes indicate I also installed exon.pmcdf (from the web link above):
> R CMD INSTALL --clean exon.pmcdf_1.1.tar.gz
> 
> 
> 
> you may need to run something like this on the command line first to
> start the service:
> 
> mysql -h host_computer -u xmap -pPassword ## where host_computer is
> where the db is and Password is the password
> 
> then in R
> 
> xmapConnect("human")
> 
> 
> ##################
> In my home directory there is a .exonmap folder with:
> a file database.txt (attached)
> 
> and a subfolder db.local that contains
> a file starts.core.homo_sapiens_core_56_37a.R, a large 3.7 MB file
> 
> and  in bashrc:
> export XMAP_BRIDGE_CACHE=/home/pleo/.xmb_cache
> #######
> 
> I think now with the new core database you might be better off using
> documentation in the latest exonmap or xmapcore  libraries than that original
> manuscript. They have made some changes.
> 
> Hope that helps
> Paul

------------------------------

Message: 7
Date: Thu, 22 Jul 2010 08:17:21 -0600
From: Sean Davis <sdavis2 at mail.nih.gov>
To: Elmer Fernández <elmerfer at gmail.com>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
	function	gives different results than scaling outside!!!
Message-ID:
	<AANLkTimZp4HrxsUyyokxJGrs7ajwFhgvG1NYRNFaZpyd at mail.gmail.com>
Content-Type: text/plain

2010/7/22 Elmer Fernández <elmerfer at gmail.com>

> Dear Users
> I'm working with the heatmap.2 function and I realize that if you use the
> scale input paramenter gives different results than usign the scale
> function
> outsie and feed the heatmap.2 fucntion with the scaled matrix. I attached
> the results of the two approaches and the used data matrix (M.csv).
> SO, what I'm doing wrong?
>
>
Hi, Elmer.

The default distance function used by heatmap.2 is dist() which is not going
to be invariant under centering and scaling, I don't think.  It looks like
you are using that default.

Sean


------------------------------

Message: 8
Date: Thu, 22 Jul 2010 15:26:21 +0100
From: Alex Gutteridge <alexg at ruggedtextile.com>
To: <bioconductor at stat.math.ethz.ch>
Subject: [BioC] ShortRead QA
Message-ID: <da36088b4e3acb477e837c2e970fd5a9 at ruggedtextile.com>
Content-Type: text/plain; charset=UTF-8

I'm dealing with some Solexa/Illumina data with ShortRead for the first
time and had a couple of questions relating to QA:

1. Memory requirements: My data comprises 7 s_N_export.txt files. Each one
comprises 10-20 million aligned reads. If I try to run qa() over the whole
directory my machine rapidly grinds to a halt. Tackling each file
individually keeps my machine running, but takes >1 hour for each one. The
ShortRead vignette says evaluating a single lane can take 'several
minutes', so I'm wondering if anyone can offer any clues as to why I'm
struggling so much? The machine in question has 6GB of RAM - do I just need
more?

2. Read distribution: The QA results I'm getting for the 'read
distribution' section don't quite look like those presented in the example
ShortRead Solexa QA report. My interpretation is that this is because my
data is actually rather high quality, but I'd appreciate a second opinion. 

To quote from the ShortRead QA report: 

'Ideally, the cumulative proportion of reads will transition sharply from
low to high. Portions to the left of the transition might correspond
roughly to sequencing or sample processing errors, and correspond to reads
that are represented relatively infrequently [...]. Portions to the right
of the transition represent reads that are over-represented compared to
expectation.'

Typically the read distribution plots I'm seeing look like this:
http://dl.dropbox.com/u/419878/readOccurences.jpg

There is a sharp transition, but no portion to the left. I interpret this
as a good sign: most of the reads are seen a small number of times (<10),
and there are relatively few over-represented reads. Is there anything
there that would worry more experienced heads?

-- 
Alex Gutteridge

------------------------------

Message: 9
Date: Thu, 22 Jul 2010 14:25:54 +0000
From: "Bazeley, Peter" <Peter.Bazeley at rockets.utoledo.edu>
To: Elmer Fernández <elmerfer at gmail.com>
Cc: Sean Davis <sdavis2 at mail.nih.gov>,	Bioconductor mailing list
	<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
	function	gives different results than scaling outside!!!
Message-ID:
	<5C621FDF7E426B4AAE3B2364B7EF07371F654407 at BL2PRD0103MB050.prod.exchangelabs.com>

Content-Type: text/plain; charset="iso-8859-1"

Hi Elmer,

The default scale option in heatmap.2 scales by row, whereas the scale() function scales by column, so this is probably why there is a difference. I think whichever dimension contains unique samples is how you want to scale (if this was expression data, for example).

Pete
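
To make the row-versus-column point concrete: scale() standardizes each column, so standardizing each row needs a transpose on the way in and on the way out. The following is a small sketch using the same matrix construction as the original post (which, note, passed scale="column" explicitly); the seed is an arbitrary choice added for reproducibility.

```r
# Same matrix construction as the original post; the seed is an assumption
set.seed(1)
M <- matrix(c(rnorm(10 * 3, 1, 2), rnorm(10 * 2, -0.5, 1)), ncol = 5)

by_col <- scale(M)        # scale() centres and scales each column
by_row <- t(scale(t(M)))  # transpose twice to standardize each row instead

# Column-scaled data has unit column SDs; row-scaled data has unit row SDs
col_sds <- apply(by_col, 2, sd)
row_sds <- apply(by_row, 1, sd)
```

Whichever direction is chosen outside heatmap.2 should match the scale argument used inside it before the two plots are compared.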

------------------------------

Message: 10
Date: Thu, 22 Jul 2010 16:38:16 +0200
From: Benjamin Otto <b.otto at uke.uni-hamburg.de>
To: "Bazeley, Peter" <Peter.Bazeley at rockets.utoledo.edu>
Cc: Sean Davis <sdavis2 at mail.nih.gov>,	Bioconductor mailing list
	<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
	function	gives different results than scaling outside!!!
Message-ID: <61679366-2C04-4959-8D3D-997A45BF45F5 at uke.uni-hamburg.de>
Content-Type: text/plain;  charset="utf-8"

Hi Guys,

Do note that the scale argument of heatmap.2 does not scale your values until AFTER clustering, and only for visualization purposes! So if you provide already-scaled data, you should naturally expect a different result.
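
To see the order of operations at work, here is a hedged sketch. It assumes, as described above, that heatmap.2 clusters whatever matrix it is given and applies its scale argument only afterwards for display; the object names are made up for illustration. The first part uses base R only, and the final call is skipped when gplots is not installed:

```r
# Same kind of matrix as in the original post; the seed is arbitrary.
set.seed(42)
M <- matrix(c(rnorm(10 * 3, 1, 2), rnorm(10 * 2, -0.5, 1)), ncol = 5)

# Tree built from the raw values: what gets clustered when scaling is
# requested through scale = "column".
row_dend_inside <- as.dendrogram(hclust(dist(M)))

# Trees built from the column-scaled values: what gets clustered when
# scale(M) is passed in with scale = "none".
row_dend_outside <- as.dendrogram(hclust(dist(scale(M))))
col_dend_outside <- as.dendrogram(hclust(dist(t(scale(M)))))

# Handing the dendrograms over explicitly makes the clustering step
# independent of where the scaling happens.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(M, scale = "column", trace = "none",
                    Rowv = row_dend_outside, Colv = col_dend_outside,
                    main = "scaled inside, clustered on scaled data")
}
```

With the dendrograms supplied this way, the "scaled inside" plot should follow the clustering of the scaled data, although the leaf order may still differ from the default plot because heatmap.2 reorders its own dendrograms by row and column means.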

cheers

Benjamin

On 22.07.2010 at 16:25, Bazeley, Peter wrote:

> Hi Elmer,
> 
> The default scale option in heatmap.2 scales by row, whereas the scale() function scales by column, so this is probably why there is a difference. I think whichever dimension contains unique samples is how you want to scale (if this was expression data, for example).
> 
> 
> Pete
> ________________________________________
> From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] on behalf of Sean Davis [sdavis2 at mail.nih.gov]
> Sent: Thursday, July 22, 2010 9:17 AM
> To: Elmer Fernández
> Cc: Bioconductor mailing list
> Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the function       gives different results than scaling outside!!!
> 
> 2010/7/22 Elmer Fernández <elmerfer at gmail.com>
> 
>> Dear Users
>> I'm working with the heatmap.2 function and I realized that using the
>> scale input parameter gives different results than using the scale
>> function
>> outside and feeding the heatmap.2 function the scaled matrix. I attached
>> the results of the two approaches and the data matrix used (M.csv).
>> So, what am I doing wrong?
>> 
>> 
> Hi, Elmer.
> 
> The default distance function used by heatmap.2 is dist() which is not going
> to be invariant under centering and scaling, I don't think.  It looks like
> you are using that default.
> 
> Sean
> 
> 
>> R Code
>> 
>> library(gplots)
>> M=matrix(c(rnorm(10*3,1,2),rnorm(10*2,-0.5,1)),ncol=5)
>> heatmap.2(M,scale="column",trace="none",main="scaled inside")
>> x11();heatmap.2(scale(M),scale="none",trace="none",main="scaled outside")
>> 
>>> sessionInfo()
>> R version 2.10.0 (2009-10-26)
>> x86_64-unknown-linux-gnu
>> 
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
>> LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
>> LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
>> [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
>> LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
>> 
>> attached base packages:
>> [1] grid      stats     graphics  grDevices utils     datasets  methods
>> base
>> 
>> other attached packages:
>> [1] gplots_2.7.4   caTools_1.10   bitops_1.0-4.1 gdata_2.7.1
>> gtools_2.6.1   rkward_0.5.1
>> 
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.0
>> 
>> 
>> --
>> Elmer A. Fernández (Bioing. PhD)
>> Investigador Asistente CONICET - Research Assistant CONICET
>> Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence @ UCC
>> tel: +54-(0)351-4938000 int 145
>> Fax: +54-(0)351-4938081
>> web page : http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
>> http://sites.google.com/site/biologicaldatamininggroup/Home/
>> mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
>> 
>> 
>> 
>> 
>>       [[alternative HTML version deleted]]
>> 
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
> 
>        [[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

___________________________________________
Benjamin Otto, PhD
University Medical Center Hamburg-Eppendorf
Institute For Clinical Chemistry / Central Laboratories
Campus Forschung N27
Martinistr. 52,
D-20246 Hamburg

Tel.: +49 40 7410 51908
Fax.: +49 40 7410 54971
___________________________________________

-- 
Mandatory disclosures pursuant to the German Act on Electronic Commercial Registers, Registers of Cooperatives and the Company Register (EHUG):

Universitätsklinikum Hamburg-Eppendorf
Public-law corporation (Körperschaft des öffentlichen Rechts)
Place of jurisdiction: Hamburg

Members of the board:
Prof. Dr. Jörg F. Debatin (Chairman)
Dr. Alexander Kirstein
Joachim Prölß
Prof. Dr. Dr. Uwe Koch-Gromus

------------------------------

Message: 11
Date: Thu, 22 Jul 2010 10:54:28 -0500
From: Erik Wright <eswright at wisc.edu>
To: BioC list <bioconductor at stat.math.ethz.ch>
Subject: [BioC] Biostrings - vcountPattern optimization
Message-ID: <3E19C211-BA75-4C68-88DE-1079FE64CAB0 at wisc.edu>
Content-Type: text/plain; CHARSET=US-ASCII

Hello,

Lately I have been working on counting sequence fragments in larger sets of sequences.  I am searching for thousands of fragments of 30 to 130 bases in hundreds of thousands of sequences between 1200 and 1600 bases.  Currently I am using the following method to count the number of "hits":

#### start ####
library(Biostrings)
fragments <- DNAStringSet(c("ACTG","AAAA"))
sequence_set <- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))

for (i in 1:length(fragments)) {
	counts <- vcountPattern(fragments[[i]],
		sequence_set,
		max.mismatch=1)
	hits <- length(which(counts > 0))
	print(hits)
}
#### end ####

This method is taking a long time to complete, so I am wondering if I am doing this in the most efficient manner?  Does anyone have a suggestion for how I can accomplish the same task more efficiently?

Thanks!,
Erik

> sessionInfo()
R version 2.11.0 (2010-04-22) 
x86_64-apple-darwin9.8.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.16.0 IRanges_1.6.0    

loaded via a namespace (and not attached):
[1] Biobase_2.8.0

------------------------------

Message: 12
Date: Thu, 22 Jul 2010 12:19:21 -0400
From: Steve Lianoglou <mailinglist.honeypot at gmail.com>
To: Erik Wright <eswright at wisc.edu>
Cc: BioC list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Biostrings - vcountPattern optimization
Message-ID:
	<AANLkTil5przSiPsdXNg8fSZyVci5rCdqn5ZgaHy8RSWA at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

On Thu, Jul 22, 2010 at 11:54 AM, Erik Wright <eswright at wisc.edu> wrote:
> Hello,
>
> Lately I have been working on counting sequence fragments in larger sets of sequences.  I am searching for thousands of fragments of 30 to 130 bases in hundreds of thousands of sequences between 1200 and 1600 bases.  Currently I am using the following method to count the number of "hits":

Would using bowtie as an intermediary be an option?

For instance, you could consider:

(i) making a bowtie-index out of your 1200-1600 bp "references"
(ii) aligning your 30-130 bp fragments against it and outputting to SAM
format (give each read a unique id so you can hunt for it later)
(iii) convert SAM -> indexed BAM
(iv) process bam file w/ Rsamtools -- perhaps you could simply do a
`table()` on the sequence IDs of each alignment if all you want is a
count -- of course now that the sequences are aligned, the data is in
"good shape" to do other types of analyses as well (whatever it is
that you're doing).
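
For step (iv), once the alignments are in R, the counting itself is short. The sketch below uses a made-up data frame in place of real BAM records (the column names fragment_id and reference_id are hypothetical, not Rsamtools field names) and counts, per fragment, how many distinct reference sequences it aligned to:

```r
# Hypothetical alignment records: one row per reported alignment.
alignments <- data.frame(
  fragment_id  = c("frag1", "frag1", "frag2", "frag1", "frag3", "frag2"),
  reference_id = c("seqA",  "seqB",  "seqA",  "seqA",  "seqC",  "seqA"),
  stringsAsFactors = FALSE
)

# Drop repeated (fragment, reference) pairs so that a fragment aligning
# several times to the same reference is counted once, as in the
# "counts > 0" test of the original loop.
unique_pairs <- unique(alignments)

# Number of distinct references hit by each fragment.
hits <- table(unique_pairs$fragment_id)
print(hits)
```

Fragments with no alignment at all never appear in such a table, so they would need to be added back with a count of zero if they matter.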

> #### start ####
> library(Biostrings)
> fragments <- DNAStringSet(c("ACTG","AAAA"))
> sequence_set <- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>
> for (i in 1:length(fragments)) {
>        counts <- vcountPattern(fragments[[i]],
>                sequence_set,
>                max.mismatch=1)
>        hits <- length(which(counts > 0))
>        print(hits)
> }
> #### end ####
>
> This method is taking a long time to complete, so I am wondering if I am doing this in the most efficient manner?  Does anyone have a suggestion for how I can accomplish the same task more efficiently?

I don't really have any suggestions to make the above R code run
faster ... sorry.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

------------------------------

Message: 13
Date: Thu, 22 Jul 2010 17:11:26 +0800
From: Gina Liao <yi713 at hotmail.com>
To: <bioconductor at stat.math.ethz.ch>
Subject: [BioC] problem about hgu133plus2 annotation
Message-ID: <BAY146-w70AE25532AD94D7BD9116EAA20 at phx.gbl>
Content-Type: text/plain

Dear All,
I have 20 chips, and I used R to standardize the CEL files. Then I got the expression values for all chips. I also downloaded the annotation file in CSV format from NetAffx (HG-U133_Plus_2 Annotations, CSV format, Release 30 (22 MB, 11/15/09)).
Here's my code.
########
test = justRMA()
eset.st = standardise(test)
exprs.st = exprs(eset.st)
e.out = exprs.st
dim(e.out) #* 54675 20
########
However, I found that the order of rownames(e.out) is a little different from the row names of hgu133plus2.csv. Rows 54630 to 54640 are not in the same order in the two.
They should be the same, right? Is "hgu133plus2cdf" the problem? How can I solve it?
Thanks!
Best,
Gina
_________________________________________________________________

	[[alternative HTML version deleted]]

------------------------------

Message: 14
Date: Thu, 22 Jul 2010 13:34:28 -0300
From: Elmer Fernández <elmerfer at gmail.com>
To: Benjamin Otto <b.otto at uke.uni-hamburg.de>
Cc: Sean Davis <sdavis2 at mail.nih.gov>,	Bioconductor mailing list
	<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
	function	gives different results than scaling outside!!!
Message-ID:
	<AANLkTindAgCqq5caPzkk6LtEYpEU4Kr0BymUe9SJU_jp at mail.gmail.com>
Content-Type: text/plain

Hi Benjamin,
Are you sure about that? If so, I think that behavior is not correct, right?
Best,
Elmer

2010/7/22 Benjamin Otto <b.otto at uke.uni-hamburg.de>

> Hi Guys,
>
> do note that the scale() function in heatmap doesn't scale your values till
> AFTER clustering for visualization purpose! So if you provide already scaled
> data, you naturally will expect a different result.
>
> cheers
>
> Benjamin
>
> On 22.07.2010 at 16:25, Bazeley, Peter wrote:
>
> > Hi Elmer,
> >
> > The default scale option in heatmap.2 scales by row, whereas the scale()
> function scales by column, so this is probably why there is a difference. I
> think whichever dimension contains unique samples is how you want to scale
> (if this was expression data, for example).
> >
> >
> > Pete
> > ________________________________________
> > From: bioconductor-bounces at stat.math.ethz.ch [
> bioconductor-bounces at stat.math.ethz.ch] on behalf of Sean Davis [
> sdavis2 at mail.nih.gov]
> > Sent: Thursday, July 22, 2010 9:17 AM
> > To: Elmer Fernández
> > Cc: Bioconductor mailing list
> > Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the function
>       gives different results than scaling outside!!!
> >
> > 2010/7/22 Elmer Fernández <elmerfer at gmail.com>
> >
> >> Dear Users
> >> I'm working with the heatmap.2 function and I realized that using the
> >> scale input parameter gives different results than using the scale
> >> function
> >> outside and feeding the heatmap.2 function the scaled matrix. I
> >> attached
> >> the results of the two approaches and the data matrix used (M.csv).
> >> So, what am I doing wrong?
> >>
> >>
> > Hi, Elmer.
> >
> > The default distance function used by heatmap.2 is dist() which is not
> going
> > to be invariant under centering and scaling, I don't think.  It looks
> like
> > you are using that default.
> >
> > Sean
> >
> >
> >> R Code
> >>
> >> library(gplots)
> >> M=matrix(c(rnorm(10*3,1,2),rnorm(10*2,-0.5,1)),ncol=5)
> >> heatmap.2(M,scale="column",trace="none",main="scaled inside")
> >> x11();heatmap.2(scale(M),scale="none",trace="none",main="scaled
> outside")
> >>
> >>> sessionInfo()
> >> R version 2.10.0 (2009-10-26)
> >> x86_64-unknown-linux-gnu
> >>
> >> locale:
> >> [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
> >> LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
> >> [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
> >> LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
> >> [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
> >> LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
> >>
> >> attached base packages:
> >> [1] grid      stats     graphics  grDevices utils     datasets  methods
> >> base
> >>
> >> other attached packages:
> >> [1] gplots_2.7.4   caTools_1.10   bitops_1.0-4.1 gdata_2.7.1
> >> gtools_2.6.1   rkward_0.5.1
> >>
> >> loaded via a namespace (and not attached):
> >> [1] tools_2.10.0
> >>
> >>
> >> --
> >> Elmer A. Fernández (Bioing. PhD)
> >> Investigador Asistente CONICET - Research Assistant CONICET
> >> Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence @ UCC
> >> tel: +54-(0)351-4938000 int 145
> >> Fax: +54-(0)351-4938081
> >> web page : http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
> >> http://sites.google.com/site/biologicaldatamininggroup/Home/
> >> mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
> >>
> >>
> >>
> >>
> >>       [[alternative HTML version deleted]]
> >>
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> >        [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
> ___________________________________________
> Benjamin Otto, PhD
> University Medical Center Hamburg-Eppendorf
> Institute For Clinical Chemistry / Central Laboratories
> Campus Forschung N27
> Martinistr. 52,
> D-20246 Hamburg
>
> Tel.: +49 40 7410 51908
> Fax.: +49 40 7410 54971
> ___________________________________________
>
>
>
>
>
> --
> Mandatory disclosures pursuant to the German Act on Electronic Commercial
> Registers, Registers of Cooperatives and the Company Register (EHUG):
>
> Universitätsklinikum Hamburg-Eppendorf
> Public-law corporation (Körperschaft des öffentlichen Rechts)
> Place of jurisdiction: Hamburg
>
> Members of the board:
> Prof. Dr. Jörg F. Debatin (Chairman)
> Dr. Alexander Kirstein
> Joachim Prölß
> Prof. Dr. Dr. Uwe Koch-Gromus
>

-- 
Elmer A. Fernández (Bioing. PhD)
Investigador Asistente CONICET - Research Assistant CONICET
Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence @ UCC
tel: +54-(0)351-4938000 int 145
Fax: +54-(0)351-4938081
web page : http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
http://sites.google.com/site/biologicaldatamininggroup/Home/
mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina

	[[alternative HTML version deleted]]

------------------------------

Message: 15
Date: Thu, 22 Jul 2010 09:38:19 -0700
From: Marc Carlson <mcarlson at fhcrc.org>
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] problem about hgu133plus2 annotation
Message-ID: <4C4873FB.5030207 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1

Hi Gina,

I am afraid it's a little hard to tell what is going on here.  For
example, I don't see sessionInfo() so it is hard to tell what you were
running.  And I only have enough code to wildly speculate about what you
were doing.  You might want to see our posting guide here:

http://www.bioconductor.org/docs/postingGuide.html

  Marc

On 07/22/2010 02:11 AM, Gina Liao wrote:
> Dear All,
> I have 20 chips, and I used R to standardize the CEL files.Then, i got an expression value data of all chips.And I also downloaded the annotation csv format from NetAffy.(HG-U133_Plus_2 Annotations, CSV format, Release 30 (22 MB, 11/15/09))
> Here's my code. 
> ########
> test = justRMA()
> eset.st = standardise(test)
> exprs.st = exprs(eset.st)
> e.out = exprs.st
> dim(e.out) #* 54675 20
> ########
> However, i found out that the order of the rownames(e.out) is a little different to the row name of hgu133plus2.csv. The order from 54630 to 54640 is not the same to these two rows. 
> They should be the same,right? Is "hgu133plus2cdf" the problem? How could I solve it?
> Thanks!!!!! 
> Best,Gina 		 	   		  
> _________________________________________________________________
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

------------------------------

Message: 16
Date: Thu, 22 Jul 2010 12:41:42 -0400
From: "James W. MacDonald" <jmacdon at med.umich.edu>
To: Gina Liao <yi713 at hotmail.com>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] problem about hgu133plus2 annotation
Message-ID: <4C4874C6.9090008 at med.umich.edu>
Content-Type: text/plain; charset="iso-8859-1"; format="flowed"

Hi Gina,

On 7/22/2010 5:11 AM, Gina Liao wrote:
>
> Dear All,
> I have 20 chips, and I used R to standardize the CEL files.Then, i got an expression value data of all chips.And I also downloaded the annotation csv format from NetAffy.(HG-U133_Plus_2 Annotations, CSV format, Release 30 (22 MB, 11/15/09))
> Here's my code.
> ########
> test = justRMA()
> eset.st = standardise(test)
> exprs.st = exprs(eset.st)
> e.out = exprs.st
> dim(e.out) #* 54675 20
> ########
> However, i found out that the order of the rownames(e.out) is a little different to the row name of hgu133plus2.csv. The order from 54630 to 54640 is not the same to these two rows.
> They should be the same,right? Is "hgu133plus2cdf" the problem? How could I solve it?

I would recommend you use the annotation packages that are available 
from Bioconductor rather than downloading the annotation files from 
Affymetrix. The BioC annotation packages contain the same information, 
but are designed to be easily used from within R, and you will find the 
.csv files you can get from Affy are not as user-friendly.

You can get the annotation package using biocLite():

biocLite("hgu133plus2.db")

Note that there is no reason to expect that the order of annotation data 
will be the same as the order of expression data. Re-ordering things is 
exceedingly simple in R, so this point is irrelevant.
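
As an illustration of the re-ordering, here is a minimal base R sketch using match(). The probe IDs, gene labels and expression values are invented placeholders, not real hgu133plus2 content:

```r
# Toy expression matrix: rows are probes, columns are chips.
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("probe_1", "probe_2", "probe_3"),
                               c("chip1", "chip2")))

# Toy annotation table listing the same probes in a different order.
annot <- data.frame(
  probe_id = c("probe_3", "probe_1", "probe_2"),
  label    = c("geneC", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# For each row of expr, find the position of its probe in the annotation.
idx <- match(rownames(expr), annot$probe_id)

# Re-order the annotation so that it lines up with the expression data.
annot_ordered <- annot[idx, ]
print(annot_ordered)
```

If a probe is missing from the annotation, match() returns NA for it, so checking anyNA(idx) before subsetting is a sensible precaution.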

Using the annotation packages will take some reading on your part, but 
once you get the hang of things, I think you will like how they work. 
You might start with

library(hgu133plus2.db)
?hgu133plus2.db

as well as

openVignette() and choose the AnnotationDbi vignette.

If you are interested in annotating the set of interesting genes from 
your experiment, you will want to look at the annaffy package, which 
will allow you to output both HTML and text files with your results and 
annotations for each gene.

In addition, you might want to look at the affycoretools package, which 
helps automate some of the steps required to annotate results. This 
package is also integrated with limma, so you can go straight from your 
linear model fits to output in one function call.

Best,

Jim

> Thanks!!!!!
> Best,Gina 		 	   		
> _________________________________________________________________
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 

------------------------------

Message: 17
Date: Thu, 22 Jul 2010 10:11:28 -0700
From: Patrick Aboyoun <paboyoun at fhcrc.org>
To: Erik Wright <eswright at wisc.edu>
Cc: BioC list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Biostrings - vcountPattern optimization
Message-ID: <4C487BC0.6010309 at fhcrc.org>
Content-Type: text/plain; charset=windows-1252; format=flowed

Erik,
Have you tried vcountPDict? It will use an Aho-Corasick matching 
algorithm 
(http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm) 
that is pretty fast, albeit memory intensive.

library(Biostrings)
fragments <- DNAStringSet(c("ACTG","AAAA"))
sequence_set <- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
pdict <- PDict(fragments)
counts <- vcountPDict(pdict, sequence_set)

>  counts
      [,1] [,2]
[1,]    0    0
[2,]    0    0

>  sessionInfo()
R version 2.12.0 Under development (unstable) (2010-07-18 r52554)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.17.26 IRanges_1.7.13

loaded via a namespace (and not attached):
[1] Biobase_2.9.0 tools_2.12.0

Patrick

On 7/22/10 8:54 AM, Erik Wright wrote:
> Hello,
>
> Lately I have been working on counting sequence fragments in larger sets of sequences.  I am searching for thousands of fragments of 30 to 130 bases in hundreds of thousands of sequences between 1200 and 1600 bases.  Currently I am using the following method to count the number of "hits":
>
> #### start ####
> library(Biostrings)
> fragments<- DNAStringSet(c("ACTG","AAAA"))
> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>
> for (i in 1:length(fragments)) {
> 	counts<- vcountPattern(fragments[[i]],
> 		sequence_set,
> 		max.mismatch=1)
> 	hits<- length(which(counts>  0))
> 	print(hits)
> }
> #### end ####
>
> This method is taking a long time to complete, so I am wondering if I am doing this in the most efficient manner?  Does anyone have a suggestion for how I can accomplish the same task more efficiently?
>
> Thanks!,
> Erik
>
>
>
>
>    
>> sessionInfo()
>>      
> R version 2.11.0 (2010-04-22)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.16.0 IRanges_1.6.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.8.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

------------------------------

Message: 18
Date: Thu, 22 Jul 2010 10:26:48 -0700
From: Patrick Aboyoun <paboyoun at fhcrc.org>
To: "Coghlan, Avril" <A.Coghlan at ucc.ie>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] feature request - pairwiseAlignment() in
	Biostrings
Message-ID: <4C487F58.1060305 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Avril,
I won't have time to extend pairwiseAlignment, but you are more than 
welcome to. It is written mainly in C with an R wrapper. You can grab it 
via svn at the URL

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings

with username: readonly and password: readonly.

The particular files you'll want to look at are

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings/src/align_pairwiseAlignment.c
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings/R/pairwiseAlignment.R

I can provide you with a code walkthrough if you like. Since I optimized 
the code for speed and memory usage, you may find it is easier to write 
your own C level function that will be used instead of the code I have 
since I don't keep enough information around to be able to select the 
top X alignments.

Cheers,

Patrick

On 7/22/10 1:54 AM, Coghlan, Avril wrote:
> Dear Patrick and Steve,
>
> I am wondering whether it would be possible to add an option to the
> pairwiseAlignment() function in Biostrings, so that it could print out:
> (i) all the top-scoring alignments for 2 sequences, if there are more
> than one equally scoring top-scoring alignments ?
> (ii) the top X top-scoring alignments for 2 sequences, where the user
> specifies the number X, and where the X alignments don't have to have
> equal scores, but are ordered by decreasing score ?
>
> I'm not sure if these options are easy to add, but would be very useful
> if you could add them.
>
> If you haven't time to do this, I would be willing to try to help add
> the features to the pairwiseAlignment() function, if you can point me
> towards the code.
>
> Kind Regards,
> Avril
>
> Avril Coghlan
> University College Cork
> Ireland
>
>
>
>
>

------------------------------

Message: 19
Date: Thu, 22 Jul 2010 12:32:39 -0500
From: Erik Wright <eswright at wisc.edu>
To: Patrick Aboyoun <paboyoun at fhcrc.org>
Cc: BioC list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Biostrings - vcountPattern optimization
Message-ID: <FBDE47F7-A49A-4D50-93BB-0AE8D9097DA7 at wisc.edu>
Content-Type: text/plain; charset=windows-1252

Hi Patrick,

Thanks, this looks promising.  Three possible complications are:
(1)  The fragments are not all the same width.  Is this possible with PDict?
(2)  I allow a variable number of mismatches based on each individual fragment's width.
(3)  The fragments sometimes include ambiguity letters (IUPAC extended letters).

A more accurate example would be:

#### start ####
library(Biostrings)
fragments <- DNAStringSet(c("ACS","NCCAGAA")) # no indels
sequence_set <- DNAStringSet(c("ATAGCATACKACCA","GATTACGTACCADADATTACA")) # variable widths
for (i in 1:length(fragments)) {
	counts <- vcountPattern(fragments[[i]],
		sequence_set,
		max.mismatch=floor(length(fragments[[i]])/5)) # variable mis-matches
	hits <- length(which(counts > 0))
	print(hits)
}
#### end ####

Do you think it is possible to make this work with PDict for a speed improvement?

Thanks again!,
Erik

On Jul 22, 2010, at 12:11 PM, Patrick Aboyoun wrote:

> Erik,
> Have you tried vcountPDict? It will use an Aho-Corasick matching algorithm (http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm) that is pretty fast, albeit memory intensive.
> 
> library(Biostrings)
> fragments<- DNAStringSet(c("ACTG","AAAA"))
> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
> pdict<- PDict(fragments)
> counts<- vcountPDict(pdict, sequence_set)
> 
>> counts
>     [,1] [,2]
> [1,]    0    0
> [2,]    0    0
> 
>> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-07-18 r52554)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] Biostrings_2.17.26 IRanges_1.7.13
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.9.0 tools_2.12.0
> 
> 
> 
> 
> Patrick
> 
> 
> On 7/22/10 8:54 AM, Erik Wright wrote:
>> Hello,
>> 
>> Lately I have been working on counting sequence fragments in larger sets of sequences.  I am searching for thousands of fragments of 30 to 130 bases in hundreds of thousands of sequences between 1200 and 1600 bases.  Currently I am using the following method to count the number of "hits":
>> 
>> #### start ####
>> library(Biostrings)
>> fragments<- DNAStringSet(c("ACTG","AAAA"))
>> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>> 
>> for (i in 1:length(fragments)) {
>> 	counts<- vcountPattern(fragments[[i]],
>> 		sequence_set,
>> 		max.mismatch=1)
>> 	hits<- length(which(counts>  0))
>> 	print(hits)
>> }
>> #### end ####
>> 
>> This method is taking a long time to complete, so I am wondering if I am doing this in the most efficient manner?  Does anyone have a suggestion for how I can accomplish the same task more efficiently?
>> 
>> Thanks!,
>> Erik
>> 
>> 
>> 
>> 
>>   
>>> sessionInfo()
>>>     
>> R version 2.11.0 (2010-04-22)
>> x86_64-apple-darwin9.8.0
>> 
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> 
>> other attached packages:
>> [1] Biostrings_2.16.0 IRanges_1.6.0
>> 
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.8.0
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>   
> 

------------------------------

Message: 20
Date: Thu, 22 Jul 2010 11:10:03 -0700
From: Michael Lawrence <lawrence.michael at gene.com>
To: Patrick Aboyoun <paboyoun at fhcrc.org>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] feature request - pairwiseAlignment() in
	Biostrings
Message-ID:
	<AANLkTik92Of_5a3jHph8P_bWALPMT9yq2bRVsCD7B2Lz at mail.gmail.com>
Content-Type: text/plain

The toughest question is probably not how to modify the C code, but how the
results will be represented and manipulated in R.

Good luck

On Thu, Jul 22, 2010 at 10:26 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:

> Avril,
> I won't have time to extend pairwiseAlignment, but you are more than welcome
> to. It is written mainly in C with an R wrapper. You can grab it via svn at
> the URL
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings
>
> with username: readonly and password: readonly.
>
> The particular files you'll want to look at are
>
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings/src/align_pairwiseAlignment.c
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings/R/pairwiseAlignment.R
>
> I can provide you with a code walkthrough if you like. I optimized the code
> for speed and memory usage, so it doesn't keep enough information around to
> select the top X alignments. You may find it easier to write your own
> C-level function to use in place of the code I have.
>
>
> Cheers,
>
> Patrick
>
>
>
> On 7/22/10 1:54 AM, Coghlan, Avril wrote:
>
>> Dear Patrick and Steve,
>>
>> I am wondering whether it would be possible to add an option to the
>> pairwiseAlignment() function in Biostrings, so that it could print out:
>> (i) all the top-scoring alignments for 2 sequences, if there are more
>> than one equally scoring top-scoring alignments ?
>> (ii) the top X top-scoring alignments for 2 sequences, where the user
>> specifies the number X, and where the X alignments don't have to have
>> equal scores, but are ordered by decreasing score ?
>>
>> I'm not sure if these options are easy to add, but would be very useful
>> if you could add them.
>>
>> If you haven't time to do this, I would be willing to try to help add
>> the features to the pairwiseAlignment() function, if you can point me
>> towards the code.
>>
>> Kind Regards,
>> Avril
>>
>> Avril Coghlan
>> University College Cork
>> Ireland
>>
>>
>>
>>
>>
>>

	[[alternative HTML version deleted]]

------------------------------

Message: 21
Date: Thu, 22 Jul 2010 16:04:06 -0400
From: Steve Lianoglou <mailinglist.honeypot at gmail.com>
To: Elmer Fernández <elmerfer at gmail.com>
Cc: Sean Davis <sdavis2 at mail.nih.gov>,	Bioconductor mailing list
	<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
	function	gives different results than scaling outside!!!
Message-ID:
	<AANLkTikAPe0F5juvyE1TiOIlCFI_INWnha6vmN5DRhOK at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

2010/7/22 Elmer Fernández <elmerfer at gmail.com>:
> Hi Benjamin
> Are you sure about that?

Looking at the source code for heatmap.2 (and heatmap, for that
matter) it looks as if Benjamin is correct. The scaling is done after
the clustering.

> If so, I think that it is not correct, right?

I guess it depends on what you were expecting it to do :-)

Having just realized this myself (yikes -- see what happens when we
assume(?)), I think I'd more often rather send in a scaled version of
the data and have scale='none' in the heatmap call, to be honest.
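
A small base-R sketch of the point above, not taken from the heatmap.2 source: clustering on dist() is not invariant under scaling, so clustering the raw matrix can group rows differently from clustering a matrix that was scaled first. The toy matrix `M` and the names `hc_raw`, `hc_scaled`, `groups_raw` and `groups_scaled` are made up for illustration, and hclust() is used with its default (complete) linkage on the assumption that this mirrors the heatmap.2 defaults.

```r
# Toy data (hypothetical): column "a" has a much larger spread than column "b".
M <- cbind(a = c(0, 40, 60, 110),
           b = c(0, 1, 0, 1))

# Cluster the rows of the raw matrix: column "a" dominates the distances.
hc_raw <- hclust(dist(M))

# Cluster the rows after scale(), which centers and scales each COLUMN.
hc_scaled <- hclust(dist(scale(M)))

# Cut both trees into two groups and compare the row assignments.
groups_raw    <- unname(cutree(hc_raw, k = 2))
groups_scaled <- unname(cutree(hc_scaled, k = 2))

print(groups_raw)     # rows 2 and 3 end up in the same group
print(groups_scaled)  # rows 2 and 3 end up in different groups
```

In heatmap.2 terms, passing scale(M) with scale='none' would be expected to give a dendrogram like `hc_scaled`, while passing the raw M with scale="column" would be expected to give one like `hc_raw`, with only the colors rescaled.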

-steve

> best
> Elmer
>
> 2010/7/22 Benjamin Otto <b.otto at uke.uni-hamburg.de>
>
>> Hi Guys,
>>
>> do note that the scale option in heatmap doesn't scale your values until
>> AFTER clustering, and only for visualization purposes! So if you provide
>> already scaled data, you should naturally expect a different result.
>>
>> cheers
>>
>> Benjamin
>>
>> Am 22.07.2010 um 16:25 schrieb Bazeley, Peter:
>>
>> > Hi Elmer,
>> >
>> > The default scale option in heatmap.2 scales by row, whereas the scale()
>> > function scales by column, so this is probably why there is a difference. I
>> > think whichever dimension contains unique samples is how you want to scale
>> > (if this was expression data, for example).
>> >
>> >
>> > Pete
>> > ________________________________________
>> > From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] on behalf of Sean Davis [sdavis2 at mail.nih.gov]
>> > Sent: Thursday, July 22, 2010 9:17 AM
>> > To: Elmer Fernández
>> > Cc: Bioconductor mailing list
>> > Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the function gives different results than scaling outside!!!
>> >
>> > 2010/7/22 Elmer Fernández <elmerfer at gmail.com>
>> >
>> >> Dear Users
>> >> I'm working with the heatmap.2 function and I realized that using the
>> >> scale input parameter gives different results than using the scale()
>> >> function outside and feeding heatmap.2 with the scaled matrix. I
>> >> attached the results of the two approaches and the data matrix used (M.csv).
>> >> So, what am I doing wrong?
>> >>
>> >>
>> > Hi, Elmer.
>> >
>> > The default distance function used by heatmap.2 is dist() which is not
>> > going to be invariant under centering and scaling, I don't think. It looks
>> > like you are using that default.
>> >
>> > Sean
>> >
>> >
>> >> R Code
>> >>
>> >> library(gplots)
>> >> M=matrix(c(rnorm(10*3,1,2),rnorm(10*2,-0.5,1)),ncol=5)
>> >> heatmap.2(M,scale="column",trace="none",main="scaled inside")
>> >> x11();heatmap.2(scale(M),scale="none",trace="none",main="scaled outside")
>> >>
>> >>> sessionInfo()
>> >> R version 2.10.0 (2009-10-26)
>> >> x86_64-unknown-linux-gnu
>> >>
>> >> locale:
>> >> [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
>> >> LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
>> >> [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
>> >> LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
>> >> [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
>> >> LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
>> >>
>> >> attached base packages:
>> >> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>> >>
>> >> other attached packages:
>> >> [1] gplots_2.7.4   caTools_1.10   bitops_1.0-4.1 gdata_2.7.1    gtools_2.6.1   rkward_0.5.1
>> >>
>> >> loaded via a namespace (and not attached):
>> >> [1] tools_2.10.0
>> >>
>> >>
>> >> --
>> >> Elmer A. Fernández (Bioing. PhD)
>> >> Investigador Asistente CONICET - Research Assistant CONICET
>> >> Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence @ UCC
>> >> tel: +54-(0)351-4938000 int 145
>> >> Fax: +54-(0)351-4938081
>> >> web page : http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
>> >> http://sites.google.com/site/biologicaldatamininggroup/Home/
>> >> mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
>> >>
>>
>> ___________________________________________
>> Benjamin Otto, PhD
>> University Medical Center Hamburg-Eppendorf
>> Institute For Clinical Chemistry / Central Laboratories
>> Campus Forschung N27
>> Martinistr. 52,
>> D-20246 Hamburg
>>
>> Tel.: +49 40 7410 51908
>> Fax.: +49 40 7410 54971
>> ___________________________________________
>>
>>
>>
>>
>>
>> --
>> Mandatory disclosures pursuant to the German Act on Electronic Commercial
>> Registers, Cooperative Registers and the Company Register (EHUG):
>>
>> Universitätsklinikum Hamburg-Eppendorf
>> Public-law corporation (Körperschaft des öffentlichen Rechts)
>> Place of jurisdiction: Hamburg
>>
>> Members of the Board:
>> Prof. Dr. Jörg F. Debatin (Chairman)
>> Dr. Alexander Kirstein
>> Joachim Prölß
>> Prof. Dr. Dr. Uwe Koch-Gromus
>>

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

------------------------------

Message: 22
Date: Thu, 22 Jul 2010 13:14:34 -0700
From: Hervé Pagès <hpages at fhcrc.org>
To: Erik Wright <eswright at wisc.edu>
Cc: BioC list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Biostrings - vcountPattern optimization
Message-ID: <4C48A6AA.2050407 at fhcrc.org>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hi Erik,

On 07/22/2010 10:32 AM, Erik Wright wrote:
> Hi Patrick,
>
> Thanks, this looks promising.  Three possible complications are:
> (1)  The fragments are not all the same width.  Is this possible with PDict?

Yes, but given requirement (2), you need another solution.

> (2)  I allow a variable number of mismatches based on each individual fragment's width.

So given (1) and (2), you could group your fragments by equal length,
make a PDict object for each group, and use a single number of
mismatches for that group (seems like this number only depends on
the length of the fragment).
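
The grouping step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the fragment strings and the names `groups` and `allowed` are hypothetical, and the floor(width / 5) rule is the one from the example quoted in this thread.

```r
# Hypothetical fragments of three different widths (7, 10 and 15 bases).
fragments <- c("GATTACA", "ACTGACTGAC", "TTTTTCCCCC", "ACGTACGTACGTACG")

# Group the fragments by width; each element of `groups` holds
# fragments of one constant width.
groups <- split(fragments, nchar(fragments))

# One mismatch allowance per group, using the floor(width / 5) rule
# from the example in this thread.
allowed <- floor(as.integer(names(groups)) / 5)
names(allowed) <- names(groups)

print(groups)
print(allowed)
```

Each element of `groups` could then be wrapped in DNAStringSet() and turned into its own PDict object, with the matching entry of `allowed` as the single number of mismatches for that group, subject to whatever limit PDict places on the number of mismatches.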

> (3)  The fragments sometimes include ambiguity letters (IUPAC extended letters).

Unfortunately ambiguities are supported only in the subject at the
moment. But you could still treat them separately with vcountPattern()
in a loop.
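
A minimal base-R sketch of separating the two kinds of fragments, so that only the ones containing ambiguity letters go through the slower loop. The fragment strings reuse those from this thread; the names `is_plain`, `plain_fragments` and `ambiguous_fragments` are made up for illustration.

```r
fragments <- c("ACS", "NCCAGAA", "ACTG", "AAAA")

# TRUE for fragments made up only of the letters A, C, G and T.
is_plain <- grepl("^[ACGT]+$", fragments)

plain_fragments     <- fragments[is_plain]   # candidates for a PDict
ambiguous_fragments <- fragments[!is_plain]  # handled one at a time in a loop

print(plain_fragments)
print(ambiguous_fragments)
```

For the ambiguous fragments, I believe vcountPattern() needs fixed=FALSE for the IUPAC letters in the pattern to be treated as ambiguities rather than literal letters; that is worth checking in ?vcountPattern before relying on it.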

>
> A more accurate example would be:
>
> #### start ####
> fragments<- DNAStringSet(c("ACS","NCCAGAA")) # no indels
> sequence_set<- DNAStringSet(c("ATAGCATACKACCA","GATTACGTACCADADATTACA")) # variable widths
> for (i in 1:length(fragments)) {
> 	counts<- vcountPattern(fragments[[i]],
> 		sequence_set,
> 		max.mismatch=floor(length(fragments[[i]])/5)) # variable mis-matches
> 	hits<- length(which(counts>  0))
> 	print(hits)
> }
> #### end ####
>
> Do you think it is possible to make this work with PDict for a speed improvement?

With max.mismatch being a fifth of the fragment length, it will be
between 6 (for 30bp fragments) and 26 (for 130bp fragments).
Unfortunately, that's far more mismatches than PDict()/vcountPDict()
can handle.

Cheers,
H.

>
> Thanks again!,
> Erik
>
>
>
> On Jul 22, 2010, at 12:11 PM, Patrick Aboyoun wrote:
>
>> Erik,
>> Have you tried vcountPDict? It will use an Aho-Corasick matching algorithm (http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm) that is pretty fast, albeit memory intensive.
>>
>> library(Biostrings)
>> fragments<- DNAStringSet(c("ACTG","AAAA"))
>> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>> pdict<- PDict(fragments)
>> counts<- vcountPDict(pdict, sequence_set)
>>
>>> counts
>>      [,1] [,2]
>> [1,]    0    0
>> [2,]    0    0
>>
>>> sessionInfo()
>> R version 2.12.0 Under development (unstable) (2010-07-18 r52554)
>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] Biostrings_2.17.26 IRanges_1.7.13
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.9.0 tools_2.12.0
>>
>>
>>
>>
>> Patrick
>>
>>
>> On 7/22/10 8:54 AM, Erik Wright wrote:
>>> Hello,
>>>
>>> Lately I have been working on counting sequence fragments in larger sets of sequences.  I am searching for thousands of fragments of 30 to 130 bases in hundreds of thousands of sequences between 1200 and 1600 bases.  Currently I am using the following method to count the number of "hits":
>>>
>>> #### start ####
>>> library(Biostrings)
>>> fragments<- DNAStringSet(c("ACTG","AAAA"))
>>> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>>>
>>> for (i in 1:length(fragments)) {
>>> 	counts<- vcountPattern(fragments[[i]],
>>> 		sequence_set,
>>> 		max.mismatch=1)
>>> 	hits<- length(which(counts>   0))
>>> 	print(hits)
>>> }
>>> #### end ####
>>>
>>> This method is taking a long time to complete, so I am wondering if I am doing this in the most efficient manner?  Does anyone have a suggestion for how I can accomplish the same task more efficiently?
>>>
>>> Thanks!,
>>> Erik
>>>
>>>
>>>
>>>
>>>
>>>> sessionInfo()
>>>>
>>> R version 2.11.0 (2010-04-22)
>>> x86_64-apple-darwin9.8.0
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] Biostrings_2.16.0 IRanges_1.6.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.8.0
>>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

------------------------------

Message: 23
Date: Thu, 22 Jul 2010 17:14:42 -0300
From: Elmer Fernández <elmerfer at gmail.com>
To: Steve Lianoglou <mailinglist.honeypot at gmail.com>
Cc: Sean Davis <sdavis2 at mail.nih.gov>,	Bioconductor mailing list
	<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
	function	gives different results than scaling outside!!!
Message-ID:
	<AANLkTikLBy_BJnUdymD7aAkEzc0Yl3WscaGrQS7KJnT8 at mail.gmail.com>
Content-Type: text/plain

Dear Steve
You are right when you say that you should scale your data according to what
you want to do, but from the help it is not clear when the scaling is
done. In most R functions, when a scale parameter is present in the
input, you assume that the scaling is performed BEFORE the main
process. That's why I said that it might not be correct.
Dear guys, THANKS for the discussion!! I really appreciated and enjoyed it.

Best
Elmer

2010/7/22 Steve Lianoglou <mailinglist.honeypot at gmail.com>

> Hi,
>
> 2010/7/22 Elmer Fernández <elmerfer at gmail.com>:
> > Hy Benjamin
> > Are you sure about that?
>
> Looking at the source code for heatmap.2 (and heatmap, for that
> matter) it looks as if Benjamin is correct. The scaling is done after
> the clustering.
>
> > If so, I think that it is not correct, right?
>
> I guess it depends on what you were expecting it to do :-)
>
> Having just realized this myself (yikes -- see what happens when we
> assume(?)), I think I'd more often rather send in a scaled version of
> the data and have scale='none' in the heatmap call, to be honest.
>
> -steve

-- 
Elmer A. Fernández (Bioing. PhD)
Investigador Asistente CONICET - Research Assistant CONICET
Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence @ UCC
tel: +54-(0)351-4938000 int 145
Fax: +54-(0)351-4938081
web page : http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
http://sites.google.com/site/biologicaldatamininggroup/Home/
mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina

	[[alternative HTML version deleted]]

------------------------------

Message: 24
Date: Thu, 22 Jul 2010 15:00:56 -0600
From: Sean Davis <sdavis2 at mail.nih.gov>
To: Elmer Fernández <elmerfer at gmail.com>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
	function	gives different results than scaling outside!!!
Message-ID:
	<AANLkTimX3yUbPv2NsYJCRypaxr4Zon5wFknZQbS5TR0I at mail.gmail.com>
Content-Type: text/plain

2010/7/22 Elmer Fernández <elmerfer at gmail.com>

> Hy Benjamin
> Are you sure about that? If so, I think that it is not correct, right?
> best
> Elmer
>

Hi, Elmer.  My reading of the source code for heatmap.2 suggests that
Benjamin is correct.

Sean


	[[alternative HTML version deleted]]

------------------------------

Message: 25
Date: Fri, 23 Jul 2010 09:13:56 +1000 (AUS Eastern Standard Time)
From: Gordon K Smyth <smyth at wehi.EDU.AU>
To: HuW at mskcc.org
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: [BioC]  the design matrix again
Message-ID: <Pine.WNT.4.64.1007230912030.2728 at PC602.alpha.wehi.edu.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

Looks correct.

Gordon

> Date: Tue, 20 Jul 2010 17:44:07 -0400
> From: HuW at mskcc.org
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] the design matrix again
>
>
> Hi everyone,
>
> I know my question has been answered to some extent on the mailing list,
> but I still do not feel very confident about my design. I would really
> appreciate it if anyone could help me with this.
>
> The data set is about patients before and after treatment. I want to find
> the genes whose expression changed between before and after treatment.
> With 3 patients, for example, I did this:
>
>> design
>  patient1 patient2 patient3 treatment14
> 1        1        0        0           0
> 2        0        1        0           0
> 3        0        0        1           0
> 4        1        0        0           1
> 5        0        1        0           1
> 6        0        0        1           1
> attr(,"assign")
> [1] 1 1 1 2
> attr(,"contrasts")
> attr(,"contrasts")$patient
> [1] "contr.treatment"
>
> attr(,"contrasts")$treatment
> [1] "contr.treatment"
>
>> eset.rma.fit = lmFit(eset.rma, design);
>> eset.rma.bayes = eBayes(eset.rma.fit);
>> topTable(eset.rma.bayes, coef = "treatment14", adjust = "BH");
>
> thank you very much.
>
> Wenhuo Hu
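
For reference, a design matrix with the same column names as the one quoted above can be built in base R along these lines. This is a sketch under assumptions, not the poster's actual code: the factor names `patient` and `treatment` are inferred from the "contrasts" attributes in the printed output, and the treatment levels 0 and 14 are guessed from the column name treatment14.

```r
# Hypothetical sample annotation: 3 patients, each measured before
# treatment (coded 0) and after treatment (coded 14).
patient   <- factor(rep(1:3, times = 2))
treatment <- factor(rep(c(0, 14), each = 3))

# No intercept: one column per patient plus one treatment-effect column.
design <- model.matrix(~ 0 + patient + treatment)
print(design)
```

With this paired layout, the treatment14 coefficient is the within-patient change after treatment, which is why coef = "treatment14" is the one passed to topTable().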


------------------------------

Message: 26
Date: Thu, 22 Jul 2010 16:23:53 -0700
From: Thomas Girke <thomas.girke at ucr.edu>
To: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>,
	bioc-sig-sequencing at stat.math.ethz.ch
Subject: [BioC] Open Postdoc Positions
Message-ID: <20100722232353.GA18501 at biocluster.ucr.edu>
Content-Type: text/plain; charset=us-ascii

Dear List Members,

There are currently two open postdoc positions in my group with secured
funding for 3-4 years. One position is in the area of next generation
sequencing and the other one in the chemical informatics field related to
chemical genomics and drug discovery. Both positions will involve a combination
of software development and data analysis/mining tasks. Ideal candidates should
have a strong background in computer sciences and scientific data analysis, and
should be proficient in at least two of the following programming languages:
C/C++, Python and R. Experience with web and database programming is also
beneficial, especially with Python/Django and MySQL/PostgreSQL, respectively. 

To apply, please email your CV with a detailed description of your professional 
skills to thomas.girke at ucr.edu.

Thomas

--
Thomas Girke
Associate Professor of Bioinformatics
Director, IIGB Bioinformatic Facility
Institute for Integrative Genome Biology (IIGB)
1207F Genomics Building
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Personal Site: http://girke.bioinformatics.ucr.edu
Ph: 951-905-5232
Fax: 951-827-5155

------------------------------

Message: 27
Date: Fri, 23 Jul 2010 09:38:50 +0100
From: Heidi Dvinge <heidi at ebi.ac.uk>
To: David martin <vilanew at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] htQPCR
Message-ID: <C49C2983-A056-4DB2-B43C-1D35F91A194E at ebi.ac.uk>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Hello David,

Thanks for the feedback on HTqPCR. I've never really thought of
filtering out samples during my own analysis, hence there is no such
option in filterCtData. The default way is to subset, for example
qPCRset[,c(1:3,5)], or to use sample names as you do in your example.
However, I guess a specific filtering option might also be useful in
other cases, such as removing samples that have a high proportion of
NA values and can therefore be considered failed plates/samples.

I'll put it on the todo list of HTqPCR improvements.

Cheers
\Heidi
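
The two kinds of sample filtering discussed here (by sample name, and by proportion of NA values) can be sketched in base R as follows. A plain matrix stands in for the Ct values of a qPCRset; the sample names, the values and the 0.5 cutoff are all made up for illustration. In principle the same logical index could be used to subset the qPCRset itself, but that is an assumption and is not tested here.

```r
# Made-up Ct matrix standing in for exprs() of a qPCRset: 4 features x 5 samples.
ct <- matrix(as.numeric(20:39), nrow = 4,
             dimnames = list(paste0("gene", 1:4), paste0("sample", 1:5)))
ct[1:3, "sample4"] <- NA  # pretend sample4 mostly failed

# Filter by name. Unlike -match(), %in% tolerates names that are not present.
tofilter <- c("sample1", "sample2", "not_present")
by_name <- ct[, !(colnames(ct) %in% tofilter), drop = FALSE]

# Filter by proportion of missing values (0.5 is an arbitrary cutoff).
na_fraction <- colMeans(is.na(ct))
by_na <- ct[, na_fraction <= 0.5, drop = FALSE]

print(colnames(by_name))
print(colnames(by_na))
```

Negative indexing with match() fails when a name is missing, because match() returns NA for it, and drops every column when nothing is to be removed; the logical index avoids both cases.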

On 22 Jul 2010, at 10:47, David martin wrote:

> Hello,
> I would like to suggest a filtering method based on sample name.
> filterCtData contains a lot of filtering methods, but I didn't see
> any that filters on sample names.
>
> At the moment I use the match function to remove samples from the analysis.
>
> e.g.
> tofilter=c("sample1","sample2",...)
> exprs(qpcrObj)[,-match(tofilter,colnames(exprs(qpcrObj)))]
>
> thanks,
> david

------------------------------

Message: 28
Date: Fri, 23 Jul 2010 10:11:47 +0100
From: Heidi Dvinge <heidi at ebi.ac.uk>
To: "Bass, Kevin" <BassK1 at email.chop.edu>
Cc: BioC List <Bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Problem with function limmaCtData in HTqPCR
	package:	"leading minor of order 2 is not positive definite"
Message-ID: <12D85D00-CC61-4304-9112-7F870CA0A9D9 at ebi.ac.uk>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Hello Kevin,

On 21 Jul 2010, at 19:50, Bass, Kevin wrote:

> Hi,
>
> I am having a problem with using the function limmaCtData on a qPCRset
> object created with the package HTqPCR.  When I try to execute
> limmaCtData, I get the following error:
>
> "Error in chol.default(V) :
>   the leading minor of order 2 is not positive definite"
>
As your traceback() shows, the error originates in lmFit from the
limma package. As I recall, it means that one of the internal design
matrices has become singular. I'm afraid I don't know exactly why this
is happening, but it can be caused by trying to do too much with too
few observations/replicates. Is it possible to use a smaller design
matrix? Looking at your design matrix, it would appear that you have
no replicates of any of the 5 treatments you list there. Based on your
description of the experiment, I'm not sure whether that is really the
case.

By the way, it looks like you have quite a complex plate/sample
combination design compared to a standard qPCR analysis. I can see
why you end up with an object called "raw_monster2" after all the
different rbind and cbind calls ;)

Cheers
\Heidi
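
To illustrate the point about replicates, here is a small base R sketch. It compares the design printed in the quoted message (one sample per group) with a hypothetical layout of three samples per group. The three-per-group layout is only an assumption based on the description of three plate sets, and it is not a claim about how the qPCRset should actually be assembled.

```r
groups <- c("VehFA", "LowFA", "MidFA", "HighFA", "NoTreat")

# One sample per group, as in the design printed in the quoted message.
single <- model.matrix(~ 0 + factor(groups, levels = groups))
df_single <- nrow(single) - qr(single)$rank

# Hypothetical layout: three samples per group.
g3 <- factor(rep(groups, times = 3), levels = groups)
replicated <- model.matrix(~ 0 + g3)
colnames(replicated) <- groups
df_replicated <- nrow(replicated) - qr(replicated)$rank

cat("residual df, one sample per group:   ", df_single, "\n")
cat("residual df, three samples per group:", df_replicated, "\n")
```

With five samples and five coefficients no residual degrees of freedom are left, so the between-sample variance cannot be estimated from the samples alone.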

> Below, I will describe the experimental design and the steps taken to
> create my qPCRset object.  Then I will paste the commands used, and
> their results, in the steps leading up to running the limmaCtData
> function on my qPCRset object.
>
> We have 21 96-well plates.  Each plate contains 5 experimental groups
> and 4 genes--2 target genes, and 2 endogenous controls.  Each
> experimental group sampled all 4 genes, and there were 3 biological
> replicates per sample, for a total of 12 wells per experimental group.
>
> Every 7 plates among the total 21 plates constitutes a "set" of
> plates: they each contain the same 14 target genes.  This means that
> each gene, in each experimental condition, has 3 samples among the 21
> plates--one sample per experimental condition for each 7-plate set.
>
> The goal is to compare the Ct values for each gene in each
> experimental group, to the Ct values for the same gene in every other
> experimental group.
>
> Using rbind (HTqPCR), I collated 7 of the data files into one file,
> so that all 14 genes could be analyzed simultaneously, at least among
> a single set of plates--once I had figured that part out, I had
> planned on combining the 3 sets.
>
> To give a clear idea what my data looks like--and how it was
> implemented in my qPCRset object--this is the Slot "history" and Slot
> "exprs" of my combined qPCRset object (with the data removed):
>
> Slot "exprs":
>         01_veh+FA       02_low+FA       03_mid+FA       04_high+FA       05_no_treatment
> PGES
> PGES
> PGES
> c-Fos
> c-Fos
> c-Fos
> SPP1
> SPP1
> SPP1
> CD200
> CD200
> CD200
> COX-1
> COX-1
> COX-1
> COX-2
> COX-2
> COX-2
> OX-42
> OX-42
> OX-42
> iBA-1
> iBA-1
> iBA-1
> IL-2
> IL-2
> IL-2
> IL-4
> IL-4
> IL-4
> IL-6
> IL-6
> IL-6
> IL-8
> IL-8
> IL-8
> IL-10
> IL-10
> IL-10
> CD4
> CD4
> CD4
>
> Slot "history":
>                                                        history
> 1   raw8: readCtData(files = "NS398_08b.txt", path = barrPath,
>     n.features = 12,
> 2   flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 3   header = TRUE, n.data = 5)
> 4   raw9: readCtData(files = "NS398_09b.txt", path = barrPath,
>     n.features = 12,
> 5   flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 6   header = TRUE, n.data = 5)
> 7   raw10: readCtData(files = "NS398_10b.txt", path = barrPath,
>     n.features = 12,
> 8   flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 9   header = TRUE, n.data = 5)
> 10  raw11: readCtData(files = "NS398_11b.txt", path = barrPath,
>     n.features = 12,
> 11  flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 12  header = TRUE, n.data = 5)
> 13  raw12: readCtData(files = "NS398_12b.txt", path = barrPath,
>     n.features = 12,
> 14  flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 15  header = TRUE, n.data = 5)
> 16  raw13: readCtData(files = "NS398_13b.txt", path = barrPath,
>     n.features = 12,
> 17  flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 18  header = TRUE, n.data = 5)
> 19  raw14: readCtData(files = "NS398_14b.txt", path = barrPath,
>     n.features = 12,
> 20  flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 21  header = TRUE, n.data = 5)
> 22  rbind(deparse.level, ..1, ..2, ..3, ..4, ..5, ..6, ..7)
> 23  normalizeCtData(q = raw_monster2, norm = "deltaCt",
>     deltaCt.genes = "GAPDH")
> 24  filterCtDataNew(q = d.raw2, remove.type = "Endogenous
>     Control")
> 25  setCategory(q = fd.raw2, Ct.max = 100, Ct.min = 0,
>     quantile = 0.9,
>
> So, then I prepared the matrix for analysis with limma:
>
>> design<-model.matrix(~0+sampleNames(test.d.raw2))
> Warning message:
> In model.matrix.default(~0 + sampleNames(test.d.raw2)) :
>   variable 'sampleNames(test.d.raw2)' converted to a factor
>> colnames(design)<-c("VehFA","LowFA","MidFA","HighFA","NoTreat")
>> print(design)
>   VehFA LowFA MidFA HighFA NoTreat
> 1     1     0     0      0       0
> 2     0     1     0      0       0
> 3     0     0     1      0       0
> 4     0     0     0      1       0
> 5     0     0     0      0       1
> attr(,"assign")
> [1] 1 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$`sampleNames(test.d.raw2)`
> [1] "contr.treatment"
>> contrasts<-makeContrasts(VehFA-LowFA, VehFA-MidFA, VehFA-HighFA,
> + VehFA-NoTreat, LowFA-MidFA, LowFA-HighFA, LowFA-NoTreat,
> + MidFA-HighFA, MidFA-NoTreat,HighFA-NoTreat, levels=design)
>> colnames(contrasts)<-c("V-L", "V-M", "V-H", "V-NT", "L-M", "L-H",
> + "L-NT", "M-H", "M-NT", "H-NT")
>> print(contrasts)
>          Contrasts
> Levels    V-L V-M V-H V-NT L-M L-H L-NT M-H M-NT H-NT
>   VehFA     1   1   1    1   0   0    0   0    0    0
>   LowFA    -1   0   0    0   1   1    1   0    0    0
>   MidFA     0  -1   0    0  -1   0    0   1    1    0
>   HighFA    0   0  -1    0   0  -1    0  -1    0    1
>   NoTreat   0   0   0   -1   0   0   -1   0   -1   -1
>> test.d.raw2b<-test.d.raw2[order(featureNames(test.d.raw2)), ]
> ======================================================================
>> qDE.limma <- limmaCtData(test.d.raw2b,design=design,
> + contrasts=contrasts,ndups=3,spacing=1)
> Error in chol.default(V) :
>   the leading minor of order 2 is not positive definite
> In addition: Warning message:
> In sqrt(dfitted.values) : NaNs produced
>> traceback()
> 6: .Call("La_chol", as.matrix(x), PACKAGE = "base")
> 5: chol.default(V)
> 4: chol(V)
> 3: gls.series(y$exprs, design = design, ndups = ndups,
>        spacing = spacing, block = block, correlation = correlation,
>        weights = weights, ...)
> 2: lmFit(data, design = design, ndups = ndups, spacing = spacing,
>        correlation = dup.cor$consensus, ...)
> 1: limmaCtData(test.d.raw2b, design = design, contrasts = contrasts,
>        ndups = 3, spacing = 1)
>
> Any ideas on why I am getting this error and what I might do to avoid
> it?  If there is any other information needed, please let me know.
>
> Thanks,
> Kevin
> bassk1 at email.chop.edu
>
>
>
> =====
>
> Kevin Bass, Research Technician
> Barr Lab
> Children's Hospital of Philadelphia
> Abramson Research Center
> 3615 Civic Center Blvd, Suite 714
> Philadelphia PA 19104-4399
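
As an aside, the ten pairwise contrasts typed out by hand in the quoted message can also be generated programmatically. The sketch below uses only base R and produces a matrix with the same layout as the one printed by makeContrasts above. The column names are built from the full group names rather than the abbreviations used in the post.

```r
groups <- c("VehFA", "LowFA", "MidFA", "HighFA", "NoTreat")

# All unordered pairs of groups; each column of 'pairs' is one comparison.
pairs <- combn(groups, 2)

# For each pair, +1 for the first group, -1 for the second, 0 elsewhere.
contrast_matrix <- apply(pairs, 2, function(p) {
  as.numeric(groups == p[1]) - as.numeric(groups == p[2])
})
dimnames(contrast_matrix) <- list(
  Levels    = groups,
  Contrasts = paste(pairs[1, ], pairs[2, ], sep = "-")
)
print(contrast_matrix)
```

Generating the pairs with combn() avoids typing each comparison by hand, which matters more as the number of groups grows.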

------------------------------

Message: 29
Date: Fri, 23 Jul 2010 05:50:49 -0400
From: Vincent Carey <stvjc at channing.harvard.edu>
To: bioconductor <bioconductor at stat.math.ethz.ch>
Subject: [BioC] building a refseq-based transcriptDb: warnings of
	interest?
Message-ID:
	<AANLkTikXJH9DBszeynccWST2HJ15SbohmBnd8E46m5_- at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

> hg18r.txdb = makeTranscriptDbFromUCSC(tablename="refGene")
Download the refGene table ... OK
Download the refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
exon_locs$start[[i]],  ... :
  UCSC data anomaly in transcript NM_017940: the cds cumulative length
is not a multiple of 3
2: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
exon_locs$start[[i]],  ... :
  UCSC data anomaly in transcript NM_001037675: the cds cumulative
length is not a multiple of 3
3: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
exon_locs$start[[i]],  ... :
  UCSC data anomaly in transcript NM_001039703: the cds cumulative
length is not a multiple of 3
4: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
exon_locs$start[[i]],  ... :

and so on.  Does this need to be reported to UCSC?
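
For context, the warnings flag transcripts whose coding sequence, summed over exons, has a length that is not a multiple of 3. A rough sketch of that kind of check follows. The function cds_length and all coordinates are hypothetical (1-based, closed intervals); this is not the GenomicFeatures implementation.

```r
# Sum the part of each exon that lies inside the CDS interval.
cds_length <- function(exon_start, exon_end, cds_start, cds_end) {
  s <- pmax(exon_start, cds_start)
  e <- pmin(exon_end, cds_end)
  sum(pmax(e - s + 1, 0))  # exons entirely outside the CDS contribute 0
}

# Made-up transcript with two exons.
exon_start <- c(101, 301)
exon_end   <- c(200, 450)

ok_len  <- cds_length(exon_start, exon_end, cds_start = 151, cds_end = 400)
bad_len <- cds_length(exon_start, exon_end, cds_start = 151, cds_end = 401)

cat("CDS length", ok_len,  "multiple of 3:", ok_len  %% 3 == 0, "\n")
cat("CDS length", bad_len, "multiple of 3:", bad_len %% 3 == 0, "\n")
```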

>  sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-30 r52417)
Platform: x86_64-apple-darwin10.3.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
[1] GenomicFeatures_1.1.6 GenomicRanges_1.1.15  IRanges_1.7.13
[4] weaver_1.15.0         codetools_0.2-2       digest_0.4.2

loaded via a namespace (and not attached):
[1] BSgenome_1.17.5    Biobase_2.9.0      Biostrings_2.17.26 DBI_0.2-5
[5] RCurl_1.4-2        RSQLite_0.9-1      XML_3.1-0          biomaRt_2.5.1
[9] rtracklayer_1.9.3

------------------------------

End of Bioconductor Digest, Vol 89, Issue 22
********************************************
