[BioC] HTqPCR problems
Simon Melov
smelov at buckinstitute.org
Fri Jul 6 20:54:34 CEST 2012
Great,
thanks Heidi. This did the job. Quick question, can you make scatterplots instead of barcharts for any of the plot functions? I've looked at plotCtOverview, and I dont see a way to specify the chart type. Its helpful to have some flexibility here, so you can see how tight the spread is (personal preference I know, as you allow CI to be displayed, I just prefer to see the actual data)
thanks
s
On Jul 6, 2012, at 1:07 AM, Heidi Dvinge wrote:
> Hi Simon,
>
>> Hi Heidi,
>> I've a followup from the question from listed below. I've started to merge
>> multiple plates together using rbind as you suggested. I've identical gene
>> order per plate, but different biological samples belonging to different
>> groups per plate. I can manipulate these using your graphing functions on
>> a per plate basis, but I'm unclear how to address comparisons with regards
>> to groups once the plates are merged.
>>
>> For example, I have 5 text files which tie sample ID to specific groups of
>> interest. After importing and merging the 5 separate Biomark files by
>> rbind, my merged object is 48x240. How do I then contrast the different
>> groups? I'm clear on how to do this in a single plate (using one of the 5
>> individual text files), but confused with regards to the process after
>> merging. Should I have a vector with 240 identifiers for the merged files
>> which is in the same order the merged plates are? (top to bottom (row),
>> and left to right? (by column).
>>
> Yes (presuming I've understood your description correct). You can either
> also just combine your individual text files with samples descriptions, or
> create a whole new object describing each of your samples, as long as the
> order of your descriptions are the same as the columns in your qPCRset
> object.
>
> Alternatively, please note that as of version 1.9.0 (AFAIR) qPCRsets now
> inherit from class ExpressionSet, i.e. they have a slot called phenoData
> (an AnnotatedDataFrame) similar to what objects for microarray analysis
> have. E.g.
>
>> data(qPCRraw)
>> phenoData(qPCRraw)
> An object of class "AnnotatedDataFrame"
> sampleNames: sample1 sample2 ... sample6 (6 total)
> varLabels: sample
> varMetadata: labelDescription
>> pData(qPCRraw)
> sample
> sample1 1
> sample2 2
> sample3 3
> sample4 4
> sample5 5
> sample6 6
>> pData(qPCRraw)$Genotype <- c("A", "B", "A", "A", "B", "B")
>> pData(qPCRraw)
> sample Genotype
> sample1 1 A
> sample2 2 B
> sample3 3 A
> sample4 4 A
> sample5 5 B
> sample6 6 B
>
> You can use this to store information about samples within the qPCRset
> object itself, rather than in 'external' objects. There's more info about
> this in the general help files for:
>
> ?AnnotatedDataFrame
> ?ExpressionSet
>
> \Heidi
>
>
>> thanks
>>
>> Simon
>> On Jun 27, 2012, at 3:27 PM, Heidi Dvinge wrote:
>>
>>>> Hi Heidi,
>>>> you are correct, yes 48.48.
>>>> The example you provide below is exactly what I needed for
>>>> clarification
>>>> for groups. I was trying to reverse engineer what you had done with the
>>>> original expression set package for microarrays, but from below, I can
>>>> get
>>>> this to work now.
>>>>
>>> Glad it works. Hopefully by the next BioConductor release I'll remember
>>> to
>>> clarify the plotCtOverview help file.
>>>
>>>> Just to be clear, I have 5 48.48 plates. Should I normalize each
>>>> individually prior to combining, or should I reformat to a 2304x1 each,
>>>> combine, then normalize (not sure if you can do that or not)
>>>>
>>> Hm, that's one of the questions I've also been asking myself, so I would
>>> be curious to hear what your results from this are.
>>>
>>> If you suspect that there are some major factors influencing the 5
>>> plates
>>> systematically, then normalising them in a 2304 x 5 object should
>>> (hopefully) correct for that. For example, they may have been run on
>>> different days, by different people, or perhaps there was a short power
>>> cut during the processing of one of them. This might be visible if you
>>> have for example a boxplot of Ct from all 48*5 samples, and you see
>>> blocks
>>> of them shifted up or down.
>>>
>>> Obviously, this doesn't take care of normalisation between your samples
>>> within each plate though. If you suspect your samples to have some
>>> systematic variation that you need to account for, then you can
>>> normalise
>>> each plate individually (as a 48 x 48) object. Alternatively, you can
>>> try
>>> to combine within- and between-sample normalisation by taking all 48 x
>>> 240
>>> values at once.
>>>
>>> In principle, you can split, reformat and the recombine the data in
>>> however many ways you like. Personally, with any sort of data I prefer
>>> to
>>> go with as little preprocessing as possible, since each additional step
>>> can potentially introduce its own biases into the data. So unless there
>>> are some obvious variation between your 5 plates, I'd probably stick
>>> with
>>> just normalisation between the samples, e.. using a 48 x 240 object.
>>>
>>> Of course, you may have different preferences, or find out that a
>>> completely different approach is required for this particular data set.
>>>
>>> \Heidi
>>>
>>>> thanks again for your prompt responses!
>>>>
>>>> best
>>>>
>>>> s
>>>>
>>>> On Jun 27, 2012, at 2:26 PM, Heidi Dvinge wrote:
>>>>
>>>>> Hi Simon,
>>>>>
>>>>>> Thanks for the help Heidi,
>>>>>> but I'm still having troubles, your comments on the plotting helped
>>>>>> me
>>>>>> solve the outputs. But if I want to just display some groups (for
>>>>>> example
>>>>>> the LO group in the example below), how do I associate a group with
>>>>>> multiple samples (ie biological reps)?
>>>>>>
>>>>>> Currently I'm associating genes with samples by reading in the file
>>>>>> as
>>>>>> below
>>>>>> plate6=read.delim("plate6Sample.txt", header=FALSE)
>>>>>> #this is a file to associate sample ID with the genes in the biomark
>>>>>> data,
>>>>>> as currently HTqPCR does not seem to associate the sample info in the
>>>>>> Biomark output to the gene IDs
>>>>>>
>>>>> Erm, no, it doesn't :-/
>>>>>
>>>>>> samples=as.vector(t(plate6))
>>>>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48,
>>>>>> n.data=48, samples=samples)
>>>>>> #now I have samples and genes similar to your example in the guide,
>>>>>> but
>>>>>> I
>>>>>> want to associate samples to groups now. In the guide, you have an
>>>>>> example
>>>>>> where you have entire files as distinct samples, but in our runs, we
>>>>>> never
>>>>>> have that situation. I have a file which associates samples to
>>>>>> groups,
>>>>>> which I read in...
>>>>>>
>>>>>> groupID=read.csv("plate6key.csv")
>>>>>>
>>>>>> but how do I associate the samples with their appropriate groups for
>>>>>> biological replicates with any of the functions in HtQPCR?
>>>>>
>>>>> I'm afraid I'm slightly confused here (sorry, long day). Exactly how
>>>>> is
>>>>> your data formatted? I.e. are the columns either individual samples,
>>>>> or
>>>>> from files containing multiple samples? So for example for a single
>>>>> 48.48
>>>>> arrays, is your qPCRset object 2304 x 1 or 48 x 48?
>>>>>
>>>>> From your readCtData command I'm guessing you have 48 x 48, i.e. all
>>>>> 48
>>>>> samples from your 1 array are in columns. In that case the 'groups'
>>>>> parameter in plotCtOverview will need to be a vector of length 48,
>>>>> indicating how you want the 48 columns in your qPCRset object to be
>>>>> grouped together.
>>>>>
>>>>> Below is an example of (admittedly ugly) plots. I don't know if that's
>>>>> similar to your situation at all.
>>>>>
>>>>> \Heidi
>>>>>
>>>>>> # Reading in data
>>>>>> exPath <- system.file("exData", package = "HTqPCR")
>>>>>> raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath,
>>>>>> format
>>>>>> =
>>>>> "BioMark", n.features = 48, n.data = 48)
>>>>>> # Check sample names
>>>>>> head(sampleNames(raw1))
>>>>> [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6"
>>>>>> # Associate samples with (randomly chosen) groups
>>>>>> anno <- data.frame(sampleID=sampleNames(raw1), Group=rep(c("A", "B",
>>>>> "C", "D"), times=c(4,24,5,15)))
>>>>>> head(anno)
>>>>> sampleID Group
>>>>> 1 Sample1 A
>>>>> 2 Sample2 A
>>>>> 3 Sample3 A
>>>>> 4 Sample4 A
>>>>> 5 Sample5 B
>>>>> 6 Sample6 B
>>>>>> # Plot the first gene - for each sample individually
>>>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE,
>>>>> col=1:nrow(anno))
>>>>>> # Plot the first gene - for each group
>>>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group,
>>>>> legend=FALSE, col=1:length(unique(anno$Group)))
>>>>>> # Plot the first gene, using group "A" as a control
>>>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group,
>>>>> legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A")
>>>>>
>>>>>
>>>>>
>>>>>> You recommend below using a vector, but I dont see how that helps me
>>>>>> associate the samples in the Expression set.
>>>>>>
>>>>>> thanks again!
>>>>>>
>>>>>> s
>>>>>>
>>>>>> On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote:
>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm having some troubles selectively sub-setting, and graphing up
>>>>>>>> QPCR
>>>>>>>> data within HTqPCR for Biomark plates (both 48.48 and 96.96
>>>>>>>> plates).
>>>>>>>> I'd
>>>>>>>> like to be able to visualize specific genes, with specific groups
>>>>>>>> we
>>>>>>>> run
>>>>>>>> routinely on our Biomark system. Typical runs are across multiple
>>>>>>>> plates,
>>>>>>>> and have multiple biological replicates, and usually 2 or more
>>>>>>>> technical
>>>>>>>> replicates (although we are moving away from technical reps, as the
>>>>>>>> CVs
>>>>>>>> are so tight).
>>>>>>>>
>>>>>>>> Can anyone help with this? Heidi?
>>>>>>>>
>>>>>>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48,
>>>>>>>> n.data=48, samples=samples)
>>>>>>>> #Ive read the samples in from a separate file, as when you read it
>>>>>>>> in,
>>>>>>>> it
>>>>>>>> doesnt take the sample names supplied in the biomark output#
>>>>>>>> #Next, I want to plot genes of interest, with samples of interest,
>>>>>>>> and
>>>>>>>> I'm
>>>>>>>> having trouble getting an appropriate output#
>>>>>>>>
>>>>>>>> g=featureNames(raw6)[1:2]
>>>>>>>> plotCtOverview(raw6, genes=g, groups=groupID$Treatment,
>>>>>>>> col=rainbow(5))
>>>>>>>>
>>>>>>>> #This plots 1 gene across all 48 samples#
>>>>>>>> #but the legend doesnt behave, its placed on top of the plot, and I
>>>>>>>> cant
>>>>>>>> get it to display in a non-overlapping fashion#
>>>>>>>> #I've tried all sorts of things in par, but nothing seems to shift
>>>>>>>> the
>>>>>>>> legend's position#
>>>>>>>>
>>>>>>> As the old saying goes, whenever you want a job done well, you'll
>>>>>>> have
>>>>>>> to
>>>>>>> do it yourself ;). In this case, the easiest thing is probably to
>>>>>>> use
>>>>>>> legend=FALSE in plotCtOverview, and then afterwards add it yourself
>>>>>>> in
>>>>>>> the
>>>>>>> desired location using legend(). That way, if you have a lot of
>>>>>>> different
>>>>>>> features or groups to display, you can also use the ncol parameter
>>>>>>> in
>>>>>>> legend to make several columns within the legend, such as 3x4
>>>>>>> instead
>>>>>>> of
>>>>>>> the default 12x1.
>>>>>>>
>>>>>>> Alternatively, you can use either xlim or ylim in plotCtOverview to
>>>>>>> add
>>>>>>> some empty space on the side where there's then room for the legend.
>>>>>>>
>>>>>>>> #I now want to plot a subset of the samples for specific genes#
>>>>>>>>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment==
>>>>>>>>> "LFY")
>>>>>>>>> LOY
>>>>>>>> Sample Treatment
>>>>>>>> 2 L20 LFY
>>>>>>>> 5 L30 LFY
>>>>>>>> 7 L45 LO
>>>>>>>> 20 L40 LO
>>>>>>>> 27 L43 LO
>>>>>>>> 33 L29 LFY
>>>>>>>> 36 L38 LO
>>>>>>>> 40 L39 LO
>>>>>>>> 43 L23 LFY
>>>>>>>>
>>>>>>>>
>>>>>>>>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5))
>>>>>>>> Warning messages:
>>>>>>>> 1: In split.default(t(x), sample.split) :
>>>>>>>> data length is not a multiple of split variable
>>>>>>>> 2: In qt(p, df, lower.tail, log.p) : NaNs produced
>>>>>>>>>
>>>>>>>
>>>>>>> Does it make sense if you split by groups=LOY$Treatment? It looks
>>>>>>> like
>>>>>>> the
>>>>>>> object LOY itself is a data frame, rather than the expected vector.
>>>>>>>
>>>>>>> Also, you may have to 'repeat' the col=rainbow() argument to fit
>>>>>>> your
>>>>>>> number of features.
>>>>>>>
>>>>>>>>
>>>>>>>> #it displays the two groups defined by treatment, but doesnt do so
>>>>>>>> nicely,
>>>>>>>> very skinny bars, and the legend doesnt reflect what its
>>>>>>>> displaying#
>>>>>>>> #again, I've tried monkeying around with par, but not sure what
>>>>>>>> HTqPCR
>>>>>>>> is
>>>>>>>> calling to make the plots#
>>>>>>>>
>>>>>>> If the bars are very skinny, it's probably because you're displaying
>>>>>>> a
>>>>>>> lot
>>>>>>> of features. Nothing much to do about that, except increasing the
>>>>>>> width
>>>>>>> or
>>>>>>> your plot :(.
>>>>>>>
>>>>>>> \Heidi
>>>>>>>
>>>>>>>> please help!
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> Simon.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
More information about the Bioconductor
mailing list