[BioC] HTqPCR question

Tue Dec 7 19:59:12 CET 2010

Hello Simon,

your questions seems to have made it to the Bioconductor list with a few
days delay, so I'm continuing our off-list discussion from last week here
(everything copied in below).

Regarding the normalisation, it sounds like you might want to try either
the "norm.rankinvariant" or "scale.rankinvariant" methods in
normalizeCtData. Both of them identifies a set of rank-invariant genes,
i.e. don't rely on e.g. actin. "scale.rankinvariant" simply scales each
qPCR card by a fixed factor identified based on the rank-invariant genes.
"norm.rankinvariant" fits a smoothing line through the rank-invariant
genes, and then applies this smoothing to the rest of the genes.

The latter works best if there is more than a few rank-invariant genes.
For a comparison of the data before/after normalisation, you can do a
simple scatterplot, similar to figure 7 in the vignette (also included in
the examples in ?normalizeCtData). This gives you an idea of how each
method performs, and what the differences between them are.

Regarding how replicates are treated, then this depends on the function.
Functions such as setCategory() for marking some genes as potentially
unreliable can account for technical replicates within each qPCR plate by
setting replicates=TRUE. However, replicates from different qPCR plates
are treated identically, whether they come from biological or technical
sample replicates.

When testing for differential expression, technical vs biological
replicates can be accounted for using the function limmaCtData, which
incorporates funcitonality from the "limma" package. There are some
examples of this in the limma user manual, accessible via
limmaUsersGuide() in R, especially in section 8.2.

In your case that would involve calling duplicateCorrelation() first,
followed by limmaCtData(). However, if you give some more details about
which samples you have + what type of replicates they are, we can probably
help you construct you design matrix if you haven't used limma before.

HTH
\Heidi

======================
Off-list conversation copied in below, for the archives
======================

Hi Simon,

> Thanks Heidi,
> I posted my questions (plus a few others), on the forum as well.

hm, I seemed to have overlooked those in my Bioconductor email box. Do you
know which date you sent your questions? I that case I can probably chase
them up.

> So for
> the flags, HTqPCR needs them to be either "Passed" or "Flagged"? Then you
> just substitute using rep as you suggest below?

Actually, the flags can be anything. For an example, I just used the
default qPCRset object that comes with HTqPCR. You can read in data
without specifying flags, and alter that later as you wish.

In the example I email you, the flag should be read in automatically:

test <- readCtData(files="1131135160.csv", # file name
+     n.features=2304, # if you use the 48.48 assay
+     flag=9, # the column containing flags
+     feature=5, # column containing gene names
+     type=6, # column containing probe type
+     Ct=7, # column containing the actual values
+     skip=24, # additional argument to read.table; the number of lines to
skip before hte data starts. Probably different in your file!
+     sep=",") # additional argument to read.table; the data is "," not
"\t" separated.
Warning message:
In readCtData(files = "1131135160.csv", n.features = 2304, flag = 9,  :
  2304 gene names (rows) expected, got 2314
show(test)
An object of class "qPCRset"
Size:  2304 features, 1 samples
Feature types:		 NRC, Test
Feature names:		 T09F3_3_HK T09F3_3_HK F33H1_2_HK ...
Feature classes:
Feature categories:	 OK
Sample names:		 1131135160 NA NA ...
head(flag(test))
  1131135160
1       Pass
2       Pass
3       Pass
4       Pass
5       Pass
6       Pass
table(flag(test))

Fail Pass
 835 1469

> Also, if you could respond to the post on the forum (with potential
> answers to the questions about technical versus biological, and potential
> normalizers, that would be great).

I'm not exactly sure what your question here is. You say that you're not
clear about technical versus biological reps. Do you mean how many you
need, whether to normalise one kind different from another or something
else?

Cheers
\Heidi

>> Hi Simon,
>>
>> thanks for your interest in HTqPCR. In principle, there's no reason why it
>> shouldn't be applicable to Fluidigm data. I used it briefly at some point
>> for one of the 96.96 assays, and I've also had other questions from people
>> on the Bioconductor list using HTqPCR for Fluidigm. I've been meaning to
>> add a section about that to the HTqPCR vignette, but didn't have time for
>> the before the previous bioconductor release.
>>
>> For specifying the flags, you can use the flag(<your_object>) <-
>> <your_flags> syntax. The flags have to be in a data.frame, but apart from
>> that they can be anything. Although I must admit that I never really
>> tested this... E.g. for changing just the first column, if flags are
>> already present:
>>
>> data(qPCRraw)
>> head(flag(qPCRraw))
>>  sample1 sample2 sample3 sample4 sample5 sample6
>> 1  Passed  Passed  Passed Flagged  Passed  Passed
>> 2  Passed  Passed  Passed Flagged  Passed  Passed
>> 3  Passed  Passed  Passed  Passed  Passed  Passed
>> 4  Passed  Passed  Passed  Passed  Passed  Passed
>> 5  Passed  Passed  Passed  Passed  Passed  Passed
>> 6 Flagged Flagged Flagged Flagged Flagged Flagged
>> new.flags <- data.frame(rep("OK", 384))
>> flag(qPCRraw)[,1] <- new.flags
>> head(flag(qPCRraw))
>>  sample1 sample2 sample3 sample4 sample5 sample6
>> 1      OK  Passed  Passed Flagged  Passed  Passed
>> 2      OK  Passed  Passed Flagged  Passed  Passed
>> 3      OK  Passed  Passed  Passed  Passed  Passed
>> 4      OK  Passed  Passed  Passed  Passed  Passed
>> 5      OK  Passed  Passed  Passed  Passed  Passed
>> 6      OK Flagged Flagged Flagged Flagged Flagged
>>
>> Alternatively, it should be possible to read them into a qPCRset object at
>> the same time as your data. My mac has done strange things to the line
>> ending in your file, so this might not work directly for you, but
>> something along the lines of:
>>
>> test <- readCtData(files="1131135160.csv", # file name
>>    n.features=2304, # if you use the 48.48 assay
>>    flag=9, # the column containing flags
>>    feature=5, # column containing gene names
>>    type=6, # column containing probe type
>>    Ct=7, # column containing the actual values
>>    skip=24, # additional argument to read.table; the number of lines to
>> skip before hte data starts. Probably different in your file!
>>    sep=",") # additional argument to read.table; the data is "," not "\t"
>> separated.
>>
>> HTH
>> \Heidi

>>> I'm trying to use the package developed by Heidi Dvinge (HTqPCR), and
I've
>>> a couple of basic questions. The package seems very nice, however, I'm
>>> using a high throughput real-time PCR system, the Biomark from Fluidigm,
>>> which can carry out upto 40,000 PCR's in a single run (approximately).
>>> Most of my runs use about 10,000 PCRs, with multiple technical
replicates,
>>> and occasionally I use several plates if I'm assying many samples, and
>>> dozens of genes.
>>>
>>> Typically, I've been using the geNorm approach for identifying potential
>>> normalizers in a suite of genes in any given biological paradigm, as a
>>> priori its always best to identify a gene which doesnt change in the
>>> question of interest (as opposed to using something like actin, and
>>> assuming it doesnt change). Is there a way of identifying the "best"
>>> normalizer in HTqPCR? Also, I'm having trouble specifying flags. I
>>> understand how to designate the appropriate column in the file as the
flag
>>> column, however, I dont understand how to specify the pass/fail criteria.
>>> In the Biomarks output, each PCR reaction is specified as "Pass" or
>>> "Fail". I've been over the vignette for HTqPCR, and cant see how to
define
>>> the "Faliure" criteria.
>>>
>>> In addition, I'm a bit fuzzy on how HtQPCR handles technical reps as
>>> opposed to biological reps. There is some discussion of this in the
>>> vignette, but I'm having trouble following it.
>>>
>>> Here are a few lines of our typical output from our Biomark machine in
>>> case this helps.
>>>
>>>
>>> Chip Run Filename
>>>
>>> X:\Real-time studies\Biomark\HK_Mar2010\ChipRun.bml
>>>
>>>
>>>
>>>
>>>
>