[BioC] edgeR, very big lib.size makes CPM very small

Vang Quy Le / Region Nordjylland vql at rn.dk
Mon Aug 18 21:43:49 CEST 2014


Thank you for the confirmation, Gordon. It might be a small thing, but it is very reassuring for me to know.

Best regards
Vang

On 15 Aug 2014, at 03:04, Gordon K Smyth <smyth at wehi.edu.au> wrote:

Date: Wed, 13 Aug 2014 13:37:23 +0000
From: Vang Quy Le / Region Nordjylland <vql at rn.dk>
To: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: [BioC] edgeR, very big lib.size makes CPM very small

Hello,
I am working with a count table that has very large library sizes:
dge@.Data[[2]]$lib.size
[1] 3.2e+08 4.2e+08 4.5e+08 3.8e+08 2.3e+08 2.1e+08 3.3e+08 2.8e+08


This makes the CPM values very small and, consequently, the logCPM values very negative. This is the 'head' of my cpm(counts):

          C1     C2     C3     C4     T1    T2    T3     T4
00000001 0.000 0.0000 0.0000 0.0026 0.0042 0.000 0.000 0.0035
00000002 0.012 0.0092 0.0086 0.0103 0.0042 0.014 0.006 0.0070
00000003 0.073 0.0554 0.0474 0.0620 0.0584 0.056 0.057 0.0525
00000004 0.073 0.0624 0.0496 0.0620 0.0626 0.056 0.060 0.0525
00000005 0.076 0.0624 0.0496 0.0594 0.0584 0.056 0.060 0.0490
00000006 0.067 0.0624 0.0474 0.0620 0.0584 0.046 0.066 0.0630


What concerns me here is that the number of decimal places and the rounding of numbers may cause a loss of sensitivity.

No, not unless you are planning to run R on a 1960's calculator without floating point arithmetic.

Is this something that can affect the outcome of the analysis?

No.  Modern computers with floating point arithmetic have no trouble with trivial issues like this.

Floating point arithmetic means that numbers are not rounded to any fixed number of decimal places.  Rather, all numbers are stored to the same number of significant figures regardless of their absolute size.
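
A minimal base R sketch of this point. The values `small_cpm` and `large_value` are made up for illustration (a single read in a library of 3.2e8 reads, and the same number scaled up by 1e12); `.Machine$double.eps` is the relative spacing of double-precision numbers near 1.

```r
# Relative precision of doubles does not depend on magnitude.
eps <- .Machine$double.eps          # relative spacing of doubles, about 2.2e-16

small_cpm   <- 1 / 3.2e8 * 1e6      # illustrative CPM: one read in 3.2e8 reads
large_value <- small_cpm * 1e12     # same digits, very different magnitude

# A relative change of size eps is detectable at either scale,
# so the small value is stored just as precisely as the large one.
detects_eps <- c(small = small_cpm   * (1 + eps) != small_cpm,
                 large = large_value * (1 + eps) != large_value)

print(detects_eps)
print(format(small_cpm, digits = 15))
```

Both comparisons should come out TRUE, which is another way of saying that a tiny CPM keeps the same number of significant figures as a large one.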

If it does, should I just scale the counts up before putting the data through my workflow?

No, you should not falsify the true nature of your data to edgeR.

Gordon

##### body of 'cpm' function/method #######
{
  x <- as.matrix(x)
  if (is.null(lib.size))
      lib.size <- colSums(x)
  if (log) {
      prior.count.scaled <- lib.size/mean(lib.size) * prior.count
      lib.size <- lib.size + 2 * prior.count.scaled
  }
  lib.size <- 1e-06 * lib.size
  if (log)
      log2(t((t(x) + prior.count.scaled)/lib.size))
  else t(t(x)/lib.size)
}
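
As a rough check of the answers above, the quoted body can be wrapped in a function and run on toy data. This is only a sketch: `cpm_sketch`, the toy `counts` matrix, the two library sizes, and the default `prior.count = 0.25` are assumptions made for illustration, not edgeR's actual interface.

```r
# Hypothetical wrapper around the function body quoted above (not the edgeR API).
cpm_sketch <- function(x, lib.size = NULL, log = FALSE, prior.count = 0.25) {
  x <- as.matrix(x)
  if (is.null(lib.size)) lib.size <- colSums(x)
  if (log) {
    # Prior count is scaled in proportion to each library size
    prior.count.scaled <- lib.size / mean(lib.size) * prior.count
    lib.size <- lib.size + 2 * prior.count.scaled
  }
  lib.size <- 1e-06 * lib.size          # library size in millions
  if (log) {
    log2(t((t(x) + prior.count.scaled) / lib.size))
  } else {
    t(t(x) / lib.size)
  }
}

# Toy counts (made up) with library sizes of the same order as in the post
counts <- matrix(c(1, 5, 30, 0, 4, 25), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("C1", "T1")))
lib <- c(3.2e8, 2.3e8)

cpm_raw <- cpm_sketch(counts, lib.size = lib)
log_cpm <- cpm_sketch(counts, lib.size = lib, log = TRUE)

# Scaling counts and library sizes by the same factor leaves CPM unchanged
cpm_scaled <- cpm_sketch(counts * 1000, lib.size = lib * 1000)
same_cpm <- isTRUE(all.equal(cpm_raw, cpm_scaled))

print(cpm_raw)
print(log_cpm)
print(same_cpm)
```

In this sketch the log-CPM values are negative but perfectly finite, and scaling the counts up changes nothing about the CPM values. It would only change the counts that edgeR sees, which is what Gordon advises against.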


Kind regards,

Vang Quy Le
Bioinformatician, Molecular Biologist, PhD

+45 97 66 56 29
vql at rn.dk

AALBORG UNIVERSITY HOSPITAL
Section for Molecular Diagnostics,
Clinical Biochemistry
Reberbansgade
DK 9000 Aalborg
www.aalborguh.rn.dk
