[BioC] DEXSeq many exons marked as not testable

Wed Sep 19 13:02:41 CEST 2012

Hello,

I'm using DEXSeq to obtain differentially expressed exons between my 15 
samples and 8 controls. But for some reason most exons are not testable 
and are not assigned a p-value. Do you know what I'm doing wrong?

First I've created a conservative list of exons per gene in the human 
genome through the canonical transcripts table in UCSC (194511 in 
total). Next I've obtained read counts for each exon in each sample 
which are read into one big data.frame. Then the following list of 
commands are executed:

metadata = data.frame(
   row.names=colnames(dat)[7:29],
   condition=c(rep("control", 8), rep("data", 15)),
   replicate=c(c(1:8), c(1:15)),
   libType=rep("paired-end", 23))

## Add strand information that is not available in the read counts 
data.frame
dat = cbind(dat, strand=rep(NA, nrow(dat)))

cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata,
                        geneIDs = dat[,5], exonIDs = dat[,6],
                        exonIntervals=dat[,c(2,3,4, 30)])

cds = estimateSizeFactors(cds)
cds = estimateDispersions(cds, minCount=2, maxExon=90)
cds = fitDispersionFunction(cds)
cds = testForDEU(cds)
cds = estimatelog2FoldChanges(cds)

Now lets see how many exons are expressed significantly different 
between data and controls. Surprisingly only 1850 exons are described 
here, not the full 194511:

res1 = DEUresultTable(cds)
table(res1$padjust < 0.05)

FALSE  TRUE
  1424   426

So I've zoomed in to an example gene. Below it can be seen that indeed 
all exons are marked as not testable. But looking at the individual read 
counts per sample (see further below) it does not make sense that they 
are not testable. Some exons indeed have a low read count and are 
excluded rightfully, but others have enough reads to base dispersion on 
I would think.

If I have understood correctly testable can be false in either of these 
cases:
- Gene has only one exon, dispersion cannot be calculated.
- Readcounts for an exon are too low.
- A combination of the above.

But all do not apply here. Any ideas?

fData for the gene:
                 geneID exonID testable dispBeforeSharing dispFitted 
dispersion pvalue padjust chr    start      end strand transcripts
136687 ENSG00000009780      1    FALSE                NA  0.2026585 
0.2026585     NA      NA   1 28052568 28052672   <NA> <NA>
108973 ENSG00000009780      2    FALSE                NA  0.9549670 
0.9549670     NA      NA   1 28053983 28054047   <NA> <NA>
1100   ENSG00000009780      3    FALSE                NA  1.9560912 
1.9560912     NA      NA   1 28056743 28056844   <NA> <NA>
1      ENSG00000009780      4    FALSE                NA  0.4722995 
0.4722995     NA      NA   1 28059114 28059168   <NA> <NA>
22096  ENSG00000009780      5    FALSE                NA  0.1146813 
0.1146813     NA      NA   1 28060542 28060694   <NA> <NA>
13670  ENSG00000009780      6    FALSE                NA  0.1127563 
0.1127563     NA      NA   1 28071165 28071322   <NA> <NA>
13666  ENSG00000009780      7    FALSE                NA  0.1450013 
0.1450013     NA      NA   1 28075579 28075665   <NA> <NA>
22095  ENSG00000009780      8    FALSE                NA  0.1160550 
0.1160550     NA      NA   1 28081706 28081841   <NA> <NA>
13665  ENSG00000009780      9    FALSE                NA  0.1129432 
0.1129432     NA      NA   1 28086037 28086138   <NA> <NA>
138629 ENSG00000009780     10    FALSE                NA  0.1112855 
0.1112855     NA      NA   1 28087006 28087092   <NA> <NA>

Read counts for the gene per sample:
                exon_id chr    start      end         gene_id exon_nr 
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8
1      ENSE00000327880   1 28059114 28059168 ENSG00000009780 4      
11       8       4       7       3       1       8       3
1100   ENSE00000571221   1 28056743 28056844 ENSG00000009780 3       
1       0       1       1       0       0       0       1
13665  ENSE00000761751   1 28086037 28086138 ENSG00000009780 9      
91     145      86      47     169      36     231     136
13666  ENSE00000761753   1 28075579 28075665 ENSG00000009780 7      
19      62      32      18      44      23      61      38
13670  ENSE00000761780   1 28071165 28071322 ENSG00000009780 6      
77     139      73      66      78      30     168      88
22095  ENSE00000866714   1 28081706 28081841 ENSG00000009780 8      
78      94      52      47      40      41     121      60
22096  ENSE00000866716   1 28060542 28060694 ENSG00000009780 5      
36      61      50      63      67      10     129      69
108973 ENSE00001625313   1 28053983 28054047 ENSG00000009780 2       
3       5       3       0       1       0       4       2
136687 ENSE00001909353   1 28052568 28052672 ENSG00000009780 1       
2      12       5       0      19       0      34      10
138629 ENSE00001955917   1 28087006 28087092 ENSG00000009780 10      
42      96      50      24      98      23     117      79
        sample9 sample10 sample11 sample12 sample13 sample14 sample15 
sample16 sample17 sample18 sample19 sample20 sample21 sample22
1            1        1        0        3        4        1 1        
0        3        7        0        0        1        0
1100         2        0        0        0        0        1 1        
0        0        3        1        0        0        0
13665       78       64      130       55       66       27 75       
53       49      126       36       37       48       53
13666        7       20       51        6       43        8 15       
19       11       26        7       15       11        5
13670       75       85      186       79       86       58 70       
86       47      149       82       49       66       38
22095       64       64      149       51       91       37 67       
54       58      148       54       41       64       43
22096       90       79      287       66       97       29 75       
92       58      126       67       55       71       51
108973       1        0        0        1        1        0 1        
1        0        4        0        0        0        0
136687       5       28       31        6       27        5 4       
12        5       32        6        7        9       16
138629     119      111      233      100      129       59 141      
103       70      219       92       60       74       67
        sample23 strand
1             0     NA
1100          0     NA
13665        24     NA
13666         7     NA
13670        27     NA
22095        25     NA
22096        21     NA
108973        1     NA
136687        1     NA
138629       54     NA

Design of the experiment:
          condition replicate    libType
sample1    control         1 paired-end
sample2    control         2 paired-end
sample3    control         3 paired-end
sample4    control         4 paired-end
sample5    control         5 paired-end
sample6    control         6 paired-end
sample7    control         7 paired-end
sample8    control         8 paired-end
sample9       data         1 paired-end
sample10      data         2 paired-end
sample11      data         3 paired-end
sample12      data         4 paired-end
sample13      data         5 paired-end
sample14      data         6 paired-end
sample15      data         7 paired-end
sample16      data         8 paired-end
sample17      data         9 paired-end
sample18      data        10 paired-end
sample19      data        11 paired-end
sample20      data        12 paired-end
sample21      data        13 paired-end
sample22      data        14 paired-end
sample23      data        15 paired-end

 > sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i686-pc-linux-gnu (32-bit)

locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C LC_TIME=en_GB.UTF-8        
LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
  [6] LC_MESSAGES=C              LC_PAPER=C LC_NAME=C                  
LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base

other attached packages:
[1] plyr_1.7.1          DEXSeq_1.2.1        BiocInstaller_1.4.3 
DESeq_1.8.3         locfit_1.5-7        Biobase_2.16.0
[7] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
  [1] annotate_1.34.0      AnnotationDbi_1.18.0 biomaRt_2.12.0 
DBI_0.2-5            genefilter_1.38.0    geneplotter_1.34.0
  [7] grid_2.15.0          hwriter_1.3          IRanges_1.14.2 
lattice_0.20-6       RColorBrewer_1.0-5   RCurl_1.91-1
[13] RSQLite_0.11.1       splines_2.15.0       statmod_1.4.15 
stats4_2.15.0        stringr_0.6          survival_2.36-12
[19] tools_2.15.0         XML_3.9-4            xtable_1.7-0

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.