[BioC] preparing sequencing data for use with anota

Nils Grabole [guest] guest at bioconductor.org
Wed Mar 26 13:43:06 CET 2014

I would like to analyse my sequencing data with anota, starting with the function "anotaPerformQc".
Regrettably I get the following error message:

anotaQcOut <- anotaPerformQc(dataT= my_data_cytosolic_mRNA, dataP=my_data_translational_Activity, phenoVec=vec, nDfbSimData=500, useProgBar=TRUE)

Running anotaPerformQc quality control
        Calculating omnibus interactions & effects and dfbetas
Error in if (groupSlope[i] > 1 | groupSlope[i] < 0) { : missing value where TRUE/FALSE needed

> traceback()
1: anotaPerformQc(dataT = t, dataP = r, phenoVec = vec, nDfbSimData = 500, 
       useProgBar = TRUE)

My input data looks as follows:

> head(my_data_cytosolic_mRNA)
        1  2 3 4 5 6 7  8
A2M     3  0 7 0 6 4 5 13
A2ML1   4 11 3 0 3 1 6  3
A2MP1   2  2 2 0 0 2 2  6
A3GALT2 0  1 1 0 0 0 1  3
A4GALT  0  0 0 0 0 0 0  0
A4GNT   0  0 3 0 0 0 1  0
> head(my_data_translational_Activity)
        1 2  3 4 5  6 7 8
A2M     9 0 18 4 9 41 0 0
A2ML1   4 5  1 1 0  0 2 0
A2MP1   0 0  0 0 0  0 0 0
A3GALT2 2 0  1 0 1  1 5 0
A4GALT  0 0  0 0 0  0 0 0
A4GNT   0 0  0 0 0  0 0 0
> vec
[1] "wt"  "wt"  "wt"  "wt"  "mut" "mut" "mut" "mut"
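
One thing that may be worth checking here (an unconfirmed guess, not something stated in the anota documentation): a regression slope cannot be estimated for a gene whose values do not vary across samples, and the head() output above contains such genes, e.g. A4GALT. The following is a minimal sketch that flags rows that are constant across all samples. The matrix holds three rows copied from the output above, and the helper name isConstantRow is hypothetical, not part of anota.

```r
# Three rows copied from head(my_data_cytosolic_mRNA) above
dataT <- matrix(c(3, 0, 7, 0, 6, 4, 5, 13,
                  0, 0, 0, 0, 0, 0, 0, 0,
                  0, 0, 3, 0, 0, 0, 1, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A2M", "A4GALT", "A4GNT"),
                                as.character(1:8)))

# Hypothetical helper: TRUE for rows in which every sample has the same value
isConstantRow <- function(mat) {
  apply(mat, 1, function(x) length(unique(x)) == 1)
}

flagged <- isConstantRow(dataT)

# Names of the genes that show no variation at all
constantGenes <- names(flagged)[flagged]
print(constantGenes)
```

The same check could be run on the translational activity matrix. Whether removing such rows is enough to avoid the error is an assumption that would need to be tested on the full data.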

I read the anota vignette and reference manual, which mention "groupSlope" in the explanation of the "omniGroupStats" argument. The arguments for the input data are simply described as "data matrix with non numerical rownames".
Looking at the sample data provided with the package (see below) I ASSUME I need to process the sequencing count data before I use it within anota.

> head(anota_example_counts)
      yorf     norm     dens count  len  total
1 15S_rRNA 1471.349 1261.805  2111 1673 857584
2 21S_rRNA 1192.194 1022.406  4563 4463 857584
3     HRA1    0.000    0.000     0  588 857584
4     LSR1 1548.272 1327.773  1592 1199 857584
5     NME1  105.715   90.659    33  364 857584
> head(anota_example_processed)
15S_rRNA 5.6848584
21S_rRNA 5.3864571
HRA1     0.5289467
LSR1     5.7882936
NME1     2.9789340

In the following paper introducing the anota package (http://www.pnas.org/content/107/50/21487.long) I found how the authors processed the sequencing data for analysis:
"For the sequencing dataset, we used the count data
supplied by the authors, filtered for identifiers originating from the coding
regions, and used quantile normalization and a transformation to stabilize
the variance."

In case I am right that my data needs processing first, could somebody please suggest how to perform "quantile normalization and a transformation to stabilize the variance" on my data?
If the error I get is due to something else, please let me know how to solve my problem.
I am new to R and bioconductor, please accept my apologies if I have overlooked something obvious.
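
To make the question concrete, here is a rough sketch of what such preprocessing might look like in base R. It rests on several assumptions: log2(count + 1) is used as a simple stand-in for the variance-stabilizing transformation (the quoted sentence does not say which transformation the authors used), the order of the two steps is a guess, and the quantileNormalize helper is a hypothetical, simplified implementation that breaks ties by position. The limma package, which is already attached in the session below, provides normalizeBetweenArrays(x, method = "quantile") as a maintained alternative. The counts are the six rows shown in head(my_data_cytosolic_mRNA).

```r
# Counts copied from head(my_data_cytosolic_mRNA)
counts <- matrix(c(3,  0, 7, 0, 6, 4, 5, 13,
                   4, 11, 3, 0, 3, 1, 6,  3,
                   2,  2, 2, 0, 0, 2, 2,  6,
                   0,  1, 1, 0, 0, 0, 1,  3,
                   0,  0, 0, 0, 0, 0, 0,  0,
                   0,  0, 3, 0, 0, 0, 1,  0),
                 nrow = 6, byrow = TRUE,
                 dimnames = list(c("A2M", "A2ML1", "A2MP1",
                                   "A3GALT2", "A4GALT", "A4GNT"),
                                 as.character(1:8)))

# Step 1 (assumption): drop genes without any reads
kept <- counts[rowSums(counts) > 0, , drop = FALSE]

# Step 2 (assumption): log2 with a pseudocount of 1 as a simple transformation
logged <- log2(kept + 1)

# Step 3: simplified quantile normalization; ties are broken by position
quantileNormalize <- function(mat) {
  ranks  <- apply(mat, 2, rank, ties.method = "first")
  sorted <- apply(mat, 2, sort)
  target <- rowMeans(sorted)  # mean of the k-th smallest value across samples
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(mat)
  out
}

processed <- quantileNormalize(logged)
print(round(processed, 2))
```

After this step every column has the same set of values, which is the defining property of quantile normalization. For a real analysis the same genes would presumably have to be kept in both the cytosolic mRNA and the translational activity matrix.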

Thank you very much for your help!


 -- output of sessionInfo(): 

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                       
[5] LC_TIME=German_Switzerland.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.4.2   limma_3.18.13 anota_1.10.0  qvalue_1.36.0

loaded via a namespace (and not attached):
 [1] Biobase_2.22.0     BiocGenerics_0.8.0 MASS_7.3-30        multtest_2.18.0    parallel_3.0.2     splines_3.0.2      stats4_3.0.2       survival_2.37-7   
 [9] tcltk_3.0.2        tools_3.0.2

Sent via the guest posting facility at bioconductor.org.
