[BioC] Problems with iteration (sapply) over RNAStringSet
Kemal Akat
kakat at mail.rockefeller.edu
Thu Jul 5 23:15:54 CEST 2012
Hi,
I want to iterate over an RNAStringSet (rs) to do a calculation for each
of the sequences in the form of:
1) get the sequence
2) do the calculations
3) plot the results and
4) use the sequence name (names(rs)) in plot legends and titles,
e.g.
plot(x, main = paste(sequence_name, 'in condition X', sep = ' ')).
The name I want to use is the first field of the FASTA description line;
I don't need the rest of the description. However, extracting the name
does not work as I expected.
The input FASTA file looks like this:
> Gene1 Description
UUUUUUUUUUUUUUUUUUUUUUU
> Gene2 Description
AAAAAAAAAAAAAAAAAAAAAAA
> Gene3 Description
GGGGGGGGGGGGGGGGGGGGGGG
> Gene4 Description
CCCCCCCCCCCCCCCCCCCCCCC
library("Biostrings")
rs = read.RNAStringSet('test.fa')
R> rs
  A RNAStringSet instance of length 4
    width seq                     names
[1]    23 UUUUUUUUUUUUUUUUUUUUUUU Gene1 Description
[2]    23 AAAAAAAAAAAAAAAAAAAAAAA Gene2 Description
[3]    23 GGGGGGGGGGGGGGGGGGGGGGG Gene3 Description
[4]    23 CCCCCCCCCCCCCCCCCCCCCCC Gene4 Description
The following commands return what I was expecting:
R> strsplit(names(rs), split = ' ')[[1]][1]
[1] "Gene1"
R> strsplit(toString(rs), split = ',')[[1]][1]
[1] "UUUUUUUUUUUUUUUUUUUUUUU"
To iterate I wrote this function:
myFun = function(x){
  name = strsplit(names(x), split = ' ')[[1]][1]
  seq = strsplit(toString(x), split = ',')[[1]][1]
  names(seq) = name
  return(seq)
}
However, this returns an error:
R> myFun = function(x){
+ name = strsplit(names(x), split = ' ')[[1]][1]
+ seq = strsplit(toString(x), split = ',')[[1]][1]
+ names(seq) = name
+ return(seq)
+ }
R> sapply(rs, myFun)
Error in strsplit(names(x), split = " ") : non-character argument
Calls: sapply ... lapply -> lapply -> lapply -> FUN -> FUN -> strsplit
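A possible reading of this error, offered as an assumption rather than something confirmed in this post: sapply() seems to hand myFun one sequence at a time, and a single sequence may not carry the names of the set, so names(x) would be NULL inside the function. The base R sketch below only shows that strsplit() rejects NULL in this way; element_names is a hypothetical stand-in for names(x).

```r
# Hypothetical stand-in for what names(x) may return inside myFun
# when x is a single sequence rather than the whole set (assumption).
element_names <- NULL

# strsplit() only accepts character input, so NULL raises an error;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(
  strsplit(element_names, split = " "),
  error = function(e) conditionMessage(e)
)
print(msg)
```

In an English locale the captured message should match the one in the transcript above, which is consistent with names(x) being NULL for each element.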
Simplifying the function to
R> myFun = function(x){
+ seq = strsplit(toString(x), split = ',')[[1]][1]
+ }
returns the sequences, but labeled with the full sequence names as entered in the original FASTA file:
R> sapply(rs, myFun)
        Gene1 Description         Gene2 Description         Gene3 Description
"UUUUUUUUUUUUUUUUUUUUUUU" "AAAAAAAAAAAAAAAAAAAAAAA" "GGGGGGGGGGGGGGGGGGGGGGG"
        Gene4 Description
"CCCCCCCCCCCCCCCCCCCCCCC"
I would appreciate it if anyone could offer a solution or explain why
strsplit does not work inside the sapply loop.
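For illustration, here is a minimal base R sketch of one possible workaround: shorten the names on the whole vector first, then loop by index. The vectors full_names and seqs are hypothetical stand-ins for names(rs) and as.character(rs), and short_names and titles are names introduced only for this example.

```r
# Stand-ins for the real data (assumption): with Biostrings loaded these
# would come from names(rs) and as.character(rs).
full_names <- c("Gene1 Description", "Gene2 Description",
                "Gene3 Description", "Gene4 Description")
seqs <- c("UUUUUUUUUUUUUUUUUUUUUUU", "AAAAAAAAAAAAAAAAAAAAAAA",
          "GGGGGGGGGGGGGGGGGGGGGGG", "CCCCCCCCCCCCCCCCCCCCCCC")

# Keep only the first space-delimited field of each description.
short_names <- sub(" .*$", "", full_names)
names(seqs) <- short_names

# Build one plot title per sequence; paste() is vectorised.
titles <- paste(short_names, "in condition X", sep = " ")

# Loop by index so both the sequence and its short name are available.
for (i in seq_along(seqs)) {
  cat(titles[i], ":", seqs[i], "\n")
  # the per-sequence calculation and plot(..., main = titles[i]) would go here
}
```

Looping over seq_along(seqs) instead of over the elements keeps the name lookup outside the per-element function, which avoids calling names() on a single sequence.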
Thank you!
Kemal
R> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] illuminaHumanv4.db_1.14.0 org.Hs.eg.db_2.7.1
[3] RSQLite_0.11.1 DBI_0.2-5
[5] AnnotationDbi_1.18.1 beadarray_2.6.0
[7] Biobase_2.16.0 ShortRead_1.14.4
[9] latticeExtra_0.6-19 RColorBrewer_1.0-5
[11] Rsamtools_1.8.5 lattice_0.20-6
[13] GenomicRanges_1.8.7 ggplot2_0.9.1
[15] edgeR_2.6.7 limma_3.12.1
[17] Biostrings_2.24.1 IRanges_1.14.3
[19] BiocGenerics_0.2.0 colorout_0.9-9
loaded via a namespace (and not attached):
[1] BeadDataPackR_1.8.0 bitops_1.0-4.1 colorspace_1.1-1
[4] dichromat_1.2-4 digest_0.5.2 grid_2.15.0
[7] hwriter_1.3 labeling_0.1 MASS_7.3-18
[10] memoise_0.1 munsell_0.3 plyr_1.7.1
[13] proto_0.3-9.2 reshape2_1.2.1 scales_0.2.1
[16] stats4_2.15.0 stringr_0.6 tools_2.15.0
[19] zlibbioc_1.2.0