[BioC] Create a sample file /tab delimited file in QuasR
Michael Stadler
michael.stadler at fmi.ch
Fri Nov 1 09:11:29 CET 2013
Hi Chris,
The problem is that "sampleFile" should not contain any sequences; it is
just the file name of a plain text file, which in turn lists all
sequence files to be processed by QuasR.
QuasR comes with a few example files. Since your dataset is single read
(see http://www.ncbi.nlm.nih.gov/sra?term=SRX220452), the following
would be a good starting point:
system.file("extdata","samples_chip_single.txt",package="QuasR")
The easiest approach is to copy this file and then edit it in a plain
text editor, such as Notepad or TextEdit, replacing the file paths and
sample names with the ones from your analysis.
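If it helps, here is a minimal sketch of that copy step done from within R. The target name "mysamples_template.txt" is only a placeholder I made up for illustration; the code does nothing if QuasR is not installed.

```r
# Locate the example sample file shipped with QuasR
# (system.file() returns "" if the package or file cannot be found)
exampleFile <- system.file("extdata", "samples_chip_single.txt",
                           package = "QuasR")

if (nzchar(exampleFile)) {
  # Show the layout: a header line plus one tab-separated line per sequence file
  writeLines(readLines(exampleFile))

  # Copy it to the working directory for editing
  # ("mysamples_template.txt" is a hypothetical name; an existing file is not overwritten)
  file.copy(exampleFile, "mysamples_template.txt")
} else {
  message("QuasR example file not found; is QuasR installed?")
}
```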
Alternatively, the following code should create such a file in your
current working directory with the name "mysamples.txt" (you'll need to
replace "C:/path/to/seqs" with the correct path, and you can use "/" as
the directory separator, even on Windows):
tab <- data.frame(FileName=c("C:/path/to/seqs/SRR653521.fastq"),
SampleName=c("Sample1"))
write.table(tab, "mysamples.txt", quote=FALSE, row.names=FALSE,
col.names=TRUE, sep="\t")
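Before handing the file to QuasR, it may be worth reading it back to check it. The following is a rough sketch using only base R; it writes to a temporary directory so that it can be run as is, and the FASTQ path is still the placeholder from above. The file.exists() check assumes absolute paths, or paths relative to your current working directory.

```r
# Write the sample file (same content as above) to a temporary location
tab <- data.frame(FileName = "C:/path/to/seqs/SRR653521.fastq",
                  SampleName = "Sample1",
                  stringsAsFactors = FALSE)
sampleFile <- file.path(tempdir(), "mysamples.txt")
write.table(tab, sampleFile, quote = FALSE, row.names = FALSE,
            col.names = TRUE, sep = "\t")

# Read it back as tab-delimited text and inspect it
check <- read.delim(sampleFile, stringsAsFactors = FALSE)
print(check)

# For single-read data the header should be FileName and SampleName
headerOk <- identical(colnames(check), c("FileName", "SampleName"))

# Each listed sequence file should exist; with the placeholder path this is FALSE
filesFound <- file.exists(check$FileName)
if (!all(filesFound)) {
  message("Not found: ", paste(check$FileName[!filesFound], collapse = ", "))
}
```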
One more comment on the Ecoli BSgenome object that you intend to use as
a reference: this BSgenome contains several genomes from different
E. coli strains:
library("BSgenome.Ecoli.NCBI.20080805")
help(package="BSgenome.Ecoli.NCBI.20080805")
seqlengths(Ecoli)
NC_008253 NC_008563 NC_010468 NC_004431 NC_009801 NC_009800 NC_002655
  4938920   5082025   4746218   5231428   4979619   4643538   5528445
NC_002695 NC_010498 NC_007946 NC_010473 NC_000913 AC_000091
  5498450   5068389   5065741   4686137   4639675   4646332
QuasR will treat these genomes as if they were separate chromosomes of
a single genome, which may not be what you want. For example, reads
mapping to regions that are identical in the separate genomes will be
randomly assigned. It is probably preferable to select one of the
genomes as a reference, save it to a fasta file, and use that instead
for your "genomeName":
singleGenome <- as(Ecoli[["NC_008253"]], "DNAStringSet")
names(singleGenome) <- "NC_008253"
writeXStringSet(singleGenome, "Ecoli_genome_NC_008253.fa")
genomeName <- "Ecoli_genome_NC_008253.fa"
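With the sample file and the single-genome fasta file in place, the alignment step from the vignette would then look roughly like the sketch below. It assumes that "mysamples.txt" and "Ecoli_genome_NC_008253.fa" are both in your current working directory and that the fastq file listed in the sample file exists; otherwise it either skips the alignment or qAlign will stop with an error.

```r
# Sketch only: file names as used above, both expected in the working directory
sampleFile <- "mysamples.txt"
genomeName <- "Ecoli_genome_NC_008253.fa"

# Only attempt the alignment if QuasR and both input files are available
ready <- requireNamespace("QuasR", quietly = TRUE) &&
  file.exists(sampleFile) && file.exists(genomeName)

if (ready) {
  # qAlign() builds the genome index on first use and aligns
  # every sequence file listed in the sample file
  proj <- QuasR::qAlign(sampleFile, genomeName)
  print(proj)
} else {
  message("QuasR, the sample file, or the genome fasta file is missing; ",
          "nothing was aligned.")
}
```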
I hope this helps,
Michael
On 31.10.2013 17:30, chris [guest] wrote:
>
> I am somewhat new to R and am trying to load a file into QuasR. I downloaded the E. coli ChIP-seq data set SRR653521.sra from NCBI's SRA database (SRA accession number: SRX220452, GEO accession number: GSM1072327). I used the NCBI sratoolkit to convert this to SRR653521.fastq. I am trying to load this into R using QuasR to do alignment, GO/pathway analysis, etc., but first need to get the fastq data into R for use with QuasR. I am using Windows 7, with current R, Bioconductor, and required packages.
>
> I followed the vignette found with browseVignettes(), and opened "An Introduction to QuasR". As in section 2.3, I loaded:
>
> library(QuasR)
> library(BSgenome)
> library(Rsamtools)
> library(rtracklayer)
> library(GenomicFeatures)
> library(Gviz)
> as well as library(ShortRead)
>
> for the reference genome I typed:
> available.genomes();
> genomeName="BSgenome.Ecoli.NCBI.20080805"
>
> and for the sample file, I initially tried:
> sampleFile=readFastq("C:\\Users\\Chris\\Documents\\SRA\\SRX220452\\SRR653521\\SRR653521.fastq")
>
> This did not work, so I tried to make a matrix for use in writing a tab delimited sample file, but was unsuccessful:
>
>> sampleMatrix=matrix(c("C:\\Users\\Chris\\Documents\\SRA\\SRX220452\\SRR653521\\SRR653521.fastq"=FileName, Sample1=SampleName),nrow=2,ncol=2,byrow=TRUE)
> Error in matrix(c(`C:\\Users\\Chris\\Documents\\SRA\\SRX220452\\SRR653521\\SRR653521.fastq` = FileName, :
> object 'FileName' not found
>
>> sampleMatrix=matrix(c("C:\\Users\\Chris\\Documents\\SRA\\SRX220452\\SRR653521\\SRR653521.fastq", Sample1),nrow=2,ncol=2,byrow=TRUE, dimnames=c(flies,samples)
> + )
> Error in matrix(c("C:\\Users\\Chris\\Documents\\SRA\\SRX220452\\SRR653521\\SRR653521.fastq", :
> object 'Sample1' not found
>
> Can somebody please give me an example of code/syntax that will allow me to create a sample file? Also, I am not sure if my fastq file is for single-end or paired-end reads. The examples listed in the vignette are only for files found in the "extdata" folder, and I don't know how to proceed. As soon as I get a sample file, I can probably figure out the rest. I hope this question isn't too basic, and thanks in advance for any help!
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BSgenome.Ecoli.NCBI.20080805_1.3.17 BiocInstaller_1.12.0 Gviz_1.6.0 GenomicFeatures_1.14.0
> [5] AnnotationDbi_1.24.0 Biobase_2.22.0 rtracklayer_1.22.0 BSgenome_1.30.0
> [9] ShortRead_1.20.0 Rsamtools_1.14.1 lattice_0.20-24 Biostrings_2.30.0
> [13] QuasR_1.2.0 Rbowtie_1.2.0 GenomicRanges_1.14.3 XVector_0.2.0
> [17] IRanges_1.20.3 BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.18.0 biovizBase_1.10.0 bitops_1.0-6 cluster_1.14.4 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 Hmisc_3.12-2
> [9] hwriter_1.3 labeling_0.2 latticeExtra_0.6-26 munsell_0.4.2 plyr_1.8 RColorBrewer_1.0-5 RCurl_1.95-4.1 rpart_4.1-3
> [17] RSQLite_0.11.4 scales_0.2.3 stats4_3.0.2 stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>