[BioC] read gz compressed wig files?
Hamid Bolouri
hbolouri at gmail.com
Sun Apr 3 20:10:55 CEST 2011
Thanks very much Michael. I'll use bigWig where it's available, and
process wig-only datasets using wigLines on large-memory nodes. AOK.
Thanks to everyone for all the help.
Hamid
On Fri, Apr 1, 2011 at 9:18 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
>
>
> On Fri, Apr 1, 2011 at 5:43 PM, Hamid Bolouri <hbolouri at gmail.com> wrote:
>>
>> Thanks for the suggestion Michael.
>>
>> FYI: doing import(x, format = "wigLines") on a ~220MB (unzipped)
>> ENCODE wig file, R crashed after about an hour with the minimalist
>> Unix (Ubuntu) message: 'Killed'. I am guessing a memory limit issue
>> (which is what I get trying the same command on a Windows PC).
>>
>
> Yep, just ran out of memory. A fixedStep WIG file like this will be about 4X
> its file size in R as a RangedData/GRanges, so roughly 1GB. Given that the
> parser is written in R and thus makes various copies of the data, it's going
> to take at least 4GB of memory to parse a file like this.
>
> I would recommend using bigWig (import.bw) instead. That way, you can just
> read in the parts of the data that you need, and incremental processing
> becomes possible.
>
> This might be the bigWig you want:
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneK562H3k9me1StdSig.bigWig
>
>
>>
>> Thanks
>>
>> Hamid
>> FYI, sessionInfo for a restarted session:
>> > sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] rtracklayer_1.10.6 RCurl_1.4-3 bitops_1.0-4.1
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.10.0 Biostrings_2.18.0 BSgenome_1.18.2
>> [4] GenomicRanges_1.2.1 IRanges_1.8.8 tools_2.12.0
>> [7] XML_3.2-0
>>
>>
>> On Fri, Apr 1, 2011 at 2:33 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>> > Btw, rtracklayer has an internal function, import.wigLines(), that can
>> > parse the lines after the track line. You could try using that: call
>> > import(x, format = "wigLines") to get there.
>> >
>> > On Fri, Apr 1, 2011 at 1:25 PM, Hamid Bolouri <hbolouri at gmail.com> wrote:
>> >>
>> >> Martin, Steve; Thank you both much.
>> >>
>> >> Pretty amazing that hundreds of ENCODE data files may be
>> >> 'non-standard'. Lesson learnt.
>> >>
>> >> Thanks again,
>> >>
>> >> Hamid
>> >>
>> >> On Fri, Apr 1, 2011 at 12:21 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> >> > On 04/01/2011 08:29 AM, Steve Lianoglou wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> On Thu, Mar 31, 2011 at 7:03 PM, Hamid Bolouri <hbolouri at gmail.com> wrote:
>> >> >>>
>> >> >>> Thanks Steve;
>> >> >>>
>> >> >>>> import.wig(gzopen('C:\\...pathto...\\wgEncodeBroadChipSeqSignalK562H3k9me1.wig.gz'))
>> >> >>>
>> >> >>> Error in import.wig(gzopen("C:\\Users\\hbolouri\\Desktop\\ENCODE_data\\wgEncodeBroadChipSeqSignalK562H3k9me1.wig.gz")) :
>> >> >>>   error in evaluating the argument 'con' in selecting a method for
>> >> >>>   function 'import.wig'
>> >> >>>
>> >> >>>> traceback()
>> >> >>>
>> >> >>> 1: import.wig(gzopen("C:\\...\\wgEncodeBroadChipSeqSignalK562H3k9me1.wig.gz"))
>> >> >>>
>> >> >>> Using gzfile instead of gzopen avoids the error message, but seems to
>> >> >>> produce an empty object:
>> >> >>>
>> >> >>>> import.wig(gzfile('C:\\...pathto...\\wgEncodeBroadChipSeqSignalK562H3k9me1.wig.gz'))
>> >> >>>
>> >> >>> RangedDataList of length 0
>> >> >>
>> >> >> If you unzip the file and read it in "as normal", does it work differently?
>> >> >
>> >> > I think the basic problem is that these files are not strictly wiggle;
>> >> > they are missing an initial 'track' line:
>> >> >
>> >> >> head wgEncodeBroadChipSeqSignalHepg2H3k27ac.wig
>> >> > fixedStep chrom=chr1 start=1 step=25
>> >> > 113
>> >> > 136
>> >> >
>> >> > Martin
>> >> >
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> > Computational Biology
>> >> > Fred Hutchinson Cancer Research Center
>> >> > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>> >> >
>> >> > Location: M1-B861
>> >> > Telephone: 206 667-2793
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> http://labs.fhcrc.org/bolouri
>> >
>> >
>>
>>
>>
>> --
>> http://labs.fhcrc.org/bolouri
>
>
--
http://labs.fhcrc.org/bolouri
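
For wig-only datasets like the headerless fixedStep files discussed in this thread, one lower-memory option might be to stream the compressed file in chunks rather than build one large object. What follows is only a minimal sketch in base R, not rtracklayer code. The helpers parse_decl() and read_fixedstep_chunks() are hypothetical names introduced here for illustration. It assumes the file contains fixedStep sections only (no variableStep or bedGraph data) and hands each run of values to a user-supplied callback.

```r
# Hypothetical helper: parse a declaration line such as
# "fixedStep chrom=chr1 start=1 step=25" into a list.
parse_decl <- function(line) {
  fields <- strsplit(line, "[[:space:]]+")[[1]][-1]   # drop "fixedStep"
  kv <- strsplit(fields, "=", fixed = TRUE)
  vals <- vapply(kv, `[`, character(1), 2)
  names(vals) <- vapply(kv, `[`, character(1), 1)
  list(chrom = vals[["chrom"]],
       start = as.numeric(vals[["start"]]),
       step  = if ("step" %in% names(vals)) as.numeric(vals[["step"]]) else 1,
       span  = if ("span" %in% names(vals)) as.numeric(vals[["span"]]) else 1)
}

# Hypothetical helper: read a (gz-compressed or plain) fixedStep wig in
# chunks of `chunk_lines` lines and pass each run of values to `callback`
# as a data.frame with columns chrom, start, end, score.
read_fixedstep_chunks <- function(path, callback, chunk_lines = 100000L) {
  con <- gzfile(path, open = "rt")   # gzfile() also reads uncompressed files
  on.exit(close(con))
  decl <- NULL        # most recent fixedStep declaration
  pos <- NA_real_     # start position of the next data value
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    # skip blank lines and any track/browser/comment lines, if present
    lines <- lines[nzchar(lines) & !grepl("^(track|browser|#)", lines)]
    if (length(lines) == 0L) next
    # block 0 continues the declaration carried over from the previous chunk
    block <- cumsum(grepl("^fixedStep", lines))
    for (b in unique(block)) {
      idx <- which(block == b)
      if (b > 0L) {
        decl <- parse_decl(lines[idx[1]])
        pos <- decl$start
        idx <- idx[-1]
      }
      if (length(idx) == 0L) next
      if (is.null(decl)) stop("data line before any fixedStep declaration")
      n <- length(idx)
      starts <- pos + decl$step * (seq_len(n) - 1)
      callback(data.frame(chrom = decl$chrom, start = starts,
                          end = starts + decl$span - 1,
                          score = as.numeric(lines[idx]),
                          stringsAsFactors = FALSE))
      pos <- pos + decl$step * n
    }
  }
  invisible(NULL)
}

# Usage on a toy file built from the three lines Martin quoted
tmp <- tempfile(fileext = ".wig.gz")
gz <- gzfile(tmp, "wt")
writeLines(c("fixedStep chrom=chr1 start=1 step=25", "113", "136"), gz)
close(gz)
total <- 0
read_fixedstep_chunks(tmp, function(df) total <<- total + sum(df$score))
# total is now 249 for this toy file
```

Because only one chunk is held in memory at a time, peak memory depends on chunk_lines rather than on the file size. The trade-off is that the callback has to do its own accumulation (sums, per-region summaries, writing out to disk) instead of returning a single RangedData/GRanges object.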