[R] memDecompress and zlib compressed base64 encoded string
Johannes Graumann
johannes_graumann at web.de
Fri Jan 15 11:48:41 CET 2010
Prof Brian Ripley wrote:
>> I have zlib compressed strings (example is attached)
>
> What is that file? Not gzip compression:
>
> gannet% file compressed.txt
> compressed.txt: ASCII text, with very long lines
>
> since gzip uses a magic header that 'file' knows about. And even if
> the header was stripped, such files are 8-bit and yours is ASCII.
> Try
>> x <- 'Johannes Graumann'
>> xx <- charToRaw(x)
>> xxx <- memCompress(xx, "g")
>> rawToChar(xxx)
> [1] "x\x9c\xf3\xca\xcfH\xcc\xcbK-Vp/J,\xcd\0052\001:\n\006\x90"
>
> to see what a real gzipped string looks like.
>
>> and would like to decompress them using memDecompress ...
>>
>> I try this:
>>> connection <- file("compressed.txt","r")
>>> compressed <- readLines(connection)
I am dealing with mass spectrometric data in an XML file format (mzXML). The
biggest part of the contained data is the actual mass spectra, which are base64
encoded and optionally compressed using zlib (http://zlib.net), saving quite
some storage space. When they are compressed, I just get an XML node that looks
like this:
<peaks>CONTENT OF THE ORIGINAL ATTACHMENT HERE</peaks>
I would like to be able to decompress that string and thought that
memDecompress was the right tool to do so ...
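
For illustration, here is a minimal sketch of the chain described above: undo
the base64 encoding first, then hand the resulting raw vector to memDecompress.
Several things in it are assumptions rather than facts about mzXML or base R.
base64_to_raw is a hypothetical helper written only for this sketch (base R
ships no base64 decoder that I know of), and the example string is not real
peak data: it should be the base64 form of the 23 bytes quoted further up for
the compressed 'Johannes Graumann'.

```r
# Hypothetical helper (not part of base R): decode a base64 string to raw bytes.
base64_to_raw <- function(s) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  s <- gsub("[^A-Za-z0-9+/]", "", s)   # drop '=' padding and any whitespace
  sextets <- match(strsplit(s, "")[[1]], alphabet) - 1L
  n_bytes <- (length(sextets) * 6L) %/% 8L
  if (n_bytes == 0L) return(raw(0))
  # Expand every 6-bit value into bits, most significant bit first.
  bits <- unlist(lapply(sextets, function(v) as.integer(intToBits(v))[6:1]))
  # Regroup the bit stream into bytes; leftover padding bits are discarded.
  bits <- matrix(bits[seq_len(n_bytes * 8L)], nrow = 8L)
  as.raw(colSums(bits * 2^(7:0)))
}

# Assumed stand-in for the text content of a <peaks> node (not real spectra).
peaks_b64 <- "eJzzys9IzMtLLVZwL0oszQUyAToKBpA="

compressed   <- base64_to_raw(peaks_b64)                  # base64 text -> raw
decompressed <- memDecompress(compressed, type = "gzip")  # zlib stream -> raw
print(rawToChar(decompressed))
```

For real peak data the decompressed raw vector would not be text, so
rawToChar would be the wrong last step. It would presumably go through readBin
instead, with the precision and byte order taken from the mzXML file.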
> You have not told us the 'at a minimum' information requested in the
> posting guide. But you should not expect that to read a binary file,
> especially not in a MBCS locale. We have readBin for that purpose.
I'm actually reading this in as a string from the XML file ...
>>> memDecompress(as.raw(compressed),type="g")
>
> I don't think you know what as.raw does: it does not convert bytes in
> a character string to raw (for which you need charToRaw).
>
> It is always a good idea to look at each stage of your computation:
>
>> as.raw(compressed)
> [1] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00
> [26] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00
Yup, that was plain stupid: I was only trying to make memDecompress run at all
(since handing it the character string also resulted in an error).
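
To spell out the difference pointed out above, a small sketch using the same
test string as earlier in the thread (the variable names are only for
illustration):

```r
x <- "Johannes Graumann"

# as.raw() goes through numeric coercion: text that is not a number becomes
# NA and is then stored as 00 (with warnings, suppressed here).
coerced <- suppressWarnings(as.raw(x))

# charToRaw() returns the actual bytes of the string, one per character
# for plain ASCII text like this.
bytes <- charToRaw(x)

print(coerced)
print(bytes)
```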
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rkward_0.5.1
loaded via a namespace (and not attached):
[1] tools_2.10.1
Thanks for any further hints, Joh