[R] How to make XML support Expat?

Duncan Temple Lang duncan at wald.ucdavis.edu
Sun Oct 25 00:38:54 CEST 2009



Johannes Graumann wrote:
> Thanks for your input. If I understand correctly, XPath requires the whole 
> document to be resident in memory. That is not an option given the size of 
> documents I'm facing ... I'll go with the standard streaming implementation of 
> the XML package and see how far I get.

xmlEventParse() is intended for handling files that we don't want to keep in
memory. The branches parameter does make it easier to deal with sub-trees
as the document is being parsed.  And within these branches one can use XPath.
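
As a rough sketch of the branches idea (not a definitive recipe): the
element names, the sample file and the make_collector() helper below are
made up for illustration. It also assumes that copying each sub-tree into
its own small document with xmlDoc() is an acceptable way to run XPath on
it. Each <record> sub-tree is handed to the branch function once its
closing tag has been read, so only one record needs to be held as a tree
at a time.

```r
library(XML)

# Hypothetical sample data written to a temporary file; a real use case
# would point xmlEventParse() at a large file on disk instead.
xml_file <- tempfile(fileext = ".xml")
writeLines(c(
  "<records>",
  "  <record id='a'><value>1</value><value>2</value></record>",
  "  <record id='b'><value>10</value></record>",
  "</records>"
), xml_file)

# Closure that accumulates one summed total per <record> sub-tree.
make_collector <- function() {
  totals <- numeric()
  record <- function(node) {
    # Assumption: copy the sub-tree into its own document so that
    # XPath expressions can be evaluated against it.
    sub <- xmlDoc(node)
    vals <- as.numeric(xpathSApply(sub, "/record/value", xmlValue))
    totals[[xmlGetAttr(node, "id")]] <<- sum(vals)
    free(sub)  # release the temporary document
    invisible(NULL)
  }
  list(record = record, totals = function() totals)
}

collector <- make_collector()

# The name of the branches element ("record") selects which element
# triggers the function; no ordinary SAX handlers are used here.
invisible(xmlEventParse(xml_file, handlers = list(),
                        branches = collector["record"]))

collector$totals()
```

Keeping the running results in a closure means nothing has to be returned
from the parser itself; the totals are read back after parsing finishes.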

So how big are the files you are working with?  Surprisingly, reading
70 MB files into memory and doing XPath can be quite fast.
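
To check this on your own machine, something along the following lines
could be used. It is only a sketch: the document is synthetic, the element
names and the size n are invented, and a real test would parse your actual
file with xmlParse() instead of a string.

```r
library(XML)

# Hypothetical synthetic document; element names are illustrative only.
n <- 10000
xml_text <- paste0(
  "<records>",
  paste0("<record id='", seq_len(n), "'><value>", seq_len(n),
         "</value></record>", collapse = ""),
  "</records>"
)

# Time the full round trip: parse into memory, then query with XPath.
timing <- system.time({
  doc <- xmlParse(xml_text, asText = TRUE)
  values <- as.numeric(xpathSApply(doc, "/records/record/value", xmlValue))
})

length(values)        # number of <value> nodes found
timing[["elapsed"]]   # wall-clock seconds; varies by machine

free(doc)  # release the C-level document once it is no longer needed
```

Increasing n until the in-memory size approaches that of the real files
gives a rough idea of whether the whole-document approach is feasible.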

  D.

> 
> Thanks, Joh
> 
> On Saturday 24 October 2009 23:31:46 Duncan Temple Lang wrote:
>> Johannes Graumann wrote:
>>> Hi,
>>>
>>> I had heard that Expat was faster. Your mail actually made me go check
>>> Google for some comparisons and that does not seem to be the case ... do you
>>> have any insight into this?
>> A couple of points..
>>
>> i) At this point, I don't have any data about which of libxml2 and expat
>>  is the faster C-level parser.
>>
>> ii) Since you are calling the parser from R and then presumably working with
>>  the resulting content via manipulation in R, these R-level operations are
>>  likely to be the slower parts of the overall process.
>>
>> iii) I tend to use XPath for processing the resulting XML DOM/tree. That
>>  makes things quite fast (and also easy to express if you know XPath).
>>  expat is a parser and doesn't provide XPath facilities. So you would
>>  lose out big time in terms of speed here.
>>
>> iv)  Xerces is an alternative, but again doesn't have a full XPath
>>  implementation by itself, AFAIK.
>>
>>
>> So basically, I wouldn't prematurely worry about speed.
>> If you have a test case, you can profile the code and see
>> where the bottlenecks are.
>>
>>   D.
>>
>>> Thanks, Joh
>>>
>>> On Saturday 24 October 2009 20:38:23 Duncan Temple Lang wrote:
>>>> Hi Joh.
>>>>
>>>> What particular aspects of expat do you want that libxml2 and
>>>> the XML package currently cannot provide?
>>>>
>>>> The early versions of the XML package (for the first few years)
>>>> could support expat and libxml2 as the C++/C-level parsers.
>>>> However, the support for expat was not maintained, so while
>>>> it could be resurrected and I have thought about it several
>>>> times, I doubt it would compile out of the box now as
>>>> expat has most likely changed significantly.
>>>>
>>>>
>>>> If you wanted to experiment with the expat support in the package,
>>>> use
>>>>
>>>>   R CMD INSTALL --configure-args='--with-expat'  XML
>>>>
>>>> and that will endeavor to find the expat libraries, etc.
>>>>
>>>>
>>>> HTH,
>>>>
>>>>   D.
>>>>
>>>> Johannes Graumann wrote:
>>>>> Hi,
>>>>>
>>>>> How can I make the result of the following lines "TRUE"?
>>>>>
>>>>>> install.packages("XML")
>>>>>> library(XML)
>>>>>> supportsExpat()
>>>>> [1] FALSE
>>>>>
>>>>> I'm on linux, looked into the actual package, but don't seem to be able
>>>>> to wrap my head around how to compile this in ...
>>>>>
>>>>> Any pointers are welcome,
>>>>>
>>>>> Thanks Joh
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html and provide commented,
>>>>> minimal, self-contained, reproducible code.



