[R] parse XML file

Ben Tupper btupper at bigelow.org
Wed Jun 29 13:57:23 CEST 2011


Hi,

On Jun 29, 2011, at 6:26 AM, Kai Serschmarn wrote:

> Thank you Barry, that works fine.
> Sorry for stupid questions... however, I couldn't manage to get a  
> dataframe out of this.
>
> That's what I was doing:
>
> doc = xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml"))
> dumpData <-  function(doc){
> 	for(i in 1:length(doc)){
> 		stns = doc[[i]]
> 	for (j in 1:length(stns)){
> 		cat(stns$attributes['value'],stns[[j]][[1]]$value,stns[[j]] 
> $attributes['date'],"\n")
> 		}
> 		}
> 		}
> dumpData(doc)
>

Perhaps this would work for you.  It generates a list of data frames,  
one for each station.

###### BEGIN

## start with your doc - split it into a list of nodes
## (one for each child)
stn <-  xmlChildren(doc)


# converts a station node to a data frame
getMyStation <- function(x){

    # get the name of the station
    stationName <- xmlAttrs(x)["value"]

    # a function to extract the date and value
    getMyRecords <- function(x){
       date <- xmlAttrs(x)["date"]
       val <- xmlValue(x)
       y <- c( date, val)
       return(y)
    }

    # for each child of the station node, extract the records
    # (xmlChildren makes the iteration over child nodes explicit)
    r <- lapply(xmlChildren(x), getMyRecords)
    nR <- length(r)

    # bind into one matrix - all character at this point
    y <- do.call(rbind, r)

    # make a data.frame
    df <- data.frame("Station" = rep(stationName, nR), "date" = y[,1],
       "value" = y[,2],
       row.names = 1:nR, stringsAsFactors = FALSE)

    return(df)
}


# now loop through the station nodes - extract data into a data frame
x <- lapply(stn, getMyStation)

##### END
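
If you would rather have one data frame than a list of them, the pieces can probably be stacked with rbind. The sketch below is only an illustration: it uses a made-up stand-in for the list x above (the station names, dates and values are invented, since I don't know what your file contains), and the name allStations is just a placeholder.

```r
# Stand-in for the list returned by lapply(stn, getMyStation).
# Station names, dates and values are invented for illustration only.
x <- list(
    data.frame(Station = "A", date = c("2011-06-01", "2011-06-02"),
       value = c("1.5", "2.5"), stringsAsFactors = FALSE),
    data.frame(Station = "B", date = "2011-06-01",
       value = "3.0", stringsAsFactors = FALSE)
)

# stack the per-station data frames into a single data frame
allStations <- do.call(rbind, x)

# drop the row names that rbind builds from the list
rownames(allStations) <- NULL

# the values were extracted as text, so convert them to numbers
allStations$value <- as.numeric(allStations$value)
```

The date column is left as character here because the date format in your file is not shown; as.Date with a suitable format string would be the usual next step.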


Cheers,
Ben

Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine   04575-0475
http://www.bigelow.org/


