[R] Importing data from text file with mixed format
delnatan
delnatan at gmail.com
Mon Oct 26 18:01:05 CET 2009
All of these have been really helpful. Once again I see that anything's possible
in R!
Thank you for the suggestion, Bill. I think arranging the data in one data
frame is a good idea.
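For the archive, here is a minimal sketch of the one-data-frame workflow. It is not Bill's code: it assumes a data frame with the columns shown in his example output, the toy values are copied from the sample file, and the names myList and means are only illustrative.

```r
# Toy data copied from the example in this thread (two timepoints, four objects)
data <- data.frame(
  ObjectNumber = rep(1:4, times = 2),
  Volume       = c(5.3, 4.9, 5.0, 3.5, 5.1, 4.7, 4.3, 4.2),
  SurfaceArea  = c(9.7, 8.3, 9.1, 7.8, 9.0, 8.9, 8.3, 7.9),
  Timepoint    = rep(1:2, each = 4)
)

# One data.frame per timepoint, dropping the now-redundant Timepoint column
myList <- lapply(split(data, data$Timepoint), function(d) {
  d$Timepoint <- NULL
  rownames(d) <- NULL   # renumber rows 1..n within each timepoint
  d
})

# Per-timepoint means computed straight from the single data frame
means <- aggregate(cbind(Volume, SurfaceArea) ~ Timepoint, data = data, FUN = mean)
```

The same two steps should work unchanged on the result of readMyData(), since it returns a data frame with the same columns.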
-Daniel
William Dunlap wrote:
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of delnatan
>> Sent: Saturday, October 24, 2009 8:32 PM
>> To: r-help at r-project.org
>> Subject: [R] Importing data from text file with mixed format
>>
>>
>> Hi,
>> I'm having difficulty importing my textfile that looks
>> something like this:
>>
>> #begin text file
>> Timepoint 1
>> ObjectNumber Volume SurfaceArea
>> 1 5.3 9.7
>> 2 4.9 8.3
>> 3 5.0 9.1
>> 4 3.5 7.8
>>
>> Timepoint 2
>> ObjectNumber Volume SurfaceArea
>> 1 5.1 9.0
>> 2 4.7 8.9
>> 3 4.3 8.3
>> 4 4.2 7.9
>>
>> ... #goes on to Timepoint 80
>>
>> How would I import this data into a list containing one
>> data.frame for each timepoint?
>> I'd like my data to be organized like this:
>>
>> >myList
>> [[1]]
>>   ObjectNumber Volume SurfaceArea
>> 1            1    5.3         9.7
>> 2            2    4.9         8.3
>> 3            3    5.0         9.1
>> 4            4    3.5         7.8
>>
>> [[2]]
>>   ObjectNumber Volume SurfaceArea
>> 1            1    5.1         9.0
>> 2            2    4.7         8.9
>> 3            3    4.3         8.3
>> 4            4    4.2         7.9
>
> The following function reads that text file into one data.frame,
> which has a Timepoint column, which is a format I usually find
> more convenient. You can use split(data, data$Timepoint)
> to get to the format you asked for. If you use the one-data-frame
> format you can use the cast and melt functions from the reshape
> package to rearrange it.
>
> readMyData <- function (file) {
>     # read every line in the file
>     lines <- readLines(file)
>     # drop empty lines
>     lines <- grep("^[[:space:]]*$", lines, value=TRUE, invert=TRUE)
>     # find and check header lines
>     isHeaderLine <- regexpr("^ObjectNumber", lines) > 0
>     if (sum(isHeaderLine)==0)
>         stop("No header lines of form 'ObjectNumber ...'")
>     if (length(u <- unique(lines[isHeaderLine]))>1)
>         stop("Header lines vary: ", paste(sQuote(head(u)), collapse=", "))
>     col.names <- strsplit(lines[which(isHeaderLine)[1]],
>                           "[[:space:]]+")[[1]]
>     # after making column names from header lines, drop header lines
>     lines <- lines[!isHeaderLine]
>     # process Timepoint lines
>     isTimepointLine <- regexpr("^Timepoint", lines) > 0
>     if (sum(isTimepointLine)==0)
>         stop("No lines of form 'Timepoint <number>'")
>     timepoints <- sub("^Timepoint[[:space:]]*", "",
>                       lines[isTimepointLine])
>     timepoints <- as.integer(timepoints)
>     if (any(is.na(timepoints)))
>         stop("Non-integer found in a Timepoint line: ",
>              sQuote(lines[isTimepointLine][which(is.na(timepoints))[1]]))
>     nRowsPerTimepoint <-
>         diff(c(which(isTimepointLine), length(isTimepointLine)+1)) - 1
>     # drop Timepoint lines. Remaining lines should be data lines
>     lines <- lines[!isTimepointLine]
>     # An error in read.table means there were lines we should have dropped
>     result <- read.table(header=FALSE,
>                          row.names=NULL,
>                          col.names=col.names,
>                          textConnection(lines))
>     # Add Timepoint column
>     result$Timepoint <- rep(timepoints, nRowsPerTimepoint)
>     result
> }
>
> E.g.,
> > data <- readMyData("c:/temp/t.txt")
> > data
>   ObjectNumber Volume SurfaceArea Timepoint
> 1            1    5.3         9.7         1
> 2            2    4.9         8.3         1
> 3            3    5.0         9.1         1
> 4            4    3.5         7.8         1
> 5            1    5.1         9.0         2
> 6            2    4.7         8.9         2
> 7            3    4.3         8.3         2
> 8            4    4.2         7.9         2
> > split(data, data$Timepoint)
> $`1`
>   ObjectNumber Volume SurfaceArea Timepoint
> 1            1    5.3         9.7         1
> 2            2    4.9         8.3         1
> 3            3    5.0         9.1         1
> 4            4    3.5         7.8         1
>
> $`2`
>   ObjectNumber Volume SurfaceArea Timepoint
> 5            1    5.1         9.0         2
> 6            2    4.7         8.9         2
> 7            3    4.3         8.3         2
> 8            4    4.2         7.9         2
> > mdata <- melt(data, id=c("ObjectNumber","Timepoint"))
> > cast(mdata, Timepoint~variable, fun.aggregate=c,
>        subset=variable=="SurfaceArea")
>   Timepoint SurfaceArea_X1 SurfaceArea_X2 SurfaceArea_X3 SurfaceArea_X4
> 1         1            9.7            8.3            9.1            7.8
> 2         2            9.0            8.9            8.3            7.9
> > cast(mdata, ObjectNumber~variable, fun.aggregate=c,
>        subset=variable=="SurfaceArea")
>   ObjectNumber SurfaceArea_X1 SurfaceArea_X2
> 1            1            9.7            9.0
> 2            2            8.3            8.9
> 3            3            9.1            8.3
> 4            4            7.8            7.9
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>>
>> -Daniel
>> --
>> View this message in context:
>> http://www.nabble.com/Importing-data-from-text-file-with-mixed-format-tp26045031p26045031.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
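A variation for anyone who wants the list of data frames directly, sketched under assumptions rather than taken from Bill's function. It assumes every block starts with a 'Timepoint <number>' line followed by one header line and at least one data row, and it relies on the text= argument of read.table(), which newer R versions provide. The lines vector stands in for readLines(file), and the names block and chunks are only illustrative.

```r
# Stand-in for readLines(file): the sample file from the original question
lines <- c(
  "Timepoint 1",
  "ObjectNumber Volume SurfaceArea",
  "1 5.3 9.7", "2 4.9 8.3", "3 5.0 9.1", "4 3.5 7.8",
  "",
  "Timepoint 2",
  "ObjectNumber Volume SurfaceArea",
  "1 5.1 9.0", "2 4.7 8.9", "3 4.3 8.3", "4 4.2 7.9"
)

# Drop blank lines
lines <- lines[!grepl("^[[:space:]]*$", lines)]

# Each Timepoint line starts a new block; cumsum() numbers the blocks
isTimepointLine <- grepl("^Timepoint", lines)
block <- cumsum(isTimepointLine)

# Group the header and data lines of each block (Timepoint lines excluded)
chunks <- split(lines[!isTimepointLine], block[!isTimepointLine])

# Parse each block; its first line is the header
myList <- lapply(chunks, function(chunk) read.table(text = chunk, header = TRUE))

# Label the list elements with the timepoint numbers found in the file
names(myList) <- sub("^Timepoint[[:space:]]*", "", lines[isTimepointLine])
```

Unlike readMyData(), this sketch does no checking of the header or Timepoint lines, so a malformed block would surface as a read.table() error or a mislabeled element.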
--
View this message in context: http://www.nabble.com/Importing-data-from-text-file-with-mixed-format-tp26045031p26063496.html
Sent from the R help mailing list archive at Nabble.com.