[R] RODBC and Excel: Wrong Data Type Assumed on Import

Gabor Grothendieck ggrothendieck at gmail.com
Sat Nov 5 08:04:00 CET 2005


On 11/4/05, Earl F. Glynn <efg at stowers-institute.org> wrote:
> "Gabor Grothendieck" <ggrothendieck at gmail.com> wrote in message
> news:971536df0511031753y6eb4e271o797e862bc2740e1a at mail.gmail.com...
> > You could try using the COM interface rather than the ODBC
> > interface.  Try code such as this:
> >
> > library(RDCOMClient)
> > xls <- COMCreate("Excel.Application")
> > xls[["Workbooks"]]$Open("MySpreadsheet.xls")
> > sheet <- xls[["ActiveSheet"]]
> > mydata <- sheet[["UsedRange"]][["value"]]
> > xls$Quit()
> >
> > # convert mydata to a character matrix
> > mydata.char <- matrix(unlist(mydata), nc = length(xx))
>
> Gabor,
>
> Thank you for that suggestion.  I try to avoid COM, but it seems to work
> well with this problem.
>
> Because I have empty cells, which are treated as NULLS, the unlist didn't
> quite work.
>
> Here's what I did:
>
> library(RDCOMClient)
> xls <- COMCreate("Excel.Application")
> xls[["Workbooks"]]$Open("U:/efg/lab/R/Plasmid/construct list.xls")
> sheet <- xls[["ActiveSheet"]]
> mydata <- sheet[["UsedRange"]][["value"]]
> xls$Quit()
>
> for (column in 1:length(mydata))
> {
>  cat(column, " ", length(mydata[[column]]), " ",
> length(unlist(mydata[[column]])), "\n")
> }
>
> The results show that while mydata is a list of columns, if you unlist each
> column you'll be short by the number of NULL values.
>
> 1   1251   1251
> 2   1251   1198
> 3   1251   870
> 4   1251   327
> 5   1251   1250
>
> This seemed a bit crude to fix that problem (can someone suggest a more
> elegant way?):
>
> mymatrix <- NULL
> for (column in 1:length(mydata))
> {
>  # Use lappy to replace NULLs with "" strings, column-by-column
>  mymatrix <- cbind(mymatrix, lapply(mydata[[column]], function(cell) {
> ifelse(is.null(cell), "", cell) } ))
> }
> # Fix column names
> colnames(mymatrix) <- mymatrix[1,]
> mymatrix <- mymatrix[-1,]
>
> > mymatrix[273:276,]
>     Plasmid Number Plasmid
> Concentration Comments Lost
> [1,] 274            "yxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxy"         "1 ug/ul"
> "4 mg"   ""
> [2,] "275a"         "xyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyx" "1 ug/2
> ul"   ""       ""
> [3,] "275b"         "xyxyxyxyxyxyxyxyxyxyxyxyx"                   "1 ug/5
> ul"   ""       ""
> [4,] 276            "xyxyxyxyxyxyxyxyxyxyxyxyxyxy"                "1 ug/5
> ul"   ""       "Assumed Lost"
>
> Thank you for preserving "275a" and "275b" as the names here.
>
> So, I'd recommend RDCOMClient over RODBC with Excel files.  "Being lucky"
> shouldn't be part of processing Excel files.
>

You could try something like this which turns the data into
a textConnection which is read using read.table:

con <- textConnection(do.call("paste", mydata))
dd <- read.table(con, header = TRUE, na.strings = "NULL", as.is = TRUE)

You might need to vary the arguments to read.table depending on what
it is you want to get out.  Also, I have assumed that none of the
strings contain
spaces though using a sep= arg on paste and read.table could handle that
too.




More information about the R-help mailing list