[R] Apply as.factor (or as.numeric etc) to multiple columns

Bengoechea Bartolomé Enrique (SIES 73) enrique.bengoechea at credit-suisse.com
Thu Jun 25 09:43:22 CEST 2009


Hi Mark,

I frequently need to do that when importing data. This one-liner works, where x is the data frame and the character vector names one target class per column:

> data.frame(mapply(as, x, c("integer", "character", "factor"), SIMPLIFY=FALSE), stringsAsFactors=FALSE)
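To make the one-liner concrete, here is a minimal sketch on a small made-up data frame. The column names and values are assumptions for illustration only, and the target classes are restricted to ones that as() is known to handle for these inputs.

```r
# Hypothetical three-column data frame; everything arrives as text or double.
x <- data.frame(a = c("1", "2", "3"),
                b = c(1.5, 2.5, 3.5),
                c = c("TRUE", "FALSE", "TRUE"),
                stringsAsFactors = FALSE)

# One target class per column, in column order.
classes <- c("integer", "character", "logical")

# mapply() walks the columns of x and the class names in parallel;
# SIMPLIFY = FALSE keeps the result as a list of columns.
converted <- data.frame(mapply(methods::as, x, classes, SIMPLIFY = FALSE),
                        stringsAsFactors = FALSE)

sapply(converted, class)
```

The column names carry over because mapply() takes the names of its first vectorised argument, here the data frame.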

but it has two problems:

1) as() relies on S4 coercion methods, which are not defined for every target class, so it does not always work
2) writing the vector of classes for 60 variables is rather tedious.

Both issues can be solved with the following two helper functions. The first function tries as(x, class); if that doesn't work, it tries as.<class>(x); if that still doesn't work, it tries <class>(x). The second function transforms a single string into a character vector of classes by mapping each letter in the string to a class name (e.g. "D" becomes "Date", "i" becomes "integer"), so that writing 60 classes is fast.

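doCoerce <- function(x, class) {
    if (canCoerce(x, class)) {
        # canCoerce() reports that as() has a method for this conversion.
        as(x, class)
    } else {
        # Fall back to as.<class>(x), e.g. as.Date(x).
        result <- try(match.fun(paste("as", class, sep = "."))(x), silent = TRUE)
        # Last resort: call <class>(x), e.g. factor(x).
        if (inherits(result, "try-error"))
            result <- match.fun(class)(x)
        result
    }
}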

expandClasses <- function(x) {
    unknowns <- character(0)
    # Split each string into single letters and map every letter to a class name.
    result <- lapply(strsplit(as.character(x), NULL, fixed = TRUE),
        function(y) {
            sapply(y, function(z) switch(z,
                i = "integer", n = "numeric",
                l = "logical", c = "character", x = "complex",
                r = "raw", f = "factor", D = "Date", P = "POSIXct",
                t = "POSIXlt", N = NA_character_, {
                    # Unrecognized letter: remember it and return NA.
                    unknowns <<- c(unknowns, z)
                    NA_character_
                }), USE.NAMES = FALSE)
        })
    if (length(unknowns)) {
        unknowns <- unique(unknowns)
        # dQuote() stands in here for dqMsg(), a helper not defined in this message.
        warning(sprintf(ngettext(length(unknowns), "code %s not recognized",
            "codes %s not recognized"),
            paste(dQuote(unknowns), collapse = ", ")))
    }
    result
}
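The letter-to-class mapping can also be sketched with a named character vector used as a lookup table instead of switch(). This is only an illustrative variant under that assumption: the names class_codes and expand_codes are hypothetical, and it handles a single string rather than a vector of strings.

```r
# Hypothetical lookup table: one-letter code -> class name.
class_codes <- c(i = "integer", n = "numeric", l = "logical", c = "character",
                 x = "complex", r = "raw", f = "factor", D = "Date",
                 P = "POSIXct", t = "POSIXlt", N = NA_character_)

# Expand one string such as "Dif" into a character vector of class names.
expand_codes <- function(spec) {
    codes <- strsplit(spec, "", fixed = TRUE)[[1L]]
    unknown <- setdiff(codes, names(class_codes))
    if (length(unknown))
        warning("code(s) not recognized: ",
                paste(dQuote(unknown), collapse = ", "))
    # Indexing by name is exact and case-sensitive; unknown codes give NA.
    unname(class_codes[codes])
}

expand_codes("Dif")
# "Date" "integer" "factor"
```

A lookup vector keeps the table of codes in one place, which makes it easy to add a code without touching the function body.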

An example:

> x <- data.frame(X="2008-01-01", Y=1.1:3.1, Z=letters[1:3])
> data.frame(mapply(doCoerce, x, expandClasses("Dif")[[1L]], SIMPLIFY=FALSE), stringsAsFactors=FALSE)
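For the simpler case in the original question, where a few groups of non-contiguous columns each share one target type, a plain lapply() over a column subset may already be enough. The following is a minimal sketch of that idea, not part of the helpers above; the data frame and its column names are made up for illustration.

```r
# Hypothetical data frame standing in for the 60-column one.
dat <- data.frame(id    = c("1", "2", "3"),
                  grp   = c("a", "b", "a"),
                  score = c("1.5", "2.5", "3.5"),
                  site  = c("x", "y", "x"),
                  day   = c("2009-06-23", "2009-06-24", "2009-06-25"),
                  stringsAsFactors = FALSE)

# One vector of column names per target type; the columns need not be adjacent.
factor_cols  <- c("grp", "site")
numeric_cols <- c("id", "score")
date_cols    <- "day"

# One line per data type: convert the selected columns and assign them back.
dat[factor_cols]  <- lapply(dat[factor_cols], as.factor)
dat[numeric_cols] <- lapply(dat[numeric_cols], as.numeric)
dat[date_cols]    <- lapply(dat[date_cols], as.Date)

str(dat)
```

Assigning back into dat[cols] replaces only the selected columns and leaves the column order unchanged.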

Regards,

Enrique


------------------------------

Message: 99
Date: Tue, 23 Jun 2009 15:23:54 -0600
From: Mark Na <mtb954 at gmail.com>
Subject: [R] Apply as.factor (or as.numeric etc) to multiple columns
To: r-help at r-project.org
Message-ID:
	<e40d78ce0906231423m4c3da14i2f6270f92463c943 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi R-helpers,

I have a dataframe with 60 columns and I would like to convert several
columns to factor, others to numeric, and yet others to dates. Rather
than having 60 lines like this:

data$Var1<-as.factor(data$Var1)

I wonder if it's possible to write one line of code (per data type,
e.g. factor) that would apply a function (e.g., as.factor) to several
(non-contiguous) columns. So, I could then use 3 or 4 lines of code
(for 3 or 4 data types) instead of 60.

I have tried writing an apply function, but it failed.

Thanks for any help you might be able to provide.

Mark Na



