[R] Having trouble converting a dataframe of character vectors to factors
William Dunlap
wdunlap at tibco.com
Thu Feb 21 17:32:55 CET 2013
> scs2<-data.frame(lapply(scs2, factor))
Calling data.frame() on the output of lapply() can change column names (by default
data.frame() runs them through make.names(), e.g. "No/Yes" becomes "No.Yes") and will
drop any attributes the input data.frame may have had. I prefer to modify the original
data.frame in place instead of building a new one from scratch, which avoids both problems.
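Assigning into dataFrame[] is also the fix for the original sapply() attempts. sapply() tries to simplify its result. When every column yields a vector of the same length, it binds them into a matrix, and that step turns the factors back into character strings. lapply() always returns a plain list, so assigning it into dataFrame[] keeps the data.frame. The following is a small check, using a made-up two-column data frame d as a stand-in for scs2:

```r
# Made-up stand-in for scs2: two character columns of equal length
d <- data.frame(Q1 = c("yes", "no"), Q2 = c("no", "no"),
                stringsAsFactors = FALSE)

# sapply() simplifies the list of factors into a matrix; the factors
# are turned into character strings along the way
s <- sapply(d, as.factor)
is.matrix(s)     # TRUE
is.character(s)  # TRUE

# lapply() returns a plain list; assigning it into d[] keeps the data.frame
d[] <- lapply(d, as.factor)
str(d)
```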
Also, calling factor() on a factor drops any unused levels, which you may not want.
as.factor() returns a factor unchanged, so it keeps them.
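Here is a minimal sketch of that difference on a single made-up factor v. It also shows droplevels(), the explicit way to drop unused levels when that is actually wanted:

```r
# Made-up factor with an unused level "No"
v <- factor(c("Yes", "Yes"), levels = c("No", "Yes"))

levels(factor(v))      # "Yes"        factor() rebuilds the levels from the data
levels(as.factor(v))   # "No" "Yes"   as.factor() returns a factor unchanged
levels(droplevels(v))  # "Yes"        explicit way to drop unused levels
```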
Compare the following three methods
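f1 <- function (dataFrame) {
  # in place: keeps names and attributes, but factor() drops unused levels
  dataFrame[] <- lapply(dataFrame, factor)
  dataFrame
}
f2 <- function (dataFrame) {
  # in place: as.factor() leaves existing factors and their levels alone
  dataFrame[] <- lapply(dataFrame, as.factor)
  dataFrame
}
f3 <- function (dataFrame) {
  # new data.frame: names go through make.names(), attributes are lost
  data.frame(lapply(dataFrame, factor))
}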
on the following data.frame
x <- data.frame(stringsAsFactors=FALSE, check.names=FALSE,
"No/Yes" = factor(c("Yes","Yes","Yes"), levels=c("No","Yes")),
"Size" = ordered(c("Small","Large","Medium"), levels=c("Small","Medium","Large")),
"Name" = c("Adam","Bill","Chuck"))
attr(x, "Date") <- as.POSIXlt("2013-02-21")
> str(x)
'data.frame': 3 obs. of 3 variables:
$ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
$ Size : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
$ Name : chr "Adam" "Bill" "Chuck"
- attr(*, "Date")= POSIXlt, format: "2013-02-21"
> str(f1(x)) # drops unused levels
'data.frame': 3 obs. of 3 variables:
$ No/Yes: Factor w/ 1 level "Yes": 1 1 1
$ Size : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
$ Name : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
- attr(*, "Date")= POSIXlt, format: "2013-02-21"
> str(f2(x))
'data.frame': 3 obs. of 3 variables:
$ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
$ Size : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
$ Name : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
- attr(*, "Date")= POSIXlt, format: "2013-02-21"
> str(f3(x)) # mangles column names, drops unused levels, drops Date attribute
'data.frame': 3 obs. of 3 variables:
$ No.Yes: Factor w/ 1 level "Yes": 1 1 1
$ Size : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
$ Name : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
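The question is about character columns specifically, so a variant that converts only those columns may be closer to what is wanted. It leaves existing factors and non-character columns exactly as they were. This is a sketch of my own rather than one of the three methods compared above, and the name f4 is made up:

```r
# Hypothetical variant: convert only the character columns, in place,
# so column names, attributes, and existing factor levels are untouched
f4 <- function(dataFrame) {
  isChar <- vapply(dataFrame, is.character, logical(1))
  dataFrame[isChar] <- lapply(dataFrame[isChar], factor)
  dataFrame
}
```

Applied to the example x above, f4(x) should convert only Name. No/Yes should keep both of its levels and Size should stay ordered.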
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Mark Lamias
> Sent: Wednesday, February 20, 2013 6:51 PM
> To: Daniel Lopez; R help (r-help at r-project.org)
> Subject: Re: [R] Having trouble converting a dataframe of character vectors to factors
>
> How about this?
>
> scs2<-data.frame(lapply(scs2, factor))
>
> ________________________________
> From: "Lopez, Dan" <lopez235 at llnl.gov>
> To: "R help (r-help at r-project.org)" <r-help at r-project.org>
> Sent: Wednesday, February 20, 2013 7:09 PM
> Subject: [R] Having trouble converting a dataframe of character vectors to factors
>
> R Experts,
>
> I have a dataframe made up of character vectors--these are results from survey
> questions. I need to convert them to factors.
>
> I tried the following which did not work:
> scs2<-sapply(scs2,as.factor)
> also this didn't work:
> scs2<-sapply(scs2,function(x) as.factor(x))
>
> After doing either of above I end up with
> >str(scs2)
>  chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
>
> >class(scs2)
> "matrix"
>
> But when I do it one at a time it works:
> scs2$Q1_1<-as.factor(scs2$Q1_1)
> scs2$Q1_2<- as.factor(scs2$Q1_2)
>
> What am I doing wrong? How do I accomplish this with sapply or similar function?
>
> Data for reproducibility:
>
>
> scs2<-structure(list(Q1_1 = c("very important", "very important", "very important",
> "very important", "very important", "very important", "very important",
> "somewhat important", "important", "very important"), Q1_2 = c("important",
> "somewhat important", "very important", "important", "important",
> "very important", "somewhat important", "somewhat important",
> "very important", "very important"), Q1_3 = c("very important",
> "important", "very important", "very important", "important",
> "very important", "very important", "somewhat important", "not important",
> "important"), Q1_4 = c("very important", "important", "very important",
> "very important", "important", "important", "important", "very important",
> "somewhat important", "important"), Q1_5 = c("very important",
> "not important", "important", "very important", "not important",
> "important", "somewhat important", "important", "somewhat important",
> "not important"), Q1_6 = c("very important", "not important",
> "important", "very important", "somewhat important", "very important",
> "very important", "very important", "important", "important"),
> Q1_7 = c("very important", "somewhat important", "important",
> "somewhat important", "important", "important", "very important",
> "very important", "somewhat important", "not important"),
> Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much",
> "Very Much", "Very Much", "Very Much", "Very Much", "Very Much",
> "Very Much"), Q3 = c("yes", "yes", "yes", "yes", "yes", "yes",
> "yes", "yes", "yes", "yes"), Q4 = c("None", "None", "None",
> "None", "Confirmed Field of Study", "Confirmed Field of Study",
> "Confirmed Field of Study", "None", "None", "None")), .Names = c("Q1_1",
> "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4"
> ), row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L,
> 172L, 110L), class = "data.frame")
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.