[R] dplyr/summarize does not create a true data frame

John Posner john.posner at MJBIOSTAT.COM
Fri Nov 21 18:10:16 CET 2014


I got an error when extracting a one-column subset of a data frame (called "my.output") created by dplyr's summarize(). ncol() reports that my.output has 4 columns, and my.output[1] works, yet my.output[4] fails with "index out of bounds". Converting my.output with as.data.frame() first avoids the error.

Is this the intended behavior of dplyr?

Tx,
John

> library(dplyr)

> # set up data frame
> rows = 100
> repcnt = 50
> sexes = c("Female", "Male")
> heights = c("Med", "Short", "Tall")

> frm = data.frame(
+   Id = paste("P", sprintf("%04d", 1:rows), sep=""),
+   Sex = sample(rep(sexes, repcnt), rows, replace=TRUE),
+   Height = sample(rep(heights, repcnt), rows, replace=TRUE),
+   V1 = round(runif(rows)*25, 2) + 50,
+   V2 = round(runif(rows)*1000, 2) + 50,
+   V3 = round(runif(rows)*350, 2) - 175
+ )
> 
> # use dplyr/summarize to create data frame
> my.output = frm %>%
+   group_by(Sex, Height) %>%
+   summarize(V1sum=sum(V1), V2sum=sum(V2))

> # work with columns in the output data frame
> ncol(my.output)
[1] 4

> my.output[1]
Source: local data frame [6 x 1]
Groups: Sex

     Sex
1 Female
2 Female
3 Female
4   Male
5   Male
6   Male

> my.output[4]
Error in eval(expr, envir, enclos) : index out of bounds  ######## ERROR HERE

> as.data.frame(my.output)[4]
     V2sum
1 12427.97
2  8449.82
3  8610.97
4  7249.20
5 12616.91
6 10372.15
>
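
One way to narrow this down, as a rough sketch rather than a confirmed explanation: the "Groups: Sex" line in the printout suggests that the result of summarize() is still grouped by Sex, so it may be worth checking the object's class and trying the subset after ungroup(). The small data frame below uses made-up values and stands in for "frm"; the name "flat" is introduced here only for illustration. Whether ungroup() avoids the error in the dplyr version used above is an assumption.

```r
library(dplyr)

# Small stand-in for the data frame above (values are made up for illustration)
frm <- data.frame(
  Sex    = rep(c("Female", "Male"), each = 3),
  Height = rep(c("Med", "Short", "Tall"), times = 2),
  V1     = c(51, 52, 53, 54, 55, 56),
  V2     = c(100, 200, 300, 400, 500, 600)
)

my.output <- frm %>%
  group_by(Sex, Height) %>%
  summarize(V1sum = sum(V1), V2sum = sum(V2))

# Inspect the class: a grouped result carries extra classes
# in addition to "data.frame"
class(my.output)

# Drop any remaining grouping, then subset by column position
flat <- ungroup(my.output)
flat[4]

# Alternative already shown in the transcript: convert to a plain data frame
as.data.frame(my.output)[4]
```

If class(my.output) lists classes other than "data.frame", the `[` call is being dispatched to a dplyr method rather than to the base data frame method, which would be consistent with as.data.frame() changing the behavior.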


