[R] How does the data.frame function generate column names?
Joshua Wiley
jwiley.psych at gmail.com
Mon Jan 24 01:22:54 CET 2011
Hi,
Welcome to R! What you have run into is a feature of how subsetting
works. By default, "[" drops the result to the lowest possible number
of dimensions, so selecting a single column returns a plain vector
that no longer carries a column name. data.frame() names each column
after whatever expression was passed to it, unless the object already
has a column name. The odd name you see, "d.8.10...c..", is its
attempt to turn the text d[8:10, "c"] into a valid name. R does this
roughly by converting disallowed characters (such as "[", ":", and the
quotes) into periods (.). Here is some code (you should be able to
copy and paste), with comments that explain a bit further and
hopefully give you a better feel for indexing and creating data frame
objects.
Cheers,
Josh
################################################
## your data (in one step)
d <- data.frame(a = 1:10, b = 11:20, c = 21:30)
## because only one column of 'd' is selected, the result is
## dropped to the lowest possible dimension, 1 (a vector),
## and that loses its column name, so use drop = FALSE
f <- data.frame(d[8:10, "c", drop = FALSE])
## another option is to explicitly name the column
g <- data.frame(c = d[8:10, "c"])
## here you have selected two columns so there must
## be at least two dimensions, and names are kept
g2 <- data.frame(d[8:10, c("b", "c")])
## to "see" what is happening
d[8:10, "c", drop = FALSE]
d[8:10, "c", drop = TRUE] # the default when a single column is selected
## for more details, see the documentation
?"[" # see the "drop" argument description
?data.frame # under the "value" section on names
################################################
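The naming step can also be reproduced by hand. What follows is only a
minimal sketch, assuming (as ?data.frame describes) that an unnamed
argument is named after its deparsed text and that the default
check.names = TRUE then passes that text through make.names(). The
object names v, arg_text, odd_name, raw_name, and h are made up for
illustration.

```r
## the same data as above
d <- data.frame(a = 1:10, b = 11:20, c = 21:30)

## with the single-column default (drop = TRUE) the result is a plain vector
v <- d[8:10, "c"]
is.data.frame(v)   # FALSE
names(v)           # NULL: there is no column name left to reuse

## the text of the unnamed argument, as data.frame() would see it
arg_text <- deparse(quote(d[8:10, "c"]))

## make.names() replaces characters that are not allowed in a name with "."
odd_name <- make.names(arg_text)   # "d.8.10...c.."

## check.names = FALSE skips that repair and keeps the raw text as the name
raw_name <- names(data.frame(d[8:10, "c"], check.names = FALSE))

## setNames() is one more way to name the column after the fact
h <- setNames(data.frame(d[8:10, "c"]), "c")
names(h)           # "c"
```

Of the options shown, drop = FALSE is probably the most convenient,
since it keeps the original column name without retyping it.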
On Sun, Jan 23, 2011 at 1:53 PM, H Roark <hrbuilder at hotmail.com> wrote:
>
> Hi all,
>
> I'm a new R user and am confused about how R behaves when converting a vector to a data frame when using the data.frame function. I'm specifically interested in cases where the vector is expressed as a subset of another data frame. For example, say I want to create a data frame from the last three rows of the third column of the data frame, d, that I've created below:
>
> a<-(1:10)
> b<-(11:20)
> c<-(21:30)
> d<-data.frame(a,b,c)
>
> To do that, I know that I could do:
>
> e<-d[8:10,"c"]
> f<-data.frame(e)
>
> However, I would like for the single column in the data frame, f, to be named "c". Obviously, I could just use the vector, c<-d[8:10,"c"], in place of the vector e. However, I wonder why I can't do:
>
> g<-data.frame(d[8:10,"c"])
>
> This expression returns the proper values, but the resulting variable is named "d.8.10...c.." and not "c" as I expected it to be named.
>
> Could someone explain the mechanics of this statement and tell me why it produced such an oddly named variable? I'm especially confused as to why I get the result I expect if I use the data.frame function on multiple vectors, as in:
>
> g2<-data.frame(d[8:10,c("b","c")])
>
> which produces a data frame with columns named "b" and "c".
>
> Many thanks in advance,
> Alec
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/