[R] Help with reshape/reShape and indexing

Wed May 13 15:02:42 CEST 2009

Hi Dana,

> ---------- Forwarded message ----------
> From: Dana Sevak <dana.sevak at yahoo.com>
> To: r-help at r-project.org
> Date: Tue, 12 May 2009 23:02:00 -0700 (PDT)
> Subject: [R] Help with reshape/reShape and indexing
>
> Dear R Helpers,
>
> I have trouble applying reShape and reshape although I read the documentation and several posts, so I would very much appreciate your help on the two points below.
>
There are usually many ways to accomplish any given task in R, and
which one you use is a matter of preference. I've settled on using the
reshape package for these kinds of tasks. If you're comfortable with
the solutions already suggested, there's no need to continue reading.
Otherwise, here's another approach:

> I have a dataframe
>
> df = data.frame(Name=c("a", "a", "a", "b", "b", "c"), X1=c("12", "13", "14", "20", "25", "30"), X2 = c(200, 250, 300, 600, 700, 4))
>
> > df
> Name X1  X2
> 1    a 12 200
> 2    a 13 250
> 3    a 14 300
> 4    b 20 600
> 5    b 25 700
> 6    c 30 900
>
> First I need to add an additional column to this dataframe that will count the number of rows per each Name entry.  The resulting df should look like:
>
> df.index = data.frame(Name=c("a", "a", "a", "b", "b", "c"), X1=c(12, 13, 14, 20, 25, 30), X2 = c(200, 250, 300, 600, 700, 4), Index=c(1,2,3,1,2,1))
>
> > df.index
> Name X1  X2    Index
> 1    a 12 200    1
> 2    a 13 250    2
> 3    a 14 300    3
> 4    b 20 600    1
> 5    b 25 700    2
> 6    c 30 900    1
>
> How can I do this?
>
Easy enough with ddply() and colwise() from the plyr package (which is
loaded along with reshape):

df = data.frame(Name=c("a", "a", "a", "b", "b", "c"), X1=c("12", "13",
"14", "20", "25", "30"), X2 = c(200, 250, 300, 600, 700, 4))
library(reshape)
df$Index <- ddply(df, "Name", colwise(seq_along))[,1]
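
For what it's worth, the same per-Name counter can also be built in
base R with ave(), in case plyr is not at hand. This is only a sketch of
an alternative, not the approach used above. It numbers the rows within
each Name in their current order, which matches the example because df
is already sorted by Name:

```r
# Same example data as above (X1 kept as character, as in the original post)
df <- data.frame(Name = c("a", "a", "a", "b", "b", "c"),
                 X1 = c("12", "13", "14", "20", "25", "30"),
                 X2 = c(200, 250, 300, 600, 700, 4),
                 stringsAsFactors = FALSE)

# ave() applies FUN to the row positions within each level of Name,
# so seq_along() restarts the count at 1 for every new Name
df$Index <- ave(seq_along(df$Name), df$Name, FUN = seq_along)
df
```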

>
> Secondly, I would like to reshape this dataframe in the form:
>
> > df2
>  1  2  3
> a 12 13 14
> b 20 25 NA
> c 30 NA NA
>
> Since the df is sorted by Name and X2, I would need that the available X1 values populate the resulting rows in df2 from left to right (i.e. if only one value is available, it is written in the first column and the remaining columns get NAs).

I don't really understand this. What happened to X2? Anyway, I would
do it like this:

df$X2 <- NULL
m.df <- melt(df, measure.vars="X1")
df.final <- cast(m.df, ... ~ Index)
df.final
  Name variable  1    2    3
1    a       X1 12   13   14
2    b       X1 20   25 <NA>
3    c       X1 30 <NA> <NA>
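
Since the subject line also mentions reshape, here is roughly how the
same wide layout could be produced with base R's reshape() instead of
melt/cast. Treat it as a sketch: the object name `wide` is made up for
illustration, and the Index column is typed in by hand here rather than
computed.

```r
# Example data with X2 already dropped and the per-Name Index in place
df <- data.frame(Name = c("a", "a", "a", "b", "b", "c"),
                 X1 = c("12", "13", "14", "20", "25", "30"),
                 Index = c(1, 2, 3, 1, 2, 1),
                 stringsAsFactors = FALSE)

# One row per Name, one X1 column per Index value; combinations that
# do not occur in the data are filled with NA
wide <- reshape(df, idvar = "Name", timevar = "Index", direction = "wide")
wide
```

The resulting columns are named X1.1, X1.2 and X1.3 rather than 1, 2 and
3, so they may need renaming afterwards.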

But I don't see why you want to drop X2, so I would actually do:

df = data.frame(Name=c("a", "a", "a", "b", "b", "c"), X1=c("12", "13",
"14", "20", "25", "30"), X2 = c(200, 250, 300, 600, 700, 4))
df$Index <- ddply(df, "Name", colwise(seq_along))[,1]
df$X2 <- as.character(df$X2)
m.df <- melt(df, measure.vars=c("X1","X2"))
df.final <- cast(m.df, ... ~ Index)
df.final
  Name variable   1    2    3
1    a       X1  12   13   14
2    a       X2 200  250  300
3    b       X1  20   25 <NA>
4    b       X2 600  700 <NA>
5    c       X1  30 <NA> <NA>
6    c       X2   4 <NA> <NA>
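
On the question of getting to df2 without an Index column at all: one
possibility, sketched here in base R and not taken from any of the
packages above, is to split X1 by Name and pad each piece with NA up to
the length of the longest one. The helper names `pieces` and `width` are
just for illustration.

```r
df <- data.frame(Name = c("a", "a", "a", "b", "b", "c"),
                 X1 = c("12", "13", "14", "20", "25", "30"),
                 X2 = c(200, 250, 300, 600, 700, 4),
                 stringsAsFactors = FALSE)

# One character vector of X1 values per Name, in row order
pieces <- split(df$X1, df$Name)

# Length of the longest group decides the number of columns
width <- max(sapply(pieces, length))

# Indexing past the end of a vector yields NA, which pads short groups
df2 <- do.call(rbind, lapply(pieces, function(x) x[seq_len(width)]))
colnames(df2) <- seq_len(width)
df2
```

This gives a character matrix with one row per Name, filled from left
to right, which can be wrapped in as.data.frame() if a data frame is
needed.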

All the best,
Ista
> If I could generate the Index column, I think I could accomplish this with:
>
> df2 = reShape(df.index$X1, id=df.index$Name, colvar=df.index$Index)
> colnames(df2) = c("V1", "V2", "V3")
>
> However, is there a way to get to df2 without using the Index column and still have the NAs written as described above?
>
> Thank you so much for your help on these two issues.
>
> With best regards,
> Dana Sevak