[R] merging dataframes in a list
jim holtman
jholtman at gmail.com
Sat Jun 4 19:11:23 CEST 2016
Here is how you can do it with tidyr:
> x <- list(data.frame(name="sample1", red=20)
+ , data.frame(name="sample1", green=15)
+ , data.frame(name="sample2", red=10)
+ , data.frame(name="sample2", green=30)
+ )
> library(dplyr)
> library(tidyr)
>
> # convert to 'name, type, value'; assumes dataframe with 2 variables
> x.conv <- lapply(x, function(df){
+ data.frame(name = as.character(df$name)
+ , type = names(df)[2L] # use 'red'/'green' as indicators
+ , value = df[[2]]
+ , stringsAsFactors = FALSE
+ )
+ })
> print(x.conv)
[[1]]
name type value
1 sample1 red 20
[[2]]
name type value
1 sample1 green 15
[[3]]
name type value
1 sample2 red 10
[[4]]
name type value
1 sample2 green 30
>
> x.conv <- bind_rows(x.conv) # create single dataframe
> print(x.conv)
Source: local data frame [4 x 3]
name type value
(chr) (chr) (dbl)
1 sample1 red 20
2 sample1 green 15
3 sample2 red 10
4 sample2 green 30
>
> # create output
> spread(x.conv, type, value) # uses tidyr 'spread'
Source: local data frame [2 x 3]
name green red
(chr) (dbl) (dbl)
1 sample1 15 20
2 sample2 30 10
>
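
For readers on newer package versions: in tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). Below is a minimal sketch of the same reshaping with the newer verbs. It is not the code from the session above. It assumes that `name` is the only identifier column and that each name has at most one value per colour. The object names `stacked`, `long` and `wide` are only illustrative.

```r
library(dplyr)
library(tidyr)

x <- list(
  data.frame(name = "sample1", red = 20),
  data.frame(name = "sample1", green = 15),
  data.frame(name = "sample2", red = 10),
  data.frame(name = "sample2", green = 30)
)

# Stack the frames; a column missing from a frame is filled with NA.
stacked <- bind_rows(x)

# Go long (name, type, value), dropping the NA cells created by stacking.
long <- pivot_longer(stacked, cols = -name, names_to = "type",
                     values_to = "value", values_drop_na = TRUE)

# Go wide again: one row per name, one column per type.
wide <- pivot_wider(long, names_from = type, values_from = value)
print(wide)
```

Because bind_rows() fills the missing columns itself, this variant does not rely on every data frame having exactly two variables.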
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Fri, Jun 3, 2016 at 4:02 PM, Ed Siefker <ebs15242 at gmail.com> wrote:
> Thanks, ldply got me a data frame straight away. But it filled empty
> spaces with NA and merge no longer works.
>
> > ldply(mylist)
> name red green
> 1 sample1 20 NA
> 2 sample1 NA 15
> 3 sample2 10 NA
> 4 sample2 NA 30
> > mydf <- ldply(mylist)
> > merge(mydf[1,],mydf[2,])
> [1] name red green
> <0 rows> (or 0-length row.names)
> > merge(mydf[1,],mydf[2,], by=1)
> name red.x green.x red.y green.y
> 1 sample1 20 NA NA 15
>
>
> How do I merge dataframes with NA?
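
One possible base R answer to the question above is to collapse the NA-padded rows per name instead of merging them. This is only a sketch: it rebuilds the ldply() output by hand so that it runs without plyr, it assumes each name has at most one non-NA value per column, and `first_non_na` and `collapsed` are hypothetical names.

```r
# The NA-padded frame, as printed by ldply(mylist) above.
mydf <- data.frame(
  name  = c("sample1", "sample1", "sample2", "sample2"),
  red   = c(20, NA, 10, NA),
  green = c(NA, 15, NA, 30)
)

# Hypothetical helper: the first non-NA value of a vector (NA if none).
first_non_na <- function(v) v[!is.na(v)][1]

# For every name, reduce each value column to its single non-NA entry.
collapsed <- aggregate(mydf[c("red", "green")],
                       by = list(name = mydf$name),
                       FUN = first_non_na)
print(collapsed)
```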
>
> On Fri, Jun 3, 2016 at 2:17 PM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
> > You can use ldply in the plyr package to bind all the data.frames together
> > (a regular loop will also work). Afterwards you can summarise using ddply
> >
> > Hope this helps
> > Ulrik
> >
> >
> > Ed Siefker <ebs15242 at gmail.com> wrote on Fri, 3 June 2016 at 21:10:
> >>
> >> aggregate isn't really what I want. Maybe tapply? I still can't get
> >> it to work.
> >>
> >> > length(mylist)
> >> [1] 4
> >> > length(names)
> >> [1] 4
> >> > tapply(mylist, names, merge)
> >> Error in tapply(mylist, names, merge) : arguments must have same length
> >>
> >> I guess because a list isn't an atomic data type. What function will
> >> do the same on lists? lapply doesn't have a 'by' argument.
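
A group-wise operation on a list can be done with split() followed by lapply(). The tapply() error above appears to come from `names` being a list: tapply() treats a list INDEX as several grouping factors, each of length 1 here, rather than as one factor of length 4. The sketch below assumes every data frame has one row and a `name` column. `keys`, `groups`, `merged` and `result` are illustrative names.

```r
mylist <- list(
  data.frame(name = "sample1", red = 20),
  data.frame(name = "sample1", green = 15),
  data.frame(name = "sample2", red = 10),
  data.frame(name = "sample2", green = 30)
)

# One grouping key per list element, taken from each frame's 'name' column.
keys <- vapply(mylist, function(df) as.character(df$name[1]), character(1))

# split() also works on lists: one sub-list of data frames per sample.
groups <- split(mylist, keys)

# Within each group, merge the frames pairwise on their shared column 'name'.
merged <- lapply(groups, function(g) Reduce(merge, g))

# Stack the per-sample results into a single data frame.
result <- do.call(rbind, merged)
rownames(result) <- NULL
print(result)
```

Extracting the keys from the data frames themselves also avoids building the list of sample names by hand.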
> >>
> >> On Fri, Jun 3, 2016 at 1:41 PM, Ed Siefker <ebs15242 at gmail.com> wrote:
> >> > I manually constructed the list of sample names and tried the
> >> > aggregate call I mentioned.
> >> > Merge works when called manually, but not when using aggregate.
> >> >
> >> >> mylist <- list(data.frame(name="sample1", red=20),
> >> >> data.frame(name="sample1", green=15),
> >> >> data.frame(name="sample2", red=10),
> >> >> data.frame(name="sample2", green=30))
> >> >> names <- list("sample1", "sample1", "sample2", "sample2")
> >> >> merge(mylist[1], mylist[2])
> >> > name red green
> >> > 1 sample1 20 15
> >> >> merge(mylist[3], mylist[4])
> >> > name red green
> >> > 1 sample2 10 30
> >> >> aggregate(mylist, by=as.list(names), merge)
> >> > Error in as.data.frame(y) : argument "y" is missing, with no default
> >> >
> >> > What's the right way to do this?
> >> >
> >> > On Fri, Jun 3, 2016 at 1:20 PM, Ed Siefker <ebs15242 at gmail.com> wrote:
> >> >> I have a list of data as follows.
> >> >>
> >> >>> list(data.frame(name="sample1", red=20),
> >> >>> data.frame(name="sample1", green=15),
> >> >>> data.frame(name="sample2", red=10),
> >> >>> data.frame(name="sample2", green=30))
> >> >> [[1]]
> >> >> name red
> >> >> 1 sample1 20
> >> >>
> >> >> [[2]]
> >> >> name green
> >> >> 1 sample1 15
> >> >>
> >> >> [[3]]
> >> >> name red
> >> >> 1 sample2 10
> >> >>
> >> >> [[4]]
> >> >> name green
> >> >> 1 sample2 30
> >> >>
> >> >>
> >> >> I would like to massage this into a data frame like this:
> >> >>
> >> >> name red green
> >> >> 1 sample1 20 15
> >> >> 2 sample2 10 30
> >> >>
> >> >>
> >> >> I'm imagining I can use aggregate(mylist, by=samplenames, merge)
> >> >> right? But how do I get the list of samplenames? How do I subset
> >> >> each dataframe inside the list?
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.