[R] Convert list of data frames to one data frame
Ira Sharenow
|r@@h@renow100 @end|ng |rom y@hoo@com
Sat Jun 30 04:08:56 CEST 2018
Bert,
Thanks for your idea. However, the end result is not what I am looking
for. Each initial data frame in the list should result in just one row in
the final data frame. In your case:
Row 1 of the initial structure will become 1 b 2 c 3 d NA NA NA NA in the
end structure.
Row 2 of the initial structure will become 5 k 6 l 7 m 8 n 9 o.
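For comparison, a minimal sketch of what the rbind suggestion quoted further down returns for the same `zz` list. The `stringsAsFactors = FALSE` arguments and the name `long` are assumptions added here so that the sketch behaves the same across R versions; they are not part of the quoted code.

```r
# zz as in the quoted example, with stringsAsFactors = FALSE added (an assumption)
zz <- list(one = data.frame(f = 1:3, g = letters[2:4], stringsAsFactors = FALSE),
           two = data.frame(a = 5:9, b = letters[11:15], stringsAsFactors = FALSE))

# give every data frame the same two column names, then stack them
long <- do.call(rbind, lapply(zz, function(x) { names(x) <- c("first", "last"); x }))

dim(long)       # one row per original row, two columns
rownames(long)  # row names carry the list element each row came from
```

The result is in long form, with one row per person, whereas the structure asked for here has one row per list element.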
Sarah’s code works:
> dfbycol(zz)
    first1 last1 first2 last2 first3 last3 first4 last4 first5 last5
one      1     b      2     c      3     d   <NA>  <NA>   <NA>  <NA>
two      5     k      6     l      7     m      8     n      9     o
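dfbycol <- function(x) {
  # flatten each data frame row by row into one vector
  x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
  # pad every vector with NA up to the length of the longest one
  x <- lapply(x, function(y) { length(y) <- max(sapply(x, length)); y })
  # stack the vectors as rows of a matrix, then convert to a data frame
  x <- do.call(rbind, x)
  x <- data.frame(x, stringsAsFactors = FALSE)
  # name the columns first1, last1, first2, last2, ...
  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each = 2))
  x
}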
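As a quick check, here is a sketch that runs the function on the George Washington example from the quoted message below. The list name `presidents` and the use of `stringsAsFactors = FALSE` are assumptions made for illustration.

```r
# dfbycol as posted in this thread
dfbycol <- function(x) {
  x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
  x <- lapply(x, function(y) { length(y) <- max(sapply(x, length)); y })
  x <- do.call(rbind, x)
  x <- data.frame(x, stringsAsFactors = FALSE)
  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each = 2))
  x
}

# hypothetical input: two data frames with different column names and row counts
presidents <- list(
  data.frame(First = "George", Last = "Washington", stringsAsFactors = FALSE),
  data.frame(Start = c("John", "Thomas"), End = c("Adams", "Jefferson"),
             stringsAsFactors = FALSE))

wide <- dfbycol(presidents)
wide  # one row per list element; the shorter element is padded with NA
```

Because the columns are combined by position, the differing column names in the input do not matter.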
Thanks.
By the way I am working with a colleague on this. Apparently the data
came from reading in XML data.
Ira
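On the earlier question of how to rename the columns at the intermediate stage, the following is one possible base R sketch, not a tested general solution. The helper names `flatten_rows` and `pad`, the list `people`, and the `first`/`last` naming scheme are all assumptions for illustration; it also assumes every data frame has at least one row and exactly two columns.

```r
# hypothetical input with inconsistent column names
people <- list(
  data.frame(first1 = "Al", second1 = "Jones", stringsAsFactors = FALSE),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith"),
             stringsAsFactors = FALSE))

# spread an n by 2 data frame into a named vector of length 2n,
# naming the entries by position rather than by the original columns
flatten_rows <- function(df) {
  v <- as.vector(t(as.matrix(df)))
  names(v) <- paste0(c("first", "last"), rep(seq_len(nrow(df)), each = 2))
  v
}

# intermediate stage: a list of 1 by 2n data frames with consistent names
wide_list <- lapply(people, function(df)
  as.data.frame(as.list(flatten_rows(df)), stringsAsFactors = FALSE))

# add any missing columns as NA, then stack the rows
all_names <- unique(unlist(lapply(wide_list, names)))
pad <- function(d) {
  for (nm in setdiff(all_names, names(d))) d[[nm]] <- NA_character_
  d[all_names]
}
result <- do.call(rbind, lapply(wide_list, pad))
result
```

With dplyr loaded, `bind_rows(wide_list)` could stand in for the padding and `rbind` steps, since `bind_rows` fills columns missing from some inputs with NA.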
On 6/29/2018 6:33 PM, Bert Gunter wrote:
> Well, I don't know your constraints, of course; but if I understand
> correctly, in situations like this, it is usually worthwhile to
> reconsider your data structure.
>
> This is a one-liner if you simply rbind all your data frames into one
> with 2 columns. Here's an example to indicate how:
>
> ## list of two data frames with different column names and numbers of rows:
> zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
>            two = data.frame(a = 5:9, b = letters[11:15]))
>
> ## create common column names and bind them up:
> do.call(rbind,lapply(zz,function(x){ names(x) <- c("first","last"); x}))
>
> Note that the row names of the result tell you which original frame
> the rows came from. This can also be obtained just from a count of
> rows (?nrow) of the original list.
>
> Apologies if I misunderstand your query or if your constraints make
> this simple approach impossible.
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Fri, Jun 29, 2018 at 5:29 PM, Ira Sharenow via R-help
> <r-help using r-project.org> wrote:
>
>
> Sarah and David,
>
> Thank you for your responses. I will try to be clearer.
>
> Base R solution: Sarah’s method worked perfectly.
>
> Is there a dplyr solution?
>
> START: list of data frames
>
> FINISH: one data frame
>
> DETAILS: The initial list of data frames might have hundreds or a
> few thousand data frames. Every data frame will have two columns.
> The first column will represent first names. The second column will
> represent last names. The column names are not consistent. Data
> frames will most likely have from one to five rows.
>
> SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data
> frames. Then somehow do an rbind even though the number of columns
> differs from data frame to data frame.
>
> EXAMPLE: List with two data frames
>
> # DF1
>
> First    Last
> George   Washington
>
> # DF2
>
> Start    End
> John     Adams
> Thomas   Jefferson
>
> # End Result. One data frame
>
> First1   Second1      First2   Second2
> George   Washington   NA       NA
> John     Adams        Thomas   Jefferson
>
>
>
> DISCUSSION: As mentioned, I posted something on Stack Overflow.
> Unfortunately, my example was not general enough and so the
> suggested solutions worked on the easy case which I provided
> but not when the names were different.
>
> The suggested solution was:
>
> library(dplyr)
>
> bind_rows(lapply(employees4List, function(x)
>   rbind.data.frame(c(t(x)))))
>
> On this site I pointed out that the inner call
>
> lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
>
> correctly spreads the multiple rows of each data frame into a 1 by
> 2n data frame. However, the column names were derived from the
> values and were a mess. This caused a problem with bind_rows.
>
> I felt that if I knew how to change all the names of all of the
> data frames that were created after lapply, then I could use
> bind_rows. So if someone knows how to change all of the names at
> this intermediate stage, I hope that person will provide the solution.
>
> In the end a 1 by 2 data frame would have names First1 Second1. A
> 1 by 4 data frame would have names First1 Second1
> First2 Second2.
>
> Ira
>
>
> On Friday, June 29, 2018, 12:49:18 PM PDT, David Winsemius
> <dwinsemius using comcast.net> wrote:
>
>
> > On Jun 29, 2018, at 7:28 AM, Sarah Goslee
> > <sarah.goslee using gmail.com> wrote:
> >
> > Hi,
> >
> > It isn't super clear to me what you're after.
>
> Agree.
>
> Had a different read of the request. Thought the request was for a
> first step that "harmonized" the names of the columns and then
> used `dplyr::bind_rows`:
>
> library(dplyr)
> newList <- lapply( employees4List, 'names<-',
> names(employees4List[[1]]) )
> bind_rows(newList)
>
> #---------
>
>    first1 second1
> 1      Al   Jones
> 2     Al2   Jones
> 3    Barb   Smith
> 4     Al3   Jones
> 5 Barbara   Smith
> 6   Carol   Adams
> 7      Al  Jones2
>
> Might want to wrap suppressWarnings around the right side of that
> assignment since there were many warnings regarding incongruent
> factor levels.
>
> --
> David.
> > Is this what you intend?
> >
> >> dfbycol(employees4BList)
> >   first1 last1 first2 last2 first3 last3
> > 1     Al Jones   <NA>  <NA>   <NA>  <NA>
> > 2     Al Jones   Barb Smith   <NA>  <NA>
> > 3     Al Jones   Barb Smith  Carol Adams
> > 4     Al Jones   <NA>  <NA>   <NA>  <NA>
> >
> >> dfbycol(employees4List)
> >   first1  last1  first2 last2 first3 last3
> > 1     Al  Jones    <NA>  <NA>   <NA>  <NA>
> > 2    Al2  Jones    Barb Smith   <NA>  <NA>
> > 3    Al3  Jones Barbara Smith  Carol Adams
> > 4     Al Jones2    <NA>  <NA>   <NA>  <NA>
> >
> >
> > If so:
> >
> > employees4BList = list(
> >   data.frame(first1 = "Al", second1 = "Jones"),
> >   data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
> >   data.frame(first1 = c("Al", "Barb", "Carol"),
> >              second1 = c("Jones", "Smith", "Adams")),
> >   data.frame(first1 = "Al", second1 = "Jones"))
> >
> > employees4List = list(
> >   data.frame(first1 = "Al", second1 = "Jones"),
> >   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
> >   data.frame(first3 = c("Al3", "Barbara", "Carol"),
> >              second3 = c("Jones", "Smith", "Adams")),
> >   data.frame(first4 = "Al", second4 = "Jones2"))
> >
> > ###
> >
> > dfbycol <- function(x) {
> >   x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
> >   x <- lapply(x, function(y) { length(y) <- max(sapply(x, length)); y })
> >   x <- do.call(rbind, x)
> >   x <- data.frame(x, stringsAsFactors = FALSE)
> >   colnames(x) <- paste0(c("first", "last"),
> >                         rep(seq(1, ncol(x)/2), each = 2))
> >   x
> > }
> >
> > ###
> >
> > dfbycol(employees4BList)
> >
> > dfbycol(employees4List)
> >
> > On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help
> > <r-help using r-project.org> wrote:
> >> I have a list of data frames which I would like to combine into one
> >> data frame doing something like rbind. I wish to combine in column
> >> order and not by names. However, there are issues.
> >>
> >> The number of columns is not the same for each data frame. This is an
> >> intermediate step to a problem and the number of columns could be
> >> 2, 4, 6, 8, or 10. There might be a few thousand data frames. Another
> >> problem is that the names of the columns produced by the first step
> >> are garbage.
> >>
> >> Below is a method that I obtained by asking a question on Stack
> >> Overflow. Unfortunately, my example was not general enough. The code
> >> below works for the simple case where the names of the people are
> >> consistent. It does not work when the names are realistically not
> >> the same.
> >>
> >>
> >> https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432
> >>
> >>
> >> Please note that the lapply step sets things up except for the
> >> column name issue. If I could figure out a way to change the column
> >> names, then the bind_rows step will, I believe, work.
> >>
> >> So I really have two questions: how to change all column names of
> >> all the data frames, and then how to solve the original problem.
> >>
> >> # The non-general case works fine. It produces one data frame and I
> >> # can then change the column names to
> >> # c("first1", "last1", "first2", "last2", "first3", "last3")
> >>
> >> # Non-general easy case
> >>
> >> employees4BList = list(
> >>   data.frame(first1 = "Al", second1 = "Jones"),
> >>   data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
> >>   data.frame(first1 = c("Al", "Barb", "Carol"),
> >>              second1 = c("Jones", "Smith", "Adams")),
> >>   data.frame(first1 = "Al", second1 = "Jones"))
> >>
> >> employees4BList
> >>
> >> bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))
> >>
> >> # This produces a nice list of data frames, except for the names
> >>
> >> lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))
> >>
> >> # This list is a disaster. I am looking for a solution that works in
> >> # this case.
> >>
> >> employees4List = list(
> >>   data.frame(first1 = "Al", second1 = "Jones"),
> >>   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
> >>   data.frame(first3 = c("Al3", "Barbara", "Carol"),
> >>              second3 = c("Jones", "Smith", "Adams")),
> >>   data.frame(first4 = "Al", second4 = "Jones2"))
> >>
> >> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
> >>
> >> Thanks.
> >>
> >> Ira
> >>
> >
> > --
> > Sarah Goslee
> > http://www.functionaldiversity.org
> >
> > ______________________________________________
> > R-help using r-project.org <mailto:R-help using r-project.org> mailing list
> -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> <https://stat.ethz.ch/mailman/listinfo/r-help>
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> <http://www.R-project.org/posting-guide.html>
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently
> advanced.' -Gehm's Corollary to Clarke's Third Law
>
>
>
>
>
> [[alternative HTML version deleted]]
>