[R] problem applying the same function twice

Wed Mar 11 01:46:35 CET 2015

Sarah,

I realized what I was saying after I pressed send on the email. It makes
perfect sense now. Thanks so much for your help and patience.
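
For the archive, a minimal sketch of the approach described in the quoted
message below. The data are made up for illustration: `original`,
`fullgrid`, `typogrid` and their values are assumptions, not the real data
set. It shows that a key-only grid merged with all.x=TRUE carries every
ancillary column along, and that a misspelled key name makes the row count
multiply.

```r
# Hypothetical stand-in for the real data: two samples observed for one
# site/year, with two ancillary columns (color, shape).
original <- data.frame(
  site   = c(1L, 1L),
  year   = c(1L, 1L),
  sample = c(1L, 2L),
  color  = c("blue", "green"),
  shape  = c("diamond", "pyramid"),
  stringsAsFactors = FALSE
)

# Key-only grid: every site x year x sample combination and nothing else.
# The names must match the data exactly so merge() finds its keys.
fullgrid <- expand.grid(site = 1L, year = 1L, sample = 1:3)

# Left join on the shared columns. Every non-key column of `original`
# comes along, and combinations with no data are filled with NA.
combined <- merge(fullgrid, original, all.x = TRUE)
print(combined)

# Same grid with one key name missing a letter (assumed typo). merge() now
# joins on site and year only, so each grid row pairs with every data row.
typogrid <- expand.grid(site = 1L, year = 1L, sampl = 1:3)
blownup  <- merge(typogrid, original, all.x = TRUE)
print(nrow(blownup))
```

With 669 sites, 7 years and 3 samples the same kind of mismatch multiplies
far more rows, which would be consistent with the memory problem described
further down.
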
On Mar 10, 2015 5:57 PM, "Sarah Goslee" <sarah.goslee at gmail.com> wrote:

> I think you're kind of missing the way this works:
>
> the data frame created by expand.grid() should ONLY have site, year,
> sample (with the exact names used in the data itself).
> Then the merged data frame will have the full site,year,sample
> combinations, along with ALL the data variables. Your animal example
> only had one measured variable, but the same method will work with any
> number.
> Reading ?merge might help you understand.
>
> Sarah
>
> On Tue, Mar 10, 2015 at 5:35 PM, Curtis Burkhalter
> <curtisburkhalter at gmail.com> wrote:
> >
> > Thanks Sarah, one of my column names was missing a letter, so it was
> > throwing things off. It works super fast now and is exactly what I
> > needed. My actual data set has about 6 other ancillary response data
> > columns. Is there a way to combine the 'full' data set I just created
> > with the original in case I need any of the other response variables?
> > E.g.
> >
> > FULL:
> > site    year    sample
> > 1       1       10
> > 1       1       12
> > 1       1       NA
> >
> > Original:
> > site    year    sample    color    shape
> > 1       1       10        blue     diamond
> > 1       1       12        green    pyramid
> >
> > Combined:
> > site    year    sample    color    shape
> > 1       1       10        blue     diamond
> > 1       1       12        green    pyramid
> > 1       1       NA        NA       NA
> >
> > Thanks
> >
> > On Tue, Mar 10, 2015 at 3:12 PM, Sarah Goslee <sarah.goslee at gmail.com>
> > wrote:
> >>
> >> Yeah, that's tiny:
> >>
> >> > fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
> >> > dim(fullout)
> >> [1] 14049     3
> >>
> >>
> >> Almost certainly the problem is that your expand.grid result doesn't
> >> have the same column names as your actual data file, so merge() is
> >> trying to make an enormous result. Note how when I made outgrid in the
> >> example I named the columns.
> >>
> >> Make sure that the names are identical!
> >>
> >>
> >> On Tue, Mar 10, 2015 at 4:57 PM, Curtis Burkhalter
> >> <curtisburkhalter at gmail.com> wrote:
> >> > Sarah,
> >> >
> >> > I have 669 sites and each site has 7 years of data, so if I'm thinking
> >> > correctly then there should be 4683 possible combinations of site x
> >> > year. For each year, though, I need 3 sampling periods, so that there
> >> > is something like the following:
> >> >
> >> > site 1      year1      sample 1
> >> > site 1      year1      sample 2
> >> > site 1      year1      sample 3
> >> > site 2      year1      sample 1
> >> > site 2      year1      sample 2
> >> > site 2      year1      sample 3
> >> > ...
> >> > site 669    year7      sample 1
> >> > site 669    year7      sample 2
> >> > site 669    year7      sample 3
> >> >
> >> > I have my max memory allocation set to the amount of RAM (8GB) on my
> >> > laptop, but it still 'times out' due to memory problems.
> >> >
> >> > On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee
> >> > <sarah.goslee at gmail.com> wrote:
> >> >>
> >> >> You said your data only had 14000 rows, which really isn't many.
> >> >>
> >> >> How many possible combinations do you have, and how many do you need
> >> >> to add?
> >> >>
> >> >> On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
> >> >> <curtisburkhalter at gmail.com> wrote:
> >> >> > Sarah,
> >> >> >
> >> >> > This strategy works great for this small dataset, but when I attempt
> >> >> > your method with my data set I reach the maximum allowable memory
> >> >> > allocation and the operation just stalls and then stops completely
> >> >> > before it is finished. Do you know of a way around this?
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee
> >> >> > <sarah.goslee at gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I didn't work through your code, because it looked overly
> >> >> >> complicated. Here's a more general approach that does what you
> >> >> >> appear to want:
> >> >> >>
> >> >> >> # use dput() to provide reproducible data please!
> >> >> >> comAn <- structure(list(animals = c("bird", "bird", "bird", "bird",
> >> >> >> "bird", "bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat",
> >> >> >> "cat", "cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L,
> >> >> >> 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
> >> >> >> 20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L)),
> >> >> >> .Names = c("animals", "animalYears", "animalMass"),
> >> >> >> class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6",
> >> >> >> "7", "8", "9", "10", "11", "12", "13", "14", "15", "16"))
> >> >> >>
> >> >> >>
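> >> >> >> # add reps to comAn
> >> >> >> # assumes comAn is already sorted on animals, animalYears
> >> >> >> # lapply rather than sapply: if every group had the same length,
> >> >> >> # sapply would simplify to a matrix and the assignment would fail
> >> >> >> comAn$reps <- unlist(lapply(rle(do.call("paste",
> >> >> >> comAn[,1:2]))$lengths, seq_len))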
> >> >> >>
> >> >> >> # create full set of combinations
> >> >> >> outgrid <- expand.grid(animals=unique(comAn$animals),
> >> >> >> animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
> >> >> >> stringsAsFactors=FALSE)
> >> >> >>
> >> >> >> # combine with comAn
> >> >> >> comAn.full <- merge(outgrid, comAn, all.x=TRUE)
> >> >> >>
> >> >> >> > comAn.full
> >> >> >>    animals animalYears reps animalMass
> >> >> >> 1     bird           1    1         29
> >> >> >> 2     bird           1    2         48
> >> >> >> 3     bird           1    3         36
> >> >> >> 4     bird           2    1         20
> >> >> >> 5     bird           2    2         34
> >> >> >> 6     bird           2    3         34
> >> >> >> 7      cat           1    1         46
> >> >> >> 8      cat           1    2         33
> >> >> >> 9      cat           1    3         48
> >> >> >> 10     cat           2    1         21
> >> >> >> 11     cat           2    2         NA
> >> >> >> 12     cat           2    3         NA
> >> >> >> 13     dog           1    1         21
> >> >> >> 14     dog           1    2         28
> >> >> >> 15     dog           1    3         25
> >> >> >> 16     dog           2    1         35
> >> >> >> 17     dog           2    2         18
> >> >> >> 18     dog           2    3         11
> >> >> >> >
> >> >> >>
> >> >> >> On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter
> >> >> >> <curtisburkhalter at gmail.com> wrote:
> >> >> >> > Hey everyone,
> >> >> >> >
> >> >> >> > I've written a function that adds NAs to a dataframe where data is
> >> >> >> > missing and it seems to work great if I only need to run it once,
> >> >> >> > but if I run it two times in a row I run into problems. I've
> >> >> >> > created a workable example to explain what I mean and why I would
> >> >> >> > do this.
> >> >> >> >
> >> >> >> > In my dataframe there are areas where I need to add two rows of
> >> >> >> > NAs (b/c I need to have 3 rows for each animal x year combo and
> >> >> >> > for cat in year 2 I only have one), so I thought that I'd just
> >> >> >> > run my code twice using the function in the code below.
> >> >> >> > Everything works great when I run it the first time, but when I
> >> >> >> > run it again it says that the value returned to the list 'x' is
> >> >> >> > of length 0. I don't understand why the function works the first
> >> >> >> > time around and adds an NA to the 'animalMass' column, but won't
> >> >> >> > do it again. I've used print(str(dataframe)) to see if there is
> >> >> >> > a change in class or type when the function runs through the
> >> >> >> > original dataframe, and there is for 'animalYears', but I just
> >> >> >> > convert it back before rerunning the function for the second
> >> >> >> > time.
> >> >> >> >
> >> >> >> > Any thoughts on this would be greatly appreciated b/c the actual
> >> >> >> > dataframe I have to input into WinBUGS is 14000x12, so it's not a
> >> >> >> > trivial thing to just add in an NA here or there.
> >> >> >> >
> >> >> >> >> comAn
> >> >> >> >    animals animalYears animalMass
> >> >> >> > 1     bird           1         29
> >> >> >> > 2     bird           1         48
> >> >> >> > 3     bird           1         36
> >> >> >> > 4     bird           2         20
> >> >> >> > 5     bird           2         34
> >> >> >> > 6     bird           2         34
> >> >> >> > 7      dog           1         21
> >> >> >> > 8      dog           1         28
> >> >> >> > 9      dog           1         25
> >> >> >> > 10     dog           2         35
> >> >> >> > 11     dog           2         18
> >> >> >> > 12     dog           2         11
> >> >> >> > 13     cat           1         46
> >> >> >> > 14     cat           1         33
> >> >> >> > 15     cat           1         48
> >> >> >> > 16     cat           2         21
> >> >> >> >
> >> >> >> > So every animal has 3 measurements per year, except for the cat in
> >> >> >> > year two, which has only 1. I run the code below and get:
> >> >> >> >
> >> >> >> > #combs defines the different combinations of
> >> >> >> > #animals and animalYears
> >> >> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> >> >> > #counts gives, for each row, how many rows share its combination
> >> >> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> >> >> > #missing holds the combs that have fewer than two rows, along
> >> >> >> > #with their counts
> >> >> >> > missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])
> >> >> >> >
> >> >> >> > genRows<-function(dat){
> >> >> >> >         vals<-strsplit(dat[1],':')[[1]]
> >> >> >> >         #not sure why dat[2] is being converted to a string
> >> >> >> >         newRows<-2-as.numeric(dat[2])
> >> >> >> >         newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >> >> >                           animalYears=rep(vals[2],newRows),
> >> >> >> >                           animalMass=rep(NA,newRows))
> >> >> >> >         return(newDf)
> >> >> >> >         }
> >> >> >> >
> >> >> >> >
> >> >> >> > x<-apply(missing,1,genRows)
> >> >> >> > comAn=rbind(comAn,
> >> >> >> >         do.call(rbind,x))
> >> >> >> >
> >> >> >> >> comAn
> >> >> >> >    animals animalYears animalMass
> >> >> >> > 1     bird           1         29
> >> >> >> > 2     bird           1         48
> >> >> >> > 3     bird           1         36
> >> >> >> > 4     bird           2         20
> >> >> >> > 5     bird           2         34
> >> >> >> > 6     bird           2         34
> >> >> >> > 7      dog           1         21
> >> >> >> > 8      dog           1         28
> >> >> >> > 9      dog           1         25
> >> >> >> > 10     dog           2         35
> >> >> >> > 11     dog           2         18
> >> >> >> > 12     dog           2         11
> >> >> >> > 13     cat           1         46
> >> >> >> > 14     cat           1         33
> >> >> >> > 15     cat           1         48
> >> >> >> > 16     cat           2         21
> >> >> >> > 17     cat           2       <NA>
> >> >> >> >
> >> >> >> > So far so good, but then I adjust the code so that it reads
> >> >> >> > (**notice the change in the specification in 'missing' to
> >> >> >> > counts<3**):
> >> >> >> >
> >> >> >> > #combs defines the different combinations of
> >> >> >> > #animals and animalYears
> >> >> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> >> >> > #counts gives, for each row, how many rows share its combination
> >> >> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> >> >> > #missing holds the combs that have fewer than three rows, along
> >> >> >> > #with their counts
> >> >> >> > missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])
> >> >> >> >
> >> >> >> > genRows<-function(dat){
> >> >> >> >         vals<-strsplit(dat[1],':')[[1]]
> >> >> >> >         #not sure why dat[2] is being converted to a string
> >> >> >> >         newRows<-2-as.numeric(dat[2])
> >> >> >> >         newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >> >> >                           animalYears=rep(vals[2],newRows),
> >> >> >> >                           animalMass=rep(NA,newRows))
> >> >> >> >         return(newDf)
> >> >> >> >         }
> >> >> >> >
> >> >> >> >
> >> >> >> > x<-apply(missing,1,genRows)
> >> >> >> > comAn=rbind(comAn,
> >> >> >> >         do.call(rbind,x))
> >> >> >> >
> >> >> >> > The result for 'x' then reads:
> >> >> >> >
> >> >> >> >> x
> >> >> >> > [[1]]
> >> >> >> > [1] animals     animalYears animalMass
> >> >> >> > <0 rows> (or 0-length row.names)
> >> >> >> >
> >> >> >> > Any thoughts on why it might be doing this instead of adding an
> >> >> >> > additional row to get the result:
> >> >> >> >
> >> >> >> >> comAn
> >> >> >> >    animals animalYears animalMass
> >> >> >> > 1     bird           1         29
> >> >> >> > 2     bird           1         48
> >> >> >> > 3     bird           1         36
> >> >> >> > 4     bird           2         20
> >> >> >> > 5     bird           2         34
> >> >> >> > 6     bird           2         34
> >> >> >> > 7      dog           1         21
> >> >> >> > 8      dog           1         28
> >> >> >> > 9      dog           1         25
> >> >> >> > 10     dog           2         35
> >> >> >> > 11     dog           2         18
> >> >> >> > 12     dog           2         11
> >> >> >> > 13     cat           1         46
> >> >> >> > 14     cat           1         33
> >> >> >> > 15     cat           1         48
> >> >> >> > 16     cat           2         21
> >> >> >> > 17     cat           2       <NA>
> >> >> >> > 18     cat           2       <NA>
> >> >> >> >
> >> >> >> > Thanks
> >> >> >> > --
> >> >> >> > Curtis Burkhalter
> >> >> >
> >> >> >
>
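
A note for the archive on the original question quoted above, since the
thread moved on to merge() without answering it. As far as I can tell, the
second pass adds nothing because genRows() still asks for 2 minus the count,
and after the first pass the cat/year-2 group has a count of 2. The count
arrives as a string because apply() converts the data frame to a character
matrix before handing over each row. The sketch below shows the arithmetic
and one possible single-pass alternative. `target`, `need`, `padding` and
`comAnPadded` are names made up here, not part of the original code.

```r
# Example data from the thread, rebuilt by hand.
comAn <- data.frame(
  animals     = rep(c("bird", "dog", "cat"), times = c(6, 6, 4)),
  animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
                  1L, 1L, 1L, 2L),
  animalMass  = c(29L, 48L, 36L, 20L, 34L, 34L, 21L, 28L, 25L,
                  35L, 18L, 11L, 46L, 33L, 48L, 21L),
  stringsAsFactors = FALSE
)

# Why the second pass adds nothing: genRows() requests 2 - count rows,
# and a group that already has 2 rows therefore requests none.
rowsRequested <- 2 - 2
print(length(rep(NA, rowsRequested)))

# Hypothetical single-pass alternative: pad each group up to `target` rows.
target <- 3L
combs  <- paste(comAn$animals, comAn$animalYears, sep = ":")
counts <- lengths(split(combs, combs))      # rows per combination
need   <- target - counts[counts < target]  # rows still missing, by comb
parts  <- strsplit(rep(names(need), times = need), ":", fixed = TRUE)

# One NA row per missing measurement, with the keys split back out.
padding <- data.frame(
  animals     = vapply(parts, `[`, character(1), 1),
  animalYears = as.integer(vapply(parts, `[`, character(1), 2)),
  animalMass  = rep(NA_integer_, sum(need)),
  stringsAsFactors = FALSE
)
comAnPadded <- rbind(comAn, padding)
print(tail(comAnPadded, 3))
```

Computing the shortfall as target minus count means one pass is enough
however many rows a group is missing. The expand.grid() and merge() route
in the quoted replies gets to the same place and also numbers the
replicates.
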
