[R] Having some Trouble Data Structures

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Mon Nov 5 06:37:30 CET 2012


Please keep mail threads on the mailing list. Please follow the posting guidelines and provide a sample of data and desired outcome.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

"Benjamin Ward (ENV)" <B.Ward at uea.ac.uk> wrote:

>Hi,
>Thank you very much for your reply. The layout you prefer is how my
>supervisor implemented it in Minitab; however, I was unsure how to get
>R to produce this repeating-ID behaviour. In a for loop going through
>individuals 1 to, say, 10, I want it to:
>
>Randomly sample the number of effectors from a distribution (I can do
>this, but with runif),
>
>Then put one value in each cell of the Effector column and repeat the
>ID for each effector row. I'm also left wondering whether for loops
>that use ID will apply operations row by row or ID by ID. For example,
>in the immunology part I would need a loop to check, individual by
>individual, whether any of its effectors means death in the host, in
>which case all rows for, say, ID "1" would need to be deleted.
>
>Would you be able to provide an example chunk showing how you
>accomplish this with your preferred approach, if you have the time?
>
>Thanks,
>Ben W.
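
A minimal sketch of the steps asked about above, in base R and under stated
assumptions: the range of 1 to 5 effectors per individual, the object names
(pathogens, lethal, dead_ids, survivors) and the rule for which effectors are
lethal are all invented for illustration and do not come from the thread.
Vectorised operations on a data frame column act row by row; grouping by ID
has to be asked for explicitly, here with %in%.

```r
set.seed(42)                      # reproducible illustration only
n_ind <- 10                       # individuals 1..10

# Number of effectors per individual (1 to 5 is an arbitrary assumption)
n_eff <- sample(1:5, n_ind, replace = TRUE)

# One row per effector; rep() repeats each ID n_eff[i] times
pathogens <- data.frame(
  ID       = rep(seq_len(n_ind), times = n_eff),
  Effector = sample(1:10000, sum(n_eff), replace = TRUE)
)

# Hypothetical rule: treat the first and last effector values as lethal
lethal <- pathogens$Effector[c(1, nrow(pathogens))]

# IDs that carry at least one lethal effector
dead_ids <- unique(pathogens$ID[pathogens$Effector %in% lethal])

# Drop every row belonging to those individuals in one step, no loop
survivors <- pathogens[!(pathogens$ID %in% dead_ids), ]
```

No explicit for loop is needed here: the deletion is a single logical
subset on the ID column.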
>
>________________________________________
>From: Jeff Newmiller [jdnewmil at dcn.davis.ca.us]
>Sent: 28 October 2012 15:27
>To: Benjamin Ward (ENV); r-help at r-project.org
>Subject: Re: [R] Having some Trouble Data Structures
>
>Search on "ragged array".
>
>My preferred approach is to use a data frame with one row per effector
>that repeats the per-ID information. If that occupies too much memory,
>you can set up another data frame with one row per ID and refer to that
>information as needed, using lapply and subsetting the effectors data.
>The plyr package is also useful for such processing.
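
One way the two-table layout described above might look in base R. This is
only a sketch: the data values and the Virulence column are hypothetical, and
plyr is not used so that the example has no package dependency.

```r
# Per-effector table: the ID repeats once per effector (values made up)
effectors <- data.frame(
  ID       = c(1, 1, 1, 2, 3, 3),
  Sequence = c(345, 567, 678, 1200, 88, 9001)
)

# Per-ID table: one row per individual (Virulence is a hypothetical column)
individuals <- data.frame(
  ID        = c(1, 2, 3),
  Virulence = c(0.2, 0.7, 0.5)
)

# Walk through the individuals, subsetting the effector rows for each ID
per_id <- lapply(individuals$ID, function(id) subset(effectors, ID == id))
names(per_id) <- individuals$ID

# Derive a per-ID summary from the per-effector rows
individuals$No_of_Effectors <- sapply(per_id, nrow)

# Bring the per-ID information alongside the effectors only when needed
combined <- merge(effectors, individuals, by = "ID")
```

Keeping the per-ID columns in their own table avoids repeating them on every
effector row; merge() rebuilds the wide view on demand.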
>
>"Benjamin Ward (ENV)" <B.Ward at uea.ac.uk> wrote:
>
>>Hi All,
>>
>>I'm trying to run a simulation of host-pathogen evolution based around
>>individuals.
>>What I need to have is a dataframe or table of some description -
>>describing all the individuals of a pathogen population (so far I've
>>implemented this as a matrix):
>>
>>     ID         No_of_Effectors         Effectors (Sequences)
>>  [1,] 0001              3              ## 3 Random Numbers ##
>>
>>There will be many such rows for many individuals. They have something
>>called effectors, the number of which is randomly generated, so say you
>>get 3 in the No_of_Effectors column. Then I make R generate 3 numbers
>>between 1 and 10,000, which gives me three numerical representations
>>of genes. These numbers will be compared to a similar data structure
>>of the host individuals, whose immune genes have similar numbers.
>>
>>My problem is that obviously I can't stick 3 numbers in one "cell" of
>>the matrix (I've tried) :
>>
>>Pathogen_Individuals[1,3] <- c(345, 567, 678)
>>Error in Pathogen_Individuals[1, 3] <- c(345, 567, 678) :
>>  number of items to replace is not a multiple of replacement length
>>
>>In future I'm also going to have more variables such as whether a gene
>>is expressed. Such information may require a matrix in itself -
>>something like:
>>
>>
>>        Effector ID             Sequence                  Expressed?
>> [1,]     0001              345,567,678                  1 (or 0).
>>
>>Is there a way I can put more than one value in a cell, like a
>>list of values, or a way to put objects in a cell of a data frame,
>>matrix or table etc.? Almost an inception deal: data structures nested
>>in a data structure. If I search for things like "insert list into
>>matrix" I get results about how to turn one into the other, which is
>>not what I think I need to be doing.
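
For the nested "cell" idea asked about above, a data frame column can itself
be a list, so each row can hold a vector of any length. The sketch below
assumes made-up IDs and sequence values; the column names follow the layout
in the question.

```r
# One row per individual; IDs kept as character so "0001" is not turned into 1
pathogens <- data.frame(ID = c("0001", "0002"), stringsAsFactors = FALSE)

# A list column: each element is a whole vector of effector sequences
pathogens$Effectors <- list(c(345, 567, 678), c(12, 9001))

# The count can be derived from the list instead of being stored by hand
pathogens$No_of_Effectors <- lengths(pathogens$Effectors)

# [[ ]] extracts the vector held in one "cell"
first <- pathogens$Effectors[[1]]
```

List columns work, but many data frame tools expect atomic columns, which is
one reason the one-row-per-effector layout is often easier to process.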
>>
>>I have been considering having several data structures not nested in
>>each other, something like creating, for every individual, a new matrix
>>object with the name Effectors_[Individual_ID] and somehow getting my
>>simulation loops to operate on those objects, but I find it hard to see
>>how to tell R that all of those matrices are to be included in an
>>operation, as you can with all lines of a data frame using for loops,
>>for example.
>>This is strange for me because this model was written in a macro code
>>for another program, which handles data in a different format and
>>layout to R.
>>
>>My problem, I think, is that each individual in the model has many
>>variables, in this case representations of genes. So I'm having trouble
>>getting my head around this.
>>
>>Hopefully someone more experienced will be able to offer advice or a
>>solution; it would be much appreciated.
>>
>>Many Thanks,
>>Ben Ward (ENV, UEA & The Sainsbury Lab, JIC).
>>
>>P.S. I have searched previous queries to the list, and I'm not sure,
>>but this may be useful or relevant:
>>
>>
>>Have you thought of using a list?
>>
>>> a <- matrix(1:10, nrow=2)
>>> b <- 1:5
>>> x <- list(a=a, b=b)
>>> x
>>$a
>>     [,1] [,2] [,3] [,4] [,5]
>>[1,]    1    3    5    7    9
>>[2,]    2    4    6    8   10
>>
>>$b
>>[1] 1 2 3 4 5
>>
>>> x$a
>>     [,1] [,2] [,3] [,4] [,5]
>>[1,]    1    3    5    7    9
>>[2,]    2    4    6    8   10
>>> x$b
>>[1] 1 2 3 4 5
>>
>>oliveoil and yarn datasets have been mentioned.