[R] initiate elements in a dataframe with lists
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Wed Jul 25 19:43:52 CEST 2018
The code below reeks of the misconception that lists are cheap to append
items to, which comes from confusing them with the computer science term
"linked list". In R, a list is NOT a linked list... it is a vector, which
means the memory used by the list is allocated when it is created, and
REALLOCATED (with the existing contents copied) when a new item is added.
The only reason you should use a list is that you expect to put values of
different types or shapes into it, which does not appear to apply to the
elements you are collecting here.
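A small sketch of the difference (the helpers grow_list and prealloc_list are hypothetical names used only for illustration): appending with c() builds a new, longer list on every pass and copies all existing elements, so the total work grows roughly with the square of n, while preallocating with vector("list", n) and filling the slots with [[<- avoids copying the whole list each time. Both produce the same list.

```r
# Growing: c() returns a brand-new list on every iteration and copies all
# previously stored elements into it, so total work grows roughly with n^2.
grow_list <- function(n) {
  out <- list()
  for (i in seq_len(n)) {
    out <- c(out, list(i * 2))
  }
  out
}

# Preallocating: vector("list", n) creates n NULL slots up front; [[<-
# assigns into an existing slot, so R can usually modify the list in place.
prealloc_list <- function(n) {
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- i * 2
  }
  out
}

identical(grow_list(5), prealloc_list(5))  # TRUE
```

Timing both with system.time() for a larger n (say 1e4) on your own machine should make the gap visible; lapply(seq_len(n), function(i) i * 2) avoids the explicit loop altogether.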
In R, you should make a valiant effort to create things right the first
time (e.g. with lapply or a vectorized expression), and if that doesn't
work, then preallocate the space you will need in the vectors you are
working with. Since you need to store a variable number of elements in
each element of intersectA and intersectB, those columns need to be lists,
but the elements of those lists can perfectly well be character vectors.
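For the i = 2, j = 1 case in the quoted follow-up below, a minimal sketch might look like this. It assumes the column is initialized with vector("list", nrow(x)), as Thierry suggested, so every element starts as NULL, and it appends with [[ rather than [. x$intersectA[i] is a one-element sub-list, so c() on it yields a two-element list, and assigning that back into a single slot keeps only the first item (with a warning). x$intersectA[[i]] is the element itself (NULL or a character vector), which c() simply extends. The window test is written as abs(...) < 10, which is equivalent to the POSA_left/POSA_right comparison in the question; only the columns needed here are included.

```r
x <- data.frame(TYPE = c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
                CHRA = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
                POSA = c(10, 15, 120, 340, 100, 220),
                stringsAsFactors = FALSE)
x$labA <- paste(x$CHRA, x$POSA, sep = "_")

# One NULL slot per row; each slot will later hold a character vector.
x$intersectA <- vector("list", nrow(x))

i <- 2
j <- 1
if (x$CHRA[i] == x$CHRA[j] && abs(x$POSA[i] - x$POSA[j]) < 10) {
  # [[ ]] extracts the element itself (here NULL), so c() yields a
  # character vector, which [[<- stores back into row i.
  x$intersectA[[i]] <- c(x$intersectA[[i]], x$labA[j])
}

x$intersectA[[i]]  # "chr1_10"
```

Initializing with rep(list(character(0)), nrow(x)) instead would make every element a character vector from the start, if you prefer not to have NULL entries.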
x <- data.frame( TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA")
               , CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2")
               , POSA=c(10, 15, 120, 340, 100, 220)
               , CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1")
               , POSB=c(30, 100, 300, 20, 200, 320)
               , stringsAsFactors = FALSE
               )
# TRUE wherever chr2 matches chr1 and pos2 lies strictly within delta of
# pos1 (vectorized over chr2 and pos2)
compareRng <- function( chr1, pos1, chr2, pos2, delta ) {
  ( chr1 == chr2
  & ( pos2 - delta ) < pos1
  & pos1 < ( pos2 + delta )
  )
}
# For row n, return the chromosome labels of all *other* rows of the
# global data frame x whose chrlabel/poslabel values are within delta
makeIntersectX <- function( n, chrlabel, poslabel, delta ) {
  lgclidx <- rep( TRUE, nrow( x ) )
  lgclidx[ n ] <- FALSE   # exclude row n itself
  x[[ chrlabel ]][ compareRng( x[[ chrlabel ]][ n ]
                             , x[[ poslabel ]][ n ]
                             , x[[ chrlabel ]]
                             , x[[ poslabel ]]
                             , delta
                             )
                 & lgclidx
                 ]
}
# lapply returns one character vector per row, which becomes a list
# column directly; no preallocation or appending is needed
x$intersectA <- lapply( seq.int( nrow( x ) )
                      , makeIntersectX
                      , chrlabel = "CHRA"
                      , poslabel = "POSA"
                      , delta = 10L
                      )
x$intersectB <- lapply( seq.int( nrow( x ) )
                      , makeIntersectX
                      , chrlabel = "CHRB"
                      , poslabel = "POSB"
                      , delta = 21L
                      )
> x
  TYPE CHRA POSA CHRB POSB intersectA intersectB
1  DEL chr1   10 chr1   30       chr1
2  DEL chr1   15 chr1  100       chr1
3  DUP chr1  120 chr1  300                  chr1
4  TRA chr1  340 chr2   20
5  INV chr2  100 chr2  200
6  TRA chr2  220 chr1  320                  chr1
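makeIntersectX reads x from the global environment and returns chromosome names, which is why the printed list columns only ever show "chr1". If the chr_pos labels that the quoted loop collects (labA) are what is wanted, a variant could take the data frame and a label column as explicit arguments. The sketch below is hypothetical (intersectLabels and its arguments are illustrative names, not part of the code above) and uses the same open-interval test, written as abs(pos1 - pos2) < delta.

```r
x <- data.frame(CHRA = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
                POSA = c(10, 15, 120, 340, 100, 220),
                stringsAsFactors = FALSE)
x$labA <- paste(x$CHRA, x$POSA, sep = "_")

# Hypothetical helper: for every row n of dat, return the labels of the
# other rows on the same chromosome whose position is within delta.
intersectLabels <- function(dat, chrcol, poscol, labcol, delta) {
  lapply(seq_len(nrow(dat)), function(n) {
    hit <- dat[[chrcol]] == dat[[chrcol]][n] &
      abs(dat[[poscol]] - dat[[poscol]][n]) < delta
    hit[n] <- FALSE  # a row never intersects itself
    dat[[labcol]][hit]
  })
}

x$intersectA <- intersectLabels(x, "CHRA", "POSA", "labA", delta = 10)
```

Passing the data frame explicitly keeps the function independent of whatever object happens to be called x in the calling environment.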
Note that depending on what you plan to do beyond this point, it might
actually be more performant to use a data frame with repeated rows instead
of list columns... but I cannot tell from what you have provided.
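One way to get the "repeated rows" layout is a self-join on the chromosome column followed by a filter, giving one row per pair of neighbouring rows. This is a sketch assuming base R merge() and a hypothetical id column added to identify rows; whether it is faster than list columns depends on the data (the join materializes every same-chromosome pair before filtering) and on what you do next.

```r
x <- data.frame(CHRA = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
                POSA = c(10, 15, 120, 340, 100, 220),
                stringsAsFactors = FALSE)
x$id <- seq_len(nrow(x))  # hypothetical row identifier

# Self-join on chromosome: every pair of rows sharing CHRA
pairs <- merge(x, x, by = "CHRA", suffixes = c(".self", ".other"))

# Keep pairs of distinct rows whose positions are within 10 of each other
pairs <- pairs[pairs$id.self != pairs$id.other &
                 abs(pairs$POSA.self - pairs$POSA.other) < 10, ]

# If a list column is needed after all, split() rebuilds one entry per row
intersectA <- split(pairs$id.other, factor(pairs$id.self, levels = x$id))
```

In long format, later steps such as counting or joining against other tables work with ordinary vectorized data frame operations instead of lapply over list elements.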
On Wed, 25 Jul 2018, Bogdan Tanasa wrote:
> Dear Thierry and Juan, thank you for your help. Thank you all.
>
> Now, if I would like to add an element to the empty list, how should I do it?
> For example, with i = 2 and j = 1, in slightly more complex R code:
>
> x <- data.frame(TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
> CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
> POSA=c(10, 15, 120, 340, 100, 220),
> CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
> POSB=c(30, 100, 300, 20, 200, 320))
>
> x$labA <- paste(x$CHRA, x$POSA, sep="_")
> x$labB <- paste(x$CHRB, x$POSB, sep="_")
>
> x$POSA_left <- x$POSA - 10
> x$POSA_right <- x$POSA + 10
>
> x$POSB_left <- x$POSB - 10
> x$POSB_right <- x$POSB + 10
>
> x$intersectA <- rep(list(list()), nrow(x))
> x$intersectB <- rep(list(list()), nrow(x))
>
> And we know that for i = 2 and j = 1 the condition is TRUE:
>
> i <- 2
>
> j <- 1
>
> if ( (x$CHRA[i] == x$CHRA[j] ) &&
> (x$POSA[i] > x$POSA_left[j] ) &&
> (x$POSA[i] < x$POSA_right[j] ) ){
> x$intersectA[i] <- c(x$intersectA[i], x$labA[j])}
>
> but the R code does not work. Thank you for your kind help!
>
> On Wed, Jul 25, 2018 at 12:26 AM, Thierry Onkelinx <thierry.onkelinx using inbo.be> wrote:
>
>> Dear Bogdan,
>>
>> You are looking for x$intersectA <- vector("list", nrow(x))
>>
>> Best regards,
>>
>>
>> ir. Thierry Onkelinx
>> Statisticus / Statistician
>>
>> Vlaamse Overheid / Government of Flanders
>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
>> FOREST
>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>> thierry.onkelinx using inbo.be
>> Havenlaan 88 bus 73,
>> 1000 Brussel
>> www.inbo.be
>>
>> 2018-07-25 8:55 GMT+02:00 Bogdan Tanasa <tanasa using gmail.com>:
>>
>>> Dear all,
>>>
>>> Assuming that I have a data frame like:
>>>
>>> x <- data.frame(TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
>>> CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
>>> POSA=c(10, 15, 120, 340, 100, 220),
>>> CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
>>> POSB=c(30, 100, 300, 20, 200, 320))
>>>
>>> how could I initialize two more columns in x, where each element of these
>>> two columns is a list (the list could be updated later)? Thank
>>> you!
>>>
>>> Should I do this?
>>>
>>> for (i in 1:dim(x)[1]) { x$intersectA[i] <- list()}
>>>
>>> for (i in 1:dim(x)[1]) { x$intersectB[i] <- list()}
>>>
>>> Nothing happens. Thank you very much!
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil using dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)