[R] rle with data.table - is it possible?
Kate Ignatius
kate.ignatius at gmail.com
Thu Jan 1 07:56:43 CET 2015
Is it possible to add the following code or similar in data.table:
childseg<-0
x:=sumchild <-0
span<-rle(x)$lengths[rle(x)$values==TRUE]
childseg[x]<-rep(seq_along(span), times = spanLOH)
childseg[childseg == 0]<-''
I was hoping to run this code by SNPEFF_GENE_NAME for mum, dad and
child. The problem I'm having is with the
span<-rle(x)$lengths[rle(x)$values==TRUE] line, which I'm not sure can
be used inside data.table.
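
One way the rle step might be fitted into data.table is sketched below. This is only a minimal sketch under stated assumptions, not code from this thread: the helpers label_runs() and run_length() and the columns homchild, childseg and childlen are hypothetical names, and the toy data reuses the Child and Group columns from the earlier example. The idea is to expand the rle() result back to one value per row, so that the assignment by group receives a vector as long as the group itself.

```r
library(data.table)

# Hypothetical helper (not from the thread): number each run of TRUE values
# 1, 2, 3, ... and mark every other position with 0.
label_runs <- function(x) {
  r <- rle(x)
  is_run <- !is.na(r$values) & r$values
  ids <- integer(length(r$values))
  ids[is_run] <- seq_len(sum(is_run))
  rep(ids, times = r$lengths)
}

# Hypothetical helper: length of the TRUE run each position belongs to, else 0.
run_length <- function(x) {
  r <- rle(x)
  is_run <- !is.na(r$values) & r$values
  rep(ifelse(is_run, r$lengths, 0L), times = r$lengths)
}

# Toy data: the Child and Group columns from the earlier example.
DT <- data.table(
  Child = c("RA", "RR", "AA", "AA", "RR", "RR", "AA", "RA", "RA", "RA"),
  Group = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
)

# TRUE where the child genotype is AA or RR.
DT[, homchild := Child %in% c("AA", "RR")]

# Both helpers return one value per row, so they can be assigned by group.
DT[, childseg := label_runs(homchild), by = Group]
DT[, childlen := run_length(homchild), by = Group]
```

For the real data, by = Group would presumably become by = SNPEFF_GENE_NAME, and the same two lines would be repeated for the mum and dad columns. The result depends on the row order within each group, because rle() only sees consecutive values. Zeros are left as integers here; converting them to '' as in the code above would turn the whole column into character.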
On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> I do not understand the value of using the rle function in your description,
> but the code below appears to produce the table you want.
>
> Note that better support for the data.table package might be found at
> stackexchange as the documentation specifies.
>
> x <- read.table( text=
> "Dad Mum Child Group
> AA RR RA A
> AA RR RR A
> AA AA AA B
> AA AA AA B
> RA AA RR B
> RR AA RR B
> AA AA AA B
> AA AA RA C
> AA AA RA C
> AA RR RA C
> ", header=TRUE, stringsAsFactors=FALSE )
>
> library(data.table)
> DT <- data.table( x )
> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
> DT[ , sumdad := 0L ]
> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
> DT[ , cdad := NULL ]
> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
> DT[ , summum := 0L ]
> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
> DT[ , cmum := NULL ]
> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
> DT[ , sumchild := 0L ]
> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
> DT[ , cchild := NULL ]
>
> > DT
>
>     Dad Mum Child Group sumdad summum sumchild
>  1:  AA  RR    RA     A      2      2        0
>  2:  AA  RR    RR     A      2      2        1
>  3:  AA  AA    AA     B      4      5        5
>  4:  AA  AA    AA     B      4      5        5
>  5:  RA  AA    RR     B      0      5        5
>  6:  RR  AA    RR     B      4      5        5
>  7:  AA  AA    AA     B      4      5        5
>  8:  AA  AA    RA     C      3      3        0
>  9:  AA  AA    RA     C      3      3        0
> 10:  AA  RR    RA     C      3      3        0
>
>
> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>
>> I'm trying to use rle together with data.table and wondering whether that
>> is possible...
>>
>> To make this simple, my ultimate goal is to determine long stretches of
>> 1s, but I want to do this within groups (hence using data.table, as
>> I use the "set key" option). However, I'm not having much luck
>> making this possible.
>>
>> For example, for simplicity's sake, I have the following data:
>>
>> Dad Mum Child Group
>> AA RR RA A
>> AA RR RR A
>> AA AA AA B
>> AA AA AA B
>> RA AA RR B
>> RR AA RR B
>> AA AA AA B
>> AA AA RA C
>> AA AA RA C
>> AA RR RA C
>>
>> And the following code, which I know works:
>>
>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>
>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>
>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>
>> However, I wish to run the above code by Group (the real file is
>> millions of rows long and the groups will be larger, but I just wanted to
>> simplify the example).
>>
>> I did something like this but of course I got an error:
>>
>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>
>> The reason is that I eventually want to have something like this:
>>
>> Dad Mum Child Group sumdad summum sumchild
>> AA RR RA A 2 2 0
>> AA RR RR A 2 2 1
>> AA AA AA B 4 5 5
>> AA AA AA B 4 5 5
>> RA AA RR B 0 5 5
>> RR AA RR B 4 5 5
>> AA AA AA B 4 5 5
>> AA AA RA C 3 3 0
>> AA AA RA C 3 3 0
>> AA RR RA C 3 3 0
>>
>> That is, I would like to have the specific counts next to what I'm
>> consecutively counting per group. So for Group A for dad there are 2
>> AAs, there are two RRs for mum but only 1 AA or RR for the child and
>> that is RR (so the 1 is next to the RR and not the RA).
>>
>> Can this be done?
>>
>> K.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller The ..... ..... Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
> Live: OO#.. Dead: OO#.. Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> ---------------------------------------------------------------------------