[R] difftimes; histogram; memory problems
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Feb 16 04:45:52 CET 2010
Here are two approaches to try:
> # test data
> d1 <- data.frame(x = Sys.Date() + 1:3)
> d2 <- data.frame(x = Sys.Date() - 1:3)
> # 1. you might not have enough memory for this, but it's short
> table(outer(1:3, -(1:3), "-"))
2 3 4 5 6
1 2 3 2 1
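If the full matrix from `outer` does not fit in memory, one possible variation is to tabulate each vector first and combine the counts of its distinct values, since dates spanning a few years take far fewer distinct values than there are rows. The following is a minimal sketch, not part of the original reply: `diff_counts` is a hypothetical helper name, and it assumes the inputs are Date or numeric vectors (character dates would need `as.Date` first).

```r
# Hypothetical helper (not from the original post): count how often each
# pairwise difference x[i] - y[j] occurs, without building the full
# length(x) by length(y) matrix.
# Assumes x and y are Date or numeric vectors.
diff_counts <- function(x, y) {
  tx <- table(as.numeric(x))   # how often each distinct value occurs in x
  ty <- table(as.numeric(y))   # same for y
  ux <- as.numeric(names(tx))  # distinct values of x
  uy <- as.numeric(names(ty))  # distinct values of y
  d <- outer(ux, uy, "-")                     # differences of distinct values only
  w <- outer(as.numeric(tx), as.numeric(ty))  # number of pairs behind each difference
  tapply(as.vector(w), as.vector(d), sum)     # total number of pairs per difference
}

# Same test data as above
d1 <- data.frame(x = Sys.Date() + 1:3)
d2 <- data.frame(x = Sys.Date() - 1:3)
cnt <- diff_counts(d1$x, d2$x)
print(cnt)
```

On the test data this should give the same counts as the `table(outer(...))` call above. The intermediate matrices are only as large as the number of distinct values in each vector, so the memory needed depends on how many different dates occur rather than on the number of rows.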
> # 2. this one performs all the operations outside of R and returns only
> # the result, so it won't need as much memory
>
> library(sqldf)
> sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x")
  d1.x - d2.x count(*)
1           2        1
2           3        2
3           4        3
4           5        2
5           6        1
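Either approach yields counts per difference rather than the raw differences, so `hist` cannot be called on the result directly. A minimal sketch of drawing the histogram from such counts follows. The data frame `res` and its column names `days` and `n` are illustrative assumptions (for instance, the result of the query above with its columns aliased), and the bin edges are made up for the example.

```r
# Example counts in the shape returned by the query above; the column names
# `days` and `n` are illustrative, not what sqldf produces by default.
res <- data.frame(days = 2:6, n = c(1, 2, 3, 2, 1))

# One vertical line per distinct difference, with height equal to its count.
plot(res$days, res$n, type = "h",
     xlab = "difference in days", ylab = "number of pairs")

# To group days into wider bins, sum the counts that fall in each bin.
breaks <- seq(0, 6, by = 2)   # hypothetical bin edges: (0,2], (2,4], (4,6]
binned <- tapply(res$n, cut(res$days, breaks), sum)
print(binned)
```

Summing the counts within each bin gives the same bar heights that `hist` would compute from the full matrix with those breaks, without the matrix ever being built.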
On Mon, Feb 15, 2010 at 9:17 PM, Jonathan <jonsleepy at gmail.com> wrote:
> Let me fix a couple of typos in that email:
>
> Hi All:
>
> Let's say I have two dataframes (Condition1 and Condition2), on the
> order of 12,000 and 16,000 rows respectively, with 1 column each. The
> entries contain dates.
>
> I'd like to calculate, for each possible pair of dates (that is:
> Condition1[1:12,000] and Condition2[1:16,000]), the number of days
> difference between the dates in the pair. The result should be a
> matrix 12,000 by 16,000, which I'll call M. The purpose of building
> such a matrix M is to create a histogram of all the values contained
> within it.
>
> Ex):
> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
>
> First, my instinct is to try and vectorize the operation. I tried
> this by expanding each vector into a matrix of repeated vectors (I'd
> then just subtract the two resultant matrices to get matrix M). I got
> the following error:
>
>> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), byrow=TRUE, ncol=nrow(Condition1))
> Error: cannot allocate vector of size 732.4 Mb
>> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), byrow=FALSE, nrow=nrow(Condition2))
> Error: cannot allocate vector of size 732.4 Mb
>
> Since it seems these matrices are too large, I'm wondering whether
> there's a better way to call a hist command without actually building
> that matrix.
>
> I'd greatly appreciate any ideas!
>
> Best,
> Jonathan