[R] Transforming relational data
mathijsdevaan
mathijsdevaan at gmail.com
Tue Feb 22 11:53:38 CET 2011
Hi Matthew, thanks for your help. Some things are still going wrong.
Consider this (slightly extended) example:
library(data.table)
DT = data.table(read.table(textConnection(" A B C
1 1 a 1999
2 1 b 1999
3 1 c 1999
4 1 d 1999
5 2 c 2001
6 2 d 2001
7 3 a 2004
8 3 b 2004
9 3 d 2004
10 4 c 2001
11 4 d 2001"),head=TRUE,stringsAsFactors=FALSE))
firststep = DT[,cbind(A,expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
firststep
C A Var1 Var2 v
1 1999 1 b a 0.2500000
2 1999 1 c a 0.2500000
3 1999 1 d a 0.2500000
4 1999 1 a b 0.2500000
5 1999 1 c b 0.2500000
6 1999 1 d b 0.2500000
7 1999 1 a c 0.2500000
8 1999 1 b c 0.2500000
9 1999 1 d c 0.2500000
10 1999 1 a d 0.2500000
11 1999 1 b d 0.2500000
12 1999 1 c d 0.2500000
13 2001 2 b a 0.2500000
14 2001 4 b a 0.2500000
15 2001 2 a b 0.2500000
16 2001 4 a b 0.2500000
17 2001 2 b a 0.2500000
18 2001 4 b a 0.2500000
19 2001 2 a b 0.2500000
20 2001 4 a b 0.2500000
21 2004 3 b a 0.3333333
22 2004 3 c a 0.3333333
23 2004 3 a b 0.3333333
24 2004 3 c b 0.3333333
25 2004 3 a c 0.3333333
26 2004 3 b c 0.3333333
According to "firststep", projects 2 and 4 involved individuals a and b, while
in fact c and d were involved. It seems that something is going wrong
in transforming the data.
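For comparison, here is a minimal base R sketch of the pair-building step that groups by project (A) rather than by year (C), so that projects 2 and 4 are not pooled. It also passes stringsAsFactors = FALSE to expand.grid, which otherwise returns factors by default. Whether either of these explains the output above is an assumption, not a confirmed diagnosis, and the name "pairs" is only illustrative. Note that v here is 1/(team size of the project), which differs from the by-year value used above.

```r
# Same data as above: A = project id, B = member, C = year
DT <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  B = c("a", "b", "c", "d", "c", "d", "a", "b", "d", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004, 2001, 2001),
  stringsAsFactors = FALSE
)

# Build all ordered member pairs within each project (split on A, not C)
pairs <- do.call(rbind, lapply(split(DT, DT$A), function(p) {
  # Keep member names as character; expand.grid would return factors by default
  g <- expand.grid(Var1 = p$B, Var2 = p$B, stringsAsFactors = FALSE)
  g <- g[g$Var1 != g$Var2, ]          # drop self-pairs such as (a, a)
  data.frame(A = p$A[1], C = p$C[1], g, v = 1 / nrow(p),
             stringsAsFactors = FALSE)
}))
rownames(pairs) <- NULL

# Inspect the two projects that looked wrong
pairs[pairs$A %in% c(2, 4), ]
```

In this sketch the rows for projects 2 and 4 contain only members c and d.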
Then going to the final result, a list is generated of years and sums of v,
rather than a list of projects and sums of v. Probably I haven't been clear
enough: I want to produce a list of all projects and the familiarity of all
project members involved right before the start of the project.
Example
project_id familiarity
4 0.25
Members c and d were jointly involved in three projects: 1, 2 and 4. Project 4 took
place in 2001, so only project 1 (1999) took place before it; project 2
took place in the same year and is therefore not included. The average
familiarity between the members in project 1 was 1/4, so:
project_id familiarity
4 0.25
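The calculation in this example can be sketched in base R as follows. This is one possible reading of the definition, stated as an assumption: for each project, take every unordered pair of its members and add 1/(team size) for each project from a strictly earlier year in which both worked. The helper name prior_familiarity and the table "proj" are hypothetical. Counting ordered pairs instead, as expand.grid(B,B) does, would double every value.

```r
# Same data as above: A = project id, B = member, C = year
DT <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  B = c("a", "b", "c", "d", "c", "d", "a", "b", "d", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004, 2001, 2001),
  stringsAsFactors = FALSE
)

# One row per project, with its year and team size
proj <- unique(DT[, c("A", "C")])
proj$size <- as.vector(table(DT$A)[as.character(proj$A)])

# Hypothetical helper: prior familiarity of one project under the
# assumed definition (unordered pairs, strictly earlier years only)
prior_familiarity <- function(id) {
  members <- DT$B[DT$A == id]
  year <- proj$C[proj$A == id]
  earlier <- proj[proj$C < year, ]    # same-year projects are excluded
  total <- 0
  for (k in seq_len(nrow(earlier))) {
    # Number of current members who also worked on this earlier project
    shared <- sum(members %in% DT$B[DT$A == earlier$A[k]])
    # choose(shared, 2) unordered pairs, each weighted by 1 / team size
    total <- total + choose(shared, 2) / earlier$size[k]
  }
  total
}

result <- data.frame(project_id = proj$A,
                     familiarity = sapply(proj$A, prior_familiarity))
result
```

For project 4 this gives 0.25, as in the example: the only earlier shared project is project 1, which had four members.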
Thanks!
Matthew Dowle wrote:
>
>
> Thanks for the attempt and required output. How about this?
>
> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
> setkey(firststep,Var1,Var2,C)
> firststep = firststep[,transform(.SD,cv=cumsum(v)),by=list(Var1,Var2)]
> setkey(firststep,Var1,Var2,C)
> DT[, {x=data.table(expand.grid(B,B),C[1]-1L)
> firststep[x,roll=TRUE,nomatch=0][,sum(cv)] # prior familiarity
> },by=C]
> C V1
> [1,] 1999 0.0
> [2,] 2001 0.5
> [3,] 2004 2.5
>
> I think you may have said you have large data. If so, this
> method should be fast. Please let us know how you get on.
>
> HTH
> Matthew
>
>
>
> On Thu, 17 Feb 2011 23:07:19 -0800, mathijsdevaan wrote:
>
>> OK, for the last step I have tried this (among other things):
>> library(data.table)
>> DT = data.table(read.table(textConnection(" A B C
>> 1 1 a 1999
>> 2 1 b 1999
>> 3 1 c 1999
>> 4 1 d 1999
>> 5 2 c 2001
>> 6 2 d 2001
>> 7 3 a 2004
>> 8 3 b 2004
>> 9 3 d 2004"),head=TRUE,stringsAsFactors=FALSE))
>>
>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>> setkey(firststep,Var1,Var2)
>> list1<-firststep[J(expand.grid(DT$B,DT$B),v=1/length(DT$B)),nomatch=0][,sum(v)]
>> list1
>> #27
>>
>> What I would like to get:
>> list
>> 1 0
>> 2 0.5
>> 3 2.5
>>
>> Thanks!
>