[R] Joining two datasets - recursive procedure?
Luca Meyer
lucam1968 at gmail.com
Wed Mar 18 07:17:32 CET 2015
Hello,
I am facing quite a challenging task (at least for me) and I was wondering
if someone could advise how R could help me speed it up.
I am dealing with a dataset with 3 discrete variables and one continuous
variable. The discrete variables are:
V1: 8 modalities
V2: 13 modalities
V3: 13 modalities
The continuous variable V4 is a decimal number always greater than zero in
the marginals of each of the 3 variables but it is sometimes equal to zero
(and sometimes negative) in the joint tables.
I have got 2 files:
=> one with distribution of all possible combinations of V1xV2 (some of
which are zero or negative) and
=> one with the marginal distribution of V3.
I am trying to build the long and narrow dataset V1xV2xV3 in such a way
that each V1xV2 cell does not get modified and V3 fits as closely as
possible to its marginal distribution. Does it make sense?
To be even more specific, my 2 input files look like the following.
FILE 1
V1,V2,V4
A, A, 24.251
A, B, 1.065
(...)
B, C, 0.294
B, D, 2.731
(...)
H, L, 0.345
H, M, 0.000
FILE 2
V3, V4
A, 1.575
B, 4.294
C, 10.044
(...)
L, 5.123
M, 3.334
What I need to achieve is a file such as the following
FILE 3
V1, V2, V3, V4
A, A, A, ???
A, A, B, ???
(...)
D, D, E, ???
D, D, F, ???
(...)
H, M, L, ???
H, M, M, ???
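As a starting point, the empty FILE 3 layout can be generated in base R. This is only a sketch: the level labels (A to H for V1, A to M for V2 and V3) are assumptions inferred from the excerpts above, and the names v1_levels, v2_levels, v3_levels and file3 are placeholders.

```r
# Assumed level labels: 8 for V1, 13 each for V2 and V3.
v1_levels <- LETTERS[1:8]
v2_levels <- LETTERS[1:13]
v3_levels <- LETTERS[1:13]

# expand.grid() varies its first argument fastest, so V3 is listed first
# and the columns are put back in V1, V2, V3 order afterwards.
file3 <- expand.grid(V3 = v3_levels, V2 = v2_levels, V1 = v1_levels,
                     stringsAsFactors = FALSE)[, c("V1", "V2", "V3")]

# V4 is still unknown at this stage.
file3$V4 <- NA_real_

nrow(file3)     # 8 * 13 * 13 rows
head(file3, 2)  # first two rows: A, A, A and A, A, B
```

Each V1xV2 pair appears 13 times, once per V3 level, which is the long and narrow shape described above.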
Please note that FILE 3 needs to be such that if I aggregate on V1+V2 I
recover exactly FILE 1, and that if I aggregate on V3 I recover a file
as close as possible to FILE 2 (ideally the same file).
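One simple way to satisfy both constraints, offered as a sketch rather than the only answer, is to split every V1xV2 cell across V3 in proportion to the V3 marginal (that is, to assume V3 is independent of V1xV2). Because the shares sum to 1, aggregating on V1+V2 returns FILE 1 exactly; aggregating on V3 returns FILE 2 exactly only when both files have the same grand total, and otherwise a rescaled copy of it. The data below are made up for illustration, and file1, file2, share and file3 are placeholder names.

```r
set.seed(1)

# Made-up stand-ins for FILE 1 and FILE 2 (not the real data).
file1 <- expand.grid(V2 = LETTERS[1:13], V1 = LETTERS[1:8],
                     stringsAsFactors = FALSE)[, c("V1", "V2")]
file1$V4 <- round(rnorm(nrow(file1), mean = 2, sd = 2), 3)  # some cells <= 0
w <- runif(13)
file2 <- data.frame(V3 = LETTERS[1:13],
                    V4 = sum(file1$V4) * w / sum(w),  # same grand total as file1
                    stringsAsFactors = FALSE)

# Share of the total held by each V3 level.
share <- data.frame(V3 = file2$V3, share = file2$V4 / sum(file2$V4),
                    stringsAsFactors = FALSE)

# merge() with by = NULL returns the Cartesian product (a cross join).
file3 <- merge(file1, share, by = NULL)
file3$V4 <- file3$V4 * file3$share
file3 <- file3[order(file3$V1, file3$V2, file3$V3), c("V1", "V2", "V3", "V4")]

# Check both aggregations against the inputs.
agg12 <- merge(aggregate(V4 ~ V1 + V2, data = file3, FUN = sum),
               file1, by = c("V1", "V2"))
agg3  <- merge(aggregate(V4 ~ V3, data = file3, FUN = sum),
               file2, by = "V3")
all.equal(agg12$V4.x, agg12$V4.y)  # V1xV2 cells recovered
all.equal(agg3$V4.x, agg3$V4.y)    # V3 marginal recovered
```

Zero cells in FILE 1 stay zero for every V3 level and negative cells stay negative, since each cell is only multiplied by positive shares. If some other dependence between V3 and V1xV2 is wanted, a different allocation rule would be needed.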
Can anyone suggest how I could do that with R?
Thank you very much indeed for any assistance you are able to provide.
Kind regards,
Luca