Luca Meyer lucam1968 at gmail.com
Wed Mar 18 07:17:32 CET 2015


I am facing a task that is quite challenging (at least for me), and I was
wondering if someone could advise how R could help me speed it up.

I am dealing with a dataset with 3 discrete variables and one continuous
variable. The discrete variables are:

V1: 8 modalities
V2: 13 modalities
V3: 13 modalities

The continuous variable V4 is a decimal number always greater than zero in
the marginals of each of the 3 variables but it is sometimes equal to zero
(and sometimes negative) in the joint tables.

I have 2 files:

=> one with the distribution of all possible combinations of V1xV2 (some of
which are zero or negative) and
=> one with the marginal distribution of V3.

I am trying to build the long and narrow dataset V1xV2xV3 in such a way
that each V1xV2 cell total is not modified and V3 fits as closely as
possible to its marginal distribution. Does that make sense?

To be more specific, my 2 input files look like the following.

FILE 1
V1, V2, V4
A, A, 24.251
A, B, 1.065
B, C, 0.294
B, D, 2.731
H, L, 0.345
H, M, 0.000

FILE 2
V3, V4
A, 1.575
B, 4.294
C, 10.044
L, 5.123
M, 3.334

What I need to produce is a file (FILE 3) such as the following:

V1, V2, V3, V4
A, A, A, ???
A, A, B, ???
D, D, E, ???
D, D, F, ???
H, M, L, ???
H, M, M, ???

Please notice that FILE 3 needs to be such that if I aggregate on V1+V2 I
recover exactly FILE 1 (the V1xV2 table) and that if I aggregate on V3 I
recover a file as close as possible to FILE 2 (the V3 marginal), ideally
the same file.
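
To make the two constraints concrete, here is a minimal sketch of one naive
starting point, not a solution I am confident in: split every V1xV2 cell
across V3 in proportion to the V3 marginal shares. The object names (f1, f2,
f3) and the "share" column are my own illustrative assumptions, and the
values are only the excerpts shown above. This split reproduces the V1xV2
table exactly, but it matches the V3 marginal only up to a scale factor
unless both files have the same grand total.

```r
# Toy excerpts of the two input files (values copied from the examples above;
# object and column names are illustrative assumptions).
f1 <- data.frame(V1 = c("A", "A", "B", "B", "H", "H"),
                 V2 = c("A", "B", "C", "D", "L", "M"),
                 V4 = c(24.251, 1.065, 0.294, 2.731, 0.345, 0.000))
f2 <- data.frame(V3 = c("A", "B", "C", "L", "M"),
                 V4 = c(1.575, 4.294, 10.044, 5.123, 3.334))

# Share of each V3 modality in the V3 marginal (shares sum to 1)
f2$share <- f2$V4 / sum(f2$V4)

# Cartesian product: every V1xV2 cell paired with every V3 modality
f3 <- merge(f1, f2[, c("V3", "share")], by = NULL)

# Proportional split of each V1xV2 cell across V3
f3$V4 <- f3$V4 * f3$share
f3 <- f3[order(f3$V1, f3$V2, f3$V3), c("V1", "V2", "V3", "V4")]

# Constraint 1: aggregating on V1+V2 should give back the V1xV2 table
back1 <- aggregate(V4 ~ V1 + V2, data = f3, FUN = sum)
chk1  <- merge(f1, back1, by = c("V1", "V2"), suffixes = c(".in", ".out"))
print(all.equal(chk1$V4.in, chk1$V4.out))

# Constraint 2: aggregating on V3, to compare against the V3 marginal
back2 <- aggregate(V4 ~ V3, data = f3, FUN = sum)
chk2  <- merge(f2[, c("V3", "V4")], back2, by = "V3",
               suffixes = c(".target", ".fitted"))
print(chk2)
```

Because every cell is split with the same shares, zero and negative V1xV2
cells are carried through unchanged in total, but this does not use any
information about how V3 might depend on V1 or V2.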

Can anyone suggest how I could do that with R?

Thank you very much indeed for any assistance you are able to provide.

Kind regards,


