[R] rpart and survey weights
Erofili Grapsa
erwfili at gmail.com
Wed Jan 27 12:56:28 CET 2016
Dear R users,
I have a question about rpart and survey weights. The introduction to rpart
document says "Weights are not yet supported, and will be ignored if
present". However, they do seem to be used, because the results differ
with and without weights. Can weights now be used and, if so, what kind of
weights? Can survey weights be used safely? These are my results with
weights:
Classification tree:
rpart(formula = cl2m ~ age + day + Employed + media + geo + soclass +
    persinc + hhsizeM + nfadult + nmadult + childshM, data = tum,
    weights = tum$pweight, method = "class", control = rpart.control(xval = 10,
        minbucket = 2, cp = 0))
Variables actually used in tree construction:
[1] age day Employed geo hhsizeM media soclass
Root node error: 11950440/16768 = 712.69
n= 16768
         CP nsplit rel error  xerror       xstd
1 0.1980770      0   1.00000 1.00000 0.00016997
2 0.1405072      1   0.80192 0.80192 0.00017852
3 0.0300841      2   0.66142 0.66142 0.00017714
4 0.0053155      3   0.63133 0.63133 0.00017604
5 0.0025728      4   0.62602 0.62819 0.00017591
6 0.0020625      6   0.62087 0.62326 0.00017570
7 0.0020000      9   0.61468 0.62233 0.00017566
and without weights:
Classification tree:
rpart(formula = cl2m ~ age + day + Employed + media + geo + soclass +
    persinc + hhsizeM + nfadult + nmadult + childshM, data = tum,
    method = "class", control = rpart.control(xval = 10, minbucket = 2,
        cp = 0))
Variables actually used in tree construction:
[1] age day Employed media
Root node error: 10954/16768 = 0.65327
n= 16768
        CP nsplit rel error  xerror      xstd
1 0.192624      0   1.00000 1.00000 0.0056261
2 0.157020      1   0.80738 0.80738 0.0059018
3 0.030856      2   0.65036 0.65218 0.0058457
4 0.012872      3   0.61950 0.62050 0.0058038
5 0.002000      4   0.60663 0.60809 0.0057845
Does the root node error make sense when using survey weights? How can I
interpret it?
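To make the question concrete, here is a minimal sketch of how the pieces behind the "Root node error" line can be inspected. It is only an illustration under stated assumptions: it uses the kyphosis data that ships with rpart and invented weights `w` as stand-ins, not my survey data `tum` or `pweight`. As far as I can tell from the rpart.object help page, `frame$n` is the number of observations in a node, `frame$wt` is the sum of case weights, and `frame$dev` is the node deviance. My reading is that printcp divides the root `dev` by the unweighted `n`, which would explain a value such as 712.69 that is not a proportion, whereas dividing by `wt` gives a weighted misclassification proportion.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart. The weights are invented
# purely for illustration and are NOT real survey weights.
set.seed(1)
w <- runif(nrow(kyphosis), min = 100, max = 1000)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             weights = w, method = "class")

root <- fit$frame[1, ]   # first row of the frame is the root node

n_obs  <- root$n         # unweighted number of observations
sum_wt <- root$wt        # sum of the case weights
dev    <- root$dev       # weighted misclassification loss at the root

# Ratio in the style of the printed "Root node error" (dev / n):
# with weights this is a weighted loss per observation, not a proportion.
err_per_obs <- dev / n_obs

# Weighted misclassification proportion at the root (dev / sum of weights).
err_weighted <- dev / sum_wt

# Direct check: one minus the weighted share of the largest class
# (assumes default priors and the default 0/1 loss).
wt_by_class <- tapply(w, kyphosis$Kyphosis, sum)
manual <- 1 - max(wt_by_class) / sum(wt_by_class)

print(c(err_per_obs = err_per_obs, err_weighted = err_weighted,
        manual = manual))
```

If this reading is right, the same two ratios computed on the weighted fit above would be 11950440/16768 for the printed value and 11950440 divided by the sum of `pweight` for the weighted proportion.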
Regards
Erofili