[R] Random Forest - Strata
Tim Howard
tghoward at gw.dec.state.ny.us
Wed Jul 21 13:11:24 CEST 2010
Coll,
An alternative approach is to do the subsetting yourself before passing the data to RF, treating each site as an external validation group, as follows:
- extract Site A and build an RF model (Model 1) on Sites B and C.
- validate this model by running predict() on Site A with Model 1; use ROCR or other evaluation metrics to assess how well Model 1 performs.
- extract Site B and build an RF model (Model 2) on Sites A and C.
- validate this model by predicting presence in Site B with Model 2.
- continue through all your sites.
This is a form of 'leave-one-out' cross-validation (here one whole site is left out at a time) and is used in some fields for model validation. Your final accuracy estimates for the model could be based on the averages of the values obtained from each model.
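The recipe above can be sketched in R roughly as follows. This is a minimal sketch, not Tim's own code: it assumes the randomForest package is installed, uses Coll's ten-row example from the quoted message as data, and, to keep randomForest as the only dependency, replaces ROCR with a small hand-written helper (the hypothetical `auc()`) that computes AUC as the fraction of Yes/No pairs ranked correctly.

```r
# Leave-one-site-out validation for randomForest (illustrative sketch).
# Assumption: the randomForest package is installed.
library(randomForest)

# Coll's toy data from the quoted message.
dat <- data.frame(
  Site     = factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C")),
  Presence = factor(c("Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "No", "No"),
                    levels = c("No", "Yes")),
  Length   = c(3.50, 3.90, 3.60, 2.60, 2.20, 2.60, 3.00, 3.50, 3.40, 3.10),
  Sulphur  = c(19.42, 51.09, 26.75, 9.71, 9.77, 8.60, 35.59, 16.07, 49.96, 35.35)
)

# Hypothetical helper standing in for ROCR: AUC as the probability that a
# random "Yes" case scores higher than a random "No" case (ties count 1/2).
auc <- function(score, label) {
  pos <- score[label == "Yes"]
  neg <- score[label == "No"]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

set.seed(42)  # randomForest is stochastic; fix the seed for repeatability

# One model per held-out site: train on the other sites, predict the held-out one.
results <- do.call(rbind, lapply(levels(dat$Site), function(s) {
  train <- dat[dat$Site != s, ]
  test  <- dat[dat$Site == s, ]
  # Site is deliberately not a predictor: the held-out site is a level
  # the model never sees during training.
  fit  <- randomForest(Presence ~ Length + Sulphur, data = train, ntree = 500)
  prob <- predict(fit, newdata = test, type = "prob")[, "Yes"]
  pred <- predict(fit, newdata = test)
  data.frame(site     = s,
             accuracy = mean(pred == test$Presence),
             auc      = auc(prob, test$Presence))
}))

print(results)

# Overall estimate: average the per-site values.
overall <- colMeans(results[, c("accuracy", "auc")])
print(overall)
```

With only three or four rows per held-out site the per-site numbers are very noisy; the sketch is meant to show the structure of the loop, not to produce meaningful estimates on this toy data.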
Hope that helps.
Tim
------------------------------
Date: Tue, 20 Jul 2010 08:48:04 -0700 (PDT)
From: Coll <gbcoll2 at gmail.com>
To: r-help at r-project.org
Subject: [R] Random Forest - Strata
Message-ID: <1279640884553-2295731.post at n4.nabble.com>
Hi all,
I have been struggling to get the "strata" argument of randomForest to work for this.
Can I get randomForest, for each of its TREES, to use ALL samples from some
strata to build the tree, while leaving the other strata TOTALLY untouched as OOB?
e.g. for the data below, how can I tell RF to:
- for tree 1 in the forest, use only Sites A and B to build the tree,
  while using the WHOLE of Site C for the OOB error rate;
- for tree 2, use only Sites A and C to build the tree, while using the whole of
  Site B for OOB;
- for tree 3, use Sites B and C, with A as OOB...?
My command does not work, as it uses some samples from all of the sites:
rforest.obj <- randomForest(Presence.f ~ ., data = dataset.subset, strata = site.factor)
while setting the corresponding "sampsize" argument seems to only screen
out that site from the building of every tree...
Site  Presence  Length  Sulphur
A     Yes       3.50    19.42
A     No        3.90    51.09
A     No        3.60    26.75
B     Yes       2.60     9.71
B     No        2.20     9.77
B     No        2.60     8.60
B     No        3.00    35.59
C     Yes       3.50    16.07
C     No        3.40    49.96
C     No        3.10    35.35
Any ideas or comments are welcome.
Thanks in advance.
Coll
--
View this message in context: http://r.789695.n4.nabble.com/Random-Forest-Strata-tp2295731p2295731.html
Sent from the R help mailing list archive at Nabble.com.