[R] Problem with subset() function?
Steven McKinney
smckinney at bccrc.ca
Wed Jan 21 00:02:14 CET 2009
Hi all,
Can anyone explain why the following use of
the subset() function produces a different
outcome than the use of the "[" extractor?
The subset() call in
    density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
appears to me, from the documentation, to be equivalent to
    density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
(modulo exclusion of NAs), but the former yields an
error from density.default() (shown below).
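To make the "modulo exclusion of NAs" caveat concrete, here is a minimal sketch on a hypothetical three-row data frame (the name `toy` and its values are made up for illustration and are not the `mydf` from this post). It assumes the documented behaviour that subset() treats a missing condition as false, while logical indexing with "[" returns an all-NA row for it.

```r
# Hypothetical toy data (not from the original post): one missing height
toy <- data.frame(ht  = c(160, NA, 140),
                  wt  = c(140, 140, 140),
                  age = c(30, 40, 50))

# The row condition evaluates to TRUE, NA, FALSE
keep <- toy$ht >= 150 & toy$wt <= 150

# subset() treats the NA condition as FALSE, so only the first row survives
sub_rows <- subset(toy, ht >= 150 & wt <= 150)

# "[" keeps a row for the NA condition, filled with NA in every column
br_rows <- toy[keep, ]

nrow(sub_rows)
nrow(br_rows)
```

So even when both forms return the same kind of object, their row counts can differ whenever the condition evaluates to NA.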
Is this a bug in the subset() machinery? Or is it
a gap in the documentation for subset() or for
density()?
I'm seeing issues such as this with newcomers to R
who initially seem to prefer using subset() instead
of the bracket extractor. As things stand, the two approaches
are clearly not interchangeable. Should the code be patched
so that they are, or the documentation amended to show
when use of subset() is not appropriate?
> ### Bug in subset()?
> set.seed(123)
> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
+                    wt = 150 + 10 * rnorm(100),
+                    age = sample(20:60, size = 100, replace = TRUE)
+                    )
> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = c(age))) :
  argument 'x' must be numeric
> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
Call:
density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])
Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.); Bandwidth 'bw' = 5.816
       x                y
 Min.   : 4.553   Min.   :3.781e-05
 1st Qu.:22.776   1st Qu.:3.108e-03
 Median :41.000   Median :1.775e-02
 Mean   :41.000   Mean   :1.370e-02
 3rd Qu.:59.224   3rd Qu.:2.128e-02
 Max.   :77.447   Max.   :2.665e-02
> sessionInfo()
R version 2.8.0 Patched (2008-11-06 r46845)
powerpc-apple-darwin9.5.0
locale:
C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
loaded via a namespace (and not attached):
[1] Matrix_0.999375-16 grid_2.8.0 lattice_0.17-15 lme4_0.99875-9
[5] nlme_3.1-89
>
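One possible way to narrow this down is to compare the class of what each call returns. The sketch below is not a confirmed diagnosis. It assumes that subset() on a data frame accepts a `drop` argument defaulting to FALSE (so that `select = c(age)` yields a one-column data frame), whereas "[" with a single column name drops to a plain vector. The names `by_subset`, `by_bracket`, `v1`, `v2` and `v3` are introduced here for illustration only.

```r
set.seed(123)
mydf <- data.frame(ht  = 150 + 10 * rnorm(100),
                   wt  = 150 + 10 * rnorm(100),
                   age = sample(20:60, size = 100, replace = TRUE))

by_subset  <- subset(mydf, ht >= 150 & wt <= 150, select = c(age))
by_bracket <- mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"]

class(by_subset)   # "data.frame": a one-column data frame, not a vector
class(by_bracket)  # "integer": "[" dropped the single column to a vector

# Three candidate ways to get a plain vector out of subset()
# (assumption: drop = TRUE is honoured by the data frame method)
v1 <- subset(mydf, ht >= 150 & wt <= 150, select = c(age), drop = TRUE)
v2 <- subset(mydf, ht >= 150 & wt <= 150)$age
v3 <- by_subset[["age"]]

# density() should now accept the result, since it is numeric
d <- density(v1)
```

If that assumption holds, the difference is in the type of the returned object rather than in which rows are selected, which would point toward a documentation note rather than a patch.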
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney +at+ bccrc +dot+ ca
tel: 604-675-8000 x7561
BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada