[R] trouble reading in datasets
Peter Dalgaard BSA
p.dalgaard at biostat.ku.dk
Tue Oct 26 00:13:02 CEST 1999
Clayton Springer <csprin at brandybuck.ca.sandia.gov> writes:
> Dear All,
>
> I was trying to follow some of the examples in Venables and Ripley "Modern applied ... with S-plus"
> I have downloaded a copy of the iris data set and loaded into R. :
>
> however I cannot use the apply command (from p47):
>
> > apply (iris, 2 ,mean)
> Error in sum(..., na.rm = na.rm) : invalid "mode" of argument
>
> > apply (iris, c(2) ,mean)
> Error in sum(..., na.rm = na.rm) : invalid "mode" of argument
>
> also
>
> > apply (iris, c(2) ,sum)
> Error in sum(..., na.rm = na.rm) : invalid "mode" of argument
>
> So ... any suggestions as to what have I not done here?
>
> Some commands that show that I did load the dataset.
> > iris
> V1 V2 V3 V4 V5
> 1 5.1 3.5 1.4 0.2 Iris-setosa
>
Well, it's one of the built-in datasets, so you could just have typed
data(iris)
However, that wouldn't have helped you, except by giving you a clue that
the problem does not lie in reading the data in. apply() works on matrices
and iris is a data frame, so R first tries to convert it to a matrix. In
doing so, it must convert all elements to the same type, and since the
last column is a factor, the whole thing becomes character:
> as.matrix(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 "5.1"        "3.5"       "1.4"        "0.2"       "setosa"
2 "4.9"        "3.0"       "1.4"        "0.2"       "setosa"
3 "4.7"        "3.2"       "1.3"        "0.2"       "setosa"
4 "4.6"        "3.1"       "1.5"        "0.2"       "setosa"
....
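One way to spot this before calling apply() is to look at the class of each column and at the mode of the coerced matrix. A minimal sketch, assuming the built-in copy of iris (the variable names col_classes, m_all and m_num are made up for illustration):

```r
data(iris)

# Class of each column: four numeric columns and one factor
col_classes <- sapply(iris, class)
print(col_classes)

# Coercing the whole data frame yields a character matrix
m_all <- as.matrix(iris)
print(mode(m_all))

# Coercing only the numeric columns keeps the matrix numeric
m_num <- as.matrix(iris[, -5])
print(mode(m_num))
```

If mode() reports "character", any arithmetic summary passed to apply() will be working on strings.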
Now, here's a difference between R and S(-plus 3.4): S will happily
let you take the mean of a character variable as in
> mean("1")
[1] 1
whereas R will not
> mean("1")
Error in sum(..., na.rm = na.rm) : invalid "mode" of argument
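If the strings really do hold numbers, an explicit conversion sidesteps the difference. A small sketch (x and y are illustrative names; note that the exact failure for mean("1") may differ between R versions, and newer releases return NA with a warning rather than stopping):

```r
x <- "1"

# The value is a character string, not a number
print(is.numeric(x))

# Convert explicitly, then take the mean
y <- as.numeric(x)
print(mean(y))
```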
If you get rid of the 5th column, the problem disappears:
> apply (iris[,-5], 2 ,mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
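Rather than hard-coding the column number, one could pick out the numeric columns by test. This is only a sketch of one alternative, not the only way to do it; is_num, means_sapply and means_apply are names invented here:

```r
data(iris)

# Logical vector marking which columns are numeric
is_num <- sapply(iris, is.numeric)

# Column means taken directly over the data frame's numeric columns
means_sapply <- sapply(iris[is_num], mean)

# The same thing via apply() on the numeric part only
means_apply <- apply(iris[, is_num], 2, mean)

print(means_sapply)
```

The sapply() form never builds a matrix, so the factor column cannot drag the other columns to character.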
Actually, that isn't the end of the story, because the iris data that
comes with S is stored as a 3-way array, rather than a data frame. The
way to convert from one to the other is -ahem- left as an exercise....
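For the impatient, here is one possible sketch of the data frame to array direction. It assumes a 50 x 4 x 3 layout (observations by measurements by species) and that each species has exactly 50 rows, as in the built-in iris; the names num, sp and arr are illustrative. R also ships a 3-way version as iris3, which can be used for comparison.

```r
data(iris)

# Numeric part as a 150 x 4 matrix
num <- as.matrix(iris[, 1:4])

# Species labels, one slice of the array per species
sp <- levels(iris$Species)

# Empty 50 x 4 x 3 array: observation x measurement x species
arr <- array(NA_real_, dim = c(50, 4, 3),
             dimnames = list(NULL, colnames(num), sp))

# Fill each slice with the rows belonging to that species
for (k in seq_along(sp)) {
  arr[, , k] <- num[iris$Species == sp[k], ]
}

print(dim(arr))
```

With the array form, apply(arr, c(2, 3), mean) gives the mean of each measurement within each species.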
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907