[R] applying math/stat functions to rows in data frame
Marc Schwartz
marc_schwartz at comcast.net
Sat Sep 15 18:32:11 CEST 2007
On Sat, 2007-09-15 at 09:02 -0700, Gerard Smits wrote:
> Hi All,
>
> There are a variety of functions that can be applied to a variable
> (column) in a data frame: mean, min, max, sd, range, IQR, etc.
>
> I am aware of only two that work on the rows, using q1-q3 as example
> variables:
>
> rowMeans(cbind(q1,q2,q3),na.rm=T) #mean of multiple variables
> rowSums (cbind(q1,q2,q3),na.rm=T) #sum of multiple variables
>
> Can the standard column functions (listed in the first sentence) be
> applied to rows, with the use of correct indexes to reference the
> columns of interest? Or, must these summary functions be programmed
> separately to work on a row?
>
> Thanks,
>
> Gerard
The answer is: it depends.
If the row can be coerced to a numeric vector, then yes. This presumes
that the data frame contains a single data type or the subset of columns
you need contains a single data type.
If the row contains multiple data types, then the row becomes a single
row data frame or a list and you would have to consider other possible
approaches.
For example:
Taking the first row of the 'iris' dataset yields a single-row data
frame:
> str(iris[1, ])
'data.frame': 1 obs. of 5 variables:
$ Sepal.Length: num 5.1
$ Sepal.Width : num 3.5
$ Petal.Length: num 1.4
$ Petal.Width : num 0.2
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1
or if you set 'drop = TRUE', a list:
> str(iris[1, , drop = TRUE])
List of 5
$ Sepal.Length: num 5.1
$ Sepal.Width : num 3.5
$ Petal.Length: num 1.4
$ Petal.Width : num 0.2
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1
If, however, you remove the last column, Species, which is a factor, you
can coerce the remaining columns to a numeric matrix:
> str(as.matrix(iris[, -5]))
num [1:150, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
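Once the data are in a numeric matrix, the column-oriented functions from
the question can be run across rows with apply() and MARGIN = 1. The
following is only a sketch of that idea; the object names (m, row_sd,
row_max, row_iqr, row_range) are made up for illustration:

```r
# Numeric part of iris as a matrix (drop the factor column Species)
m <- as.matrix(iris[, -5])

# MARGIN = 1 applies the function to each row in turn
row_sd  <- apply(m, 1, sd)
row_max <- apply(m, 1, max)
row_iqr <- apply(m, 1, IQR)

# range() returns two values per row, so apply() gives a 2 x 150 matrix;
# t() turns it back into one row per observation
row_range <- t(apply(m, 1, range))

head(row_sd)
head(row_range)
```

Note that apply() loops over the rows in R code, so for means and sums
rowMeans() and rowSums() will generally be faster on large data.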
Some functions will do this coercion internally, but the result must
still be numeric. For example, with the factor column included:
> rowSums(iris)
Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric
However:
> head(rowSums(iris[, -5]))
[1] 10.2 9.5 9.4 9.4 10.2 11.4
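Rather than hard-coding the position of the non-numeric column, the
numeric columns can be picked out programmatically. A minimal sketch,
assuming you want every numeric column; num_cols and row_totals are
illustrative names only:

```r
# Logical vector: TRUE for each column of iris that is numeric
num_cols <- sapply(iris, is.numeric)

# Keep only those columns, then sum across each row
row_totals <- rowSums(iris[, num_cols], na.rm = TRUE)

head(row_totals)
```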
HTH,
Marc Schwartz