[R] correct function formation in R
Duncan Murdoch
murdoch.duncan at gmail.com
Tue Nov 20 19:35:50 CET 2012
On 20/11/2012 12:39 PM, Omphalodes Verna wrote:
> Dear list!
>
> I have a question about 'correct function formation'. Which function (fun1 or fun2; see below) is written more correctly: the one that uses 'structure' to build the output, or the one that creates an empty 'data.frame' and then fills it in? (fun1 and fun2 are just for illustration.)
>
> Thanks a lot, OV
>
> code:
> input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
> fun1 <- function(x) {
>   ID <- NULL; minimum <- NULL; maximum <- NULL
>   for(i in seq_along(names(x))) {
>     ID[i] <- names(x)[i]
>     minimum[i] <- min(x[, names(x)[i]])
>     maximum[i] <- max(x[, names(x)[i]])
>   }
>   output <- structure(list(ID, minimum, maximum), row.names = seq_along(names(x)), .Names = c("ID", "minimum", "maximum"), class = "data.frame")
>   return(output)
> }
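fun1 above builds its result by hand with structure(), so it relies on
the internal implementation of the data.frame class (a list carrying
names, row.names and class attributes). That's really unlikely to
change, but you still shouldn't rely on it.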
> fun2 <- function(x) {
>   output <- data.frame(ID = character(), minimum = numeric(), maximum = numeric(), stringsAsFactors = FALSE)
>   for(i in seq_along(names(x))) {
>     output[i, "ID"] <- names(x)[i]
>     output[i, "minimum"] <- min(x[, names(x)[i]])
>     output[i, "maximum"] <- max(x[, names(x)[i]])
>   }
>   return(output)
> }
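This one is going to be really slow, because it does so much indexing of
the output data frame: every assignment goes through the data frame
indexing methods, and the data frame also grows by one row per iteration.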
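
To get a feel for that cost, here is a rough timing sketch. The functions slow() and fast() and the size n are made up for illustration and are not from the original post. slow() assigns into data frame cells one at a time, as fun2 does; fast() fills a plain vector and builds the data frame once. Timings vary by machine and R version, so none are quoted here.

```r
# Hypothetical helpers for illustration only; not from the original post.
n <- 2000

# Pattern used in fun2: assign into data frame cells one at a time.
slow <- function(n) {
  out <- data.frame(value = numeric(n))
  for (i in seq_len(n)) out[i, "value"] <- i
  out
}

# Alternative: fill a plain vector, then build the data frame once.
fast <- function(n) {
  value <- numeric(n)
  for (i in seq_len(n)) value[i] <- i
  data.frame(value = value)
}

t_slow <- system.time(res_slow <- slow(n))
t_fast <- system.time(res_fast <- fast(n))

# Elapsed seconds for each approach.
cat("slow:", t_slow[["elapsed"]], " fast:", t_fast[["elapsed"]], "\n")
```

Both functions produce the same values; only the way the result is assembled differs.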
I would combine the approaches: assign to local variables in the loop
the way fun1 does, then construct a data frame at the end with
data.frame(). That is,
output <- data.frame(ID, minimum, maximum)
return(output)
One other change: don't initialize the local variables to NULL,
initialize them to their final size, e.g.
ID <- character(ncol(x))
minimum <- numeric(ncol(x))
maximum <- numeric(ncol(x))
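
Put together, the combined approach might look like the sketch below. The name fun3 is hypothetical and not from the original post. stringsAsFactors = FALSE is an addition here, so that ID stays a character vector as in fun1 and fun2; before R 4.0.0, data.frame() converted character columns to factors by default.

```r
# fun3 is a hypothetical name; it does not appear in the original post.
fun3 <- function(x) {
  n <- ncol(x)
  # Preallocate each result vector at its final length.
  ID <- character(n)
  minimum <- numeric(n)
  maximum <- numeric(n)
  for (i in seq_len(n)) {
    ID[i] <- names(x)[i]
    minimum[i] <- min(x[[i]])
    maximum[i] <- max(x[[i]])
  }
  # Build the data frame once, through the public constructor.
  # stringsAsFactors = FALSE is an assumption added to keep ID as character.
  data.frame(ID, minimum, maximum, stringsAsFactors = FALSE)
}

input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
result3 <- fun3(input)
```
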
(And if the contents are as simple as in the example, you don't need the
loop, but I assume the real case is more complicated.)
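
For the simple case in the example, one possible loop-free version is sketched below. The name fun4 and the small input4 data frame are hypothetical, and vapply() is just one of several ways to do this.

```r
# fun4 is a hypothetical name; it does not appear in the original post.
fun4 <- function(x) {
  data.frame(
    ID = names(x),
    # vapply() applies min/max to each column and checks that every
    # result is a single numeric value.
    minimum = vapply(x, min, numeric(1), USE.NAMES = FALSE),
    maximum = vapply(x, max, numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Small made-up input so the result is easy to check by eye.
input4 <- data.frame(a = c(3, 1, 2), b = c(10, 30, 20))
result4 <- fun4(input4)
```
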
Duncan Murdoch
>
> fun1(input)
> fun2(input)