[R] programming: telling a function where to look for the entered variables

E Hofstadler e.hofstadler at gmail.com
Fri Apr 1 14:54:08 CEST 2011


2011/4/1 Nick Sabbe <nick.sabbe at ugent.be>:
> This should be a version that does what you want.

Indeed it does, thank you very much!

> Because you named the variable lvarname, I assumed you were already passing
> "lvar" instead of trying to pass lvar (without the quotes), which is in no
> way a 'name'.

Sorry about that, I can see how my variable names were somewhat confusing.

Many thanks once again!

>
>
>
> -----Original Message-----
> From: irene.prix at googlemail.com [mailto:irene.prix at googlemail.com] On Behalf Of E Hofstadler
> Sent: Friday 1 April 2011 14:28
> To: Nick Sabbe
> Cc: r-help at r-project.org
> Subject: Re: [R] programming: telling a function where to look for the
> entered variables
>
> Thanks Nick and Juan for your replies.
>
> Nick, thanks for pointing out the warning in subset(). I'm not sure,
> though, that I understand the example you provided -- because despite using
> subset() rather than bracket notation, the original function (myfunct)
> does what is expected of it. The problem I have is with the second
> function (myfunct.better), where the variable names and the dataframe are not
> fixed within the function but passed to the function when calling it
> -- and even with bracket notation I don't quite manage to tell R where
> to look for the columns that relate to the entered column names.
> (But then perhaps I misunderstood you.)
>
> This is what I tried (using bracket notation):
>
> myfunct.better <- function(dataframe, subgroup, lvarname, yvarname){
> Data.tmp <- dataframe[dataframe[,deparse(substitute(lvarname))]==subgroup,
> c("xvar",deparse(substitute(yvarname)))]
> }
>
> but this creates an empty contingency table only -- perhaps because my
> use of deparse() is flawed (I think what is converted into a string is
> "lvarname" and "yvarname", rather than the column names that these two
> function-variables represent in the dataframe)?
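As a rough check of the hypothesis above (this is not code from the thread): deparse(substitute(arg)) returns the text the caller supplied (e.g. "lvar"), not the argument's own name, as long as it is evaluated in the frame of the function that received arg. The sketch below captures both column names once, at the top of the function, and then uses them with bracket notation. The function name myfunct.unquoted and the small example data frame are hypothetical.

```r
# Hypothetical example data: the y column deliberately has a different name
df <- data.frame(
  xvar = factor(c("blue", "red", "blue", "red")),
  area = factor(c("area1", "area1", "area2", "area2")),
  grp  = factor(c("yes", "yes", "no", "yes"))
)

# Hypothetical variant of myfunct.better that accepts unquoted column names
myfunct.unquoted <- function(dataframe, subgroup, lvarname, yvarname) {
  # Capture the caller's expressions as strings in this function's own frame,
  # e.g. lvarname = grp gives "grp"
  lcol <- deparse(substitute(lvarname))
  ycol <- deparse(substitute(yvarname))
  # Row indices of the requested subgroup (which() drops NA comparisons)
  rows <- which(dataframe[[lcol]] == subgroup)
  Data.tmp <- na.omit(dataframe[rows, c("xvar", ycol)])
  table(Data.tmp$xvar, Data.tmp[[ycol]])
}

tab <- myfunct.unquoted(df, "yes", grp, area)
print(tab)
```

With the thread's data the call would be myfunct.unquoted(Fulldf, "yes", lvar, yvar). One drawback of this style is that such a function is awkward to call from another function that holds the column name in a variable, since substitute() would then capture the variable's name instead; passing the names as strings tends to be easier to reuse.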
>
>
> 2011/4/1 Nick Sabbe <nick.sabbe at ugent.be>:
>> See the warning in ?subset.
>> Passing the column name of lvar is not the same as passing the 'contextual
>> column' (as I coin it in these circumstances).
>> You can solve it by indeed using [] instead.
>>
>> For my own comfort, here is the relevant line from your original function:
>> Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
>> Which should become something like (untested but should be close):
>> Data.tmp <- Fulldf[Fulldf[,"lvar"]==subgroup, c("xvar","yvar")]
>>
>> This should be a lot easier to translate based on column names, as the
>> column names are now used as such.
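Following this suggestion, a generalized function that takes the column names as character strings (as in "lvar") might look roughly like this. The function name myfunct.strings, the xvarname argument (defaulting to "xvar", since that column has the same name in all dataframes) and the example data are assumptions for illustration, not Nick's code.

```r
# Hypothetical example data; the y column has a non-standard name
df <- data.frame(
  xvar = factor(c("blue", "red", "blue", "red")),
  area = factor(c("area1", "area1", "area2", "area2")),
  grp  = factor(c("yes", "yes", "no", "yes"))
)

# Hypothetical generalization: all column names are passed as strings
myfunct.strings <- function(dataframe, subgroup, lvarname, yvarname,
                            xvarname = "xvar") {
  # Row indices of the requested subgroup; which() drops NA comparisons
  rows <- which(dataframe[[lvarname]] == subgroup)
  # Bracket notation: the column names are used directly as strings
  Data.tmp <- dataframe[rows, c(xvarname, yvarname)]
  Data.tmp <- na.omit(Data.tmp)  # exclude missing values
  table(Data.tmp[[xvarname]], Data.tmp[[yvarname]])
}

tab_no <- myfunct.strings(df, "no", "grp", "area")
print(tab_no)

# A column name held in a variable works the same way
ycol <- "area"
tab_yes <- myfunct.strings(df, "yes", "grp", ycol)
print(tab_yes)
```

For the thread's data the call would be myfunct.strings(Fulldf, "yes", "lvar", "yvar").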
>>
>> HTH,
>>
>>
>> Nick Sabbe
>> --
>> ping: nick.sabbe at ugent.be
>> link: http://biomath.ugent.be
>> wink: A1.056, Coupure Links 653, 9000 Gent
>> ring: 09/264.59.36
>>
>> -- Do Not Disapprove
>>
>>
>>
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of E Hofstadler
>> Sent: Friday 1 April 2011 13:09
>> To: r-help at r-project.org
>> Subject: [R] programming: telling a function where to look for the entered
>> variables
>>
>> Hi there,
>>
>> Could someone help me with the following programming problem?
>>
>> I have written a function that works for my intended purpose, but it
>> is quite closely tied to a particular dataframe and the names of the
>> variables in this dataframe. However, I'd like to use the same
>> function for different dataframes and variables. My problem is that
>> I'm not quite sure how to tell my function in which dataframe the
>> entered variables are located.
>>
>> Here's some reproducible data and the function:
>>
>> # create reproducible data
>> set.seed(124)
>> xvar <- sample(0:3, 1000, replace = TRUE)
>> yvar <- sample(0:1, 1000, replace = TRUE)
>> zvar <- rnorm(1000)
>> lvar <- sample(0:1, 1000, replace = TRUE)
>> Fulldf <- as.data.frame(cbind(xvar,yvar,zvar,lvar))
>> Fulldf$xvar <- factor(xvar, labels=c("blue","green","red","yellow"))
>> Fulldf$yvar <- factor(yvar, labels=c("area1","area2"))
>> Fulldf$lvar <- factor(lvar, labels=c("yes","no"))
>>
>> and here's the function in the form in which it currently works: from a
>> subset of the dataframe Fulldf, a contingency table is created (in my
>> actual data, several other operations are then performed on that
>> contingency table, but these are not relevant to the problem in
>> question, so I've deleted them).
>>
>> # function as it currently works: tailored to a particular dataframe (Fulldf)
>>
>> myfunct <- function(subgroup){
>>    # enter a particular subgroup for which the contingency table should be
>>    # calculated (i.e. a particular value of the factor lvar)
>>    # restrict dataframe to given subgroup and two columns of the original dataframe
>>    Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
>>    Data.tmp <- na.omit(Data.tmp) # exclude missing values
>>    indextable <- table(Data.tmp$xvar, Data.tmp$yvar) # make contingency table
>>    return(indextable)
>> }
>>
>> Since I need to use the function with different dataframes and
>> variable names, I'd like to be able to tell my function the name of
>> the dataframe and the variables it should use for calculating the index.
>> This is how I tried to modify the first part of the function, but it
>> didn't work:
>>
>> # function as I would like it to work: independent of any particular
>> # dataframe or variable names (doesn't work)
>>
>> # enter the subgroup, the variable names to be used and the dataframe
>> # in which they are found
>> myfunct.better <- function(subgroup, lvarname, yvarname, dataframe){
>>    # trying to subset the given dataframe for the given subgroup of the
>>    # given variable. The variable "xvar" happens to have the same name in
>>    # all dataframes, but the variable yvarname has different names in the
>>    # different dataframes
>>    Data.tmp <- subset(dataframe, lvarname==subgroup,
>>                       select=c("xvar", deparse(substitute(yvarname))))
>>    Data.tmp <- na.omit(Data.tmp)
>>    # create the contingency table on the basis of the entered variables
>>    indextable <- table(Data.tmp$xvar, Data.tmp$yvarname)
>>    return(indextable)
>> }
>>
>> calling
>>
>> myfunct.better("yes", lvarname=lvar, yvarname=yvar, dataframe=Fulldf)
>>
>> results in the following error:
>>
>> Error in `[.data.frame`(x, r, vars, drop = drop) :
>>  undefined columns selected
>>
>> My feeling is that R doesn't know where to look for the entered
>> variables (lvar, yvar), but I'm not sure how to solve this problem. I
>> tried using with() and even attach() within the function, but that
>> didn't work.
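A minimal sketch of why this error appears, assuming the standard subset.data.frame() method: subset() evaluates its select argument in a new environment built from the column positions, so the substitute(yvarname) inside it no longer runs in myfunct.better's frame and returns the symbol yvarname itself, i.e. the non-existent column "yvarname". Similarly, lvarname == subgroup never refers to a column: lvarname is found in the function's frame and evaluates to whatever lvar is in the caller's workspace. The helper names pick_broken and pick_fixed and the tiny data frame are hypothetical.

```r
df <- data.frame(xvar = c("a", "b"), yvar = c("c", "d"))

# Mirrors the select= part of myfunct.better: substitute() runs inside
# subset()'s own evaluation environment, so it yields the symbol yvarname
pick_broken <- function(yvarname, dataframe) {
  subset(dataframe, select = c("xvar", deparse(substitute(yvarname))))
}

# Capture the name first, in this function's frame, then pass the string on
pick_fixed <- function(yvarname, dataframe) {
  ycol <- deparse(substitute(yvarname))
  subset(dataframe, select = c("xvar", ycol))
}

res <- tryCatch(pick_broken(yvar, df), error = function(e) e)
print(conditionMessage(res))  # "undefined columns selected"
print(pick_fixed(yvar, df))
```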
>>
>> Any help is greatly appreciated.
>>
>> Best,
>> Esther
>>
>> P.S.:
>> Are there books that cover programming in R for beginners? I mean
>> things like how best to use vectorization instead of loops, and
>> general "best practice" tips for programming. Most of the books I've
>> been looking at focus on applying R to particular statistical
>> analyses, and deal only comparatively briefly with more general
>> programming aspects. I was wondering if there are any books or tutorials
>> out there that cover the latter aspects in a more elaborate and
>> systematic way...?
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>


