[R] mapple
Jim Lemon
drj|m|emon @end|ng |rom gm@||@com
Wed Oct 2 23:42:49 CEST 2019
Hi Phillip,
You have chosen one of the best methods to learn any computer language
(and many other things): using it to do something that you want to
do. Start with your "change of date" problem. As you saw, different
people suggested different ways to get the same result. If I remember
correctly, I suggested using the difference between successive dates:
# create some dates
dates<-as.Date(paste(rep(2019,10),rep(9,10),
 c(10,10,11,11,11,14,17,19,19,20),sep="-"))
dates
[1] "2019-09-10" "2019-09-10" "2019-09-11" "2019-09-11" "2019-09-11"
[6] "2019-09-14" "2019-09-17" "2019-09-19" "2019-09-19" "2019-09-20"
# get the difference between successive dates
diffdates<-diff(dates)
diffdates
Time differences in days
[1] 0 1 0 0 3 3 2 0 1
# a change of date produces a difference greater than 0
diffdates > 0
[1] FALSE TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE
# because a logical value will automatically be coerced to numeric
# in a calculation, you can show the _changes_ in dates using the
# cumulative sum of the TRUE (1) values
cumsum(diffdates > 0)
[1] 0 1 1 1 2 3 4 4 5
# note that because the length of the differences will be one less
# than the length of the original values, you have to supply the
# first value yourself, probably NA or zero
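A minimal sketch of that last step, assuming zero is the starting value you want (the name change_count is only illustrative):

```r
# same dates as above
dates <- as.Date(paste(2019, 9, c(10,10,11,11,11,14,17,19,19,20), sep = "-"))
# prepend 0 so the counter has one value per date
change_count <- c(0, cumsum(diff(dates) > 0))
# pair each date with its running count of date changes
data.frame(dates, change_count)
```

Using NA instead of 0 would mark the first date as having no previous date to compare with, but the running count would then start from a missing value.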
How did I choose the three simple functions to get the result? Because
I knew that you wanted to count the changes in date, I chose "diff" to
give me a vector of the change values. From here, you just want to
_increment_ a count based on the changes, regardless of their values.
So I converted each date change value to a logical value. If the
change value was zero, it became FALSE (0). If it was greater than zero,
it became TRUE (1). As you can see, the cumulative sum of these
logical values returns a cumulative count of changes.
There are two stages in this. First, create a sequence of steps to go
from the information you have to the result you want:
1) find where the date changes occur
2) manipulate this vector into a vector that increments its value at each change
This part can be hard for almost anyone. If you have a mind that is
good at analyzing a problem, it's a lot easier. Second, choose
functions that actually perform the steps:
1) find where the date changes occur (diff)
2) change this vector into the increment values you want (>0 produces
TRUE/FALSE = 1/0)
3) sum up the increments (cumsum)
The second stage is the hard one for most beginners. Sure, I can
imagine what steps are needed, but which of the hundreds of basic,
cryptically-named functions will perform these steps? My view is that
both stages rely on native ability and experience.
Now for the conditional to value problem. When I run the code I sent
you, I get the answer I expected:
phdf<-read.table(text="v1 v2 v3 v4 v5 code
0 0 0 0 0 1
1 4 0 0 0 1
1 1 0 0 0 1
1 0 1 0 0 1
2 0 1 0 0 1
0 1 0 0 0 1
0 1 2 0 0 1
0 1 2 3 0 1
0 2 3 4 4 1
0 0 0 2 3 1",
 header=TRUE,
 stringsAsFactors=FALSE)
rules<-list("x[1]==0&&x[2]==0&&x[3]==0&&x[4]==0&&x[5]==0",
 "x[1]==1&&x[2]==1","x[1]==0&&x[2]==1&&x[3]==2")
outcomes<-c(1,5,10)
apply_rule<-function(x,rule) return(eval(parse(text=rule)))
for(ri in 1:length(rules))
 phdf[apply(phdf,1,apply_rule,rules[[ri]]),"code"] <- outcomes[ri]
phdf
   v1 v2 v3 v4 v5 code
1   0  0  0  0  0    1
2   1  4  0  0  0    1
3   1  1  0  0  0    5
4   1  0  1  0  0    1
5   2  0  1  0  0    1
6   0  1  0  0  0    1
7   0  1  2  0  0   10
8   0  1  2  3  0   10
9   0  2  3  4  4    1
10  0  0  0  2  3    1
What I did here was to create a list of rules (conditional statements
that return TRUE/FALSE) and a vector of values that will be
substituted when the corresponding conditional statement is TRUE.
Next, I had to create a function that can be passed as the FUN
argument to one of the *apply functions. It converts the
character string of a conditional statement in "rules" to an
expression, evaluates it, and returns the logical outcome.
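A small sketch of what that helper returns, first for a single row and then for every row. The three-row data frame demo and the name hits are made up for illustration; the helper itself is the same as above.

```r
# a cut-down, made-up version of the data: three rows only
demo <- data.frame(v1 = c(0, 1, 0), v2 = c(0, 1, 1), v3 = c(0, 0, 2))
# same helper as above: parse the rule text and evaluate it against x
apply_rule <- function(x, rule) eval(parse(text = rule))
# one row at a time: x is the vector of values in that row
apply_rule(c(1, 1, 0), "x[1]==1&&x[2]==1")
# every row at once: apply() calls the helper once per row
hits <- apply(demo, 1, apply_rule, "x[1]==1&&x[2]==1")
hits
```

Only the second row has v1 equal to 1 and v2 equal to 1, so that is the only position where hits is TRUE.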
Stepping through the rules one by one produces a logical vector of
TRUE/FALSE values for each rule, one value per row. I use that vector
to index the rows of phdf and the name "code" to index the column,
thereby changing the value of phdf$code to the corresponding value of
"outcomes" wherever the condition is TRUE.
Again, this was done by working out the steps necessary to get the
result, then deciding which functions would perform those steps. It
may seem difficult because it is. Consider the famous Socratic dialogue
in which Socrates appears to get one of Meno's slaves to solve a
mathematical problem that is clearly beyond his knowledge: Socrates
carefully frames his questions in a way that the slave can answer
correctly. I will not claim, as Socrates did, that this proves that the
soul of a function is immortal and already possesses the knowledge to
solve the problem. Socrates demonstrated that if you ask the right simple
questions, you can put the simple answers together to get the result
you want. There are a lot of people who still do this, but in science
it is known as "confirmatory bias".
Jim
On Thu, Oct 3, 2019 at 1:45 AM Phillip Heinrich <herd_dog using cox.net> wrote:
>
> Can't seem to get past the rules statement that you suggested. I get an
> invalid argument to unary operator error. The class of the dataframe is
> "list" and the structure is "factor". Looked up "unary operator" but I
> don't really understand what that means.
>
> Thanks.
>
> class(phdf)
> [1] "list"
>
> > str(phdf)
> List of 6
> $ v1 : Factor w/ 3 levels "0","1","2": 1 2 2 2 3 1 1 1 1 1
> $ v2 : Factor w/ 4 levels "0","1","2","4": 1 4 2 1 1 2 2 2 3 1
> $ v3 : Factor w/ 4 levels "0","1","2","3": 1 1 1 2 2 1 3 3 4 1
> $ v4 : Factor w/ 4 levels "0","2","3","4": 1 1 1 1 1 1 1 3 4 2
> $ v5 : Factor w/ 3 levels "0","3","4": 1 1 1 1 1 1 1 1 3 2
> $ code: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1
>
> > rules <- list("x[1]==0 && x[2]==0 && x[3]==0 && x[4]==0 && x[5]==0",+
> + "x[1]==1 && x[2]==1 && x[3]==0 && x[4]==0 && x[5]==10")
>
> Error in +"x[1]==1 && x[2]==1 && x[3]==0 && x[4]==0 && x[5]==10" :
> invalid argument to unary operator
>
> -----Original Message-----
> From: Jim Lemon
> Sent: Tuesday, October 1, 2019 8:24 PM
> To: Phillip Heinrich
> Cc: r-help
> Subject: Re: [R] mapple
>
> Hi Phillip,
> The following seems to do what you want:
>
> phdf<-read.table(text="v1 v2 v3 v4 v5 code
> 0 0 0 0 0 1
> 1 4 0 0 0 1
> 1 1 0 0 0 1
> 1 0 1 0 0 1
> 2 0 1 0 0 1
> 0 1 0 0 0 1
> 0 1 2 0 0 1
> 0 1 2 3 0 1
> 0 2 3 4 4 1
> 0 0 0 2 3 1",
> header=TRUE,
> stringsAsFactors=FALSE)
> rules<-list("x[1]==0&&x[2]==0&&x[3]==0&&x[4]==0&&x[5]==0",
> "x[1]==1&&x[2]==1","x[1]==0&&x[2]==1&&x[3]==2")
> outcomes<-c(1,5,10)
> apply_rule<-function(x,rule) return(eval(parse(text=rule)))
> for(ri in 1:length(rules))
> phdf[apply(phdf,1,apply_rule,rules[[ri]]),"code"] <- outcomes[ri]
>
> and can be expanded to the number of rules that you want. BUT, you
> have not specified a non-match value, so your initial values for
> "code" will persist.
>
> Jim
>
> On Wed, Oct 2, 2019 at 12:32 PM Phillip Heinrich <herd_dog using cox.net> wrote:
> >
> > With the snippet of data below I’m trying to do an if/then type of thing:
> > row 1 – if all five variables equal 0 then code equals 1;
> > row 3 – if v1 = 1 and v2 = 1 then code = 5;
> > row 7 – if v1 = 0 and v2 = 1 and v3 = 2 then code = 10
> >
> > There are 24 codes in the complete database.
> >
> >
> >    v1 v2 v3 v4 v5 code
> > 1   0  0  0  0  0    1
> > 2   1  4  0  0  0    1
> > 3   1  1  0  0  0    1
> > 4   1  0  1  0  0    1
> > 5   2  0  1  0  0    1
> > 6   0  1  0  0  0    1
> > 7   0  1  2  0  0    1
> > 8   0  1  2  3  0    1
> > 9   0  2  3  4  4    1
> > 10  0  0  0  2  3    1
> >
> > I understand that the mapply function can do
> > things like this but I have been reading documentation and poking around
> > with Google but am getting nowhere. Any advice would be greatly
> > appreciated.