[R] Strange result when subsetting a data frame based on a character variable
Karl Schilling
karl.schilling at uni-bonn.de
Tue Nov 17 20:14:15 CET 2015
Dear all,
I have one observation that I do not quite understand. Maybe someone
can clarify this issue for me.
I have a data frame which I want to subset based on a grouping variable,
say "group". Actually, "group" is a numeric value, but it is saved as a
character. I give some code to generate an exemplary data frame below.
Now, if I use
MySubset <- subset(Data, Data$group == "..")
everything works fine, as expected. ".." stands here for the value of
group given as a character string.
Surprisingly, I also get a correct subset if I simply give the plain
numeric value of group, as in MySubset <- subset(Data, Data$group == ..),
AS LONG AS this numeric value is less than 100000.
If the numeric value is 100000 or larger, I get an empty subset.
I know how to avoid this situation, but I wonder what explains this
behavior, which seems rather strange to me.
Thank you so much for your suggestions.
Karl Schilling
#####
Exemplary code for reproducing the above described problem:
options(stringsAsFactors = FALSE)
# set up some data frame
value <- 1:6
group <- rep(c("20000", "99999", "100000"), each = 2)
Data <- data.frame(value = value, group = group)
str(Data)
# subset data frame based on the value of the variable "group",
# treating this value once as a character, and once as a number:
Data20 <- subset(Data, Data$group == "20000")
str(Data20)
Data20N <- subset(Data, Data$group == 20000)
str(Data20N)
Data99 <- subset(Data, Data$group == "99999")
str(Data99)
Data99N <- subset(Data, Data$group == 99999)
str(Data99N)
Data100 <- subset(Data, Data$group == "100000")
str(Data100)
Data100N <- subset(Data, Data$group == 100000)
str(Data100N)
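
One further check that might help narrow this down is sketched here. It rests on an assumption that the post itself does not confirm: that in a comparison between a character string and a number, the number is first converted to a character string. If so, the string that R produces for each number would decide whether the match succeeds. The names num_as_chr, cmp_double and cmp_integer are made up for this illustration, and the last lines show only one possible way to sidestep the issue by comparing on the numeric scale.

```r
# Assumption under test: in a mixed comparison the number is converted
# to a character string, and the two strings are then compared.
num_as_chr <- c(as.character(20000), as.character(99999), as.character(100000))
print(num_as_chr)

# Compare the string once with a double and once with an integer literal.
cmp_double  <- "100000" == 100000
cmp_integer <- "100000" == 100000L
print(c(double = cmp_double, integer = cmp_integer))

# Possible workaround: convert "group" to numeric before comparing.
Data <- data.frame(value = 1:6,
                   group = rep(c("20000", "99999", "100000"), each = 2),
                   stringsAsFactors = FALSE)
Data100N <- subset(Data, as.numeric(group) == 100000)
str(Data100N)
```

If the printed strings differ from the stored values of "group" only for 100000, that would be consistent with the empty subset described above.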
--
Karl Schilling