[R] Problem Subsetting Rows that Have NA's
peter dalgaard
pdalgd at gmail.com
Wed Oct 25 22:02:36 CEST 2017
It's not a bug, and the rationale has been hashed over since the beginning of time...
It is a bit of an annoyance in some contexts and part of the rationale for the existence of subset().
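For illustration, here is a minimal sketch of how subset() and which() sidestep the NA rows, using the toy matrix from the original post (quoted below). The variable name keep is introduced here only for the example.

```r
# Toy matrix from the original post: column 2 is NA for rows 6 to 10.
x <- rbind(c(1, 1), c(2, 2), c(3, 3), c(4, 0), c(5, 0),
           c(6, NA), c(7, NA), c(8, NA), c(9, NA), c(10, NA))

# Comparing with NA yields NA, so the index has TRUE, FALSE and NA entries.
keep <- x[, 2] == 0

n_bracket <- nrow(x[keep, ])         # 7: two matches plus five all-NA rows
n_subset  <- nrow(subset(x, keep))   # 2: subset() treats NA as FALSE
n_which   <- nrow(x[which(keep), ])  # 2: which() returns only the TRUE positions
```

Both alternatives return only the two rows whose second column is zero.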
If you need an explanation, start with elementary vector indexing:
colors <- c("red", "green", "blue")
colors[c(1,3,2,NA,3)]
You pretty clearly want the result to be a vector of length 5 with 4th element NA, right?
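A small sketch of what that call returns; the names idx and res are only for this example.

```r
colors <- c("red", "green", "blue")
idx <- c(1, 3, 2, NA, 3)

# One result element per index element; an NA index gives an NA element.
res <- colors[idx]
res
# [1] "red"   "blue"  "green" NA      "blue"

length(res) == length(idx)  # TRUE
```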
Same story if you index into a data frame:
> airquality[c(1,3,2,NA,2),]
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
3      12     149 12.6   74     5   3
2      36     118  8.0   72     5   2
NA     NA      NA   NA   NA    NA  NA
2.1    36     118  8.0   72     5   2
Now, that by itself is not an argument for also getting NA rows from logical indexing, but then comes the issue of automatic coercion: in colors[NA], the NA is actually of mode "logical", whereas in colors[c(1,NA)] it is coerced to numeric. If we removed NA indexes in logical indexing, we would have to explain why colors[c(1,NA)] has length 2 but colors[NA] has length zero (which it currently does not; the logical NA is recycled, so the result has length 3).
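The coercion point can be checked directly. This is a minimal sketch; the variable names are hypothetical and NA_integer_ is used only to contrast with the plain (logical) NA.

```r
colors <- c("red", "green", "blue")

type_plain <- typeof(NA)        # "logical"
type_mixed <- typeof(c(1, NA))  # "double": here NA is coerced to numeric

by_position <- colors[c(1, NA)]     # "red" NA : positional index, length 2
by_logical  <- colors[NA]           # NA NA NA : logical NA recycled to length 3
by_integer  <- colors[NA_integer_]  # NA       : positional index, length 1
```

So the same-looking NA is a positional index in one call and a recycled logical index in the other, which is why the two cases cannot be treated differently without surprises.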
-pd
> On 25 Oct 2017, at 15:57 , BooBoo <booboo at gforcecable.com> wrote:
>
> On 10/25/2017 4:38 AM, Ista Zahn wrote:
>> On Tue, Oct 24, 2017 at 3:05 PM, BooBoo <booboo at gforcecable.com> wrote:
>>> This has every appearance of being a bug. If it is not a bug, can someone
>>> tell me what I am asking for when I ask for "x[x[,2]==0,]". Thanks.
>> You are asking for elements of x where the second column is equal to zero.
>>
>> help("==")
>>
>> and
>>
>> help("[")
>>
>> explain what happens when missing values are involved. I agree that
>> the behavior is surprising, but your first instinct when you discover
>> something surprising should be to read the documentation, not to post
>> to this list. After having read the documentation you may post back
>> here if anything remains unclear.
>>
>> Best,
>> Ista
>>
>>>> #here is the toy dataset
>>>> x <- rbind(c(1,1),c(2,2),c(3,3),c(4,0),c(5,0),c(6,NA),
>>> + c(7,NA),c(8,NA),c(9,NA),c(10,NA)
>>> + )
>>>> x
>>>       [,1] [,2]
>>>  [1,]    1    1
>>>  [2,]    2    2
>>>  [3,]    3    3
>>>  [4,]    4    0
>>>  [5,]    5    0
>>>  [6,]    6   NA
>>>  [7,]    7   NA
>>>  [8,]    8   NA
>>>  [9,]    9   NA
>>> [10,]   10   NA
>>>> #it contains rows that have NA's
>>>> x[is.na(x[,2]),]
>>>      [,1] [,2]
>>> [1,]    6   NA
>>> [2,]    7   NA
>>> [3,]    8   NA
>>> [4,]    9   NA
>>> [5,]   10   NA
>>>> #seems like an unreasonable answer to a reasonable question
>>>> x[x[,2]==0,]
>>>      [,1] [,2]
>>> [1,]    4    0
>>> [2,]    5    0
>>> [3,]   NA   NA
>>> [4,]   NA   NA
>>> [5,]   NA   NA
>>> [6,]   NA   NA
>>> [7,]   NA   NA
>>>> #this is more what I was expecting
>>>> x[which(x[,2]==0),]
>>>      [,1] [,2]
>>> [1,]    4    0
>>> [2,]    5    0
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> I wanted to know if this was a bug so that I could report it if so. You say it is not, so you answered my question. As far as me not reading the documentation, I challenge anyone to read the cited help pages and predict the observed behavior based on the information given in those pages.
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com