[R] Subsetting problem data, 2

Rui Barradas ruipbarradas at sapo.pt
Fri Jul 20 01:55:52 CEST 2012


Hello,

Sorry, forgot about that. It's trickier to write code without a dataset 
to test it.

Try

pattern <- "L[1-8][12]"

and, after the grep() call, print nms to check that it matched the right names.
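
A minimal sketch of that check, since grep() takes a single pattern and warns when given the whole 'vars' vector. The column names in 'cols' are made up for illustration and are not your data; the ^ and $ anchors are my addition to the pattern above, so that names such as "XL11" are rejected. The %in% line is an alternative that, as far as I can tell, lets you keep using 'vars' without grep.

```r
# Hypothetical column names, for illustration only (not the poster's data)
cols <- c("Patient", "Cycle", "L11", "L12", "L21", "L82", "L91", "L13", "XL11")

# grep() uses one pattern, so the 16 names are described by a single regex.
# The anchors ^ and $ are an assumption added here to force a whole-name match.
pattern <- "^L[1-8][12]$"
nms <- grep(pattern, cols, value = TRUE)
print(nms)

# Alternative without a regex: compare against the vector of exact names
vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste, collapse = ""))
nms2 <- cols[cols %in% vars]
print(nms2)
```

Both approaches should select the same columns; whichever you use, print the result before subsetting.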

Rui Barradas

On 20-07-2012 00:33, Lib Gray wrote:
> I'm getting this error message:
>
> nms<-names(data)[grep(vars,names(data))]
> Warning message:
> In grep(vars, names(data)) :
>    argument 'pattern' has length > 1 and only the first element will be used
>
> Is there a way around this?
>
>
> On Thu, Jul 19, 2012 at 6:17 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>> Hello,
>>
>> I guess so, and I can save you some typing.
>>
>> vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste, collapse=""))
>>
>>
>> Then use it and see the result.
>>
>> Rui Barradas
>>
>> On 20-07-2012 00:00, Lib Gray wrote:
>>
>>> The variables are actually L11, L12, L21, L22, ... , L81, L82. Would just
>>> creating a vector c(L11,... ,L82) be fine? (I'm about to try it, but I
>>> wanted to check to see if that was going to be a big issue).
>>>
>>> On Thu, Jul 19, 2012 at 3:27 PM, Rui Barradas <ruipbarradas at sapo.pt>
>>> wrote:
>>>
>>>> Hello,
>>>> Try the following. The data is your example of Patient A through E, but
>>>> from the output of dput().
>>>>
>>>> dat <- structure(list(Patient = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
>>>> 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("A",
>>>> "B", "C", "D", "E"), class = "factor"), Cycle = c(1L, 2L, 3L,
>>>> 4L, 5L, 1L, 2L, 1L, 3L, 4L, 5L, 1L, 2L, 4L, 5L, 1L, 2L, 3L),
>>>>       V1 = c(0.4, 0.3, 0.3, 0.4, 0.5, 0.4, 0.4, 0.9, 0.3, NA, 0.4,
>>>>       0.2, 0.5, 0.6, 0.5, 0.1, 0.5, 0.4), V2 = c(0.1, 0.2, NA,
>>>>       NA, 0.2, NA, NA, 0.9, 0.5, NA, NA, 0.5, 0.7, 0.4, 0.5, NA,
>>>>       0.3, 0.3), V3 = c(0.5, 0.5, 0.6, 0.4, 0.5, NA, NA, 0.9, 0.6,
>>>>       NA, NA, NA, NA, NA, NA, NA, NA, NA), V4 = c(1.5, 1.6, 1.7,
>>>>       1.8, 1.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>>>       NA), V5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>>>       NA, NA, NA, NA, NA, NA)), .Names = c("Patient", "Cycle",
>>>> "V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
>>>> -18L))
>>>>
>>>> dat
>>>>
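>>>> # Measurement columns: V followed by a single digit 1-9
>>>> nms <- names(dat)[grep("^V[1-9]$", names(dat))]
>>>> dd <- split(dat, dat$Patient)
>>>> # TRUE if a column mixes missing and non-missing values
>>>> fun <- function(x) any(is.na(x)) && any(!is.na(x))
>>>> # x[nms] (not x[, nms]) stays a data frame even with one column
>>>> ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[nms], fun)))
>>>>
>>>> dd[ix]                  # list of the selected patients
>>>> do.call(rbind, dd[ix])  # the same patients as a single data frame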
>>>>
>>>>
>>>> I'm assuming that the variable names are as posted: V followed by a
>>>> single digit 1-9. To keep instead the patients whose variables are each
>>>> either fully measured or fully missing, just negate the index 'ix'; it's
>>>> a logical index.
>>>> Note also that dput() is the best way of posting a data example.
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>> On 19-07-2012 15:15, Lib Gray wrote:
>>>>
>>>>> Hello,
>>>>> I didn't give enough information when I sent a query before, so I'm
>>>>> trying again with a more detailed explanation:
>>>>>
>>>>> In this data set, each patient has a different number of measured
>>>>> variables (they represent tumors, so some people had 2 tumors, some had
>>>>> 5, etc.). The problem I have is that often in later cycles for a patient,
>>>>> tumors that were originally measured are now missing (or a "new" tumor
>>>>> showed up). We assume there are many different reasons why a tumor would
>>>>> be measured in one cycle and not another, and so I want to subset OUT the
>>>>> "problem" patients to better study these patterns.
>>>>>
>>>>> An example:
>>>>>
>>>>> Patient  Cycle  V1  V2  V3  V4  V5
>>>>> A  1  0.4  0.1  0.5  1.5  NA
>>>>> A  2  0.3  0.2  0.5  1.6  NA
>>>>> A  3  0.3  NA  0.6  1.7  NA
>>>>> A  4  0.4  NA  0.4  1.8  NA
>>>>> A  5  0.5  0.2  0.5  1.5  NA
>>>>>
>>>>> I want to keep patient A; they have 4 measured tumors, but tumor 2 is
>>>>> missing data for cycles 3 and 4
>>>>>
>>>>> B  1  0.4  NA  NA  NA  NA
>>>>> B  2  0.4  NA  NA  NA  NA
>>>>>
>>>>> I do not want to keep patient B; they have 1 tumor that is measured
>>>>> consistently in both cycles
>>>>>
>>>>> C  1  0.9  0.9  0.9  NA  NA
>>>>> C  3  0.3  0.5  0.6  NA  NA
>>>>> C  4  NA  NA  NA  NA  NA
>>>>> C  5  0.4  NA  NA  NA  NA
>>>>>
>>>>> I do want to keep patient C; all their data is missing for cycle 4, and
>>>>> cycle 5 only measured one tumor
>>>>>
>>>>> D  1  0.2  0.5  NA  NA  NA
>>>>> D  2  0.5  0.7  NA  NA  NA
>>>>> D  4  0.6  0.4  NA  NA  NA
>>>>> D  5  0.5  0.5  NA  NA  NA
>>>>>
>>>>> I do not want patient D; their two tumors were measured each cycle
>>>>>
>>>>> E  1  0.1  NA  NA  NA  NA
>>>>> E  2  0.5  0.3  NA  NA  NA
>>>>> E  3  0.4  0.3  NA  NA  NA
>>>>>
>>>>> I DO want patient E; they only had one tumor register in Cycle 1, but
>>>>> cycles 2 and 3 had two tumors.
>>>>>
>>>>>
>>>>> Thanks for any help!
>>>>>
>>>>>           [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>


