[R] Re: how to automatically select certain columns using for loop in dataframe

Fri Apr 10 09:10:20 CEST 2009

Hi

I am not fond of complicated paste() loops, so I would prefer something like

for (i in 1:4)
  print(na.omit(all.data[, last.char(names(all.data)) %in% col_names[i]]))

with last.char function like this

last.char<-function(x) substring(x, first=nchar(x), last=nchar(x))
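
To try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same last-character selection. The list `pieces` and the use of seq_along() instead of 1:4 are illustrative additions. Two assumptions to keep in mind: the group label must be the final character of the column name, and na.omit() drops a row when any of the selected columns is NA, not only the NAME column.

```r
# Sample data from the original question.
all.data <- data.frame(
  NUM_A = 1:5, NAME_A = c("Andy", "Andrew", "Angus", "Alex", "Argo"),
  NUM_B = 1:5, NAME_B = c(NA, "Barn", "Bolton", "Bravo", NA),
  NUM_C = 1:5, NAME_C = c("Candy", NA, "Cecil", "Crayon", "Corey"),
  NUM_D = 1:5, NAME_D = c("David", "Delta", NA, NA, "Dummy")
)
col_names <- c("A", "B", "C", "D")

# Return the last character of each string in x.
last.char <- function(x) substring(x, first = nchar(x), last = nchar(x))

# Keep each filtered piece in a named list (hypothetical name) as well as
# printing it, so the results can be reused afterwards.
pieces <- list()
for (i in seq_along(col_names)) {
  # TRUE for the columns whose name ends in the current label
  keep <- last.char(names(all.data)) %in% col_names[i]
  # Drop every row that has an NA in any of the selected columns
  pieces[[col_names[i]]] <- na.omit(all.data[, keep])
  print(pieces[[col_names[i]]])
}
```

If the labels were longer than one character, last.char() would have to be replaced by something that extracts the whole suffix.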

Regards
Petr


r-help-bounces at r-project.org wrote on 10.04.2009 00:30:37:

> Hi,
> 
> I am trying to display / print certain columns in my data frame that share
> a certain condition (for example, part of the column name). I am using a
> for loop, as follows:
> 
> # below is the sample data structure
> all.data <- data.frame( NUM_A = 1:5, NAME_A = c("Andy", "Andrew", "Angus", "Alex", "Argo"),
>                         NUM_B = 1:5, NAME_B = c(NA, "Barn", "Bolton", "Bravo", NA),
>                         NUM_C = 1:5, NAME_C = c("Candy", NA, "Cecil", "Crayon", "Corey"),
>                         NUM_D = 1:5, NAME_D = c("David", "Delta", NA, NA, "Dummy") )
> 
> col_names <- c("A", "B", "C", "D")
> 
> > all.data
>   NUM_A NAME_A NUM_B NAME_B NUM_C NAME_C NUM_D NAME_D
> 1     1   Andy     1   <NA>     1  Candy     1  David
> 2     2 Andrew     2   Barn     2   <NA>     2  Delta
> 3     3  Angus     3 Bolton     3  Cecil     3   <NA>
> 4     4   Alex     4  Bravo     4 Crayon     4   <NA>
> 5     5   Argo     5   <NA>     5  Corey     5  Dummy
> >
> 
> Then for each col_names, I want to display the columns:
> 
> for (each_name in col_names) {
> 
>         sub.data <- subset( all.data,
>                             !is.na( paste("NAME_", each_name, sep = '') ),
>                             select = c( paste("NUM_", each_name, sep = ''),
>                                         paste("NAME_", each_name, sep = '') )
>                           )
>         print(sub.data)
> }
> 
> the "incorrect" result:
> 
>   NUM_A NAME_A
> 1     1   Andy
> 2     2 Andrew
> 3     3  Angus
> 4     4   Alex
> 5     5   Argo
>   NUM_B NAME_B
> 1     1   <NA>
> 2     2   Barn
> 3     3 Bolton
> 4     4  Bravo
> 5     5   <NA>
>   NUM_C NAME_C
> 1     1  Candy
> 2     2   <NA>
> 3     3  Cecil
> 4     4 Crayon
> 5     5  Corey
>   NUM_D NAME_D
> 1     1  David
> 2     2  Delta
> 3     3   <NA>
> 4     4   <NA>
> 5     5  Dummy
> >
> 
> What I want to achieve is that the result should only display the NUM and
> NAME rows in which NAME is not NA. Here, instead of NA, the value to exclude
> could be NULL, or zero (or other specific values).
> 
> the "correct" result:
> 
>   NUM_A NAME_A
> 1     1   Andy
> 2     2 Andrew
> 3     3  Angus
> 4     4   Alex
> 5     5   Argo
>   NUM_B NAME_B
> 2     2   Barn
> 3     3 Bolton
> 4     4  Bravo
>   NUM_C NAME_C
> 1     1  Candy
> 3     3  Cecil
> 4     4 Crayon
> 5     5  Corey
>   NUM_D NAME_D
> 1     1  David
> 2     2  Delta
> 5     5  Dummy
> >
> 
> I am guessing that I don't use the correct type on the following statement
> (within the subset in the loop):
> !is.na( paste("NAME_", each_name, sep = '') )
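
That guess is right: paste() returns a character string such as "NAME_A", and is.na() of that string is always FALSE, so the condition is TRUE for every row and nothing is filtered out. A minimal sketch of one possible fix is to look the column up by name with [[ ]] and test its values instead. The names `num_col`, `name_col` and `results` are illustrative, not from the original code.

```r
# Sample data from the question.
all.data <- data.frame(
  NUM_A = 1:5, NAME_A = c("Andy", "Andrew", "Angus", "Alex", "Argo"),
  NUM_B = 1:5, NAME_B = c(NA, "Barn", "Bolton", "Bravo", NA),
  NUM_C = 1:5, NAME_C = c("Candy", NA, "Cecil", "Crayon", "Corey"),
  NUM_D = 1:5, NAME_D = c("David", "Delta", NA, NA, "Dummy")
)
col_names <- c("A", "B", "C", "D")

results <- list()  # hypothetical container so the pieces can be reused
for (each_name in col_names) {
  num_col  <- paste("NUM_",  each_name, sep = "")
  name_col <- paste("NAME_", each_name, sep = "")

  # is.na(name_col) would test the string itself, which is never NA.
  # all.data[[name_col]] fetches the column that the string names.
  keep <- !is.na(all.data[[name_col]])

  # Keep only the rows with a non-missing NAME and the two wanted columns.
  results[[each_name]] <- all.data[keep, c(num_col, name_col)]
  print(results[[each_name]])
}
```

To exclude other specific values as well, such as zero, the condition could be written as !(all.data[[name_col]] %in% c(NA, 0)).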
> 
> But then, I might be using a completely wrong approach.
> 
> Any idea is definitely appreciated.
> 
> Thank you,
> 
> Ferry
> 