[R] How to sum values across multiple variables using a wildcard?
mtb954@gmail.com
mtb954 at gmail.com
Tue Feb 21 02:32:07 CET 2006
Thanks Gabor, Simon and Marc...I got this to work with the grep() and
rowSums examples you provided.
Mark
On 2/20/06, Marc Schwartz <MSchwartz at mn.rr.com> wrote:
> On Mon, 2006-02-20 at 18:41 -0600, mtb954 at gmail.com wrote:
> > I have a dataframe called "data" with 5 records (in rows) each of
> > which has been scored on each of many variables (in columns).
> >
> > Five of the variables are named var1, var2, var3, var4, var5 using
> > headers. The other variables are named using other conventions.
> >
> > I can create a new variable called var6 with the value 15 for each
> > record with this code:
> >
> > > var6=var1+var2+var3+var4+var5
> >
> > but this is tedious for my real dataset with dozens of variables. I
> > would rather use a wildcard to add up all the variables that begin
> > with "Var" like this pseudocode:
> >
> > > Var6=sum(var*)
> >
> > Any suggestions for implementing this in R? Thanks! Mark
>
> Here is one approach using grep().
>
> Given a data frame called MyDF with the following structure:
>
> > str(MyDF)
> `data.frame': 10 obs. of 20 variables:
> $ other4 : num -0.869 0.376 -2.022 0.619 -0.129 ...
> $ var8 : num -0.380 1.428 -1.075 -0.796 -0.588 ...
> $ var4 : num -0.0850 -0.7335 -0.5019 -1.1633 -0.0197 ...
> $ other9 : num 0.0210 -0.6455 0.0289 1.2405 -1.3359 ...
> $ var10 : num 0.647 -0.798 0.180 1.135 -0.258 ...
> $ other2 : num 0.1332 -0.2227 0.0423 0.6881 2.0304 ...
> $ other10: num 0.811 2.166 0.569 0.302 0.669 ...
> $ var1 : num -0.774 -1.812 -1.230 -0.969 0.245 ...
> $ var2 : num -0.0538 0.3712 0.8222 -0.8025 -0.6914 ...
> $ other6 : num 0.871 0.291 2.079 1.098 1.025 ...
> $ other1 : num -0.5130 0.1358 0.8744 0.0997 1.7458 ...
> $ var9 : num 0.664 -0.456 0.415 2.090 -0.283 ...
> $ other3 : num -0.425 -0.283 0.706 -1.879 -0.828 ...
> $ other7 : num 0.100 0.177 0.570 -0.631 -1.009 ...
> $ var3 : num 1.446 -0.862 0.184 1.077 0.146 ...
> $ var5 : num 0.402 -0.498 -0.906 0.641 1.690 ...
> $ var6 : num 0.892 -0.242 0.561 0.530 -0.291 ...
> $ other5 : num -1.210 0.815 -1.284 -0.152 0.329 ...
> $ other8 : num -0.265 -1.278 1.152 0.232 -1.189 ...
> $ var7 : num -0.616 -0.994 -0.263 1.626 -1.372 ...
>
>
> Note that the column names are either var* or other*.
>
> Using grep() we get the indices of the column names that contain "other"
> plus one or more following characters, where the "other" begins the
> word:
>
> > grep("\\bother.", names(MyDF))
> [1] 1 4 6 7 10 11 13 14 18 19
>
> See ?regexp for more information. Note that I use "\\b" to being the
> search at the starting word boundary and then "." to require that there
> be following characters. Thus, this would not match " other1" or
> " other".
>
> You can then use the following to subset the data frame MyDF and sum the
> rows for the requested columns:
>
> > rowSums(MyDF[, grep("\\bother.", names(MyDF))])
> 1 2 3 4 5 6 7
> -1.344893 1.531417 2.715234 1.616971 1.307379 4.655568 4.638446
> 8 9 10
> -2.640485 -2.226270 -2.158248
>
>
> You could use grep("other", names(MyDF)), but this would also get
> "other" if it appears anywhere in the name. For example:
>
> > grep("other", "Thisotherone")
> [1] 1
>
> It just depends upon your naming schema and how strict you need to be in
> the search.
>
> HTH,
>
> Marc Schwartz
>
>
>
More information about the R-help
mailing list