[R] lapply with data frame
jim holtman
jholtman at gmail.com
Sun Feb 28 04:06:26 CET 2010
> x <- read.table(textConnection("id group value
+ 1 A 3.2
+ 2 A 3.0
+ 3 A 3.1
+ 4 B 5.5
+ 5 B 6.0
+ 6 B 6.2"), header=TRUE)
> # lapply processes a data frame column by column
> lapply(x, c)
$id
[1] 1 2 3 4 5 6
$group
[1] 1 1 1 2 2 2
$value
[1] 3.2 3.0 3.1 5.5 6.0 6.2
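
To work on rows instead, as the original question asks, one option is to lapply over the row indices rather than over the data frame itself. This is only a minimal sketch, assuming the same example data as above; `row_summary` is a hypothetical name used for illustration, and `read.table(text = ...)` is used here in place of textConnection so the snippet runs on its own. Note that apply(x, 1, ...) is another route, but it first converts the data frame to a matrix, so mixed column types all become character.

```r
# Rebuild the example data so the snippet is self-contained
x <- read.table(text = "id group value
1 A 3.2
2 A 3.0
3 A 3.1
4 B 5.5
5 B 6.0
6 B 6.2", header = TRUE)

# lapply over row indices: each call sees a one-row data frame,
# so columns keep their own types (unlike apply(x, 1, ...))
row_summary <- lapply(seq_len(nrow(x)), function(i) {
  row <- x[i, ]
  paste(row$group, row$value)
})
```

The result is a list with one element per row, in row order.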
> # normalize by group
> x$norm <- ave(x$value, x$group, FUN=function(a) a / sum(a))
> x
  id group value      norm
1  1     A   3.2 0.3440860
2  2     A   3.0 0.3225806
3  3     A   3.1 0.3333333
4  4     B   5.5 0.3107345
5  5     B   6.0 0.3389831
6  6     B   6.2 0.3502825
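
Since the question asks specifically about lapply, the same per-group normalization can also be spelled out with split, lapply and unsplit. This is a sketch under the same example data, not a claim that it is faster than ave; the column name `norm2` is hypothetical. Because the middle step is a plain lapply over the groups, it is the place where something like mclapply (which takes the same first two arguments) could be tried, though for data this small the overhead would likely outweigh any gain.

```r
# Rebuild the example data so the snippet is self-contained
x <- read.table(text = "id group value
1 A 3.2
2 A 3.0
3 A 3.1
4 B 5.5
5 B 6.0
6 B 6.2", header = TRUE)

# 1. split the values into one vector per group
pieces <- split(x$value, x$group)

# 2. normalize each group's vector by its own sum
normed <- lapply(pieces, function(a) a / sum(a))

# 3. reassemble the pieces in the original row order
x$norm2 <- unsplit(normed, x$group)
```

Each group's `norm2` values sum to 1, matching what ave produces in one call.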
On Sat, Feb 27, 2010 at 9:49 PM, Noah Silverman <noah at smartmediacorp.com> wrote:
> I'm a bit confused on how to use lapply with a data.frame.
>
> For example.
>
> lapply(data, function(x) print(x))
>
> WHAT exactly is passed to the function? Is it each ROW in the data frame,
> one by one, or each column, or the entire frame in one shot?
>
> What I want to do is apply a function to each row in the data frame. Is lapply
> the right way?
>
> A second application is to normalize a column value by group. For example,
> if I have the following table:
> id group value norm
> 1 A 3.2
> 2 A 3.0
> 3 A 3.1
> 4 B 5.5
> 5 B 6.0
> 6 B 6.2
> etc...
>
> The long version would be:
> for (g in unique(data$group)) {
>   idx <- data$group == g   # rows belonging to the current group
>   data$norm[idx] <- data$value[idx] / sum(data$value[idx])
> }
>
> There must be a faster way to do this with lapply. (Ideally, I'd then use
> mclapply to run on multi-cores and really crank up the speed.)
>
> Any suggestions?
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?