[R] Can't compute row means of two columns of a dataframe.
@vi@e@gross m@iii@g oii gm@ii@com
Sat Jun 8 20:15:46 CEST 2024
John,
Maybe you can clarify what you want the output to look like. It took me a
while to realize what you may want as it is NOT properly described as
wanting rowsums.
There is a standard function called rowMeans() that probably does what you
want if you want the mean of each row across all columns, as in:
> rowMeans(xxxz)
[1] 84.33333 87.00000 89.66667 92.33333 95.00000 97.66667 100.33333 103.66667 106.33333 109.00000 112.33333 115.00000
[13] 118.00000 121.33333 124.00000 127.33333 130.66667 134.00000 137.00000
It does not add the means to the original data.frame, but if you want them
there, that is easy enough to do:
> xxxz$Average20 <-rowMeans(xxxz)
> head(xxxz)
  TotalInches Low20 High20 Average20
1          58    84    111  84.33333
2          59    87    115  87.00000
3          60    90    119  89.66667
4          61    93    123  92.33333
5          62    96    127  95.00000
6          63    99    131  97.66667
Your construct is more complex, and it looks like you want to do this for a
subset of two columns. Again, that is straightforward:
xxxz$Average20 <-rowMeans(xxxz[, c("Low20", "High20")])
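For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the posted data in a compact form (assumption: TotalInches = 58:76 is used as shorthand, which stores integers rather than the doubles in the posted dput() output) and computes the two-column mean with rowMeans(). Note that rowMeans(xxxz) on the whole data frame also averages in TotalInches, which is probably not wanted; restricting to the two columns avoids that. With only two columns, plain arithmetic, (Low20 + High20) / 2, gives the same values.

```r
# Rebuild the posted data frame (same values as the dput() output)
xxxz <- data.frame(
  TotalInches = 58:76,
  Low20  = c(84, 87, 90, 93, 96, 99, 102, 106, 109, 112,
             116, 119, 122, 126, 129, 133, 137, 141, 144),
  High20 = c(111, 115, 119, 123, 127, 131, 135, 140, 144, 148,
             153, 157, 162, 167, 171, 176, 181, 186, 191)
)

# Mean of Low20 and High20 for each row (TotalInches is not included)
xxxz$Average20 <- rowMeans(xxxz[, c("Low20", "High20")])

# With only two columns, vectorized arithmetic gives the same values
alt <- (xxxz$Low20 + xxxz$High20) / 2
stopifnot(isTRUE(all.equal(unname(xxxz$Average20), alt)))

head(xxxz, 3)
```

Either form is vectorized, so no loop or grouping function is needed.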
I would probably do this with a dplyr mutate(), but that is outside the
scope here.
This does not help explain your error, so let me look at what you are trying
to do.
What did you expect by() to do with the second argument? You seem to be
passing the entries of the first column (TotalInches) as INDICES. What is
that for?
by(xxxz[,c("Low20","High20")],
xxxz[,"TotalInches"],
mean)
The documentation suggests this is for splitting a data frame by factors. I
do not see multiple instances of any TotalInches value, so why is it needed
for some kind of grouping?
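My guess is that you are using the wrong function, or using it the wrong
way, for your needs. The warnings follow from that: by() hands each group
to mean() as a small data.frame, and mean() does not accept a data.frame,
so it returns NA with a warning.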
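A small sketch of where the warnings come from. For a data.frame, by() passes each group to FUN as a data frame (the warning text itself shows mean.default(data[x, , drop = FALSE], ...)), and mean() on a data frame returns NA with a warning. If by() were really wanted, one workaround is to flatten each group to a numeric vector with unlist() before taking the mean; the helper name group_mean is hypothetical, for illustration only. The data is rebuilt compactly, assuming TotalInches = 58:76 as shorthand for the posted values. Because by() orders its results by the sorted grouping values, copying them back into the data frame only lines up with the rows because TotalInches is already sorted and has no duplicates, so each group is a single row.

```r
xxxz <- data.frame(
  TotalInches = 58:76,
  Low20  = c(84, 87, 90, 93, 96, 99, 102, 106, 109, 112,
             116, 119, 122, 126, 129, 133, 137, 141, 144),
  High20 = c(111, 115, 119, 123, 127, 131, 135, 140, 144, 148,
             153, 157, 162, 167, 171, 176, 181, 186, 191)
)

# Each group by() hands to FUN is a data frame; mean() on it gives NA (plus a warning)
one_row <- xxxz[1, c("Low20", "High20")]
bad <- suppressWarnings(mean(one_row))
print(bad)

# Hypothetical helper: flatten the group's values to a numeric vector first
group_mean <- function(d) mean(unlist(d))
res <- by(xxxz[, c("Low20", "High20")], xxxz$TotalInches, group_mean)

# by() orders results by sorted TotalInches; that matches row order only
# because TotalInches is already sorted and has no repeated values
stopifnot(anyDuplicated(xxxz$TotalInches) == 0, !is.unsorted(xxxz$TotalInches))
xxxz$Average20 <- as.numeric(res)
```

With one row per group this reproduces rowMeans(xxxz[, c("Low20", "High20")]), which is the simpler choice; by() only earns its keep when a grouping value actually repeats.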
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Sorkin, John
Sent: Saturday, June 8, 2024 1:38 PM
To: r-help using r-project.org (r-help using r-project.org) <r-help using r-project.org>
Subject: [R] Can't compute row means of two columns of a dataframe.
I have a data frame with three columns, TotalInches, Low20, High20. For each
row of the dataset, I am trying to compute the mean of Low20 and High20.
xxxz <- structure(list(TotalInches = c(58, 59, 60, 61, 62, 63, 64, 65,
    66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76), Low20 = c(84, 87,
    90, 93, 96, 99, 102, 106, 109, 112, 116, 119, 122, 126, 129,
    133, 137, 141, 144), High20 = c(111, 115, 119, 123, 127, 131,
    135, 140, 144, 148, 153, 157, 162, 167, 171, 176, 181, 186, 191
    )), class = "data.frame", row.names = c(NA, -19L))
xxxz
str(xxxz)
xxxz$Average20 <- by(xxxz[,c("Low20","High20")],xxxz[,"TotalInches"],mean)
warnings()
When I run the code above, I don't get the means by row. I get the following
warning messages, one for each row of the dataframe.
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
Can someone tell me what I am doing wrong, and how I can compute the row
means?
Thank you,
John
John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382