[R] by gives no results, gives warning that data are non-numeric, but the data appears to be numeric.
John Sorkin
JSorkin at grecc.umaryland.edu
Mon Dec 28 14:03:22 CET 2015
Thank you,
John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>> William Dunlap <wdunlap at tibco.com> 12/28/15 12:55 AM >>>
by(dataFrame, groupId, FUN) applies FUN to a bunch of data.frames (row subsets of the dataFrame input). mean() has no method for data.frames, so it returns NA with a warning. You could use
FUN=colMeans if you wanted column means or FUN=function(x)mean(colMeans(x))
or FUN=function(x)mean(unlist(x)) if you wanted some version of a grand mean
over all the columns.
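A minimal sketch of the three options above, using the built-in mtcars data with cyl standing in for the grouping variable. The object names (grp, cols, res_bad, res_cols, res_grand) are only illustrative and are not from the original post.

```r
# mtcars ships with R; cyl plays the role of the grouping factor here
grp  <- factor(mtcars$cyl)
cols <- mtcars[, c("disp", "hp", "drat")]

# mean() is handed a whole data.frame per group, so mean.default()
# warns "argument is not numeric or logical" and returns NA
res_bad <- suppressWarnings(by(cols, grp, mean))

# colMeans() accepts a data.frame and returns one mean per column
res_cols <- by(cols, grp, colMeans)

# one pooled "grand mean" per group across all selected columns
res_grand <- by(cols, grp, function(x) mean(unlist(x)))

print(res_cols)
print(res_grand)
```

Note that the grand mean pools columns measured on different scales, so it is only meaningful when the columns are comparable.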
If you want column means, you may find aggregate() more suited to the job, as it
applies FUN to each column in each row subset of the data and returns a data.frame
instead of a list of outputs of FUN.
> aggregate(mtcars[,3:5], mtcars[,2,drop=FALSE], mean)
  cyl     disp        hp     drat
1   4 105.1364  82.63636 4.070909
2   6 183.3143 122.28571 3.585714
3   8 353.1000 209.21429 3.229286
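The same call pattern could be applied to data shaped like the hold/Arm objects in the quoted message. This is only a sketch: it rebuilds the first six rows of the listing below by hand rather than using the full data set, and the object name res is illustrative.

```r
# First six rows of the poster's listing, re-entered by hand (subset only)
hold <- data.frame(
  Wtscr = c(120.2, 104.6, 97.2, 103.9, 58.2, 130.9),
  Wt0   = c(117.2, 105.3, 96.2, 106.1, 56.7, 127.4),
  Wt6   = c(114.6, 104.8, 93.8, 101.7, 55.5, 127.6)
)
Arm <- factor(c("PUFA", "PUFA", "MUFA", "MUFA", "MUFA", "MUFA"))

# aggregate() applies mean to each column within each level of Arm
# and returns a data.frame with one row per group
res <- aggregate(hold, list(Arm = Arm), mean)
print(res)
```

Naming the list element (Arm = Arm) gives the grouping column a readable name in the result instead of the default Group.1.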
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Sun, Dec 27, 2015 at 6:55 PM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
When I run by, I get an error message and no results. Any help in understanding what is wrong would be appreciated.
Error message:
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
[1] NA
[1] NA
I don't understand why I am getting the error message, and why I am not getting any results. I don't believe my data are non-numeric.
by() with str works fine and confirms that the data are numeric:
> by(hold,Arm,str)
'data.frame': 23 obs. of 3 variables:
$ Wtscr: num 97.2 103.9 58.2 130.9 135 ...
$ Wt0 : num 96.2 106.1 56.7 127.4 133.1 ...
$ Wt6 : num 93.8 101.7 55.5 127.6 130.9 ...
'data.frame': 16 obs. of 3 variables:
$ Wtscr: num 120.2 104.6 100.1 74.8 112.6 ...
$ Wt0 : num 117.2 105.3 99.5 75.7 110.7 ...
$ Wt6 : num 114.6 104.8 84.5 77.7 107.4 ...
Here is a listing of my data:
> hold
Wtscr Wt0 Wt6
1 120.2 117.2 114.60
2 104.6 105.3 104.80
3 97.2 96.2 93.80
4 103.9 106.1 101.70
5 58.2 56.7 55.50
6 130.9 127.4 127.60
7 135.0 133.1 130.90
8 100.1 99.5 84.50
9 130.3 115.3 115.80
10 150.5 148.7 133.40
11 74.8 75.7 77.70
12 112.6 110.7 107.40
13 90.0 91.0 83.40
14 139.1 138.5 126.70
15 99.1 96.4 95.70
16 108.3 107.5 109.30
17 75.1 72.9 72.20
18 97.5 102.1 98.50
19 202.2 90.1 90.60
20 91.7 89.4 93.40
21 102.1 102.2 100.80
22 116.9 118.9 118.00
23 94.6 95.3 90.30
24 122.2 117.0 117.00
25 105.6 103.3 103.60
26 96.9 96.8 98.80
27 102.9 100.3 89.00
28 115.8 118.5 117.30
29 95.7 96.2 95.40
30 88.2 86.9 88.30
31 108.7 108.8 108.80
32 89.2 88.6 81.20
33 86.8 86.5 82.70
34 135.5 130.1 125.40
35 112.5 113.9 111.45
36 111.0 105.3 109.50
37 103.4 100.5 95.50
38 117.6 117.4 101.40
39 116.7 118.5 101.80
The INDEX is clearly a factor:
> Arm
The data and the index have the same length:
> cbind(hold,Arm)
Wtscr Wt0 Wt6 Arm
1 120.2 117.2 114.60 PUFA
2 104.6 105.3 104.80 PUFA
3 97.2 96.2 93.80 MUFA
4 103.9 106.1 101.70 MUFA
5 58.2 56.7 55.50 MUFA
6 130.9 127.4 127.60 MUFA
7 135.0 133.1 130.90 MUFA
8 100.1 99.5 84.50 PUFA
9 130.3 115.3 115.80 MUFA
10 150.5 148.7 133.40 MUFA
11 74.8 75.7 77.70 PUFA
12 112.6 110.7 107.40 PUFA
13 90.0 91.0 83.40 PUFA
14 139.1 138.5 126.70 MUFA
15 99.1 96.4 95.70 MUFA
16 108.3 107.5 109.30 PUFA
17 75.1 72.9 72.20 PUFA
18 97.5 102.1 98.50 PUFA
19 202.2 90.1 90.60 MUFA
20 91.7 89.4 93.40 MUFA
21 102.1 102.2 100.80 MUFA
22 116.9 118.9 118.00 MUFA
23 94.6 95.3 90.30 MUFA
24 122.2 117.0 117.00 PUFA
25 105.6 103.3 103.60 MUFA
26 96.9 96.8 98.80 MUFA
27 102.9 100.3 89.00 PUFA
28 115.8 118.5 117.30 MUFA
29 95.7 96.2 95.40 PUFA
30 88.2 86.9 88.30 MUFA
31 108.7 108.8 108.80 PUFA
32 89.2 88.6 81.20 MUFA
33 86.8 86.5 82.70 MUFA
34 135.5 130.1 125.40 MUFA
35 112.5 113.9 111.45 MUFA
36 111.0 105.3 109.50 MUFA
37 103.4 100.5 95.50 PUFA
38 117.6 117.4 101.40 PUFA
39 116.7 118.5 101.80 PUFA
But the by function does not work!
> by(hold,Arm,mean,na.rm=TRUE)
[1] NA
[1] NA
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
Perhaps this is a hint: print does not give two separate groups:
> by(hold,Arm,print)
Wtscr Wt0 Wt6
3 97.2 96.2 93.80
4 103.9 106.1 101.70
5 58.2 56.7 55.50
6 130.9 127.4 127.60
7 135.0 133.1 130.90
9 130.3 115.3 115.80
10 150.5 148.7 133.40
14 139.1 138.5 126.70
15 99.1 96.4 95.70
19 202.2 90.1 90.60
20 91.7 89.4 93.40
21 102.1 102.2 100.80
22 116.9 118.9 118.00
23 94.6 95.3 90.30
25 105.6 103.3 103.60
26 96.9 96.8 98.80
28 115.8 118.5 117.30
30 88.2 86.9 88.30
32 89.2 88.6 81.20
33 86.8 86.5 82.70
34 135.5 130.1 125.40
35 112.5 113.9 111.45
36 111.0 105.3 109.50
Wtscr Wt0 Wt6
1 120.2 117.2 114.6
2 104.6 105.3 104.8
8 100.1 99.5 84.5
11 74.8 75.7 77.7
12 112.6 110.7 107.4
13 90.0 91.0 83.4
16 108.3 107.5 109.3
17 75.1 72.9 72.2
18 97.5 102.1 98.5
24 122.2 117.0 117.0
27 102.9 100.3 89.0
29 95.7 96.2 95.4
31 108.7 108.8 108.8
37 103.4 100.5 95.5
38 117.6 117.4 101.4
39 116.7 118.5 101.8
Wtscr Wt0 Wt6
3 97.2 96.2 93.80
4 103.9 106.1 101.70
5 58.2 56.7 55.50
6 130.9 127.4 127.60
7 135.0 133.1 130.90
9 130.3 115.3 115.80
10 150.5 148.7 133.40
14 139.1 138.5 126.70
15 99.1 96.4 95.70
19 202.2 90.1 90.60
20 91.7 89.4 93.40
21 102.1 102.2 100.80
22 116.9 118.9 118.00
23 94.6 95.3 90.30
25 105.6 103.3 103.60
26 96.9 96.8 98.80
28 115.8 118.5 117.30
30 88.2 86.9 88.30
32 89.2 88.6 81.20
33 86.8 86.5 82.70
34 135.5 130.1 125.40
35 112.5 113.9 111.45
36 111.0 105.3 109.50
Wtscr Wt0 Wt6
1 120.2 117.2 114.6
2 104.6 105.3 104.8
8 100.1 99.5 84.5
11 74.8 75.7 77.7
12 112.6 110.7 107.4
13 90.0 91.0 83.4
16 108.3 107.5 109.3
17 75.1 72.9 72.2
18 97.5 102.1 98.5
24 122.2 117.0 117.0
27 102.9 100.3 89.0
29 95.7 96.2 95.4
31 108.7 108.8 108.8
37 103.4 100.5 95.5
38 117.6 117.4 101.4
39 116.7 118.5 101.8
But summary works as expected, giving two groups of results!
> by(hold,Arm,summary)
Wtscr Wt0 Wt6
Min. : 58.20 Min. : 56.7 Min. : 55.5
1st Qu.: 95.75 1st Qu.: 92.7 1st Qu.: 92.0
Median :105.60 Median :103.3 Median :101.7
Mean :112.75 Mean :106.3 Mean :104.0
3rd Qu.:130.60 3rd Qu.:118.7 3rd Qu.:117.7
Max. :202.20 Max. :148.7 Max. :133.4
Wtscr Wt0 Wt6
Min. : 74.80 Min. : 72.90 Min. : 72.20
1st Qu.: 97.05 1st Qu.: 98.67 1st Qu.: 87.88
Median :104.00 Median :103.70 Median : 99.95
Mean :103.15 Mean :102.54 Mean : 97.58
3rd Qu.:113.62 3rd Qu.:112.28 3rd Qu.:107.75
Max. :122.20 Max. :118.50 Max. :117.00
by() with is.na also shows that there are no NAs in the data, and by() works properly here:
> by(hold,Arm,is.na)
Confidentiality Statement:
This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.