[R] Bug in by() function which works for some FUN argument and does not work for others
peter dalgaard
pdalgd at gmail.com
Fri Apr 15 11:02:55 CEST 2016
Books don't rewrite themselves retroactively....
NEWS for 3.0.0 has
• mean() for data frames and sd() for data frames and matrices are
defunct.
and 3.0.0 was released April 3, 2013.
A book published in 2012 would likely be based on R 2.13.x or maybe even 2.12.x.
So mean(dataframe) worked in the past. It was changed because of inconsistencies, e.g. mean(as.matrix(dataframe)) is a single number, median.data.frame never existed, var(dataframe) differed from sd(dataframe)^2, etc. The deprecation/defunct process started with 2.14.0-pre in October 2011.
-pd
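A short R sketch of the inconsistencies mentioned above. It uses the built-in mtcars data as a stand-in, and the object names are illustrative only. Treating a data frame as a matrix yields a single grand mean. var() on a data frame returns a covariance matrix, and only its diagonal matches the squared per-column sd(). median() has to be applied column by column because there is no data-frame method for it.

```r
# Stand-in data: the built-in mtcars data frame (all columns numeric).
df <- mtcars

# Treating the data frame as a matrix gives ONE grand mean over all cells...
grand <- mean(as.matrix(df))
length(grand)    # 1

# ...whereas column-wise means give one value per variable.
per_col <- colMeans(df)
length(per_col)  # 11, one per column of mtcars

# var() on a data frame returns the full covariance matrix; only its
# diagonal corresponds to the squared per-column standard deviations.
v <- var(df)
dim(v)           # 11 11
all.equal(unname(diag(v)), unname(sapply(df, sd)^2))  # TRUE

# There is no data-frame method for median(), so apply it column by column.
med <- sapply(df, median)
med
```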
On 15 Apr 2016, at 10:16 , Akhilesh Singh <akhileshsingh.igkv at gmail.com> wrote:
> Dear All,
>
> Thanks for your help. However, I would like to draw your attention to the
> following:
>
> Actually, I was replicating Example 2.3, using the dataset
> "brainsize.txt" given in Section 2.3.3 ("Summarize by group") on page 55
> of the well-known book "R by Example" by Jim Albert and Maria Rizzo,
> published by Springer (2012) in the Use R! series. The output of the by()
> function printed in the book is reproduced below for reference:
>
>> by(data=brain[, -1], INDICES=brain$Gender, FUN=mean, na.rm=TRUE)
> brain$Gender: Female
> FSIQ VIQ PIQ Weight Height MRI_Count
> 111.900 109.450 110.450 137.200 65.765 862654.600
> ------------------------------------------------------------
> brain$Gender: Male
> FSIQ VIQ PIQ Weight Height MRI_Count
> 115.00000 115.25000 111.60000 166.44444 71.43158 954855.40000
>
>
> I do not know how the authors of the book could have produced the above
> results with the by() function. When I could not reproduce them, I
> thought this might be due to missing values (NAs) in the Weight and
> Height variables. I then tried the same code on the "mtcars" dataset with
> INDICES=mtcars$am. When I got the same NA results there too, I reported
> the case to r-help at R-project.org.
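A minimal sketch of how the book's call might be adapted for R >= 3.0.0, assuming the fix is simply to swap FUN=mean for FUN=colMeans. Because brainsize.txt is not bundled with R, this uses mtcars with a few values set to NA by hand. The object name cars_na and the NA positions are hypothetical, and the data are grouped by am instead of Gender. As in the book, the grouping column is dropped from the data, and na.rm=TRUE is forwarded through by()'s ... argument to colMeans().

```r
# Hypothetical stand-in for the book's brainsize data: mtcars with a few
# values set to NA, grouped by transmission type (am) instead of Gender.
cars_na <- mtcars
cars_na$wt[c(1, 5)] <- NA
cars_na$hp[3] <- NA

# Same shape as the book's call (grouping column 9, "am", dropped), but with
# FUN = colMeans, which accepts the whole per-group data frame;
# na.rm = TRUE is passed on to colMeans() via by()'s `...`.
res <- by(data = cars_na[, -9], INDICES = cars_na$am,
          FUN = colMeans, na.rm = TRUE)
res
```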
>
> With best regards,
>
> Dr. A.K. Singh
> Head, Department of Agril. Statistics
> Indira Gandhi Krishi Vishwavidyalaya, Raipur
> Chhattisgarh, India, PIN-492012
> Mobile: +919752620740
> Email: akhileshsingh.igkv at gmail.com
>
> On Fri, Apr 15, 2016 at 3:06 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
>
>> I think you are not using the best function for what you intend.
>> Try:
>>
>>> by(data=mtcars, INDICES=list(as.factor(mtcars$am)), FUN=colMeans)
>> : 0
>>         mpg         cyl        disp          hp        drat          wt
>>  17.1473684   6.9473684 290.3789474 160.2631579   3.2863158   3.7688947
>>        qsec          vs          am        gear        carb
>>  18.1831579   0.3684211   0.0000000   3.2105263   2.7368421
>>
>> ---------------------------------------------------------------------------
>> : 1
>>         mpg         cyl        disp          hp        drat          wt
>>  24.3923077   5.0769231 143.5307692 126.8461538   4.0500000   2.4110000
>>        qsec          vs          am        gear        carb
>>  17.3600000   0.5384615   1.0000000   4.3846154   2.9230769
>>
>> See the difference between colMeans() and mean() in their respective help
>> files.
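The difference Adrian points to can be seen on a single data frame. A minimal sketch, assuming a current R version (>= 3.0.0) and the built-in mtcars data:

```r
# mean() on a whole data frame: there is no data-frame method, so
# mean.default() warns "argument is not numeric or logical" and returns NA.
m <- suppressWarnings(mean(mtcars))
m                  # NA

# colMeans() accepts a numeric data frame and returns one mean per column;
# applying mean() to each column with sapply() gives the same numbers.
cm <- colMeans(mtcars)
all.equal(cm, sapply(mtcars, mean))  # TRUE
```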
>> Hth,
>> Adrian
>>
>> On Thu, Apr 14, 2016 at 11:14 PM, Akhilesh Singh <
>> akhileshsingh.igkv at gmail.com> wrote:
>>
>>> Dear Sirs,
>>>
>>> I am a Professor at Indira Gandhi Krishi Vishwavidyalaya, Raipur,
>>> Chhattisgarh, India.
>>>
>>> While teaching classes, I found that the by() function fails when I use
>>> FUN=mean or median and some other functions; however, FUN=summary works.
>>>
>>> Below is the output for an example using the built-in dataset "mtcars",
>>> along with the warning messages:
>>>
>>>> by(data=mtcars, INDICES=list(mtcars$am), FUN=mean)
>>> : 0
>>> [1] NA
>>> ------------------------------------------------------------
>>> : 1
>>> [1] NA
>>> Warning messages:
>>> 1: In mean.default(data[x, , drop = FALSE], ...) :
>>> argument is not numeric or logical: returning NA
>>> 2: In mean.default(data[x, , drop = FALSE], ...) :
>>> argument is not numeric or logical: returning NA
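The warning text shows what by() actually passes to FUN: a sub-data-frame, data[x, , drop = FALSE]. The sketch below reproduces that grouping by hand with split() and lapply(). This is a rough equivalent, not by()'s actual source. As a different tool, it also uses aggregate(), which applies FUN to each column within each group. The object names are illustrative.

```r
# Split mtcars by transmission type (am), roughly what by() does per group.
groups <- split(mtcars, mtcars$am)
sapply(groups, class)  # each piece is a "data.frame"

# FUN therefore has to accept a whole data frame. colMeans() does:
manual <- lapply(groups, colMeans)
via_by <- by(mtcars, mtcars$am, colMeans)

# A different tool: aggregate() applies FUN to every column within each group
# and returns a data frame with one row per level of am.
agg <- aggregate(. ~ am, data = mtcars, FUN = mean)
agg
```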
>>>
>>> However, the same by() call works with FUN=summary; the output is given
>>> below:
>>>
>>>> by(data=mtcars, INDICES=list(mtcars$am), FUN=summary)
>>> : 0
>>> mpg cyl disp hp
>>> Min. :10.40 Min. :4.000 Min. :120.1 Min. : 62.0
>>> 1st Qu.:14.95 1st Qu.:6.000 1st Qu.:196.3 1st Qu.:116.5
>>> Median :17.30 Median :8.000 Median :275.8 Median :175.0
>>> Mean :17.15 Mean :6.947 Mean :290.4 Mean :160.3
>>> 3rd Qu.:19.20 3rd Qu.:8.000 3rd Qu.:360.0 3rd Qu.:192.5
>>> Max. :24.40 Max. :8.000 Max. :472.0 Max. :245.0
>>> drat wt qsec vs am
>>> Min. :2.760 Min. :2.465 Min. :15.41 Min. :0.0000 Min. :0
>>> 1st Qu.:3.070 1st Qu.:3.438 1st Qu.:17.18 1st Qu.:0.0000 1st Qu.:0
>>> Median :3.150 Median :3.520 Median :17.82 Median :0.0000 Median :0
>>> Mean :3.286 Mean :3.769 Mean :18.18 Mean :0.3684 Mean :0
>>> 3rd Qu.:3.695 3rd Qu.:3.842 3rd Qu.:19.17 3rd Qu.:1.0000 3rd Qu.:0
>>> Max. :3.920 Max. :5.424 Max. :22.90 Max. :1.0000 Max. :0
>>> gear carb
>>> Min. :3.000 Min. :1.000
>>> 1st Qu.:3.000 1st Qu.:2.000
>>> Median :3.000 Median :3.000
>>> Mean :3.211 Mean :2.737
>>> 3rd Qu.:3.000 3rd Qu.:4.000
>>> Max. :4.000 Max. :4.000
>>> ------------------------------------------------------------
>>> : 1
>>> mpg cyl disp hp drat
>>> Min. :15.00 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :3.54
>>> 1st Qu.:21.00 1st Qu.:4.000 1st Qu.: 79.0 1st Qu.: 66.0 1st Qu.:3.85
>>> Median :22.80 Median :4.000 Median :120.3 Median :109.0 Median :4.08
>>> Mean :24.39 Mean :5.077 Mean :143.5 Mean :126.8 Mean :4.05
>>> 3rd Qu.:30.40 3rd Qu.:6.000 3rd Qu.:160.0 3rd Qu.:113.0 3rd Qu.:4.22
>>> Max. :33.90 Max. :8.000 Max. :351.0 Max. :335.0 Max. :4.93
>>> wt qsec vs am gear
>>> Min. :1.513 Min. :14.50 Min. :0.0000 Min. :1 Min. :4.000
>>> 1st Qu.:1.935 1st Qu.:16.46 1st Qu.:0.0000 1st Qu.:1 1st Qu.:4.000
>>> Median :2.320 Median :17.02 Median :1.0000 Median :1 Median :4.000
>>> Mean :2.411 Mean :17.36 Mean :0.5385 Mean :1 Mean :4.385
>>> 3rd Qu.:2.780 3rd Qu.:18.61 3rd Qu.:1.0000 3rd Qu.:1 3rd Qu.:5.000
>>> Max. :3.570 Max. :19.90 Max. :1.0000 Max. :1 Max. :5.000
>>> carb
>>> Min. :1.000
>>> 1st Qu.:1.000
>>> Median :2.000
>>> Mean :2.923
>>> 3rd Qu.:4.000
>>> Max. :8.000
>>>>
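Why summary works while mean does not: summary() is an S3 generic with a data-frame method (summary.data.frame), so it accepts each sub-data-frame that by() hands it. The warnings above show mean() falling through to mean.default() instead. A small check, assuming a current R installation and using mtcars as the example data:

```r
# summary() has a registered method for data frames; the warnings in the
# report show that mean() is dispatched to mean.default() instead.
has_df_method <- !is.null(getS3method("summary", "data.frame", optional = TRUE))
has_df_method    # TRUE

# Each per-group result of by(..., FUN = summary) is the "table" object
# that summary.data.frame() builds, one column of statistics per variable.
s <- by(mtcars, mtcars$am, summary)
class(s[[1]])    # "table"
```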
>>>
>>> I am using the latest version, R 3.2.4 on Windows; however, the same
>>> problem also occurs in previous versions.
>>>
>>> I hope this report will receive serious attention for debugging.
>>>
>>> With best regards,
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Adrian Dusa
>> University of Bucharest
>> Romanian Social Data Archive
>> Soseaua Panduri nr.90
>> 050663 Bucharest sector 5
>> Romania
>>
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com