[R] dplyr's arrange function
Daniel Nordlund
djnordlund at gmail.com
Thu Jun 16 00:37:27 CEST 2016
On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:
> Hello,
>
> I am using dplyr's arrange() function to sort one of many data frames on a character variable (named "prevalence").
>
> Issue: I am not getting the desired output (line 7 is the problem, which should be the very last line in the sorted data frame) because the sorted field is character, not numeric.
>
> The reproducible example and the output are appended below.
>
> Is there any work-around to convert/treat this character variable (named "prevalence" in the data frame below) as numeric before using the arrange() function within the dplyr package?
>
> Any hints will be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
> # Reproducible Example
>
> library("readr")
> library("dplyr")
> testdata <- read_csv(
> "indicator, prevalence
> 1. Health check-up, 77.2 (1.19)
> 2. Blood cholesterol checked, 84.5 (1.14)
> 3. Recieved flu vaccine, 50.0 (1.33)
> 4. Blood pressure checked, 88.7 (0.88)
> 5. Aspirin use-problems, 11.7 (1.02)
> 6.Colonoscopy, 60.2 (1.41)
> 7. Sigmoidoscopy, 6.1 (0.61)
> 8. Blood stool test, 14.6 (1.00)
> 9.Mammogram, 72.6 (1.82)
> 10. Pap Smear test, 73.3 (2.37)")
>
> # Sort on the character variable in descending order
> arrange(testdata, desc(prevalence))
>
> # Results from Console
>
> indicator prevalence
> (chr) (chr)
> 1 4. Blood pressure checked 88.7 (0.88)
> 2 2. Blood cholesterol checked 84.5 (1.14)
> 3 1. Health check-up 77.2 (1.19)
> 4 10. Pap Smear test 73.3 (2.37)
> 5 9.Mammogram 72.6 (1.82)
> 6 6.Colonoscopy 60.2 (1.41)
> 7 7. Sigmoidoscopy 6.1 (0.61)
> 8 3. Recieved flu vaccine 50.0 (1.33)
> 9 8. Blood stool test 14.6 (1.00)
> 10 5. Aspirin use-problems 11.7 (1.02)
>
>
> Pradip K. Muhuri, AHRQ/CFACT
> 5600 Fishers Lane # 7N142A, Rockville, MD 20857
> Tel: 301-427-1564
>
>
>
The problem is that you are sorting a character variable.
> testdata$prevalence
[1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
[6] "60.2 (1.41)" "6.1 (0.61)" "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"
>
Notice that the 7th element is "6.1 (0.61)". Character strings are
compared one character at a time from the left, and the first CHARACTER
here is a "6", so in descending order it sorts BEFORE "50.0 (1.33)".
For the character value on line 7 to sort last, it would need to be
"06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).
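As for treating the variable as numeric before calling arrange(), one
possible sketch is below. It assumes every value has the form
"estimate (standard error)", so the text before the parenthesis can be
converted with as.numeric(). The column name prev_num and the shortened
data frame are only for illustration; they are not in your example.

```r
library(dplyr)

# A shortened stand-in for the original data (4 of the 10 rows)
testdata <- data.frame(
  indicator  = c("1. Health check-up", "3. Recieved flu vaccine",
                 "4. Blood pressure checked", "7. Sigmoidoscopy"),
  prevalence = c("77.2 (1.19)", "50.0 (1.33)", "88.7 (0.88)", "6.1 (0.61)"),
  stringsAsFactors = FALSE
)

# Drop everything from the opening parenthesis onward, convert the
# remainder to numeric, and sort on that helper column (prev_num is a
# hypothetical name) instead of on the character column.
sorted <- testdata %>%
  mutate(prev_num = as.numeric(sub("[[:space:]]*\\(.*$", "",
                                   trimws(prevalence)))) %>%
  arrange(desc(prev_num))

print(sorted)
```

The helper column can be dropped afterwards with select(-prev_num) if
only the original character column should appear in the output.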
Hope this is helpful,
Dan
Daniel Nordlund
Port Townsend, WA USA