[R] svykappa using the survey package
Muhuri, Pradip (AHRQ/CFACT)
Pradip.Muhuri at ahrq.hhs.gov
Tue Jun 21 01:49:55 CEST 2016
Hello,
My goal is to calculate the weighted kappa measure of agreement between two factors using the R survey package. I am getting the following error message (the console is appended below; sorry no data provided).
> # calculate survey Kappa
> svykappa(~xbpchek53+xcholck53, design)
Error in names(probs) <- nms :
'names' attribute [15] must be the same length as the vector [8]
I have followed the following major steps:
1) Used the "haven" package to read the sas data set into R.
2) Used the dplyr mutate() to create 2 new variables and converted to factors [required for the svykappa()?].
3) Created an object (named design) using the survey design variables and the data file.
4) Used the svykappa() to compute the kappa measure of agreement.
I will appreciate if someone could give me hints on how to resolve the issue.
Thanks,
Pradip Muhuri
############### The detailed console is appended below ####################
> setwd ("U:/A_PSAQ")
> library(haven)
> library(dplyr)
> library(survey)
> library(srvyr)
> library(Hmisc)
> my_hc2013_data <- read_sas("pc2013.sas7bdat")
>
> # Function to convert var names in upper cases to var names in lower cases
> lower <- function (df) {
+ names(df) <- tolower(names(df))
+ df
+ }
> my_hc2013_data <- lower(my_hc2013_data)
>
> # Check the contents - Hmisc package (as above) required
> # contents(my_hc2013_data)
>
> # create two new variables
> my_hc2013_data <- mutate(my_hc2013_data,
+ xbpchek53 = ifelse(bpchek53 ==1, 1,
+ ifelse(bpchek53 %in% 2:6, 2,NA)),
+ xcholck53 = ifelse(cholck53 ==1, 1,
+ ifelse(cholck53 %in% 2:6, 2,NA)))
>
> # convert the numeric variables to factors for the kappa measure
> my_hc2013_data$xbpchek53 <- as.factor(my_hc2013_data$xbpchek53)
> my_hc2013_data$xcholck53 <- as.factor(my_hc2013_data$xcholck53)
>
> # check whether the variables are factors
> is.factor(my_hc2013_data$xbpchek53)
[1] TRUE
> is.factor(my_hc2013_data$xcholck53)
[1] TRUE
>
>
> # check the data from the cross table
> addmargins(with(my_hc2013_data, table(bpchek53,xbpchek53 )))
xbpchek53
bpchek53 1 2 Sum
-9 0 0 0
-8 0 0 0
-7 0 0 0
-1 0 0 0
1 19778 0 19778
2 0 2652 2652
3 0 1014 1014
4 0 538 538
5 0 737 737
6 0 623 623
Sum 19778 5564 25342
> addmargins(with(my_hc2013_data, table(cholck53,xcholck53 )))
xcholck53
cholck53 1 2 Sum
-9 0 0 0
-8 0 0 0
-7 0 0 0
-1 0 0 0
1 14850 0 14850
2 0 3153 3153
3 0 1170 1170
4 0 696 696
5 0 909 909
6 0 3764 3764
Sum 14850 9692 24542
> addmargins(with(my_hc2013_data, table(xbpchek53,xcholck53 )))
xcholck53
xbpchek53 1 2 Sum
1 14667 4379 19046
2 163 5225 5388
Sum 14830 9604 24434
>
> # create an object with design variables and data
> design<-svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=my_hc2013_data, nest=TRUE)
>
> # calculate survey Kappa
> svykappa(~xbpchek53+xcholck53, design)
Error in names(probs) <- nms :
'names' attribute [15] must be the same length as the vector [8]
#################################################################
Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Muhuri, Pradip (AHRQ/CFACT)
Sent: Thursday, June 16, 2016 2:06 PM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question
Hello David,
Your revisions to the earlier code have given me desired results.
library("gtools")
mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c("indicator", "prevalence_c") ]
Thanks,
Pradip
Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Thursday, June 16, 2016 12:54 PM
To: Muhuri, Pradip (AHRQ/CFACT)
Cc: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New Question
> On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) <Pradip.Muhuri at ahrq.hhs.gov> wrote:
>
> Hello,
>
> I got 3 solutions to my earlier code. Thanks to the contributors. May I bring your attention to a new question below (with respect to David's solution)?
>
> 1) Thanks to Daniel Nordlund for the tips - replacing leading space with a 0 in the data.
>
> 2) Thanks to David Winsemius for his solution with the gtools::mixedorder function. I have added an argument to his.
>
> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), ]
>
> 3) Thanks to Jim Lemon's for his solution. I have prepended a minus sign to reverse the order.
>
> numprev<-as.numeric(sapply(strsplit(trimws(mydata$prevalence_c),"
> "),"[",1)) mydata[order(-numprev), ]
>
>
> (New)Question for solution 2:
>
> I want to keep only 2 variables (say, indicator and prevalence_c) in the output. Where to insert the additional code? Why does the following code fail?
>
>> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE),
>> c(mydata$indicator, mydata$prevalence_c) ]
>
Try instead just a vector of names for the second argument to "["
mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE),
c("indicator", "prevalence_c") ]
> Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = TRUE), :
> undefined columns selected
>
> ********************
>> str(mydata)
> Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 10 variables:
> $ indicator : chr "1. Health check-up" "2. Blood cholesterol checked " "3. Recieved flu vaccine" "4. Blood pressure checked" ...
> $ subgroup : chr "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ ...
> $ n : num 2117 2127 2124 2135 1027 ...
> $ prevalence_c: chr "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" ...
> $ prevalence_p: chr "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" ...
> $ sensitivity : chr "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" ...
> $ specificity : chr "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" ...
> $ ppv : chr "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" ...
> $ npv : chr "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" ...
> $ kappa : chr "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 (0.035)" ...
>
> Pradip K. Muhuri, AHRQ/CFACT
> 5600 Fishers Lane # 7N142A, Rockville, MD 20857
> Tel: 301-427-1564
>
>
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel
> Nordlund
> Sent: Wednesday, June 15, 2016 6:37 PM
> To: r-help at r-project.org
> Subject: Re: [R] dplyr's arrange function
>
> On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:
>> Hello,
>>
>> I am using the dplyr's arrange() function to sort one of the many data frames on a character variable (named "prevalence").
>>
>> Issue: I am not getting the desired output (line 7 is the problem, which should be the very last line in the sorted data frame) because the sorted field is character, not numeric.
>>
>> The reproducible example and the output are appended below.
>>
>> Is there any work-around to convert/treat this character variable (named "prevalence" in the data frame below) as numeric before using the arrange() function within the dplyr package?
>>
>> Any hints will be appreciated.
>>
>> Thanks,
>>
>> Pradip Muhuri
>>
>> # Reproducible Example
>>
>> library("readr")
>> testdata <- read_csv(
>> "indicator, prevalence
>> 1. Health check-up, 77.2 (1.19)
>> 2. Blood cholesterol checked, 84.5 (1.14) 3. Recieved flu vaccine,
>> 50.0 (1.33) 4. Blood pressure checked, 88.7 (0.88) 5. Aspirin
>> use-problems, 11.7 (1.02) 6.Colonoscopy, 60.2 (1.41) 7.
>> Sigmoidoscopy,
>> 6.1 (0.61) 8. Blood stool test, 14.6 (1.00) 9.Mammogram, 72.6 (1.82)
>> 10. Pap Smear test, 73.3 (2.37)")
>>
>> # Sort on the character variable in descending order
>> arrange(testdata,
>> desc(prevalence))
>>
>> # Results from Console
>>
>> indicator prevalence
>> (chr) (chr)
>> 1 4. Blood pressure checked 88.7 (0.88)
>> 2 2. Blood cholesterol checked 84.5 (1.14)
>> 3 1. Health check-up 77.2 (1.19)
>> 4 10. Pap Smear test 73.3 (2.37)
>> 5 9.Mammogram 72.6 (1.82)
>> 6 6.Colonoscopy 60.2 (1.41)
>> 7 7. Sigmoidoscopy 6.1 (0.61)
>> 8 3. Recieved flu vaccine 50.0 (1.33)
>> 9 8. Blood stool test 14.6 (1.00)
>> 10 5. Aspirin use-problems 11.7 (1.02)
>>
>>
>> Pradip K. Muhuri, AHRQ/CFACT
>> 5600 Fishers Lane # 7N142A, Rockville, MD 20857
>> Tel: 301-427-1564
>>
>>
>>
>
> The problem is that you are sorting a character variable.
>
>> testdata$prevalence
> [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
> [6] "60.2 (1.41)" "6.1 (0.61)" "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"
>>
>
> Notice that the 7th element is "6.1 (0.61)". The first CHARACTER is a "6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending order). If you want the character value of line 7 to sort last, it would need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Port Townsend, WA USA
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list