[R] simple generation of artificial data with defined features
drflxms
drflxms at googlemail.com
Sun Aug 24 13:01:27 CEST 2008
Hello all,
besides thanking you again for your help, I'd like to present the
final solution to my problem and the results of the kappa calculation:
> election.2005 <- c(16194,13136,3494,3838,4648,4118)
#data obtained via the GENESIS database of the "Statistisches Bundesamt",
#www.destatis.de
#simply cut off the last 3 digits because of the limited computing power of my laptop
> attr(election.2005, "class") <- "table"
> attr(election.2005, "dim") <- c(1,6)
> attr(election.2005, "dimnames") <- list(c("votes"), c(1,2,3,4,5,6))
#used numbers instead of names of parties for easier handling later on
#1=spd,2=cdu,3=csu,4=gruene,5=fdp,6=pds
> head(election.2005)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 16194 13136 3494 3838 4648 4118
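The next step uses el.dt, whose creation is not shown in this message. Its columns (Var1, Var2, Freq) match what as.data.frame() returns for a table, so it was presumably built that way. A minimal sketch under that assumption, with the table rebuilt via as.table() for brevity:

```r
# Assumption: the same 1 x 6 table as above, rebuilt in a single call
election.2005 <- as.table(matrix(
  c(16194, 13136, 3494, 3838, 4648, 4118),
  nrow = 1,
  dimnames = list("votes", as.character(1:6))
))

# Assumption: el.dt was created like this (not shown in the original message)
el.dt <- as.data.frame(election.2005)

names(el.dt)     # column names: Var1, Var2, Freq
sum(el.dt$Freq)  # total number of (scaled) votes, i.e. rows after replication
```

as.data.frame() on a table returns one row per cell with the count in Freq, which is the layout the row replication below relies on.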
#replicate rows according to frequency-table:
> el.dt.exp <- el.dt[rep(1:nrow(el.dt), el.dt$Freq), -ncol(el.dt)]
> el.dt.exp$id <- seq_len(nrow(el.dt.exp)) #add voter id
> el.dt.exp$year <- 2005 #add column with year of election
# remove a column we don't need:
> el.dt.exp<-subset(el.dt.exp, select=-c(Var1))
> dim(el.dt.exp)
[1] 45428 3
> head(el.dt.exp)
    Var2 id year
1      1  1 2005
1.1    1  2 2005
1.2    1  3 2005
1.3    1  4 2005
1.4    1  5 2005
1.5    1  6 2005
> el.dt.exp<-as.data.frame(el.dt.exp, row.names=seq(1:nrow(el.dt.exp)))
# get rid of the unusual numbering of rows
> head(el.dt.exp)
  Var2 id year
1    1  1 2005
2    1  2 2005
3    1  3 2005
4    1  4 2005
5    1  5 2005
6    1  6 2005
> summary(el.dt.exp)
 Var2            id             year
 1:16194   Min.   :    1   Min.   :2005
 2:13136   1st Qu.:11358   1st Qu.:2005
 3: 3494   Median :22715   Median :2005
 4: 3838   Mean   :22715   Mean   :2005
 5: 4648   3rd Qu.:34071   3rd Qu.:2005
 6: 4118   Max.   :45428   Max.   :2005
Var2 is a factor (summary() lists counts per level), which is inconvenient
for further processing. I changed its type to numeric in the data editor,
using fix(el.dt.exp).
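The summary() output above lists counts per level, which is how R prints a factor. A sketch of converting such a column to numbers without the data editor, using a small made-up vector (party is a hypothetical name, not from the session above):

```r
# Hypothetical stand-in for the Var2 column: a factor with numeric labels
party <- factor(c("1", "2", "2", "6"))

# Go through character to recover the labels themselves
labels.num <- as.numeric(as.character(party))

# Pitfall: as.numeric() directly on a factor returns the internal level codes
codes <- as.numeric(party)

labels.num  # 1 2 2 6
codes       # 1 2 2 3
```

Applied to the data frame, this would presumably read el.dt.exp$Var2 <- as.numeric(as.character(el.dt.exp$Var2)). Here the labels 1 to 6 happen to coincide with the level codes, but going through as.character() is the safer habit.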
#create the dataframe for the calculation of kappa
> library(reshape)
> el.dt.exp.molten<-melt(el.dt.exp, id=c(2,3), na.rm=FALSE)
> kappa.frame<-cast(el.dt.exp.molten, year ~ id)
> dim(kappa.frame)
[1] 1 45429
#calculate kappa
> library(irr)
> kappam.fleiss(kappa.frame, exact=FALSE, detail=TRUE)
 Fleiss' Kappa for m Raters

 Subjects = 1
   Raters = 45428
    Kappa = -2.2e-05

        z = -1.35
  p-value = 0.176

  Kappa      z p.value
1 0.000 -0.707   0.479
2 0.000 -0.707   0.479
3 0.000 -0.707   0.479
4 0.000 -0.707   0.479
5 0.000 -0.707   0.479
6 0.000 -0.707   0.479
What a surprise! So Greg was absolutely right that this is probably not
a good example for Kappa. But it is still a very interesting one, if you ask me!
My theory: Kappa does not simply express agreement. As far as I learned
from the Handbook of Inter-Rater Reliability (Gwet, Kilem 2001; STATAXIS
Publishing Company; www.stataxis.com), Kappa tries to measure how
different an observed agreement is from the agreement that would arise
by chance.
So in this case this probably means that the results of the 2005
election are not significantly different from results that could have
arisen by chance.
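The reported Kappa can be checked by hand. The following is a minimal sketch in base R, assuming the standard Fleiss formula with a single subject (the election) and every voter treated as a rater; the names votes, P.obs and P.exp are illustrative only:

```r
# Vote counts per party, as used above (last three digits cut off)
votes <- c(16194, 13136, 3494, 3838, 4648, 4118)
n <- sum(votes)  # number of "raters"

# Observed agreement for the single subject:
# proportion of rater pairs that chose the same party
P.obs <- (sum(votes^2) - n) / (n * (n - 1))

# Chance agreement from the category proportions
P.exp <- sum((votes / n)^2)

kappa <- (P.obs - P.exp) / (1 - P.exp)
kappa          # about -2.2e-05, as in the kappam.fleiss() output
-1 / (n - 1)   # the same value
```

With only one subject, the category proportions are estimated from that same subject, so the expression reduces algebraically to -1/(n - 1) whatever the vote counts are. If this reading of the formula is right, the near-zero Kappa may reflect the one-subject layout rather than the distribution of votes itself.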
Anyway I personally learned a very interesting lesson about Kappa and R.
Thank you all for your professional and quick help to a newbie!
Greetings from Munich,
Felix