[R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
Allan Kamau
kamauallan at yahoo.com
Thu Jul 26 10:03:17 CEST 2007
Thanks so much Jim, Adaikalavan, Gabor and others for the help and suggestions.
The solution will result in a matrix containing nested matrices, so that each variable name, each of the variable's distinct values, and the count of each distinct value can be accessed individually.
The main matrix will contain the variable names, the first-level nested matrices will hold each variable's unique values, and each such value entry will hold a one-element vector containing its count (occurrence frequency).
This matrix can then be used to compare variable values and their frequencies against other, similar datasets.
Building on the input received so far, a probable solution for building the matrix includes the following steps.
1) I read the CSV file (which contains column headers)
>my_data=read.table("<path/to/my/data.csv>",header=TRUE,sep=",",dec=".",fill=TRUE)
2) I group the values in each variable, producing an occurrence count (frequency) for each distinct value
>x.val<-apply(my_data,2,table)
3) I obtain a vector of the names of the variables in the table
>names(x.val)
4) Now I use the names (obtained in step 3) to obtain a vector of the distinct values in a given variable (in the example below the variable is PR14)
>names(x.val$PR14)
5) I obtain a one-element vector holding the frequency of a value obtained in the step above (in our example the value is "V")
>as.vector(x.val$PR14["V"])
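For reference, steps 2 to 5 can be tried on a small reproducible sample. The sketch below is an illustration, not the exact script: it retypes three columns (PR10, PR11, PR14) from the ten-row sample quoted later in this thread, and it swaps apply() for lapply(), on the assumption that a list is wanted even when every variable happens to have the same number of distinct values (apply() may return a matrix in that case). The names var_names, pr14_values and v_count are made up for the example.

```r
# Ten rows of three columns, retyped from the sample quoted in this thread
my_data <- data.frame(
  PR10 = rep("V", 10),
  PR11 = c("T", "S", "T", "S", "S", "S", "T", "S", "S", "S"),
  PR14 = c("V", "V", "V", "I", "V", "V", "I", "V", "V", "V"),
  stringsAsFactors = FALSE
)

# Step 2: one frequency table per variable (lapply always returns a list)
x.val <- lapply(my_data, table)

# Step 3: the variable names
var_names <- names(x.val)

# Step 4: the distinct values of one variable
pr14_values <- names(x.val$PR14)

# Step 5: the count of one value, as a plain one-element vector
v_count <- as.vector(x.val$PR14["V"])

print(var_names)
print(pr14_values)
print(v_count)
```

Indexing with x.val[["PR14"]][["V"]] should work as well when the variable name is held in a character variable rather than typed after the $ sign.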
Todo:
Now I need to place the steps above in a script (consisting of loops) to build the matrix; steps 4 and 5 seem tricky to do programmatically.
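One possible way to do steps 4 and 5 programmatically is sketched below. It is only a sketch under stated assumptions: freq_long is a hypothetical helper name, and a long data frame (one row per variable/value pair) is swapped in for the nested matrix, on the assumption that it is easier to compare against another dataset later. The two sample columns are retyped from the ten-row sample quoted later in this thread.

```r
# Hypothetical helper: one row per (variable, value) pair, with count and percent
freq_long <- function(data) {
  pieces <- lapply(names(data), function(v) {
    tb <- table(data[[v]])            # frequency table for one variable
    data.frame(
      variable = v,
      value    = names(tb),           # the distinct values (step 4)
      count    = as.vector(tb),       # their counts (step 5)
      percent  = 100 * as.vector(tb) / sum(tb),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)              # stack the per-variable pieces
}

# Two columns retyped from the sample quoted in this thread
my_data <- data.frame(
  PR11 = c("T", "S", "T", "S", "S", "S", "T", "S", "S", "S"),
  PR14 = c("V", "V", "V", "I", "V", "V", "I", "V", "V", "V"),
  stringsAsFactors = FALSE
)

res <- freq_long(my_data)
print(res)

# Look up a single count by variable name and value
print(res$count[res$variable == "PR14" & res$value == "V"])
```

Two such data frames, one per dataset, could then be lined up with merge() on the variable and value columns, although that part is not shown here.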
Allan.
----- Original Message ----
From: jim holtman <jholtman at gmail.com>
To: Allan Kamau <kamauallan at yahoo.com>
Cc: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>; r-help at stat.math.ethz.ch
Sent: Wednesday, July 25, 2007 1:50:55 PM
Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
Also if you want to access the individual values, you can just leave
it as a list:
> x.val <- apply(x, 2, table)
> # access each value
> x.val$PR14["V"]
V
8
On 7/25/07, Allan Kamau <kamauallan at yahoo.com> wrote:
> A subset of the data looks as follows
>
> > df[1:10,14:20]
> PR10 PR11 PR12 PR13 PR14 PR15 PR16
> 1 V T I K V G D
> 2 V S I K V G G
> 3 V T I R V G G
> 4 V S I K I G G
> 5 V S I K V G G
> 6 V S I R V G G
> 7 V T I K I G G
> 8 V S I K V E G
> 9 V S I K V G G
> 10 V S I K V G G
>
> The result I would like is as follows
>
> PR10 PR11 PR12 ...
> [V:10] [S:7,T:3] [I:10]
>
> The result can be in a matrix or a vector, and each variable name, value and frequency should be accessible so that they can be used for comparisons with another dataset later.
> The frequency can be a count or a percentage.
>
>
> Allan.
>
>
> ----- Original Message ----
> From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
> To: Allan Kamau <kamauallan at yahoo.com>
> Cc: r-help at stat.math.ethz.ch
> Sent: Tuesday, July 24, 2007 10:21:51 PM
> Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>
> The name of the table should give you the "value". And if you have a
> matrix, you just need to convert it into a vector first.
>
> > m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
> > m
> [,1] [,2] [,3]
> [1,] "A" "C" "B"
> [2,] "B" "D" "C"
> [3,] "C" "E" "D"
> > tb <- table( as.vector(m) )
> > tb
>
> A B C D E
> 1 2 3 2 1
> > paste( names(tb), ":", tb, sep="" )
> [1] "A:1" "B:2" "C:3" "D:2" "E:1"
>
> If this is not what you want, then please give a simple example.
>
> Regards, Adai
>
>
>
> Allan Kamau wrote:
> > Hi all,
> > If the question below has been answered before I
> > apologize for the posting.
> > I would like to get the frequencies of occurrence of
> > all values in a given variable in a multivariate
> > dataset. In short for each variable (or field) a
> > summary of values contained within a value:frequency
> > pair, there can be many such pairs for a given
> > variable. I would like to do the same for several such
> > variables.
> > I have used table() but am unable to extract the
> > individual value and frequency values.
> > Please advise.
> >
> > Allan.
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?