[R] use table function with data frame subsets

Mon Feb 20 22:27:28 CET 2017

The default for read.csv() is stringsAsFactors=TRUE when creating a data frame so all the character strings in your .csv file were converted to factors:

> testtable <- read.csv("clipboard", header=F)
> str(testtable)
'data.frame':   6 obs. of  5 variables:
 $ V1: int  20170101 20170101 20170101 20170102 20170102 20170102
 $ V2: int  10020 10020 10020 20001 20001 20001
 $ V3: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 3 4 5
 $ V4: Factor w/ 4 levels "a","b","d","m": 2 2 3 3 4 1
 $ V5: Factor w/ 2 levels "N","Y": 2 1 2 2 2 2

When you subset a data frame, the empty factor levels are not automatically removed:

> testtablea<-testtable[grep('^10',testtable[,2]),]
> str(testtablea)
'data.frame':   3 obs. of  5 variables:
 $ V1: int  20170101 20170101 20170101
 $ V2: int  10020 10020 10020
 $ V3: Factor w/ 5 levels "A","B","C","D",..: 1 2 3
 $ V4: Factor w/ 4 levels "a","b","d","m": 2 2 3
 $ V5: Factor w/ 2 levels "N","Y": 2 1 2

To drop the missing levels from all of the factors, use the droplevels() function:

> testtablea <- droplevels(testtablea)
> str(testtablea)
'data.frame':   3 obs. of  5 variables:
 $ V1: int  20170101 20170101 20170101
 $ V2: int  10020 10020 10020
 $ V3: Factor w/ 3 levels "A","B","C": 1 2 3
 $ V4: Factor w/ 2 levels "b","d": 1 1 2
 $ V5: Factor w/ 2 levels "N","Y": 2 1 2
> table(testtablea[,4],testtablea[,5])

    N Y
  b 1 1
  d 0 1

OR use stringsAsFactors=FALSE with read.csv() when you create the original data frame:

> testtable <- read.csv("clipboard", header=F, stringsAsFactors=FALSE)
> str(testtable)
'data.frame':   6 obs. of  5 variables:
 $ V1: int  20170101 20170101 20170101 20170102 20170102 20170102
 $ V2: int  10020 10020 10020 20001 20001 20001
 $ V3: chr  "A" "B" "C" "C" ...
 $ V4: chr  "b" "b" "d" "d" ...
 $ V5: chr  "Y" "N" "Y" "Y" ...
> testtablea<-testtable[grep('^10',testtable[,2]),]
> table(testtablea[,4],testtablea[,5])

    N Y
  b 1 1
  d 0 1

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of message
Sent: Monday, February 20, 2017 3:10 PM
To: r-help at r-project.org
Subject: [R] use table function with data frame subsets

Readers,

Data set:

20170101,10020,A,b,Y
20170101,10020,B,b,N
20170101,10020,C,d,Y
20170102,20001,C,d,Y
20170102,20001,D,m,Y
20170102,20001,L,a,Y

testtable<-read.csv('~/tmp/data.csv',header=F)
testtablea<-testtable[grep('^10',testtable[,2]),]
> testtable
         V1    V2 V3 V4 V5
1 20170101 10020  A  b  Y
2 20170101 10020  B  b  N
3 20170101 10020  C  d  Y
4 20170102 20001  C  d  Y
5 20170102 20001  D  m  Y
6 20170102 20001  L  a  Y
> testtablea
         V1    V2 V3 V4 V5
1 20170101 10020  A  b  Y
2 20170101 10020  B  b  N
3 20170101 10020  C  d  Y

> table(testtable[,4],testtable[,5])

     N Y
   a 0 1
   b 1 1
   d 0 2
   m 0 1
> table(testtablea[,4],testtablea[,5])

     N Y
   a 0 0
   b 1 1
   d 0 1
   m 0 0

Wy do values for rows beginning 'a' and 'm' appear when they do not 
satisfy the regular expression for the object 'testtablea'?

Please, how to use the 'table' function to show:

> table(testtablea[,4],testtablea[,5])

     N Y
   b 1 1
   d 0 1

Thanks.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.