[R] strange behavior in data frames with duplicated column names

William Revelle wr at revelle.net
Tue May 8 16:26:43 CEST 2007


Dear R gurus,

There is an interesting problem when accessing individual items in a 
column of a data frame that has incorrectly been given a duplicated 
name, even when the item is addressed by row and column number.
Although the whole column is listed correctly, an item addressed by 
row and column number comes from the correct row but from the 
original column, not the one carrying the duplicated name.

Here are the commands that reproduce the problem:

x <- matrix(seq(1:12),ncol=3)
colnames(x) <- c("A","B","A")   #a redundant name for column 3
x.df <- data.frame(x)
x.df        #the redundant name is corrected
x.df[,3]    #show the column -- this always works
x.df[2,3]   #this works here
#now incorrectly label the columns with a duplicate name
colnames(x.df) <- c("A","B","A")  #the redundant name is not detected
x.df
x.df[,3]     #this works as above and shows the column
x.df[2,3]    #but this gives the value of the first column, not the third  <---
rownames(x.df) <- c("First","Second","Third","Third")  #detects duplicate name
x.df
x.df[4,]     #correct (fourth) row and corrected column names!
x.df[4,3]    #wrong column
x.df         #still has the original names with the duplication


and the corresponding output:

>  x <- matrix(seq(1:12),ncol=3)
>  colnames(x) <- c("A","B","A")   #a redundant name for column 3
>  x.df <- data.frame(x)
>  x.df        #the redundant name is corrected
  A B A.1
1 1 5   9
2 2 6  10
3 3 7  11
4 4 8  12
>  x.df[,3]    #show the column -- this always works
[1]  9 10 11 12
>  x.df[2,3]   #this works here
[1] 10
>  #now incorrectly label the columns with a duplicate name
>  colnames(x.df) <- c("A","B","A")  #the redundant name is not detected
>  x.df
  A B  A
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
>  x.df[,3]     #this works as above and shows the column
[1]  9 10 11 12
>  x.df[2,3]    #but this gives the value of the first column, not the third  <---
[1] 2
>  rownames(x.df) <- c("First","Second","Third","Third")  #detects duplicate name
Error in `row.names<-.data.frame`(`*tmp*`, value = c("First", "Second",  :
	duplicate 'row.names' are not allowed
>  x.df
  A B  A
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
>  x.df[4,]     #correct (fourth) row and corrected column names!
  A B A.1
4 4 8  12
>  x.df[4,3]    #wrong column
[1] 4
>  x.df         #still has the original names with the duplication

>  unlist(R.Version())
platform        "i386-apple-darwin8.9.1"
arch            "i386"
os              "darwin8.9.1"
system          "i386, darwin8.9.1"
status          "Patched"
major           "2"
minor           "5.0"
year            "2007"
month           "04"
day             "25"
svn rev         "41315"
language        "R"
version.string  "R version 2.5.0 Patched (2007-04-25 r41315)"
>
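
Until the cause is understood, one possible workaround is to check for 
duplicated column names before indexing and to make them unique. The 
sketch below is only an illustration: dedupe_colnames() is a 
hypothetical helper, not part of base R, and it assumes that renaming 
the duplicate to "A.1" (as data.frame() does above) is acceptable. 
Extracting the column by position with [[ and then taking the row 
avoids the name lookup altogether.

```r
# Hypothetical helper (not in base R): warn about and repair duplicated
# column names, mirroring the "A.1" renaming that data.frame() applied above.
dedupe_colnames <- function(df) {
  dup <- duplicated(names(df))
  if (any(dup)) {
    warning("duplicated column names: ",
            paste(unique(names(df)[dup]), collapse = ", "))
    names(df) <- make.unique(names(df))   # "A" "B" "A" -> "A" "B" "A.1"
  }
  df
}

x.df <- data.frame(matrix(1:12, ncol = 3))
colnames(x.df) <- c("A", "B", "A")        # the duplicate is not detected here

fixed <- suppressWarnings(dedupe_colnames(x.df))
names(fixed)       # "A" "B" "A.1"
fixed[2, 3]        # 10, the third column

# Alternative that leaves the names alone: pick the column by position
# first, then the row within it.
x.df[[3]][2]       # 10
```
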


Bill

-- 
William Revelle		http://personality-project.org/revelle.html
Professor			http://personality-project.org/personality.html
Department of Psychology       http://www.wcas.northwestern.edu/psych/
Northwestern University	http://www.northwestern.edu/
Use R for statistics:                 http://personality-project.org/r


