[R] two questions for R beginners
Petr PIKAL
petr.pikal at precheza.cz
Wed Mar 3 15:44:50 CET 2010
"John Sorkin" <jsorkin at grecc.umaryland.edu> wrote on 01.03.2010 15:19:10:
> If it looks like a duck and quacks like a duck, it ought to behave like
> a duck.
>
> To the user a matrix and a dataframe look alike . . . except a dataframe can
Well, a matrix looks like a data.frame only at first sight.
mat <- matrix(1:12, 3, 4)
dat <- as.data.frame(mat)
str(dat)
# 'data.frame': 3 obs. of 4 variables:
# $ V1: int 1 2 3
# $ V2: int 4 5 6
# $ V3: int 7 8 9
# $ V4: int 10 11 12
str(mat)
# int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
That looks pretty different to me.
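
To make the difference concrete, here is a minimal sketch using the same two objects. It only uses base R; the variable names holding the results (dat_is_list, mat_is_list, n_dat, n_mat, from_mat, from_dat) are made up for illustration. It shows that the two objects are built differently, yet the [i, j] syntax already addresses elements of both.

```r
mat <- matrix(1:12, 3, 4)
dat <- as.data.frame(mat)

# A data frame is a list of columns; a matrix is one atomic vector
# that carries a "dim" attribute.
dat_is_list <- is.list(dat)   # TRUE
mat_is_list <- is.list(mat)   # FALSE

# length() therefore counts columns for one and elements for the other.
n_dat <- length(dat)          # 4
n_mat <- length(mat)          # 12

# Row/column indexing with [i, j] works the same way on both.
from_mat <- mat[2, 3]         # 8
from_dat <- dat[2, 3]         # 8
```

So a shared addressing syntax does exist; it is "$" that is specific to lists.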
Regards
Petr
> hold non-numeric values. Thus to the users, a matrix looks like a special
> case of a DF, or perhaps conversely. If you can address elements of one
> structure using a given syntax, you should be able to address elements of
> the other structure using the same syntax. To do otherwise leads to
> confusion and is counterintuitive.
> John
>
>
>
>
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
> >>> Petr PIKAL <petr.pikal at precheza.cz> 3/1/2010 8:57 AM >>>
> Hi
>
> r-help-bounces at r-project.org wrote on 01.03.2010 13:03:24:
>
> <snip>
>
> > >
> > > I understand that a 2-dimensional rectangular matrix looks quite
> > > similar to a data frame; however, it is only a vector with dimensions.
> > > As such it can have items of only one type (numeric, character, ...).
> > > And you can easily change the dimensions of a matrix.
> > >
> > > matrix<-1:12
> > > dim(matrix) <- c(2,6)
> > > matrix
> > > dim(matrix) <- c(2,2,3)
> > > matrix
> > > dim(matrix) <-NULL
> > > matrix
> > >
> > > So rectangular structure of printed matrix is a kind of coincidence
> > > only, whereas rectangular structure of data frame is its main feature.
> > >
> > > Regards
> > > Petr
> > >>
> > >> --
> > >> Karl Ove Hufthammer
> >
> > Petr, I think that could be confusing! The way I see it is that
> > a matrix is a special case of an array, whose "dimension" attribute
> > is of length 2 (number of "rows", number of "columns"); and "row"
> > and "column" refer to the rectangular display which you see when
> > R prints the matrix. And this, of course, derives directly from
> > the historic rectangular view of a matrix when written down.
> >
> > When you went from "dim(matrix)<-c(2,6)" to "dim(matrix)<-c(2,2,3)"
> > you stripped it of its special title of "matrix" and cast it out
> > into the motley mob of arrays (some of whom are matrices, but
> > "matrix" no longer is).
> >
> > So the "rectangular structure of printed matrix" is not a coincidence,
> > but is its main feature!
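
The point about losing the "matrix" title can be checked directly. This is a small sketch in base R; it uses a variable called m (an illustrative name, chosen so that the matrix() function is not masked) and stores each check under a made-up name.

```r
m <- 1:12

dim(m) <- c(2, 6)
two_dims_matrix <- is.matrix(m)     # TRUE: dim attribute has length 2

dim(m) <- c(2, 2, 3)
three_dims_matrix <- is.matrix(m)   # FALSE: no longer a matrix
three_dims_array  <- is.array(m)    # TRUE: still an array

dim(m) <- NULL
no_dims_array  <- is.array(m)       # FALSE: dim attribute removed
no_dims_vector <- is.vector(m)      # TRUE: a plain vector again
```

The data never change here; only the "dim" attribute does, which is why both descriptions (a vector with dimensions, and an array whose dim has length 2) fit the same object.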
>
> OK, point taken. However, I feel that the possibility of manipulating
> matrix/array dimensions simply by changing them, as I showed above,
> together with perceiving a matrix as a **vector with dimensions**, kept
> me, especially in my early days, from using matrices where data frames
> were called for, and vice versa.
>
> Consider the confusing results of cbind and rbind for vectors of different
> modes. Far too often we see something like this
>
> > cbind(1:2,letters[1:2])
> [,1] [,2]
> [1,] "1" "a"
> [2,] "2" "b"
>
> instead of
>
> > data.frame(1:2,letters[1:2])
> X1.2 letters.1.2.
> 1 1 a
> 2 2 b
>
> and then a question about why the result does not behave as expected. Each
> type of object has some features that are good for some kinds of
> manipulation/analysis/plotting but quite detrimental for others.
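
A short sketch of why the cbind result surprises people, using the same inputs as above. The column names num and chr and the result names are invented for the example, and stringsAsFactors = FALSE is set explicitly so that the outcome does not depend on the R version.

```r
m  <- cbind(1:2, letters[1:2])
df <- data.frame(num = 1:2, chr = letters[1:2], stringsAsFactors = FALSE)

# cbind() builds a matrix, so everything is coerced to one common type.
m_type <- typeof(m)             # "character"

# data.frame() keeps each column's own type.
col_classes <- sapply(df, class)

# Arithmetic still works on the numeric column of the data frame,
# whereas sum(m[, 1]) would fail because m[, 1] is character.
total <- sum(df$num)            # 3
```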
>
> Regards
> Petr
>
>
> >
> > To come back to Karl's query about why "$" works for a dataframe
> > but not for a matrix, note that "$" is the extractor for getting
> > a named component of a list. So, Karl, when you did
> >
> > d=head(iris[1:4])
> >
> > you created a dataframe:
> >
> > str(d)
> > # 'data.frame': 6 obs. of 4 variables:
> > # $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
> > # $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
> > # $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
> > # $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
> >
> > (with named components "Sepal.Length", ... , "Petal.Width"),
> > and a dataframe is a special case of a general list. In a
> > general list, the separate components can each be anything.
> > In a dataframe, each component is a vector; the different
> > vectors may be of different types (logical, numeric, ... )
> > but of course the elements of any single vector must be
> > of the same type; and, in a dataframe, all the vectors must
> > have the same length (otherwise it is a general list, not
> > a dataframe).
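
The "special case of a list" description can be verified with a few base R calls. This is only a sketch; the names is_list, same_column and res are illustrative, and the tryCatch() wrapper is there just to capture the error instead of stopping.

```r
d <- head(iris[1:4])

# A data frame passes the list test, and "$" is the same extraction
# that "[[" performs on a list by name.
is_list     <- is.list(d)                                    # TRUE
same_column <- identical(d$Sepal.Width, d[["Sepal.Width"]])  # TRUE

# Columns of lengths 3 and 4 cannot form a data frame: data.frame()
# signals an error, which is captured here as a condition object.
res <- tryCatch(data.frame(C1 = 1:3, C2 = 1:4), error = function(e) e)
unequal_fails <- inherits(res, "error")                      # TRUE
```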
> >
> > So, when you print a dataframe, R chooses to display it
> > as a rectangular structure. On the other hand, when you
> > print a general list, R displays it quite differently:
> >
> > d
> > # Sepal.Length Sepal.Width Petal.Length Petal.Width
> > # 1 5.1 3.5 1.4 0.2
> > # 2 4.9 3.0 1.4 0.2
> > # 3 4.7 3.2 1.3 0.2
> > # 4 4.6 3.1 1.5 0.2
> > # 5 5.0 3.6 1.4 0.2
> > # 6 5.4 3.9 1.7 0.4
> >
> > d3 <- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
> > d3
> > # $C1
> > # [1] 1.1 1.2 1.3
> > # $C2
> > # [1] 2.1 2.2 2.3 2.4
> >
> > Notice the similarity (though not identity) between the print
> > of d3 and the output of str(d). There is a bit more hard-wired
> > stuff built into a dataframe which makes it more than simply
> > a "list with all components vectors of equal length". However,
> > one could also say that "the rectangular structure is its
> > main feature".
> >
> > As to why "$" will not work on matrices: a matrix, as Petr
> > points out, is a vector with a "dimensions" attribute which
> > has length 2 (as opposed to a general array where the length
> > of the dimensions attribute could be anything). Hence it is
> > not a list of named components in the sense of "list".
> >
> > Hence "$" will not work with a matrix, since "$" will not
> > be able to find any list-components, which is basically what
> > the error message
> >
> > d2$Sepal.Width
> > # Error in d2$Sepal.Width : $ operator is invalid for atomic vectors
> >
> > is telling you: d2 is an atomic vector with a length-2 dimensions
> > attribute. It has no list-type components for "$" to get its
> > hands on.
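
For completeness, a sketch of what does work on the matrix. The definition of d2 is not shown in this part of the thread, so d2 <- as.matrix(d) is an assumption here; the other names (col_by_name, msg) are illustrative only.

```r
d  <- head(iris[1:4])
d2 <- as.matrix(d)   # assumption: d2 was created along these lines

# Columns of a matrix are addressed by name inside [ , ], not with "$".
col_by_name <- d2[, "Sepal.Width"]

# "$" on the matrix signals an error; capture its message for inspection.
msg <- tryCatch(d2$Sepal.Width, error = function(e) conditionMessage(e))
```

The same [ , "name"] form also works on the data frame d, so it is the syntax to use when code has to accept either structure.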
> >
> > Ted.
> >
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 01-Mar-10 Time: 12:03:21
> > ------------------------------ XFMail ------------------------------
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.