[R] Tricky (?) conversion from data.frame to matrix where not all pairs exist

Martin Maechler maechler at stat.math.ethz.ch
Wed Jun 22 09:45:59 CEST 2011


>>>>> Marius Hofert <m_hofert at web.de>
>>>>>     on Wed, 22 Jun 2011 01:00:21 +0200 writes:

    > Thanks a lot! Works like a charm :-))) Cheers,

Well, I'd still *strongly* vote for Bill Dunlap's version.

Martin
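
For readers without the rest of the thread: Bill Dunlap's message is not quoted here, so the following is only a sketch of one loop-free base R approach, not necessarily the version referred to above. It assumes the `df` from the original post and fills a zero matrix through a two-column index matrix; the names `years`, `blocks` and `idx` are made up for illustration.

```r
# Data from the original post
df <- data.frame(year  = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6)

# All distinct years and blocks (as.character() guards against factors)
years  <- sort(unique(df$year))
blocks <- sort(unique(as.character(df$block)))

# Start from an all-zero matrix with the desired dimnames
res <- matrix(0, nrow = length(years), ncol = length(blocks),
              dimnames = list(years, blocks))

# Each row of idx is one (row, column) position in res, one per row of df
idx <- cbind(match(df$year, years),
             match(as.character(df$block), blocks))
res[idx] <- df$value
res
```

Year-block combinations that do not occur in `df` are never indexed and therefore keep their initial value of zero. If a combination occurred more than once, the last value would silently win, so this sketch assumes each pair appears at most once.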



    > Marius

    > On 2011-06-22, at 24:51 , Dennis Murphy wrote:

    >> Ahhh...you want a matrix. xtabs() doesn't easily allow
    >> coercion to a matrix object, so try this instead:
    >> 
    >> library(reshape)
    >> as.matrix(cast(df, year ~ block, fill = 0))
    >>      a b c
    >> 2000 1 0 5
    >> 2001 2 4 6
    >> 2002 3 0 0
    >> 
    >> Hopefully this is more helpful...  Dennis
    >> 
    >> On Tue, Jun 21, 2011 at 3:35 PM, Dennis Murphy
    >> <djmuser at gmail.com> wrote:
    >>> Hi:
    >>> 
    >>> xtabs(value ~ year + block, data = df)
    >>>       block
    >>> year   a b c
    >>>   2000 1 0 5
    >>>   2001 2 4 6
    >>>   2002 3 0 0
    >>> 
    >>> HTH, Dennis
    >>> 
    >>> On Tue, Jun 21, 2011 at 3:13 PM, Marius Hofert
    >>> <m_hofert at web.de> wrote:
    >>>> Dear expeRts,
    >>>> 
    >>>> In the minimal example below, I have a data.frame
    >>>> containing three "blocks" of years (the years are
    >>>> subsets of 2000 to 2002). For each year and block a
    >>>> certain "value" is given.  I would like to create a
    >>>> matrix that has row names given by all years ("2000",
    >>>> "2001", "2002"), and column names given by all blocks
    >>>> ("a", "b", "c"); the entries are then given by the
    >>>> corresponding value or zero if no such year-block
    >>>> combination exists.
    >>>> 
    >>>> What's a short way to achieve this?
    >>>> 
    >>>> Of course one can set up a matrix and use for loops (see
    >>>> below)... but that's not nice.  The problem is that the
    >>>> years are not running from 2000 to 2002 for all three
    >>>> "blocks" (the second block only has year 2001, the
    >>>> third one has only 2000 and 2001).  In principle,
    >>>> table() nicely solves such a problem (see below) and
    >>>> fills in zeros.  This is what I would like in the end,
    >>>> but all non-zero entries should be given by df$value,
    >>>> not (as table() does) by their counts.
    >>>> 
    >>>> Cheers,
    >>>> 
    >>>> Marius
    >>>> 
    >>>> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
    >>>>                   block=c("a","a","a","b","c","c"), value=1:6))
    >>>> table(df[,1:2]) # complements the years and fills in 0
    >>>> 
    >>>> year <- c(2000, 2001, 2002)
    >>>> block <- c("a", "b", "c")
    >>>> res <- matrix(0, nrow=3, ncol=3, dimnames=list(year, block))
    >>>> for(i in 1:3){ # year
    >>>>     for(j in 1:3){ # block
    >>>>         for(k in 1:nrow(df)){
    >>>>             if(df[k,"year"]==year[i] && df[k,"block"]==block[j])
    >>>>                 res[i,j] <- df[k,"value"]
    >>>>         }
    >>>>     }
    >>>> }
    >>>> res # does the job; but seems complicated
    >>>> 
    >>>> ______________________________________________
    >>>> R-help at r-project.org mailing list
    >>>> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
    >>>> read the posting guide
    >>>> http://www.R-project.org/posting-guide.html and provide
    >>>> commented, minimal, self-contained, reproducible code.
    >>>> 
    >>> 
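
On the remark quoted above that xtabs() does not easily coerce to a matrix: the result of xtabs() is already stored as a two-dimensional array, so one possible base R route is to strip its class and its stored call. This is a hedged sketch using the `df` from the original post; the names `xt` and `m` are illustrative only.

```r
# Data from the original post
df <- data.frame(year  = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6)

# Sum 'value' within each year-block cell; empty cells become 0
xt <- xtabs(value ~ year + block, data = df)

m <- unclass(xt)          # drop the "xtabs" / "table" classes
attr(m, "call") <- NULL   # drop the call that xtabs() stores on its result
m
```

The resulting `m` keeps the named dimnames (`year`, `block`). Note that xtabs() sums values when a year-block pair occurs more than once, which may or may not be what is wanted.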



