[R] Creating matrix from long table in database (pivoting)
Phil Spector
spector at stat.berkeley.edu
Tue Mar 2 20:01:22 CET 2010
Jan -
Here's one way:
> tbl = data.frame(id=c(1,1,1,1,1,2,2,2,2,2),
text=c('this','is','the','first','row','this','is','the','second','row'))
> xtabs(~id+text,tbl)
text
id first is row second the this
1 1 1 1 0 1 1
2 0 1 1 1 1 1
It's a bit tricky to automatically get the column headings to
be in the order you want. This comes close:
> tbl$text = factor(tbl$text,levels=tbl$text[!duplicated(tbl$text)])
> xtabs(~id+text,tbl)
text
id this is the first row second
1 1 1 1 1 1 0
2 1 1 1 0 1 1
Hope this helps.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Tue, 2 Mar 2010, Jan Hornych wrote:
> Hi all,
>
> I have a table in database that is very long and when simplified it has only
> two columns in it (id, text). id is the row, and text is the column.
> Technically the text is a term and and id is the document.
> If simplifying this and assuming there is only one occurrence of the term
> per the document. I shall be able to convert this into a binary matrix.
> Table looks like this...
>
> *ID** **Text*
> ------------
> 1 this
> 1 is
> 1 the
> 1 first
> 1 row
> 2 this
> 2 is
> 2 the
> 2 send
> 2 row
> ...
>
>
> in R I would like to have it as
>
> *id this is the first second row*
> ------------------------------------------------
> 1 1 1 1 1 0 1
> 2 1 1 1 0 1 1
>
> it would be simpler for me to do this transformation in R as I guess the
> language is more handy as the SQL. The table in R have few dozen thousand of
> columns and rows as well. I know how to read the data from database, but
> just unsure if there is some suitable transformation available.
>
> Thank you
> Jan
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list