[R] Convert COLON separated format

William Dunlap wdunlap at tibco.com
Tue Oct 9 18:01:46 CEST 2012


Matrix::spMatrix can help.

Read your data file with lns <- readLines("fileName") to get
something like
   lns <- c("1 5:15 7:17 9:19",
            "2 2:22 8:28",
            "4 6:46")
Then use a function like the following that reformats the
data to the i=row,j=col,x=value vectors that spMatrix can use.
   f <- function(lns, nrow=NULL, ncol=NULL)
   {
      # expect lines of the form "rowNum<whiteSpace>colNum:value[<whiteSpace>colNum:value ...]"
      triples <- unlist(lapply(strsplit(lns, "[ \t]+"), function(ln) paste(sep=":", ln[1], ln[-1])))
      triples <- strsplit(triples, ":")
      if (any(vapply(triples, length, 0) != 3)) stop("formatting error")
      ijx <- matrix(as.numeric(unlist(triples)), ncol=3, byrow=TRUE)
      if (is.null(nrow)) nrow <- max(ijx[,1])
      if (is.null(ncol)) ncol <- max(ijx[,2])
      Matrix::spMatrix(nrow=nrow, ncol=ncol, i=ijx[,1], j=ijx[,2], x=ijx[,3])
   }
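To see what the function does with each line, here is a small sketch
that traces one input line through the same steps. It uses only base R.
The column names on the final matrix are added here for readability and
are not part of the function above.

```r
# Trace one input line through the split / paste / split steps.
ln <- strsplit("1 5:15 7:17 9:19", "[ \t]+")[[1]]
# ln is c("1", "5:15", "7:17", "9:19")

# Prefix every "col:value" field with the row number from the first field.
triples <- paste(ln[1], ln[-1], sep = ":")
# triples is c("1:5:15", "1:7:17", "1:9:19")

# Split on ":" and reshape into one (i, j, x) row per nonzero entry.
ijx <- matrix(as.numeric(unlist(strsplit(triples, ":"))),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("i", "j", "x")))
print(ijx)
```

Each row of ijx is one nonzero cell: row index, column index, value.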
Use it as
> f(lns)
4 x 9 sparse Matrix of class "dgTMatrix"

[1,] .  . . . 15  . 17  . 19
[2,] . 22 . .  .  .  . 28  .
[3,] .  . . .  .  .  .  .  .
[4,] .  . . .  . 46  .  .  .

or, if you know the number of rows and columns, tell it:

> f(lns, 10, 10)
10 x 10 sparse Matrix of class "dgTMatrix"

 [1,] .  . . . 15  . 17  . 19 .
 [2,] . 22 . .  .  .  . 28  . .
 [3,] .  . . .  .  .  .  .  . .
 [4,] .  . . .  . 46  .  .  . .
 [5,] .  . . .  .  .  .  .  . .
 [6,] .  . . .  .  .  .  .  . .
 [7,] .  . . .  .  .  .  .  . .
 [8,] .  . . .  .  .  .  .  . .
 [9,] .  . . .  .  .  .  .  . .
[10,] .  . . .  .  .  .  .  . .

Use as.matrix() on its output if you don't want to continue
using the sparse matrix format.
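
For the data.frame asked for in the original question, one possible
route goes through as.matrix(). The sketch below rebuilds the 4 x 9
example directly with spMatrix() so that it runs on its own. The names
m, dense and df are only for illustration.

```r
library(Matrix)  # recommended package, included with standard R installs

# Same nonzero cells as the 4 x 9 example, entered by hand.
m <- spMatrix(nrow = 4, ncol = 9,
              i = c(1L, 1L, 1L, 2L, 2L, 4L),
              j = c(5L, 7L, 9L, 2L, 8L, 6L),
              x = c(15, 17, 19, 22, 28, 46))

dense <- as.matrix(m)        # ordinary numeric matrix with explicit zeros
df <- as.data.frame(dense)   # columns get default names V1, ..., V9
str(df)
```

The dense form stores every zero, so for files with many columns it can
take far more memory than the sparse matrix.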

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Noah Silverman
> Sent: Monday, October 08, 2012 9:57 PM
> To: r-help
> Subject: [R] Convert COLON separated format
> 
> I have a bunch of data sets that were created for the libsvm tool.  They are in "colon
> separated sparse format".
> 
> i.e.
> 
> 1  5:1  27:3  345:10
> 
> Is a row with the label of "1" and only has values in columns 5, 27, and 345.
> 
> I want to read these into a data.frame in R.
> 
> Is there a simple way to do this?
> 
> --
> Noah Silverman, M.S.
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
> 



