[R] data frame pointers?

David Winsemius dwinsemius at comcast.net
Thu Oct 24 02:24:19 CEST 2013

On Oct 23, 2013, at 4:36 PM, Jon BR wrote:

> Hello,
>    I've been running several programs in the unix shell, and it's time to
> combine results from several different pipelines.  I've been writing shell
> scripts with heavy use of awk and grep to make big text files, but I'm
> thinking it would be better to have all my data in one big structure in R
> so that I can query whatever attributes I like, and print several
> corresponding tables to separate files.
> I haven't used R in years, so I was hoping somebody might be able to
> suggest a solution or combination of functions that could help me get
> oriented..
> Right now, I can import my data into a data frame that looks like this:
> df <-
> data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))
>> df
>    case  gene issue
> 1 case_1 gene1  nsyn
> 2 case_1 gene1   amp
> 3 case_2 gene1   del
> 4 case_3 gene2   UTR
> I'd like to cook up some combination of functions/scripting that can
> convert a table like df to produce a list or a data frame/ matrix that
> looks like df2:
>> df2
>        case_1 case_2 case_3
> gene1 nsyn,amp    del      0
> gene2        0      0    UTR
> I can build df2 manually, like this:
> df2 <- data.frame(case_1=c("nsyn,amp","0"),case_2=c("del","0"),case_3=c("0","UTR"))
> rownames(df2)<-c("gene1","gene2")

Factors will be a hassle, so build the data frame with stringsAsFactors=FALSE to keep the columns as character:

 df <-
data.frame(case=c("case_1","case_1","case_2","case_3"), gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"), stringsAsFactors=FALSE)

dmat <- with( df, matrix( tapply(issue, list(gene, case), list),
                   nrow=length(unique(gene)), ncol=length(unique(case)) ) )

> dmat
     [,1]        [,2]  [,3] 
[1,] Character,2 "del" NA   
[2,] NA          NA    "UTR"

> dmat[[1,1]]
[1] "nsyn" "amp" 

> as.data.frame(dmat)
         V1  V2  V3
1 nsyn, amp del  NA
2        NA  NA UTR
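
If the goal is exactly the df2 layout from the question (comma-separated strings, "0" for empty cells, genes as row names), one possible variation is to let tapply() paste the issues together rather than collecting them in list cells. This is only a sketch on the toy data above; the object name m is made up for illustration, and it assumes the issues should appear in their original row order.

```r
# Same toy data as above; stringsAsFactors=FALSE keeps the columns as character.
df <- data.frame(case  = c("case_1","case_1","case_2","case_3"),
                 gene  = c("gene1","gene1","gene1","gene2"),
                 issue = c("nsyn","amp","del","UTR"),
                 stringsAsFactors = FALSE)

# One cell per gene x case combination; several issues in a cell are
# pasted into a single comma-separated string.
m <- with(df, tapply(issue, list(gene, case), paste, collapse = ","))

# Combinations that never occur come back as NA; the question used "0".
m[is.na(m)] <- "0"

# Row names (genes) and column names (cases) carry over from the matrix.
df2 <- as.data.frame(m, stringsAsFactors = FALSE)
df2
```

Because tapply() is called directly here, without the matrix() wrapper, the gene and case labels are kept as dimnames. From there write.table() can send the result to a file.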

> but obviously do not want to do this by hand; I want R to generate df2 from
> df.
> Any pointers/ideas would be most welcome!
> Thanks,
> Jonathan
> 	[[alternative HTML version deleted]]

R is a plain text mailing list. Old school, admittedly,  but much better for coding questions. Surely an awk user can appreciate the wisdom of that request?

David Winsemius
Alameda, CA, USA
