[R] help with rowsum/aggregate type functions

Henrique Dallazuanna wwwhsd at gmail.com
Tue Mar 25 12:38:19 CET 2008


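Try aggregate(), assuming your data frame is called x. It returns a data frame with numbered rows and separate Gene_Name and Number columns: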

aggregate(list(Number=x$Number), by=list(Gene_Name=x$Gene_Name), sum)
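As a minimal sketch of what this call returns, the block below rebuilds the eight rows quoted in the question as a data frame named x and collapses them by gene name. The name x and the construction via data.frame() are assumptions for illustration, since the original data were not posted in reproducible form. The formula call is shown only as an alternative spelling available in current versions of R.

```r
# Hypothetical reconstruction of the eight rows shown in the question.
x <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number = c(6, 5, 6, 5, 6, 5, 9, 6)
)

# Sum Number within each Gene_Name. The result is a data frame with
# default (numbered) row names and two ordinary columns.
genes.unique <- aggregate(list(Number = x$Number),
                          by = list(Gene_Name = x$Gene_Name),
                          FUN = sum)

# Alternative spelling using the formula interface of aggregate().
genes.unique2 <- aggregate(Number ~ Gene_Name, data = x, FUN = sum)

print(genes.unique)
```

With these rows, Zdhhc3 and Zeb2 each collapse to a single row, with totals of 11 and 15, and the columns can be accessed as genes.unique$Gene_Name and genes.unique$Number.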

On 25/03/2008, Charles Murtaugh <murtaugh at genetics.utah.edu> wrote:
> Hi--
>
>   This is a question with a trivial and obvious answer, I'm sure, but I can't seem to find it in the help files and books that I have handy.  I have a dataframe consisting of two columns, "Gene_Name," a list of gene symbols, and "Number," a numeric measure of how frequently a tag representing that gene showed up in a SAGE library.  Several of the genes are represented by multiple tags, and therefore are present more than once in the list, e.g.:
>
>  1167     Zcchc8      6
>  1168     Zcwpw1      5
>  1169     Zdhhc18     6
>  1170     Zdhhc20     5
>  1171     Zdhhc3      6
>  1172     Zdhhc3      5
>  1173     Zeb2        9
>  1174     Zeb2        6
>
>   What I want is to collapse the list by gene name, such that duplicates are summed up and appear only once in the final version:
>
>  Zcchc8      6
>  Zcwpw1      5
>  Zdhhc18     6
>  Zdhhc20     5
>  Zdhhc3     11
>  Zeb2       15
>
>   The only way I can figure out to do this is via rowsum:
>
>  > rowsum(Number, Gene_Name)
>
>  gives me exactly what I want, *except* that in the end, I am left with a matrix containing the Number values and with the Gene_Names used as row names (the output therefore looks exactly as printed above) -- what I want is a dataframe equivalent to the starting table, with numbered rows and separate, accessible columns containing the Gene_Name and Number values.
>
>   I was able to put such a dataframe together manually, by cobbling together the row names of the above list with the values:
>
>  > genes.unique <- data.frame(rownames(rowsum(Number, Gene_Name)), rowsum(Number, Gene_Name))
>
>  but then I have to manually replace the row names of the dataframe with numbers, to get back to what I wanted in the first place.
>
>   I hope this makes some sort of sense.  Is there an easier way to do this?  Thanks in advance!
>
>   Charlie Murtaugh
>
>  =====
>
>  L. Charles Murtaugh
>  Assistant Professor
>
>  University of Utah
>  Dept. of Human Genetics
>  15 N. 2030 E. Rm. 2100
>  Salt Lake City, UT 84112
>
>  tel 801-581-5958
>  fax 801-581-6463
>  email murtaugh at genetics.utah.edu
>
>
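If you would rather keep rowsum(), a sketch along the following lines should also get you from its matrix output back to a plain data frame. It computes the sums once and then resets the row names. The data frame x and the intermediate name sums are assumptions for illustration, not something taken from the original post.

```r
# Hypothetical reconstruction of the eight rows shown in the question.
x <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number = c(6, 5, 6, 5, 6, 5, 9, 6)
)

# rowsum() returns a one-column matrix whose row names are the groups.
sums <- rowsum(x$Number, x$Gene_Name)

# Build the data frame from the row names and the single column of sums.
genes.unique <- data.frame(Gene_Name = rownames(sums),
                           Number = sums[, 1])

# Drop the gene-name row labels so the rows are numbered 1..n again.
rownames(genes.unique) <- NULL

print(genes.unique)
```

Setting the row names to NULL restores the default integer numbering, which replaces the manual renumbering step described in the question.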


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" W


