[R] how to find number of unique rows for combination of r columns
Gerrit Eichner
gerrit.eichner using math.uni-giessen.de
Fri Nov 8 16:19:41 CET 2019
It seems that dt is not a (base R) data frame but a
data.table. I assume you will have to convert dt
into a data frame (e.g., with as.data.frame) to be
able to apply unique in the suggested way. However,
I am not familiar with data.table. Perhaps somebody
else can provide a better-informed answer.
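
For illustration, here is a minimal sketch of the conversion route next to two data.table-native alternatives. It rests on assumptions: the toy table below is a stand-in built from the first rows of the original post (with one row duplicated on purpose), and the data.table part only runs if that package is installed. The error quoted below arises because data.table treats a character vector in the first argument of `[` as a join, so columns have to be selected in the second argument instead.

```r
# Toy stand-in for the poster's table: values from the first rows shown in the
# original post, with row 1 deliberately repeated as row 4.
df <- data.frame(
  chr          = c("chr1", "chr1", "chr1", "chr1"),
  pos          = c(54490L, 58814L, 60351L, 54490L),
  gene_id      = rep("ENSG00000227232", 4),
  pval_nominal = c(0.608495, 0.295211, 0.439788, 0.608495),
  stringsAsFactors = FALSE
)
cols <- c("chr", "pos", "gene_id")

# Base R: df[cols] selects columns of a data frame, so unique() sees only them.
n_base <- nrow(unique(df[cols]))

# data.table (only if installed): dt[cols] would be interpreted as a join.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  n_conv <- nrow(unique(as.data.frame(dt)[cols]))   # convert first, then as suggested
  n_j    <- nrow(unique(dt[, cols, with = FALSE]))  # select columns in j
  n_by   <- data.table::uniqueN(dt, by = cols)      # count directly, no column subset
  stopifnot(n_conv == n_base, n_j == n_base, n_by == n_base)
}

print(n_base)  # 3 distinct (chr, pos, gene_id) combinations in the toy table
```

On a table with tens of millions of rows, the uniqueN() variant is probably preferable because it avoids converting or subsetting the whole table first, but that is untested here.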
Regards -- Gerrit
---------------------------------------------------------------------
Dr. Gerrit Eichner Mathematical Institute, Room 212
gerrit.eichner using math.uni-giessen.de Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104 Arndtstr. 2, 35392 Giessen, Germany
http://www.uni-giessen.de/eichner
---------------------------------------------------------------------
On 08.11.2019 at 16:02, Ana Marija wrote:
> I tried it but I got this error:
>> udt <- unique(dt[c("chr", "pos", "gene_id")])
> Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
> When i is a data.table (or character vector), the columns to join by
> must be specified using 'on=' argument (see ?data.table), by keying x
> (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
> column names between x and i (i.e., a natural join). Keyed joins might
> have further speed benefits on very large data due to x being sorted
> in RAM.
>
> On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
> <gerrit.eichner using math.uni-giessen.de> wrote:
>>
>> Hi, Ana,
>>
>> doesn't
>>
>> udt <- unique(dt[c("chr", "pos", "gene_id")])
>> nrow(udt)
>>
>> get close to what you want?
>>
>> Hth -- Gerrit
>>
>> On 08.11.2019 at 15:38, Ana Marija wrote:
>>> Hello,
>>>
>>> I have a data frame like this:
>>>
>>>> head(dt,20)
>>>      chr    pos         gene_id pval_nominal  pval_ret       wl      wr
>>>  1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
>>>  2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
>>>  3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
>>>  4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
>>>  5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
>>>  6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
>>>  7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
>>>  8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
>>>  9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
>>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
>>>
>>> it is very big,
>>>> dim(dt)
>>> [1] 73719122 8
>>>
>>> To count the number of unique rows for all 3 columns (chr, pos and gene_id),
>>> I could just join those 3 columns and then count. But how would I find the
>>> number of unique rows for these 3 columns without joining them?
>>>
>>> Thanks
>>> Ana
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>