[R] Fastest way to compare a single value with all values in one column of a data frame

Wed Jan 30 17:03:12 CET 2013

Hi,

Sorry, my previous solution doesn't work.
This should work for your dataset:
set.seed(1851)
x<- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=F)
y<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
x[x$a%in%which.min(x[x$a<y$a,]$a),]<- y #meant to replace every row when the minimum is tied

set.seed(1241)
x1<- data.frame(item=sample(letters[1:10],1e4,replace=TRUE),a=sample(1:30,1e4,replace=TRUE),b=sample(1:100,1e4,replace=TRUE),stringsAsFactors=F)
y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
length(x1$a[x1$a==1])
#[1] 330
system.time({x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1})
#   user  system elapsed
#  0.000   0.000   0.001
length(x1$a[x1$a==1])
#[1] 0

#For some reason, it does not work once the number of tied minimum values exceeds some threshold:

set.seed(1241)
x1<- data.frame(item=sample(letters[1:10],1e5,replace=TRUE),a=sample(1:30,1e5,replace=TRUE),b=sample(1:100,1e5,replace=TRUE),stringsAsFactors=F)
y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
length(x1$a[x1$a==1])
#[1] 3404
x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1
length(x1$a[x1$a==1])
#[1] 3404 #not getting replaced
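
A likely explanation for the failure above: which.min() returns the position of the first minimum inside the subset, not the minimum value itself, so the %in% test only picks the right rows when that position happens to equal the minimum. Below is a minimal sketch that compares against the value instead. The toy data and the names d, new_row, pos, m and hit are made up for illustration and are not part of the data sets above.

```r
# Toy data (invented for illustration): the minimum of a is 1 and occurs twice.
d <- data.frame(item = c("a", "b", "c", "d", "e"),
                a = c(2, 1, 5, 1, 2),
                b = c(20, 21, 22, 23, 24),
                stringsAsFactors = FALSE)
new_row <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# which.min() gives a position within the subset c(2, 1, 1, 2), here 2,
# which is not a value to look up in column a.
pos <- which.min(d$a[d$a < new_row$a])

# Compare against the minimum value instead and replace every tied row.
m <- min(d$a)
if (new_row$a > m) {
  hit <- which(d$a == m)
  d[hit, ] <- new_row   # the one-row data frame is recycled across the tied rows
}
```

With this toy data, both rows holding the minimum are overwritten, regardless of where the first minimum sits in the subset.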

#However, if I try:
set.seed(1241)
x1<- data.frame(item=sample(letters[1:10],1e6,replace=TRUE),a=sample(1:5000,1e6,replace=TRUE),b=sample(1:100,1e6,replace=TRUE),stringsAsFactors=F)
y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
length(x1$a[x1$a==1])
#[1] 208
system.time(x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1)
#   user  system elapsed
#  0.124   0.016   0.138
length(x1$a[x1$a==1])
#[1] 0

#Tried Jessica's solution:
set.seed(1851)
x<- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=F)
y<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
x[intersect(which(x$a < y$a),which.min(x$a)),] <- y
x
#   item  a  b
#1     a  8 25
#2     a 10 26
#3     f  3 10 #replaced
#4     e 15 26
#5     b 13 20
#6     a  5 23
#7     d  4 29
#8     e  2 24
#9     c  7 30
#10    e 14 24
#11    d  2 20
#12    e 10 21
#13    c 13 27
#14    d 12 23
#15    b 11 26
#16    e  5 22
#17    c  1 26 #tied for the minimum, but not replaced
#18    a  8 21
#19    e 10 26
#20    c  2 22
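
The output above is consistent with which.min() reporting only the first position of the minimum, so a second row with the same minimum is left alone. A small sketch of the difference, using a made-up vector (the names a, first_min and all_min are for illustration only):

```r
a <- c(8, 10, 1, 15, 2, 1)      # invented values; the minimum 1 occurs twice

first_min <- which.min(a)       # position of the first minimum only
all_min <- which(a == min(a))   # positions of every tied minimum
```

Indexing the data frame with something like all_min rather than first_min would cover tied rows as well.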

A.K.

----- Original Message -----
From: Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com>
To: r-help <r-help at r-project.org>
Cc: 
Sent: Tuesday, January 29, 2013 4:11 PM
Subject: [R] Fastest way to compare a single value with all values in one column of a data frame

Hello!

I have a large data frame x:
x<-data.frame(item=letters[1:5],a=1:5,b=11:15)  # in actuality, x has 1000 rows
x$item<-as.character(x$item)
I also have a small data frame y with just 1 row:
y<-data.frame(item="f",a=3,b=10)
y$item<-as.character(y$item)

I have to decide if y$a is larger than the smallest of all the values in
x$a. If it is, I want y to replace the whole row in x that has the lowest
value in column a.
This is how I'd do it.

if(y$a>min(x$a)){
  whichmin<-which(x$a==min(x$a))
  x[whichmin,]<-y[1,]
}

I am wondering if there is a faster way of doing it. What would be the
fastest possible way? I'd have to do it, unfortunately, many-many times.

Thank you very much!

-- 
Dimitri Liakhovitski
gfk.com <http://marketfusionanalytics.com/>

