[R] 2 matrix scatter x [a lot]

Dennis Murphy djmuser at gmail.com
Tue Aug 16 00:53:56 CEST 2011


Here's one way, using the following reproducible example.

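# Packages used below: melt() is in reshape2 (and the older reshape),
# xyplot() is in lattice, ggplot() is in ggplot2
library(reshape2)
library(lattice)
library(ggplot2)

# Method 1: the variable names are the same in each data frame
# Create two separate data frames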
ds1 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
ds2 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

# Melt the data to create a factor whose levels are the variable
# names and a variable 'value' to contain the corresponding values
dm1 <- melt(ds1)
# Since the values are different in each data frame, change the
# name of the value variable in each
names(dm1)[2] <- 'val1'
dm2 <- melt(ds2)
names(dm2)[2] <- 'val2'
# Since I know the two melted data frames have the same
# dimensions, I can cbind the value variable of the second
# to the first
dm <- cbind(dm1, val2 = dm2[['val2']])
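The pairing step above only works if the two melted frames line up row for row. As a rough sketch of the same idea without the reshape dependency, base R's stack() gives an equivalent long layout, and the alignment can be checked before combining. This is an illustration, not part of the original reply; the names ex1, ex2, long1, long2 and paired are made up for the example.

```r
set.seed(1)  # only so the illustration is repeatable
ex1 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
ex2 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

# stack() returns a data frame with columns 'values' and 'ind',
# where 'ind' is a factor holding the original column names
long1 <- stack(ex1)
long2 <- stack(ex2)

# Guard the pairing: the column labels must match row for row
stopifnot(identical(long1$ind, long2$ind))

paired <- data.frame(variable = long1$ind,
                     val1 = long1$values,
                     val2 = long2$values)
str(paired)
```

The resulting paired frame has the same three columns as dm and can be passed to xyplot() or ggplot() in the same way.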

# Conditioning plots:

# lattice version
xyplot(val2 ~ val1 | variable, data = dm)
# ggplot2 version
ggplot(dm, aes(x = val1, y = val2)) + geom_point() +
   facet_wrap( ~ variable)

# Method 2: Variable names are different
ds1 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
ds2 <- data.frame(y1 = rnorm(10), y2 = rnorm(10), y3 = rnorm(10))

dm1 <- melt(ds1)
names(dm1)[2] <- 'val1'
dm2 <- melt(ds2)
names(dm2)[2] <- 'val2'
dm <- cbind(dm1, val2 = dm2[['val2']])
# Change the level labels of variable to represent the
# column numbers instead:
dm$Variable <- factor(dm$variable,
                 labels = seq_len(nlevels(dm$variable)))

xyplot(val2 ~ val1 | Variable, data = dm, xlab = 'x', ylab = 'y')
ggplot(dm, aes(x = val1, y = val2)) + geom_point() +
   facet_wrap( ~ Variable) + labs(x = 'x', y = 'y')
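The relabelling in Method 2 relies on factor() assigning the new labels in level order, so the first level becomes "1", the second "2", and so on. A small sketch of that behaviour, using a made-up factor v rather than the data above:

```r
# A toy factor with three levels, standing in for dm$variable
v <- factor(c("y1", "y2", "y3", "y1"), levels = c("y1", "y2", "y3"))

# Replace the level labels with their positions 1, 2, 3
relabelled <- factor(v, labels = seq_len(nlevels(v)))

levels(relabelled)        # "1" "2" "3"
as.character(relabelled)  # "1" "2" "3" "1"
```

Because the mapping is purely positional, it only makes sense when the columns of the two data frames correspond in order.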

You've probably got something more complicated than this in terms of
variable names, but the outline above should be enough to get you
started.
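Since the original question uses matrices with NAs and asks for a single overlaid plot, here is a minimal sketch of how the same long-format idea might apply there, assuming one set of axes is what's wanted. It uses only base R; the names long, column, xval and yval are invented for the illustration, and it has not been timed on data of the size described.

```r
# The sample data from the question
x <- matrix(1:4, 2, 2)
y <- matrix(4:1, 2, 2)
y[2, 2] <- NA
y[1, 1] <- NA

# Flatten both matrices column by column; col(x) records which
# column each value came from
long <- data.frame(column = as.vector(col(x)),
                   xval = as.vector(x),
                   yval = as.vector(y))

# Drop every pair in which either value is NA
long <- long[complete.cases(long), ]

# One call draws all remaining points on the same axes
plot(long$xval, long$yval, xlab = "x", ylab = "y")
```

Keeping the column index around means the points can still be coloured or conditioned by column later, for instance with xyplot(yval ~ xval | factor(column), data = long), although with thousands of columns a panel per column is unlikely to be readable.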


On Mon, Aug 15, 2011 at 3:13 PM, Ben qant <ccquant at gmail.com> wrote:
> Hello,
> I'm pretty new to R. Basically, how do I speed up the for loop below? Or
> better yet, get rid of the for loop altogether.
> objective: plot two data sets column against column by index. These data
> sets have a lot of NAs. Some columns are all NAs. I need the plots to overlay.
> I don't like the plots in matplot(). Needs to be much faster than the code
> below...
> #simple sample data.. my data sets have 61 rows and over 11k columns each.
> x = matrix(1:4,2,2)
> y = matrix(4:1,2,2)
> y[2,2] = NA
> y[1,1] = NA
> #calc'd here to save time on plotting
> xlim.v = c(min(x, na.rm = TRUE),max(x,na.rm = TRUE))
> ylim.v = c(min(y, na.rm = TRUE),max(y,na.rm = TRUE))
> for(i in 1:ncol(x)){
>  xy = na.omit(cbind(x[,i],y[,i]))
>  # nrow(xy) is 0 when a column pair has no complete cases
>  if(nrow(xy) > 0){
>    plot(xy[,1],xy[,2],xlim = xlim.v,ylim= ylim.v); par(new=TRUE);
>  }
> }
> Thanks!