[R] Efficient way to create new column based on comparison with another dataframe
Gaius Augustus
gaiusjaugustus at gmail.com
Fri Jan 29 19:52:06 CET 2016
I have two dataframes. One has chromosome arm information, and the other
has SNP position information. I am trying to assign each SNP an arm
identity. I'd like to create this new column based on comparing it to the
reference file.
*1) Mapfile (has millions of rows)*
Name Chr Position
S1 1 3000
S2 1 6000
S3 1 1000
*2) Chr.Arms file (has 39 rows)*
Chr Arm Start End
1 p 0 5000
1 q 5001 10000
*R Script that works, but slow:*
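Arms <- character(nrow(Mapfile))
for (line in seq_len(nrow(Mapfile))) {
  # TRUE for the Chr.Arms row on the same chromosome whose interval contains the SNP
  hit <- Mapfile$Chr[line] == Chr.Arms$Chr &
    Mapfile$Position[line] > Chr.Arms$Start &
    Mapfile$Position[line] < Chr.Arms$End
  # as.character() guards against Arm having been read in as a factor
  Arms[line] <- as.character(Chr.Arms$Arm[hit])
}
Mapfile$Arm <- Arms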
*Output Table:*
Name Chr Position Arm
S1 1 3000 p
S2 1 6000 q
S3 1 1000 p
In words: for each SNP, I want to 1) find the Chr.Arms rows with the matching
Chr, 2) among those, find the row where Start < Position < End, and then copy
that row's Arm into a new column of Mapfile.
This R script works, but surely there is a more time/processing efficient
way to do it.
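One direction that might help, as a sketch only: since Chr.Arms has just 39 rows, the loop can run over the arms instead of over the SNPs, so that each pass is a single vectorized comparison across the whole Mapfile. The toy data below is copied from the tables above; the variable names `hit` and `i` are made up for illustration, and the strict inequalities are kept as in the original script.

```r
# Toy data copied from the post; stringsAsFactors = FALSE keeps Arm as character.
Mapfile <- data.frame(
  Name     = c("S1", "S2", "S3"),
  Chr      = c(1, 1, 1),
  Position = c(3000, 6000, 1000),
  stringsAsFactors = FALSE
)
Chr.Arms <- data.frame(
  Chr   = c(1, 1),
  Arm   = c("p", "q"),
  Start = c(0, 5001),
  End   = c(5000, 10000),
  stringsAsFactors = FALSE
)

# Loop over the arms (few rows) rather than the SNPs (millions of rows).
Mapfile$Arm <- NA_character_   # SNPs that fall in no arm stay NA
for (i in seq_len(nrow(Chr.Arms))) {
  # Logical vector over all SNPs: same chromosome and strictly inside the arm
  hit <- Mapfile$Chr == Chr.Arms$Chr[i] &
    Mapfile$Position > Chr.Arms$Start[i] &
    Mapfile$Position < Chr.Arms$End[i]
  Mapfile$Arm[hit] <- Chr.Arms$Arm[i]
}

print(Mapfile)
```

With strict inequalities, a SNP sitting exactly on a Start or End value matches no arm and is left as NA; whether the boundaries should be inclusive depends on how the Chr.Arms file defines them.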
Thanks in advance for any help,
Gaius