[R] Speeding up code?

Thu Jul 16 13:56:54 CEST 2015

Hi Collin,

The objective of the gen.names function is to generate N *unique *random
names, where N is a *large *number. In my computer `gen.names(n = 50000)`
takes under a second, so is probably not the root problem in my code. That
said, I would love to improve it. I'm not exactly sure how you propose to
change it using sample. What is the object that I would be sampling? I
would love to run a little benchmark to compare my version with yours.

What really takes a long time to run is:

    separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern =
"-"))

So that and the chunk of code before that is probably where I would get big
gains in speed. Sadly, I have no clue how to do it differently

Thanks a lot for the help!!

Ignacio

On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch <cflynch at ncsu.edu> wrote:

> Hi Ignacio, If I am reading your code correctly then the top while loop is
> essentially seeking to select a random set of names from the original set,
> then using unique to reduce it down, you then iterate until you have built
> your quota.  Ultimately this results in a very inefficient attempt at
> sampling without replacement.  Why not just sample without replacement
> rather than loop iteratively and use unique?  Or if the set of possible
> names are short enough why not just randomize it and then pull the first n
> items off?
>
>     Best,
>     Collin.
>
> On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez <ignacio82 at gmail.com>
> wrote:
>
>> Hi R-Help!
>>
>> I'm hoping that some of you may give me some tips that could make my code
>>
> more efficient. More precisely, I would like to make the answer to my
>> stakoverflow
>> <
>> http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions
>> >
>
>
>> question more efficient.
>>
>> This is the code:
>>
>> library(dplyr)
>> library(randomNames)
>> library(geosphere)
>>
> set.seed(7142015)# Define Parameters
>
>
>> n.Schools <- 20
>> first.grade<-3
>> last.grade<-5
>> n.Grades <-last.grade-first.grade+1
>> n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
>> n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 #Two classrooms per
>> teacher
>> # Define Random names function:
>> gen.names <- function(n, which.names = "both", name.order = "last.first"){
>>   names <- unique(randomNames(n=n, which.names = which.names,
>> name.order = name.order))
>>   need <- n - length(names)
>>   while(need>0){
>>     names <- unique(c(randomNames(n=need, which.names = which.names,
>> name.order = name.order), names))
>>     need <- n - length(names)
>>   }
>>   return(names)}
>> # Generate n.Schools names
>> gen.schools <- function(n.schools) {
>>   School.ID <-
>>     paste0(gen.names(n = n.schools, which.names = "last"), ' School')
>>   School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
>>   School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
>>   School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
>>   Schools <-
>>     data.frame(School.ID, School.lat, School.long, School.RE) %>%
>>     mutate(School.ID = as.character(School.ID)) %>%
>>     rowwise() %>%  mutate (School.distance = distHaversine(
>>       p1 = c(School.long, School.lat),
>>       p2 = c(21.7672, 58.8471), r = 3961
>>     ))
>>   return(Schools)}
>>
>> Schools <- gen.schools(n.schools = n.Schools)
>> # Generate Grades
>> Grades <- c(first.grade:last.grade)
>> # Generate n.Classrooms
>>
>> Classrooms <- LETTERS[1:n.Classrooms]
>> # Group schools and grades
>>
>> SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'),
>> FUN="paste")#head(SchGr)
>> # Group SchGr and Classrooms
>>
>> SchGrClss <- outer(SchGr, paste0(Classrooms, '-'),
>> FUN="paste")#head(SchGrClss)
>> # These are the combination of  School-Grades-Classroom
>> SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss) )
>> SchGrClssEnd <- as.data.frame(SchGrClssTmp)
>> # Assign n.Teachers (2 classroom in a given school-grade)
>> Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
>> AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")
>>
>> library(stringr)
>> separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern =
>> "-"))
>> separoPairs <- as.data.frame(t(separoPairs))
>> row.names(separoPairs) <- NULL
>> separoPairs <- separoPairs %>% select(-V7)  %>%  #Drops empty column
>>   mutate(V1=as.character(V1), V4=as.character(V4), V2=as.numeric(V2),
>> V5=as.numeric(V5)) %>% mutate(V4 = trimws(V4, which = "both"))
>>
>> separoPairs[120,]$V4#Only the rows with V1=V4 and V2=V5 are valid
>
>
>> validPairs <- separoPairs %>% filter(V1==V4 & V2==V5) %>% select(V1, V2,
>> V3, V6)
>> # Generate n.Teachers
>>
>> gen.teachers <- function(n.teachers){
>>   Teacher.ID <- gen.names(n = n.teachers, name.order = "last.first")
>>   Teacher.exp <- runif(n = n.teachers, min = 1, max = 30)
>>   Teacher.Other <- sample(c(0,1), replace = T, prob = c(0.5, 0.5),
>> size = n.teachers)
>>   Teacher.RE <- rnorm(n = n.teachers, mean = 0, sd = 1)
>>   Teachers <- data.frame(Teacher.ID, Teacher.exp, Teacher.Other,
>> Teacher.RE)
>>   return(Teachers)}
>> Teachers <- gen.teachers(n.teachers = n.Teachers) %>%
>>   mutate(Teacher.ID = as.character(Teacher.ID))
>> # Randomly assign n.Teachers teachers to the "ValidPairs"
>> TmpAssignments <- validPairs[sample(1:nrow(validPairs), n.Teachers), ]
>> Assignments <- cbind.data.frame(Teachers$Teacher.ID, TmpAssignments)
>> names(Assignments) <- c("Teacher.ID", "School.ID", "Grade", "Class_1",
>> "Class_2")
>> # Tidy Data
>> library(tidyr)
>> TeacherClassroom <- Assignments %>%
>>   gather(x, Classroom, Class_1,Class_2) %>%
>>   select(-x) %>%
>>   mutate(Teacher.ID = as.character(Teacher.ID))
>> # Merge
>> DF_Classrooms <- TeacherClassroom %>% full_join(Teachers,
>> by="Teacher.ID") %>% full_join(Schools, by="School.ID")
>> rm(list=setdiff(ls(), "DF_Classrooms")) # Clean the work space!
>>
>> *I want to end up with the same*  'DF_Classrooms *data frame* but getting
>
>
>> there in a more efficient way. In particular, when is use n.Classrooms
>> <-4 the
>>
> code run fast, but *if I increase it to something like 20 it is painfully
>> slow.*
>>
>> Thanks!!!
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

	[[alternative HTML version deleted]]