[R] POSIX and ecdf()

Doran, Harold HDoran at air.org
Mon Mar 30 11:40:56 CEST 2015

Below is some working code that, generally speaking, accomplishes what I want, but I am looking for a necessary improvement in the final step. The code below scrapes data from a website (thousands of pages, actually) and organizes athletes’ scores in a data frame. The final variable, called Workout05 in the original data, is a timed event. So I use strsplit() to pull out the data I want in that column and format it using as.POSIXct(), as you can see in the code below (I’m sure a regular expression would be a better way to pull those data out of the column, but that is not my primary question).
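As a rough illustration of the regular-expression route mentioned above, here is a minimal sketch. It assumes the Workout05 entries look something like "12 (5:35)", with the time in parentheses; that format is only inferred from the strsplit() calls, and the `workout` vector is made-up data.

```r
## Hypothetical Workout05 values; the real column format is an assumption
workout <- c("12 (5:35)", "40 (6:10)", "--")

## Keep only entries that contain a parenthesised minutes:seconds time
has_time <- grepl("\\([0-9]+:[0-9]{2}\\)", workout)

## Capture the part between the parentheses; entries without a time become NA
score <- ifelse(has_time,
                sub("^.*\\(([0-9]+:[0-9]{2})\\).*$", "\\1", workout),
                NA_character_)
score
```

This does the extraction in one step instead of two strsplit() passes, and entries without a time end up as NA rather than as unparsed text.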

After I have all the data, I want to find the empirical CDF, so I use ecdf() on those data just as I would on other variables. The main issue I’m interested in is the final step, where you plug in a specific time to find its percentile.

## These are below in context of the real problem as well
fn <- ecdf(dat$score5)

This works, but not in the way I want. What I want is for a user to easily be able to enter their time in “lay” terms such as 5:35 and from that it would return the percentile rank.

So, I’d like something like the following to be able to work


The larger context for why I want this can be seen if you visit my web app built using shiny. I’ve built a site where athletes can build customized reports based on their performance on certain events by entering data. This specific issue is on the “get my percentile” tab, where a user can use the text input box to enter their time in the way humans typically write it; the time is then passed to the R fn() function that runs in the background and builds the plot for them.


So, my question is: how can I structure this so that a time can be entered simply as minutes:seconds (e.g., 4:52) in a text box and still return a percentile rank as I’ve described here?
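One possible direction, offered only as a sketch: convert both the scraped times and the user's text to a plain number of seconds and build the ECDF on that. The helper name `to_seconds` and the `times` vector are hypothetical, not part of the code in this post. Working in seconds also sidesteps the fact that as.POSIXct() with format "%M:%S" fills in the current date, so values parsed on different days are not comparable.

```r
## Hypothetical helper: "m:ss" text -> number of seconds (NA if malformed)
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- suppressWarnings(as.numeric(p))
    if (length(p) != 2L || anyNA(p)) NA_real_ else p[1] * 60 + p[2]
  }, numeric(1))
}

## Made-up times standing in for the scraped score column
times <- c("4:52", "5:35", "6:10", "7:03")

## ECDF on seconds rather than on POSIXct values
fn <- ecdf(to_seconds(times))

## A user types "5:35" into the text box
fn(to_seconds("5:35"))   # 0.5: half of the times are at or below 5:35
```

Since a lower time is better in a timed event, the share of athletes with a slower time would be 1 - fn(to_seconds("5:35")) under the same assumptions.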



library(XML)  # provides readHTMLTable()

i <- 1; j <- 0; division <- 1
## Build the leaderboard URL for page i and region j
url <- paste0('http://games.crossfit.com/scores/leaderboard.php?stage=5&sort=0&page=', i,
              '&division=1&region=', j,
              '&numberperpage=100&competition=0&frontpage=0&expanded=1&year=15&full=1&showtoggles=0&hidedropdowns=0&showathleteac=1&=&is_mobile=0')
tmp <- try(readHTMLTable(readLines(url), which=1, header=TRUE))
if(!is.null(dim(tmp))){ # new part here
        ## Clean up the column names and strip newlines from the cells
        names(tmp) <- gsub("\\n", "", names(tmp))
        names(tmp) <- gsub(" +", "", names(tmp))
        tmp[] <- lapply(tmp, function(x) gsub("\\n", "", x))
        tmp$region <- j
        dat <- tmp

        ## Pull the time out of the parentheses in Workout05
        aa <- strsplit(dat$Workout05, split = '\\(')
        bb <- sapply(aa, function(x) x[2])
        dat$score5 <- sapply(strsplit(bb, split = '\\)'), function(x) x[1])
        dat$score5 <- as.POSIXct(dat$score5, format="%M:%S")

        fn <- ecdf(dat$score5)
}

