[R] Irregular time series frequencies
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Thu Oct 31 08:48:07 CET 2013
On Wed, 30 Oct 2013, sartene at voila.fr wrote:
> Hi everyone,
>
> I have a data frame with email addresses in the first column and in the second column a list of times (of different lengths) at which an email was sent from the
> user in the first column.
>
> Here is an example of my data:
>
> Email Email_sent
> john at doe.com "2013-09-26 15:59:55" "2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54"
> jane at shoe.com "2013-09-26 09:50:28" "2013-09-26 14:41:24" "2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02" "2013-09-27 14:41:10" "2013-09-27 15:37:36"
> ...
>
> I cannot find any way to calculate the frequencies between each email sent for each user:
> john at doe.com 0.02 email / hour
> jane at shoe.com 0.15 email / hour
> ...
>
> Can anyone help me on this problem?
You could do something like this:
## scan your data file (uncomment and replace <yourfile> with its path)
## d <- scan("<yourfile>", what = "character")
## here I use the data from above
d <- scan(textConnection('john@doe.com "2013-09-26 15:59:55"
"2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54"
jane@shoe.com "2013-09-26 09:50:28" "2013-09-26 14:41:24"
"2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02"
"2013-09-27 14:41:10" "2013-09-27 15:37:36"'), what = "character")
## find position of e-mail addresses
n <- grep("@", d, fixed = TRUE)
## extract list of dates
## (append an end position so that the last block of dates is delimited too)
n <- c(n, length(d) + 1)
x <- lapply(seq_len(length(n) - 1),
  function(i) as.POSIXct(d[(n[i] + 1):(n[i + 1] - 1)]))
## add e-mail addresses as names
names(x) <- d[head(n, -1)]
## functions that could extract quantities of interest such as
## number of mails per hour or mean time difference etc.
meantime <- function(timevec)
  mean(as.numeric(diff(timevec), units = "hours"))
numperhour <- function(timevec)
  length(timevec) / as.numeric(diff(range(timevec)), units = "hours")
## apply to full list
sapply(x, numperhour)
sapply(x, meantime)
## apply to list by date
sapply(x, function(timevec) tapply(timevec, as.Date(timevec), numperhour))
sapply(x, function(timevec) tapply(timevec, as.Date(timevec), meantime))
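One caveat with the by-date calls above: a day with a single e-mail has a zero time range, so numperhour() gives Inf and meantime() gives NaN for it (john's 2013-09-26 in the example data). A minimal sketch of a guarded variant could look as follows. The name rate_per_hour is hypothetical, counting intervals (n - 1) rather than e-mails is an assumption about what "frequency between each email" should mean, and tz = "UTC" is fixed only to make the example reproducible.

```r
## Hypothetical helper: intervals per hour between first and last e-mail.
## Returns NA when there are fewer than two time stamps or no elapsed time.
rate_per_hour <- function(timevec) {
  if (length(timevec) < 2) return(NA_real_)
  span <- as.numeric(difftime(max(timevec), min(timevec), units = "hours"))
  if (span == 0) return(NA_real_)
  (length(timevec) - 1) / span
}

## john's e-mails from the example, time zone fixed for reproducibility
john <- as.POSIXct(c("2013-09-26 15:59:55", "2013-09-27 09:48:29",
                     "2013-09-27 10:00:02", "2013-09-27 10:12:54"),
                   tz = "UTC")

## split by calendar day (in the same time zone) and apply the helper
john_by_day <- sapply(split(john, format(john, "%Y-%m-%d")), rate_per_hour)
print(john_by_day)
```

With this variant the single-e-mail day shows up as NA instead of Inf, which is easier to filter or average over later.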
hth,
Z
> The ultimate goal (which seems ambitious at this time) is to calculate, for each user, the frequencies between each mail per day, between the first email sent
> and the last email sent each day (to avoid taking nights into account), i.e.:
>
> 2013-09-26 2013-09-27
> john at doe.com 1.32 emails / hour 0.56 emails / hour
> jane at shoe.com 10.57 emails / hour 2.54 emails / hour
> ...
>
> At this time it seems pretty impossible, but I guess I will eventually find a way :-)
>
> Thanks a lot,
>
>
> Sartene Bel
> R learner
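
The per-day table asked for above (users in rows, days in columns) can be laid out with a two-way tapply(). This is only a sketch under stated assumptions: the sample data are typed in as a named list, the time zone is fixed to UTC so that day boundaries are reproducible, and the object names (emails, long, tab) are made up for illustration. It uses the same e-mails-per-hour definition as numperhour() in the reply, so a cell with a single e-mail comes out as Inf, and a user with no mail on a given day would get NA.

```r
## sample data from the question as a named list (assumed layout)
emails <- list(
  "john@doe.com" = c("2013-09-26 15:59:55", "2013-09-27 09:48:29",
                     "2013-09-27 10:00:02", "2013-09-27 10:12:54"),
  "jane@shoe.com" = c("2013-09-26 09:50:28", "2013-09-26 14:41:24",
                      "2013-09-26 14:51:36", "2013-09-26 17:50:10",
                      "2013-09-27 13:34:02", "2013-09-27 14:41:10",
                      "2013-09-27 15:37:36")
)

## long format: one row per e-mail sent
long <- data.frame(
  email = rep(names(emails), lengths(emails)),
  sent = as.POSIXct(unlist(emails, use.names = FALSE), tz = "UTC"),
  stringsAsFactors = FALSE
)
long$day <- format(long$sent, "%Y-%m-%d")

## e-mails per hour between first and last e-mail, as in numperhour()
numperhour <- function(timevec)
  length(timevec) / as.numeric(diff(range(timevec)), units = "hours")

## users in rows, days in columns
tab <- tapply(seq_len(nrow(long)), list(long$email, long$day),
  function(i) numperhour(long$sent[i]))
print(round(tab, 2))
```

Working on row indices rather than on the time stamps themselves keeps the POSIXct class intact inside each cell, and the two grouping vectors give the matrix layout directly.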