6

Average time difference based on multiple criteria

 3 years ago
source link: https://www.codesd.com/item/average-time-difference-based-on-multiple-criteria.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Average time difference based on multiple criteria

advertisements

I have a dataset that includes timestamped sales information for a specific day, for multiple retailers and product type. Also, it is a multiproduct vending machine, and the transaction can be initialized a number of ways.

Here is a small sample of how it is organzied:

retailer    session    eventtype    timestamp
123         1          1            5:15:09
123         1          3            5:15:13
123         2          1            5:16:21
123         2          3            5:16:33
123         3          2            5:16:59
123         3          1            5:17:04
123         3          6            5:17:22
555         1          2            6:45:32
555         1          3            6:45:46
555         2          1            6:56:02
555         2          3            7:02:21
929         4          1            3:21:23
929         4          6            3:22:09

So, retailer 123 had 3 sessions, with a few different events within each session (money being inserted, sales being made, vouchers being redeemed, etc).

What I'd like to know is what is the mean time between sessions overall for all retailers. So, when a session changes to a new number, under the same retailer, how many seconds/minutes passed in between?

Also, I'd like to be able to play around with it- what was the stdev? Is there a difference in times between sessions depending if the session started with a 1 or 3 type event? What if I just want to completely ignore sessions that start with event type 2?

Sorry for the detailed request, but this is beyond my R capabilities and I can't seem to find good documentation on it (mainly because I'm probably not looking in the right places).

Thanks!


Here is a base R alternative.

# convert timestamp to as.POSIXct object
df$time <- as.POSIXct(paste(Sys.Date(), df$timestamp))

# create an index of session change for each retailer
df$change <- with(df, ave(session, retailer, FUN = function(x){
  c(0, diff(x))
}))

# for each retailer, calculate time difference between first time of new session
# and time of the previous row
diffs <- by(data = df, df$retailer, function(x){
  difftime(x$time[x$id == 1], x$time[which(x$id == 1) - 1], units = "secs")
})

diffs
# df$retailer: 123
# Time differences in secs
# [1] 68 26
# attr(,"tzone")
# [1] ""
# ---------------------------
#   df$retailer: 555
# Time difference of 616 secs
# ---------------------------
#   df$retailer: 929
# Time difference of  secs

# overall mean of time differences between sessions
mean(unlist(diffs))
# [1] 236.6667

Tags excel

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK