Breaking timeseries data into sessions

October 26, 2014/0/0
Home / Breaking timeseries data into sessions / Breaking timeseries data into sessions

Edit: Since writing this I’ve found lots of ways to simplify this solution, for instance making use of the lag function in MS SQL, or MySQL solutions using variables, but I’ll leave this here anyway!

Problem:

  • You have a lot of data about users interacting with your site, with a UserID and timestamp for each transaction.
  • However, you want to break this down into seperate user browsing sessions.
  • You can’t use any normal session variables from the logs, because they get recycled if the user logs on a second time later in the same day.
  • For a simple first approximation, you want to apply the rule that if a user hasn’t had any transactions for 30 minutes, they’ve finished their session.
  • You have a lot of data and you need a set-wise SQL solution.

The problem is actually similar to the ‘islands and gaps’ sequence problem, which has lots of solutions online, but it’s a tad harder because you can’t use some of the properties of sequence data. The main trick is that it’s much easier to find the big gaps between sessions than it is to find continuous sessions. So the solution below starts by finding the big gaps, and then converts these gaps to get the ‘islands’ of continuous activity, and finally grabs a couple of stats for the sake of demonstration.

Navigation

Add comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.