
Query Service Tricks in Adobe Experience Platform (Part 4: Using Adobe Defined Functions [Pathing])

Authors: Frederik Werner, Akintunde Ajayi, and Nikhil Maddirala


Frederik Werner is a German Analytics Lead and Data Scientist who works in, and writes about, Web Analytics and Online Marketing Technology. In this fourth post of a four-part series, Frederik provides an overview of and getting-started tips for Adobe Experience Platform Query Service. He shares examples of his favorite Query Service tricks, specifically for customer journey analysis use cases in Customer Journey Analytics, including how to write data back to Adobe Experience Platform.

Query Service is a feature of Adobe Experience Platform, which provides SQL analytics and AI/ML capabilities that enable enterprises to generate and operationalize customer intelligence to drive improved, personalized customer experiences.

Adobe-defined functions: Pathing

Next and previous page dimensions

One of the things still missing from Adobe Analytics Workspace is next and previous page dimensions that would allow for easy pathing analysis. There is a way to create those reports in Workspace, but can't we have them as real dimensions? Sure, with Query Service! We can use Adobe's NEXT and PREVIOUS functions for this.

Again, this is a window function that allows us to be quite flexible on what we want. The syntax looks like this:

NEXT(key, [shift, [ignoreNulls]]) OVER ([partition] [order] [frame])

or for the previous function:

PREVIOUS(key, [shift, [ignoreNulls]]) OVER ([partition] [order] [frame])

So what do we have here? With the key parameter, we set the column for which we want the next or previous value. If we want the next value from the page column, we just put that column there. We can even set how far we want to go into the past or future by modifying the shift parameter from the default value of 1. The partitioning can be modified as described above to look at a session, user, or anything else. For example, the next and 2nd next page can be created like this:

NEXT(page,1,true) OVER (PARTITION BY userid, session_nr ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value "next_page_session",
NEXT(page,2,true) OVER (PARTITION BY userid, session_nr ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value "second_next_page_session"

That’s a very convenient way to get this very useful information. Again, we don’t need to limit ourselves to sessions, users, or the page dimension. For example, why not do this on a user level for campaigns or marketing channels? The possibilities are endless!
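To make that concrete, a minimal sketch of a user-level "previous marketing channel" dimension could look like the query below. The analytics_events table name and the marketing_channel column are assumptions for illustration; the PREVIOUS call itself follows the pattern described above, partitioned by user only and looking backwards over the whole journey.

SELECT
  userid,
  timestamp,
  marketing_channel,
  PREVIOUS(marketing_channel, 1, true) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).value "previous_channel_user"
FROM analytics_events

Every event would then carry the channel that brought the user in before the current one, ready to be used like a regular dimension.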

Going crazy with Pathing

Here we are going to recreate pathing reports from the ancient Reports & Analytics within Adobe Analytics! And not only that: We are going to use Adobe Analytics Logfiles as our dataset to keep things interesting. The base dataset looks like this:

Figure 1: Sessionized Analytics Logfiles

First, I would like to know how my users move through Adobe Analytics. This can be achieved by using two Spark SQL functions:

concat_ws(' -> ',collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) "full_path_session"

What are we doing here? First, we are using collect_list() to gather a list of all the events for a session. This is, again, a window function, partitioned by userid and session.num. After collecting the actions, we use concat_ws() to glue them all together, using ' -> ' as a separator. The output contains values like this:

Figure 2: Full path reports from Query Service

We now get the complete path of actions for every hit in a session. If we left out the session.num partition criterion, we would get the same path for the whole user journey instead of just the current session!
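A minimal sketch of that user-journey variant simply drops session.num from the partition (everything else follows the example above):

concat_ws(' -> ',collect_list(event) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) "full_path_user"

Now let's modify the session-level command a bit to see what else we can do: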

concat_ws(' -> ',collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) "preceding_path_session",
concat_ws(' -> ',collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) "following_path_session"

I didn’t need to modify much here. All that changed is the ROWS BETWEEN setting in the OVER statement. By introducing this new limit, we only get the path before the current row or after the current row, so only past or future values. Now our result looks like this:

Figure 3: Preceding Path

We can see how, with each new event, our path grows by one element. The following path for the same session looks like this:

Figure 4: Following Path

This is the exact opposite. We could use it in a report to see what happened before or after a certain event. For example, we could break down our pages report by the following or preceding path to see what our users were up to before the current page. There is an even clearer way to do this (with a slightly more complicated syntax):

concat_ws(' -> ',
  collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
  'Current Event',
  collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
) "relative_path_session"

Here we combine both the preceding and following path, inserting a 'Current Event' marker between them. This gives us paths like this:

Figure 5: Relative Pathing

Here we clearly see how the Current Event moves through the path with each new event. What else can we do?

Counting and calculating

There is an awesome Adobe Defined Function to calculate the time until or since a certain event. While this is very helpful for analyzing things like “time to first conversion” or even “time to next conversion”, we will focus on something else here. With the NEXT function, we can get values from future rows of data. Based on our already sessionized dataset, it’s easy to get the Time Spent value of a row like this:

NEXT(session.timestamp_diff,1,true) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value "time_spent"

We can confirm this is working by looking at the output:

Figure 6: Time spent in Query Service

Perfect! Our function pulls the time spent by looking at the next row within a session. But looking at the depth parameter of our session, I wonder if we could do the same on a user level. Thanks to SQL, we can.

ROW_NUMBER() OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) "user_depth"

This is a new function for us. It returns the number of an event relative to the user, thanks to our window settings. We get an output like the rightmost column here:

Figure 7: User Depth

Now it’s easy for us to analyze the second thing a user ever did, no matter what session it was in! That gives me another idea: Can we know how many actions a session or user will take in total? Here is the code snippet:

attribution_last_touch(timestamp, '', session.depth) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value max_session_depth,
attribution_last_touch(timestamp, '', user_depth) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value max_user_depth

Using the attribution functions we talked about earlier, we can easily get the last value from a session or user, which looks like these two new columns:

Figure 8: Maximum of session and user depth

That was surprisingly simple. We can now analyze how far sessions or even users went into our properties. Now let’s have some fun and throw in a bit of math:

int(session.depth / max_session_depth * 100) session_depth_percentage,
int(user_depth / max_user_depth * 100) user_depth_percentage

You read that right: I just calculated the percentage of how far into a session or user journey a certain event occurred. It shows up as these two new columns:

Figure 9: Depth percentages

What does this tell us? We could now build a segment for every action in the second half of a session or user journey! We could also see if a certain action occurs more often at the beginning or end of a session, regardless of how deep the session was. And we can do the same for the actual time with just a few more columns:

attribution_first_touch(timestamp, '', timestamp) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value min_session_timestamp,
attribution_last_touch(timestamp, '', timestamp) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value max_session_timestamp,
attribution_first_touch(timestamp, '', timestamp) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value min_user_timestamp,
attribution_last_touch(timestamp, '', timestamp) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value max_user_timestamp

First, we have to get the first and last timestamp of both the session and the user journey. Based on those, we can do some math again:

int(((to_unix_timestamp(timestamp) - to_unix_timestamp(min_session_timestamp)) / (to_unix_timestamp(max_session_timestamp) - to_unix_timestamp(min_session_timestamp))) * 100) session_time_percentage,
int(((to_unix_timestamp(timestamp) - to_unix_timestamp(min_user_timestamp)) / (to_unix_timestamp(max_user_timestamp) - to_unix_timestamp(min_user_timestamp))) * 100) user_time_percentage

This returns an output like:

Figure 10: Relative time

Those two columns on the right now give us a percentage of where an event occurred relative to the session and user journey based on time. This is slightly different from the calculation based on hit depth, since hits can be unevenly distributed on the timeline of a session or journey. With these time-based calculations, we could even get the time prior to the event on both a session and user level, as well as the time after an event:

(to_unix_timestamp(timestamp) - to_unix_timestamp(min_session_timestamp)) session_time_prior_to_event,
(to_unix_timestamp(timestamp) - to_unix_timestamp(min_user_timestamp)) user_time_prior_to_event,
(to_unix_timestamp(max_session_timestamp) - to_unix_timestamp(timestamp)) session_time_after_event,
(to_unix_timestamp(max_user_timestamp) - to_unix_timestamp(timestamp)) user_time_after_event

The first two lines give us an indication of where in a visit or user journey an event occurred (similar to Time Prior to Event in Adobe Analytics), while the last two show how close to the end of a session or journey something happened. That way we can easily answer questions like “what happened in the first five minutes of a user journey?” or “what does the last minute of a converting visit look like?” and use the answers in segmentation.
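Since these derived columns are only aliases, they can't be referenced in the same SELECT that defines them; in practice you would build them up in a subquery or CTE and filter in an outer query. A minimal sketch answering the "first five minutes of a user journey" question could look like this, assuming the WITH syntax is available (a nested subquery works the same way) and using sessionized_events as a placeholder table name:

WITH enriched AS (
  SELECT
    userid,
    timestamp,
    event,
    attribution_first_touch(timestamp, '', timestamp) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value min_user_timestamp
  FROM sessionized_events
)
SELECT userid, timestamp, event
FROM enriched
WHERE to_unix_timestamp(timestamp) - to_unix_timestamp(min_user_timestamp) <= 300
ORDER BY userid, timestamp

This returns every event that happened within the first 300 seconds of each user's journey and could just as well feed a segment definition.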

Wrap up

I hope you found this article helpful. You should now be able to create some advanced fields with Query Service and write them back to an existing or new dataset. You also know how to find out what went wrong if datasets can’t be written back successfully. There will be one more subsequent blog with my other favorite tips for using Query Service.
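As a quick refresher on that write-back step (covered in more detail earlier in this series), a minimal sketch of persisting such a query as a new dataset could use CREATE TABLE AS SELECT; the pathing_enriched table name and the sessionized_events source are placeholders for illustration:

CREATE TABLE pathing_enriched AS (
  SELECT
    userid,
    timestamp,
    event,
    NEXT(event, 1, true) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value "next_event_user"
  FROM sessionized_events
)

An INSERT INTO statement with the same SELECT would append the results to an existing dataset instead.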

Will I ever return to normal Adobe Analytics after all this Query Service and Customer Journey Analytics work? Yes, of course. It’s still my main driver for day-to-day analysis. But with all the exciting information I can get out of my boring old data, I will gradually spend more and more time in Query Service and Customer Journey Analytics in the future.

This blog originally appeared on The Full Stack Analyst on 10/21/2020.

Follow the Adobe Tech Blog for more developer stories and resources, and check out Adobe Developers on Twitter for the latest news and developer products. Sign up here for future Adobe Experience Platform Meetups.
