
JSLT: An open source language for JSON processing

 5 years ago
source link: https://www.tuicool.com/articles/hit/vmA3qm2

JSLT is a language for querying and transforming JSON that runs 9 billion transforms every day at Schibsted. We are now open sourcing it so anyone can use it.

Every day, Schibsted’s data platform receives roughly 800 million events in many different JSON formats. The platform has to route the events to the correct destinations, clean up bad data, anonymize events, convert to new formats, and so on. And since this processing is done on behalf of 40 different subsidiary companies, people in all those companies have to be able to understand and change the parts of the logic that are relevant to them.

The platform is written in Scala, but many of the people working with this configuration are not Scala developers, and some are not even developers at all. So we created JSLT, a language for JSON processing that’s much easier than a full programming language, and use it to implement the routing and transformations.

The simplest way to use JSLT is for queries. If you write .foo you will get the value of the “foo” key inside the outermost object. .foo.bar will get the “bar” key inside the “foo” object. And so on.
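To make the lookup rule concrete, here is a minimal Python sketch of what a dotted query does to a parsed JSON document. It is not how JSLT is implemented: the helper name `query` and the sample document are made up for illustration, the sketch assumes a missing key simply yields no value (Python `None`), and it does not handle quoted keys.

```python
import json

def query(doc, path):
    """Follow a dotted path such as ".foo.bar" through nested dicts."""
    current = doc
    # ".foo.bar" -> ["foo", "bar"]; a bare "." -> [] (the document itself)
    keys = [key for key in path.strip(".").split(".") if key]
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None  # assumption: a missing key yields no value
        current = current[key]
    return current

doc = json.loads('{"foo": {"bar": 42}, "baz": "hello"}')
print(query(doc, ".foo"))       # the whole "foo" object
print(query(doc, ".foo.bar"))   # the "bar" value inside "foo"
print(query(doc, ".nope.bar"))  # nothing to find, so None
```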

To test whether a tracking event is from a logged-in user, and therefore should be included in the identified data set, we just write .actor."spt:userId". If this returns a value, the event is included; if not, it is excluded. All of the routing is implemented with simple tests like these, although some are more complicated, such as:

get-client(.) == "vgpluss" and .provider.component == "chameleon"
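The idea of routing by test can be sketched in plain Python as follows. This only mirrors the simpler `.actor."spt:userId"` test: the function names, the two destination lists, and the sample events are hypothetical, and the sketch treats "returns a value" as "the key exists and is not null".

```python
def is_identified(event):
    """True when the event carries a user id under actor."spt:userId"."""
    actor = event.get("actor")
    if not isinstance(actor, dict):
        return False
    return actor.get("spt:userId") is not None

def route(events):
    """Split events into (identified, excluded) using the test above."""
    identified, excluded = [], []
    for event in events:
        (identified if is_identified(event) else excluded).append(event)
    return identified, excluded

events = [
    {"actor": {"spt:userId": "user-123"}},  # logged-in user
    {"actor": {}},                          # no user id
    {"object": {"url": "https://example.com/"}},  # no actor at all
]
identified, excluded = route(events)
print(len(identified), len(excluded))
```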

A simple transform

The examples so far show JSLT as a simple query language that extracts data from JSON objects. Once you have a query language, you have nearly everything you need for a transformation language, because the output can be written in JSON itself. So JSLT transforms are basically JSON templates with JSLT queries in them.

Let’s say we want to make a new data set of tracking data which is dramatically simplified: we want only the page URL and the time the user saw the page. To write this transform in JSLT we specify the static bits using JSON syntax, then insert the values we want, like this:

{
  "page-url" : .object.url,
  "time-viewed" : .published
}

That’s it.
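For readers who think in general-purpose code, the template above corresponds roughly to the following Python function. This is a sketch of the behaviour rather than of JSLT itself: the name `simplify` and the sample event are invented, and dropping keys whose value is null is an assumption based on the anonymization example later in the article, where setting a key to null removes it.

```python
def simplify(event):
    """Build the two-key output described by the template."""
    obj = event.get("object")
    result = {
        "page-url": obj.get("url") if isinstance(obj, dict) else None,
        "time-viewed": event.get("published"),
    }
    # assumption: keys whose value is null are left out of the output
    return {key: value for key, value in result.items() if value is not None}

event = {
    "object": {"url": "https://example.com/article/1", "title": "Example"},
    "published": "2018-06-01T12:00:00Z",
    "actor": {"spt:userId": "user-123"},
}
print(simplify(event))
```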

Object matching

Let’s try something more ambitious. Let’s say we want to anonymize the tracking data to create a data set that’s less useful, but that we can retain longer, and that more people can get access to. For that we need to mostly keep the event as it is, but we want to modify some fields.

JSLT supports this through “object matching”, where you can write something like this:

{
  "foo" : .compute.value,
  * : .
}

The star will copy all the keys that haven’t been defined and keep their original values. So this transform will change the “foo” key, but leave everything else untouched. With this we can easily do our anonymization.
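The copy rule can be illustrated with a small Python sketch. The helper name `with_matcher` and the sample input are hypothetical, and the sketch only covers the flat case described here: explicitly defined keys win, and every other key of the input object is carried over unchanged.

```python
def with_matcher(source, defined):
    """Emulate `* : .`: keep the defined keys, copy all remaining keys as-is."""
    result = dict(defined)
    for key, value in source.items():
        if key not in defined:
            result[key] = value
    return result

source = {"foo": 1, "bar": 2, "compute": {"value": 99}}
# Equivalent of: { "foo" : .compute.value, * : . }
output = with_matcher(source, {"foo": source["compute"]["value"]})
print(output)
```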

let salt = "long-random-string" // define variable used in hashing

{
  "actor" : {
    "spt:userId" : if ( .actor."spt:userId" )
                     sha256( $salt + .actor."spt:userId" ),
    "spt:remoteAddress" : null, // drop the IP address
    * : .
  },
  * : .
}
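The effect of this transform on a single event can be approximated in Python with the standard `hashlib` module. This is a hedged sketch, not the platform's code: the function name `anonymize` and the sample event are made up, and it assumes that `sha256` hashes the UTF-8 bytes of the salted string and returns a hex digest, which the article does not state.

```python
import hashlib

SALT = "long-random-string"  # same placeholder salt as in the transform

def anonymize(event):
    """Hash the user id, drop the IP address, keep everything else."""
    result = dict(event)  # outer `* : .`
    actor = event.get("actor")
    if isinstance(actor, dict):
        new_actor = dict(actor)  # inner `* : .`
        new_actor.pop("spt:remoteAddress", None)  # null value drops the key
        user_id = actor.get("spt:userId")
        if user_id is not None:
            salted = (SALT + str(user_id)).encode("utf-8")
            # assumption: the hash is emitted as a hex string
            new_actor["spt:userId"] = hashlib.sha256(salted).hexdigest()
        else:
            new_actor.pop("spt:userId", None)
        result["actor"] = new_actor
    return result

event = {
    "actor": {
        "spt:userId": "user-123",
        "spt:remoteAddress": "192.0.2.10",
        "spt:userAgent": "ExampleBrowser/1.0",
    },
    "published": "2018-06-01T12:00:00Z",
}
anonymized = anonymize(event)
print(sorted(anonymized["actor"].keys()))
```

Because the same salt always produces the same hash for a given user id, events from one user can still be grouped in the anonymized data set without exposing the original id.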

More complicated things

JSLT can also transform arrays, rewrite objects, and do quite complicated transforms. You can write your own functions in Java and plug them into the language. In fact, you can even define your own functions inside the language. We are cleaning up bad data by doing complicated string matching logic and so on in JSLT itself.

We’ve found JSLT to be very useful for our application, and we think more people might have uses for it, which is why we are releasing it as open source. The implementation is written in pure Java, and only depends on Jackson, so it should be easy to include in your own projects.

If you’re interested,

