17

The Value of Bitemporality

 5 years ago
source link: https://www.tuicool.com/articles/hit/AFrIJz6
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Most of us might have coded up the humble database column updated_date , at some point or another, giving us a feeling that data and time are nearly always intrinsically linked in our database designs. Along with updated_date , we may also find ourselves needing created_date , to preserve an immutable record of the row’s origin.

VjIrmue.png!web

SQL Table with updated_date and created_date

Sometimes we will need to keep the historical versions of entities around also, either for audit reasons or to serve business requirements where historical data can inform future decisions.

QJfmAzE.png!web

SQL Table with valid_from and valid_to

In table above we maintain versions of entities by using valid_from and valid_to . This allows us to navigate to a particular version of an entity that was live at a given time.

This approach is pragmatic and has benefits, but it adds complexity to the database design. It also relies on updating database rows to set the date columns, which hampers future efforts to work with the database as an immutable store.

Temporal Databases

Temporal databases aim to make our programming lives easier around time, by baking time itself into the engine. One major feature of temporal databases is the ability to query the database as of a particular point in time.

The SQL:2011 specification gives syntax for this:

SELECT... AS OF SYSTEM TIME

To facilitate this, the database itself maintains a transaction time (aka system time ) for each transaction, and it is always increasing so that the past is immutable. This allows you to query the database at any point on the transaction time axis, to see what the database looked like in a given point in history.

The transaction time of a database fact is the time when the fact is current in the database and may be retrieved

— Glossary of Temporal Database Concepts

transaction time is the time at which the database sees incoming data. This gives an inherent audit log, maintaining an immutable history of database transactions.

Temporal databases may also offer the ability to get the history of entities, or to query using temporal predicates (i.e. find me all cases when a field was changed in a certain time span ). They may also offer support for time-series investigations to help analyse out trends and to make predictions.

An example use-case of where as of may be useful is in financial services, where regulatory regimes demand that financial trades can be reconstructed to show how they were priced, based on the available market data at the time.

To develop this kind of trade-reconstruction functionality without temporal functionality such as as of , is surprisingly difficult to do. To get around this, data is often copied and put aside, with an increased operations management overhead.

If our databases have temporal functionality then we can simply query the database as of the trade creation time, to discover why the trade was priced the way it was.

Whose time is it anyway?

If your database is the only database in the mix and is therefore the singular source of truth, then transaction time could be all that matters.

What happens however, if your database is fed from some upstream source, which is a common occurrence in large enterprises?

In systems with substantial data, the database being queried against is often a materialised view formed off the back of an upstream event-log, and it’s the time when the data entered the event-log that we care most about. It’s this time axis that we may want to query against using techniques such as as of , rather than transaction time .

This additional timestamp field - originating upstream - could represent the real truth of when a fact was made a reality, not just when it was transacted.

Bitemporality is about the introduction of a second time axis, called valid time .

The valid time of a fact is the time when the fact is true in the modelled reality

— Glossary of Temporal Database Concepts

Valid Time

With bitemporality , the powerful things you can do in a temporal database against a single time axis of transaction time , you can do against two axes of time: transaction time and valid time .

In the case of a database being fed from some upstream source, the immediate database would still own and govern transaction time , but valid time would be passed to it in each transaction.

This would allow programmers to perform temporal queries such as as of against valid time, which is a more typical business use-case

Immutability of the database will still be preserved, because transaction time is always increasing and you can’t update or erase the transaction time past.

When performing temporal queries, programmers will be able to query as of a bitemporal co-ordinate, i.e. to be shown the state of the world at a point in valid time , but also at a point in transaction time past. This allows for query consistency, factoring in both time axes.

Updating the Past

valid time has a fundamental difference to transaction time , in that you can insert into the past.

In the messy world of enterprise IT, data is frequently moved about wholesale, be it data-centre migrations to the cloud, compacting, backups, ETL jobs to provide materialised views…​ data is nearly always on the move.

There may be times when we want to do bulk-loads into the past, and if we’re reliant purely on a single monotonically increasing time-axis, then this is going to be a challenge, especially if we want to preserve transaction time as something meaningful to query against. Add in the potential need for parallel data-writing to speed up ingestion, then the problem is exacerbated.

valid time is more flexible. You can insert into the past of valid time , and therefore it doesn’t matter in which order transactions are committed.

This also helps for global data topologies; if your database is being fed transactions from upstream sources, it might be inescapable that transactions will occasionally arrive out of order.

valid time also opens up the possibility of sharding for horizontal scaling, whereby different data-stores might have differing transaction times for when facts were transacted, but then you could achieve query consistency using valid time .

N-Temporality?

Why would you stop at two time axes? Why not go for three, or four, or N many?

Firstly, bitemporality is a significant increment as it allows for correcting against the past. Now you can have your immutable data cake and eat it.

Secondly, it’s helpful to see both transaction time and valid time as the essential building blocks for creating temporal data-models.

With transaction time you achieve immutability and you have the rock solid audit log. Should you then need an additional domain time, you have options for modelling your data using independent facts with their own valid time , rather than overloading multiple time fields onto a single fact or entity.

It isn’t trivial, but with two dimensions you can model your data to reflect views of time for different business use-cases.

Conclusion

Bitemporality gives programmers more control over time. It’s useful for query, but it’s also important for the messy world that we live in, where data is moved around and we can’t guarantee strict transactional ordering.

Having two time dimensions allows us to take more advantage of the features a temporal database offers.

JUXT is releasing a new open-source bitemporal database calledCrux, on April 19, 2019 at Clojure/north .


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK