7

Converting Bitmasks to Text Arrays in PostgreSQL

 3 years ago
source link: https://rudism.com/converting-bitmasks-to-text-arrays-in-postgresql/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Converting Bitmasks to Text Arrays in PostgreSQL

2017-04-04 (posted in blog)

The Problem

You have some data encoded into a bitmask in an int column, and you’d like to decode it into discreet values in a text[] column instead.

For example, let’s say you’re working with the following enumeration in C#:

1
2
3
4
5
6
7
8
9
10
[Flags]
public enum SnorkAccessories
{
Snorkel = 0x1,
Belt = 0x2,
Shirt = 0x4,
Pants = 0x8,
Hair = 0x10,
Bows = 0x20
}

…and you have a Snork entity that looks something like this:

1
2
3
4
5
6
7
public class Snork
{
public Guid Id { get; set; }
public string Name { get; set; }
public SnorkColor Color { get; set; }
public SnorkAccessories Accessories { get; set; }
}

Until now, the Accessories column of your Snorks table in the database has been an integer, and you’ve populated and used your Snork entities with code along these lines:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Snork mySnork = null;

if(reader.Read())
{
mySnork = new Snork
{
Id = (Guid) reader.Current["Id"],
Name = (string) reader.Current["Name"],
Color = (SnorkColor) Enum.GetValue(typeof(SnorkColor), (string) reader.Current["Color"]),
Accessories = (SnorkAccessories) ((int) reader.Current["Accessories"])
};
}

// test if mySnork has a belt
return mySnork != null
&& mySnork.Accessories.HasFlag(SnorkAccessories.Hat);

You’ve since decided that storing your SnorkAccessories values as bitmasks was a bad idea—it’s hard to write ad-hoc queries against, and the data can’t easily be shared with other systems that don’t have knowledge of the C# enumeration and its values. So you’ve decided to migrate that data into a text[] column instead. Previously, a Snork with Shirt and Pants would have an Accessories value in the database of 0x4 | 0x8 = 0xC. In the new world, your Snork accessories would be stored in a PostgreSQL array as {"Shirt", "Pants"}.

The problem is, how do you translate your old int values into text[] values as painlessly as possible, ideally without having to write external processors to manipulate the data?

Exploring the Bitmask

The first problem is how to go about parsing the bitmask in PostgreSQL. Your first impulse might be to try something like this:

1
2
-- get all snorks with hair
SELECT * FROM snorks WHERE accessories & 0x10 > 0; -- DOESN'T WORK

You’ll note right away that PostgreSQL doesn’t seem to understand the 0xN hexadecimal notation. The way you actually do that in PostgreSQL is like this:

1
2
-- get all snorks with hair
SELECT * FROM snorks WHERE accessories & x'10'::int > 0; -- WORKS

That’s all fine and dandy, but when we’re writing our script it will be hard to keep track of Snorkel = x'1'::int, Belt = x'2'::int, and so on. It would be nicer if there was a way to define named variables or something. One way to do this is to declare variables with the \set command, but that will only work if you’re processing your script with the psql tool—it won’t work in a query run directly against the server. Another handy trick is to use a Common Table Expression to declare your values and cross join that with the rest of your query. Take this example:

1
2
3
4
5
6
7
8
9
WITH accessory AS (
SELECT
x'1'::int AS snorkel,
x'2'::int AS belt,
x'4'::int AS shirt,
...
)
SELECT * FROM accessory CROSS JOIN snorks
WHERE snork.accessories & accessory.shirt > 0;

It’s unnecessary, to be sure, but much easier to read and understand.

Creating The Text Array

Now that we have a way to parse the existing integer values, how do we go about converting it into a text array? The first step, I suppose, would be to create the new column. Let’s rename the old one as well:

1
2
ALTER TABLE snorks RENAME COLUMN accessories TO _accessories;
ALTER TABLE snorks ADD COLUMN accessories text[];

We’ve now got our old integer value available to us in the _accessories column, and a brand new accessories column ready to store our text array version of the same data. For sanity’s sake, let’s initialize the new column and add a not null constraint, since we don’t want to have to deal with null checks everywhere in our code when reading and writing Snork entities in the database:

1
2
UPDATE snorks SET accessories='{}';
ALTER TABLE snorks ALTER COLUMN accessories SET NOT NULL;

With that out of the way, it’s not too hard to imagine how we could start writing UPDATE statements to populate that new accessories column. We’ll make use of PostgreSQL’s array concatenation operator ||:

1
2
3
WITH accessory AS (...)
UPDATE snorks SET accessories = accessories || ARRAY['Snorkel']
FROM accessory WHERE _accessories & accesory.snorkel > 0;

That statement will append 'Snorkel' to the accessories text array for every row in the snorks table with the corresponding bit in its _accessories bitmask. Nice!

Except it’s kind of lamesauce to do a whole new UPDATE statement for each possible value of our enumeration. Especially if we declare the accessory CTE for each one. In fact it kind of defeats the purpose of the CTE in the first place (you’d only be using one value from it per UPDATE). Instead, let’s try doing all the UPDATE statements in one go. We can make use of the fact that concatenating an array with null returns the original array. PostgreSQL won’t let you make multiple assignments to the same column in one statement, but you can chain together as many concatenations as you want:

1
2
3
4
5
6
7
8
9
WITH accessory AS (...)
UPDATE snorks SET accessories = accessories
|| CASE WHEN _accessories & accessory.snorkel > 0 THEN ARRAY['Snorkel'] ELSE null END
|| CASE WHEN _accessories & accessory.belt > 0 THEN ARRAY['Belt'] ELSE null END
|| CASE WHEN _accessories & accessory.shirt > 0 THEN ARRAY['Shirt'] ELSE null END
|| CASE WHEN _accessories & accessory.pants > 0 THEN ARRAY['Pants'] ELSE null END
|| CASE WHEN _accessories & accessory.hair > 0 THEN ARRAY['Hair'] ELSE null END
|| CASE WHEN _accessories & accessory.bows > 0 THEN ARRAY['Bows'] ELSE null END
FROM accessory;

Once that statement runs, your new accessories array will now contain values that correspond exactly to the old bitmask. Now you can save your work, commit it all, and run it on your production database. Bam. Done. Oh except for one minor, pesky, basically insignificant thing you might still want to do…

Updating Your Code

Since the accessories column in your table has changed from an int to a text[], your code to populate your Snork entities from the database will need to be updated. Before, it was enough to cast the int value into your SnorkAccessories enum and use HasFlag on the property when you needed to test for a given accessory. Presumably, you want to retain the HasFlag capability, as opposed to changing your Snork.Accessories property into a SnorkAccessories[] array.

The following code describes how that could be accomplished. It assumes that the ORM you’re using reads the text[] column into a string array:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Snork mySnork = null;

if(reader.Read())
{
mySnork = new Snork
{
Id = (Guid) reader.Current["Id"],
Name = (string) reader.Current["Name"],
Color = (SnorkColor) Enum.GetValue(typeof(SnorkColor), (string) reader.Current["Color"]),
Accessories = (SnorkAccessories) 0 // initialize as empty
};
var accessories = (string[]) reader.Current["Accessories"];
foreach(var accessory in accessories)
{
mySnork.Accessories |= (SnorkAccessories) Enum.GetValue(typeof(SnorkAccessories), accessory);
}
}

// test if mySnork has a belt
return mySnork != null
&& mySnork.Accessories.HasFlag(SnorkAccessories.Belt);

If you wanted to get fancy, you could replace the foreach loop with a Linq aggregation:

1
2
3
4
5
6
7
mySnork = new Snork
{
...
Accessories = ((string[]) reader.Current["Accessories"])
.Aggregate((SnorkAccessories) 0, (agg, curr) =>
agg | (SnorkAccessories) Enum.GetValue(typeof(SnorkAccessories), curr))
};

The principle behind how you’d need to modify your code to write your entity back to the database would be similar, which I’ll leave as an exercise up to you. Grading will not be on a curve, and I expect your completed assignments to be handed in before midnight on Friday.

Also, I am aware that there is an apparent “bug” in my example code, in which a comment indicates my intent to check my Snork for a belt, but instead I’m checking for a hat, which doesn’t even exist in the SnorkAccessories enumeration. If you caught that, you get bonus points. Originally my examples all used Smurfs instead of Snorks, but it occurred to me that the Smurfs have been ruined forever by contemporary movie re-imaginings, so I’m switching to a different franchise from my childhood that is less likely to make a similar horrifying re-emergence into the mainstream.

Short Permalink for Attribution: rdsm.ca/xtn04

Comments

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK