Converting Bitmasks to Text Arrays in PostgreSQL
source link: https://rudism.com/converting-bitmasks-to-text-arrays-in-postgresql/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Converting Bitmasks to Text Arrays in PostgreSQL
2017-04-04 (posted in blog)The Problem
You have some data encoded into a bitmask in an int
column, and you’d like to decode it into discreet values in a text[]
column instead.
For example, let’s say you’re working with the following enumeration in C#:
1
2
3
4
5
6
7
8
9
10
[Flags]
public enum SnorkAccessories
{
Snorkel = 0x1,
Belt = 0x2,
Shirt = 0x4,
Pants = 0x8,
Hair = 0x10,
Bows = 0x20
}
…and you have a Snork
entity that looks something like this:
1
2
3
4
5
6
7
public class Snork
{
public Guid Id { get; set; }
public string Name { get; set; }
public SnorkColor Color { get; set; }
public SnorkAccessories Accessories { get; set; }
}
Until now, the Accessories
column of your Snorks
table in the database has been an integer, and you’ve populated and used your Snork
entities with code along these lines:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Snork mySnork = null;
if(reader.Read())
{
mySnork = new Snork
{
Id = (Guid) reader.Current["Id"],
Name = (string) reader.Current["Name"],
Color = (SnorkColor) Enum.GetValue(typeof(SnorkColor), (string) reader.Current["Color"]),
Accessories = (SnorkAccessories) ((int) reader.Current["Accessories"])
};
}
// test if mySnork has a belt
return mySnork != null
&& mySnork.Accessories.HasFlag(SnorkAccessories.Hat);
You’ve since decided that storing your SnorkAccessories
values as bitmasks was a bad idea—it’s hard to write ad-hoc queries against, and the data can’t easily be shared with other systems that don’t have knowledge of the C# enumeration and its values. So you’ve decided to migrate that data into a text[]
column instead. Previously, a Snork
with Shirt
and Pants
would have an Accessories
value in the database of 0x4 | 0x8 = 0xC
. In the new world, your Snork
accessories would be stored in a PostgreSQL array as {"Shirt", "Pants"}
.
The problem is, how do you translate your old int
values into text[]
values as painlessly as possible, ideally without having to write external processors to manipulate the data?
Exploring the Bitmask
The first problem is how to go about parsing the bitmask in PostgreSQL. Your first impulse might be to try something like this:
1
2
-- get all snorks with hair
SELECT * FROM snorks WHERE accessories & 0x10 > 0; -- DOESN'T WORK
You’ll note right away that PostgreSQL doesn’t seem to understand the 0xN
hexadecimal notation. The way you actually do that in PostgreSQL is like this:
1
2
-- get all snorks with hair
SELECT * FROM snorks WHERE accessories & x'10'::int > 0; -- WORKS
That’s all fine and dandy, but when we’re writing our script it will be hard to keep track of Snorkel = x'1'::int
, Belt = x'2'::int
, and so on. It would be nicer if there was a way to define named variables or something. One way to do this is to declare variables with the \set
command, but that will only work if you’re processing your script with the psql
tool—it won’t work in a query run directly against the server. Another handy trick is to use a Common Table Expression to declare your values and cross join that with the rest of your query. Take this example:
1
2
3
4
5
6
7
8
9
WITH accessory AS (
SELECT
x'1'::int AS snorkel,
x'2'::int AS belt,
x'4'::int AS shirt,
...
)
SELECT * FROM accessory CROSS JOIN snorks
WHERE snork.accessories & accessory.shirt > 0;
It’s unnecessary, to be sure, but much easier to read and understand.
Creating The Text Array
Now that we have a way to parse the existing integer values, how do we go about converting it into a text array? The first step, I suppose, would be to create the new column. Let’s rename the old one as well:
1
2
ALTER TABLE snorks RENAME COLUMN accessories TO _accessories;
ALTER TABLE snorks ADD COLUMN accessories text[];
We’ve now got our old integer value available to us in the _accessories
column, and a brand new accessories
column ready to store our text array version of the same data. For sanity’s sake, let’s initialize the new column and add a not null
constraint, since we don’t want to have to deal with null checks everywhere in our code when reading and writing Snork
entities in the database:
1
2
UPDATE snorks SET accessories='{}';
ALTER TABLE snorks ALTER COLUMN accessories SET NOT NULL;
With that out of the way, it’s not too hard to imagine how we could start writing UPDATE
statements to populate that new accessories
column. We’ll make use of PostgreSQL’s array concatenation operator ||
:
1
2
3
WITH accessory AS (...)
UPDATE snorks SET accessories = accessories || ARRAY['Snorkel']
FROM accessory WHERE _accessories & accesory.snorkel > 0;
That statement will append 'Snorkel'
to the accessories
text array for every row in the snorks
table with the corresponding bit in its _accessories
bitmask. Nice!
Except it’s kind of lamesauce to do a whole new UPDATE
statement for each possible value of our enumeration. Especially if we declare the accessory
CTE for each one. In fact it kind of defeats the purpose of the CTE in the first place (you’d only be using one value from it per UPDATE
). Instead, let’s try doing all the UPDATE
statements in one go. We can make use of the fact that concatenating an array with null
returns the original array. PostgreSQL won’t let you make multiple assignments to the same column in one statement, but you can chain together as many concatenations as you want:
1
2
3
4
5
6
7
8
9
WITH accessory AS (...)
UPDATE snorks SET accessories = accessories
|| CASE WHEN _accessories & accessory.snorkel > 0 THEN ARRAY['Snorkel'] ELSE null END
|| CASE WHEN _accessories & accessory.belt > 0 THEN ARRAY['Belt'] ELSE null END
|| CASE WHEN _accessories & accessory.shirt > 0 THEN ARRAY['Shirt'] ELSE null END
|| CASE WHEN _accessories & accessory.pants > 0 THEN ARRAY['Pants'] ELSE null END
|| CASE WHEN _accessories & accessory.hair > 0 THEN ARRAY['Hair'] ELSE null END
|| CASE WHEN _accessories & accessory.bows > 0 THEN ARRAY['Bows'] ELSE null END
FROM accessory;
Once that statement runs, your new accessories
array will now contain values that correspond exactly to the old bitmask. Now you can save your work, commit it all, and run it on your production database. Bam. Done. Oh except for one minor, pesky, basically insignificant thing you might still want to do…
Updating Your Code
Since the accessories
column in your table has changed from an int
to a text[]
, your code to populate your Snork
entities from the database will need to be updated. Before, it was enough to cast the int
value into your SnorkAccessories
enum and use HasFlag
on the property when you needed to test for a given accessory. Presumably, you want to retain the HasFlag
capability, as opposed to changing your Snork.Accessories
property into a SnorkAccessories[]
array.
The following code describes how that could be accomplished. It assumes that the ORM you’re using reads the text[]
column into a string array:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Snork mySnork = null;
if(reader.Read())
{
mySnork = new Snork
{
Id = (Guid) reader.Current["Id"],
Name = (string) reader.Current["Name"],
Color = (SnorkColor) Enum.GetValue(typeof(SnorkColor), (string) reader.Current["Color"]),
Accessories = (SnorkAccessories) 0 // initialize as empty
};
var accessories = (string[]) reader.Current["Accessories"];
foreach(var accessory in accessories)
{
mySnork.Accessories |= (SnorkAccessories) Enum.GetValue(typeof(SnorkAccessories), accessory);
}
}
// test if mySnork has a belt
return mySnork != null
&& mySnork.Accessories.HasFlag(SnorkAccessories.Belt);
If you wanted to get fancy, you could replace the foreach
loop with a Linq aggregation:
1
2
3
4
5
6
7
mySnork = new Snork
{
...
Accessories = ((string[]) reader.Current["Accessories"])
.Aggregate((SnorkAccessories) 0, (agg, curr) =>
agg | (SnorkAccessories) Enum.GetValue(typeof(SnorkAccessories), curr))
};
The principle behind how you’d need to modify your code to write your entity back to the database would be similar, which I’ll leave as an exercise up to you. Grading will not be on a curve, and I expect your completed assignments to be handed in before midnight on Friday.
Also, I am aware that there is an apparent “bug” in my example code, in which a comment indicates my intent to check my Snork for a belt, but instead I’m checking for a hat, which doesn’t even exist in the SnorkAccessories
enumeration. If you caught that, you get bonus points. Originally my examples all used Smurfs instead of Snorks, but it occurred to me that the Smurfs have been ruined forever by contemporary movie re-imaginings, so I’m switching to a different franchise from my childhood that is less likely to make a similar horrifying re-emergence into the mainstream.
Short Permalink for Attribution: rdsm.ca/xtn04
CommentsRecommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK