
Converting JSON Keys to Snake Case in PostgreSQL

source link: https://rudism.com/converting-json-keys-to-snake-case-in-postgresql/

2019-11-05 (posted in blog)

Well, it’s that time of the decade again when I dust off the old technical blog and write about a recent random and esoteric task that I completed for my job.

This time the problem I had involved transforming the keys of a jsonb object stored in PostgreSQL from camelCase to snake_case. The reason why isn’t super important to the details, but imagine a couple of tables like so:

+---------------------+   +-----------------------+
| my_table            |   | my_table_patches      |
+---------------------+   +-----------------------+
| pk_id int           |   | pk_id int             |
| some_value text     |   | fk_my_table_id int    |
| another_value int   |   | my_table_patch jsonb  |
+---------------------+   +-----------------------+
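In DDL terms, the diagram might correspond to something like this (a sketch; the primary-key and foreign-key constraints are assumptions inferred from the column names, not shown in the post):

```sql
-- Hypothetical DDL matching the diagram above.
create table my_table (
  pk_id int primary key,
  some_value text,
  another_value int
);

create table my_table_patches (
  pk_id int primary key,
  fk_my_table_id int references my_table (pk_id),
  my_table_patch jsonb
);
```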

This schema uses the PostgreSQL-recommended case-insensitive snake_case naming conventions. Now imagine my_table has a row with pk_id = 1, and my_table_patches has a row referencing that with something like this:

fk_my_table_id = 1
my_table_patch = {"someValue": "new patched value"}

The idea is that something would happen in the system that would “resolve” the row in my_table_patches, and a trigger would fire that should update the row in my_table with the new some_value value specified in the JSON patch under the someValue key.
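The real trigger isn’t shown in the post, but a minimal sketch of the idea might look like the following, assuming a hypothetical `resolved` flag on `my_table_patches` and patch keys that already match the snake_case column names (which is exactly the assumption the rest of this post works toward):

```sql
-- Hypothetical sketch only: the actual system's trigger, columns, and
-- "resolve" mechanism are not described in the post.
create or replace function apply_my_table_patch() returns trigger
as $$
begin
  update my_table
     set some_value = coalesce(new.my_table_patch->>'some_value', some_value),
         another_value = coalesce((new.my_table_patch->>'another_value')::int, another_value)
   where pk_id = new.fk_my_table_id;
  return new;
end;
$$ language plpgsql;

create trigger trg_apply_my_table_patch
  after update of resolved on my_table_patches
  for each row when (new.resolved)
  execute function apply_my_table_patch();
```

(`execute function` requires PostgreSQL 11+; older versions spell it `execute procedure`.)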

You might be asking yourself a couple of very reasonable questions right about now. Firstly, why store the patch as JSON instead of discrete columns in my_table_patches? Because of reasons, that’s why. Actually, in the model I’m working with, the “patch” table can reference any of several other tables and the patch data can reference columns in any of those, so a one-to-one column mapping for updated data would introduce a lot of ungainly overhead into the model.

The next very reasonable question you might be asking is why is the patch stored as camelCase anyway? Why not just store the patches with snake_case keys to match the schema and vastly simplify the dynamic SQL needed to update my_table? Well, the answer to that involves a Postgraphile API that automatically translates snake_cased PostgreSQL schemas into camelCased Graphile models, and a front-end that writes subsets of those Graphile models back into the patch table directly.

If we just needed to apply the patch in a single mutation (that’s the GraphQL term for an API endpoint that modifies data, I think), it wouldn’t be a problem because we could write that custom mutation as a Postgraphile extension and let it handle the conversions between camelCase and snake_case for us the same way it does with everything else. In this case, however, it made much more sense to put the patch logic into an SQL trigger, meaning it had to be written in pure SQL (or PL/pgSQL) without the benefit of Postgraphile’s serialization magic.

And so began my odyssey into the wonderful world of PostgreSQL’s json manipulation and introspection capabilities. As a starting point, here was (roughly) what I threw down as an initial pass for the proof of concept:

create or replace function json_keys_to_snake_case(
  p_json jsonb
) returns jsonb
as $$
declare
  t_json text;
  t_match text[]; -- regexp_matches returns an array per match
  t_snake text;
begin
  t_json := p_json::text;
  for t_match in (select regexp_matches(t_json, '("([a-zA-Z]+)":)', 'g')) loop
    t_snake := lower(regexp_replace(t_match[2], '([A-Z])', '_\1'));
    t_json := replace(t_json, t_match[1], '"' || t_snake || '":');
  end loop;
  return t_json::jsonb;
end;
$$ language plpgsql stable;

This function converts the entire JSON value to a string, then uses a regular expression to find everything in that string that looks like a JSON property name (a quote followed by some string of letters followed by another quote and a colon). For each of those property-looking things it extracts the name from between the quotes, runs another regular expression that inserts an underscore before any capital letters in the name, and then swaps the lowercase version of that in for the old property before converting the whole thing back into a JSON object.
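On a simple payload it behaves as hoped. A quick check (output shown as jsonb renders it, with keys ordered by length and then name):

```sql
select json_keys_to_snake_case('{"someValue": "x", "anotherValue": 1}'::jsonb);
-- → {"some_value": "x", "another_value": 1}
```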

This actually seems to work, but is horrifying and inelegant for a number of reasons, the primary one being that there’s almost certainly a way to craft a valid JSON string that, having passed through this function, would produce at best an invalid JSON string or at worst something that opens the door to malicious SQL injection issues down the road when we try to apply this payload as a patch.
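To pick one concrete failure mode (there are others): because the function does blind text replacement, converting a camelCase key can collide with a key that is already snake_case, and jsonb silently keeps only the last duplicate key, so data is lost:

```sql
-- "someValue" becomes "some_value", colliding with the existing key;
-- one of the two values is silently dropped when the text is cast back
-- to jsonb, which keeps only the last duplicate key.
select json_keys_to_snake_case('{"someValue": 1, "some_value": 2}'::jsonb);
```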

The latest versions of PostgreSQL actually have a number of useful JSON functions, so when it came time to revisit this procedure I wanted to replace it with something more robust that didn’t rely on regular expressions to extract keys, and didn’t rely on converting the entire payload to text and back to JSON. After many attempts and many visits back to the PostgreSQL JSON documentation, this is what I ended up with:

create or replace function json_keys_to_snake_case(
  p_data jsonb
) returns jsonb
as $$
declare
  t_dtype text;
  t_key text;
  t_val jsonb;
  t_newkey text;
  t_newval jsonb;
begin
  t_dtype := jsonb_typeof(p_data);

  -- if it's just a value, return it
  if t_dtype not in ('object', 'array') then
    return p_data;
  -- if it's an array, run this function on each element
  elsif t_dtype = 'array' then
    select jsonb_agg(json_keys_to_snake_case(value)) into t_newval
      from jsonb_array_elements(p_data);
    return t_newval;
  end if;

  -- otherwise it's an object, so loop through every key in the json
  for t_key, t_val in (select key, value from jsonb_each(p_data)) loop
    -- convert the key from camelCase to snake_case
    t_newkey := lower(regexp_replace(t_key, '([A-Z])', '_\1'));

    -- recursively run this function on the value
    t_newval := json_keys_to_snake_case(t_val);

    -- replace the old camelCase property with the new snake_caseified one in the data
    p_data := p_data - t_key;
    p_data := p_data || jsonb_build_object(t_newkey, t_newval);
  end loop;
  return p_data;
end;
$$ language plpgsql stable;

The comments inline should do a fairly good job of explaining how it works. This new function uses jsonb_each instead of a sketchy regular expression to extract the keys and values. It still uses a regular expression to do the actual conversion from camelCase to snake_case, which could probably be improved upon, but at least we know it’s running on an actual property name here and not something that just looks like a property name based on another regular expression. The function also checks every value in the JSON data using jsonb_typeof and makes recursive calls to itself so that property names in child objects and object arrays will also get converted to snake_case. At no point is any data converted between text and JSON.
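Running the new version on a nested payload shows the recursion reaching keys inside child objects and arrays (again, jsonb reorders keys by length and then name on output):

```sql
select json_keys_to_snake_case(
  '{"someValue": "a", "childRows": [{"nestedKey": true}]}'::jsonb
);
-- → {"child_rows": [{"nested_key": true}], "some_value": "a"}
```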

My main takeaway from this exercise was discovering just how overinflated my sense of accomplishment can get, even for achieving relatively simple and mundane tasks, when the language I’m working with is an obtuse pseudo-language like PL/pgSQL. I mean I felt really, really good about this one. Like I remember feeling back in college when, usually late at night in some computer lab, I would come up with what I thought was a really clever way to solve some homework assignment. Of course, in hindsight, the solution was not so much “clever” but more “correct” and “expected.”

Still, it is nice to occasionally feel that way again, even now decades later. So thank you for that, PL/pgSQL.

Short Permalink for Attribution: rdsm.ca/nt53g
