Inserting JSON Data with Default Values in PostgreSQL

2020-04-02 (posted in blog)

So you have a table, and you have some JSON data you want to insert into that table, but for some of the columns that have default values defined, the JSON contains either null or no value at all. This is the first thing you try:

create table my_table (
  value1 text not null,
  value2 integer not null default 100,
  value3 text[] not null default '{}'::text[],
  created_date timestamptz not null default now()
);

insert into my_table
select (json_populate_record(null::my_table,
  '{"value1":"my text"}'::json)).*;

In theory, value1 is the only column you should need to specify a value for, since it’s the only non-nullable column without a default specified. However, executing the above query results in the following error:

ERROR:  null value in column "value2" violates not-null constraint
DETAIL: Failing row contains (my text, null, null, null).
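
You can see the root cause by running json_populate_record on its own; every column the JSON doesn't mention comes back as null (shown here as blanks, the psql default):

select * from json_populate_record(null::my_table,
  '{"value1":"my text"}'::json);
--  value1  | value2 | value3 | created_date
-- ---------+--------+--------+--------------
--  my text |        |        |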

Even though the JSON only specifies value1, json_populate_record fills in all the other columns with null, and the insert fails because value2 can't be null. One way to fix that is to select only the specific columns from the JSON that you know will be populated, like this:

with my_json as (
  select (json_populate_record(null::my_table,
    '{"value1":"my text"}'::json)).*
)
insert into my_table (value1)
select value1 from my_json;
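
That insert succeeds, and selecting the row back confirms the defaults were applied (created_date is omitted here since it depends on when you run it):

select value1, value2, value3 from my_table;
--  value1  | value2 | value3
-- ---------+--------+--------
--  my text |    100 | {}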

This approach works, but it may not be flexible enough. What if you don't know ahead of time which properties of the JSON will be populated? Maybe it has value2 specified, in which case that's what you want to insert, but maybe it doesn't, in which case you want the default.

You can’t just select value2 as part of the insert, because that will try to insert null and fail as in the first example. One way to do this would be to use coalesce in the select:

with my_json as (
  select (json_populate_record(null::my_table,
    '{"value1":"my text"}'::json)).*
)
insert into my_table (value1, value2, value3)
select
  value1,
  coalesce(value2, 100),
  coalesce(value3, '{}'::text[])
from my_json;

But that's lame, because it doesn't actually use the table's default values; you have to manually specify them again in whatever function you're writing. Now you have to maintain the default values for that table in multiple places. Gross.

Now you decide to get clever. You know you can look up a column's default by querying information_schema.columns. From that you can build a JSON object containing all of the default values, then merge the data you're trying to insert on top of it before inserting:

select json_object_agg(column_name, column_default)
from information_schema.columns
where
  table_schema = 'public' and
  table_name = 'my_table';

Results in:

{
  "value1": null,
  "value2": "100",
  "value3": "'{}'::text[]",
  "created_date": "now()"
}

Let’s get rid of those null values so we just have the columns that actually have defaults specified:

select json_strip_nulls(
  json_object_agg(column_name, column_default))
from information_schema.columns
where
  table_schema = 'public' and
  table_name = 'my_table';

This gives us:

{
  "value2": "100",
  "value3": "'{}'::text[]",
  "created_date": "now()"
}

You may have already guessed that there could be a problem with this, but let's give it a try. We have to switch over to using jsonb instead of json so that we can use the || operator, which concatenates two jsonb objects with the right-hand side's keys winning.
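
If you haven't used ||, here it is on its own (the values are arbitrary); a key from the right-hand object replaces the same key on the left:

select '{"a": 1, "b": 2}'::jsonb || '{"b": 3}'::jsonb;
-- {"a": 1, "b": 3}

With that we can concatenate our values over top of the defaults: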

with dflt as (
  select jsonb_strip_nulls(
    jsonb_object_agg(column_name, column_default)) as vals
  from information_schema.columns
  where
    table_schema = 'public' and
    table_name = 'my_table'
)
insert into my_table
select (jsonb_populate_record(null::my_table,
  dflt.vals || jsonb_strip_nulls('{"value1":"my text"}'::jsonb)
)).*
from dflt;

You’ll note that I’m passing my object through jsonb_strip_nulls before concatenating to the defaults; that’s so any properties that are specified with a null value in the object I’m inserting won’t overwrite the defaults. Running the above results in the following error:

ERROR:  malformed array literal: "'{}'::text[]"
DETAIL: Array value must start with "{" or dimension information.

So close! The problem is that the default for value3 is stored as the expression text '{}'::text[], and jsonb_populate_record tries to parse that whole string as an array literal instead of evaluating it as an expression. In fact, if you were to drop the value3 column and retry the above query, it would work! I wasn't expecting it to, but apparently casting the literal string 'now()' to a timestamp actually works as you might expect, which is why the created_date column doesn't cause the same problem. It turns out you can cast other non-date strings to timestamps in PostgreSQL too:

# select 'yesterday'::timestamptz;
      timestamptz
------------------------
 2020-04-01 00:00:00+00

Weird! I didn’t know that before.
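
It turns out 'yesterday' is one of PostgreSQL's documented special date/time input strings, along with values like these:

select 'epoch'::timestamptz;    -- 1970-01-01 00:00:00+00
select 'tomorrow'::timestamptz; -- midnight tomorrow
select 'infinity'::timestamptz; -- later than every other timestamp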

If you don’t have any default values specified that can’t be directly cast from strings (like the text[] array in my example), then the above trick is probably good enough. Build your jsonb object of default values, and then concatenate the object you want to insert on top of it.

If you do have default values that can't be cast from strings, here is a little function that runs through all the columns in a table and builds the defaults object by actually evaluating each default expression:

create function column_defaults_jsonb(
  p_schema text,
  p_table text
) returns jsonb
as $$
declare
  t_defaults jsonb;
  t_value jsonb;
  t_column text;
  t_default text;
begin
  t_defaults := '{}'::jsonb;

  -- walk every column in the table, grabbing its default expression
  for t_column, t_default in
    select column_name, column_default
    from information_schema.columns
    where
      table_schema = p_schema and
      table_name = p_table
  loop
    if t_default is not null then
      -- evaluate the default expression and capture the result as jsonb
      execute 'select to_jsonb(' || t_default || ')' into t_value;
      t_defaults := jsonb_set(t_defaults, array[t_column], t_value);
    end if;
  end loop;

  return t_defaults;
end;
$$ language plpgsql;

By running the above function against our table, you can see that the defaults object comes back with the values already evaluated:

select column_defaults_jsonb('public', 'my_table');

The integer becomes an actual number, the empty array becomes an actual empty JSON array, and today’s date is already populated:

{
  "value2": 100,
  "value3": [],
  "created_date": "2020-04-02T22:40:13.803923+00:00"
}

Now you can use that function in your insert like so:

insert into my_table
select (jsonb_populate_record(null::my_table,
  column_defaults_jsonb('public', 'my_table')
  || '{"value1":"my text"}'::jsonb)).*;

And that's how you do it. It even seems to work for tables with auto-incrementing primary keys, because evaluating the id column's default actually calls nextval() on the sequence and generates the new value. Behold:

create table my_table2 (
  id serial primary key,
  value1 text not null,
  value2 integer not null default 100,
  value3 text[] not null default '{}'::text[],
  created_date timestamptz not null default now()
);

insert into my_table2
select (jsonb_populate_record(null::my_table2,
  column_defaults_jsonb('public', 'my_table2')
  || '{"value1":"my text"}'::jsonb)).*;

However, if you run that insert on the new table a few times, you’ll see something weird with the data:

# select * from my_table2;
 id | value1  | value2 | value3 |         created_date
----+---------+--------+--------+-------------------------------
  1 | my text |    100 | {}     | 2020-04-02 22:51:59.407879+00
  6 | my text |    100 | {}     | 2020-04-02 22:52:05.849629+00
 11 | my text |    100 | {}     | 2020-04-02 22:52:20.624157+00

The id column is incrementing by 5 each time, which is conspicuously the exact same number of columns in the table. It would appear as though the column_defaults_jsonb function is being evaluated multiple times with each insert.
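
The likely culprit is a documented PostgreSQL behavior: writing (f(x)).* in a select list gets macro-expanded into one copy of the expression per output column, so the jsonb_populate_record call, and the column_defaults_jsonb call nested inside it, runs five times, once per column, and each run fires the id sequence's nextval(). One way to avoid that is to move the function call into the from clause, where it is evaluated only once:

insert into my_table2
select * from jsonb_populate_record(null::my_table2,
  column_defaults_jsonb('public', 'my_table2')
  || '{"value1":"my text"}'::jsonb);

With that version, the ids should increment by one per insert.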

Short Permalink for Attribution: rdsm.ca/4ewjy
