6

EXPLAIN that parameterized statement in PostgreSQL!

 1 year ago
source link: https://www.cybertec-postgresql.com/en/explain-that-parameterized-statement/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

EXPLAIN that parameterized statement in PostgreSQL!

the database consultant needs a plan to get a clue about the parameterized statement

© Laurenz Albe 2022

For detailed query performance analysis, you need EXPLAIN (ANALYZE, BUFFERS) output for an SQL statement. With a parameterized statement, it can be difficult to construct a run-able statement for EXPLAIN (ANALYZE). Sometimes, you don’t even know the parameter values. I’ll show you how you can get at least plain EXPLAIN output for the generic plan in this case. Such an execution plan is better than nothing, and maybe even enough to guess what the problem is.

Parameterized statements

With the extended query protocol, PostgreSQL allows you to separate the SQL statement and the constants used in the statement. This enhances security, since it makes SQL injection impossible, but it is mostly a performance feature. Such parameterized statements can be named and reused with different parameter values, thereby saving the database engine the work of parsing the same statement over and over. What is more, using this feature PostgreSQL can sometimes avoid the much greater overhead of generating execution plans for each execution of the statement.

You typically encounter parameterized statements in two situations:

  • prepared statements with parameters (typically used through a client API)
  • static SQL statements in PL/pgSQL functions that use variables

The placeholders for the parameters are $1, $2 and so on. In the case of PL/pgSQL, you won’t see them; the PL/pgSQL call handler will substitute these placeholders instead of the variable names. Note that parameterized statements can only be SELECT, INSERT, UPDATE, DELETE and VALUES statements.

Generic plans for parameterized statements

Normally, PostgreSQL generates an execution plan whenever a query is executed. However, PostgreSQL can cache execution plans for named parameterized statements. Plans are only ever cached inside a single database session; there is no shared memory plan cache.

If PostgreSQL thinks it can do so without hurting performance, it will start using a generic plan for the statement. Such an execution plan does not take the parameter values into account and can be reused. The decision whether to switch to a generic plan or not is based on a heuristic and is typically made when the statement executes for the sixth time. You can tell such a plan from the placeholders that it uses instead of the parameters. Here is an example using a prepared statement:

PREPARE stmt(text) AS SELECT oid FROM pg_class WHERE relname = $1;
EXPLAIN (COSTS OFF) EXECUTE stmt('pg_proc');
QUERY PLAN                       
═════════════════════════════════════════════════════════
Index Scan using pg_class_relname_nsp_index on pg_class
Index Cond: (relname = 'pg_proc'::text)
(2 rows)

The next four executions of EXPLAIN look the same, but then we see

EXPLAIN (COSTS OFF) EXECUTE stmt('pg_attribute');
QUERY PLAN                       
═════════════════════════════════════════════════════════
Index Scan using pg_class_relname_nsp_index on pg_class
Index Cond: (relname = $1)
(2 rows)

PostgreSQL has started using a generic plan! From that point on, the query planning time will become much shorter.

Forcing a generic plan for a parameterized statement

You can use the PostgreSQL parameter plan_cache_mode to influence the behavior described in the previous section. The default setting “auto” chooses the heuristic described above, where PostgreSQL decides after a few executions whether a generic plan would be beneficial or not.

With the setting “force_custom_plan” you can tell PostgreSQL to never use a generic plan. That is a good idea if the generic plan turns out to be not quite as good as PostgreSQL thought. It is also a good setting for data warehouses, where you normally run expensive analytical queries and saving on planning time is less important than getting the best possible execution plan.

Finally, the setting “force_generic_plan” makes PostgreSQL use a generic plan immediately. We will make use of that setting later.

Where can you encounter a parameterized statement?

Parameterized statements in the PostgreSQL log

A parameterized statement looks like this in the log:

LOG:  duration: 0.012 ms  execute stmt: SELECT oid FROM pg_class WHERE relname = $1
DETAIL:  parameters: $1 = 'pg_proc'

Usually, the parameters are logged as a detail message, but if there are many of them, it can be a considerable effort to replace all the placeholders with the parameter values. You also don’t see the parameter data type in the log, so you may have to look up the table definition to know whether you should write 42 or '42'. If the statement caused an error and you didn’t set log_parameter_max_length_on_error to a nonzero value, you won’t get the parameters logged at all:

ERROR:  canceling statement due to statement timeout
STATEMENT:  SELECT oid FROM pg_class WHERE relname = $1

Parameterized statements in pg_stat_statements

pg_stat_statements is the Swiss army knife for analyzing database workloads. One of its features is that it ignores the values of constants, so that statements that only differ in constants are aggregated together. So if you query the pg_stat_statements view, you will see placeholders even in statements that were not parameterized originally. Also, since pg_stat_statements collects statistics over many executions of a statement, it doesn’t collect the actual parameter values for any of them.

The need for a generic plan

If you find a problem statement in the log or in pg_stat_statements, you want to analyze its performance. In order to do that, you have to guess appropriate parameter values, so that you can get an execution plan with EXPLAIN (ANALYZE, BUFFERS). This can be tedious and take a long time.

For a first analysis, it is helpful to see the execution plan as generated by EXPLAIN (without ANALYZE). Since “plain” EXPLAIN does not execute the query, it should not depend on the actual parameter values, as long as we are happy with the generic plan. Unfortunately, EXPLAIN refuses to generate a generic plan for a parameterized statement:

EXPLAIN SELECT oid FROM pg_class WHERE relname = $1;
ERROR:  there is no parameter $1
LINE 1: EXPLAIN SELECT oid FROM pg_class WHERE relname = $1;
^

We get that error even if we set plan_cache_mode = force_generic_plan.

Generating a generic plan for a parameterized statement using PREPARE

We want to do better than that, and we can. Using PREPARE, we can create a prepared statement with a placeholder:

PREPARE stmt(name) AS SELECT oid FROM pg_class WHERE relname = $1;

Now we can force a generic plan and EXPLAIN the prepared statement. We can supply NULL as parameter value, since NULL exists for every data type, and the parameter value is ignored anyway:

SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE stmt(NULL);
QUERY PLAN                                        
═══════════════════════════════════════════════════════════════════════════════════════════
Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 rows=1 width=4)
Index Cond: (relname = $1)
(2 rows)
DEALLOCATE stmt;

The only remaining fly in the ointment is that we have to figure out the appropriate data types for the parameters.

Using the pseudo-type unknown with a parameterized statement

Pseudo-types are data types that you cannot use in table definitions. One of these data types is “unknown”: It is used during query parsing for string constants whose data type must be resolved later, based on the context. We can use unknown as a data type for query parameters and let PostgreSQL figure out the appropriate data type itself:

PREPARE stmt(unknown) AS SELECT oid FROM pg_class WHERE relname = $1;
SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE stmt(NULL);
QUERY PLAN                                        
═══════════════════════════════════════════════════════════════════════════════════════════
Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 rows=1 width=4)
Index Cond: (relname = $1)
(2 rows)
DEALLOCATE stmt;

Putting it all together in an extension

Now we have a simple algorithm to get the generic plan of a parameterized statement:

  • count the number of parameters
  • create a prepared statement with that many “unknown” parameters
  • set “plan_cache_mode” to “force_generic_plan
  • EXPLAIN the prepared statements using NULLs as arguments

I wrapped all this into a function and wrote the extension generic_plan. It is written in PL/pgSQL and does not require superuser permissions to install. Here you can see it in action:

CREATE EXTENSION IF NOT EXISTS generic_plan;
SELECT generic_plan('SELECT * FROM pg_sequences WHERE max_value < last_value + $1');
generic_plan                                        
═════════════════════════════════════════════════════════════════════════════════════════════
Subquery Scan on pg_sequences  (cost=1.09..24.10 rows=1 width=245)
Filter: (pg_sequences.max_value < (pg_sequences.last_value + $1))
->  Nested Loop  (cost=1.09..24.09 rows=1 width=245)
Join Filter: (c.oid = s.seqrelid)
->  Seq Scan on pg_sequence s  (cost=0.00..1.06 rows=6 width=49)
->  Materialize  (cost=1.09..22.76 rows=3 width=136)
->  Hash Join  (cost=1.09..22.74 rows=3 width=136)
Hash Cond: (c.relnamespace = n.oid)
->  Seq Scan on pg_class c  (cost=0.00..21.62 rows=8 width=76)
Filter: (relkind = 'S'::"char")
->  Hash  (cost=1.05..1.05 rows=3 width=68)
->  Seq Scan on pg_namespace n  (cost=0.00..1.05 rows=3 width=68)
Filter: (NOT pg_is_other_temp_schema(oid))
(13 rows)

Conclusion

It can be complicated to collect parameter values to analyze the execution of a parameterized statement, but using the generic_plan extension we can at least get a generic plan easily. The tricks used are a prepared statement with parameters of type “unknown”, adjusting plan_cache_mode and using NULLs as parameter values.

In case you’re interested in learning more about parameters, check out my blog on Query Parameter Data Types and Performance.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK