
Avinash Kumar: Using plpgsql_check to Find Compilation Errors and Profile Functions

source link: https://www.tuicool.com/articles/IrQby2r

There is always a need for profiling tools in databases, whether for admins or developers. While it is easy to see where an SQL statement spends more time using EXPLAIN or EXPLAIN ANALYZE in PostgreSQL, the same does not work for functions. Recently, Jobin published a blog post where he demonstrated how plprofiler can be useful in profiling functions. plprofiler builds call graphs and creates flame graphs, which make the report very easy to understand. Similarly, there is another interesting project called plpgsql_check which can be used for a similar purpose as plprofiler, while it also examines the code and points out compilation errors. Let us see all of that in action in this blog post.

Installing plpgsql_check

You can use yum on RedHat/CentOS to install this extension from the PGDG repository. Steps to perform a source installation on Ubuntu/Debian are also shown in the following logs.

On RedHat/CentOS

$ sudo yum install plpgsql_check_11

On Ubuntu/Debian

$ sudo apt-get install postgresql-server-dev-11 libicu-dev gcc make 
$ git clone https://github.com/okbob/plpgsql_check.git 
$ cd plpgsql_check/ 
$ make && make install

Creating and enabling this extension

There are 3 advantages to using plpgsql_check:

  1. Checking for compilation errors in a function code
  2. Finding dependencies in functions
  3. Profiling functions

When using plpgsql_check for the first 2 purposes, you may not need to add any entry to shared_preload_libraries. However, if you need to use it for profiling functions (3), then you should add the appropriate entries to shared_preload_libraries so that it loads both plpgsql and plpgsql_check. Due to dependencies, plpgsql must come before plpgsql_check in the shared_preload_libraries config, as you see in the following example:

shared_preload_libraries = plpgsql, plpgsql_check

Any change to shared_preload_libraries requires a restart. You may see the following error when you do not have plpgsql before plpgsql_check in the shared_preload_libraries config.

$ grep "shared_preload_libraries" $PGDATA/postgresql.auto.conf
shared_preload_libraries = 'pg_qualstats, pg_stat_statements, plpgsql_check'
 
$ pg_ctl -D /var/lib/pgsql/11/data restart -mf
waiting for server to shut down.... done
server stopped
waiting for server to start....2019-07-07 02:07:10.104 EDT [6423] 
FATAL: could not load library "/usr/pgsql-11/lib/plpgsql_check.so": /usr/pgsql-11/lib/plpgsql_check.so: undefined symbol: plpgsql_parser_setup
2019-07-07 02:07:10.104 EDT [6423] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.
 
$ grep "shared_preload_libraries" $PGDATA/postgresql.auto.conf
shared_preload_libraries = 'pg_qualstats, pg_stat_statements, plpgsql, plpgsql_check'
 
$ pg_ctl -D /var/lib/pgsql/11/data start
.....
server started

Once the PostgreSQL instance is started, create this extension in the database where you need to perform any of the 3 previously discussed tasks.

$ psql -d percona -c "CREATE EXTENSION plpgsql_check"
CREATE EXTENSION

Finding Compilation Errors

As discussed earlier, this extension can help developers and admins find compilation errors. But why is it needed? Let’s consider the following example, where we see no errors while creating the function. By the way, I have taken this example from my previous blog post where I talked about automatic index recommendations using hypopg and pg_qualstats. You might want to read that blog post to understand the logic of the following function.

percona=# CREATE OR REPLACE FUNCTION find_usable_indexes()
percona-# RETURNS VOID AS
percona-# $$
percona$# DECLARE
percona$#     l_queries     record;
percona$#     l_querytext     text;
percona$#     l_idx_def       text;
percona$#     l_bef_exp       text;
percona$#     l_after_exp     text;
percona$#     hypo_idx      record;
percona$#     l_attr        record;
percona$#     /* l_err       int; */
percona$# BEGIN
percona$#     CREATE TABLE IF NOT EXISTS public.idx_recommendations (queryid bigint, 
percona$#     query text, current_plan jsonb, recmnded_index text, hypo_plan jsonb);
percona$#     FOR l_queries IN
percona$#     SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums, 
percona$#     pg_qualstats_example_query(t.queryid) as query
percona$#       FROM 
percona$#         ( 
percona$#          SELECT qs.relid::regclass AS relname, qs.relid AS relid, qs.queryid, 
percona$#          string_agg(DISTINCT attnames.attnames,',') AS attnames, qs.attnums
percona$#          FROM pg_qualstats_all qs
percona$#          JOIN pg_qualstats q ON q.queryid = qs.queryid
percona$#          JOIN pg_stat_statements ps ON q.queryid = ps.queryid
percona$#          JOIN pg_amop amop ON amop.amopopr = qs.opno
percona$#          JOIN pg_am ON amop.amopmethod = pg_am.oid,
percona$#          LATERAL 
percona$#               ( 
percona$#                SELECT pg_attribute.attname AS attnames
percona$#                FROM pg_attribute
percona$#                JOIN unnest(qs.attnums) a(a) ON a.a = pg_attribute.attnum 
percona$#                AND pg_attribute.attrelid = qs.relid
percona$#                ORDER BY pg_attribute.attnum) attnames,     
percona$#          LATERAL unnest(qs.attnums) attnum(attnum)
percona$#                WHERE NOT 
percona$#                (
percona$#                 EXISTS 
percona$#                       ( 
percona$#                        SELECT 1
percona$#                        FROM pg_index i
percona$#                        WHERE i.indrelid = qs.relid AND 
percona$#                        (arraycontains((i.indkey::integer[])[0:array_length(qs.attnums, 1) - 1], 
percona$#                         qs.attnums::integer[]) OR arraycontains(qs.attnums::integer[], 
percona$#                         (i.indkey::integer[])[0:array_length(i.indkey, 1) + 1]) AND i.indisunique)))
percona$#                        GROUP BY qs.relid, qs.queryid, qs.qualnodeid, qs.attnums) t
percona$#                        GROUP BY t.relid, t.relname, t.queryid, t.attnames, t.attnums                   
percona$#     LOOP
percona$#         /* RAISE NOTICE '% : is queryid',l_queries.queryid; */
percona$#         execute 'explain (FORMAT JSON) '||l_queries.query INTO l_bef_exp;
percona$#         execute 'select hypopg_reset()';
percona$#         execute 'SELECT indexrelid,indexname FROM hypopg_create_index(''CREATE INDEX on '||l_queries.relname||'('||l_queries.attnames||')'')' INTO hypo_idx;      
percona$#         execute 'explain (FORMAT JSON) '||l_queries.query INTO l_after_exp;
percona$#         execute 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')' INTO l_idx_def;
percona$#         INSERT INTO public.idx_recommendations (queryid,query,current_plan,recmnded_index,hypo_plan) 
percona$#         VALUES (l_queries.queryid,l_querytext,l_bef_exp::jsonb,l_idx_def,l_after_exp::jsonb);        
percona$#     END LOOP;    
percona$#         execute 'select hypopg_reset()';
percona$# END;
percona$# $$ LANGUAGE plpgsql;
CREATE FUNCTION

From the above log, the function was created with no errors. Unless we call the function, we do not know whether it has any compilation errors. With this extension, we can use the plpgsql_check_function_tb() function to learn whether there are any errors, without actually calling it.

percona=# SELECT functionid, lineno, statement, sqlstate, message, detail, hint, level, position, 
left (query,60),context FROM plpgsql_check_function_tb('find_usable_indexes()');
-[ RECORD 1 ]------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 14
statement  | FOR over SELECT rows
sqlstate   | 42P01
message    | relation "pg_qualstats_all" does not exist
detail     | 
hint       | 
level      | error
position   | 306
left       | SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums,
context    |

From the above log, it is clear that there is an error: a relation used in the function does not exist. However, if we are using dynamic SQL that is assembled at runtime, false positives are possible, as you can see in the following example. So, you may want to avoid checking functions that use dynamic SQL, or comment out the code containing those statements before checking for other compilation errors.

percona=# select * from plpgsql_check_function_tb('find_usable_indexes()');
-[ RECORD 1 ]------------------------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 50
statement  | EXECUTE
sqlstate   | 00000
message    | cannot determinate a result of dynamic SQL
detail     | There is a risk of related false alarms.
hint       | Don't use dynamic SQL and record type together, when you would check function.
level      | warning
position   | 
query      | 
context    | 
-[ RECORD 2 ]------------------------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 52
statement  | EXECUTE
sqlstate   | 55000
message    | record "hypo_idx" is not assigned yet
detail     | The tuple structure of a not-yet-assigned record is indeterminate.
hint       | 
level      | error
position   | 
query      | 
context    | SQL statement "SELECT 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')'"
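For functions that do not use dynamic SQL, even a simple mistake such as referencing a misspelled column is caught without ever executing the function. As a minimal sketch (the employee table and the bad_raise function below are hypothetical, and plpgsql_check_function() is the variant of the checker that returns its report as plain text lines):

```sql
-- Hypothetical function with a deliberate typo: "salry" instead of "salary".
CREATE OR REPLACE FUNCTION bad_raise(p_emp_id int)
RETURNS void AS
$$
BEGIN
    UPDATE employee SET salry = salry * 1.1 WHERE emp_id = p_emp_id;
END;
$$ LANGUAGE plpgsql;

-- The function creates fine; the checker reports the bad column reference
-- without the function ever being called.
SELECT * FROM plpgsql_check_function('bad_raise(int)');
```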

Finding Dependencies

This extension can also be used to find the objects a function depends on. This makes it easy for administrators to understand the objects used in a function without spending a huge amount of time reading the code. The trick is to simply pass your function as a parameter to plpgsql_show_dependency_tb(), as you see in the following example.

percona=# select * from plpgsql_show_dependency_tb('find_usable_indexes()');
   type   |  oid  |   schema   |            name            |  params   
----------+-------+------------+----------------------------+-----------
 FUNCTION | 16547 | public     | pg_qualstats               | ()
 FUNCTION | 16545 | public     | pg_qualstats_example_query | (bigint)
 FUNCTION | 16588 | public     | pg_stat_statements         | (boolean)
 RELATION |  2601 | pg_catalog | pg_am                      | 
 RELATION |  2602 | pg_catalog | pg_amop                    | 
 RELATION |  1249 | pg_catalog | pg_attribute               | 
 RELATION |  1262 | pg_catalog | pg_database                | 
 RELATION |  2610 | pg_catalog | pg_index                   | 
 RELATION | 16480 | public     | idx_recommendations        | 
 RELATION | 16549 | public     | pg_qualstats               | 
 RELATION | 16559 | public     | pg_qualstats_all           | 
 RELATION | 16589 | public     | pg_stat_statements         | 
(12 rows)

Profiling Functions

This is one of the most interesting features. Once you have added the appropriate entries to shared_preload_libraries as discussed earlier, you can easily enable or disable profiling through a GUC: plpgsql_check.profiler. This parameter can be set either globally or for just your session. Here’s an example to understand it better.

percona=# ALTER SYSTEM SET plpgsql_check.profiler TO 'ON';
ALTER SYSTEM
percona=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

When you set it globally, all the functions running in the database are automatically profiled and stored. This may be fine in a development or a testing environment, but you could also enable profiling of functions called in a single session through a session-level setting as you see in the following example.

percona=# BEGIN;
BEGIN
percona=# SET plpgsql_check.profiler TO 'ON';
SET
percona=# select salary_update(1000);
 salary_update 
---------------
 
(1 row)
 
percona=# select lineno, avg_time, source from plpgsql_profiler_function_tb('salary_update(int)');
 lineno | avg_time |                                                    source                                                     
--------+----------+---------------------------------------------------------------------------------------------------------------
      1 |          | 
      2 |          | DECLARE 
      3 |          | l_abc record;
      4 |          | new_sal     INT;
      5 |    0.005 | BEGIN
      6 |   144.96 |   FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
      7 |          |   LOOP
      8 |      NaN |       new_sal := l_abc.salary + sal_raise;
      9 |      NaN |       UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
     10 |          |   END LOOP;
     11 |          | END;
(11 rows)
 
--- Create an Index and check if it improves the execution time of FOR loop.
 
percona=# CREATE INDEX idx_1 ON employee (emp_id, dept_id);
CREATE INDEX
percona=# select salary_update(1000);
 salary_update 
---------------
 
(1 row)
 
percona=# select lineno, avg_time, source from plpgsql_profiler_function_tb('salary_update(int)');
 lineno | avg_time |                                                    source                                                     
--------+----------+---------------------------------------------------------------------------------------------------------------
      1 |          | 
      2 |          | DECLARE 
      3 |          | l_abc record;
      4 |          | new_sal     INT;
      5 |    0.007 | BEGIN
      6 |   73.074 |   FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
      7 |          |   LOOP
      8 |      NaN |       new_sal := l_abc.salary + sal_raise;
      9 |      NaN |       UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
     10 |          |   END LOOP;
     11 |          | END;
(11 rows)
 
percona=# END;
COMMIT
percona=# show plpgsql_check.profiler;
 plpgsql_check.profiler 
------------------------
 on
(1 row)

In the above log, I opened a new transaction block and enabled the parameter plpgsql_check.profiler only for that block. So any function I executed in that transaction was profiled, and the results can be extracted using plpgsql_profiler_function_tb(). Once we have identified the area where the execution time is high, the immediate action is to tune that piece of code. After creating the index, I called the function again, and it performed faster than before.
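Note that profiler statistics accumulate across calls, so when comparing timings before and after a change you may want to clear the collected data first. If I understand the extension correctly, it ships reset functions for exactly this; a sketch assuming the same salary_update function as above:

```sql
-- Clear accumulated profile data for a single function...
SELECT plpgsql_profiler_reset('salary_update(int)'::regprocedure);

-- ...or for every profiled function at once.
SELECT plpgsql_profiler_reset_all();
```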

Conclusion

Special thanks to Pavel Stehule, who authored this extension, and also to the contributors who have brought it to a usable stage today. This is one of the simplest extensions that can be used to check for compilation errors and dependencies. It can also be a handy profiling tool, and a developer may find either plprofiler or plpgsql_check helpful for profiling as well.
