Ibrar Ahmed: Writing PostgreSQL Extensions is Fun – C Language - JOYK Joy of Geek, Geek News, Link all geek

PostgreSQL is a powerful open source relational database management system. It extends the SQL language with additional features. A DBMS is not only defined by its performance and out of the box features, but also its ability to support bespoke/additional user-specific functionality. Some of these functionalities may be in the form of database constructs or modules, like stored procedures or functions, but their scope is generally limited to the functionality being exposed by the DBMS. For instance, how will you write a custom query-analyzing application that resides within your DBMS?

To support such options, PostgreSQL provides a pluggable architecture that allows you to install extensions. Extensions may consist of a configuration (control) file, a combination of SQL files, and dynamically loadable libraries.

This means you can write your own code as per the defined guidelines of an extension and plug it in a PostgreSQL instance without changing the actual PostgreSQL code tree. An extension by very definition extends what PostgreSQL can do, but more than that, it gives you the ability to interact with external entities. These external entities can be other database management systems like ClickHouse, Mongo or HDFs (normally these are called foreign data wrappers), or other interpreters or compilers (thus allowing us to write database functions in another language like Java, Python, Perl or TCL, etc.). Another potential use case of an extension can be for code obfuscation which allows you to protect your super secret code from prying eyes.

Build your own

To build your own extension, you don’t need a complete PostgreSQL code base. You can build and install an extension using installed PostgreSQL (it may require you to install a devel RPM or Debian package). Details about extensions can be found in PostgreSQL’s official documentation[1]. There many extensions available for different features in the contrib directory of PostgreSQL source. Other than the contrib directory, people are also writing extensions readily available on the internet but currently not part of the PostgreSQL source tree. The pg_stat_statements, PL/pgSQL, and PostGIS are examples of the best known or most widely used extensions.

Generally available PostgreSQL extensions may be classified into four main categories:

Add support of a new language extension (PL/pgSQL, PL/Python, and PL/Java)
Data type extensions where you can introduce new ((Hstore, cube, and hstore)
Miscellaneous extensions (contrib folder has many miscellaneous extensions)
Foreign Data Wrapper extension (postgres_fdw, mysqldb_fdw, clickhousedb_fdw)

There are four basic file types that are required for building an extension:

Makefile: Which uses PGXS PostgreSQL’s build infrastructure for extensions.
Control File: Carries information about the extension.
SQL File(s): If the extension has any SQL code, it may reside in form SQL files (optional)
C Code: The shared object that we want to build (optional).

Extension Makefile

To compile the C code, we need a makefile. It’s a very simple makefile with the exception of “PGXS”, which is PostgreSQL’s infrastructural makefile for creating extensions. The inclusion of “PGXS” is done by invoking pg_config binary with “–pgxs” flag. The detail of that file can be found at GitHub[2].

This is a sample makefile, which can be used to compile the C code.

EXTENSION = log
MODULE_big = log
DATA = log--0.0.1.sql
OBJS = log.o 
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Extension Control File

This file must be named as [EXTENSION NAME]. control file. The control file can contain many options which can be found at official documentation [3]. But in this example, I have used some basic options.

comments: Comments about the extension.

default_version: This is the extension SQL version. The name of the SQL file contains this information in the file name.

relocatable: Tells PostgreSQL if it is possible to move contained objects into a different schema.

module_pathname: This information is replaced with the actual lib file path.

comment = 'PostgreSQL Utility Command Logger
'default_version = '0.0.1'
relocatable = true
module_pathname = '$libdir/log'

The contents of the file can be seen using the psql command \dx psql.

postgres=# \dx log      
List of installed extensions Name   | Version | Schema |       Description 
------------------------------------+---------+--------+---------------------------------- 
log                                 | 0.0.1   | public | PostgreSQL Utility Command Logger

Extension SQL File

This is a mapping file, which I used to map the PostgreSQL function with the corresponding C function. Whenever you call the SQL function then the corresponding C function is called. The name of the file must be [EXTENSION NAME]–[default-version].sql. This is the same default_version as defined in the control file.

CREATE FUNCTION pg_all_queries(OUT query TEXT, pid OUT TEXT)
RETURNS SETOF RECORDAS 'MODULE_PATHNAME', 
'pg_all_queries'
LANGUAGE C STRICT VOLATILE;

Extension C Code

There are three kinds of functions you can write in c code.

The first is where you call your c code function using SQL function written in SQL file of the extension.

The second type of function is callbacks. You register that callback by assigning the pointer of the function. There is no need for an SQL function here. You call this function automatically when a specific event occurs.

The third type of function is called automatically, even without registering. These functions are called on events such as extension load/unload time etc.

This is the C file containing the definition of the C code. There is no restriction for the name, or of the number of C files.

#include "postgres.h"
/* OS Includes */
/* PostgreSQL Includes */
PG_MODULE_MAGIC;
void _PG_init(void);
void _PG_fini(void);
Datum pg_all_queries(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(pg_all_queries);
static void process_utility(PlannedStmt *pstmt, const char *queryString,ProcessUtilityContext context,ParamListInfo params,QueryEnvironment *queryEnv,DestReceiver *dest,       char *completionTag);

You need to include postgres.h for the extension. You can include other PostgreSQL as needed. PG_MODULE_MAGIC is macro which you need to include in the C file for the extension. _PG_init and _PG_fini are functions that are called when the extension is loaded or unloaded, respectively.

Here is an example of the load and unload functions of an extension.

void _PG_init(void)
{
    /* ... C code here at time of extension loading ... */
    ProcessUtility_hook = process_utility;
}
 
Void _PG_fini(void)
{
    /* ... C code here at time of extension unloading ... */
}

Here is an example of a callback function that you can invoke whenever you call a utility statement e.g. any DDL statement. The “queryString” variable contains the actual query text.

static void process_utility(PlannedStmt *pstmt, 
                           const char *queryString,
                           ProcessUtilityContext context,
                           ParamListInfo params,
                           QueryEnvironment *queryEnv,DestReceiver *dest,
                           char *completionTag)
{
    /* ... C code here ... */
    standard_ProcessUtility(pstmt,   
                            queryString,   
                            context,   
                            params,   
                            queryEnv,   
                            dest, 
                            completionTag);
    /* ... C code here ... */
}

Finally, here’s an example of a C function that is invoked through a user-defined SQL function. This internally calls the C function contained in our shared object.

Datum pg_all_queries(PG_FUNCTION_ARGS)
{
    /* ... C code here ... */
    tupstore = tuplestore_begin_heap(true, false, work_mem);
    /* ... C code here ... */     
    values[0] = CStringGetTextDatum(query);
    values[1] = CStringGetTextDatum(pid);
    /* ... C code here ... */
    tuplestore_donestoring(tupstore);
    return (Datum) 0;
}

Compile and Install

Before compilation, you need to set the PATH for PostgreSQL’s bin directory if there is no pg_config available in the system paths.

export PATH=/usr/local/pgsql/bin:$PATH

make USE_PGXS=1

make USE_PGXS=1 install

Output

We can now use our extension through a simple SQL query. Following is the output that is coming straight out of extension written in C programming language.

postgres=# select * from pg_all_queries();          
query                      | pid 
--------------------------+|------ 
create table foo(a int);  +| 8196 
create table bar(a int);  +| 8196 
drop table foo;           +| 8196 
(3 rows)

I hope this example can serve as a starting point for you to create more useful extensions that will not only help you and your company, but will also give you an opportunity to share and help the PostgreSQL community grow.

The complete example can be found at Github[4].

[1]: https://www.postgresql.org/docs/current/external-extensions.html

[2]: https://github.com/postgres/postgres/blob/master/src/makefiles/pgxs.mk

[3]: https://www.postgresql.org/docs/9.1/extend-extensions.html

[4]: https://github.com/ibrarahmad/Blog-Examples/tree/master/log

—

Photo from Pexels