27

Markus Winand: PostgreSQL11 Reestablishes Window Functions Leadership

 5 years ago
source link: https://www.tuicool.com/articles/hit/FjqE7zZ
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

PosgreSQL 11 was released four months ago and my review is long overdue. Here we go!

With respect to standard SQL, the main theme in PostgreSQL 11 is window functions ( over ). For almost eight years, from 2009 until 2017, PostgreSQL was the only major free open-source product to support SQL window functions. Just a year later, by September 2018, all open-source competitors have caught up …and some even overtook PostgreSQL . The PostgreSQL community was prepared. PostgreSQL 11 was just released in 2018 , and it has restored and even expanded its leadership position.

This article explains this race and covers other improvements in PostgreSQL 11.

Contents:

  1. Complete SQL:2011 Over Clause

  2. Create Procedure and Call

  3. Parameters in Fetch First|Next

  4. Relative XPath Expressions
  5. Extra: Create Index … Include

Complete SQL:2011 Over Clause

The over clause defines which rows are visible to a window function. Window functions were originally standardized with SQL:2003, and PostgreSQL has supported them since PostgreSQL 8.4 (2009). In some areas, the PostgreSQL implementation was less complete than the other implementations ( range frames, ignore nulls ), but in other areas it was the first major system to support them (the window clause). In general, PostgreSQL was pretty close to the commercial competitors, and it was the only major free database to support window functions at all—until recently.

In 2017, MariaDB introduced window functions. MySQL and SQLite followed in 2018. At that time, the MySQL implementation of the over clause was even more complete than that of PostgreSQL, a gap that PostgreSQL 11 closed. Furthermore, PostgreSQL is again the first to support some aspects of the over clause, namely theand. These are not yet supported by any other major SQL database—neither open-source, nor commercial.

The only over clause feature not supported by PostgreSQL 11 are pattern and related clauses. These clauses were just standardized with SQL:2016 and do a framing based on a regular expression. No major database supports this this framing yet.

Frame Units

Before looking into the new functionality in PostgreSQL 11, I’ll show you a typical use case of window functions. We can then proceed to the so-called framing .

The example calculates the running total over the column amnt , so the sum over all rows before and up to the current row according to the specified order by clause:

SELECT SUM(amnt)
       OVER(ORDER BY id
<b>            ROWS BETWEEN UNBOUNDED PRECEDING
                     AND CURRENT ROW</b>
           ) running_total
  FROM …

The aggregate function sum is used with the over clause rather than with a group by clause—that makes it a window function. The interesting part in this example is the framing, which is in bold.

Window framing narrows an ordered window to the rows between a specified start and end row.

The general syntax is:

<unit> BETWEEN <window frame bound>
           AND <window frame bound>
[<frame exclusion>]

Let’s start with the window frame bounds, i.e. the definition of the frame’s beginning and end.

The window frame bounds can be specified in terms relative to the current row or they can be “unbounded”. Unbounded refers to the start or end of the current result or partition.

CURRENT ROW
| <distance> (PRECEDING|FOLLOWING)
| UNBOUNDED  (PRECEDING|FOLLOWING)

The following frame definition uses frame bounds relative to the current row.

<unit> BETWEEN 1 PRECEDING AND CURRENT ROW

To truly understand the meaning of relative bounds, we must also understand how the three frame units— rows , range , and groups —change the meaning of these bounds.

The rows unit does just what you might expect: it interprets current row as referring to the current row and <distance> in preceding and following as a number of rows. With the rows unit, the previous example defines a frame that includes up to two rows: one before the current row and the current row itself. If there is no row before the current row, e.g. because the current row is the first row, the frame just covers the current row itself.

The next frame unit, range , does not count rows at all. Instead it uses the value of the sort key ( order by expression) and adds or subtracts the specified <distance> . All rows for which the value of the sort key falls into the specified range are taken into the frame.

Note that current row as range bound refers to all rows with the same value as the current row. That can be many rows. Think of current row as though it was 0 preceding or 0 following .In case of range , “current peers” or “current value” might have been a better choice than current row .

The following figure uses the unit range instead of rows . As the value of the current row is two, the frame covers all rows with the values one to two (inclusive). The frame begins at the first row, because its value is one and thus falls into the value range. The end of the frame is even beyond the current row as the next row still falls into the value range.

This is an example that works in MySQL 8.0, but not in PostgreSQL prior to version 11. Although range frames were supported by PostgreSQL before, you could not use a numeric distance as shown above. Only unbounded and current row could be used before PostgreSQL 11. That is still the case in SQL Server and SQLite, by the way. PostgreSQL 11 supports all frame units with all boundary types.

On my Own Behalf

I make my living from training , other SQL related services and sellingmy book. Learn more at https://winand.at/ .

Even the last frame unit, groups , is fully supported by PostgreSQL 11. Groups assigns each row of the result or partition into a group just like the group by clause does. The <distance> then refers to the number of groups to cover before and after the current row, i.e. the number of distinct sort key values.

The following figure shows how the groups frame covers one distinct value before the current value ( 1 preceding ) and the current value itself ( current row ). The numeric difference between the values does not matter, nor does the number of rows. Groups is solely about the number of distinct values.

PostgreSQL 11 is the first major SQL database to support groups frames.

Frame Exclusion

Another feature that is not yet implemented by any other major SQL product is frame exclusion . It removes rows from the frame that are related to the current row.

The default is exclude no others , which does not remove any rows.

The next option is to remove the current row itself from the frame.

EXCLUDE CURRENT ROW

Note that the meaning of the exclude clause is not affected by the frame unit. Current row just removes the current row—even if the range or groups unit is used and the current row has peers. This is different from the behavior of current row in a frame bound.

To remove the current row along with all its peers from the frame, use exclude group .

EXCLUDE GROUP

Again, this is independent of the frame unit and thus also removes peers when using the rows unit.

Finally, it is also possible to remove the peers of the current row, but not the current row itself:

EXCLUDE TIES

Compatibility

Create Procedure and Call

PostgreSQL has supported user-defined functions for ages. Create procedure , on the other hand, was not supported before PostgreSQL 11. Previously, returns void functions were often used to mimic procedures.

The new procedures introduced to PostgreSQL 11 differ from functions in several ways:

  • They cannot return anything

  • They are invoked by the call statement, rather than in an SQL expression

  • They can contain transaction control statements (in particular commit and rollback )

There is not much more I have to say about procedures, except that it is still a work in progress topic . This is especially true for drivers like JDBC.

Despite the size of this support matrix, there are several aspects I have not tested:

  • Security (T323, T324)

  • Semantics of drop restrict|cascade (F032)

  • Cyclic dependencies (T655)

  • Collection type parameters (S201, S202)

  • Dynamic SQL in routines (T652)

  • Schema statements in routines (T651)

  • Overloading (T341)

  • Proprietary extensions such as alter procedure , drop routine , transaction control in routines, etc.

Parameters in Fetch First|Next

Fetch first N rows only is the standard SQL syntax for the well known but proprietary limit clause. It was introduced with SQL:2008 and then promptly supported by PostgreSQL in 2009 (version 8.4). However, there was one very small gotcha: the use of a parameter instead of a literal value required the parameter to be enclosed in a pair of parentheses.

FETCH FIRST ($1) ROWS ONLY

Although it is not a big deal if you know about it, it can drive you crazy when you get the error message “ syntax error at or near "$1" ” when you omit the parentheses.

PostgreSQL 11 accepts parameters (and expressions) without parentheses.

Relative XPath Expressions

Another small annoyance—one that was easy to live with when you knew about it—was that PostgreSQL has interpreted relative XPath expressions in XML functions as being relative to the root node of the document. Makes sense, right? Not really, it should be the document node .

Consider the following example, which uses xmltable (introduced with PostgreSQL 10) to transform an XML document into columns and rows.

SELECT c
  FROM (VALUES ('<b><root>
                  <c>c1</c>
                  <c>c2</c>
                </root></b>'::xml
               )
       ) t(x)
     , XMLTABLE (<b>'root/c' -- XPath expression</b>
                 PASSING x
                 COLUMNS c TEXT PATH '.'
                )

If the XPath expression 'root/c' is interpreted relative to the document node, as mandated by the standard, it will match both <c> elements in <root> .

Until version 10, PostgreSQL evaluated those expressions relative to the root note <root> , meaning that this expression doesn’t match anything. In older releases you would either have to use the relative XPath expression 'c' or, preferably, the absolute XPath expression '/root/c' to get the same result.

Other News

The above-mentioned features that relate to the SQL standard are just a small part of the changes in PostgreSQL 11. Please have a look at the release notes for all the changes .

For your convenience, I’ll give you a little teaser:

Partitioning

Partitioning is no longer sadly incomplete. New in PostgreSQL 11:

  • Cross partition primary key and unique constraints

  • Foreign keys are supported in one way (partitioned table can refer to non-partitioned table)

  • Update statements can move rows to another partition

  • Default partitions

  • Hash partitioning

Parallel Processing

Improvements of existing parallel execution ( Hash Join , Seq Scan ).

A few more commands that can be executed in parallel: creation of b-tree indexes, create table … as select , create materialized view .

Just In Time (JIT)

Expressions in queries can be compiled into native code rather than being interpreted from the abstract syntax tree. Benchmarks have shown almost 30% run time improvement on queries that are expression heavy .

Fast Add Column

Adding a new column to an existing table is a fairly common task. If the new column has a default value of null , PostgreSQL was already able to add this column by changing only the table’s metadata. PostgreSQL 11 extends this ability to columns with a constant default value.

Quit and Exit in Psql

PostgreSQL has learned its lessons from "vi" . Quoting from the PostgreSQL 11 announcement : “The inclusion of the keywords "quit" and "exit" in the PostgreSQL command-line interface to help make it easier to leave the command-line tool.”

Extra: Create Index … Include

There is another PostgreSQL 11 feature that actually deserves its own article: create index … include . This article will be published on Use The Index, Luke! soon. Follow use-the-index-luke.com via Twitter , e-mail or RSS to get it.

If you’d like to learn more about modern SQL, have a look at my training in Vienna . In addition to window functions (mentioned above), it covers recursion and indexing, and greatly improves your understanding of basic SQL concepts. The training is based on the current draft of my next book. Check it out now!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK