2

SQL Window Functions Introduction

 2 years ago
source link: https://turkogluc.com/sql-window-functions/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

SQL Window Functions Introduction

PostgreSQL • Oct 23, 2021

Window functions perform a calculation over a set of rows that are connected in some way to the current row. It can be compared with GROUP BY and aggregate functions. However with window functions, rows keep their separate identities in the output instead of being grouped into a single output row. The window function is able access more than just the current row of the query result under the hood. Window definition has the following syntax:

WINDOW window_name AS
(
	[ PARTITION BY expression [, ...] ]
	[ ORDER BY expression [ ASC | DESC ] [, ...] ]
)

The ORDER and PARTITION define what is called the "window", the ordered subset of data over which calculations are made.

PARTITION BY

Let's focus on the partitioning concept. When we partition by a field, the table is divided to the partitions/groups and each row individually can access to the items from its own partition. For example:

SELECT emp_no,
       department,
       salary,
       AVG(salary) OVER w
FROM emp_salary
    WINDOW w AS (PARTITION BY department)

The aggregate function AVG is used OVER the window w and the result occurs to be:

partition by query resultpartition by query result

Note that each color represents a partition.

The rows and the first 3 columns are the directly from the emp_salary table and we would see that exact part with SELECT * FROM emp_salary query, the last column comes with the aggregate function AVG that operates on the partition of the current row and calculates the average of salary of the partition.

For each row, the window function is calculated over the rows that fall in the same partition as the current row.

As you can see from the following explain analyse result, it is important to notice that the PARTITION BY statement first orders the table by the partitioned column:

+--------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                          |
+--------------------------------------------------------------------------------------------------------------------+
|WindowAgg  (cost=83.37..104.37 rows=1200 width=72) (actual time=0.589..0.879 rows=10 loops=1)                       |
|  ->  Sort  (cost=83.37..86.37 rows=1200 width=40) (actual time=0.371..0.477 rows=10 loops=1)                       |
|        Sort Key: department                                                                                        |
|        Sort Method: quicksort  Memory: 25kB                                                                        |
|        ->  Seq Scan on emp_salary  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.037..0.129 rows=10 loops=1)|
|Planning Time: 0.068 ms                                                                                             |
|Execution Time: 1.053 ms                                                                                            |
+--------------------------------------------------------------------------------------------------------------------+

ORDER BY

ORDER BY statement can be omitted as it is optional, however there is another important concept to understand the behaviour of using it. For each row, there is a set of rows within its partition called its window frame.

  • When the ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row.
  • When ORDER BY is omitted the default frame consists of all rows in the partition.

If we compare result of the following query with the previous one we can see the AVG is calculated within the frame, which means from start to the current row:

SELECT emp_no,
       department,
       salary,
       AVG(salary) OVER w
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary)
order by query resultorder by query result

As we can see in the example emp#5 has the average of the first 4 rows.


Usual Aggregates: SUM, COUNT, and AVG

We can use the aggregate  functions that are used normally without windows.

SELECT emp_no,
       department,
       salary,
       AVG(salary) OVER w,
       SUM(salary) OVER w,
       COUNT(salary) OVER w
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary)
sum, avg, count query result

Remember that the frame would be the whole partition if we did not use ORDER BY statement.

ROW_NUMBER

As the name implies, it shows the number of the row within the partition, and it does not take any parameter.

SELECT emp_no,
       department,
       salary,
       ROW_NUMBER() over w
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary)
row_number query result

RANK and DENSE_RANK

Rank is similar to the row_number however when 2 fields are having the same order (based on the order by clause) their rank is considered as the same,  and the next rank is omitted, and +1 value is given for the upcoming row.  dense_rank function does not skip the next rank and assigns it to the upcoming row.

SELECT emp_no,
       department,
       salary,
       RANK() over w,
       DENSE_RANK() over w
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary)
rank, dense_rank query resultrank, dense_rank query result

LAG and LEAD

It can often be useful to compare rows with preceding or following rows, especially if you have the data in a meaningful order. LAG function can access data of the previous rows, and LEAD function can access the next rows relative to the current row. Both functions take offset as parameter to provide the number of rows to go backward of forward.

SELECT emp_no,
       department,
       salary,
       LAG(salary, 1) over w as preceding_salary,
       LEAD(salary, 1) over w as following_salary
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary ASC)
lag and lead query resultlag and lead query result

FIRST_VALUE and LAST VALUE

These functions return the first or last value of the window frame.

SELECT emp_no,
       department,
       salary,
       first_value(salary) over w as smallest_salary,
       last_value(salary) over w  as biggest_salary
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary ASC)
first_value and last_value query resultfirst_value and last_value query result

Please note as we can use the named windows, we could define it directly in the projection column as well:

SELECT emp_no,
       department,
       salary,
       first_value(salary) over (PARTITION BY department ORDER BY salary) as smallest_salary,
       last_value(salary) over (PARTITION BY department ORDER BY salary)  as biggest_salary
FROM emp_salary

References

elephant.png
sql-facebook.png

Recommend

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK