139

Modern SQL: NULL — purpose, comparisons, NULL in expressions, mapping to/from NU...

 6 years ago
source link: http://modern-sql.com/concept/null
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

NULL in SQL: Indicating the Absence of Data


“Every [SQL] data type includes a special value, called the null value,”0 “that is used to indicate the absence of any data value”.1

The null value does not indicate why a value is absent—it simply marks the places that do not have a data value. For example, the SQL language itself uses the null value for the absent values resulting from an outer join.2 Although exceptions exist,3 it is not generally possible to tell why a value is null.

Users can use the null value for any reason they like. A very common use case is to allow optional attributes without introducing an extra table. Another important use case is error handling: Contrary to other programming languages, it is not an error to process null values in SQL. Null values just propagate through expressions without aborting execution.

Contents:

Comparisons Involving null

Comparisons (<, >, =, …) to null are neither true nor false but instead return the third logical value of SQL: unknown. The following example removes all rows—even those where col is actually null—because the where clause treats unknown like false.

WHERE col = null

Warning

The SQL Server setting SET ANSI_NULLS OFF changes this behavior.

Testing for null: is [not] null

The SQL predicate is [not] null tests whether the result of an expression is null or not.

<expression> is null

This returns true if the value of expression is null, and false otherwise. The optional not negates the result.

Null-Aware Equals Comparison: is not distinct from

SQL uses two different concepts to tell if two values are the “same”: equals (=) and distinct. The difference is that equals (=) treats comparisons to null as indeterminate—that’s why col = null returns unknown. Distinct, on the other hand, treats null like any other value—i.e. two null values are not distinct from each other (they are indistinguishable) whereas a null value and a not null value are distinct values (not the same).

Originally, distinct was just a concept used internally in the SQL standard to define the behavior of group by, select distinct, count(distinct …) and the like. SQL:1999 and SQL:2003 introduced the is [not] distinct from predicate to make distinct comparisons available to SQL programmers.

<expr1> IS NOT DISTINCT FROM <expr2>

The is [not] distinct from predicate is an optional feature that is still not widely supported. See NULL-Aware Comparison: is [not] distinct from for alternatives.

Mapping to and from null

Sometimes it is required to turn a particular value into null or vice versa. The case expression has two shortcuts for the most common cases.

Coalesce — Map null to Another Value

SQL’s coalesce turns a null value into another value.

COALESCE(<expression>, 0)

The example returns the result of the expression, unless it is null, then it returns zero (0).

Coalesce takes an arbitrary number of arguments and returns the first not null value or null if all arguments are null.

Nullif — Map a Single Value to null

SQL’s nullif maps one particular value to null. If the first argument equals (=) the second one, the result is null. Otherwise the first argument is passed through.

NULLIF(<expression>, <expression>)

Case — Map Multiple Values to null

The general case expression can be used for other cases—e.g. to map multiple values to null:

CASE WHEN <expression> IN (…)
     THEN null
     ELSE <expression>
 END

Of course, you can use other comparisons in the case expression as well: >, <, between and so on.

Null Propagates Through Expressions

Expressions and functions that process a null value generally return the null value.4 Noteworthy exceptions are aggregate functions and—due to the three-valued logic of SQLtwo logical operations.5

The result of the following expressions is therefore null:

1 + NULL
'foo ' || NULL || 'bar'
SUBSTRING('foo bar' FROM 4 FOR NULL)

Exceptions: Oracle and SQL Server Databases

The Oracle database treats an empty string as null and vice versa: on the one hand, '' IS NULL is true, on the other hand is null treated as an empty string in string concatenations (||). The Oracle databases returns 'foo bar' for the second example above.

SQL Server has the deprecated setting SET CONCAT_NULL_YIELDS_NULL OFF to treat null as an empty string when concatenating strings with the + operator.

Note: Non-Standard Functions

Most databases support more functions than specified by the SQL standard. These functions may or may not follow the general idea that null propagates.

The concat() function for string concatenation is an important example because many databases support it. They nevertheless handle null in the input differently: Db2, MySQL and MariaDB act in the spirit of the SQL standard and return null. H2, Oracle, PostgreSQL, and SQL Server treat null in the input to concat() as empty string.

In logical expressions (and, or, not), null is the same as the third truth value unknown.6 Null (unknown) only propagates through logical expressions if the result becomes undecidable due to a null operand. That means that there are two cases in which the result is not null, although one operand is null: Null and false is false because a logical conjunction (and) is false as soon as a single operand is false. The analogous logic applies to null or true. See Logical Operations Involving Unknown for more details.

Null in Aggregate Functions (count, sum, …)

Aggregate functions generally remove null values from their input before performing the aggregation.7 That means that the result of an aggregate function does not necessarily become null if the input contains null values. This behavior is most often used to implement pivot queries.

Think About It

How does the propagation of null through expressions and the removal of null values prior to aggregation affect the following expressions:

SUM(a+b)
SUM(a) + SUM(b)

The result of an aggregate function is only null if it operates on an effectively empty set. This is the case if (1) all rows are removed (because they are null values or due to a filter clause⁠), or (2) an explicit or implied group by () is applied to an actually empty set.8

Count and regr_count never return null. These functions return the numerical value zero (0) for an effectively empty set.9

Most aggregate functions that return structured data (array_agg, json_objectagg, xmlagg) do not remove null values.10 As opposed to json_objectagg, json_arrayagg defaults to absent on null11—although this is barely followed by current implementations.

Null in Distinct, Group by, Partition by, Union, etc.

Grouping operations use distinct comparisons to check whether two values belong to the same group.12 Consequently, all null values are put onto a single group.

This also affects operations that are specified in terms of grouping: distinct (in select or in aggregates), partition by, union (without all), etc.13

Null in Order By

The SQL standard leaves the sorting of null values relative to non-null values up to the vendors:14 null values may sort before or after non-null values (see Compatibility).

SQL:2003 introduced the order by modifier nulls (first|last) so that SQL developers have explicit control where null values appear. This is still not widely supported (see Compatibility).

ORDER BY <expression> [ASC|DESC] [NULLS (FIRST|LAST)]
     [ , <expression> [ASC|DESC] [NULLS (FIRST|LAST)] ]
       …

The effect of nulls (first|last) can be obtained with a case expression in all databases. The following example implements order by c nulls first:

ORDER BY CASE WHEN c IS NULL
              THEN 0
              ELSE 1
          END
       , c

Note that the case expression defines a new order by key for the sole purpose of separating null and not null values.

Null in Unique Constraints

Unique constraints treat null as different to everything—i.e. different to null and different to other values.15 This makes every null a different null so that unique constraints accept multiple null values.

Future developments: NULLS [NOT] DISTINCT

A draft for the next SQL standard introduces the nulls [not] distinct modifier to control the treatment of null values in unique constraints. Opposing SQL:2016 the draft defines the null treatment as implementation defined if the new modifier is absent. Further the draft even allows unique constraints with nulls not distinct to accept multiple null values if all columns of the constraint have the null value.

BigQueryaaDb2 (LUW)aaMariaDBMySQLOracle DBbcPostgreSQLSQL ServerSQLitedefault: nulls distinctdefault: nulls not distinct

  1. Doesn’t support null-able columns in unique constraints (T591)
  2. If all constraint columns are null
  3. If some but not all constraint columns are null

Compatibility

Null has been part of the SQL standard from the beginning. Support for unique constraints on possibly null columns was required by intermediate SQL-92. Since SQL:1999 it is an optional feature (T591).

SQL:2003 introduced null ordering (nulls (first|last)) as part of the optional feature T611, “Elementary OLAP operations”.

BigQuerycgDb2 (LUW)dhMariaDBgMySQLgOracle DBaehPostgreSQLbhSQL ServerbfgSQLitegnullUNIQUE accepts multiple nullorder by … nulls (first|last)is distinct fromis not distinct from

  1. Empty string is null. || treats null as empty string • concat() treats null as empty string
  2. concat() treats null as empty string
  3. Doesn’t support UNIQUE constraints
  4. Cannot create UNIQUE constraints on possibly null columns
  5. Deviates for multi-column constrains (see above)
  6. Accepts only one null occurrence
  7. By default sorted as smallest
  8. By default sorted as greatest

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK