Non-Equality Predicates

Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.
trackback

lets-spend-the-night-together.jpg?w=300&h=300

In my previous post in this series, I discussed out Automatic Indexing currently does not consider Non-Equality predicates. Automatic Indexing will index columns based only on Equality predicates.

So how does Oracle handle the scenario when an SQL has a mixture of both Equality and Non-Equality predicates?

I’ll begin by creating two very similar tables, but with the second table having a more selective CODE column:

SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42));

Table created.

SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon'

from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD');

PL/SQL procedure successfully completed.

SQL> create table pink_floyd1 (id number, code number, create_date date, name varchar2(42));

Table created.

SQL> insert into pink_floyd1 select rownum, ceil(dbms_random.value(0, 25000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon'

from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD1');

PL/SQL procedure successfully completed.

So table PINK_FLOYD has 5,000 distinct CODE values, whereas table PINK_FLOYD1 has 25,000 distinct CODE values.

I’ll next run the following identical SQLs, which both use an Equality predicate on the CODE column and a Non-Equality predicate on the CREATE_DATE column. The CODE column provides some filtering (more so with the PINK_FLOYD1 table) but in combination with the CREATE_DATE column, results in the ultimate filtering with no rows returned:

SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 1152280033

----------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 40 | 844 (11) | 00:00:01 |

|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD | 1 | 40 | 844 (11) | 00:00:01 |

----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)

filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

63660 consistent gets

63649 physical reads

0 redo size

426 bytes sent via SQL*Net to client

380 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 564520720

-----------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 41 | 856 (11) | 00:00:01 |

|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 | 1 | 41 | 856 (11) | 00:00:01 |

-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss'))

filter("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss'))

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

64424 consistent gets

64413 physical reads

0 redo size

426 bytes sent via SQL*Net to client

381 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

So how does Automatic Indexing handle this scenario. If we look at the subsequent Automatic Indexing report (highlights only):

INDEX DETAILS

-------------------------------------------------------------------------------

The following indexes were created:

*: invisible

-------------------------------------------------------------------------------

-----------------------------------------------------------------------------

-----------------------------------------------------------------------------

-----------------------------------------------------------------------------

-------------------------------------------------------------------------------

Parsing Schema Name : BOWIE

SQL ID : 7wag3gbk0b3tm

SQL Text : select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'

Improvement Factor : 64442.3x

Execution Statistics:

-----------------------------

Original Plan Auto Index Plan

---------------------------- ----------------------------

Elapsed Time (s): 568513 2771

CPU Time (s): 275534 1874

Buffer Gets: 1031078 406

Optimizer Cost: 856 405

Disk Reads: 1030609 3

Direct Writes: 0 0

Rows Processed: 0 0

Executions: 16 1

PLANS SECTION

---------------------------------------------------------------------------------------------

- Original

-----------------------------

Plan Hash Value : 564520720

-----------------------------------------------------------------------------------

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | | | 856 | |

| 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 | 1 | 41 | 856 | 00:00:01 |

-----------------------------------------------------------------------------------

Notes

-----

- dop = 1

- px_in_memory_imc = no

- px_in_memory = no

- With Auto Indexes

-----------------------------

Plan Hash Value : 2703636439

-------------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |

-------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 41 | 405 | 00:00:01 |

| * 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 405 | 00:00:01 |

| * 2 | INDEX RANGE SCAN | SYS_AI_96snkmu4sk44g | 403 | | 3 | 00:00:01 |

-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

------------------------------------------

* 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

* 2 - access("CODE"=42)

Notes

-----

- Dynamic sampling used for this statement ( level = 11 )

If we look at the definitions of all indexes currently on these tables:

SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor

from user_indexes where table_name='PINK_FLOYD';

INDEX_NAME                     AUT VISIBILIT COMPRESSION   STATUS   NUM_ROWS   LEAF_BLOCKS CLUSTERING_FACTOR

------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------

SYS_AI_dp2t0j12zux49 YES INVISIBLE ADVANCED LOW UNUSABLE 10000000 21702 4161898

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD';

INDEX_NAME COLUMN_NAME COLUMN_POSITION

------------------------------ --------------- ---------------

SYS_AI_dp2t0j12zux49 CODE 1

SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor

from user_indexes where table_name='PINK_FLOYD1';

INDEX_NAME                     AUT VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR

------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------

SYS_AI_96snkmu4sk44g YES VISIBLE ADVANCED LOW VALID 10000000 15400 9969473

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD1';

INDEX_NAME COLUMN_NAME COLUMN_POSITION

------------------------------ --------------- ---------------

SYS_AI_96snkmu4sk44g CODE 1

In both cases, Automatic Indexing only created an index on the CODE column, as it was the only column with an Equality predicate.

However, the Automatic Index on the table PINK_FLOYD remained in an INVISIBLE/UNUSABLE. That’s because an index on only the CODE column was not efficient enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relatively poor Clustering Factor.

The index on the table PINK_FLOYD1 was eventually created as a VISIBLE/VALID index, as its better filtering was sufficient to actually improve the performance of the SQL.

So if we re-run the first query:

SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 1152280033

----------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 40 | 844 (11) | 00:00:01 |

|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD | 1 | 40 | 844 (11) | 00:00:01 |

----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)

filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

63660 consistent gets

63649 physical reads

0 redo size

426 bytes sent via SQL*Net to client

380 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

It continues to use a Full Table Scan.

If we re-run the second query:

SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 2703636439

------------------------------------------------------------------------------------------------------------

| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time      |

------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 41 | 415 (0) | 00:00:01 |

|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 415 (0) | 00:00:01 |

|* 2 | INDEX RANGE SCAN | SYS_AI_96snkmu4sk44g | 412 | | 3 (0) | 00:00:01 |

------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

2 - access("CODE"=42)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

406 consistent gets

0 physical reads

0 redo size

426 bytes sent via SQL*Net to client

381 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

If now uses the newly created Automatic Index, with an improved 406 Consistent Gets (down from the previous 64424 Consistent Gets with the FTS).

BUT if we were to manually create an index on BOTH CODE and CREATE_DATE columns:

SQL> create index pink_floyd1_code_create_date_i on pink_floyd1(code, create_date) compress advanced low;

Index created.

SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value: 3366491378

----------------------------------------------------------------------------------------------------------------------

| Id | Operation                           | Name                           | Rows | Bytes | Cost (%CPU) | Time      |

----------------------------------------------------------------------------------------------------------------------

| 0

| SELECT STATEMENT                    |                                |

1 | 41 | 4 (0) | 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 4 (0) | 00:00:01 |

|* 2 | INDEX RANGE SCAN | PINK_FLOYD1_CODE_CREATE_DATE_I | 1 | | 3 (0) | 00:00:01 |

----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

"CREATE_DATE" IS NOT NULL)

Note

-----

- automatic DOP: Computed Degree of Parallelism is 1

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

3 consistent gets

0 physical reads

0 redo size

426 bytes sent via SQL*Net to client

381 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

Performance improves significantly further, by reducing Consistent Gets down to just 3.

So if you have SQL statements with a mixture of both Equality and Non-Equality predicates, you may encounter these 2 scenarios:

A potentially efficient index that is not created at all as the filtering on just the Equality based predicates are not sufficient to create a viable index, or

A potentially suboptimal Automatic Index that doesn’t contain useful filtering columns because they’re used in Non-Equality predicates…

Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021

Recommend

IPv6 Security Guide: Do you Have a Blindspot?

.NET 彈性日期時間字串解析及小地雷

體驗 .NET 5 單一檔案部署功能

How to use a function or expression in dplyr mutate on the attributes of a templ...

Service Worker学习与实践（二）——PWA简介

科技爱好者周刊（第 143 期）：世界尽头与冷酷仙境

What does waitpid () do?

浅析Spring Boot单体应用熔断技术

Node.js 资源压缩 zlib模块

PowerSehll 小技巧 - 開關參數與腳本使用說明

About Joyk