11

Non-Equality Predicates

 3 years ago
source link: https://richardfoote.wordpress.com/2021/01/21/oracle-19c-automatic-indexing-non-equality-predicates-part-ii-lets-spend-the-night-together/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.
trackback

lets-spend-the-night-together.jpg?w=300&h=300

In my previous post in this series, I discussed out Automatic Indexing currently does not consider Non-Equality predicates. Automatic Indexing will index columns based only on Equality predicates.

So how does Oracle handle the scenario when an SQL has a mixture of both Equality and Non-Equality predicates?

I’ll begin by creating two very similar tables, but with the second table having a more selective CODE column:

SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42));
Table created.
SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon'
from dual connect by level <=10000000;
10000000 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD');
PL/SQL procedure successfully completed.
SQL> create table pink_floyd1 (id number, code number, create_date date, name varchar2(42));
Table created.
SQL> insert into pink_floyd1 select rownum, ceil(dbms_random.value(0, 25000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon'
from dual connect by level <=10000000;
10000000 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD1');
PL/SQL procedure successfully completed.

So table PINK_FLOYD has 5,000 distinct CODE values, whereas table PINK_FLOYD1 has 25,000 distinct CODE values.

I’ll next run the following identical SQLs, which both use an Equality predicate on the CODE column and a Non-Equality predicate on the CREATE_DATE column. The CODE column provides some filtering (more so with the PINK_FLOYD1 table) but in combination with the CREATE_DATE column, results in the ultimate filtering with no rows returned:

SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021';
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1152280033
----------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------
0 | SELECT STATEMENT          |            |    1 |    40 |    844 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD |    1 |    40 |    844 (11) | 00:00:01  |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)
filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
63660 consistent gets
63649 physical reads
0 redo size
426 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 564520720
-----------------------------------------------------------------------------------------
| Id | Operation                 | Name        | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------
0 | SELECT STATEMENT          |             |    1 |    41 |    856 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 |    1 |    41 |    856 (11) | 00:00:01  |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss'))
filter("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss'))
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
64424 consistent gets
64413 physical reads
0 redo size
426 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed

So how does Automatic Indexing handle this scenario. If we look at the subsequent Automatic Indexing report (highlights only):

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
-----------------------------------------------------------------------------
| Owner | Table       | Index                | Key  | Type   | Properties   |
-----------------------------------------------------------------------------
| BOWIE | PINK_FLOYD1 | SYS_AI_96snkmu4sk44g | CODE | B-TREE | NONE         |
-----------------------------------------------------------------------------
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 7wag3gbk0b3tm
SQL Text            : select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'
Improvement Factor  : 64442.3x
Execution Statistics:
-----------------------------
Original Plan                Auto Index Plan
---------------------------- ----------------------------
Elapsed Time (s):     568513                       2771
CPU Time (s):         275534                       1874
Buffer Gets:          1031078                      406
Optimizer Cost:       856                          405
Disk Reads:           1030609                      3
Direct Writes:        0                            0
Rows Processed:       0                            0
Executions:           16                           1
PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value : 564520720
-----------------------------------------------------------------------------------
| Id | Operation                 | Name        | Rows | Bytes | Cost | Time       |
-----------------------------------------------------------------------------------
0 | SELECT STATEMENT          |             |      |       |  856 |            |
1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 |    1 |    41 856 | 00:00:01   |
-----------------------------------------------------------------------------------
Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no
- With Auto Indexes
-----------------------------
Plan Hash Value : 2703636439
-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |    1 |    41 405 | 00:00:01   |
| * 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1          |    1 |    41 405 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_96snkmu4sk44g |  403 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
* 2 - access("CODE"=42)
Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

If we look at the definitions of all indexes currently on these tables:

SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='PINK_FLOYD';
INDEX_NAME                     AUT VISIBILIT COMPRESSION   STATUS   NUM_ROWS   LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dp2t0j12zux49           YES INVISIBLE ADVANCED LOW  UNUSABLE   10000000       21702           4161898
SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD';
INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_dp2t0j12zux49           CODE                          1
SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='PINK_FLOYD1';
INDEX_NAME                     AUT VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_96snkmu4sk44g           YES VISIBLE   ADVANCED LOW  VALID      10000000       15400           9969473
SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD1';
INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_96snkmu4sk44g           CODE                          1

In both cases, Automatic Indexing only created an index on the CODE column, as it was the only column with an Equality predicate.

However, the Automatic Index on the table PINK_FLOYD remained in an INVISIBLE/UNUSABLE. That’s because an index on only the CODE column was not efficient enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relatively poor Clustering Factor.

The index on the table PINK_FLOYD1 was eventually created as a VISIBLE/VALID index, as its better filtering was sufficient to actually improve the performance of the SQL.

So if we re-run the first query:

SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021';
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1152280033
----------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------
0 | SELECT STATEMENT          |            |    1 |    40 |    844 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD |    1 |    40 |    844 (11) | 00:00:01  |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)
filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
63660 consistent gets
63649 physical reads
0 redo size
426 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed

It continues to use a Full Table Scan.

If we re-run the second query:

SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 2703636439
------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT                    |                      |    1 |    41 |     415 (0) | 00:00:01  |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1          |    1 |    41 |     415 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_96snkmu4sk44g |  412 |       |       3 (0) | 00:00:01  |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
2 - access("CODE"=42)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
406 consistent gets
0 physical reads
0 redo size
426 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed

If now uses the newly created Automatic Index, with an improved 406 Consistent Gets (down from the previous 64424 Consistent Gets with the FTS).

BUT if we were to manually create an index on BOTH CODE and CREATE_DATE columns:

SQL> create index pink_floyd1_code_create_date_i on pink_floyd1(code, create_date) compress advanced low;
Index created.
SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 3366491378
----------------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                           | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT                    |                                |    1 |    41 |       4 (0) | 00:00:01  |
1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1                    |    1 |    41 |       4 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | PINK_FLOYD1_CODE_CREATE_DATE_I |    1 |       |       3 (0) | 00:00:01  |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"CREATE_DATE" IS NOT NULL)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
426 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed

Performance improves significantly further, by reducing Consistent Gets down to just 3.

So if you have SQL statements with a mixture of both Equality and Non-Equality predicates, you may encounter these 2 scenarios:

A potentially efficient index that is not created at all as the filtering on just the Equality based predicates are not sufficient to create a viable index, or

A potentially suboptimal Automatic Index that doesn’t contain useful filtering columns because they’re used in Non-Equality predicates…


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK