Non-Equality Predicates
source link: https://richardfoote.wordpress.com/2021/01/21/oracle-19c-automatic-indexing-non-equality-predicates-part-ii-lets-spend-the-night-together/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.
trackback
In my previous post in this series, I discussed out Automatic Indexing currently does not consider Non-Equality predicates. Automatic Indexing will index columns based only on Equality predicates.
So how does Oracle handle the scenario when an SQL has a mixture of both Equality and Non-Equality predicates?
I’ll begin by creating two very similar tables, but with the second table having a more selective CODE column:
SQL> create table pink_floyd (id number,
code
number, create_date date, name varchar
2
(
42
));
Table created.
SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(
0
,
5000
)), sysdate-mod(rownum,
50000
)+
1
,
'Dark Side of the Moon'
from dual connect by
level
<=
10000000
;
10000000
rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>
'PINK_FLOYD'
);
PL/SQL procedure successfully completed.
SQL> create table pink_floyd
1
(id number,
code
number, create_date date, name varchar
2
(
42
));
Table created.
SQL> insert into pink_floyd
1
select rownum, ceil(dbms_random.value(
0
,
25000
)), sysdate-mod(rownum,
50000
)+
1
,
'Dark Side of the Moon'
from dual connect by
level
<=
10000000
;
10000000
rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>
'PINK_FLOYD1'
);
PL/SQL procedure successfully completed.
So table PINK_FLOYD has 5,000 distinct CODE values, whereas table PINK_FLOYD1 has 25,000 distinct CODE values.
I’ll next run the following identical SQLs, which both use an Equality predicate on the CODE column and a Non-Equality predicate on the CREATE_DATE column. The CODE column provides some filtering (more so with the PINK_FLOYD1 table) but in combination with the CREATE_DATE column, results in the ultimate filtering with no rows returned:
SQL> select * from pink_floyd where
code
=
42
and create_date>
'19-JAN-2021'
;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value:
1152280033
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
----------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1
|
40
|
844
(
11
) |
00:
00:
01
|
|*
1
| TABLE ACCESS STORAGE FULL | PINK_FLOYD |
1
|
40
|
844
(
11
) |
00:
00:
01
|
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- storage(
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-ddhh24:mi:ss'
) AND
"CODE"
=
42
)
filter(
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-ddhh24:mi:ss'
) AND
"CODE"
=
42
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
63660
consistent gets
63649
physical reads
0
redo size
426
bytes sent via SQL*Net to client
380
bytes received via SQL*Net from client
1
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
0
rows processed
SQL> select * from pink_floyd
1
where
code
=
42
and create_date>
'19-JAN-2021'
;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value:
564520720
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
-----------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1
|
41
|
856
(
11
) |
00:
00:
01
|
|*
1
| TABLE ACCESS STORAGE FULL | PINK_FLOYD
1
|
1
|
41
|
856
(
11
) |
00:
00:
01
|
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- storage(
"CODE"
=
42
AND
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-dd hh24:mi:ss'
))
filter(
"CODE"
=
42
AND
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-dd hh24:mi:ss'
))
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
64424
consistent gets
64413
physical reads
0
redo size
426
bytes sent via SQL*Net to client
381
bytes received via SQL*Net from client
1
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
0
rows processed
So how does Automatic Indexing handle this scenario. If we look at the subsequent Automatic Indexing report (highlights only):
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
-----------------------------------------------------------------------------
| Owner | Table | Index | Key | Type | Properties |
-----------------------------------------------------------------------------
| BOWIE | PINK_FLOYD
1
| SYS_AI_
96
snkmu
4
sk
44
g | CODE | B-TREE | NONE |
-----------------------------------------------------------------------------
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID :
7
wag
3
gbk
0
b
3
tm
SQL Text : select * from pink_floyd
1
where
code
=
42
and create_date>
'19-JAN-2021'
Improvement Factor :
64442.3
x
Execution Statistics:
-----------------------------
Original Plan Auto Index Plan
---------------------------- ----------------------------
Elapsed Time (s):
568513
2771
CPU Time (s):
275534
1874
Buffer Gets:
1031078
406
Optimizer Cost:
856
405
Disk Reads:
1030609
3
Direct Writes:
0
0
Rows Processed:
0
0
Executions:
16
1
PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value :
564520720
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | | | |
856
| |
|
1
| TABLE ACCESS STORAGE FULL | PINK_FLOYD
1
|
1
|
41
|
856
|
00:
00:
01
|
-----------------------------------------------------------------------------------
Notes
-----
- dop =
1
- px_in_memory_imc = no
- px_in_memory = no
- With Auto Indexes
-----------------------------
Plan Hash Value :
2703636439
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1
|
41
|
405
|
00:
00:
01
|
| *
1
| TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD
1
|
1
|
41
|
405
|
00:
00:
01
|
| *
2
| INDEX RANGE SCAN | SYS_AI_
96
snkmu
4
sk
44
g |
403
| |
3
|
00:
00:
01
|
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
*
1
- filter(
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-dd hh24:mi:ss'
))
*
2
- access(
"CODE"
=
42
)
Notes
-----
- Dynamic sampling used for this statement (
level
=
11
)
If we look at the definitions of all indexes currently on these tables:
SQL> select index_name,
auto
, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name=
'PINK_FLOYD'
;
INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dp
2
t
0
j
12
zux
49
YES INVISIBLE ADVANCED LOW UNUSABLE
10000000
21702
4161898
SQL> select index_name, column_name, column_position from user_ind_columns where table_name=
'PINK_FLOYD'
;
INDEX_NAME COLUMN_NAME COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_dp
2
t
0
j
12
zux
49
CODE
1
SQL> select index_name,
auto
, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name=
'PINK_FLOYD1'
;
INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_
96
snkmu
4
sk
44
g YES VISIBLE ADVANCED LOW VALID
10000000
15400
9969473
SQL> select index_name, column_name, column_position from user_ind_columns where table_name=
'PINK_FLOYD1'
;
INDEX_NAME COLUMN_NAME COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_
96
snkmu
4
sk
44
g CODE
1
In both cases, Automatic Indexing only created an index on the CODE column, as it was the only column with an Equality predicate.
However, the Automatic Index on the table PINK_FLOYD remained in an INVISIBLE/UNUSABLE. That’s because an index on only the CODE column was not efficient enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relatively poor Clustering Factor.
The index on the table PINK_FLOYD1 was eventually created as a VISIBLE/VALID index, as its better filtering was sufficient to actually improve the performance of the SQL.
So if we re-run the first query:
SQL> select * from pink_floyd where
code
=
42
and create_date>
'19-JAN-2021'
;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value:
1152280033
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
----------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1
|
40
|
844
(
11
) |
00:
00:
01
|
|*
1
| TABLE ACCESS STORAGE FULL | PINK_FLOYD |
1
|
40
|
844
(
11
) |
00:
00:
01
|
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- storage(
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-ddhh24:mi:ss'
) AND
"CODE"
=
42
)
filter(
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-ddhh24:mi:ss'
) AND
"CODE"
=
42
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
63660
consistent gets
63649
physical reads
0
redo size
426
bytes sent via SQL*Net to client
380
bytes received via SQL*Net from client
1
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
0
rows processed
It continues to use a Full Table Scan.
If we re-run the second query:
SQL> select * from pink_floyd
1
where
code
=
42
and create_date>
'19-JAN-2021'
;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value:
2703636439
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
------------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1
|
41
|
415
(
0
) |
00:
00:
01
|
|*
1
| TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD
1
|
1
|
41
|
415
(
0
) |
00:
00:
01
|
|*
2
| INDEX RANGE SCAN | SYS_AI_
96
snkmu
4
sk
44
g |
412
| |
3
(
0
) |
00:
00:
01
|
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- filter(
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-dd hh24:mi:ss'
))
2
- access(
"CODE"
=
42
)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
406
consistent gets
0
physical reads
0
redo size
426
bytes sent via SQL*Net to client
381
bytes received via SQL*Net from client
1
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
0
rows processed
If now uses the newly created Automatic Index, with an improved 406 Consistent Gets (down from the previous 64424 Consistent Gets with the FTS).
BUT if we were to manually create an index on BOTH CODE and CREATE_DATE columns:
SQL> create index pink_floyd
1
_code_create_date_i on pink_floyd
1
(
code
, create_date) compress advanced
low
;
Index created.
SQL> select * from pink_floyd
1
where
code
=
42
and create_date>
'19-JAN-2021'
;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value:
3366491378
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time |
----------------------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
1
|
41
|
4
(
0
) |
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD
1
|
1
|
41
|
4
(
0
) |
00:
00:
01
|
|*
2
| INDEX RANGE SCAN | PINK_FLOYD
1
_CODE_CREATE_DATE_I |
1
| |
3
(
0
) |
00:
00:
01
|
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2
- access(
"CODE"
=
42
AND
"CREATE_DATE"
>TO_DATE(
' 2021-01-19 00:00:00'
,
'syyyy-mm-dd hh24:mi:ss'
) AND
"CREATE_DATE"
IS NOT NULL)
Note
-----
- automatic DOP: Computed Degree of Parallelism is
1
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
3
consistent gets
0
physical reads
0
redo size
426
bytes sent via SQL*Net to client
381
bytes received via SQL*Net from client
1
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
0
rows processed
Performance improves significantly further, by reducing Consistent Gets down to just 3.
So if you have SQL statements with a mixture of both Equality and Non-Equality predicates, you may encounter these 2 scenarios:
A potentially efficient index that is not created at all as the filtering on just the Equality based predicates are not sufficient to create a viable index, or
A potentially suboptimal Automatic Index that doesn’t contain useful filtering columns because they’re used in Non-Equality predicates…
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK