

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Pretties...
source link: https://richardfoote.wordpress.com/2020/09/25/oracle-19c-automatic-indexing-data-skew-fixed-by-baselines-part-i-the-prettiest-star/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Prettiest Star)) September 25, 2020
Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
trackback
In my previous few blog posts, I’ve been discussing some issues in relation to how Automatic Indexes handle SQL statements that accesses skewed data. In this post, I’m going to setup the scenario in which Automatic Indexing can potentially use Baselines to help address some of these issues. BUT, as we’ll see, I’m having to manufacture things somewhat to make this work due to the problem of the Automatic Indexing task using Dynamic Sampling of level 11, whereas most usual database sessions do not.
To set things up, I’m going recap what I’ve previously discussed (but with a slight difference), by creating a table that has significant data skew on the CODE column, with most values very uncommon, but with a handful of values being very common:
SQL> create table space_oddity (id number constraint space_oddity_pk primary key,
code
number, name varchar
2
(
142
));
Table created.
SQL> begin
2
for i in
1
..
2000000
loop
3
if mod(i,
2
) =
0
then
4
insert into space_oddity values(i, ceil(dbms_random.value(
0
,
1000000
)),
'David Bowie is really Ziggy Stardust and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history'
);
5
else
6
insert into space_oddity values(i, mod(i,
20
)*
10000
,
'Ziggy Stardust is really David Bowie and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history.'
);
7
end if;
8
end loop;
9
commit;
10
end;
11
/
PL/SQL procedure successfully completed.
So most CODE values will only occur a few times if at all, but a few values divisible by 10000 have many many occurrences within the table.
Importantly, we will initially collect statistics with NO histograms on the CODE column, which is the default behaviour anyways if no SQL has previous run with predicates on the column:
SQL> exec dbms_stats.gather_table_stats(null,
'SPACE_ODDITY'
, method_opt=>
'FOR ALL COLUMNS SIZE 1'
);
PL/SQL procedure successfully completed.
If we run a query based on a rare value for CODE:
SQL> set arraysize
5000
SQL> select * from space_oddity where
code
=
25
;
Execution Plan
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
3
|
423
|
11356
(
1
)|
00:
00:
01
|
|*
1
| TABLE ACCESS FULL| SPACE_ODDITY |
3
|
423
|
11356
(
1
)|
00:
00:
01
|
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- filter(
"CODE"
=
25
)
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
40974
consistent gets
0
physical reads
0
redo size
1018
bytes sent via SQL*Net to client
402
bytes received via SQL*Net from client
2
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
2
rows processed
Without an index, the CBO has no choice at this point but to perform a FTS. BUT note that the 2 rows returned is very similar to the 3 estimated rows, which would make an index likely the way to go if such an index existed.
However, the following SQL accesses many of the common values of CODE and returns many rows:
SQL> select * from space_oddity where
code
in (
10000
,
30000
,
50000
,
70000
,
90000
,
110000
,
130000
,
150000
,
170000
,
190000
);
1000011
rows selected.
Execution Plan
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
32
|
4512
|
11425
(
2
)|
00:
00:
01
|
|*
1
| TABLE ACCESS FULL| SPACE_ODDITY |
32
|
4512
|
11425
(
2
)|
00:
00:
01
|
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- filter(
"CODE"
=
10000
OR
"CODE"
=
30000
OR
"CODE"
=
50000
OR
"CODE"
=
70000
OR
"CODE"
=
90000
OR
"CODE"
=
110000
OR
"CODE"
=
130000
OR
"CODE"
=
150000
OR
"CODE"
=
170000
OR
"CODE"
=
190000
)
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
41169
consistent gets
0
physical reads
0
redo size
13535504
bytes sent via SQL*Net to client
2678
bytes received via SQL*Net from client
202
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
1000011
rows processed
Again, without an index in place, the CBO has no choice but to perform a FTS but this is almost certainly the way to go regardless. BUT without a histogram on the CODE column, the CBO has got the cardinality estimate way way off and thinks only 32 rows are to be returned and not the actual 1000011 rows.
So what does Automatic Indexing make of things. Let’s wait and have a look at the next Automatic Indexing Report:
SQL> select dbms_auto_index.report_last_activity() report from dual;
REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start :
18
-AUG
-2020
15:
57:
14
Activity end :
18
-AUG
-2020
15:
58:
10
Executions completed :
1
Executions interrupted :
0
Executions with fatal error :
0
-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates :
1
Indexes created (
visible
/ invisible) :
1
(
1
/
0
)
Space used (
visible
/ invisible) :
35.65
MB (
35.65
MB /
0
B)
Indexes dropped :
0
SQL statements verified :
1
SQL statements improved (improvement factor) :
1
(
40984.3
x)
SQL plan baselines created :
0
Overall improvement factor :
40984.3
x
-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes :
0
Space used :
0
B
Unusable indexes :
0
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
----------------------------------------------------------------------------
| Owner | Table | Index | Key | Type | Properties |
----------------------------------------------------------------------------
| BOWIE | SPACE_ODDITY | SYS_AI_
82
bdnqs
7
q
8
rtm | CODE | B-TREE | NONE |
----------------------------------------------------------------------------
So Automatic Indexing has indeed created the index (SYS_AI_82bdnqs7q8rtm) on the CODE column BUT this is based on only the one SQL statement:
VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID :
19
sv
1
g
6
tt
0
g
1
y
SQL Text : select * from space_oddity where
code
=
25
Improvement Factor :
40984.3
x
Execution Statistics:
-----------------------------
Original Plan Auto Index Plan
---------------------------- ----------------------------
Elapsed Time (s):
5417408
139265
CPU Time (s):
1771880
7797
Buffer Gets:
327876
5
Optimizer Cost:
11356
5
Disk Reads:
649
2
Direct Writes:
0
0
Rows Processed:
16
2
Executions:
8
1
The Automatic Indexing task has correctly identified a significant improvement of 40984.3x when using an index on the SQL statement that returned just the 2 rows. The other SQL statement that returns many rows IS NOT MENTIONED.
This is because the Automatic Indexing tasks uses Dynamic Sampling Level=11, meaning it determines the more accurate cardinality estimate on the fly and correctly identifies that a vast number of rows are going to be returned. As a result, it correctly determines that the new Automatic Indexing if used would be detrimental to performance and would not be used by the CBO.
BUT most importantly, it also makes the assumption that the CBO would automatically likewise make this same decision to NOT use any such index in other database sessions and so there’s nothing to protect.
BUT this assumption is incorrect IF other database sessions don’t likewise use Dynamic Sampling with Level=11.
BUT by default, including in Oracle’s Autonomous Database Transaction Processing Cloud environment, the Dynamic Sampling Level is NOT set to 11, but the 2.
Therefore, most database sessions will not be able to determine the correct cardinality estimate on the fly and so will incorrectly assume the number of returned rows is much less than in reality and potentially use any such new Automatic Index inappropriately…
So if we look at the Plans Section of the Automatic Indexing report:
PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value :
2301175572
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------
|
0
| SELECT STATEMENT | | | |
11356
| |
|
1
| TABLE ACCESS FULL | SPACE_ODDITY |
3
|
423
|
11356
|
00:
00:
01
|
-----------------------------------------------------------------------------
- With Auto Indexes
-----------------------------
Plan Hash Value :
54782313
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
3
|
423
|
5
|
00:
00:
01
|
|
1
| TABLE ACCESS BY INDEX ROWID BATCHED | SPACE_ODDITY |
3
|
423
|
5
|
00:
00:
01
|
| *
2
| INDEX RANGE SCAN | SYS_AI_
82
bdnqs
7
q
8
rtm |
2
| |
3
|
00:
00:
01
|
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
*
2
- access(
"CODE"
=
25
)
Notes
-----
- Dynamic sampling used for this statement (
level
=
11
)
The new plan for the SQL returning 2 rows when using the new Automatic Index and is much more efficient with a significantly reduced cost (just 3 down from 11356).
But again, the plans for the SQL that returns many rows are not listed as the Automatic Indexing task has already determined that an index would make such a plan significantly less efficient.
If we now rerun the SQL the returns many rows (and BEFORE High Frequency Collection Statistics potentially kicks in):
SQL> select * from space_oddity where
code
in (
10000
,
30000
,
50000
,
70000
,
90000
,
110000
,
130000
,
150000
,
170000
,
190000
);
1000011
rows selected.
Execution Plan
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
32
|
4512
|
35
(
0
)|
00:
00:
01
|
|
1
| INLIST ITERATOR | | | | | |
|
2
| TABLE ACCESS BY INDEX ROWID BATCHED| SPACE_ODDITY |
32
|
4512
|
35
(
0
)|
00:
00:
01
|
|*
3
| INDEX RANGE SCAN | SYS_AI_
82
bdnqs
7
q
8
rtm |
32
| |
12
(
0
)|
00:
00:
01
|
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3
- access(
"CODE"
=
10000
OR
"CODE"
=
30000
OR
"CODE"
=
50000
OR
"CODE"
=
70000
OR
"CODE"
=
90000
OR
"CODE"
=
110000
OR
"CODE"
=
130000
OR
"CODE"
=
150000
OR
"CODE"
=
170000
OR
"CODE"
=
190000
)
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
410422
consistent gets
0
physical reads
0
redo size
145536076
bytes sent via SQL*Net to client
2678
bytes received via SQL*Net from client
202
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
1000011
rows processed
Note that the cardinality estimate is still way way wrong, thinking that just 32 rows are to be returned, when is fact 1000011 rows are returned.
As a result, the CBO has decided to incorrectly use the new Automatic Index. Incorrectly, in that the number of consistent gets has increased 10x from the previous FTS plan (410,422 now, up from 41,169).
One way to resolve this is to collect histograms on the CODE column (or wait for the High Frequency Stats Collection to kick in):
SQL> exec dbms_stats.gather_table_stats(null,
'SPACE_ODDITY'
, method_opt=> 'FOR ALL COLUMNS SIZE
2048
’);
PL/SQL procedure successfully completed.
If we now re-run this SQL:
SQL> select * from space_oddity where
code
in (
190000
,
170000
,
150000
,
130000
,
110000
,
90000
,
70000
,
50000
,
30000
,
10000
);
1000011
rows selected.
Execution Plan
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
|
0
| SELECT STATEMENT | |
996
K|
133
M|
11411
(
1
)|
00:
00:
01
|
|*
1
| TABLE ACCESS FULL| SPACE_ODDITY |
996
K|
133
M|
11411
(
1
)|
00:
00:
01
|
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1
- filter(
"CODE"
=
10000
OR
"CODE"
=
30000
OR
"CODE"
=
50000
OR
"CODE"
=
70000
OR
"CODE"
=
90000
OR
"CODE"
=
110000
OR
"CODE"
=
130000
OR
"CODE"
=
150000
OR
"CODE"
=
170000
OR
"CODE"
=
190000
)
Statistics
----------------------------------------------------------
0
recursive calls
0
db
block
gets
41169
consistent gets
0
physical reads
0
redo size
13535504
bytes sent via SQL*Net to client
2678
bytes received via SQL*Net from client
202
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
1000011
rows processed
The cardinality estimate is now much more accurate and the the execution plan now uses the more efficient FTS.
In Part II, we’ll look at how the Automatic Indexing tasks can be made to identify the dangers of a new index to SQLs that might degrade in performance and how it will create a Baseline to protect against any such SQL regressions….
Recommend
-
10
Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part I (Conversation Piece) October 14, 2020 Posted by Richard Foote in 19c,
-
6
Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020 Posted by Richard Foote in 19c,...
-
6
Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020 Posted by Richard Foote in 19c,
-
9
Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat) September 18, 2020 Posted by Richard Foote in 1...
-
10
Oracle 19c Automatic Indexing: Data Skew Part III (The Good Son) September 16, 2020 Posted by Richard Foote in 19c,
-
6
Oracle 19c Automatic Indexing: Data Skew Part I (A Saucerful of Secrets) September 10, 2020 Posted by Richard Foote in 19c,
-
22
Oracle 19c Automatic Indexing: DDL Statements With Auto Indexes (No Control) September 1, 2020 Posted by Richard Foote in 19c,
-
10
Oracle 19c Automatic Indexing: Dropping Automatic Indexes Part II (New Angels of Promise) August 25, 2020 Posted by Richard Foote in 19c,...
-
17
Oracle 19c Automatic Indexing: Currently Broken In Autonomous ATP Databases? (What In The World) December 17, 2020 Posted by Richard Foote in ...
-
5
Oracle 19c Automatic Indexing: Currently Broken In Autonomous ATP Databases Part II (Fix You) January 12, 2021 Posted by Richard Foote in 19c,
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK