3

Automatic Indexing: JSON Expressions Part I (Making Plans For Nigel)

 2 years ago
source link: https://richardfoote.wordpress.com/2022/04/13/automatic-indexing-json-expressions-part-i-making-plans-for-nigel/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Automatic Indexing: JSON Expressions Part I (Making Plans For Nigel) April 13, 2022

Posted by Richard Foote in Automatic Indexing, Autonomous Database, CBO, Exadata, Function Based Indexes, Index statistics, JSON, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Virtual Columns.
trackback

When Automatic Indexing was first released, one of the restrictions was that automatic indexes on JSON expressions were NOT supported.

However, the Oracle Database 21c doco mentions:

Automatic indexes can be single or multi-column. They are considered for the following: Selected expressions (for example, JSON expressions)“.

So on my (admittedly dodgy) “Exadata” VM, I thought I’ll check out how AI now indeed deals with JSON expressions.

I start by creating a simple little table that uses the new 21c JSON datatype and populate it with some JSON documents (note the PONumber key has effectively unique numeric values assigned):

SQL> CREATE TABLE bowie_json
(id number,
bowie_date date,
bowie_order JSON);
SQL> insert into bowie_json
select
rownum,
sysdate,
'{"PONumber" : ' || rownum || ',
"Reference" : "2022' || rownum || 'DBOWIE",
"Requestor" : "David Bowie",
"User" : "DBOWIE",
"CostCenter" : "A42",
"ShippingInstructions" : {"name" : "David Bowie",
"Address": {"street" : "42 Ziggy Street",
"city" : "Canberra",
"state" : "ACT",
"zipCode" : 2601,
"country" : "Australia"},
"Phone" : [{"type" : "Office", "number" : "417-555-7777"},
{"type" : "Mobile", "number" : "417-555-1234"}]},
"Special Instructions" : null,
"AllowPartialShipment" : true,
"LineItems" : [{"ItemNumber" : 1,
"Part" : {"Description" : "Hunky Dory",
"UnitPrice" : 10.95},
"Quantity" : 5.0},
{"ItemNumber" : 2,
"Part" : {"Description" : "Pin-Ups",
"UnitPrice" : 10.95},
"Quantity" : 3.0}]}'
from dual connect by level <= 2000000;
2000000 rows created.
SQL> commit;
Commit complete
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_JSON');
PL/SQL procedure successfully completed.

As always, it’s important to ensure the table has statistics, as AI does not work properly without them.

I then run a number of SQL statements, with different JSON expression based predicates, including:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';
SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242;
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber' returning number)=42;
Execution Plan
----------------------------------------------------------
Plan hash value: 1196930810
--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time      |
--------------------------------------------------------------------------------
0 | SELECT STATEMENT  |            | 20000 |   12M |  34476 (1) | 00:00:02  |
|* 1 | TABLE ACCESS FULL | BOWIE_JSON | 20000 |   12M |  34476 (1) | 00:00:02  |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON
, '$.PONumber' RETURNING NUMBER NULL ON ERROR)=42)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
259127 consistent gets
200279 physical reads
0 redo size
1595 bytes sent via SQL*Net to client
526 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

They all return just the one row, but must currently use a Full Table Scan with no indexes present.

So what does AI make of things?

The first thing to note is that running the AI last activity report generates the following error:

SQL> select dbms_auto_index.report_last_activity() report from dual;
ERROR:
ORA-30954: char 0 is invalid in json_value(BOWIE_ORDER, '$.PONumber' returning VA
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 177
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 107
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 9226
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 89
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 167
ORA-06512: at line 1
no rows selected

If we look at the indexes now present with the table:

SQL> select index_name, index_type, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes
where table_name='BOWIE_JSON';
INDEX_NAME                INDEX_TYPE                AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- ------------------------- --- --------- -------- ---------- ----------- -----------------
SYS_IL0000081096C00003$$  LOB                       NO  VISIBLE   VALID
SYS_AI_ayvj257jd93cv      FUNCTION-BASED NORMAL     YES VISIBLE   VALID       2000000        5141            380000
SYS_AI_gpdkwzugdn055      FUNCTION-BASED NORMAL     YES VISIBLE   VALID       2000000        4596            200000
SQL> select index_name, column_expression from user_ind_expressions where table_name='BOWIE_JSON';
INDEX_NAME                COLUMN_EXPRESSION
------------------------- --------------------------------------------------------------------------------
SYS_AI_ayvj257jd93cv      JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERR
OR ON ERROR NULL ON EMPTY)
SYS_AI_gpdkwzugdn055      JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING NUMBER ERROR ON ER
ROR NULL ON EMPTY)

We can see that AI has indeed created two new automatic indexes, one on the VARCHAR2 JSON expression and one on the NUMBER JSON expression.

If we re-run the SQLs, we notice 3 very important points. Note the following example was run soon after the automatic indexes were created:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';
Execution Plan
----------------------------------------------------------
Plan hash value: 832017402
------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT                    |                      | 20000 |   12M |    1524 (1) | 00:00:01 |
1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           | 20000 |   12M |    1524 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv |  8000 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
234168 consistent gets
200279 physical reads
0 redo size
1595 bytes sent via SQL*Net to client
526 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

The first point to note is that the CBO now chooses to use the newly created automatic index. As only one row is return, this is as one would hope.

But there are two other very important points/issues worth making about the above execution plan and associated costs and statistics. One is associated with new AI behaviour introduced in 21c and the other is associated with an old trap in relation to function-based indexes.

I’ll leave it to the discernible reader to spot these issues, before I cover them in Part II in the coming days…

Comments»

cropped-bowie21.jpg?w=32 1. Automatic Indexing: JSON Expressions Part II (“Without You I’m Nothing” | Richard Foote's Oracle Blog - April 14, 2022

[…] my previous post on Automatic Indexing (AI) and JSON expressions, I mentioned two additional issues with the execution plan generated after the automatic indexes […]

Leave a Reply Cancel reply


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK