FORALL DML – why context isn’t everything


Posted on March 14, 2023

This post is the latest in an occasional series on the theme of stuff that doesn’t work quite how I thought it did.

It is the result of finding out the fun way that, rather than being “much faster” than using a humble Cursor For Loop, Bulk Collect/Forall for DML in PL/SQL can merely be a bit less slow.

Just in case my boss is reading this, I’d better get my excuses in right at the beginning.

This is what the Oracle PL/SQL Language Reference has to say about Forall:

A FORALL statement is usually much faster than an equivalent FOR LOOP statement. However, a FOR LOOP statement can contain multiple DML statements, while a FORALL statement can contain only one. The batch of DML statements that a FORALL statement sends to SQL differ only in their VALUES and WHERE clauses. The values in those clauses must come from existing, populated collections.

…and later in the same document…

The FORALL statement runs one DML statement multiple times, with different values in the VALUES and WHERE clauses.

The different values come from existing, populated collections or host arrays. The FORALL statement is usually much faster than an equivalent FOR LOOP statement.

I’ll be testing that assertion in this post.
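Before putting that claim to the test, it’s worth a quick look at the basic shape of the construct. What follows is a minimal sketch rather than one of the test scripts (some_table is a hypothetical target):

declare
    -- Forall is driven from a collection, here an associative array
    type typ_ids is table of number index by pls_integer;
    v_ids typ_ids;
begin
    -- the collection must be populated before the Forall runs
    for i in 1..3 loop
        v_ids(i) := i;
    end loop;
    -- a single INSERT statement, batched across every element of the collection
    forall i in 1..v_ids.count
        insert into some_table(id)
        values( v_ids(i));
    commit;
end;
/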

Using a simple test case, which involves performing different DML operations on different tables using a single record set, I’ll begin by comparing the relative performance of:

  • simple SQL statements inside a PL/SQL block
  • Cursor For Loops
  • Bulk Collect/Forall operations

I’ll then explore the potential performance gains available using database objects such as VARRAYs and Global Temporary Tables to hold the array being used in the DML.

NOTE – If you want to run these tests yourself, in your own environment and database version, to validate these findings, you can find the scripts I’ve used here on my Github Repo.

Test Scenario

I have a single set of records that I want to use in an insert, an update and a delete.
Each of these operations has a different target table.

I’ve created a simple test case, using this script (called setup.sql):

clear screen
set serverout on size unlimited
drop table input_records;
drop table record_statuses;
drop table target_table;
create table input_records
as
select rownum as id
from dual
connect by rownum <=100000;
create table record_statuses
as
select id, 'WAITING' as status
from input_records;
create table target_table ( id number);

The tests have all been run on Oracle Enterprise Edition 19c, although the behaviour described here appears to be consistent across Oracle Database versions from 11g upwards.

I’ve run each script twice in succession and then taken the fastest runtime, to ensure I’ve not been caught out by the effects of any caching. That’s unlikely, as I drop and re-create the tables each time but, having already fallen foul of one assumption recently, I didn’t want to take any chances.

I’ve used tkprof to format tracefile information, so it’s probably worth bearing in mind the column heading definitions for tkprof output:

******************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing 
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
******************************************************************************

To generate the trace files, I’ve wrapped the anonymous blocks containing the test code in the following statements.

Before the PL/SQL block containing the test code:

alter session set timed_statistics = true;
alter session set sql_trace = true;
alter session set tracefile_identifier = '<a string>';

…where <a string> is a string to include in the tracefile name to make it easier to find in the trace directory on the database server.

After the block:

alter session set sql_trace = false;

select value
from v$diag_info
where name = 'Default Trace File'
/
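Once you know where the trace file is, you can format it with tkprof on the database server command line. Something along these lines should do the job (the file names here are purely illustrative):

tkprof ORCL_ora_12345_forall_insert.trc forall_insert.prf sys=no

The sys=no option suppresses recursive SQL run as SYS, which keeps the output focused on the statements we’re actually interested in.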

A Simple SQL Baseline

Let’s start by benchmarking just using SQL.
This is as fast as it gets in Oracle terms…generally speaking.

Therefore, it should provide a useful indicator of what the RDBMS is capable of in terms of performing these DML operations (straight_iud.sql):

@setup.sql
set timing on
begin
    insert into target_table(id)
    select id from input_records;

    update record_statuses
    set status = 'LOADED'
    where id in ( select id from input_records);

    delete from input_records
    where id in ( select id from record_statuses where status = 'LOADED');

    commit;
end;
/
set timing off

Once the setup is done, the block completes in 1.878 seconds.

Whilst it may be faster than a fast thing on fast pills, a SQL-only solution is not always practical.
As Steven Feuerstein points out in his excellent article on this subject, there are times when you need PL/SQL to get the job done.

This is where the Bulk Collect/Forall and Cursor For Loop approaches come to the fore.
Let’s start by looking at how they compare when called upon to perform the first of our DML operations…

Inserts

Let’s start with the straight SQL option (straight_insert.sql):

@setup.sql
set timing on
begin
    insert into target_table(id)
    select id from input_records;
    commit;
end;
/
set timing off

This runs in a predictably rapid 0.128 seconds.

Now for the Cursor For Loop (forloop_insert.sql):

@setup.sql
set timing on
declare
    v_count pls_integer := 0;
begin
    for r_rec in (
        select id
        from input_records )
    loop
        insert into target_table(id)
        values( r_rec.id);
    end loop;
    commit;
end;
/
set timing off

…which completes in an equally predictable, laggardly 1.445 seconds.

Incidentally, tracing this operation confirms that the select statement in a Cursor For Loop behaves in the same way as a Bulk Collect with the LIMIT set to 100: the Fetch count shows that the 100K records have been fetched in batches of 100.

SELECT ID 
FROM
 INPUT_RECORDS


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch     1002      0.43       0.38        154     328059          0      100000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1006      0.43       0.39        154     328059          0      100000

This brings us to the much vaunted Forall.
If we start by running with a limit of 100 on the Bulk Collect, we can get an idea of how much difference it makes when we minimize the context switches between the SQL and PL/SQL engines, as compared to the humble Cursor For Loop (forall_insert_100.sql):

@setup.sql
set timing on
declare
    v_count pls_integer := 0;
    cursor c_input is
        select id
        from input_records;
    type typ_input is table of input_records%rowtype index by pls_integer;
    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input
        bulk collect into v_arr_id
        limit 100;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values( v_arr_id(i).id);

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/
set timing off

Yep, that’s much faster at 0.165 seconds.

We can conclude from this that most of the Cursor For Loop execution time is due to context switching between the SQL and PL/SQL engines.

However, after some trial and error, I’ve found that setting a limit of 1000 is about optimal for the test case at hand – not using a limit clause actually proves to be a bit slower – so I’ll be using that in the rest of the Bulk Collect tests in this post. Starting with this (forall_insert.sql):

@setup.sql
set timing on
declare
    v_count pls_integer := 0;
    cursor c_input is
        select id
        from input_records;
    type typ_input is table of input_records%rowtype index by pls_integer;
    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input
        bulk collect into v_arr_id
        limit 1000;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values( v_arr_id(i).id);

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/
set timing off

This runs in 0.145 seconds.

Let’s tabulate those execution times for ease of comparison:

Rank  Method               Runtime (seconds)
----  -------------------  -----------------
1     Simple SQL           0.128
2     Forall (Limit 1000)  0.145
3     Cursor For Loop      1.445

So far, it all seems to be pretty much as you might expect.
OK, let’s try some more DML statements…

Running an Insert, Update and Delete

We’re going to insert records into a table as before.
Then, using the same record set, we’ll update a second table then delete from a third table.

We’ve already seen the unadorned SQL script for this test, so let’s move on to the Cursor For Loop (forloop_iud.sql). Once I kick this off, you may want to take the opportunity to go and make a coffee…

@setup.sql
set timing on
declare
    v_count pls_integer := 0;
begin
    for r_rec in (
        select id
        from input_records )
    loop
        insert into target_table(id)
        values( r_rec.id);

        update record_statuses
        set status = 'LOADED'
        where id = r_rec.id;

        delete from input_records
        where id = r_rec.id;
    end loop;
    commit;
end;
/
set timing off

That took almost four minutes (238.486 seconds).

Let’s see what “much faster” looks like with Forall (forall_iud.sql):

@setup.sql
set timing on
declare
    v_count pls_integer := 0;
    cursor c_input is
        select id
        from input_records;
    type typ_input is table of input_records%rowtype index by pls_integer;
    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input
        bulk collect into v_arr_id
        limit 1000;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values( v_arr_id(i).id);

        forall j in 1..v_arr_id.count
            update record_statuses
            set status = 'LOADED'
            where id = v_arr_id(j).id;

        forall k in 1..v_arr_id.count
            delete from input_records
            where id = v_arr_id(k).id;

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/
set timing off

My coffee’s gone cold. That’s 224.162 seconds – around 6% faster, if you’re interested.

I’m not sure that qualifies as “much faster”.

Remember, the simple SQL runtime for the same DML operations was 1.878 seconds.

Time for another table :

Rank  Method               Runtime (seconds)
----  -------------------  -----------------
1     Simple SQL           1.878
2     Forall (Limit 1000)  224.162
3     Cursor For Loop      238.486

In order to find out just what’s going on, it may be worth isolating a single DML operation, as we did with the INSERT statement earlier. This time, we’ll try an UPDATE…

Tracing an Update

The simple SQL benchmark for the update is (straight_update.sql):

@setup.sql
insert into target_table
select id from input_records;
commit;
set timing on
begin
    update record_statuses
    set status = 'LOADED'
    where id in ( select id from input_records);
    commit;
end;
/
set timing off

This runs in 0.798 seconds.

Now for the Cursor For Loop (forloop_update.sql):

@setup.sql
insert into target_table
select id from input_records;
commit;
set timing on
declare
    v_count pls_integer := 0;
begin
    for r_rec in (
        select id
        from input_records )
    loop
        update record_statuses
        set status = 'LOADED'
        where id = r_rec.id;
    end loop;
    commit;
end;
/
set timing off

144.856 seconds.

Now for the Forall (forall_update.sql):

@setup.sql
insert into target_table(id)
select id from input_records;
commit;   
set timing on
declare
    v_count pls_integer := 0;
    cursor c_input is
        select id
        from input_records;
    type typ_input is table of input_records%rowtype index by pls_integer;
    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input
        bulk collect into v_arr_id
        limit 1000;

        forall j in 1..v_arr_id.count
            update record_statuses
            set status = 'LOADED'
            where id = v_arr_id(j).id;

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/
set timing off

That’s a not-much-faster-at-all-really 141.449 seconds.

Rank  Method               Runtime (seconds)
----  -------------------  -----------------
1     Simple SQL           0.798
2     Forall (Limit 1000)  141.449
3     Cursor For Loop      144.856

Once I’ve turned tracing on and re-executed these scripts, the reason for the performance discrepancy becomes clear.

Here’s the tkprof output for the Simple SQL Update. Remember, query is the number of buffers gotten for a consistent read:

UPDATE RECORD_STATUSES SET STATUS = 'LOADED' 
WHERE
 ID IN ( SELECT ID FROM INPUT_RECORDS)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          2           0
Execute      1      0.51       0.92        249        416     102416      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.51       0.92        249        416     102418      100000

As you’d expect, the Cursor For Loop performs a huge number of gets by comparison, as it’s executing 100K update statements rather than just one:

UPDATE RECORD_STATUSES SET STATUS = 'LOADED' 
WHERE
 ID = :B1 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000    215.15     221.86        248   25104059     103919      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100001    215.15     221.86        248   25104059     103919      100000

Now, although the Forall is only executing the update 100 times, the number of gets is on a par with the Cursor For Loop:

UPDATE RECORD_STATUSES SET STATUS = 'LOADED' 
WHERE
 ID = :B1 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    100    217.41     220.58        248   25105754     103319      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      101    217.41     220.58        248   25105754     103319      100000

Further digging shows that it’s a similar story with the DELETEs.

At this point, maybe we should take another look at that documentation I quoted at the start. Specifically, the bit which says:

The FORALL statement runs one DML statement multiple times, with different values in the VALUES and WHERE clauses.

So, despite the tkprof execution count for the statement suggesting otherwise, it looks like Forall is actually running a separate UPDATE statement for each value in the collection.

If we’re going to get this particular DML code to perform, we’ll need to look at storing our array in a structure other than a PL/SQL collection.

VARRAY

In their guise as a database object (as opposed to a PL/SQL collection), VARRAYs are visible to the SQL engine. This means that they can be referenced directly in SQL queries without the need for any context switching (varray_iud.sql)…

@setup.sql
-- If we try to declare and use the varray entirely within the PL/SQL block
-- we run into PLS-00642 : local collection types not allowed in SQL statements
-- so...
create or replace type prince_varray as varray(100000) of number
/
set timing on
declare
    v_id_arr prince_varray := prince_varray();
begin
    select id
    bulk collect into v_id_arr
    from input_records;

    insert into target_table(id)
    select * from table(v_id_arr);

    update record_statuses
    set status = 'LOADED'
    where id in (select * from table(v_id_arr));

    delete from input_records
    where id in (select * from table(v_id_arr));

    commit;
end;
/
set timing off

Side Note : some may consider it bad practice to name objects simply for the purposes of making a weak pun.

The big news is that the runtime is a rather respectable 1.846 seconds.

If you’re sure that you’re never going to need to process more than 32767 records in your VARRAY, you can even save yourself the effort of coming up with a sensible name for one.
The following types are already created in Oracle for each of the base datatypes (there’s a sketch of one in action after this list):

  • sys.ODCINumberList
  • sys.ODCIDateList
  • sys.ODCIVarchar2List
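For instance, here’s a sketch of the update re-using sys.ODCINumberList. This isn’t one of the test scripts, and note the rownum predicate: bulk collecting more than 32767 rows into this type would exceed its size limit:

declare
    v_id_arr sys.ODCINumberList;
begin
    -- sys.ODCINumberList is a varray(32767) of number, so cap the fetch accordingly
    select id
    bulk collect into v_id_arr
    from input_records
    where rownum <= 32767;

    update record_statuses
    set status = 'LOADED'
    where id in (select column_value from table(v_id_arr));

    commit;
end;
/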

Incidentally, in case you’re wondering about the maximum size of a VARRAY…

create or replace type varray_kane as varray(2147483647) of number
/

Type VARRAY_KANE compiled

Whilst a VARRAY is an option in this instance, in most cases you’ll be dealing with a multi-dimensional array.
If only we could use a table…

Global Temporary Table

Using a Global Temporary Table (GTT) offers all of the advantages of using a VARRAY, with the added bonus that it’ll work for multi-dimensional arrays (gtt_iud.sql):

@setup.sql
drop table gtt_array;

create global temporary table gtt_array (
    id number )
on commit delete rows;

set timing on
begin
    insert into gtt_array
    select id from input_records;

    insert into target_table(id)
    select id from gtt_array;

    update record_statuses
    set status = 'LOADED'
    where id in (select id from gtt_array);

    delete from input_records
    where id in (select id from gtt_array);

    commit;
end;
/
set timing off

Runtime is 1.950 seconds.

If we look at our final results table for the runs that involved all three DML statements, we can see that the performance of the VARRAY and GTT methods is broadly similar to that of using simple SQL statements, mainly because they are also, effectively, simple SQL statements.

I say “broadly similar” because I’ve not found one method to be consistently faster than the other two. All of them are, however, much faster than Forall.

Rank  Method               Runtime (seconds)
----  -------------------  -----------------
1     VARRAY               1.846
2     Simple SQL           1.878
3     GTT                  1.950
4     Forall               224.162
5     Cursor For Loop      238.486

Conclusions

Using Forall for an INSERT is likely to be significantly faster than using a Cursor For Loop due to the reduction in context switching. However, this does not hold for other types of DML.

If you find yourself in circumstances where performance is an issue then you may want to consider using a database object, such as a VARRAY or GTT, in your array processing.

