FORALL DML – why context isn’t everything
source link: https://mikesmithers.wordpress.com/2023/03/14/forall-dml-why-context-isnt-everything/
This post is the latest in an occasional series on the theme of stuff that doesn’t work quite how I thought it did.
It is the result of finding out the fun way that, rather than being “much faster” than using a humble Cursor For Loop, Bulk Collect/Forall for DML in PL/SQL can merely be a bit less slow.
Just in case my boss is reading this, I’d better get my excuses in right at the beginning.
This is what the Oracle PL/SQL Language Reference has to say about Forall :
> A FORALL statement is usually much faster than an equivalent FOR LOOP statement. However, a FOR LOOP statement can contain multiple DML statements, while a FORALL statement can contain only one. The batch of DML statements that a FORALL statement sends to SQL differ only in their VALUES and WHERE clauses. The values in those clauses must come from existing, populated collections.
…and later in the same document ….
> The FORALL statement runs one DML statement multiple times, with different values in the VALUES and WHERE clauses. The different values come from existing, populated collections or host arrays. The FORALL statement is usually much faster than an equivalent FOR LOOP statement.
I’ll be testing that assertion in this post.
Using a simple test case, which involves performing different DML operations on different tables using a single record set, I’ll begin by comparing the relative performance of:
- simple SQL statements inside a PL/SQL block
- Cursor For Loops
- Bulk Collect/Forall operations
I’ll then explore the potential performance gains available using database objects such as VARRAYs and Global Temporary Tables to hold the array being used in the DML.
NOTE – If you want to run these tests yourself on your own environment/database version to validate these findings, you can find the scripts I’ve used here on my Github Repo.
Test Scenario
I have a single set of records that I want to use in an insert, an update and a delete.
Each of these operations has a different target table.
I’ve created a simple test case, using this script ( called setup.sql) :
```sql
clear screen
set serverout on size unlimited

drop table input_records;
drop table record_statuses;
drop table target_table;

create table input_records as
    select rownum as id
    from dual
    connect by rownum <= 100000;

create table record_statuses as
    select id, 'WAITING' as status
    from input_records;

create table target_table (id number);
```
The tests have all been run on Oracle Enterprise Edition 19c, although the behaviour described here appears to be consistent across Oracle Database versions from 11g upwards.
I’ve run each script twice in succession and then taken the fastest runtime. This is to ensure I’ve not been caught out by the effects of any caching. Unlikely as I drop and re-create the tables each time, but as I’ve already fallen foul of one assumption recently, I didn’t want to take any chances.
I’ve used tkprof to format tracefile information, so it’s probably worth bearing in mind the column heading definitions for tkprof output :
```
******************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
******************************************************************************
```
To generate the trace files, I’ve wrapped the anonymous blocks with the test code with the following statements.
Before the PL/SQL block containing the test code :
```sql
alter session set timed_statistics = true;
alter session set sql_trace = true;
alter session set tracefile_identifier = '<a string>';
```
…where <a string> is a string to include in the tracefile name to make it easier to find in the trace directory on the database server.
After the block :
```sql
alter session set sql_trace = false;

select value
from v$diag_info
where name = 'Default Trace File'
/
```
A Simple SQL Baseline
Let’s start by benchmarking just using SQL.
This is as fast as it gets in Oracle terms…generally speaking.
Therefore it should provide a useful indicator of what the RDBMS is capable of in terms of performing these DML operations ( straight_iud.sql ):
```sql
@setup.sql

set timing on

begin
    insert into target_table(id)
    select id from input_records;

    update record_statuses
    set status = 'LOADED'
    where id in (select id from input_records);

    delete from input_records
    where id in (
        select id
        from record_statuses
        where status = 'LOADED');

    commit;
end;
/

set timing off
```
Once the setup is done, the block completes in 1.878 seconds.
Whilst it may be faster than a fast thing on fast pills, a SQL-only solution is not always practical.
As Steven Feuerstein points out in his excellent article on this subject, there are times when you need PL/SQL to get the job done.
This is where the Bulk Collect/Forall and Cursor For Loop approaches come to the fore.
Let’s start by looking at how they compare when called upon to perform the first of our DML operations…
Inserts
Let’s start with the straight SQL option (straight_insert.sql ):
```sql
@setup.sql

set timing on

begin
    insert into target_table(id)
    select id from input_records;

    commit;
end;
/

set timing off
```
This runs in a predictably rapid 0.128 seconds.
Now for the Cursor For Loop ( forloop_insert.sql ):
```sql
@setup.sql

set timing on

declare
    v_count pls_integer := 0;
begin
    for r_rec in (select id from input_records) loop
        insert into target_table(id)
        values (r_rec.id);
    end loop;
    commit;
end;
/

set timing off
```
…which completes in an, equally predictable, laggardly 1.445 seconds.
Incidentally, tracing this operation confirms that the select statement in a Cursor For Loop behaves in the same way as a Bulk Collect with the LIMIT set to 100. The Fetch count shows the 100K records have been fetched in batches of 100.
```
SELECT ID FROM INPUT_RECORDS

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        2      0.00       0.00          0          0          0          0
Execute      2      0.00       0.00          0          0          0          0
Fetch     1002      0.43       0.38        154     328059          0     100000
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     1006      0.43       0.39        154     328059          0     100000
```
This brings us to the much vaunted Forall.
If we start by running with a limit of 100 on the Bulk Collect, we can get an idea of how much difference it makes when we minimize the context switches between the SQL and PL/SQL engine as compared to the humble Cursor For Loop ( forall_insert_100.sql ):
```sql
@../setup.sql

set timing on

declare
    v_count pls_integer := 0;

    cursor c_input is
        select id from input_records;

    type typ_input is table of input_records%rowtype
        index by pls_integer;

    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input bulk collect into v_arr_id limit 100;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values (v_arr_id(i).id);

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/

set timing off
```
Yep, that’s much faster at 0.165 seconds.
We can conclude from this that most of the Cursor For Loop execution time is due to context switching between the SQL and PL/SQL engines.
However, after some trial and error, I’ve found that setting a limit of 1000 is about optimal for the test case at hand – not using a limit clause actually proves to be a bit slower – so I’ll be using that in the rest of the Bulk Collect tests in this post. Starting with this ( forall_insert.sql ):
```sql
@../setup.sql

set timing on

declare
    v_count pls_integer := 0;

    cursor c_input is
        select id from input_records;

    type typ_input is table of input_records%rowtype
        index by pls_integer;

    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input bulk collect into v_arr_id limit 1000;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values (v_arr_id(i).id);

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/

set timing off
```
This runs in 0.145 seconds.
Let’s tabulate those execution times for ease of comparison :
Rank | Method | Runtime (seconds) |
---|---|---|
1 | Simple SQL | 0.128 |
2 | Forall (Limit 1000) | 0.145 |
3 | Cursor For Loop | 1.445 |
So far, it all seems to be pretty much as you might expect.
OK, let’s try some more DML statements…
Running an Insert, Update and Delete
We’re going to insert records into a table as before.
Then, using the same record set, we’ll update a second table then delete from a third table.
We’ve already seen the unadorned SQL script for this test, so let’s move on to the Cursor For Loop (forloop_iud.sql). Once I kick this off, you may want to take the opportunity to go and make a coffee…
```sql
@setup.sql

set timing on

declare
    v_count pls_integer := 0;
begin
    for r_rec in (select id from input_records) loop
        insert into target_table(id)
        values (r_rec.id);

        update record_statuses
        set status = 'LOADED'
        where id = r_rec.id;

        delete from input_records
        where id = r_rec.id;
    end loop;
    commit;
end;
/

set timing off
```
That took almost four minutes ( 238.486 seconds).
Let’s see what “much faster” looks like with Forall (forall_iud.sql ):
```sql
@setup.sql

set timing on

declare
    v_count pls_integer := 0;

    cursor c_input is
        select id from input_records;

    type typ_input is table of input_records%rowtype
        index by pls_integer;

    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input bulk collect into v_arr_id limit 1000;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values (v_arr_id(i).id);

        forall j in 1..v_arr_id.count
            update record_statuses
            set status = 'LOADED'
            where id = v_arr_id(j).id;

        forall k in 1..v_arr_id.count
            delete from input_records
            where id = v_arr_id(k).id;

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/

set timing off
```
My coffee’s gone cold. That’s 224.162 seconds – around 6% faster if you’re interested.
I’m not sure that qualifies as “much faster”.
Remember, the simple SQL runtime for the same DML operations was 1.878 seconds.
Time for another table :
Rank | Method | Runtime (seconds) |
---|---|---|
1 | Simple SQL | 1.878 |
2 | Forall (Limit 1000) | 224.162 |
3 | Cursor For Loop | 238.486 |
In order to find out just what’s going on, it may be worth isolating a single DML operation, as we did with the INSERT statement earlier. This time, we’ll try an UPDATE…
Tracing an Update
The simple SQL benchmark for the update is ( straight_update.sql ):
```sql
@../setup.sql

insert into target_table
select id from input_records;

commit;

set timing on

begin
    update record_statuses
    set status = 'LOADED'
    where id in (select id from input_records);

    commit;
end;
/

set timing off
```
This runs in 0.798 seconds.
Now for the Cursor For Loop ( forloop_update.sql):
```sql
@setup.sql

insert into target_table
select id from input_records;

commit;

set timing on

declare
    v_count pls_integer := 0;
begin
    for r_rec in (select id from input_records) loop
        update record_statuses
        set status = 'LOADED'
        where id = r_rec.id;
    end loop;
    commit;
end;
/

set timing off
```
144.856 seconds.
Now for the Forall (forall_update.sql ):
```sql
@setup.sql

insert into target_table(id)
select id from input_records;

commit;

set timing on

declare
    v_count pls_integer := 0;

    cursor c_input is
        select id from input_records;

    type typ_input is table of input_records%rowtype
        index by pls_integer;

    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input bulk collect into v_arr_id limit 1000;

        forall j in 1..v_arr_id.count
            update record_statuses
            set status = 'LOADED'
            where id = v_arr_id(j).id;

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/

set timing off
```
That’s a not-much-faster-at-all-really 141.449 seconds.
Rank | Method | Runtime (seconds) |
---|---|---|
1 | Simple SQL | 0.798 |
2 | Forall (Limit 1000) | 141.449 |
3 | Cursor For Loop | 144.856 |
Once I’ve turned tracing on and re-executed these scripts, the reason for the performance discrepancy becomes clear.
Here’s the tkprof output for the Simple SQL Update. Remember, query is the number of buffers gotten for a consistent read :
```
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID IN (SELECT ID FROM INPUT_RECORDS)

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          2          0
Execute      1      0.51       0.92        249        416     102416     100000
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      0.51       0.92        249        416     102418     100000
```
As you’d expect, the Cursor For Loop performs a huge number of gets by comparison as it’s executing 100K update statements, rather than just one :
```
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute 100000    215.15     221.86        248   25104059     103919     100000
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total   100001    215.15     221.86        248   25104059     103919     100000
```
Now, although the Forall is only executing the update 100 times, the number of gets is on a par with the Cursor For Loop:
```
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute    100    217.41     220.58        248   25105754     103319     100000
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total      101    217.41     220.58        248   25105754     103319     100000
```
Further digging shows that it’s a similar story with the DELETEs.
At this point, maybe we should take another look at that documentation I quoted at the start. Specifically, the bit which says :
> The FORALL statement runs one DML statement multiple times, with different values in the VALUES and WHERE clauses.
So, despite the tkprof execution count for the statement suggesting otherwise, it looks like Forall is actually running a separate UPDATE statement for each value in the collection.
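One way to observe this per-element execution from within PL/SQL itself is the SQL%BULK_ROWCOUNT cursor attribute, which holds a separate row count for each element processed by the most recent Forall. The following sketch (not part of the benchmark runs, and limited to five ids just to keep the output short) runs against the same test tables :

```sql
set serverout on

declare
    type typ_ids is table of number index by pls_integer;
    v_ids typ_ids;
begin
    -- fetch a handful of ids from the test data
    select id
    bulk collect into v_ids
    from input_records
    where rownum <= 5;

    forall i in 1..v_ids.count
        update record_statuses
        set status = 'LOADED'
        where id = v_ids(i);

    -- one row count per element - i.e. one per execution of the UPDATE
    for i in 1..v_ids.count loop
        dbms_output.put_line('Element '||i||' updated '||sql%bulk_rowcount(i)||' row(s)');
    end loop;

    rollback;
end;
/
```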
If we’re going to get this particular DML code to perform, we’ll need to look at storing our array in a structure other than a PL/SQL collection.
VARRAY
In their guise as a database object (as opposed to a PL/SQL collection), VARRAYs are directly visible to the SQL engine. This means that they can be directly referenced in SQL queries without the necessity for any context switching (varray_iud.sql) …
```sql
@setup.sql

-- If we try to declare and use the varray entirely within the PL/SQL block
-- we run into PLS-00642 : local collection types not allowed in SQL statements
-- so...
create or replace type prince_varray as varray(100000) of number
/

set timing on

declare
    v_id_arr prince_varray := prince_varray();
begin
    select id
    bulk collect into v_id_arr
    from input_records;

    insert into target_table(id)
    select * from table(v_id_arr);

    update record_statuses
    set status = 'LOADED'
    where id in (select * from table(v_id_arr));

    delete from input_records
    where id in (select * from table(v_id_arr));

    commit;
end;
/

set timing off
```
Side Note : some may consider it bad practice to name objects simply for the purposes of making a weak pun.
The big news is that the runtime is a rather respectable 1.846 seconds.
If you’re sure that you’re never going to need to process more than 32767 records in your VARRAY you can even save yourself the effort of coming up with a sensible name for one.
The following types are already created in Oracle for each of the base datatypes:
- sys.ODCINumberList
- sys.ODCIDateList
- sys.ODCIVarchar2List
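For example, the insert could be rewritten along these lines using the built-in number list. Bear in mind that sys.ODCINumberList is declared as a varray(32767), so this sketch caps the fetch rather than attempting all 100K test rows (it's illustrative only and wasn't part of the benchmark runs) :

```sql
declare
    v_id_arr sys.ODCINumberList;
begin
    -- ODCINumberList maxes out at 32767 elements, so cap the fetch
    select id
    bulk collect into v_id_arr
    from input_records
    where rownum <= 32767;

    insert into target_table(id)
    select * from table(v_id_arr);

    commit;
end;
/
```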
Incidentally, in case you’re wondering about the maximum size of a VARRAY…
```sql
create or replace type varray_kane as varray(2147483647) of number
/

Type VARRAY_KANE compiled
```
Whilst a VARRAY is an option in this instance, in most cases, you’ll be dealing with a multi-dimensional array.
If only we could use a table…
Global Temporary Table
Using a Global Temporary Table (GTT) offers all of the advantages of using a VARRAY with the added bonus that it’ll work for multi-dimensional arrays (gtt_iud.sql).
```sql
@setup.sql

drop table gtt_array;

create global temporary table gtt_array (
    id number)
    on commit delete rows;

set timing on

begin
    insert into gtt_array
    select id from input_records;

    insert into target_table(id)
    select id from gtt_array;

    update record_statuses
    set status = 'LOADED'
    where id in (select id from gtt_array);

    delete from input_records
    where id in (select id from gtt_array);

    commit;
end;
/

set timing off
```
Runtime is 1.950 seconds.
If we look at our final results table for the runs that involved all three DML statements, we can see that the performance of the VARRAY and GTT methods is broadly similar to that of using simple SQL statements, mainly because they are also, effectively, simple SQL statements.
I say “broadly similar” because I’ve not found one method to be consistently faster than the other two. All of them are, however, much faster than Forall.
Rank | Method | Runtime (seconds) |
---|---|---|
1 | VARRAY | 1.846 |
2 | Simple SQL | 1.878 |
3 | GTT | 1.950 |
4 | Forall | 224.162 |
5 | Cursor For Loop | 238.486 |
Conclusions
Using Forall for an INSERT is likely to be significantly faster than using a Cursor For Loop due to the reduction in context switching. However, this does not hold for other types of DML.
If you find yourself in circumstances where performance is an issue then you may want to consider using a database object, such as a VARRAY or GTT, in your array processing.