FORALL DML – why context isn’t everything
source link: https://mikesmithers.wordpress.com/2023/03/14/forall-dml-why-context-isnt-everything/
This post is the latest in an occasional series on the theme of stuff that doesn’t work quite how I thought it did.
It is the result of finding out the fun way that, rather than being “much faster” than using a humble Cursor For Loop, Bulk Collect/Forall for DML in PL/SQL can merely be a bit less slow.
Just in case my boss is reading this, I’d better get my excuses in right at the beginning.
This is what the Oracle PL/SQL Language Reference has to say about Forall :
> A FORALL statement is usually much faster than an equivalent FOR LOOP statement. However, a FOR LOOP statement can contain multiple DML statements, while a FORALL statement can contain only one. The batch of DML statements that a FORALL statement sends to SQL differ only in their VALUES and WHERE clauses. The values in those clauses must come from existing, populated collections.
…and later in the same document ….
> The FORALL statement runs one DML statement multiple times, with different values in the VALUES and WHERE clauses. The different values come from existing, populated collections or host arrays. The FORALL statement is usually much faster than an equivalent FOR LOOP statement.
I’ll be testing that assertion in this post.
Using a simple test case, which involves performing different DML operations on different tables using a single record set, I’ll begin by comparing the relative performance of:
- simple SQL statements inside a PL/SQL block
- Cursor For Loops
- Bulk Collect/Forall operations
I’ll then explore the potential performance gains available using database objects such as VARRAYs and Global Temporary Tables to hold the array being used in the DML.
NOTE – If you want to run these tests yourself on your own environment/database version to validate these findings, you can find the scripts I’ve used here on my Github Repo.
Test Scenario
I have a single set of records that I want to use in an insert, an update and a delete.
Each of these operations has a different target table.
I’ve created a simple test case, using this script ( called setup.sql) :
```sql
clear screen
set serverout on size unlimited

drop table input_records;
drop table record_statuses;
drop table target_table;

create table input_records as
    select rownum as id
    from dual
    connect by rownum <= 100000;

create table record_statuses as
    select id, 'WAITING' as status
    from input_records;

create table target_table (id number);
```
The tests have all been run on Oracle Enterprise Edition 19c, although the behaviour described here appears to be consistent across Oracle Database versions from 11g upwards.
I’ve run each script twice in succession and then taken the fastest runtime. This is to ensure I’ve not been caught out by the effects of any caching. Unlikely as I drop and re-create the tables each time, but as I’ve already fallen foul of one assumption recently, I didn’t want to take any chances.
I’ve used tkprof to format tracefile information, so it’s probably worth bearing in mind the column heading definitions for tkprof output :
```
******************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
******************************************************************************
```
To generate the trace files, I’ve wrapped the anonymous blocks with the test code with the following statements.
Before the PL/SQL block containing the test code :
```sql
alter session set timed_statistics = true;
alter session set sql_trace = true;
alter session set tracefile_identifier = '<a string>';
```
…where <a string> is a string to include in the tracefile name to make it easier to find in the trace directory on the database server.
After the block :
```sql
alter session set sql_trace = false;

select value
from v$diag_info
where name = 'Default Trace File'
/
```
A Simple SQL Baseline
Let’s start by benchmarking just using SQL.
This is as fast as it gets in Oracle terms…generally speaking.
Therefore it should provide a useful indicator of what the RDBMS is capable of in terms of performing these DML operations ( straight_iud.sql ):
```sql
@setup.sql

set timing on

begin
    insert into target_table(id)
    select id from input_records;

    update record_statuses
    set status = 'LOADED'
    where id in (select id from input_records);

    delete from input_records
    where id in (
        select id
        from record_statuses
        where status = 'LOADED');

    commit;
end;
/

set timing off
```
Once the setup is done, the block completes in 1.878 seconds.
Whilst it may be faster than a fast thing on fast pills, a SQL-only solution is not always practical.
As Steven Feuerstein points out in his excellent article on this subject, there are times when you need PL/SQL to get the job done.
This is where the Bulk Collect/Forall and Cursor For Loop approaches come to the fore.
Let’s start by looking at how they compare when called upon to perform the first of our DML operations…
Inserts
Let’s start with the straight SQL option (straight_insert.sql ):
```sql
@setup.sql

set timing on

begin
    insert into target_table(id)
    select id from input_records;

    commit;
end;
/

set timing off
```
This runs in a predictably rapid 0.128 seconds.
Now for the Cursor For Loop ( forloop_insert.sql ):
```sql
@setup.sql

set timing on

declare
    v_count pls_integer := 0;
begin
    for r_rec in (select id from input_records) loop
        insert into target_table(id)
        values (r_rec.id);
    end loop;
    commit;
end;
/

set timing off
```
…which completes in an, equally predictable, laggardly 1.445 seconds.
Incidentally, tracing this operation confirms that the select statement in a Cursor For Loop behaves in the same way as a Bulk Collect with the LIMIT set to 100. The Fetch count shows the 100K records have been fetched in batches of 100.
```
SELECT ID FROM INPUT_RECORDS

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        2      0.00       0.00          0          0          0          0
Execute      2      0.00       0.00          0          0          0          0
Fetch     1002      0.43       0.38        154     328059          0     100000
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     1006      0.43       0.39        154     328059          0     100000
```
This brings us to the much vaunted Forall.
If we start by running with a limit of 100 on the Bulk Collect, we can get an idea of how much difference it makes when we minimize the context switches between the SQL and PL/SQL engine as compared to the humble Cursor For Loop ( forall_insert_100.sql ):
```sql
@../setup.sql

set timing on

declare
    v_count pls_integer := 0;

    cursor c_input is
        select id from input_records;

    type typ_input is table of input_records%rowtype
        index by pls_integer;

    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input bulk collect into v_arr_id limit 100;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values (v_arr_id(i).id);

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/

set timing off
```
Yep, that’s much faster at 0.165 seconds.
We can conclude from this that most of the Cursor For Loop execution time is due to context switching between the SQL and PL/SQL engines.
However, after some trial and error, I’ve found that setting a limit of 1000 is about optimal for the test case at hand – not using a limit clause actually proves to be a bit slower – so I’ll be using that in the rest of the Bulk Collect tests in this post. Starting with this ( forall_insert.sql ):
```sql
@../setup.sql

set timing on

declare
    v_count pls_integer := 0;

    cursor c_input is
        select id from input_records;

    type typ_input is table of input_records%rowtype
        index by pls_integer;

    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input bulk collect into v_arr_id limit 1000;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values (v_arr_id(i).id);

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/

set timing off
```
This runs in 0.145 seconds.
Let’s tabulate those execution times for ease of comparison :
Rank | Method | Runtime (seconds) |
---|---|---|
1 | Simple SQL | 0.128 |
2 | Forall (Limit 1000) | 0.145 |
3 | Cursor For Loop | 1.445 |
So far, it all seems to be pretty much as you might expect.
OK, let’s try some more DML statements…
Running an Insert, Update and Delete
We’re going to insert records into a table as before.
Then, using the same record set, we’ll update a second table then delete from a third table.
We’ve already seen the unadorned SQL script for this test, so let’s move on to the Cursor For Loop (forloop_iud.sql). Once I kick this off, you may want to take the opportunity to go and make a coffee…
```sql
@setup.sql

set timing on

declare
    v_count pls_integer := 0;
begin
    for r_rec in (select id from input_records) loop
        insert into target_table(id)
        values (r_rec.id);

        update record_statuses
        set status = 'LOADED'
        where id = r_rec.id;

        delete from input_records
        where id = r_rec.id;
    end loop;
    commit;
end;
/

set timing off
```
That took almost four minutes ( 238.486 seconds).
Let’s see what “much faster” looks like with Forall (forall_iud.sql ):
```sql
@setup.sql

set timing on

declare
    v_count pls_integer := 0;

    cursor c_input is
        select id from input_records;

    type typ_input is table of input_records%rowtype
        index by pls_integer;

    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input bulk collect into v_arr_id limit 1000;

        forall i in 1..v_arr_id.count
            insert into target_table(id)
            values (v_arr_id(i).id);

        forall j in 1..v_arr_id.count
            update record_statuses
            set status = 'LOADED'
            where id = v_arr_id(j).id;

        forall k in 1..v_arr_id.count
            delete from input_records
            where id = v_arr_id(k).id;

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/

set timing off
```
My coffee’s gone cold. That’s 224.162 seconds – around 6% faster if you’re interested.
I’m not sure that qualifies as “much faster”.
Remember, the simple SQL runtime for the same DML operations was 1.878 seconds.
Time for another table :
Rank | Method | Runtime (seconds) |
---|---|---|
1 | Simple SQL | 1.878 |
2 | Forall (Limit 1000) | 224.162 |
3 | Cursor For Loop | 238.486 |
In order to find out just what’s going on, it may be worth isolating a single DML operation, as we did with the INSERT statement earlier. This time, we’ll try an UPDATE…
Tracing an Update
The simple SQL benchmark for the update is ( straight_update.sql ):
```sql
@../setup.sql

insert into target_table
select id from input_records;

commit;

set timing on

begin
    update record_statuses
    set status = 'LOADED'
    where id in (select id from input_records);

    commit;
end;
/

set timing off
```
This runs in 0.798 seconds.
Now for the Cursor For Loop ( forloop_update.sql):
```sql
@setup.sql

insert into target_table
select id from input_records;

commit;

set timing on

declare
    v_count pls_integer := 0;
begin
    for r_rec in (select id from input_records) loop
        update record_statuses
        set status = 'LOADED'
        where id = r_rec.id;
    end loop;
    commit;
end;
/

set timing off
```
144.856 seconds.
Now for the Forall (forall_update.sql ):
```sql
@setup.sql

insert into target_table(id)
select id from input_records;

commit;

set timing on

declare
    v_count pls_integer := 0;

    cursor c_input is
        select id from input_records;

    type typ_input is table of input_records%rowtype
        index by pls_integer;

    v_arr_id typ_input;
begin
    open c_input;
    loop
        fetch c_input bulk collect into v_arr_id limit 1000;

        forall j in 1..v_arr_id.count
            update record_statuses
            set status = 'LOADED'
            where id = v_arr_id(j).id;

        exit when v_arr_id.count = 0;
    end loop;
    close c_input;
    commit;
end;
/

set timing off
```
That’s a not-much-faster-at-all-really 141.449 seconds.
Rank | Method | Runtime (seconds) |
---|---|---|
1 | Simple SQL | 0.798 |
2 | Forall (Limit 1000) | 141.449 |
3 | Cursor For Loop | 144.856 |
Once I’ve turned tracing on and re-executed these scripts, the reason for the performance discrepancy becomes clear.
Here’s the tkprof output for the Simple SQL Update. Remember, query is the number of buffers gotten for a consistent read :
```
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID IN (SELECT ID FROM INPUT_RECORDS)

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          2          0
Execute      1      0.51       0.92        249        416     102416     100000
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      0.51       0.92        249        416     102418     100000
```
As you’d expect, the Cursor For Loop performs a huge number of gets by comparison as it’s executing 100K update statements, rather than just one :
```
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute 100000    215.15     221.86        248   25104059     103919     100000
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total   100001    215.15     221.86        248   25104059     103919     100000
```
Now, although the Forall is only executing the update 100 times, the number of gets is on a par with the Cursor For Loop:
```
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute    100    217.41     220.58        248   25105754     103319     100000
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total      101    217.41     220.58        248   25105754     103319     100000
```
Further digging shows that it’s a similar story with the DELETEs.
At this point, maybe we should take another look at that documentation I quoted at the start. Specifically, the bit which says :
> The FORALL statement runs one DML statement multiple times, with different values in the VALUES and WHERE clauses.
So, despite the tkprof execution count for the statement suggesting otherwise, it looks like Forall is actually running a separate UPDATE statement for each value in the collection.
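One way to observe this per-element execution from within PL/SQL itself is the SQL%BULK_ROWCOUNT cursor attribute, which holds a separate row count for each element processed by the most recent Forall. The following sketch (not part of the benchmark runs, and limited to five ids just to keep the output short) runs against the same test tables :

```sql
set serverout on

declare
    type typ_ids is table of number index by pls_integer;
    v_ids typ_ids;
begin
    -- fetch a handful of ids from the test data
    select id
    bulk collect into v_ids
    from input_records
    where rownum <= 5;

    forall i in 1..v_ids.count
        update record_statuses
        set status = 'LOADED'
        where id = v_ids(i);

    -- one row count per element - i.e. one per execution of the UPDATE
    for i in 1..v_ids.count loop
        dbms_output.put_line('Element '||i||' updated '||sql%bulk_rowcount(i)||' row(s)');
    end loop;

    rollback;
end;
/
```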
If we’re going to get this particular DML code to perform, we’ll need to look at storing our array in a structure other than a PL/SQL collection.
VARRAY
In their guise as a database object (as opposed to a PL/SQL collection), VARRAYs are directly visible to the SQL engine. This means that they can be directly referenced in SQL queries without the necessity for any context switching (varray_iud.sql) …
```sql
@setup.sql

-- If we try to declare and use the varray entirely within the PL/SQL block
-- we run into PLS-00642 : local collection types not allowed in SQL statements
-- so...
create or replace type prince_varray as varray(100000) of number
/

set timing on

declare
    v_id_arr prince_varray := prince_varray();
begin
    select id
    bulk collect into v_id_arr
    from input_records;

    insert into target_table(id)
    select * from table(v_id_arr);

    update record_statuses
    set status = 'LOADED'
    where id in (select * from table(v_id_arr));

    delete from input_records
    where id in (select * from table(v_id_arr));

    commit;
end;
/

set timing off
```
Side Note : some may consider it bad practice to name objects simply for the purposes of making a weak pun.
The big news is that the runtime is a rather respectable 1.846 seconds.
If you’re sure that you’re never going to need to process more than 32767 records in your VARRAY you can even save yourself the effort of coming up with a sensible name for one.
The following types are already created in Oracle for each of the base datatypes:
- sys.ODCINumberList
- sys.ODCIDateList
- sys.ODCIVarchar2List
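For example, the insert could be rewritten along these lines using the built-in number list. Bear in mind that sys.ODCINumberList is declared as a varray(32767), so this sketch caps the fetch rather than attempting all 100K test rows (it's illustrative only and wasn't part of the benchmark runs) :

```sql
declare
    v_id_arr sys.ODCINumberList;
begin
    -- ODCINumberList maxes out at 32767 elements, so cap the fetch
    select id
    bulk collect into v_id_arr
    from input_records
    where rownum <= 32767;

    insert into target_table(id)
    select * from table(v_id_arr);

    commit;
end;
/
```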
Incidentally, in case you’re wondering about the maximum size of a VARRAY…
```sql
create or replace type varray_kane as varray(2147483647) of number
/

Type VARRAY_KANE compiled
```
Whilst a VARRAY is an option in this instance, in most cases, you’ll be dealing with a multi-dimensional array.
If only we could use a table…
Global Temporary Table
Using a Global Temporary Table (GTT) offers all of the advantages of using a VARRAY with the added bonus that it’ll work for multi-dimensional arrays (gtt_iud.sql).
```sql
@setup.sql

drop table gtt_array;

create global temporary table gtt_array (
    id number)
    on commit delete rows;

set timing on

begin
    insert into gtt_array
    select id from input_records;

    insert into target_table(id)
    select id from gtt_array;

    update record_statuses
    set status = 'LOADED'
    where id in (select id from gtt_array);

    delete from input_records
    where id in (select id from gtt_array);

    commit;
end;
/

set timing off
```
Runtime is 1.950 seconds.
If we look at our final results table for the runs that involved all three DML statements, we can see that the performance of the VARRAY and GTT methods is broadly similar to that of using simple SQL statements, mainly because they are also, effectively, simple SQL statements.
I say “broadly similar” because I’ve not found one method to be consistently faster than the other two. All of them are, however, much faster than Forall.
Rank | Method | Runtime (seconds) |
---|---|---|
1 | VARRAY | 1.846 |
2 | Simple SQL | 1.878 |
3 | GTT | 1.950 |
4 | Forall | 224.162 |
5 | Cursor For Loop | 238.486 |
Conclusions
Using Forall for an INSERT is likely to be significantly faster than using a Cursor For Loop due to the reduction in context switching. However, this does not hold for other types of DML.
If you find yourself in circumstances where performance is an issue then you may want to consider using a database object, such as a VARRAY or GTT, in your array processing.