7

Generating CSV files from PL/SQL the Very Easy Way.

 3 years ago
source link: https://mikesmithers.wordpress.com/2020/02/13/generating-csv-files-from-pl-sql-the-very-easy-way/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Generating CSV files from PL/SQL the Very Easy Way.

Posted on February 13, 2020

This post is dedicated to Morten Braten and William Robertson.
Thanks to both of you for saving me a lot of time (and typing)…

XML, YAML, JSON. When it comes to transferring data between systems, they’ve all been flavour of the month at one time or another. However, good old CSV is still the format of choice when the batch window opens.
Not only is it simple, it adds comparatively little overhead to the size of the data file being generated.

OK – CSV used to mean “Comma-delimited” but these days it’s become synonymous with delimited data.
Whatever separator character you use, generating CSV is considerably easier these days…unless you’re using PL/SQL in a batch (as opposed to interactive) process.
That said, Connor MacDonald does have a clever way of utilising the functionality available in a command line tool such as SQL*Plus by means of a DBMS_SCHEDULER job.

If for some reason that won’t work for you, it looks like you’re going to have to hack out some custom PL/SQL to get the job done…or you could have a look at a couple of the options available in what other technologies would refer to as “the ecosystem”.
What I’m going to cover is :

The pain of hand-cranking delimited SQL A solution offered in Morten Braten’s Alexandria PL/SQL Utility Library An alternative solution made available by William Robertson The final PL/SQL procedure

Hand-cranking delimited code

We’re working in a Data Warehouse running on Oracle. We need to provide a data feed to a downstream system.
The feed is comma separated with values enclosed by quotes.
The query to generate the data for the feed is reasonably straight forward :

select dept.department_id, dept.department_name,
loc.city, coun.country_id
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas'-- this will be passed as a parameter in the run. 

All ok so far. However, when it comes to tweaking the query to generate the output, things get a bit more fiddly.
The query in the package will have to change to something like this :

select '"'||dept.department_id||'",'
||'"'||dept.department_name||'",'
||'"'||loc.city||'",'
||'"'||coun.country_id||'"'
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas';   

Typically, these sorts of files are generated with dozens (sometimes hundreds) of attributes. If looking at all of those quotes and commas is going to give you a headache, you might seek to mitigate the worst effects by replacing the characters with their ASCII values :

select chr(34)||dept.department_id||chr(34)||chr(44)
||chr(34)||dept.department_name||chr(34)||chr(44)
||chr(34)||loc.city||chr(34)||chr(44)
||chr(34)||coun.country_id||chr(34)
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas';   

The double-quotes surrounding the data values are designed to escape any commas in the data and prevent them from being read as field terminators. Unfortunately, this doesn’t help much if the data also contains double-quotes. For this reason, I tend to prefer a pipe character as a delimiter. This allows us to dispense with the double-quotes. As a double-pipe is the SQL concatenation operator, let’s use the ASCII value instead for the delimiter itself :

select dept.department_id||chr(124)
||dept.department_name||chr(124)
||loc.city||chr(124)
||coun.country_id
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas';     

Wouldn’t it be nice if there was some way we could just persuade Oracle to magically write our delimiters for us.
Well, I don’t know about Oracle, but Morten Braten will have a go for you…

The Alexandria CSV_UTIL_PKG package

As with most Alexandria packages, there’s a demo on how to use CSV_UTIL_PKG in demos/csv_util_pkg_demo.sql.
However, as explained in Morten’s post on this subject, you need to create some types first …

create type t_str_array as table of varchar2(4000);
/
create type t_csv_line as object (
line_number  number,
line_raw     varchar2(4000),
c001         varchar2(4000),
c002         varchar2(4000),
c003         varchar2(4000),
c004         varchar2(4000),
c005         varchar2(4000),
c006         varchar2(4000),
c007         varchar2(4000),
c008         varchar2(4000),
c009         varchar2(4000),
c010         varchar2(4000),
c011         varchar2(4000),
c012         varchar2(4000),
c013         varchar2(4000),
c014         varchar2(4000),
c015         varchar2(4000),
c016         varchar2(4000),
c017         varchar2(4000),
c018         varchar2(4000),
c019         varchar2(4000),
c020         varchar2(4000)
);
/
create type t_csv_tab as table of t_csv_line;
/

With these types in place, we can install the package from the Alexandria GitHub repo.
The files we’re looking for are under the ora directory:

  • csv_util_pkg.pks
  • csv_util_pkg.pkb

Download them and run them in the order they are listed here ( the .pks is the header and the .pkb is the body).

Now, we can take some of the commas out of our code…

select csv_util_pkg.array_to_csv(
t_str_array(
dept.department_id, dept.department_name,
loc.city, coun.country_id))
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas';

By default, the results are output using a comma as the separator. However, we can change this easily enough:

select csv_util_pkg.array_to_csv(
t_str_array(dept.department_id, dept.department_name, loc.city, coun.country_id),
chr(124))
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas';

The output is pretty much what we’re looking for :

60|IT|Southlake|US
50|Shipping|South San Francisco|US
10|Administration|Seattle|US
30|Purchasing|Seattle|US
90|Executive|Seattle|US
100|Finance|Seattle|US
110|Accounting|Seattle|US
120|Treasury|Seattle|US
130|Corporate Tax|Seattle|US
140|Control And Credit|Seattle|US
150|Shareholder Services|Seattle|US
160|Benefits|Seattle|US
170|Manufacturing|Seattle|US
180|Construction|Seattle|US
190|Contracting|Seattle|US
200|Operations|Seattle|US
210|IT Support|Seattle|US
220|NOC|Seattle|US
230|IT Helpdesk|Seattle|US
240|Government Sales|Seattle|US
250|Retail Sales|Seattle|US
260|Recruiting|Seattle|US
270|Payroll|Seattle|US
20|Marketing|Toronto|CA

Now all you need to do is put this into your PL/SQL procedure and write some code to save it to a file ( possibly using the Alexandria FILE_UTIL_PKG package).
If that seems like too much effort though…

William Robertson’s CSV package

Whilst the Alexandria package is a bit of an all-rounder, William Robertson has produced something that’s rather more tailored to producing CSV files.

The package – simply called CSV – has a function that produces delimited format from a refcursor. It also contains a procedure to write csv data to a file.
Let’s take a closer look…

Installation of the package simply involves downloading the code from here, and running it. Both the package header and body are in a single file – csv.pkg.

The package’s REPORT function takes a slightly different approach in that it takes a ref cursor as an argument :

select column_value
from table(csv.report(cursor(
select dept.department_id, dept.department_name,
loc.city, coun.country_id
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas')))
/   

Once again, the delimiter character is comma by default. Once again, we can override this :

select column_value
from table(csv.report(cursor(
select dept.department_id, dept.department_name,
loc.city, coun.country_id
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas'),
chr(124)))
/   

We can go even further. You can add common things that you may need to include in a csv that’s serving as a feed file for another system. These include :

  • a header record with a delimited list of attributes
  • a label attribute for each row of data to make these rows easier to identify for the program loading the csv
  • a rowcount as a trailer record

All of which can be accomplished thus :

select column_value
from table(csv.report(cursor(
select dept.department_id, dept.department_name,
loc.city, coun.country_id
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = 'Americas'),
p_separator => chr(124),
p_label => 'DATA',
p_heading => 'Y',
p_rowcount => 'Y'))
/   
[show output]

Not only that, but the package also facilitates creating the file itself…

Generating a file from the CSV package

First of all, we need access to a directory. So, the DBA needs to do something like :

grant read, write on directory app_files to hr;

Now, connected as HR, we can create a PL/SQL procedure to generate our csv files like this :

create or replace procedure depts_by_region( i_region regions.region_name%type)
as
l_fname varchar2(100);
rc sys_refcursor;
begin
l_fname := i_region||'_depts.csv';
open rc for
select dept.department_id, dept.department_name,
loc.city, coun.country_id
from departments dept
inner join locations loc
on dept.location_id = loc.location_id
inner join countries coun
on loc.country_id = coun.country_id 
inner join regions reg
on reg.region_id = coun.region_id
where reg.region_name = i_region;
csv.write_file(
p_dataset => rc,
p_separator => chr(124),
p_label => 'DATA',
p_heading => 'Y',
p_rowcount => 'Y',
p_directory => 'APP_FILES',
p_filename => l_fname);
end;
/

…and run it like this :

begin
depts_by_region('Americas');
end;
/

Sure enough, when we look on the server, we can see :

[mike@frea ~]$ cd /opt/oracle/app_files/
[mike@frea app_files]$ ls -l *.csv
-rw-r--r-- 1 oracle oinstall 840 Feb 11 20:56 Americas_depts.csv

…and the file looks like this :

HEADING|DATA|DEPARTMENT_ID|DEPARTMENT_NAME|CITY|COUNTRY_ID
DATA|60|IT|Southlake|US
DATA|50|Shipping|South San Francisco|US
DATA|10|Administration|Seattle|US
DATA|30|Purchasing|Seattle|US
DATA|90|Executive|Seattle|US
DATA|100|Finance|Seattle|US
DATA|110|Accounting|Seattle|US
DATA|120|Treasury|Seattle|US
DATA|130|Corporate Tax|Seattle|US
DATA|140|Control And Credit|Seattle|US
DATA|150|Shareholder Services|Seattle|US
DATA|160|Benefits|Seattle|US
DATA|170|Manufacturing|Seattle|US
DATA|180|Construction|Seattle|US
DATA|190|Contracting|Seattle|US
DATA|200|Operations|Seattle|US
DATA|210|IT Support|Seattle|US
DATA|220|NOC|Seattle|US
DATA|230|IT Helpdesk|Seattle|US
DATA|240|Government Sales|Seattle|US
DATA|250|Retail Sales|Seattle|US
DATA|260|Recruiting|Seattle|US
DATA|270|Payroll|Seattle|US
DATA|20|Marketing|Toronto|CA
ROW_COUNT|DATA|24

There are some limitations. For example, William points out that any row that’s longer than 4000 characters will cause the package to break. However, unless you’re generating fairly “wide” lines, he’s pretty much written your code for you.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK