2

Using a full outer join to “diff” two tables

 1 year ago
source link: https://mikesmithers.wordpress.com/2023/02/21/using-a-full-outer-join-to-diff-two-tables/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Using a full outer join to “diff” two tables

Posted on February 21, 2023

I was inspired to write this post by Joshua Ottwell’s thoughts on finding rows present in one table but not another.

What follows is an exploration of how we can use a Full Outer Join to perform a “diff” on the data in two tables. We’ll also look at doing something similar for two distinct result sets from the same table.

In this instance, we want to identify :

  • records that exist in the first table but not the second
  • records that exist in the second table but not the first
  • records that exist in both tables but where some values differ

Before going any further, I should say that the example that follows will make more sense if you consider Terry Pratchett’s observation that :

“In ancient times cats were worshipped as gods; they have not forgotten this.”

cleo_on_laptop1.jpg?w=942

You’ll need an offering if you want to use this computer !

The Family Firm

Let’s say I have a report that I want to change.
I’ve simulated the output of the report and captured it in a table like this :

create table employee_report_baseline( 
    id number,
    emp_name varchar2(100),
    job_title varchar2(100) )
/

insert into employee_report_baseline( id, emp_name, job_title)
values(1,'DEBBIE', 'CEO');

insert into employee_report_baseline( id, emp_name, job_title)
values(2, 'MIKE', 'MINION');

insert into employee_report_baseline( id, emp_name, job_title)
values(3, 'TEDDY', 'DOG');

commit;

The output of the new version of the report is simulated like this :

create table employee_report_new 
    as 
    select * 
    from employee_report_baseline 
    where emp_name != 'MIKE';

update employee_report_new
set job_title = 'LADY-IN-WAITING'
where emp_name = 'DEBBIE';

insert into employee_report_new( id, emp_name, job_title)
values(4, 'CLEO', 'CAT');

commit;

Identifying missing or new records

First of all, let’s see if the records returned are the same for both reports. Happily ID is a unique key for the dataset, which makes the comparison fairly simple :

select bsl.id, nr.id,
case
when bsl.id is null then 'Added in New Report'
when nr.id is null then 'Missing from New Report'
end as status
from employee_report_baseline bsl
full outer join employee_report_new nr
on bsl.id = nr.id
where bsl.id is null
or nr.id is null
order by 1 nulls last;   

The absence of the key value from either table indicates that the record is in one table but not the other.

Sure enough, when we run this query we get :

BASELINE_ID NEW_ID STATUS                        
----------- ------ ------------------------------
          2        Missing from New Report       
                 4 Added in New Report           

All Differences

If we want to identify all of the differences, including changed records, we can do so with just a little extra typing :

select
nvl(bsl.id, nr.id) as id,
bsl.emp_name as old_emp_name,
nr.emp_name as new_emp_name,
bsl.job_title as old_job_title,
nr.job_title as new_job_title,
case
when bsl.id is null then 'Added in New Report'
when nr.id is null then 'Missing from New Report'
else 'Different between Report Versions'
end as status   
from employee_report_baseline bsl
full outer join employee_report_new nr
on bsl.id = nr.id
where bsl.id is null
or nr.id is null
or nvl(bsl.emp_name, 'X') != nvl(nr.emp_name, 'Y')
or nvl(bsl.job_title, 'X') != nvl(nr.job_title, 'Y')
order by 1 nulls last
/

As well as the two records that are missing from either result set, we can see that Debbie has been demoted ( or possibly usurped). Cleo is now the top dog(!) :

diff_tabs.png?w=995

Comparing table definitions

One thing to bear in mind when using a full outer join is that it will match any row in the tables being joined.
This gets a bit annoying when you want to do something with a subset of data such as comparing the column definitions of two tables in USER_TAB_COLUMNS in Oracle.

To demonstrate :

create table old_firm(
    id number,
    first_name varchar2(500),
    last_name varchar2(500),
    job_title varchar2(100),
    start_date date,
    end_date date)
/

create table new_firm(
    id number,
    first_name varchar2(250),
    last_name varchar2(500),
    salary number,
    start_date date,
    end_date varchar2(20))
/

If we want to compare only the records relating to these two tables then we’ll need a couple of in-line-views to restrict the result sets that our full outer join will look at.
Incidentally, as we’re running this on Oracle 19c, we can throw in an in-line function as well to save a bit of typing…

with function col_diff(
i_old_name in varchar2 default null,
i_new_name in varchar2 default null,
i_old_type in varchar2 default null,
i_new_type in varchar2 default null,
i_old_len in number default null,
i_new_len in number default null)
return varchar2
is
begin
if i_old_name is null then
return 'NEW';
elsif i_new_name is null then   
return 'MISSING';
end if;   
-- compare the attributes for the columns
if i_old_type != i_new_type then
return 'TYPE';
elsif i_old_len != i_new_len then
return 'LENGTH';
end if;
end;   
ofm as (
select column_name, data_type, data_length
from user_tab_columns
where table_name = 'OLD_FIRM'),
nfm as (   
select column_name, data_type, data_length
from user_tab_columns
where table_name = 'NEW_FIRM')
select
nvl(ofm.column_name, nfm.column_name) as column_name,
case col_diff( ofm.column_name, nfm.column_name, ofm.data_type, nfm.data_type, ofm.data_length, nfm.data_length)
when 'NEW' then 'New Column'
when 'MISSING' then 'Missing Column in NEW_FIRM'
when 'TYPE' then 'Type Mismatch. OLD_FIRM type is '||ofm.data_type||' NEW_FIRM is '||nfm.data_type
when 'LENGTH' then 'Length Mismatch. OLD_FIRM length is '||ofm.data_length||' NEW_FIRM is '||nfm.data_length
end as status   
from ofm
full outer join nfm
on ofm.column_name = nfm.column_name
where (ofm.column_name is null or nfm.column_name is null)
or( ofm.data_type != nfm.data_type or ofm.data_length != nfm.data_length)
/

user_tab_col_diffs.png?w=878

I’ll have to leave it there. The “boss” is demanding Dreamies to appease her wrath for being referred to as “top dog” earlier.

Loading...

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK