
The Ultimate Question of Life, the Universe, and… how big is my Oracle Table ?

Posted on May 15, 2023

At a time when the debate rages about how much you should trust what AI tells you, it’s probably worth recalling Deep Thought’s wildly incorrect assertion that the answer to the Ultimate Question of Life, the Universe, and Everything is forty-two.
As any Database specialist will know, the answer is the same as it is to the question “How big is my Oracle Table ?” which is, of course, “It depends”.

What it depends on is whether you want to know the volume of data held in the table, or the amount of space the database is using to store the table and any associated segments (e.g. indexes).

Connecting to my trusty Free Tier OCI Oracle Instance ( running 19c Enterprise Edition), I’ve set out on my journey through (disk) space to see if I can find some more definitive answers…

How big is my table in terms of the raw data stored in it ?

Before going any further, I should be clear on the database language settings and character set being used in the examples that follow. Note particularly that, although the database character set ( AL32UTF8 ) is variable-width, the sample data that follows is all single-byte ASCII, so character counts and byte counts will coincide :

select parameter, value    
from gv$nls_parameters
order by parameter
/


PARAMETER                      VALUE                         
------------------------------ ------------------------------
NLS_CALENDAR                   GREGORIAN                     
NLS_CHARACTERSET               AL32UTF8                      
NLS_COMP                       BINARY                        
NLS_CURRENCY                   £                             
NLS_DATE_FORMAT                DD-MON-YYYY                   
NLS_DATE_LANGUAGE              ENGLISH                       
NLS_DUAL_CURRENCY              €                             
NLS_ISO_CURRENCY               UNITED KINGDOM                
NLS_LANGUAGE                   ENGLISH                       
NLS_LENGTH_SEMANTICS           BYTE                          
NLS_NCHAR_CHARACTERSET         AL16UTF16                     
NLS_NCHAR_CONV_EXCP            FALSE                         
NLS_NUMERIC_CHARACTERS         .,                            
NLS_SORT                       BINARY                        
NLS_TERRITORY                  UNITED KINGDOM                
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH24.MI.SSXFF       
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH24.MI.SSXFF TZR   
NLS_TIME_FORMAT                HH24.MI.SSXFF                 
NLS_TIME_TZ_FORMAT             HH24.MI.SSXFF TZR             

19 rows selected. 
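As a quick illustration of why this matters, LENGTH ( characters ) and LENGTHB ( bytes ) should return the same value for the sort of ASCII-only strings used in these examples, even under AL32UTF8 :

select length('Forty-Two') as char_count, lengthb('Forty-Two') as byte_count
from dual
/

Both values come back as 9 here.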

Now, let’s see if we can work out how much raw data is held in a table.
We’ll start with a very simple example :

create table marvin as
select 1000 + rownum as id
from dual
connect by rownum <= 1024;

Marvin may have a brain the size of a planet but his tabular namesake has more modest space requirements.
It’s 1024 records are all 4 digits long.
Therefore, the size of the table data should be 4096 bytes, right ?

“Hang on”, you’re thinking, “why not just look up the size in USER_SEGMENTS and make this a really short post ?”

Well :

select bytes
from user_segments
where segment_name = 'MARVIN'
and segment_type = 'TABLE'
/

     BYTES
----------
     65536

USER_SEGMENTS will give you a size in bytes, but it’s not the same as the amount of raw data.
We’ll come back to this in a bit.

For now though, we can cross-check the size from elsewhere in the data dictionary, provided the stats on the table are up-to-date.

To ensure that this is so, I can run :

exec dbms_stats.gather_table_stats('MIKE', 'MARVIN');

This will ensure that statistics data is populated in the USER_TABLES view. This means that we can estimate the data volume by running the following query :

select num_rows, avg_row_len,
num_rows * avg_row_len as data_in_bytes
from user_tables
where table_name = 'MARVIN'
/

…which returns…

  NUM_ROWS AVG_ROW_LEN DATA_IN_BYTES
---------- ----------- -------------
      1024           4          4096

That looks promising.

We can further verify this by running :

select sum(length(id)) as data_in_bytes
from marvin
/

DATA_IN_BYTES
-------------
         4096

OK, now let’s see what happens with a slightly more complex data set, and an index as well …

create table hitchikers
(
id number generated always as identity,
character_name varchar2(250),
quote varchar2(4000),
constraint hitchikers_pk primary key (id)
)
/
declare
procedure ins( i_character in varchar2, i_quote in varchar2)
is
begin
insert into hitchikers( character_name, quote)
values(i_character, i_quote);
end;
begin
for i in 1..1024 loop
ins('Deep Thought', 'Forty-Two');
ins('Trillian', q'[we have normality... anything you can't cope with is, therefore, your own problem]');
ins('Ford Prefect', 'Time is an illusion. Lunchtime doubly so.');
ins('Zaphod Beeblebrox', q'[If there's anything more important than my ego around, I want it caught and shot right now!]');
ins(null, 'Anyone who is capable of getting themselves made President should on no account be allowed to do the job');
ins('Marvin', q'[Life! Loathe it or hate it, you can't ignore it.]');
ins('Arthur Dent', 'This must be Thursday. I never could get the hang of Thursdays');
ins('Slartibartfast', q'[I'd rather be happy than right any day]');
end loop;
commit;
end;
/
commit;

Once stats are present on the table, we can check the expected data size as before :

select num_rows, avg_row_len,
    num_rows * avg_row_len as data_in_bytes
from user_tables
where table_name = 'HITCHIKERS'
/

  NUM_ROWS AVG_ROW_LEN DATA_IN_BYTES
---------- ----------- -------------
      8192          75        614400

This time, the size in bytes figure we get back is not exact, as we can confirm with :

select 
    sum( length(id) + 
    nvl(length(character_name),0) +
    nvl(length(quote), 0)) as data_in_bytes
from hitchikers
/

DATA_IN_BYTES
-------------
       598957

To verify the actual size in bytes, we can dump the contents of a table into a csv file. In this case, I’m using SQLDeveloper :

[Screenshot : exporting the HITCHIKERS table to csv using the SQLDeveloper table export wizard]

The resulting file is a different size again :

ls -l hitchikers.csv
-rw-rw-r-- 1 mike mike 656331 May 13 11:50 hitchikers.csv

This can be accounted for by the characters added as part of the csv formatting.

First, the csv file includes a header row :

head -1 hitchikers.csv
"ID","CHARACTER_NAME","QUOTE"

Including the line terminator this is 30 bytes :

head -1 hitchikers.csv |wc -c
30

The format in each of the 8192 data rows includes :

  • a comma after all but the last attribute on a row
  • a line terminator after the last attribute
  • double quotes enclosing each of the two VARCHAR attributes.

For example :

grep ^42, hitchikers.csv
42,"Trillian","we have normality... anything you can't cope with is, therefore, your own problem"

That’s a total of 7 extra bytes per data row.
Add it all up ( ( 8192 x 7 ) + 30 ) and we get 57374 bytes of csv formatting overhead.

Subtract that from the file size and we get back to the calculated data size we started with :

656331 - 57374 = 598957
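For what it’s worth, we can do the same arithmetic in a single query. This is just a sketch that assumes the same csv overhead as above ( 7 bytes per data row plus the 30-byte header ) :

select sum( length(id) +
    nvl(length(character_name), 0) +
    nvl(length(quote), 0)) +
    ( count(*) * 7 ) + -- commas, double quotes and line terminator per data row
    30                 -- the header row, including its line terminator
    as expected_csv_bytes
from hitchikers
/

…which should get us back to the 656331 bytes reported by ls.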

This confirms that the figures in USER_TABLES are approximate and you’ll need to bear this in mind if you’re relying on them to calculate the size of the data in a table.

Whilst we’re here, let’s see what effect compression might have on our ability to determine the raw data size.
We can do this by creating a table that has the same structure as HITCHIKERS and contains the same data, but which is compressed :

create table magrathea
(
id number,
character_name varchar2(250),
quote varchar2(4000),
constraint magrathea_pk primary key (id)
)
row store compress advanced
/
insert into magrathea( id, character_name, quote)
select id, character_name, quote
from hitchikers
/
commit;
exec dbms_stats.gather_table_stats(user, 'MAGRATHEA');

It turns out that, for the purposes of our raw data calculation, the effect of table compression is…none at all :

select num_rows, avg_row_len,
    num_rows * avg_row_len
from user_tables
where table_name = 'MAGRATHEA'
/


  NUM_ROWS AVG_ROW_LEN NUM_ROWS*AVG_ROW_LEN
---------- ----------- --------------------
      8192          75               614400

However, if you look at the number of blocks used to store the table, the effects of compression are more evident :

select table_name, blocks
from user_tables
where table_name in ('HITCHIKERS', 'MAGRATHEA')
order by 2
/

TABLE_NAME                         BLOCKS
------------------------------ ----------
MAGRATHEA                              20 
HITCHIKERS                             95

Incidentally, it’s worth noting that, as with the data size, the number of blocks reported in USER_TABLES is somewhat approximate.
USER_SEGMENTS reports the number of blocks for each table as :

select segment_name, blocks
from user_segments
where segment_name in ('HITCHIKERS', 'MAGRATHEA')
order by 2
/

SEGMENT_NAME                       BLOCKS
------------------------------ ----------
MAGRATHEA                              24
HITCHIKERS                            104
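As an aside, if you want to confirm which of your tables are compressed ( and how ), USER_TABLES holds that information as well. We’d expect COMPRESS_FOR to show ADVANCED for MAGRATHEA :

select table_name, compression, compress_for
from user_tables
where table_name in ('HITCHIKERS', 'MAGRATHEA')
/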

So it looks like compression will affect the amount of database space required to store an object but not the size of the actual data. This brings us nicely on to…

How big is my table in terms of the amount of space it’s taking up in the database ?

Let’s go back to MARVIN. Remember, this table contains 4K of raw data, but USER_SEGMENTS claims that it’s quite a bit larger :

select bytes
from user_segments
where segment_name = 'MARVIN'
and segment_type = 'TABLE'
/

     BYTES
----------
     65536

To understand how Oracle has come up with this figure, you need to consider that :

  • the smallest unit of space that Oracle addresses is measured in blocks
  • the size of these blocks is defined at tablespace level.
  • any object that uses space is allocated that space in units of an extent – which is a number of contiguous blocks.

If we take a look at MARVIN, we can see that the table resides in the DATA tablespace and has been allocated a single extent of 8 blocks :

select tablespace_name, bytes, blocks, extents
from user_segments
where segment_name = 'MARVIN';

TABLESPACE_NAME                     BYTES     BLOCKS    EXTENTS
------------------------------ ---------- ---------- ----------
DATA                                65536          8          1
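If you’re curious about the individual extents, USER_EXTENTS lists them. For MARVIN we’d expect to see a single 8-block extent :

select extent_id, blocks, bytes
from user_extents
where segment_name = 'MARVIN'
/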

The block size is defined at the tablespace level and is held in USER_TABLESPACES in bytes :

select block_size
from user_tablespaces
where tablespace_name = 'DATA';

BLOCK_SIZE
----------
      8192

If we now multiply the number of blocks in the table by the size of those blocks, we get back to the size that USER_SEGMENTS is reporting :

select seg.blocks * tsp.block_size
from user_segments seg
inner join user_tablespaces tsp
    on seg.tablespace_name = tsp.tablespace_name
where seg.segment_name = 'MARVIN';

SEG.BLOCKS*TSP.BLOCK_SIZE
-------------------------
                    65536

MARVIN is a table with no ancillary segments, such as indexes.
To find the total space being used for the HITCHIKERS table, we’ll also need to consider the space being taken up by its index, HITCHIKERS_PK :

select seg.segment_name, seg.segment_type, seg.blocks, ts.block_size,
    seg.bytes
from user_segments seg
inner join user_tablespaces ts
    on ts.tablespace_name = seg.tablespace_name
where seg.segment_name in ('HITCHIKERS', 'HITCHIKERS_PK')
/

SEGMENT_NAME         SEGMENT_TYPE        BLOCKS BLOCK_SIZE      BYTES
-------------------- --------------- ---------- ---------- ----------
HITCHIKERS           TABLE                  104       8192     851968
HITCHIKERS_PK        INDEX                   24       8192     196608

…in other words…

select sum(seg.bytes)
from user_segments seg
where seg.segment_name in ('HITCHIKERS', 'HITCHIKERS_PK')
/

SUM(SEG.BYTES)
--------------
       1048576
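If you’d rather not hard-code the index name, something along these lines should arrive at the same total. It’s just a sketch, and only accounts for indexes ( not LOBs ) :

select sum(seg.bytes) as total_bytes
from user_segments seg
where seg.segment_name = 'HITCHIKERS'
or seg.segment_name in (
    select index_name
    from user_indexes
    where table_name = 'HITCHIKERS')
/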

On the subject of ancillary segments, what about LOBS ?

create table the_guide(
id number generated always as identity,
message clob);
declare
v_msg clob;
begin
for i in 1..1000 loop
v_msg := v_msg||q'[Don't Panic!]';
end loop;
insert into the_guide(message) values( v_msg);
end;
/
commit;

Unlike the table segment itself, the LOBSEGMENT and LOBINDEX segments associated with a table have system-generated names, so you can’t find them in USER_SEGMENTS simply by searching on the table name.

Therefore, we need to look in USER_LOBS to find the LOBSEGMENT belonging to a given table, and in USER_INDEXES to find its LOBINDEX :

select segment_name, segment_type, bytes, blocks
from user_segments 
where(
    segment_name = 'THE_GUIDE'
    or
    segment_name in ( 
        select segment_name 
        from user_lobs 
        where table_name = 'THE_GUIDE'
        )
    or 
    segment_name in ( 
        select index_name 
        from user_indexes 
        where table_name = 'THE_GUIDE'
        )
    )
/   

SEGMENT_NAME                   SEGMENT_TYPE         BYTES     BLOCKS
------------------------------ --------------- ---------- ----------
THE_GUIDE                      TABLE                65536          8
SYS_IL0000145509C00002$$       LOBINDEX             65536          8
SYS_LOB0000145509C00002$$      LOBSEGMENT         1245184        152

In this instance, although the table segment itself is only taking up 65536 bytes, when you add in all of the supporting objects, the total space requirement increases to 1376256 bytes.
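If all you’re after is the total, the same predicate can feed a single SUM, which should come back as 1376256 in this case :

select sum(bytes) as total_bytes
from user_segments
where segment_name = 'THE_GUIDE'
or segment_name in (
    select segment_name
    from user_lobs
    where table_name = 'THE_GUIDE')
or segment_name in (
    select index_name
    from user_indexes
    where table_name = 'THE_GUIDE')
/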

If you’ve managed to make it this far then meet me at the Big Bang Burger Bar for a Pan Galactic Gargle Blaster. I need a drink after that.
