
High-Performance Batch LOB Insert. Part 1

One would think that with a multi-CPU server it should be pretty straightforward in Oracle 10g to copy records from one table with a LOB column to another, in parallel and nologging, using all available CPU power.

As it turns out, it is not that easy.

One of my projects had, after a number of years, accumulated a big table which we wanted to trim. The table structure was rather simple: a few regular columns, one LOB column, several indexes. The table had 23 million records. The table, indexes, LOB segment and LOB index together occupied 82 GB.

We needed to purge about half of the records.

At 82 GB, regular SQL deletes were immediately ruled out due to undo and redo overhead. Instead of deletes we decided to copy the records we needed to keep into another table, in parallel and nologging, using Create Table As Select (CTAS) or INSERT with Parallel/Append hints, and then rename the table.
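For reference, the general shape of that approach looks something like this (a sketch only; the new table name and the keep-condition are illustrative, and the actual DDL we ended up using is shown further below):

create table EWPTSADM.PORTALX_IMAGE_NEW
nologging parallel
as
select /*+ parallel(s) */ *
from EWPTSADM.PORTALX_IMAGE s
where s.CREATEDDATE >= date '2013-01-01';   -- illustrative keep-condition

-- after recreating indexes, constraints and grants, swap the tables
alter table EWPTSADM.PORTALX_IMAGE rename to PORTALX_IMAGE_OLD;
alter table EWPTSADM.PORTALX_IMAGE_NEW rename to PORTALX_IMAGE;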

The database server had 24 CPUs, 56 GB of RAM and a high-performance EMC storage array. I expected the copy process to be vigorous and quick.

I was in for quite a surprise.

Initial Table structure and sizes

The existing table looks simple and straightforward. The application stores most of its data in the LOB column, with only a few other metadata columns:

SQL> desc EWPTSADM.PORTALX_IMAGE
Name               Null?    Type
 ------------------ -------- -------------------
 ORDERNUM           NOT NULL VARCHAR2(12)
 SEGMENT                     VARCHAR2(2)
 DUEDATE                     DATE
 PORTAL_IMAGE                CLOB
 REGION                      VARCHAR2(2)
 CREATEDDATE                 DATE
 UPDATEDDATE                 DATE

select count(1) from EWPTSADM.PORTALX_IMAGE;
 23,103,552
 

Because the LOB was defined with “enable storage in row”, most of the storage was in the table segment:

col size_gb for 99.99
select segment_name, segment_type, sum(bytes)/1024/1024/1024 as size_gb,
 count(1) as seg_count
from user_segments
where
(     ( segment_name in ( 'PORTALX_IMAGE' ) )
 or ( segment_name in ( select segment_name from user_lobs
 where TABLE_NAME = 'PORTALX_IMAGE') )
 or ( segment_name in ( select index_name from user_lobs
 where table_name = 'PORTALX_IMAGE' ) )
 or ( segment_name in ( select index_name from user_indexes
 where table_name = 'PORTALX_IMAGE' ) )
)
group by segment_name, segment_type
/
SEGMENT_NAME                   SEGMENT_TYPE    SIZE_GB  SEG_COUNT
------------------------------ --------------- ------- ----------
IND_PORTALX_IMAGE_ORDSEG       INDEX              2.23          1
SYS_LOB0000432336C00004$$      LOBSEGMENT        14.94          1
IND_PORTALIMAGE_ORDREG         INDEX              1.18          1
SYS_IL0000432336C00004$$       LOBINDEX            .04          1
IND_FUN_PORTALX_IMAGE_ORDSEG   INDEX              2.13          1
PORTALX_IMAGE                  TABLE             59.32          1
IND_PORTALX_IMAGE_ORDERSEGREG  INDEX              2.03          1
 

With a 59 GB table segment it was difficult to run aggregate ad-hoc queries on the table. On average such a query took about 15 minutes doing a full table scan. This led us to consider storing the LOB out of line.

LOB Chunk size and space usage

When storing the LOB out of line for a big table, there is a danger of wasting space compared to storing it in line. Out-of-line LOBs allocate space in chunks that are a multiple of the tablespace block size. If the LOB is smaller than the chunk, the remainder of the chunk is wasted.

Here is the LOB size distribution for our table, based on a 100,000-record sample:

select
 cnt,
 100*sum(cnt) over (order by chunk_1024)/sum(cnt) over () as percent_under,
 chunk_1024
from (
 select count( 1 ) as cnt,
 trunc(dbms_lob.getlength(PORTAL_IMAGE)/1024)+1 as chunk_1024
 from EWPTSADM.PORTALX_IMAGE where rownum < 100000
 group by trunc(dbms_lob.getlength(PORTAL_IMAGE)/1024)
 )
order by 3;

 CNT       PERCENT_UNDER CHUNK_1024
---------- ------------- ----------
  6751              6.75          1
 49202             55.95          2   <- proposed 2k LOB chunk size
 30951             86.90          3
  8338             95.24          4   <- proposed 4k LOB chunk size
  2225             97.46          5
   924             98.39          6
   596             98.98          7
   334             99.32          8   <- current 8k LOB chunk size
   199             99.52          9
   110             99.63         10
 <...cut here...>
     1             99.99        128
     2            100.00        267
 

This shows that 55% of the records have LOB sizes under 2K, 95% under 4K and 99% under 8K.

Based on this, we can conclude that if the LOB is stored in an 8K tablespace with an 8K chunk, then for 95% of the records at least half of the 8K chunk will be wasted.

Similarly, if the LOB is stored in a 4K tablespace with a 4K chunk, for 55% of the records at least half of the 4K chunk will be wasted.
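To put numbers on that, the expected out-of-line LOB segment size for each candidate chunk size can be estimated from the same sample and scaled up to the full 23.1 million rows. This is only a sketch: dbms_lob.getlength returns characters for a CLOB, so the byte sizes are approximate.

select chunk_size,
       round( sum(ceil(len/chunk_size)) * chunk_size / 1024/1024/1024
              * 23103552/100000, 1 ) as est_lob_segment_gb
from ( select dbms_lob.getlength(PORTAL_IMAGE) as len
       from EWPTSADM.PORTALX_IMAGE where rownum <= 100000 ),
     ( select 2048 as chunk_size from dual
       union all select 4096 from dual
       union all select 8192 from dual )
group by chunk_size
order by chunk_size;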

Since we were trying to reduce storage and at the same time improve ad-hoc query performance, we decided to try storing the LOB out of line in a 2K tablespace.

And finally, we would take this reorganization opportunity to partition the table, to help with the future purge process.
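One prerequisite for the 2K option worth noting: a tablespace with a non-default block size can only be created once a matching buffer cache is configured. A minimal sketch, with the cache size, datafile name and sizes entirely illustrative:

alter system set db_2k_cache_size = 128m scope=both;   -- requires an spfile; size to taste

create tablespace EWPTSDB_LOB_2K
datafile '/u01/oradata/EWPTS/ewptsdb_lob_2k_01.dbf' size 8g autoextend on next 1g
blocksize 2k
extent management local
segment space management auto;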

LOB makes it hard

The next question was logging.

With about 74 GB between the table and LOB segments to deal with, there was a concern about redo during the reorganization. Everything written into redo eventually gets written into archived logs and then transferred to the DR server over the network.

Because I wanted the process to run nologging, the only choice was a NOCACHE NOLOGGING LOB, because the only other out-of-line option, CACHE, assumes LOGGING.

With that I created an empty table first:

create table EWPTSADM.PORTALX_IMAGE_PART
...
nologging parallel
lob(PORTAL_IMAGE) store as PORTALX_IMAGE_LOB (tablespace EWPTSDB_LOB_2K chunk 2K disable storage in row nocache nologging retention )
partition by range (CREATEDDATE)
...

And then ran a test insert with parallel hints to invoke parallel direct-load operations:

alter system set parallel_max_servers=48 scope=memory;

alter session enable parallel dml;

set timi on
set autotrace on stat

insert /*+ parallel */ into EWPTSADM.PORTALX_IMAGE_PART
select /*+ parallel(b) */ * from EWPTSADM.PORTALX_IMAGE b
where (ORDERNUM, SEGMENT, REGION) in
 (select /*+ parallel(c) */ ORDERNUM, SEGMENT, REGION
 from EWPTSADM.PORTALX_DATA c
 )
and rownum < 100000
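Before looking at the timing, a couple of standard checks (not part of the original run, just a sketch) confirm that the statement really executed as a parallel direct-path load:

-- a direct-path insert leaves the table unreadable in the same transaction:
select count(*) from EWPTSADM.PORTALX_IMAGE_PART;
-- expected: ORA-12838: cannot read/modify an object after modifying it in parallel

-- how many PX servers this session actually used:
select statistic, last_query, session_total
from v$pq_sesstat
where statistic in ('DML Parallelized', 'Allocation Height', 'Server Threads');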
 

To my surprise the process ran for over 45 minutes before I had to interrupt it, because it became clear that it was too slow for practical use. If 100,000 records take more than 45 minutes to copy, then 10,000,000 will take over 3 days.

Needless to say I was disappointed and started to wonder what was going on.

While it was running I took a Grid Control snapshot:

We can see that initially there were 8 processes actively working, and then the count dropped to 4 and stayed flat for a prolonged time.
Then I looked into one of the spawned Parallel Query processes.
Below is a Grid Control snapshot of its wait events:

We can see that the most frequent wait was for “control file sequential read”.
In this particular screenshot there are 8 control file reads and only one direct file write!
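The same wait profile can also be pulled from the dictionary without Grid Control; a minimal sketch, where the SID of the PX slave (123 below) is a placeholder that can be looked up in v$px_session:

-- wait events for one PX slave, heaviest first
select event, total_waits, time_waited
from v$session_event
where sid = 123
order by time_waited desc;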

/* Later on I ran across an Oracle support document which suggested that when writing NOCACHE LOBs stored out of line, Oracle makes an I/O write call for each chunk separately. On top of that, it also updates the control file on each write. And apparently it also reads the control file even more frequently than that, possibly to build a read-consistent image of the control file block being updated at the same time by other parallel processes working on the same load. */

Well, this test basically rendered the NOCACHE NOLOGGING LOB option impractical.

I then switched to the CACHE (logging) LOB option.
I started from scratch and repeated everything the same, except this time using

...
lob(PORTAL_IMAGE) store as PORTALX_IMAGE_LOB (tablespace EWPTSDB_LOB_2K chunk 2K disable storage in row cache retention )
...
 

This time the process was much quicker.

It finished the test in 11 minutes and created the following objects:

SEGMENT_NAME                   SEGMENT_TYPE       SIZE_MB TS_NAME
------------------------------ --------------- ---------- ---------------   
SYS_IL0000949744C00004$$       INDEX PARTITION     7.4375 EWPTSDB_LOB_2K
PORTALX_IMAGE_LOB              LOB PARTITION     313.3125 EWPTSDB_LOB_2K
PORTALX_IMAGE_PART             TABLE PARTITION         75 EWPTSSS_ARCH
 

It also produced 337 MB of redo, which was unfortunate but seemed unavoidable considering the impracticality of the “nocache nologging” option.
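For reference, a per-session redo figure can be captured with the standard 'redo size' statistic, snapshotted before and after the insert; a sketch (not necessarily how the number above was obtained):

select n.name, s.value
from v$mystat s
join v$statname n on n.statistic# = s.statistic#
where n.name = 'redo size';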

While the test was running, Grid Control showed an interesting picture:

We can see that initially about 24 parallel processes were happily working.
However, later on all the processes except one started to wait on “PX Deq Credit: send blkd”. The initial phase must be the selection of the necessary rows, and the second phase must be the actual insert. This snapshot shows that while an insert into a CACHE LOB runs much faster than into a NOCACHE LOB, there is still some serialization bottleneck.
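One way to watch where the slaves are stuck while the statement is running is to join v$px_session to v$session; a sketch, with the coordinator SID as a placeholder:

-- current wait event of every PX slave belonging to one coordinator session
select s.sid, s.event, s.seconds_in_wait
from v$px_session px
join v$session s on s.sid = px.sid and s.serial# = px.serial#
where px.qcsid = 1234   -- SID of the query coordinator (placeholder)
order by s.seconds_in_wait desc;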

As a final test I decided to try “enable storage in row”.
Of course, the table segment would balloon and this option is not something I wanted to use, but why not try it for the purpose of the test.

So I went back to the beginning and did everything the same but this time using

...
lob(PORTAL_IMAGE) store as PORTALX_IMAGE_LOB (tablespace EWPTSDB_LOB_2K chunk 2K enable storage in row retention )
...
 

This option was a clear winner.

It finished in 2:28 and produced the following segments, with only 0.1 MB of redo:

SEGMENT_NAME                   SEGMENT_TYPE       SIZE_MB TS_NAME
 ------------------------------ --------------- ---------- --------------
 PORTALX_IMAGE_LOB              LOB PARTITION        38.94 EWPTSDB_LOB_2K
 SYS_IL0000949866C00004$$       INDEX PARTITION        .94 EWPTSDB_LOB_2K
 PORTALX_IMAGE_PART             TABLE PARTITION     315.00 EWPTSSS_ARCH
 

As expected, most of the storage went into the table segment.

Surprisingly, Grid Control registered very little effort:

Clearly, whatever code path Oracle uses for inserting LOBs inline does the job much faster and with much less effort. According to Grid Control the insert ran almost serially, but at a speed that left either of the out-of-line options in the dust.

Having achieved such great speed on the 100,000-record test, I decided to remove the 100,000-record limitation from the insert SQL and let the whole thing run for a while before interrupting it, so that I could see the activity in Grid Control more clearly.

Below is the snapshot (without the rownum limit):

As we can see, in the second phase there were only two processes working while the other 24 were blocked on “PX Deq Credit: send blkd”.

With that I prepared the following final comparison table for my reference and reluctantly decided to revert to using in-line LOBs.

-- Test Summary for 100,000 records:

                      Option 1             Option 2           Option 3
                      disable              disable            enable
                      storage in row       storage in row     storage in row
                      nocache nologging    cache              (cache) nologging

Elapsed Time, min     >45 (interrupted)    10:59              2:28
Table, MB             N/A                  75                 315
LOB Segment, MB       N/A                  313                39
LOB Index, MB         N/A                  7                  1
Redo, MB              N/A                  337                0.1
 

But then I happened to run across somebody on the internet mentioning subpartitions when loading LOBs.
And what a difference that made…

Read on to the next post: High-Performance Batch LOB Insert. Part 2. Subpartitions
