2 votes

I have a Composite-List-List partitioned table with 19 columns and about 400 million rows. Once a week, new data is inserted into this table, and before the insert I need to set the values of two columns to NULL for specific partitions.

The obvious approach would be something like the following, where COLUMN_1 is the partition key:

UPDATE BLABLA_TABLE
SET COLUMN_18 = NULL, COLUMN_19 = NULL
WHERE COLUMN_1 IN (VALUE1, VALUE2…)

Of course this would be awfully slow.

My second thought was to use CTAS for every partition where I need to set those two columns to NULL, and then use EXCHANGE PARTITION to swap the data into my big table. Unfortunately, that won't work because it's a composite-partitioned table (a composite partition cannot be exchanged with a non-partitioned table).

I could use the same approach with subpartitions (sketched below for a single subpartition), but then I would have to run CTAS about 8,000 times every week and drop those tables afterwards. I guess that would not pass the upcoming code review.
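For illustration, the per-subpartition variant would look roughly like this; BLABLA_TABLE_SP_TMP and SP_1 are placeholder names, and the column list is abbreviated:

CREATE TABLE BLABLA_TABLE_SP_TMP AS
SELECT COLUMN_1, /* ... COLUMN_2 through COLUMN_17 ..., */
       CAST(NULL AS NUMBER(10,4)) AS COLUMN_18,
       CAST(NULL AS NUMBER(10))   AS COLUMN_19
  FROM BLABLA_TABLE SUBPARTITION (SP_1);

ALTER TABLE BLABLA_TABLE
  EXCHANGE SUBPARTITION SP_1 WITH TABLE BLABLA_TABLE_SP_TMP
  WITHOUT VALIDATION;

DROP TABLE BLABLA_TABLE_SP_TMP;   -- the old subpartition data ends up here after the exchange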

Maybe somebody has another idea for how to solve this performantly?

PS: I'm using Oracle 11g as the database. PPS: Sorry for my bad English.

How many rows would be updated? Would you update whole subpartitions? Also, how large is the data in these columns before the update (to estimate the undo)? – Vincent Malgrat
I would update the whole partition, i.e. all rows of every subpartition in this partition that are NOT NULL (about 90%). Approximately 120 million rows would be updated. COLUMN_18 is defined as NUMBER(10,4) and COLUMN_19 as NUMBER(10). – jon sumisu

2 Answers

3 votes

You've ruled out updating through DDL (exchanging partitions), so that leaves us with only DML to consider.

I don't think it's actually that bad an update for a table so heavily partitioned. You can easily split the update into roughly 8,000 mini-updates, each touching a single tiny subpartition:

UPDATE BLABLA_TABLE SUBPARTITION (subpartition1)
SET COLUMN_18 = NULL, COLUMN_19 = NULL;

Each subpartition would contain about 15k rows to be updated on average, so each individual update would be relatively small.

While it still represents a very large amount of work, it should be easy to run the updates in parallel, ideally during hours when database activity is light. The individual updates are also easy to restart if one of them fails (locked rows?), whereas a single 120M-row update would take a very long time to roll back if it errors out. A minimal driver for the mini-updates is sketched below.
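The sketch reads the subpartition names from USER_TAB_SUBPARTITIONS and issues one statement per subpartition; the partition names in the filter are placeholders you would replace with the ones that actually need clearing:

BEGIN
  FOR sp IN (SELECT subpartition_name
               FROM user_tab_subpartitions
              WHERE table_name = 'BLABLA_TABLE'
                AND partition_name IN ('P_VALUE1', 'P_VALUE2'))   -- placeholder partition names
  LOOP
    EXECUTE IMMEDIATE
         'UPDATE BLABLA_TABLE SUBPARTITION (' || sp.subpartition_name || ') '
      || 'SET COLUMN_18 = NULL, COLUMN_19 = NULL '
      || 'WHERE COLUMN_18 IS NOT NULL OR COLUMN_19 IS NOT NULL';
    COMMIT;   -- one small transaction per subpartition keeps undo and any rollback cheap
  END LOOP;
END;
/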

0 votes

If I were to update almost 90% of the rows in a table, I would check the feasibility/duration of simply inserting into another table of the same structure (less redo, no row chaining/migration, cache bypass and so on via a direct-path insert; drop indexes and triggers first; leave the two columns out of the load so they stay NULL in the target table), then rename the tables to "swap" them, rebuild the indexes and triggers, and finally drop the old table.
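A rough sketch of that flow, with placeholder names (BLABLA_TABLE_NEW / BLABLA_TABLE_OLD), the column list abbreviated, and a CASE expression used so the columns are nulled only for the affected partitions rather than for every row:

-- direct-path load into a pre-created empty copy of the table (no indexes/triggers yet)
INSERT /*+ APPEND */ INTO BLABLA_TABLE_NEW
SELECT COLUMN_1, /* ... COLUMN_2 through COLUMN_17 ..., */
       CASE WHEN COLUMN_1 IN (VALUE1, VALUE2) THEN NULL ELSE COLUMN_18 END,
       CASE WHEN COLUMN_1 IN (VALUE1, VALUE2) THEN NULL ELSE COLUMN_19 END
  FROM BLABLA_TABLE;
COMMIT;

-- swap the tables by renaming, then rebuild indexes/triggers/grants on the new one
ALTER TABLE BLABLA_TABLE     RENAME TO BLABLA_TABLE_OLD;
ALTER TABLE BLABLA_TABLE_NEW RENAME TO BLABLA_TABLE;
-- recreate indexes, triggers and grants here, then:
DROP TABLE BLABLA_TABLE_OLD;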

In my experience in data warehousing, a plain direct-path insert beats update/delete: it needs more steps, but it finishes in less time overall. I agree that partition exchange is easier said than done when you have to process most of the table, and it just makes things more complex for the ETL developer (the logic/algorithm becomes bound to the physical layout); we haven't encountered a need for partition exchanges so far.

I would also isolate this table in its own pair of tablespaces and alternate storage between them (insert into the second, drop the old table from the first, vice versa in the next run, and resize the empty tablespace to reclaim space).
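Roughly, with placeholder tablespace names TS_A/TS_B and a placeholder datafile path:

-- Run N: the current table sits in TS_A, so build the replacement in TS_B;
-- the real statement would repeat BLABLA_TABLE's full column list and partitioning DDL.
CREATE TABLE BLABLA_TABLE_NEW (
  COLUMN_1 NUMBER
  -- ... remaining columns and the composite partitioning clause ...
) TABLESPACE TS_B;

-- after the direct-path load, the rename swap, and dropping the old table from TS_A,
-- shrink the now-empty tablespace's datafile to reclaim the space:
ALTER DATABASE DATAFILE '/oradata/blabla_ts_a_01.dbf' RESIZE 100M;

-- the next weekly run reverses the roles of TS_A and TS_B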