
I’m a longtime MSSQL developer who finds himself back in PL/SQL for the first time since Oracle 7. I’m looking for tuning advice on a large export stored procedure that sporadically, and not very reproducibly, runs slow at certain points. The slow spots are around some static working tables that the procedure truncates, fills and uses as part of the export. In outline the code typically looks like this:

create or replace procedure BigMultiPurposeExport as
begin

-- about 2000 lines of other code

INSERT INTO WORK_TABLE_5 SELECT WHATEVER1 FROM WHEREVER1;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER2 FROM WHEREVER2;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER3 FROM WHEREVER3;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER4 FROM WHEREVER4;
-- WORK_TABLE_5 now has 0 to ~500k rows whose content can vary drastically from run to run
-- e.g. one hourly run exports 3 whale sightings, next exports all tourist visits to Kenya this decade

-- about 1000 lines of other code

INSERT INTO OUTPUT_TABLE_3
SELECT THIS, THAT, THE_OTHER
FROM BUSINESS_TABLE_1 BT1
INNER JOIN BUSINESS_TABLE_2 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_3 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_1 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_2 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_3 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_5 WT5 ON BT1.ID = WT5.BT1_ID AND WT5.RECORD_TYPE = 21 
-- join above is now supported by indexes on BUSINESS_TABLE_1 (ID) and WORK_TABLE_5 (BT1_ID, RECORD_TYPE), originally wasn't
LEFT OUTER JOIN WORK_TABLE_6 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_7 ON etc -- typical join on indexed columns

-- about 4000 lines of other code
end;

That final insert into OUTPUT_TABLE_3 usually runs in under 10 seconds, but once in a while on certain customer servers it times out at our default 99 minutes. We then have the customer take the timeout off and run it on a Friday night; it finishes, but takes 16 hours.

I narrowed the problem down to the join to WORK_TABLE_5, which had no index support, and put an index on the join columns. The next run took 4 seconds. But success has been intermittent: the customer occasionally gets slow runs when they drastically change their export selection (i.e. drastically change the data in WORK_TABLE_5). And if we update statistics and rebuild indexes after a timed-out export, it runs fine at the next attempt.
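For reference, the index and the post-timeout fix look roughly like this (object names are schematic, matching the outline above, and the exact parameters are illustrative rather than exactly what the customers' DBAs run):

-- index added to support the WORK_TABLE_5 join (BUSINESS_TABLE_1.ID was already indexed)
CREATE INDEX WORK_TABLE_5_IX1 ON WORK_TABLE_5 (BT1_ID, RECORD_TYPE);

-- roughly the post-timeout fix: rebuild the index and refresh statistics before retrying
ALTER INDEX WORK_TABLE_5_IX1 REBUILD;
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => user, tabname => 'WORK_TABLE_5', cascade => TRUE);
END;
/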

So I am wondering how best to handle truncating/filling static work tables that have static indexes, statistics updated overnight, and a stored procedure compiled when the statistics look nothing like they will at runtime.

I have a few general questions about things I'd like to understand better:

  1. Is the nature of the data in the work table going to substantially affect the query plan? Does Oracle form its query plan when it compiles the stored procedure? Could we get a highly inappropriate query plan if we compile the stored procedure while the table is empty and then run it against a table with 500k rows?

  2. I expect that if this were an ad-hoc script, updating statistics on the problem table just before selecting from it would eliminate the sporadic slowdowns. But what happens if I update statistics inside the stored procedure, which was compiled with statistics quite different from those at runtime? (See the sketch after this list for what I mean.)

  3. Anything else you'd like to add...
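To be concrete, by "update statistics inside the stored procedure" I mean something along these lines, using the same schematic WORK_TABLE_5 as above:

-- fill WORK_TABLE_5 as above, then, before the big insert into OUTPUT_TABLE_3:
DBMS_STATS.GATHER_TABLE_STATS(ownname => user, tabname => 'WORK_TABLE_5', cascade => TRUE);
-- ... then run the INSERT INTO OUTPUT_TABLE_3 join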

Thanks for any advice. I hope my MSSQL preconceptions haven't led me too far off base.

This is happening in Oracle 11g, but the code is deployed to assorted customers running Oracle 10 through 12, and I'd like to cater to all of those if possible.

-- Joel

Is there anything suspicious when you compare the explain plans for the 99-minute run and the 4-second run? - Vlad
The first time around, before I put the index on, yes -- it did nested table scans and took 16 hours. Adding the index fixed that in my tests. The later runs with the index, the ones that failed sporadically, were on customer systems, and I can't reproduce them or see their query plans. But presumably they must have had something dodgy in their plans, or they wouldn't have timed out. What I'm really trying to understand is: can a huge difference in table/index stats between SP compile time and SP run time cause bad execution? And if so, what do I do about it? - Joel Benford

1 Answer


Huge differences in table or index sizes can most definitely cause performance problems. The solution is to add statistics gathering to the procedure instead of relying on the default statistics jobs.

If you've been away from Oracle since version 7, the most important new feature is the Cost Based Optimizer. Oracle now builds execution plans from optimizer statistics on tables, indexes, columns and expressions, plus system statistics, outlines, directives, dynamic sampling, and so on. If you're a full-time Oracle developer you should probably spend a day reading about optimizer statistics. Start with Managing Optimizer Statistics and DBMS_STATS in the official documentation.

Eventually the stored procedure should look like this:

--1: Insert into working tables.
insert into work_table...

--2: Gather statistics on working tables.
dbms_stats.gather_table_stats('SCHEMA_NAME', 'WORK_TABLE', ...);

--3: Use working tables.
insert into other_table select * from work_table...

There are so many statistics features that it's hard to know exactly which parameters to use in that second step. Here are some guesses about features you might find useful, pulled together in a sketch after this list:

  1. DEGREE - One reason people avoid gathering statistics inside a process is the time it takes. You can significantly improve the run time by setting the degree of parallelism, although doing so also uses significantly more resources.
  2. NO_INVALIDATE - It can be tricky to know exactly when the statistics are "set" for a query. Gathering statistics usually invalidates execution plans based on the old statistics fairly quickly, but not always. If you want to be 100% sure the next query uses the latest statistics, set NO_INVALIDATE=>FALSE.
  3. ESTIMATE_PERCENT - On 11g and above you definitely want the default, which uses a faster algorithm. On 10g and below you may need to set the value to something low to make the gathering fast enough.
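A minimal sketch of step 2 pulling those parameters together; the schema and table names are placeholders, and the parameter values are illustrative rather than prescriptive:

begin
  dbms_stats.gather_table_stats(
    ownname          => 'SCHEMA_NAME',               -- placeholder owner
    tabname          => 'WORK_TABLE',                -- placeholder work table
    cascade          => true,                        -- refresh the index statistics as well
    degree           => 4,                           -- parallel gathering to cut elapsed time
    no_invalidate    => false,                       -- invalidate dependent cursors immediately
    estimate_percent => dbms_stats.auto_sample_size  -- the 11g+ default; use a small fixed value on 10g
  );
end;
/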

Although Oracle 10g and above come with default statistics gathering jobs, you cannot rely on them, for a few reasons:

  1. They are scheduled and may not run at the right time. If a process significantly changes the data, then new stats are needed right away, not at 10 PM. And if a lot of tables need to be analyzed, the job may not get to them all in one day.
  2. Many DBAs disable the jobs. This is ridiculous and almost always a mistake, but you'll find many DBAs who disabled the job because they think they can do it better. Instead of working with the auto tasks and setting preferences, many DBAs like to throw the whole thing out and replace it with a custom procedure that rots over time. (A quick way to check whether the job is still enabled is shown below.)
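If you want to check whether the auto task is still enabled on a given system, something like this works on 11g and above (it assumes access to the DBA_AUTOTASK_CLIENT dictionary view; on 10g you would look at GATHER_STATS_JOB in DBA_SCHEDULER_JOBS instead):

select client_name, status
from   dba_autotask_client
where  client_name = 'auto optimizer stats collection';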