Some time ago I came across a query that was using dynamic sampling on a partitioned table even though statistics were present at global level. After some time spent investigating this issue I discovered an interesting but unexpected behaviour of the Oracle CBO when dealing with the GLOBAL_STATS statistic (observed on Oracle RDBMS ver. 11.2.0.2.0, 11.2.0.3.0, 12.1.0.2.0).
Let me go through a simple demo so I can explain what I’ve found.
First of all let’s create our test table MYTAB partitioned by RANGE and subpartitioned by LIST and populate it with 100k rows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
create table mytab(id number(10), run_date date, run_region number(2)) partition by range(run_date) subpartition by list(run_region) ( partition M201601 values less than (date '2016-02-01') ( subpartition M201601_R01 values (1), subpartition M201601_R02 values (2), subpartition M201601_R03 values (3) ), partition M201602 values less than (date '2016-03-01') ( subpartition M201602_R01 values (1), subpartition M201602_R02 values (2), subpartition M201602_R03 values (3) ), partition M201603 values less than (date '2016-04-01') ( subpartition M201603_R01 values (1), subpartition M201603_R02 values (2), subpartition M201603_R03 values (3) ) ); -- Populate data insert into mytab select rownum id, date '2016-01-01' + round(dbms_random.value(0, 90)) run_date, round(dbms_random.value(1,3)) run_region from dual connect by level <= 100000; commit; |
In order to show details about statistics and generate the explain plans I need for the demo, I will use the following code (for every test I will only mention the query I’m going to analyze and replace it inside this script):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
clear select table_name, partition_name, subpartition_name, object_type, num_rows, blocks, sample_size, last_analyzed, global_stats from user_tab_statistics where table_name = 'MYTAB' and (partition_name = 'M201601' or partition_name is null); -- Explain set linesize 160 set pagesize 999 trimspool on heading off feed off verify off timing off set serverout off arraysize 200 ttitle off alter session set statistics_level = 'ALL'; col plan_table_output clear col plan_table_output format 160 explain plan for -- ***** Query starts select * from mytab where run_date = date '2016-01-01'; -- ********************** Query ends select * from table(dbms_xplan.display(format => '')); |
At the moment the table has no statistics so if we run any query like
1 2 3 |
select * from mytab where run_date = date '2016-01-01'; |
the Oracle CBO uses dynamic sampling as expected
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE NO MYTAB M201601 PARTITION NO MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO Session altered Explained PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 1937660305 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 603 | 21105 | 57 (2)| 00:00:01 | | 1 | PARTITION RANGE SINGLE| | 603 | 21105 | 57 (2)| 00:00:01 | | 2 | PARTITION LIST ALL | | 603 | 21105 | 57 (2)| 00:00:01 | |* 3 | TABLE ACCESS FULL | MYTAB | 603 | 21105 | 57 (2)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("RUN_DATE"=TO_DATE(' 2016-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss Note ----- - dynamic sampling used for this statement (level=2) |
Let’s gather statistics at table level executing
1 2 3 4 5 6 7 8 |
begin dbms_stats.gather_table_stats(ownname => user, tabname => 'MYTAB', partname => '', granularity => 'GLOBAL', cascade => true); end; / |
and run explain for the same query. This query accesses a specific partition and the explain plan shows that dynamic sampling is not being used anymore because statistics are present at global level
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE 100000 610 100000 24/07/2016 16 YES MYTAB M201601 PARTITION NO MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO Session altered Explained PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 1937660305 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 532 | 8512 | 57 (0)| 00:00:01 | | 1 | PARTITION RANGE SINGLE| | 532 | 8512 | 57 (0)| 00:00:01 | | 2 | PARTITION LIST ALL | | 532 | 8512 | 57 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | MYTAB | 532 | 8512 | 57 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("RUN_DATE"=TO_DATE(' 2016-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss |
The same happens if we run a query that selects a specific subpartition or scans all partitions like
1 2 3 |
select * from mytab where run_region = 2; |
that generates the following plan
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE 100000 610 100000 24/07/2016 16 YES MYTAB M201601 PARTITION NO MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO Session altered Explained PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 1972260289 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 33333 | 520K| 91 (0)| 00:00:02 | | 1 | PARTITION RANGE ALL | | 33333 | 520K| 91 (0)| 00:00:02 | | 2 | PARTITION LIST SINGLE| | 33333 | 520K| 91 (0)| 00:00:02 | | 3 | TABLE ACCESS FULL | MYTAB | 33333 | 520K| 91 (0)| 00:00:02 | -------------------------------------------------------------------------------- |
Until now nothing unexpected.
Now let’s delete all statistics and gather them again at partition level
1 2 3 4 5 6 7 8 9 10 |
begin dbms_stats.delete_table_stats(ownname => user, tabname => 'MYTAB'); dbms_stats.gather_table_stats(ownname => user, tabname => 'MYTAB', partname => '', granularity => 'PARTITION', cascade => true); end; / |
When we do so Oracle sets the table level statistics by aggregating those at partition level. This can be observed by the GLOBAL_STATS flag set to NO on the USER_TAB_STATISTICS view
1 2 3 4 5 6 7 8 9 10 11 12 13 |
select table_name, partition_name, subpartition_name, object_type, num_rows, blocks, sample_size, last_analyzed, global_stats from user_tab_statistics where table_name = 'MYTAB' and (partition_name = 'M201601' or partition_name is null); TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE 100000 610 0 24/07/2016 16 NO MYTAB M201601 PARTITION 33846 202 33846 24/07/2016 16 YES MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO |
Now let’s see what happens when we run few queries that access the table at partition, subpartition and global level.
The first query
1 2 3 |
select * from mytab where run_date = date '2016-01-01'; |
accesses a specific partition and Oracle correctly uses the statistics we gathered
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE 100000 610 0 24/07/2016 16 NO MYTAB M201601 PARTITION 33846 202 33846 24/07/2016 16 YES MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO Session altered Explained PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 1937660305 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 559 | 8944 | 56 (0)| 00:00:01 | | 1 | PARTITION RANGE SINGLE| | 559 | 8944 | 56 (0)| 00:00:01 | | 2 | PARTITION LIST ALL | | 559 | 8944 | 56 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | MYTAB | 559 | 8944 | 56 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("RUN_DATE"=TO_DATE(' 2016-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss |
The second query
1 2 3 4 |
select * from mytab where run_date = date '2016-01-01' and run_region = 2; |
accesses a specific subpartition of the partition that has statistics but the CBO surprisingly decides to use dynamic sampling!
It’s clearly visible from the plan
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE 100000 610 0 24/07/2016 16 NO MYTAB M201601 PARTITION 33846 202 33846 24/07/2016 16 YES MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO Session altered Explained PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 2188202224 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 274 | 9590 | 32 (0)| 00:00:01 | | 1 | PARTITION RANGE SINGLE| | 274 | 9590 | 32 (0)| 00:00:01 | | 2 | PARTITION LIST SINGLE| | 274 | 9590 | 32 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | MYTAB | 274 | 9590 | 32 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("RUN_DATE"=TO_DATE(' 2016-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss Note ----- - dynamic sampling used for this statement (level=2) |
This means that the Oracle CBO simply ignores the partition level statistics (note that they have GLOBAL_STATS=YES). Table level stats are not used either because they have GLOBAL_STATS=NO.
If we execute a third query that scans all partitions
1 2 3 |
select * from mytab where run_region = 2; |
we can observe that dynamic sampling is not used. Moreover, having a look at rows estimates it is clear that table level statistics are being used here
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE 100000 610 0 24/07/2016 16 NO MYTAB M201601 PARTITION 33846 202 33846 24/07/2016 16 YES MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO Session altered Explained PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 1972260289 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 33333 | 520K| 91 (0)| 00:00:02 | | 1 | PARTITION RANGE ALL | | 33333 | 520K| 91 (0)| 00:00:02 | | 2 | PARTITION LIST SINGLE| | 33333 | 520K| 91 (0)| 00:00:02 | | 3 | TABLE ACCESS FULL | MYTAB | 33333 | 520K| 91 (0)| 00:00:02 | -------------------------------------------------------------------------------- |
even though GLOBAL_STATS=NO at table level!
Now let’s see what happens if we remove partition level statistics
1 2 3 4 5 6 7 8 9 10 11 12 |
begin dbms_stats.delete_table_stats(ownname => user, tabname => 'MYTAB', partname => 'M201601'); dbms_stats.delete_table_stats(ownname => user, tabname => 'MYTAB', partname => 'M201602'); dbms_stats.delete_table_stats(ownname => user, tabname => 'MYTAB', partname => 'M201603'); end; / |
If we run the query that accesses a specific partition
1 2 3 |
select * from mytab where run_date = date '2016-01-01'; |
we can see that dynamic sampling is used
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE 100000 610 0 24/07/2016 16 NO MYTAB M201601 PARTITION NO MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO Cannot SET AUTOTRACE Session altered Explained PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 1937660305 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 603 | 21105 | 56 (0)| 00:00:01 | | 1 | PARTITION RANGE SINGLE| | 603 | 21105 | 56 (0)| 00:00:01 | | 2 | PARTITION LIST ALL | | 603 | 21105 | 56 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | MYTAB | 603 | 21105 | 56 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("RUN_DATE"=TO_DATE(' 2016-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss Note ----- - dynamic sampling used for this statement (level=2) |
Again the table level statistics are ignored because they have GLOBAL_STATS=NO. We would get the same exact result by accessing specific partition and subpartition.
As a last test let’s manually set the statistics at global level
1 2 3 4 5 6 7 8 |
begin dbms_stats.set_table_stats(ownname => user, tabname => 'MYTAB', numrows => '100000', numblks => 610, avgrlen => 95); end; / |
You’ll notice that the GLOBAL_STATS flag at table level has changed to YES and none of our queries will use dynamic sampling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME OBJECT_TYPE NUM_ROWS BLOCKS SAMPLE_SIZE LAST_ANALYZED GLOBAL_STATS ------------------------------ ------------------------------ ------------------------------ ------------ ---------- ---------- ----------- ------------- ------------ MYTAB TABLE 100000 610 0 24/07/2016 16 YES MYTAB M201601 PARTITION NO MYTAB M201601 M201601_R02 SUBPARTITION NO MYTAB M201601 M201601_R03 SUBPARTITION NO MYTAB M201601 M201601_R01 SUBPARTITION NO Session altered Explained PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 1937660305 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1099 | 17584 | 57 (0)| 00:00:01 | | 1 | PARTITION RANGE SINGLE| | 1099 | 17584 | 57 (0)| 00:00:01 | | 2 | PARTITION LIST ALL | | 1099 | 17584 | 57 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | MYTAB | 1099 | 17584 | 57 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("RUN_DATE"=TO_DATE(' 2016-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss |
The bottom line is that when table level stats are obtained by aggregation from underlying partitions, Oracle sets the GLOBAL_STATS flag to NO and queries accessing a specific partition use dynamic sampling unless they have local stats available.
It’s also quite surprising that if partition level statistics are available, a query that accesses a specific subpartition without stats uses table level ones completely ignoring the partition layer.
You can find some other interesting considerations on Tony Hasler’s blog where he ran several tests on the same topic using bind variables.