SQL Tuning Enhancements in Oracle 12c

13 min read

Background

Performance Tuning is one of the most critical area of Oracle databases and having a good knowledge on SQL tuning helps DBAs in tuning production databases on a daily basis. Over the years Oracle optimizer has gone through several enhancements and each release presents a best among all optimizer versions. Oracle 12c is no different. Oracle has improved the optimizer and added new features in this release to make it better than previous release.

In this article we are going to see some of the explicit new features of Oracle optimizer which helps us in tuning our queries.

Objective

In this article, Advait Deo and Indira Karnati, authors of the book OCP Upgrade 1Z0-060 Exam guide discusses new features of Oracle 12c optimizer and how it helps in improving the SQL plan. It also discusses some of the limitations of optimizer in previous release and how Oracle has overcome those limitations in this release. Specifically, we are going to discuss about dynamic plan and how it works

(For more resources related to this topic, see here.)

SQL Tuning

Before we go into the details of each of these new features, let us rewind and check what we used to have in Oracle 11g.

Behavior in Oracle 11g R1

Whenever an SQL is executed for the first time, an optimizer will generate an execution plan for the SQL based on the statistics available for the different objects used in the plan. If statistics are not available, or if the optimizer thinks that the existing statistics are of low quality, or if we have complex predicates used in the SQL for which the optimizer cannot estimate the cardinality, the optimizer may choose to use dynamic sampling for those tables. So, based on the statistics values, the optimizer generates the plan and executes the SQL. But, there are two problems with this approach:

Statistics generated by dynamic sampling may not be of good quality as they are generated in limited time and are based on a limited sample size. But a trade-off is made to minimize the impact and try to approach a higher level of accuracy.
The plan generated using this approach may not be accurate, as the estimated cardinality may differ a lot from the actual cardinality. The next time the query executes, it goes for soft parsing and picks the same plan.

Behavior in Oracle 11g R2

To overcome these drawbacks, Oracle enhanced the dynamic sampling feature further in Oracle11g Release 2. In the 11.2 release, Oracle will automatically enable dynamic sample when the query is run if statistics are missing, or if the optimizer thinks that current statistics are not up to the mark. The optimizer also decides the level of the dynamic sample, provided the user does not set the non-default value of the OPTIMIZER_DYNAMIC_SAMPLING parameter (default value is 2). So, if this parameter has a default value in Oracle11g R2, the optimizer will decide when to spawn dynamic sampling in a query and at what level to spawn the dynamic sample.

Oracle also introduced a new feature in Oracle11g R2 called cardinality feedback. This was in order to further improve the performance of SQLs, which are executed repeatedly and for which the optimizer does not have the correct cardinality, perhaps because of missing statistics, or complex predicate conditions, or because of some other reason. In such cases, cardinality feedback was very useful.

The way cardinality feedback works is, during the first execution, the plan for the SQL is generated using the traditional method without using cardinality feedback. However, during the optimization stage of the first execution, the optimizer notes down all the estimates that are of low quality (due to missing statistics, complex predicates, or some other reason) and monitoring is enabled for the cursor that is created. If this monitoring is enabled during the optimization stage, then, at the end of the first execution, some cardinality estimates in the plan are compared with the actual estimates to understand how significant the variation is. If the estimates vary significantly, then the actual estimates for such predicates are stored along with the cursor, and these estimates are used directly for the next execution instead of being discarded and calculated again. So when the query executes the next time, it will be optimized again (hard parse will happen), but this time it will use the actual statistics or predicates that were saved in the first execution, and the optimizer will come up with better plan.

But even with these improvements, there are drawbacks:

With cardinality feedback, any missing cardinality or correct estimates are available for the next execution only and not for the first execution. So the first execution always go for regression.
The dynamic sample improvements (that is, the optimizer deciding whether dynamic sampling should be used and the level of the dynamic sampling) are only applicable to parallel queries. It is not applicable to queries that aren’t running in parallel.
Dynamic sampling does not include joins and groups by columns.

Oracle 12c has provided new improvements, which eliminates the drawbacks of Oracle11g R2.

Adaptive execution plans – dynamic plans

The Oracle optimizer chooses the best execution plan for a query based on all the information available to it. Sometimes, the optimizer may not have sufficient statistics or good quality statistics available to it, making it difficult to generate optimal plans. In Oracle 12c, the optimizer has been enhanced to adapt a poorly performing execution plan at run time and prevent a poor plan from being chosen on subsequent executions. An adaptive plan can change the execution plan in the current run when the optimizer estimates prove to be wrong. This is made possible by collecting the statistics at critical places in a plan when the query starts executing. A query is internally split into multiple steps, and the optimizer generates multiple sub-plans for every step. Based on the statistics collected at critical points, the optimizer compares the collected statistics with estimated cardinality. If the optimizer finds a deviation in statistics beyond the set threshold, it picks a different sub-plan for those steps. This improves the ability of the query-processing engine to generate better execution plans.

What happens in adaptive plan execution?

In Oracle12c, the optimizer generates dynamic plans. A dynamic plan is an execution plan that has many built-in sub-plans. A sub-plan is a portion of plan that the optimizer can switch to as an alternative at run time. When the first execution starts, the optimizer observes statistics at various critical stages in the plan. An optimizer makes a final decision about the sub-plan based on observations made during the execution up to this point.

Going deeper into the logic for the dynamic plan, the optimizer actually places the statistics collected at various critical stages in the plan. These critical stages are the places in the plan where the optimizer has to join two tables or where the optimizer has to decide upon the optimal degree of parallelism. During the execution of the plan, the statistics collector buffers a portion of the rows. The portion of the plan preceding the statistics collector can have alternative sub-plans, each of which is valid for the subset of possible values returned by the collector. This means that each of the sub-plans has a different threshold value. Based on the data returned by the statistics collector, a sub-plan is chosen which falls in the required threshold.

For example, an optimizer can insert a code to collect statistics before joining two tables, during the query plan building phase. It can have multiple sub-plans based on the type of join it can perform between two tables. If the number of rows returned by the statistics collector on the first table is less than the threshold value, then the optimizer might go with the sub-plan containing the nested loop join. But if the number of rows returned by the statistics collector is above the threshold values, then the optimizer might choose the second sub-plan to go with the hash join.

After the optimizer chooses a sub-plan, buffering is disabled and the statistics collector stops collecting rows and passes them through instead. On subsequent executions of the same SQL, the optimizer stops buffering and chooses the same plan instead.

With dynamic plans, the optimizer adapts to poor plan choices and correct decisions are made at various steps during runtime. Instead of using predetermined execution plans, adaptive plans enable the optimizer to postpone the final plan decision until statement execution time.

Consider the following simple query:

SELECT  a.sales_rep, b.product, sum(a.amt)
FROM    sales a, product b
WHERE   a.product_id = b.product_id
GROUP BY a.sales_rep, b.product

When the query plan was built initially, the optimizer will put the statistics collector before making the join. So it will scan the first table (SALES) and, based on the number of rows returned, it might make a decision to select the correct type of join.

The following figure shows the statistics collector being put in at various stages:

Enabling adaptive execution plans

To enable adaptive execution plans, you need to fulfill the following conditions:

optimizer_features_enable should be set to the minimum of 12.1.0.1
optimizer_adapive_reporting_only should be set to FALSE (default)

If you set the OPTIMIZER_ADAPTIVE_REPORTING_ONLY parameter to TRUE, the adaptive execution plan feature runs in the reporting-only mode—it collects the information for adaptive optimization, but doesn’t actually use this information to change the execution plans.

You can find out if the final plan chosen was the default plan by looking at the column IS_RESOLVED_ADAPTIVE_PLAN in the view V$SQL. Join methods and parallel distribution methods are two areas where adaptive plans have been implemented by Oracle12c.

Adaptive execution plans and join methods

Here is an example that shows how the adaptive execution plan will look. Instead of simulating a new query in the database and checking if the adaptive plan has worked, I used one of the queries in the database that is already using the adaptive plan.

You can get many such queries if you check V$SQL with is_resolved_adaptive_plan = ‘Y’. The following queries will list all SQLs that are going for adaptive plans.

Select sql_id from v$sql where is_resolved_adaptive_plan = 'Y';

While evaluating the plan, the optimizer uses the cardinality of the join to select the superior join method. The statistics collector starts buffering the rows from the first table, and if the number of rows exceeds the threshold value, the optimizer chooses to go for a hash join. But if the rows are less than the threshold value, the optimizer goes for a nested loop join. The following is the resulting plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'dhpn35zupm8ck',cursor_child_no=>0;

Plan hash value: 3790265618

-------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name    | Rows  | Bytes | Cost (%CPU)| Time    |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |         |       |       |   445 (100)|         |
|   1 |  SORT ORDER BY                         |         |     1 |    73 |   445   (1)| 00:00:01|
|   2 |   NESTED LOOPS                         |         |     1 |    73 |   444   (0)| 00:00:01|
|   3 |    NESTED LOOPS                        |         |   151 |    73 |   444   (0)| 00:00:01|
|*  4 |     TABLE ACCESS BY INDEX ROWID BATCHED| OBJ$    |   151 |  7701 |   293   (0)| 00:00:01|
|*  5 |      INDEX FULL SCAN                   | I_OBJ3  |     1 |       |    20   (0)| 00:00:01|
|*  6 |     INDEX UNIQUE SCAN                  | I_TYPE2 |     1 |       |     0   (0)|         |
|*  7 |    TABLE ACCESS BY INDEX ROWID         | TYPE$   |     1 |    22 |     1   (0)| 00:00:01|
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(SYSDATE@!-"O"."CTIME">.0007)
   5 - filter("O"."OID$" IS NOT NULL)
   6 - access("O"."OID$"="T"."TVOID")
   7 - filter(BITAND("T"."PROPERTIES",8388608)=8388608)

Note
-----
   - this is an adaptive plan

If we check this plan, we can see the notes section, and it tells us that this is an adaptive plan. It tells us that the optimizer must have started with some default plan based on the statistics in the tables and indexes, and during run time execution it changed the join method for a sub-plan. You can actually check which step optimizer has changed and at what point it has collected the statistics. You can display this using the new format of DBMS_XPLAN.DISPLAY_CURSOR – format => ‘adaptive’, resulting in the following:

DEO>SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'dhpn35zupm8ck',cursor_child_no=>0,format=>'adaptive'));

Plan hash value: 3790265618

------------------------------------------------------------------------------------------------------
|   Id  | Operation                                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                         |         |       |       |   445 (100)|          |
|     1 |  SORT ORDER BY                           |         |     1 |    73 |   445   (1)| 00:00:01 |
|- *  2 |   HASH JOIN                              |         |     1 |    73 |   444   (0)| 00:00:01 |
|     3 |    NESTED LOOPS                          |         |     1 |    73 |   444   (0)| 00:00:01 |
|     4 |     NESTED LOOPS                         |         |   151 |    73 |   444   (0)| 00:00:01 |
|-    5 |      STATISTICS COLLECTOR                |         |       |       |            |          |
|  *  6 |       TABLE ACCESS BY INDEX ROWID BATCHED| OBJ$    |   151 |  7701 |   293   (0)| 00:00:01 |
|  *  7 |        INDEX FULL SCAN                   | I_OBJ3  |     1 |       |    20   (0)| 00:00:01 |
|  *  8 |      INDEX UNIQUE SCAN                   | I_TYPE2 |     1 |       |     0   (0)|          |
|  *  9 |     TABLE ACCESS BY INDEX ROWID          | TYPE$   |     1 |    22 |     1   (0)| 00:00:01 |
|- * 10 |    TABLE ACCESS FULL                     | TYPE$   |     1 |    22 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("O"."OID$"="T"."TVOID")
   6 - filter(SYSDATE@!-"O"."CTIME">.0007)
   7 - filter("O"."OID$" IS NOT NULL)
   8 - access("O"."OID$"="T"."TVOID")
   9 - filter(BITAND("T"."PROPERTIES",8388608)=8388608)
  10 - filter(BITAND("T"."PROPERTIES",8388608)=8388608)

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

In this output, you can see that it has given three extra steps. Steps 2, 5, and 10 are extra. But these steps were present in the original plan when the query started. Initially, the optimizer generated a plan with a hash join on the outer tables. During runtime, the optimizer started collecting rows returned from OBJ$ table (Step 6), as we can see the STATISTICS COLLECTOR at step 5. Once the rows are buffered, the optimizer came to know that the number of rows returned by the OBJ$ table are less than the threshold and so it can go for a nested loop join instead of a hash join. The rows indicated by – in the beginning belong to the original plan, and they are removed from the final plan. Instead of those records, we have three new steps added—Steps 3, 8, and 9. Step 10 of the full table scan on the TYPE$ table is changed to an index unique scan of I_TYPE2, followed by the table accessed by index rowed at Step 9.

Adaptive plans and parallel distribution methods

Adaptive plans are also useful in adapting from bad distributing methods when running the SQL in parallel. Parallel execution often requires data redistribution to perform parallel sorts, joins, and aggregates. The database can choose from among multiple data distribution methods to perform these options. The number of rows to be distributed determines the data distribution method, along with the number of parallel server processes.

If many parallel server processes distribute only a few rows, the database chooses a broadcast distribution method and sends the entire result set to all the parallel server processes. On the other hand, if a few processes distribute many rows, the database distributes the rows equally among the parallel server processes by choosing a “hash” distribution method.

In adaptive plans, the optimizer does not commit to a specific broadcast method. Instead, the optimizer starts with an adaptive parallel data distribution technique called hybrid data distribution. It places a statistics collector to buffer rows returned by the table. Based on the number of rows returned, the optimizer decides the distribution method. If the rows returned by the result are less than the threshold, the data distribution method switches to broadcast distribution. If the rows returned by the table are more than the threshold, the data distribution method switches to hash distribution.