2011-11-23

Filtering outer join to hash join plan

I gave a SQL presentation a couple of years ago. There I mentioned the possibility to restrict an outer join when using ANSI joins. Every now and then seen such an join condition. Maybe even written myself. This week such a query has been bothering me with a long running nested loops access path. This was a simpler query than the one with outer join exists predicate I have mentioned earlier. While dealing with it I saw a quite funny looking predicate.

drop table aa;

drop table bb;

create table aa as 
select rownum id
     , nullif(mod(rownum,3),1) val 
  from dual connect  by level < 12;

create table bb as 
select rownum id
     , nullif(mod(rownum,3),0)+10 val 
  from dual connect by level < 9;

alter table aa modify id not null;

alter table bb modify id not null;

select aa.* from aa;

        ID        VAL
---------- ----------
         1
         2          2
         3          0
         4
         5          2
         6          0
         7
         8          2
         9          0
        10
        11          2

select bb.* from bb;

        ID        VAL
---------- ----------
         1         11
         2         12
         3
         4         11
         5         12
         6
         7         11
         8         12

Desired results:
   
        ID        VAL         ID        VAL
---------- ---------- ---------- ----------
         1
         2          2          2         12
         3          0
         4
         5          2          5         12
         6          0
         7
         8          2          8         12
         9          0
        10
        11          2
A normal outer join uses hash join plan. Adding the predicate aa.val = 2 to the join condition I get the result I want.
select * 
  from aa 
  left outer join bb 
    on aa.id=bb.id
;

-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|*  1 |  HASH JOIN OUTER   |      |
|   2 |   TABLE ACCESS FULL| AA   |
|   3 |   TABLE ACCESS FULL| BB   |
-----------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("AA"."ID"="BB"."ID")
   
   
select * 
  from aa 
  left outer join bb 
    on aa.id=bb.id and aa.val = 2
;

-------------------------------------
| Id  | Operation            | Name |
-------------------------------------
|   0 | SELECT STATEMENT     |      |
|   1 |  NESTED LOOPS OUTER  |      |
|   2 |   TABLE ACCESS FULL  | AA   |
|   3 |   VIEW               |      |
|*  4 |    FILTER            |      |
|*  5 |     TABLE ACCESS FULL| BB   |
-------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("AA"."VAL"=2)
   5 - filter("AA"."ID"="BB"."ID")

But the 11.1.0.7 optimizer goes to a nested loops path. My problem is that the table BB is kind of big. Together with plan row 4 filter is not filtering out so many rows the execution is full table scanning the table bb too many times. Throwing the same query to 11.2.0.3 optimizer I get a hash join access path. Nice. That was something I was hoping for. But look at the access predicate.
-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|*  1 |  HASH JOIN OUTER   |      |
|   2 |   TABLE ACCESS FULL| AA   |
|   3 |   TABLE ACCESS FULL| BB   |
-----------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("AA"."ID"="BB"."ID" AND "AA"."VAL"=CASE  WHEN ("BB"."ID"
              IS NOT NULL) THEN 2 ELSE 2 END )

Obviously the outer join (+) are missing next to BB.ID. You need to read it from the word OUTER next to the HASH JOIN. Sqldeveloper is able to visualize the predicate differently.

AA.ID=BB.ID(+) AND AA.VAL=CASE  WHEN (BB.ID(+) IS NOT NULL) THEN 2 ELSE 2 END

But that case statement seems kind of weird. "If something then constant otherwise the same constant". Should that case statement not be possible to be optimized to be a simple constant 2? I rewrote my query to use the old style join. And with 11.2.0.3 the plan and results stay the same. But going back to 11.1.0.7 things change.

select aa.*,bb.* 
 from aa, bb 
where aa.id=bb.id(+) 
  and aa.val = case when (bb.id(+) is not null) then 2 else 2 end
;

-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|*  1 |  HASH JOIN OUTER   |      |
|*  2 |   TABLE ACCESS FULL| AA   |
|   3 |   TABLE ACCESS FULL| BB   |
-----------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("AA"."ID"="BB"."ID")
   2 - filter("AA"."VAL"=2)
   
The case statement is optimized as a simple constant and used as a filter staright to the table AA. Query results change compared to 11.2.0.3 results. Actually the thing I wanted out is without the else part in the predicate.
select aa.*,bb.* 
 from aa, bb 
where aa.id=bb.id(+) 
  and aa.val = case when (bb.id(+) is not null) then 2 end
;
-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|*  1 |  HASH JOIN OUTER   |      |
|   2 |   TABLE ACCESS FULL| AA   |
|   3 |   TABLE ACCESS FULL| BB   |
-----------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("AA"."ID"="BB"."ID" AND "AA"."VAL"=CASE  WHEN ("BB"."ID"
              IS NOT NULL) THEN 2 END )
The results are what I was requesting. This may be rewoten back to ANSI join that is understood also by 11.1.0.7 optimizer and the execution goes to a hash join path.
select aa.*,bb.* 
  from aa 
  left outer join bb 
    on aa.id=bb.id 
   and aa.val = case when (bb.id is not null) then 2 end
;
I am satisfied with the results. Well throw another problem to the optimizers. How about nulls. Change the "and aa.val =" to "and aa.val is null" and even 11.2.0.3 is not able to go to a hash join plan. Here is an problematic nested loops access path query and a dirty trick on a way to a hash path.

select aa.*,bb.*
  from aa 
  left outer join bb 
    on aa.id=bb.id 
   and aa.val is null
;

select * 
  from aa 
  left outer join bb 
    on aa.id=bb.id 
   and coalesce(nullif(aa.val,bb.id),aa.val) is null
;


No comments:

Post a Comment

About Me

My photo
I am Timo Raitalaakso. I have been working since 2001 at Solita Oy as a Senior Database Specialist. My main focus is on projects involving Oracle database. Oracle ACE alumni 2012-2018. In this Rafu on db blog I write some interesting issues that evolves from my interaction with databases. Mainly Oracle.