I gave a SQL presentation a couple of years ago.
There I mentioned the possibility to restrict an outer join when using ANSI joins.
Every now and then seen such an join condition. Maybe even written myself.
This week such a query has been bothering me with a long running nested loops access path. This was a simpler query than the one with
outer join exists predicate I have mentioned earlier.
While dealing with it I saw a quite funny looking predicate.
drop table aa;
drop table bb;
create table aa as
select rownum id
, nullif(mod(rownum,3),1) val
from dual connect by level < 12;
create table bb as
select rownum id
, nullif(mod(rownum,3),0)+10 val
from dual connect by level < 9;
alter table aa modify id not null;
alter table bb modify id not null;
select aa.* from aa;
ID VAL
---------- ----------
1
2 2
3 0
4
5 2
6 0
7
8 2
9 0
10
11 2
select bb.* from bb;
ID VAL
---------- ----------
1 11
2 12
3
4 11
5 12
6
7 11
8 12
Desired results:
ID VAL ID VAL
---------- ---------- ---------- ----------
1
2 2 2 12
3 0
4
5 2 5 12
6 0
7
8 2 8 12
9 0
10
11 2
A normal outer join uses hash join plan. Adding the predicate aa.val = 2 to the join condition I get the result I want.
select *
from aa
left outer join bb
on aa.id=bb.id
;
-----------------------------------
| Id | Operation | Name |
-----------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN OUTER | |
| 2 | TABLE ACCESS FULL| AA |
| 3 | TABLE ACCESS FULL| BB |
-----------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("AA"."ID"="BB"."ID")
select *
from aa
left outer join bb
on aa.id=bb.id and aa.val = 2
;
-------------------------------------
| Id | Operation | Name |
-------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS OUTER | |
| 2 | TABLE ACCESS FULL | AA |
| 3 | VIEW | |
|* 4 | FILTER | |
|* 5 | TABLE ACCESS FULL| BB |
-------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("AA"."VAL"=2)
5 - filter("AA"."ID"="BB"."ID")
But the 11.1.0.7 optimizer goes to a nested loops path.
My problem is that the table BB is kind of big.
Together with plan row 4 filter is not filtering out so many rows the execution is full table scanning the table bb too many times.
Throwing the same query to 11.2.0.3 optimizer I get a hash join access path. Nice. That was something I was hoping for. But look at the access predicate.
-----------------------------------
| Id | Operation | Name |
-----------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN OUTER | |
| 2 | TABLE ACCESS FULL| AA |
| 3 | TABLE ACCESS FULL| BB |
-----------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("AA"."ID"="BB"."ID" AND "AA"."VAL"=CASE WHEN ("BB"."ID"
IS NOT NULL) THEN 2 ELSE 2 END )
Obviously the outer join (+) are missing next to BB.ID. You need to read it from the word OUTER next to the HASH JOIN.
Sqldeveloper is able to visualize the predicate differently.
AA.ID=BB.ID(+) AND AA.VAL=CASE WHEN (BB.ID(+) IS NOT NULL) THEN 2 ELSE 2 END
But that case statement seems kind of weird. "If something then constant otherwise the same constant".
Should that case statement not be possible to be optimized to be a simple constant 2?
I rewrote my query to use the old style join. And with 11.2.0.3 the plan and results stay the same. But going back to 11.1.0.7 things change.
select aa.*,bb.*
from aa, bb
where aa.id=bb.id(+)
and aa.val = case when (bb.id(+) is not null) then 2 else 2 end
;
-----------------------------------
| Id | Operation | Name |
-----------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN OUTER | |
|* 2 | TABLE ACCESS FULL| AA |
| 3 | TABLE ACCESS FULL| BB |
-----------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("AA"."ID"="BB"."ID")
2 - filter("AA"."VAL"=2)
The case statement is optimized as a simple constant and used as a filter staright to the table AA.
Query results change compared to 11.2.0.3 results.
Actually the thing I wanted out is without the else part in the predicate.
select aa.*,bb.*
from aa, bb
where aa.id=bb.id(+)
and aa.val = case when (bb.id(+) is not null) then 2 end
;
-----------------------------------
| Id | Operation | Name |
-----------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN OUTER | |
| 2 | TABLE ACCESS FULL| AA |
| 3 | TABLE ACCESS FULL| BB |
-----------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("AA"."ID"="BB"."ID" AND "AA"."VAL"=CASE WHEN ("BB"."ID"
IS NOT NULL) THEN 2 END )
The results are what I was requesting.
This may be rewoten back to ANSI join that is understood also by 11.1.0.7 optimizer and the execution goes to a hash join path.
select aa.*,bb.*
from aa
left outer join bb
on aa.id=bb.id
and aa.val = case when (bb.id is not null) then 2 end
;
I am satisfied with the results. Well throw another problem to the optimizers. How about nulls.
Change the "and aa.val =" to "and aa.val is null" and even 11.2.0.3 is not able to go to a hash join plan. Here is an problematic nested loops access path query and a dirty trick on a way to a hash path.
select aa.*,bb.*
from aa
left outer join bb
on aa.id=bb.id
and aa.val is null
;
select *
from aa
left outer join bb
on aa.id=bb.id
and coalesce(nullif(aa.val,bb.id),aa.val) is null
;
No comments:
Post a Comment