Rafu on db: 2010/12

At my last post I talked about different ways to write SQL. Here is an example of such situation. There are three different queries written to get the similar results out of two tables. Here I am driving a road. So the information here is kind of spatial stored in relational way.



create table roadpoint(roadpoint_id number constraint roadpoint_pk primary key, roadnumber number not null, distance number not null);

create table region(region_id number constraint region_pk primary key, startpoint references roadpoint not null, endpoint references roadpoint not null);

insert into roadpoint select level,1,level*10 from dual connect by level<=20000;  


insert into region values (1,1,2);  
insert into region values (2,4,6);  
insert into region values (3,7,10);  
insert into region values (4,40,50);  
insert into region values (5,4006,4010);  

exec dbms_stats.gather_table_Stats(user,'REGION');  
exec dbms_stats.gather_table_Stats(user,'ROADPOINT');    

select * 
  from region reg, roadpoint st, roadpoint en 
 where reg.startpoint = st.roadpoint_id
   and reg.endpoint = en.roadpoint_id ;

The problem here is that roadpoints inside region are not selected.

The needed rows may be impressed with a query like following.




select reg.region_id, rp.roadpoint_id,rp.roadnumber,rp.distance
from region reg, roadpoint rp
where exists (select null from roadpoint st, roadpoint en where reg.startpoint = st.roadpoint_id
 and reg.endpoint = en.roadpoint_id
 and rp.distance between st.distance and en.distance
);

SQL_ID  2q2uj40br0df2, child number 0
-------------------------------------
select reg.region_id, rp.roadpoint_id,rp.roadnumber,rp.distance from
region reg, roadpoint rp where exists (select null from roadpoint st,
roadpoint en where reg.startpoint = st.roadpoint_id   and reg.endpoint
= en.roadpoint_id   and rp.distance between st.distance and en.distance
)

Plan hash value: 2380388577

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |      1 |        |     25 |00:00:00.55 |     208K|       |       |          |
|*  1 |  FILTER                       |              |      1 |        |     25 |00:00:00.55 |     208K|       |       |          |
|   2 |   MERGE JOIN CARTESIAN        |              |      1 |    100K|    100K|00:00:00.01 |      60 |       |       |          |
|   3 |    TABLE ACCESS FULL          | REGION       |      1 |      5 |      5 |00:00:00.01 |       7 |       |       |          |
|   4 |    BUFFER SORT                |              |      5 |  20000 |    100K|00:00:00.01 |      53 |   690K|   486K|  613K (0)|
|   5 |     TABLE ACCESS FULL         | ROADPOINT    |      1 |  20000 |  20000 |00:00:00.01 |      53 |       |       |          |
|   6 |   NESTED LOOPS                |              |    100K|      1 |     25 |00:00:00.35 |     208K|       |       |          |
|*  7 |    TABLE ACCESS BY INDEX ROWID| ROADPOINT    |    100K|      1 |   4078 |00:00:00.31 |     200K|       |       |          |
|*  8 |     INDEX UNIQUE SCAN         | ROADPOINT_PK |    100K|      1 |    100K|00:00:00.10 |     100K|       |       |          |
|*  9 |    TABLE ACCESS BY INDEX ROWID| ROADPOINT    |   4078 |      1 |     25 |00:00:00.01 |    8167 |       |       |          |
|* 10 |     INDEX UNIQUE SCAN         | ROADPOINT_PK |   4078 |      1 |   4078 |00:00:00.01 |    4089 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  1 - filter( IS NOT NULL)
  7 - filter("EN"."DISTANCE">=:B1)
  8 - access("EN"."ROADPOINT_ID"=:B1)
  9 - filter("ST"."DISTANCE"<=:B1)
  10 - access("ST"."ROADPOINT_ID"=:B1)

Taking a look the generated plan one might consider something else. Here are two alternative cursors c_join and c_drivetheroad. Those are measured with Tom Kytes runstats



DECLARE
  CURSOR c_join
  IS
select reg.region_id, rp.roadpoint_id,rp.roadnumber,rp.distance
from region reg, roadpoint st, roadpoint en, roadpoint rp
where reg.startpoint = st.roadpoint_id
 and reg.endpoint = en.roadpoint_id
 and rp.distance between st.distance and en.distance
;
--
  CURSOR c_drivetheroad
  IS
select lv region_id,roadpoint_id,roadnumber,distance
from (
select region_id
,last_value(region_id ignore nulls)over(partition by roadnumber order by rp.distance) lv
,last_value(region_id ignore nulls)over(partition by roadnumber order by rp.distance desc) fv
,staend
,point
,rp.roadnumber
,roadpoint_id
,distance
from (
select *
from (
select reg.region_id, st.distance startdistance, en.distance enddistance
from region reg, roadpoint st, roadpoint en
where reg.startpoint = st.roadpoint_id
 and reg.endpoint = en.roadpoint_id
)
unpivot (point for staend in (startdistance as '1', enddistance as '-1'))
) re, roadpoint rp
where re.point (+)= rp.distance
) qw
where lv=fv
;
--
BEGIN
   runstats_pkg.rs_start;
  FOR i IN 1 .. 100 LOOP
     FOR rec IN c_join LOOP
        NULL;
     END LOOP;
  END LOOP;
   runstats_pkg.rs_middle;
  FOR i IN 1 .. 100 LOOP
     FOR rec IN c_drivetheroad LOOP
        NULL;
     END LOOP;
  END LOOP;
   runstats_pkg.rs_stop;
END;
/

Before looking at runstats results lets see the plans used. Does it seem like the first join method look like a bit easier than the second alternative. A-rows in the first one are 25 at most as those are 20000 in the second alternative.



SQL_ID  fu5n5fqtksa7b, child number 0
-------------------------------------
select reg.region_id, rp.roadpoint_id,rp.roadnumber,rp.distance from
region reg, roadpoint st, roadpoint en, roadpoint rp where
reg.startpoint = st.roadpoint_id   and reg.endpoint = en.roadpoint_id  
and rp.distance between st.distance and en.distance

Plan hash value: 2085902882

---------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |              |      1 |        |     25 |00:00:00.02 |     296 |
|   1 |  NESTED LOOPS                  |              |      1 |  25003 |     25 |00:00:00.02 |     296 |
|   2 |   NESTED LOOPS                 |              |      1 |      5 |      5 |00:00:00.01 |      31 |
|   3 |    NESTED LOOPS                |              |      1 |      5 |      5 |00:00:00.01 |      19 |
|   4 |     TABLE ACCESS FULL          | REGION       |      1 |      5 |      5 |00:00:00.01 |       7 |
|   5 |     TABLE ACCESS BY INDEX ROWID| ROADPOINT    |      5 |      1 |      5 |00:00:00.01 |      12 |
|*  6 |      INDEX UNIQUE SCAN         | ROADPOINT_PK |      5 |      1 |      5 |00:00:00.01 |       7 |
|   7 |    TABLE ACCESS BY INDEX ROWID | ROADPOINT    |      5 |      1 |      5 |00:00:00.01 |      12 |
|*  8 |     INDEX UNIQUE SCAN          | ROADPOINT_PK |      5 |      1 |      5 |00:00:00.01 |       7 |
|*  9 |   TABLE ACCESS FULL            | ROADPOINT    |      5 |   5001 |     25 |00:00:00.02 |     265 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  6 - access("REG"."STARTPOINT"="ST"."ROADPOINT_ID")
  8 - access("REG"."ENDPOINT"="EN"."ROADPOINT_ID")
  9 - filter(("RP"."DISTANCE">="ST"."DISTANCE" AND "RP"."DISTANCE"<="EN"."DISTANCE"))    



SQL_ID  8as0wh80pucty, child number 0
-------------------------------------
select lv region_id,roadpoint_id,roadnumber,distance  from (  select 
region_id  ,last_value(region_id ignore nulls)over(partition by 
roadnumber order by rp.distance) lv  ,last_value(region_id ignore 
nulls)over(partition by roadnumber order by rp.distance desc) fv  
,staend  ,point  ,rp.roadnumber  ,roadpoint_id  ,distance  from (  
select *  from (  select reg.region_id, st.distance startdistance, 
en.distance enddistance  from region reg, roadpoint st, roadpoint en  
where reg.startpoint = st.roadpoint_id    and reg.endpoint = 
en.roadpoint_id  )  unpivot (point for staend in (startdistance as '1', 
enddistance as '-1'))  ) re, roadpoint rp  where re.point (+)= 
rp.distance  ) qw  where lv=fv
 
Plan hash value: 1653289237
 
------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |      1 |        |     25 |00:00:00.08 |      84 |       |       |          |
|*  1 |  VIEW                                |              |      1 |  20000 |     25 |00:00:00.08 |      84 |       |       |          |
|   2 |   WINDOW SORT                        |              |      1 |  20000 |  20000 |00:00:00.08 |      84 |   903K|   523K|  802K (0)|
|   3 |    WINDOW SORT                       |              |      1 |  20000 |  20000 |00:00:00.06 |      84 |   832K|   511K|  739K (0)|
|*  4 |     HASH JOIN RIGHT OUTER            |              |      1 |  20000 |  20000 |00:00:00.02 |      84 |   968K|   968K|  797K (0)|
|*  5 |      VIEW                            |              |      1 |     10 |     10 |00:00:00.01 |      31 |       |       |          |
|   6 |       UNPIVOT                        |              |      1 |        |     10 |00:00:00.01 |      31 |       |       |          |
|   7 |        NESTED LOOPS                  |              |      1 |        |      5 |00:00:00.01 |      31 |       |       |          |
|   8 |         NESTED LOOPS                 |              |      1 |      5 |      5 |00:00:00.01 |      26 |       |       |          |
|   9 |          NESTED LOOPS                |              |      1 |      5 |      5 |00:00:00.01 |      19 |       |       |          |
|  10 |           TABLE ACCESS FULL          | REGION       |      1 |      5 |      5 |00:00:00.01 |       7 |       |       |          |
|  11 |           TABLE ACCESS BY INDEX ROWID| ROADPOINT    |      5 |      1 |      5 |00:00:00.01 |      12 |       |       |          |
|* 12 |            INDEX UNIQUE SCAN         | ROADPOINT_PK |      5 |      1 |      5 |00:00:00.01 |       7 |       |       |          |
|* 13 |          INDEX UNIQUE SCAN           | ROADPOINT_PK |      5 |      1 |      5 |00:00:00.01 |       7 |       |       |          |
|  14 |         TABLE ACCESS BY INDEX ROWID  | ROADPOINT    |      5 |      1 |      5 |00:00:00.01 |       5 |       |       |          |
|  15 |      TABLE ACCESS FULL               | ROADPOINT    |      1 |  20000 |  20000 |00:00:00.02 |      53 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("LV"="FV")
   4 - access("unpivot_view_014"."POINT"="RP"."DISTANCE")
   5 - filter("unpivot_view_014"."POINT" IS NOT NULL)
  12 - access("REG"."STARTPOINT"="ST"."ROADPOINT_ID")
  13 - access("REG"."ENDPOINT"="EN"."ROADPOINT_ID")

And now some runstats numbers



Run1 ran in 126 hsecs
Run2 ran in 800 hsecs
run 1 ran in 15,75% of the time

Name                                  Run1        Run2        Diff
STAT...table scan blocks gotte      25,500       5,500     -20,000
STAT...no work - consistent re      25,955       5,714     -20,241
STAT...consistent gets from ca      27,661       6,221     -21,440
STAT...consistent gets              29,930       8,425     -21,505
STAT...consistent gets from ca      29,930       8,425     -21,505
STAT...session logical reads        29,990       8,459     -21,531
LATCH.cache buffers chains          57,767      15,172     -42,595
STAT...session uga memory max      123,452     410,072     286,620
STAT...session pga memory max      131,072     524,288     393,216
STAT...sorts (rows)                     33   4,000,000   3,999,967
STAT...table scan rows gotten   10,000,500   2,000,500  -8,000,000

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
60,427      20,974     -39,453    288.10%

We actually read less rows and sort them and use memory a bit more in the c_drivetheroad version. The cursor c_join seems to be faster in this case. But the note that the number of a-rows vs rows gotten are not in sync. Also logical reads are one magnitude more in c_join run.

Iggy Fernandez is writing about SQL Which Query is Better?—Part III. I am checking here queries from the original article that he did not include in his post. And something else.

HASH JOIN plans


SELECT lnameFROM personnel
WHERE 199170 = any (
  SELECT salary 
  FROM payroll 
  WHERE personnel.empid = payroll.empid) ;

SELECT lname
FROM personnel
WHERE 199170 in ( 
  SELECT salary 
  FROM payroll 
  WHERE personnel.empid = payroll.empid) ;

SELECT lname
FROM personnel
WHERE empid = any (
  SELECT empid 
  FROM payroll
  WHERE salary = 199170);

The HASH JOIN RIGHT SEMI plan:


SELECT lname

FROM personnel

WHERE 0 < (

SELECT count(*)

FROM payroll

WHERE personnel.empid = payroll.empid AND salary = 199170);

Mr Date mentioned in EMEA Harmony 2010 about using any operator that it is behaving relationally. Even thou one should not start using it.

Two additional queries resulting HASH JOIN plan:

SELECT lname

FROM personnel, (select empid from payroll where salary = 199170) payr

WHERE personnel.empid = payr.empid;

SELECT lname

FROM personnel inner join payroll

ON personnel.empid = payroll.empid and salary = 199170;

HASH JOIN RIGHT ANTI plan also possible plan for the question here.

SELECT lname

FROM personnel

WHERE 0 = (

SELECT count(*)

FROM payroll

WHERE personnel.empid = payroll.empid

AND salary != 199170);

SQL_ID b90mkx99aux26, child number 1

-------------------------------------

SELECT lname FROM personnel WHERE 0 = (SELECT count(*) FROM payroll

WHERE personnel.empid = payroll.empid AND salary != 199170)

Plan hash value: 103534934

--------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1004 |00:00:00.08 | 261 |

|* 1 | HASH JOIN RIGHT ANTI| | 1 | 9900 | 1004 |00:00:00.08 | 261 |

|* 2 | TABLE ACCESS FULL | PAYROLL | 1 | 8910 | 8896 |00:00:00.01 | 38 |

| 3 | TABLE ACCESS FULL | PERSONNEL | 1 | 9900 | 9900 |00:00:00.01 | 223 |

--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("PERSONNEL"."EMPID"="PAYROLL"."EMPID")

2 - filter("SALARY"!=199170)

Without this rehersal I would not have writen anti join this way. Actually I would like to see an alternative sql resulting this plan.
The original article talked also about indexing the salary column.


create index payroll_salary_idx on payroll(salary);

With this data - no changes in the query plans. Trying a fat one


create index payroll_salary_fat_idx on payroll(salary,empid);

The only change in plans is that FULL TABLE SCAN of payroll changes to INDEX payroll_salary_fat_idx RANGE SCAN in some queries.The join method stays always the same.

Rafu on db

2010-12-14

Driving a road

2010-12-09

Many ways writing a query

Blog Archive

About Me