2010-12-14

Driving a road

At my last post I talked about different ways to write SQL. Here is an example of such situation. There are three different queries written to get the similar results out of two tables. Here I am driving a road. So the information here is kind of spatial stored in relational way.



create table roadpoint(roadpoint_id number constraint roadpoint_pk primary key, roadnumber number not null, distance number not null);

create table region(region_id number constraint region_pk primary key, startpoint references roadpoint not null, endpoint references roadpoint not null);

insert into roadpoint select level,1,level*10 from dual connect by level<=20000;


insert into region values (1,1,2);
insert into region values (2,4,6);
insert into region values (3,7,10);
insert into region values (4,40,50);
insert into region values (5,4006,4010);

exec dbms_stats.gather_table_Stats(user,'REGION');
exec dbms_stats.gather_table_Stats(user,'ROADPOINT');

select *
from region reg, roadpoint st, roadpoint en
where reg.startpoint = st.roadpoint_id
and reg.endpoint = en.roadpoint_id ;


The problem here is that roadpoints inside region are not selected.

The needed rows may be impressed with a query like following.




select reg.region_id, rp.roadpoint_id,rp.roadnumber,rp.distance
from region reg, roadpoint rp
where exists (select null from roadpoint st, roadpoint en where reg.startpoint = st.roadpoint_id
and reg.endpoint = en.roadpoint_id
and rp.distance between st.distance and en.distance
);

SQL_ID 2q2uj40br0df2, child number 0
-------------------------------------
select reg.region_id, rp.roadpoint_id,rp.roadnumber,rp.distance from
region reg, roadpoint rp where exists (select null from roadpoint st,
roadpoint en where reg.startpoint = st.roadpoint_id and reg.endpoint
= en.roadpoint_id and rp.distance between st.distance and en.distance
)

Plan hash value: 2380388577

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 25 |00:00:00.55 | 208K| | | |
|* 1 | FILTER | | 1 | | 25 |00:00:00.55 | 208K| | | |
| 2 | MERGE JOIN CARTESIAN | | 1 | 100K| 100K|00:00:00.01 | 60 | | | |
| 3 | TABLE ACCESS FULL | REGION | 1 | 5 | 5 |00:00:00.01 | 7 | | | |
| 4 | BUFFER SORT | | 5 | 20000 | 100K|00:00:00.01 | 53 | 690K| 486K| 613K (0)|
| 5 | TABLE ACCESS FULL | ROADPOINT | 1 | 20000 | 20000 |00:00:00.01 | 53 | | | |
| 6 | NESTED LOOPS | | 100K| 1 | 25 |00:00:00.35 | 208K| | | |
|* 7 | TABLE ACCESS BY INDEX ROWID| ROADPOINT | 100K| 1 | 4078 |00:00:00.31 | 200K| | | |
|* 8 | INDEX UNIQUE SCAN | ROADPOINT_PK | 100K| 1 | 100K|00:00:00.10 | 100K| | | |
|* 9 | TABLE ACCESS BY INDEX ROWID| ROADPOINT | 4078 | 1 | 25 |00:00:00.01 | 8167 | | | |
|* 10 | INDEX UNIQUE SCAN | ROADPOINT_PK | 4078 | 1 | 4078 |00:00:00.01 | 4089 | | | |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( IS NOT NULL)
7 - filter("EN"."DISTANCE">=:B1)
8 - access("EN"."ROADPOINT_ID"=:B1)
9 - filter("ST"."DISTANCE"<=:B1)
10 - access("ST"."ROADPOINT_ID"=:B1)


Taking a look the generated plan one might consider something else. Here are two alternative cursors c_join and c_drivetheroad. Those are measured with Tom Kytes runstats



DECLARE
CURSOR c_join
IS
select reg.region_id, rp.roadpoint_id,rp.roadnumber,rp.distance
from region reg, roadpoint st, roadpoint en, roadpoint rp
where reg.startpoint = st.roadpoint_id
and reg.endpoint = en.roadpoint_id
and rp.distance between st.distance and en.distance
;
--
CURSOR c_drivetheroad
IS
select lv region_id,roadpoint_id,roadnumber,distance
from (
select region_id
,last_value(region_id ignore nulls)over(partition by roadnumber order by rp.distance) lv
,last_value(region_id ignore nulls)over(partition by roadnumber order by rp.distance desc) fv
,staend
,point
,rp.roadnumber
,roadpoint_id
,distance
from (
select *
from (
select reg.region_id, st.distance startdistance, en.distance enddistance
from region reg, roadpoint st, roadpoint en
where reg.startpoint = st.roadpoint_id
and reg.endpoint = en.roadpoint_id
)
unpivot (point for staend in (startdistance as '1', enddistance as '-1'))
) re, roadpoint rp
where re.point (+)= rp.distance
) qw
where lv=fv
;
--
BEGIN
runstats_pkg.rs_start;
FOR i IN 1 .. 100 LOOP
FOR rec IN c_join LOOP
NULL;
END LOOP;
END LOOP;
runstats_pkg.rs_middle;
FOR i IN 1 .. 100 LOOP
FOR rec IN c_drivetheroad LOOP
NULL;
END LOOP;
END LOOP;
runstats_pkg.rs_stop;
END;
/



Before looking at runstats results lets see the plans used. Does it seem like the first join method look like a bit easier than the second alternative. A-rows in the first one are 25 at most as those are 20000 in the second alternative.



SQL_ID fu5n5fqtksa7b, child number 0
-------------------------------------
select reg.region_id, rp.roadpoint_id,rp.roadnumber,rp.distance from
region reg, roadpoint st, roadpoint en, roadpoint rp where
reg.startpoint = st.roadpoint_id and reg.endpoint = en.roadpoint_id
and rp.distance between st.distance and en.distance

Plan hash value: 2085902882

---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 25 |00:00:00.02 | 296 |
| 1 | NESTED LOOPS | | 1 | 25003 | 25 |00:00:00.02 | 296 |
| 2 | NESTED LOOPS | | 1 | 5 | 5 |00:00:00.01 | 31 |
| 3 | NESTED LOOPS | | 1 | 5 | 5 |00:00:00.01 | 19 |
| 4 | TABLE ACCESS FULL | REGION | 1 | 5 | 5 |00:00:00.01 | 7 |
| 5 | TABLE ACCESS BY INDEX ROWID| ROADPOINT | 5 | 1 | 5 |00:00:00.01 | 12 |
|* 6 | INDEX UNIQUE SCAN | ROADPOINT_PK | 5 | 1 | 5 |00:00:00.01 | 7 |
| 7 | TABLE ACCESS BY INDEX ROWID | ROADPOINT | 5 | 1 | 5 |00:00:00.01 | 12 |
|* 8 | INDEX UNIQUE SCAN | ROADPOINT_PK | 5 | 1 | 5 |00:00:00.01 | 7 |
|* 9 | TABLE ACCESS FULL | ROADPOINT | 5 | 5001 | 25 |00:00:00.02 | 265 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("REG"."STARTPOINT"="ST"."ROADPOINT_ID")
8 - access("REG"."ENDPOINT"="EN"."ROADPOINT_ID")
9 - filter(("RP"."DISTANCE">="ST"."DISTANCE" AND "RP"."DISTANCE"<="EN"."DISTANCE"))



SQL_ID 8as0wh80pucty, child number 0
-------------------------------------
select lv region_id,roadpoint_id,roadnumber,distance from ( select
region_id ,last_value(region_id ignore nulls)over(partition by
roadnumber order by rp.distance) lv ,last_value(region_id ignore
nulls)over(partition by roadnumber order by rp.distance desc) fv
,staend ,point ,rp.roadnumber ,roadpoint_id ,distance from (
select * from ( select reg.region_id, st.distance startdistance,
en.distance enddistance from region reg, roadpoint st, roadpoint en
where reg.startpoint = st.roadpoint_id and reg.endpoint =
en.roadpoint_id ) unpivot (point for staend in (startdistance as '1',
enddistance as '-1')) ) re, roadpoint rp where re.point (+)=
rp.distance ) qw where lv=fv

Plan hash value: 1653289237

------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 25 |00:00:00.08 | 84 | | | |
|* 1 | VIEW | | 1 | 20000 | 25 |00:00:00.08 | 84 | | | |
| 2 | WINDOW SORT | | 1 | 20000 | 20000 |00:00:00.08 | 84 | 903K| 523K| 802K (0)|
| 3 | WINDOW SORT | | 1 | 20000 | 20000 |00:00:00.06 | 84 | 832K| 511K| 739K (0)|
|* 4 | HASH JOIN RIGHT OUTER | | 1 | 20000 | 20000 |00:00:00.02 | 84 | 968K| 968K| 797K (0)|
|* 5 | VIEW | | 1 | 10 | 10 |00:00:00.01 | 31 | | | |
| 6 | UNPIVOT | | 1 | | 10 |00:00:00.01 | 31 | | | |
| 7 | NESTED LOOPS | | 1 | | 5 |00:00:00.01 | 31 | | | |
| 8 | NESTED LOOPS | | 1 | 5 | 5 |00:00:00.01 | 26 | | | |
| 9 | NESTED LOOPS | | 1 | 5 | 5 |00:00:00.01 | 19 | | | |
| 10 | TABLE ACCESS FULL | REGION | 1 | 5 | 5 |00:00:00.01 | 7 | | | |
| 11 | TABLE ACCESS BY INDEX ROWID| ROADPOINT | 5 | 1 | 5 |00:00:00.01 | 12 | | | |
|* 12 | INDEX UNIQUE SCAN | ROADPOINT_PK | 5 | 1 | 5 |00:00:00.01 | 7 | | | |
|* 13 | INDEX UNIQUE SCAN | ROADPOINT_PK | 5 | 1 | 5 |00:00:00.01 | 7 | | | |
| 14 | TABLE ACCESS BY INDEX ROWID | ROADPOINT | 5 | 1 | 5 |00:00:00.01 | 5 | | | |
| 15 | TABLE ACCESS FULL | ROADPOINT | 1 | 20000 | 20000 |00:00:00.02 | 53 | | | |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("LV"="FV")
4 - access("unpivot_view_014"."POINT"="RP"."DISTANCE")
5 - filter("unpivot_view_014"."POINT" IS NOT NULL)
12 - access("REG"."STARTPOINT"="ST"."ROADPOINT_ID")
13 - access("REG"."ENDPOINT"="EN"."ROADPOINT_ID")




And now some runstats numbers



Run1 ran in 126 hsecs
Run2 ran in 800 hsecs
run 1 ran in 15,75% of the time

Name Run1 Run2 Diff
STAT...table scan blocks gotte 25,500 5,500 -20,000
STAT...no work - consistent re 25,955 5,714 -20,241
STAT...consistent gets from ca 27,661 6,221 -21,440
STAT...consistent gets 29,930 8,425 -21,505
STAT...consistent gets from ca 29,930 8,425 -21,505
STAT...session logical reads 29,990 8,459 -21,531
LATCH.cache buffers chains 57,767 15,172 -42,595
STAT...session uga memory max 123,452 410,072 286,620
STAT...session pga memory max 131,072 524,288 393,216
STAT...sorts (rows) 33 4,000,000 3,999,967
STAT...table scan rows gotten 10,000,500 2,000,500 -8,000,000

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
60,427 20,974 -39,453 288.10%



We actually read less rows and sort them and use memory a bit more in the c_drivetheroad version. The cursor c_join seems to be faster in this case. But the note that the number of a-rows vs rows gotten are not in sync. Also logical reads are one magnitude more in c_join run.

2010-12-09

Many ways writing a query

Iggy Fernandez is writing about SQL Which Query is Better?—Part III. I am checking here queries from the original article that he did not include in his post. And something else.

HASH JOIN plans


SELECT lname
FROM personnel
WHERE 199170 = any (
SELECT salary
FROM payroll
WHERE personnel.empid = payroll.empid) ;

SELECT lname
FROM personnel
WHERE 199170 in (
SELECT salary
FROM payroll
WHERE personnel.empid = payroll.empid) ;

SELECT lname
FROM personnel
WHERE empid = any (
SELECT empid
FROM payroll
WHERE salary = 199170);


The HASH JOIN RIGHT SEMI plan:



SELECT lname
FROM personnel
WHERE 0 < (
SELECT count(*)
FROM payroll
WHERE personnel.empid = payroll.empid AND salary = 199170);


Mr Date mentioned in EMEA Harmony 2010 about using any operator that it is behaving relationally. Even thou one should not start using it.

Two additional queries resulting HASH JOIN plan:



SELECT lname
FROM personnel, (select empid from payroll where salary = 199170) payr
WHERE personnel.empid = payr.empid;

SELECT lname
FROM personnel inner join payroll
ON personnel.empid = payroll.empid and salary = 199170;




HASH JOIN RIGHT ANTI plan also possible plan for the question here.



SELECT lname
FROM personnel
WHERE 0 = (
SELECT count(*)
FROM payroll
WHERE personnel.empid = payroll.empid
AND salary != 199170);

SQL_ID b90mkx99aux26, child number 1
-------------------------------------
SELECT lname FROM personnel WHERE 0 = (SELECT count(*) FROM payroll
WHERE personnel.empid = payroll.empid AND salary != 199170)
Plan hash value: 103534934

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1004 |00:00:00.08 | 261 |
|* 1 | HASH JOIN RIGHT ANTI| | 1 | 9900 | 1004 |00:00:00.08 | 261 |
|* 2 | TABLE ACCESS FULL | PAYROLL | 1 | 8910 | 8896 |00:00:00.01 | 38 |
| 3 | TABLE ACCESS FULL | PERSONNEL | 1 | 9900 | 9900 |00:00:00.01 | 223 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("PERSONNEL"."EMPID"="PAYROLL"."EMPID")
2 - filter("SALARY"!=199170)



Without this rehersal I would not have writen anti join this way. Actually I would like to see an alternative sql resulting this plan.
The original article talked also about indexing the salary column.


create index payroll_salary_idx on payroll(salary);


With this data - no changes in the query plans. Trying a fat one


create index payroll_salary_fat_idx on payroll(salary,empid);


The only change in plans is that FULL TABLE SCAN of payroll changes to INDEX payroll_salary_fat_idx RANGE SCAN in some queries.The join method stays always the same.

About Me

My photo
I am Timo Raitalaakso. I have been working since 2001 at Solita Oy as a Senior Database Specialist. My main focus is on projects involving Oracle database. Oracle ACE alumni 2012-2018. In this Rafu on db blog I write some interesting issues that evolves from my interaction with databases. Mainly Oracle.