Improooooove not in query

Purpose :
Show that Hash_AJ hint for not in can improve a query lot when the column in not in comparision is nullable. if not null then one can use not exists also.

Time Needed : 10min

Many times not exists improves over not in. But check not in is not always equal to not exists. The thread is in the same folder.But read this it definately helps in any case with not in.

Preparation

change you optimizer mode if it is RULE based optimizer(alter session set optimizer_mode=choose)
drop table invdatew;
drop table investor;
create table invdatew
as
select 1 inv, a.*, sysdate dt from all_objects a where 1=0;
create table investor
as
select 1 inv, a.* from all_objects a where 1=0;
create index invdatew_idx on invdatew(inv);
exec gen_data( ‘INVDATEW’, 5000); — this generated random data. code at the end
exec gen_data( ‘INVESTOR’, 5000); — this generated random data. code at the end
select count(1) from invdatew — 5000 rows created
select count(1) from investor– 5000 rows created
analyze table invdatew compute statistics for
table for all indexes for all indexed columns;
analyze table investor compute statistics for
table for all indexes for all indexed columns;
alter session set sql_trace=true;

1. Test Case 1 (Normally developer writes like this)

insert into invdatew
select a.*, sysdate from investor a
where inv not in (select inv from invdatew);

call count cpu elapsed disk query current rows
——- —— ——– ———- ———- ———- ———- ———-
Parse 1 0.01 0.30 0 0 0 0
Execute 1 34.68 33.91 0 1324021 11546 5000
Fetch 0 0.00 0.00 0 0 0 0
——- —— ——– ———- ———- ———- ———- ———-
total 2 34.69 34.22 0 1324021 11546 5000

Rows Row Source Operation
——- —————————————————
5000 FILTER
5000 TABLE ACCESS FULL INVESTOR
0 TABLE ACCESS FULL INVDATEW
This query was processed like this:

for each row in investor
loop
FULL SCAN INVDATEW making sure that INV not in there
end loop

that is — 5000 full scans of INVDATEW (5000 = number of rows in investor!)

rollback

2. Test Case -2 (Use HASH_AJ to improve)

show parameter hash : Make sure hash_join_enabled = TRUE and hash_area_size is big enough(1048576 — this is for me better to be larger)
insert into invdatew
select a.*, sysdate from investor a
where inv is not null
and inv not in (select /*+ HASH_AJ */ inv from invdatew where inv is not null );

call count cpu elapsed disk query current rows
——- —— ——– ———- ———- ———- ———- ———-
Parse 1 0.01 0.02 0 0 0 0
Execute 1 0.96 1.05 140 566 10612 5000
Fetch 0 0.00 0.00 0 0 0 0
——- —— ——– ———- ———- ———- ———- ———-
total 2 0.97 1.07 140 566 10612 5000

Rows Row Source Operation
——- —————————————————
5000 HASH JOIN ANTI
5000 TABLE ACCESS FULL INVESTOR
5000 INDEX FAST FULL SCAN INVDATEW_IDX (object id 63822)

Wow — that is different — from over 34.69 cpu seconds down to .97.

This was processed sort of like this:

for every row in outer join investor to invdatew
if invdatew.inv is null then insert this record

Much more efficient…

rollback

3. Test Case -3 (but the o/p will be same only if not in column is also not null. Else see the thread in this folder for not in/exists comparision).

insert into invdatew
select a.*, sysdate from investor a
where not exists ( select *
from invdatew
where invdatew.inv = a.inv );

call count cpu elapsed disk query current rows
——- —— ——– ———- ———- ———- ———- ———-
Parse 1 0.00 0.02 0 0 0 0
Execute 1 1.08 1.05 0 10686 10600 5000
Fetch 0 0.00 0.00 0 0 0 0
——- —— ——– ———- ———- ———- ———- ———-
total 2 1.08 1.08 0 10686 10600 5000

Rows Row Source Operation
——- —————————————————
5000 FILTER
5000 TABLE ACCESS FULL INVESTOR
0 INDEX RANGE SCAN INVDATEW_IDX (object id 63822)

Now, not as good as the hash_aj but…. very close, very close. The logical IO’s would make me avoid this approach however, rather do 566 LIO’s then 10,686.
This was processed like this:

for every record in investor
run a subquery that uses an index to find a row in invdatew
if not found
then insert
end if;
end loop

rollback;
alter session set sql_trace=false
tkprof the output file to see above results in your environment.

Package gen_data

create or replace procedure gen_data( p_tname in varchar2, p_records in number )
authid current_user
as
l_insert long;
l_rows number default 0;
begin

dbms_application_info.set_client_info( ‘gen_data ‘ || p_tname );
l_insert := ‘insert /*+ append */ into ‘ || p_tname ||
‘ select ‘;

for x in ( select data_type, data_length,
nvl(rpad(
‘9’,data_precision,’9′)/power(10,data_scale),9999999999) maxval
from user_tab_columns
where table_name = upper(p_tname)
order by column_id )
loop
if ( x.data_type in (‘NUMBER’, ‘FLOAT’ ))
then
l_insert := l_insert || ‘dbms_random.value(1,’ || x.maxval || ‘),’;
elsif ( x.data_type = ‘DATE’ )
then
l_insert := l_insert ||
‘sysdate+dbms_random.value+dbms_random.value(1,1000),’;
else
l_insert := l_insert || ‘dbms_random.string(”A”,’ ||
x.data_length || ‘),’;
end if;
end loop;
l_insert := rtrim(l_insert,’,’) ||
‘ from all_objects where rownum <= :n’;

loop
execute immediate l_insert using p_records – l_rows;
l_rows := l_rows + sql%rowcount;
commit;
dbms_application_info.set_module( l_rows || ‘ rows of ‘ || p_records, ”
);
exit when ( l_rows >= p_records );
end loop;
end;
/

Backout Procedure

drop procedure gen_data;
drop table invdatew;
drop table investor;

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: