EXISTS Conditions
An EXISTS condition tests for existence of rows in a subquery.
exists_condition::=
Text description of exists_condition
Table 5-10 shows the EXISTS condition.
Table 5-10 EXISTS Conditions
Condition Operation Example
EXISTS TRUE if a subquery returns at least one SELECT department_id
FROM departments d
row. WHERE EXISTS
(SELECT * FROM employees
e
WHERE d.department_id
= e.department_id);
Description: With reference to your tip for the week 08/04/2002 submitted by Madan Patil, this is another
way of deleting duplicate rows from a table. The difference being the time it takes to delete the duplicate
rows with this method is many times faster than the earlier method.
Let us take a table containing 3 columns, then we can use the following command to delete the duplicate
rows from the table.
delete from where rowid in (
SELECT rowid FROM
group by rowid,col1,col2,col3
minus
SELECT min(rowid) FROM
group by col1,col2,col3);
To show the difference let's consider a table:
Table: EMP
EMPNO NUMBER
ENAME VARCHAR2(20)
JOB VARCHAR2(20)
CREATE TABLE EMP (
EMPNO NUMBER,
ENAME VARCHAR2(20),
JOB VARCHAR2(20)
);
/
begin
for i in 1..20 loop
insert into emp values (1,'xx','clerk');
end loop;
commit;
end;
/
begin
for i in 1..20 loop
insert into emp values (2,'yy','accountant');
end loop;
commit;
end;
/
begin
for i in 1..20000 loop
insert into emp values (3,'zz','manager');
end loop;
commit;
end;
/
begin
for i in 1..10000 loop
insert into emp values (4,'ab','accountant');
end loop;
commit;
end;
Using the previous method as in your TIP for the Week 08/04/2002
------------------------------------------------------------------------
SQL> select count(*) from emp;
COUNT(*)
----------
30040
SQL> set timing on;
SQL> DELETE FROM EMP E
2 WHERE [Link] > ANY (SELECT ROWID
3 FROM EMP M
4 WHERE [Link] = [Link]
5 AND [Link] = [Link]
6 AND [Link] = [Link] );
30036 rows deleted.
Elapsed: [Link].48
SQL> select count(*) from emp;
COUNT(*)
----------
4
Elapsed: [Link].10
Using the NEW suggested method:
-------------------------------------------
SQL> select count(*) from emp;
COUNT(*)
----------
30040
SQL> delete from emp where rowid in (
2 SELECT rowid FROM emp
3 group by rowid,empno,ename,job
4 minus
5 SELECT min(rowid) FROM emp
6 group by empno,ename,job);
30036 rows deleted.
Elapsed: [Link].94
SQL> select count(*) from emp;
COUNT(*)
----------
4
Elapsed: [Link].10
--------------------------------------------------------------------
As we can see the difference is multifold to achieve the same result. This is because the new method uses
the set operator to compute the list of duplicate rows. The bigger the table the better you can appreciate the
difference.
>>> MIN() allows you to select one row per group—duplicates and non-duplicates—so that you
get a list of all the rows you want to keep:
SELECT MIN(ID) AS ID, LastName, FirstName
FROM Customers
GROUP BY LastName, FirstName;
Listing 5 shows the output of the above code.
Now you just need to delete rows that are not in this list, using the last query as a subquery inside
an antijoin (the NOT IN clause):
DELETE FROM Customers
WHERE ID NOT IN
(SELECT MIN(ID)
FROM Customers
GROUP BY LastName, FirstName);
However, an antijoin query with the NOT IN clause is inefficient to make this work. In our case
two (!) full table scans need to be performed to resolve this SQL statement. That leads to
substantial performance loss for big data sets. For performance testing I created the Customers
data set with 500,000 rows and 45,000 duplicates (9 percent of the total). The above command
ran for more than one hour with no results—except that it exhausted my patience—so I killed the
process.
Another disadvantage of this syntax is that you can't control which row per group of duplicates
you can keep in the database.
A PL/SQL Solution: Deleting Duplicate Data with a Stored Procedure
Let me give you an example of a PL/SQL stored procedure, called DeleteDuplicate (see Listing
6), that cleans up duplicates. The algorithm for this procedure is pretty straightforward:
1. It selects the duplicate data in the cursor, sorted by duplicate key (LastName, FirstName
in our case), as shown in Listing 4.
2. It opens the cursor and fetches each row, one by one, in a loop.
3. It compares the duplicate key value with the previously fetched one.
4. If this is a first fetch, or the value is different, then that's the first row in a new group so it
skips it and fetches the next row. Otherwise, it's a duplicate row within the same group,
so it deletes it.
Let's run the stored procedure and check it against the Customers data:
BEGIN
DeleteDuplicates;
END;
/
SELECT LastName, FirstName, COUNT(*)
FROM Customers
GROUP BY LastName, FirstName
HAVING COUNT(*) > 1;
The last SELECT statement returns no rows, so the duplicates are gone.
The main job of extracting duplicates in this procedure is done by a SQL statement, which is
defined in the csr_Duplicates cursor. The PL/SQL procedural code is used only to implement the
logic of deleting all rows in the group except the first one. Could it all be done by one SQL
statement?