Query fails to execute after converting a column from Varchar2 to CLOB

Question

I have a oracle query

select id from (
    select ID, ROW_NUMBER() over (partition by LATEST_RECEIPT order by ID) rownumber
    from Table
    where LATEST_RECEIPT in 
    (
        select LATEST_RECEIPT from Table 
        group by LATEST_RECEIPT
        having COUNT(1) > 1
    )
) t
where rownumber <> 1;

The data type of LATEST_RECEIPT was earlier varchar2(4000) and this query worked fine. Since the length of the column needs to be extended i modified it to CLOB, after which this fails. Could anyone help me fix this issue or provide a work around?

That's one of the drawbacks on using CLOB/BLOB, they can't be used fo indexes and then also not for GROUP BYand other agregate functions. I think you will have to convert it back into a VARCHAR(much-longer)data type — André Schild
I have data which has length > 4000. But max size of varchar2 in oracle is 4000. So, we switched to CLOB — Vasanthan
Then this is a real problem of your DBMS and the use case you have. Do you realy need to group by the whole LAST_RECEIPT text? (If yes, then you probably would need to add another column with a hash key of the text and then use this for the group by condition) — André Schild

Alex Poole Alex Poole · Accepted Answer · 2016-05-20T10:11:12

You can change your inner query to look for other rows with the same last_receipt value but a different ID (assuming ID is unique); if another row exists then that is equivalent to your count returning greater than one. But you can't simply test two CLOB values for equality, you need to use dbms_lob.compare:

select ID
from your_table t1
where exists (
    select null from your_table t2
    where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
    and t2.ID != t1.ID
    -- or if ID isn't unique: and t2.ROWID != t1.ROWID
);

Applying the row number filter is tricker, as you also can't use a CLOB in the analytic partition by clause. As André Schild suggested, you can use a hash; here passing the integer value 3, which is the equivalent of dbms_crypto.hash_sh1 (though in theory that could change in a future release!):

select id from (
    select ID, ROW_NUMBER() over (partition by dbms_crypto.hash(LATEST_RECEIPT, 3)
        order by ID) rownumber
    from your_table t1
    where exists (
        select null from your_table t2
        where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
        and t2.ID != t1.ID
        -- or if ID isn't unique: and t2.ROWID != t1.ROWID
    )
)
where rownumber > 1;

It is of course possible to get a hash collision, and if that happened - you had two latest_receipt values which both appeared more than once and both hashed to the same value - then you could get too many rows back. That seems pretty unlikely, but it's something to consider.

So rather than ordering you can only look for rows which have the same lastest_receipt and a lower ID:

select ID
from your_table t1
where exists (
    select null from your_table t2
    where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
    and t2.ID < t1.ID
);

Again that assumes ID is unique. If it isn't then you could still use rowid instead, but you would have less control over which rows were found - the lowest rowid isn't necessarily the lowest ID. Presumably you're using this to dine rows to delete. If you actually don't mind which row you keep and which you delete then you could still do:

and t2.ROWID < t1.ROWID

But since you are currently ordering that probably isn't acceptable, and hashing might be preferable, despite the small risk.

Query fails to execute after converting a column from Varchar2 to CLOB

1 Answers