0
votes

I work primarily with SAS and Oracle and am still new to DB2. Im faced with needing a hierarchical query to separate a clob into chunks that can be pulled into sas. SAS has a limit of 32K for character variables so I cant just pull the dataset in normally.

I found an old stackoverflow question about the best way to pull a clob into a sas data set but it is written in Oracle. Import blob through SAS from ORACLE DB

Since I am new to DB2 and the syntax for this type of join seems very different I was hoping to find someone that could help convert it and explain the syntax. I find the Oracle syntax to be much easier to understand. I'm not sure in DB2 if you would use a CTE recursion like this https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/apsg/src/tpc/db2z_xmprecursivecte.html or if you would use hierarchical queries like this https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_71/sqlp/rbafyrecursivequeries.htm

Here is the Oracle query.

SELECT    
      id  
    , level as chunk_id
    , regexp_substr(clob_value, '.{1,32767}', 1, level, 'n') as clob_chunk
FROM (
    SELECT id, clob_value
    FROM schema.table
    WHERE id = 1
)
CONNECT BY LEVEL <= regexp_count(clob_value, '.{1,32767}',1,'n')
order by id, chunk_id;

The table has two fields the id and the clob_value and would look like this.

ID    CLOB_VALUE
1     really large clob
2     medium clob
3     another large clob

The thought is I would want this result. I would only ever be doing this one row at a time where id= which ever row I am processing.

ID  CHUNK_ID   CLOB
1   1       clob_chunk1of3
1   2       clob_chunk2of3
1   3       clob_chunk3of3

Thanks for any time spent reading and helping.

2
A few questions... (showing my obvious ignorance of DB2). Is 32767 the max size of VARCHAR (what Oracle calls VARCHAR2) in DB2? Is there a distinction, like there is in Oracle, between character and byte length semantics? If so, is your character set single byte (very unlikely)? If not, keep in mind that regular expression quantifiers are always in characters, not bytes, so your code may not work (just as it may not work in Oracle, for the same reason). Or do you know for sure that all your CLOBs are 100% ASCII characters?mathguy
Other than that, the obvious way to translate to DB2 is to rewrite the query with a recursive WITH clause (which is SQL Standard and I assume is implemented in DB2).mathguy
No the 32767 is a limit placed on me by SAS. SAS as a programming language doesn't have a clob and the biggest character structure is a varchar with a max of 32k. So in DB2 the column would remain a clob but when I read it into SAS it will change to varchar.dalton_1000101
Hmm... I didn't read your post in detail till now; I just found that DB2 supports hierarchical (CONNECT BY) queries, which I thought were proprietary to Oracle. As you show in your post, they aren't - they exist in DB2 as well, and it seems they have the same syntax as in Oracle. So... do you need help fixing the query you posted? It will work as is if you add a where clause to single out just one id; to work on the entire table, you need to add some conditions to the connect by clause.mathguy

2 Answers

1
votes

Here is a solution that should work in DB2 with few changes (but please be advised that I don't know DB2 at all; I am just using Oracle features that are in the SQL Standard, so they should be implemented identically - or almost so - in DB2).

Below I create a table with your sample data; then I show how to chunk it into substrings of length at most 8 characters. Although the strings are short, I defined the column as CLOB and I am using CLOB tools; this should work on much larger CLOBs.

You can make both the chunk size and the id into bind parameters, if needed. In my demo below I hardcoded the chunk size and I show the result for all IDs in the table. In case the CLOB is NULL, I do return one chunk (which is NULL, of course).

Note that touching CLOBs in a query is very expensive; so most of the work is done without touching the CLOBs. I only work on them as little as possible.

PREP WORK

drop table tbl purge;    -- If needed

create table tbl (id number, clob_value clob);

insert into tbl (id, clob_value)
  select 1, 'really large clob'  from dual union all
  select 2, 'medium clob'        from dual union all
  select 3, 'another large clob' from dual union all
  select 4, null                 from dual             -- added to check handling
;

commit;

QUERY

with
  prep(id, len) as (
    select id, dbms_lob.getlength(clob_value)
    from   tbl
  )
,  rec(id, len, ord, pos) as (
    select  id, len, 1, 1
      from  prep
    union all
    select  id, len, ord + 1, pos + 8
      from  rec
      where len >= pos + 8
  )
select   id, ord, dbms_lob.substr(clob_value, 8, pos)
from     tbl inner join rec using (id)
order by id, ord
;

  ID  ORD CHUNK   
---- ---- --------
   1    1 really l
   1    2 arge clo
   1    3 b       
   2    1 medium c
   2    2 lob     
   3    1 another 
   3    2 large cl
   3    3 ob      
   4    1        
0
votes

Another option is to enable the Oracle compatibility in Db2 and just issue the hierarchical query.

This GitHub repository has background information on SQL recursion in DB2, including the Oracle-style syntax and a side by side example (both work against the Db2 sample database):

-- both queries are against the SAMPLE database
-- and should return the same result
SELECT LEVEL, CAST(SPACE((LEVEL - 1) * 4) || '/' || DEPTNAME
       AS VARCHAR(40)) AS DEPTNAME
FROM DEPARTMENT
     START WITH DEPTNO = 'A00'
     CONNECT BY NOCYCLE PRIOR DEPTNO = ADMRDEPT;


WITH tdep(level, deptname, deptno) as (
    SELECT 1, CAST( DEPTNAME AS VARCHAR(40)) AS DEPTNAME, deptno
    FROM department 
    WHERE DEPTNO = 'A00'
    UNION ALL
    SELECT t.LEVEL+1, CAST(SPACE(t.LEVEL  * 4) || '/' || d.DEPTNAME
       AS VARCHAR(40)) AS DEPTNAME, d.deptno
    FROM DEPARTMENT d, tdep t
    WHERE d.admrdept=t.deptno and d.deptno<>'A00')
SELECT level, deptname
FROM tdep;