0
votes

I have the below XML structure as input

<dataexport>
<header>
     <companyinfo>
     <companyid> 100 </companyid>
     <companyname>ABC Corp</companyname>
     </companyinfo>
</header>
<deptinfo>
     <electrical>
          <response>
               <deptid> 1 </deptid>
               <totalemp> 200 </totalemp>
               <totalunits> 20 </totalunits>
          </response>
     </electrical>
     <mechanical>
          <response>
               <deptid> 2 </deptid>
               <totalemp> 150 </totalemp>
               <totalunits> 40 </totalunits>
          </response>
     </mechanical>
     <chemical>
          <response>
               <deptid> 3 </deptid>
               <totalemp> 100 </totalemp>
               <totalunits> 20 </totalunits>
          </response>
     </chemical>

The output that i am expecting is

Company ID Company name Dept Dept ID Total emp
100 ABC Corp Electrical 1 200
100 ABC Corp Mechanical 2 150
100 ABC Corp Chemical 3 100

The design approach I went with was as below -

enter image description here

tXMLMap component has been configured like below -

enter image description here

With this approach, I was able to get only one output and am unable to combine results of deptinfo_electrical, deptinfo_mechanical and deptinfo_chemical into one dataflow to load to a snowflake table.

Please advise what design approach can be used here.

1

1 Answers

0
votes

so using thia CTE with you data in it, in snowflake:

with data as (
    select parse_xml('<dataexport>
    <header>
         <companyinfo>
         <companyid> 100 </companyid>
         <companyname>ABC Corp</companyname>
         </companyinfo>
    </header>
    <deptinfo>
         <electrical>
              <response>
                   <deptid> 1 </deptid>
                   <totalemp> 200 </totalemp>
                   <totalunits> 20 </totalunits>
              </response>
         </electrical>
         <mechanical>
              <response>
                   <deptid> 2 </deptid>
                   <totalemp> 150 </totalemp>
                   <totalunits> 40 </totalunits>
              </response>
         </mechanical>
         <chemical>
              <response>
                   <deptid> 3 </deptid>
                   <totalemp> 100 </totalemp>
                   <totalunits> 20 </totalunits>
              </response>
         </chemical>
    </deptinfo>
    </dataexport>') as xml
)

this select gets you the values you want:

select xml
    ,xmlget(xmlget(xml,'header',0), 'companyinfo', 0) as comp
    ,get(xmlget(comp, 'companyid'),'$') as companyid
    ,get(xmlget(comp, 'companyname'), '$') as companyname
    ,get(d.value, '@') as depm
    ,xmlget(d.value, 'response') as r
    ,get(xmlget(r, 'deptid'),'$') as deptid
    ,get(xmlget(r, 'totalemp'), '$') as totalemp
    --,d.*
from data,
 LATERAL FLATTEN(GET(xmlget(xml,'deptinfo',0), '$')) d
         

so that is the form of the transformation you are wanting todo.

which can be restructured to:

select 
    get(xmlget(xmlget(xmlget(xml,'header',0), 'companyinfo', 0), 'companyid'),'$') as companyid
    ,get(xmlget(xmlget(xmlget(xml,'header',0), 'companyinfo', 0), 'companyname'), '$') as companyname
    ,get(d.value, '@') as depm
    ,get(xmlget(xmlget(d.value, 'response'), 'deptid'),'$') as deptid
    ,get(xmlget(xmlget(d.value, 'response'), 'totalemp'), '$') as totalemp
from data,
 LATERAL FLATTEN(GET(xmlget(xml,'deptinfo',0), '$')) d
 

to give the results:

COMPANYID   COMPANYNAME DEPM            DEPTID  TOTALEMP
100         "ABC Corp"  "electrical"    1       200
100         "ABC Corp"  "mechanical"    2       150
100         "ABC Corp"  "chemical"      3       100

how to actually do that in Talend, I have no answer, but I would assume you can look to see how your current pattern is run on Snowflake by looking at the query history, to solve how the parts map from Talend logic to snowflake SQL.