0 votes

I have a set of XQuery functions that represent various operations that can be executed to transform a data value. Each function takes one or more values, plus some parameters needed for the transformation, and the plan is to execute a series of nested function calls to compute the final value. These pipelines will be configured and then persisted prior to execution, since the same pipeline of functions will be called repeatedly with different starting values. So the thought was to represent the call stack as a series of nested XML elements, i.e.

<mylib:escape>
  <value>
    <mylib:select>
      <config>
        <index>2</index>
      </config>
      <value>
        <mylib:tokenize>
          <config>
            <delimiter>,</delimiter>
          </config>
          <value>
            $starting-value
          </value>
        </mylib:tokenize>
      </value>
    </mylib:select>
  </value>
</mylib:escape>

And in the mylib module namespace, I would have functions:

declare function mylib:tokenize($value as xs:string, $delimiter as xs:string) as xs:string*
{ ... }

declare function mylib:select($value as xs:string*, $index as xs:int) as xs:string
{ ... }

declare function mylib:escape($value as xs:string) as xs:string
{ ... }
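To make the intended execution model concrete, here is a hypothetical sketch of a recursive evaluator that walks the nested XML and dispatches to the functions above. The function name mylib:eval, the typeswitch dispatch, and the convention that a leaf <value> is where the starting value gets injected are all my assumptions, not part of the original design:

```xquery
(: Hypothetical sketch: recursively evaluate a serialized pipeline step.
   Assumes each operation element holds an optional <config> plus a <value>
   that contains either another nested operation or the injection point
   for the starting value. :)
declare function mylib:eval(
  $step as element(),
  $starting-value as xs:string
) as item()*
{
  let $inner := $step/value/*
  let $value :=
    if ($inner) then mylib:eval($inner, $starting-value)
    else $starting-value
  return
    typeswitch ($step)
      case element(mylib:tokenize)
        return mylib:tokenize($value, $step/config/delimiter/string())
      case element(mylib:select)
        return mylib:select($value, xs:int($step/config/index))
      case element(mylib:escape)
        return mylib:escape($value)
      default
        return fn:error(xs:QName("mylib:UNKNOWN-OP"), local-name($step))
};
```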

  1. Is this a bad idea, and should I take a different approach?
  2. Is there an existing library that might already provide this functionality?

This post is tagged with MarkLogic because I am going to be executing this from MarkLogic.

Thanks.

1
I'm wondering why you limit yourself to mylib functions. tokenize, select, and escape should be covered sufficiently by existing functions. You could then rely on the docs for those if end users need guidance. - grtjn
Have you considered CPF? - Dave Cassel
@grtjn these were only examples. There would be a lot more functions, but the point was to have user-configurable processing pipelines via some kind of graphical interface. - TJ Tang
@DaveCassel CPF will be part of the processing flow, but I am looking at what I can do to configure and serialize the pipeline, such that they can then be executed on new content coming in, via CPF, or data hub flows. - TJ Tang

1 Answer

0 votes

This is primarily opinion-based (so don't be surprised if mods close your question), but it sounds like you have a set of transformation components and a set of documents describing particular pipeline configurations. To me, this seems like a reasonable separation of concerns. I am not aware of an existing library that provides exactly this, but it does resemble XProc.

The only note I have is that unless you have a specific need to store the pipelines as documents, you could simply write XQuery functions to represent the pipelines instead and avoid the overhead of building a component that translates XML into XQuery function calls. If you need the functions to be more composable, take a look at higher-order (i.e., first-class) functions.
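For illustration, here is a sketch of the question's pipeline expressed directly with XQuery 3.x higher-order functions; the $pipeline and $run names are mine, and folding a sequence of anonymous functions with fn:fold-left is just one composition idiom among several:

```xquery
(: Sketch: the example pipeline as a sequence of single-argument
   anonymous functions, folded into one function that can be
   applied to any starting value. :)
let $pipeline := (
  function($v) { mylib:tokenize($v, ",") },
  function($v) { mylib:select($v, 2) },
  function($v) { mylib:escape($v) }
)
let $run := function($start) {
  fn:fold-left($pipeline, $start, function($acc, $f) { $f($acc) })
}
return $run($starting-value)
```

Written this way, each configured pipeline is an ordinary function value, so there is nothing to parse at execution time; the trade-off is that the configuration lives in code rather than in a queryable document.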