I am trying to convert a binary document (a DOCX file) using the xdmp:word-convert() function, but it throws the following error:

The file you are trying to convert is not in the right format. DHF-INVFILE: xdmp:word-convert(fn:doc("/content/aplc/binary/13599668870066633077.docx"), "13599668870066633077.docx", <options xmlns:tidy="xdmp:tidy" xmlns="xdmp:word-convert"><tidy>true</tidy>...</options>) -- The file you are trying to convert is not in the right format. input=/var/opt/MarkLogic/Temp/0b71d7278e82c553/toconv.doc

My code is as follows:

xdmp:word-convert(
  $xml-input,
  fn:concat(xdmp:hash64("Sample.docx"), ".docx"),
  <options xmlns="xdmp:word-convert" xmlns:tidy="xdmp:tidy">
    <tidy>true</tidy>
    <tidy:clean>yes</tidy:clean>
    <tidy:drop-empty-paras>yes</tidy:drop-empty-paras>
    <tidy:drop-font-tags>yes</tidy:drop-font-tags>
    <tidy:hide-comments>yes</tidy:hide-comments>
    <tidy:output-html>no</tidy:output-html>
    <tidy:output-xhtml>no</tidy:output-xhtml>
    <tidy:output-xml>yes</tidy:output-xml>
    <compact>true</compact>
  </options>)

The same code works perfectly fine with .doc files.

If xdmp:word-convert() does not work with DOCX files, which other API functions, apart from xdmp:document-filter, can do similar work?

1 Answer

The documentation for xdmp:word-convert says:

Does not convert Microsoft Office 2007 and later documents.

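DOCX is the Office 2007+ (Office Open XML) format, which is why the conversion fails for .docx while it still works for .doc. For the more recent Office documents, you could look into using CPF with the Office OpenXML Extract pipelines, as also mentioned here: https://stackoverflow.com/a/11248525/918496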
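
If you only need the raw WordprocessingML and do not want to set up CPF, one possible approach is to treat the DOCX as the ZIP package it is and pull out the main part with xdmp:zip-get. This is only a rough sketch and not something the quoted docs prescribe. It assumes the body is stored at the conventional path word/document.xml, and it reuses the URI from the error message in the question:

```xquery
xquery version "1.0-ml";

(: URI copied from the error message in the question :)
let $uri := "/content/aplc/binary/13599668870066633077.docx"

(: Assumption: the main document part sits at the conventional
   location word/document.xml inside the DOCX (ZIP) package :)
let $part := "word/document.xml"

return
  xdmp:zip-get(
    fn:doc($uri),
    $part,
    (: ask for the part back as an XML node rather than text or binary :)
    <options xmlns="xdmp:zip-get">
      <format>xml</format>
    </options>)
```

The result is WordprocessingML (w:document, w:p, w:t and so on), not the tidied XHTML that xdmp:word-convert produces for .doc files, so any downstream code would need to work with that vocabulary. To see which parts a given file actually contains, xdmp:zip-manifest(fn:doc($uri)) lists them.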

HTH!