After a few hours of research into XSLT, I am admitting defeat! I need to fix a large number of .xlf (XLIFF) translation files that came back to us mangled from a translation tool that I won't name. Ideally, I would apply an XSL transform to them using a batch tool.

Below is a snippet of one of the XLIFF files:

<body>
    <trans-unit id="1" phase-name="pretrans" restype="x-h3">
        <source>Adding, Deleting or Modifying Notes in the Call Description</source>
        <seg-source>Adding, Deleting or Modifying Notes in the Call Description</seg-source>
        <target state="final">Добавление, удаление и изменение примечаний в описании звонка</target>
    </trans-unit>
    <trans-unit id="2" phase-name="pretrans" restype="x-p">
        <source>Description of Fields on RHS</source>
        <seg-source>Description of Fields on RHS</seg-source>
        <target state="final">Поле описания в правой части</target>
    </trans-unit>
    <trans-unit id="3" phase-name="pretrans" restype="x-p">
        <source>You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so. These notes are visible to all users who have access to the call recording. It is recommended that each user add their initials to the notes to avoid potential confusion.</source>
        <seg-source>
            <mrk mtype="seg" mid="1">You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so.</mrk>
            <mrk mtype="seg" mid="2">These notes are visible to all users who have access to the call recording.</mrk>
            <mrk mtype="seg" mid="3">It is recommended that each user add their initials to the notes to avoid potential confusion.</mrk>
        </seg-source>
        <target state="final">
          <mrk mtype="seg" mid="1" /><ph ctype="" id="1">&lt;MadCap:variable name="zoom_userdocs_variables.var_product_name" xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" /&gt;</ph> позволяет находить телефонные взаимодействия, содержащие или не содержащие определенные фразы.
          <mrk mtype="seg" mid="2" />Каждая речевая метка содержит одну или несколько таких фраз.
          <mrk mtype="seg" mid="3" />Ядро <ph ctype="" id="3">&lt;MadCap:variable name="zoom_userdocs_variables.var_product_name" xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" /&gt;</ph> индексирует медиафайлы и помечает места вхождения фразы (добавляет к ним метки).
          <mrk mtype="seg" mid="4" />Затем нужные медиафайлы можно искать по связанным с ними меткам.
        </target>
    </trans-unit>
    <trans-unit id="4" phase-name="pretrans" restype="x-p">
        <source>To add, delete, or modify text in the description field, click inside the description field.</source>
        <seg-source>To add, delete, or modify text in the description field, click inside the description field.</seg-source>
        <target state="final">Чтобы добавить, удалить или изменить текст в поле описания, щелкните это поле.</target>
    </trans-unit>
</body>

Notice the target element in the third trans-unit. Its mrk elements are empty, and the text they should contain has become their following siblings instead (compare the seg-source element just above it, which is still correct). This breaks the structure.

So I am trying to identify any mrk elements that contain no text, and move the content that follows each one (the text, plus any child elements such as ph) back into it.

Here is the desired result:

<body>
    <trans-unit id="1" phase-name="pretrans" restype="x-h3">
        <source>Adding, Deleting or Modifying Notes in the Call Description</source>
        <seg-source>Adding, Deleting or Modifying Notes in the Call Description</seg-source>
        <target state="final">Добавление, удаление и изменение примечаний в описании звонка</target>
    </trans-unit>
    <trans-unit id="2" phase-name="pretrans" restype="x-p">
        <source>Description of Fields on RHS</source>
        <seg-source>Description of Fields on RHS</seg-source>
        <target state="final">Поле описания в правой части</target>
    </trans-unit>
    <trans-unit id="3" phase-name="pretrans" restype="x-p">
        <source>You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so. These notes are visible to all users who have access to the call recording. It is recommended that each user add their initials to the notes to avoid potential confusion.</source>
        <seg-source>
            <mrk mtype="seg" mid="1">You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so.</mrk>
            <mrk mtype="seg" mid="2">These notes are visible to all users who have access to the call recording.</mrk>
            <mrk mtype="seg" mid="3">It is recommended that each user add their initials to the notes to avoid potential confusion.</mrk>
        </seg-source>
        <target state="final">
            <mrk mtype="seg" mid="1"><ph ctype="" id="1">&lt;MadCap:variable name="zoom_userdocs_variables.var_product_name" xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" /&gt;</ph> позволяет находить телефонные взаимодействия, содержащие или не содержащие определенные фразы.</mrk>
            <mrk mtype="seg" mid="2">Каждая речевая метка содержит одну или несколько таких фраз.</mrk>
            <mrk mtype="seg" mid="3">Ядро <ph ctype="" id="3">&lt;MadCap:variable name="zoom_userdocs_variables.var_product_name" xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" /&gt;</ph> индексирует медиафайлы и помечает места вхождения фразы (добавляет к ним метки).</mrk>
            <mrk mtype="seg" mid="4">Затем нужные медиафайлы можно искать по связанным с ними меткам.</mrk>
        </target>
    </trans-unit>
    <trans-unit id="4" phase-name="pretrans" restype="x-p">
        <source>To add, delete, or modify text in the description field, click inside the description field.</source>
        <seg-source>To add, delete, or modify text in the description field, click inside the description field.</seg-source>
        <target state="final">Чтобы добавить, удалить или изменить текст в поле описания, щелкните это поле.</target>
    </trans-unit>
</body>

I would normally do this in Perl with LibXML or similar, but I'm sure that this is a simple task for XSLT. I've searched for a similar solution, but couldn't find anything that I could make work.

One other point to note: although pretty-printed here, the body element in the actual files is all on one line.

Thank you! I look forward to learning something new!

EDIT: Updated source above to show further child tags within <target> elements, which must be retained.

EDIT 2: Added desired result.

1 Answer

Try this XSLT:

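<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy every node and attribute unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- For each mrk in a target that has a following text sibling, copy the mrk
       and append text to it. In XSLT 1.0, value-of outputs only the first node
       of the selected set, i.e. the text node that follows this mrk. -->
  <xsl:template match="trans-unit/target/mrk[following-sibling::text()]">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <xsl:value-of select="following-sibling::text()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop every text node that is a direct child of target.
       This also applies to targets that have no mrk children. -->
  <xsl:template match="trans-unit/target/text()"/>

</xsl:stylesheet>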

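It produces the output below. Note that it also empties every target that has no mrk children, because the last template removes all text nodes directly under target: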

<body>
    <trans-unit id="1" phase-name="pretrans" restype="x-h3">
        <source>Adding, Deleting or Modifying Notes in the Call Description</source>
        <seg-source>Adding, Deleting or Modifying Notes in the Call Description</seg-source>
        <target state="final" />
    </trans-unit>
    <trans-unit id="2" phase-name="pretrans" restype="x-p">
        <source>Description of Fields on RHS</source>
        <seg-source>Description of Fields on RHS</seg-source>
        <target state="final" />
    </trans-unit>
    <trans-unit id="3" phase-name="pretrans" restype="x-p">
        <source>You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so. These notes are visible to all users who have access to the call recording. It is recommended that each user add their initials to the notes to avoid potential confusion.</source>
        <seg-source>
            <mrk mtype="seg" mid="1">You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so.</mrk>
            <mrk mtype="seg" mid="2">These notes are visible to all users who have access to the call recording.</mrk>
            <mrk mtype="seg" mid="3">It is recommended that each user add their initials to the notes to avoid potential confusion.</mrk>
        </seg-source>
        <target state="final"><mrk mtype="seg" mid="1">При наличии соответствующих прав можно добавить описательные текстовые примечания к записи звонка.
            </mrk><mrk mtype="seg" mid="2">Эти примечания видны для всех пользователей, которые имеют доступ к записи звонка.
            </mrk><mrk mtype="seg" mid="3">Во избежание возможной путаницы каждому пользователю рекомендуется к примечаниям добавлять свои инициалы.
        </mrk></target>
    </trans-unit>
    <trans-unit id="4" phase-name="pretrans" restype="x-p">
        <source>To add, delete, or modify text in the description field, click inside the description field.</source>
        <seg-source>To add, delete, or modify text in the description field, click inside the description field.</seg-source>
        <target state="final" />
    </trans-unit>
</body>
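
The stylesheet above has two gaps with respect to the edited question: the targets of trans-units 1, 2 and 4 come out empty, and value-of copies a single text node, so ph elements that follow an empty mrk would not be carried along. The following is a minimal sketch of one way to address both in XSLT 1.0, not a tested drop-in fix. It assumes the elements are in no namespace, as in the snippet; if the real files declare the XLIFF namespace on the root element, the patterns would need a prefix bound to that namespace. The key name by-mrk is made up for this example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- No indent="yes": extra whitespace inside mixed content is not wanted. -->
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Index each non-mrk child of a target by the id of the nearest
       preceding mrk sibling (empty string if there is none). -->
  <xsl:key name="by-mrk"
           match="target/node()[not(self::mrk)]"
           use="generate-id(preceding-sibling::mrk[1])"/>

  <!-- Identity template: copy everything unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- In a target, keep the mrk elements and any child whose nearest
       preceding mrk is not empty (or that has no preceding mrk at all).
       Children that follow an empty mrk are skipped here and re-emitted
       inside that mrk by the next template. A target without mrk children
       is therefore copied as-is. -->
  <xsl:template match="target">
    <xsl:copy>
      <xsl:apply-templates
          select="@* | node()[self::mrk or
                              not(preceding-sibling::mrk[1][not(node())])]"/>
    </xsl:copy>
  </xsl:template>

  <!-- An empty mrk: copy it, then pull in every sibling (text, ph, ...)
       that sits between it and the next mrk. -->
  <xsl:template match="target/mrk[not(node())]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="key('by-mrk', generate-id())"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Any whitespace between one segment and the next mrk moves inside the preceding mrk along with the text; since the real files are said to be on one line, this may not matter. For batch use, a command-line processor such as xsltproc can be run once per file from a shell loop, for example `xsltproc fix-mrk.xsl in.xlf > out.xlf`, where the file names are placeholders.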