
Looking at the raw XML from a .fods file:

  <table:table-column table:style-name="co1" table:default-cell-style-name="ce17"/>
  <table:table-row table:style-name="ro1">
    <table:table-cell table:style-name="ce15" office:value-type="string" calcext:value-type="string">
      <text:p>John Smith</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro2">
    <table:table-cell table:style-name="ce16" office:value-type="string" calcext:value-type="string">
      <text:p>(123) 456-7890</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro2">
    <table:table-cell office:value-type="string" calcext:value-type="string">
      <text:p>123 Main Street</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro2">
    <table:table-cell office:value-type="string" calcext:value-type="string">
      <text:p>Anywhere, ZZ 12345-6789</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro1">
    <table:table-cell table:style-name="ce15" office:value-type="string" calcext:value-type="string">
      <text:p>Jane Doe</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro2">
    <table:table-cell table:style-name="ce16" office:value-type="string" calcext:value-type="string">
      <text:p>(234) 567-8901</text:p>

When opened in LibreOffice, the names appear in bold. Where is that reflected in the XML above? I only see value-type="string", with no markup for bold, underline, etc.

Everything is in a single column, so I'm not quite sure what the default-cell-style-name="ce17" attribute indicates.

While the data originated as a .doc file, I'm using LibreOffice to work with it.

I'm looking to extract the names from the XML. They are really only distinguished from the phone numbers and addresses by being in bold. I suppose they also contain no digits, but I'd like to select the bold data from the spreadsheet.

The formatting information seems somewhat vague:

Formatting

The style and formatting controls are numerous, providing a number of controls over the display of information.

Page layout is controlled by a variety of attributes. These include page size, number format, paper tray, print orientation, margins, border (and its line width), padding, shadow, background, columns, print page order, first page number, scale, table centering, maximum footnote height and separator, and many layout grid properties.

Headers and footers can have defined fixed and minimum heights, margins, border line width, padding, background, shadow, and dynamic spacing.

There are many attributes for specific text, paragraphs, ruby text, sections, tables, columns, lists, and fills. Specific characters can have their fonts, sizes, generic font family names (roman – serif, swiss – sans-serif, modern – monospace, decorative, script or system), and other properties set. Paragraphs can have their vertical space controlled through attributes on keep together, widow, and orphan, and have other attributes such as "drop caps" to provide special formatting. The list is extremely extensive; see the references (in particular the actual standard) for details.

My conclusion would be that table:style-name="ce15" refers to a style defined elsewhere that causes this table cell to be rendered as bold. - Tomalak
That's quite useful, @Tomalak, and is the inference I drew as well. I'm looking at a few other things too, but haven't yet located documentation on that. - Thufir
I'd take a more hands-on approach: search the file for the string "ce15" and see what comes up. - Tomalak

1 Answer


Values and formats are stored in different sections of the XML file.

Usually there is a styles section where each format is defined under a name (style:name).

The table section defines the table, the values placed in it, and the style each element uses (identified by its table:style-name attribute). A style can be set for a single cell, an entire row, an entire column, or even the whole table.

So in your case, you can identify the bold text by looking at the style name each cell uses. That's not always easy, because a column or row can declare a default cell style (default-cell-style-name="ce17"), which applies to any cell that does not name a style of its own.
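As a rough illustration of that lookup, here is a minimal Python sketch using only the standard library. It assumes the bold styles are declared in the same flat .fods document as style:style elements containing a style:text-properties child with fo:font-weight="bold", which is how such styles are typically written but is not confirmed by the excerpt in the question. The function names are hypothetical, and the sketch deliberately ignores column/row default cell styles, parent-style inheritance, and numeric font weights such as "700".

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def _qname(prefix, local):
    """Build the '{namespace}local' form that ElementTree uses for names."""
    return "{%s}%s" % (NS[prefix], local)


def bold_style_names(root):
    """Collect the names of styles whose text properties say fo:font-weight="bold".

    Assumption: only the literal value "bold" is checked, and inherited
    (parent) styles are not followed.
    """
    bold = set()
    for style in root.iter(_qname("style", "style")):
        props = style.find("style:text-properties", NS)
        name = style.get(_qname("style", "name"))
        if (
            name is not None
            and props is not None
            and props.get(_qname("fo", "font-weight")) == "bold"
        ):
            bold.add(name)
    return bold


def bold_cell_texts(root):
    """Return the text of every cell that explicitly names a bold style.

    Assumption: cells without their own table:style-name are skipped, so
    column or row default cell styles are not taken into account.
    """
    bold = bold_style_names(root)
    results = []
    for cell in root.iter(_qname("table", "table-cell")):
        if cell.get(_qname("table", "style-name")) in bold:
            paragraphs = cell.findall("text:p", NS)
            results.append("\n".join("".join(p.itertext()) for p in paragraphs))
    return results
```

To try it on a real file, something like bold_cell_texts(ET.parse("contacts.fods").getroot()) should work, where contacts.fods is a placeholder file name. If ce15 turns out not to be declared as bold in your file, searching the document for "ce15" as suggested in the comments will show what the style actually sets.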

I developed a library for parsing ODS files in Java, so in case you need inspiration you can check it out on GitHub: https://github.com/miachm/SODS