Looking at the raw XML from a .fods file:
<table:table-column table:style-name="co1" table:default-cell-style-name="ce17"/>
<table:table-row table:style-name="ro1">
<table:table-cell table:style-name="ce15" office:value-type="string" calcext:value-type="string">
<text:p>John Smith</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell table:style-name="ce16" office:value-type="string" calcext:value-type="string">
<text:p>(123) 456-7890</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>123 Main Street</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>Anywhere, ZZ 12345-6789</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell table:style-name="ce15" office:value-type="string" calcext:value-type="string">
<text:p>Jane Doe</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell table:style-name="ce16" office:value-type="string" calcext:value-type="string">
<text:p>(234) 567-8901</text:p>
When opened in LibreOffice, the names are in bold. Where would that be reflected in the above XML? I'm only seeing value-type="string", with no markup for bold, underline, etc.
Everything is in a single column, so I'm not quite sure what the default-cell-style-name="ce17" attribute indicates.
The data originated as a .doc file, but I'm working with it in LibreOffice.
I'm looking to extract the names from the XML. The only thing that really distinguishes them from the phone numbers and addresses is that they're in bold. I suppose they also contain no digits, but I'd like to select the bold data from the spreadsheet.
The formatting information seems somewhat vague:
Formatting
The style and formatting controls are numerous, providing a number of controls over the display of information.
Page layout is controlled by a variety of attributes. These include page size, number format, paper tray, print orientation, margins, border (and its line width), padding, shadow, background, columns, print page order, first page number, scale, table centering, maximum footnote height and separator, and many layout grid properties.
Headers and footers can have defined fixed and minimum heights, margins, border line width, padding, background, shadow, and dynamic spacing.
There are many attributes for specific text, paragraphs, ruby text, sections, tables, columns, lists, and fills. Specific characters can have their fonts, sizes, generic font family names (roman – serif, swiss – sans-serif, modern – monospace, decorative, script or system), and other properties set. Paragraphs can have their vertical space controlled through attributes on keep together, widow, and orphan, and have other attributes such as "drop caps" to provide special formatting. The list is extremely extensive; see the references (in particular the actual standard) for details.
table:style-name="ce15" refers to a style defined elsewhere that causes this table cell to be rendered as bold. – Tomalak
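Following up on that comment, here is a minimal Python sketch of how the lookup might work. It assumes the .fods file defines its cell styles as style:style elements (LibreOffice normally writes these under office:automatic-styles) and marks bold with fo:font-weight="bold" on a style:text-properties child. The function names are made up for illustration. The sketch only checks a cell's own table:style-name; it does not follow style:parent-style-name inheritance or fall back to a column's table:default-cell-style-name, so treat it as a starting point rather than a complete solution.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def _qname(prefix, local):
    """Build the '{uri}local' form ElementTree uses for namespaced names."""
    return "{%s}%s" % (NS[prefix], local)


def bold_style_names(root):
    """Collect names of styles whose text properties set fo:font-weight='bold'."""
    names = set()
    for style in root.iter(_qname("style", "style")):
        props = style.find("style:text-properties", NS)
        name = style.get(_qname("style", "name"))
        if (
            name is not None
            and props is not None
            and props.get(_qname("fo", "font-weight")) == "bold"
        ):
            names.add(name)
    return names


def bold_cell_texts(root):
    """Return the text of every cell whose own table:style-name is a bold style."""
    bold = bold_style_names(root)
    texts = []
    for cell in root.iter(_qname("table", "table-cell")):
        if cell.get(_qname("table", "style-name")) in bold:
            paragraphs = cell.findall("text:p", NS)
            texts.append("\n".join("".join(p.itertext()) for p in paragraphs))
    return texts


def bold_cell_texts_from_file(path):
    """Convenience wrapper (hypothetical helper): parse a .fods file from disk."""
    return bold_cell_texts(ET.parse(path).getroot())
```

If the assumptions hold for this file, calling bold_cell_texts_from_file on its path should return the names (the ce15 cells) and skip the phone numbers (ce16) and the unstyled address cells.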