
I am generating Word docs from html. Basically, I build a file with html and save it as a .doc. Then I open it in Word and apply a template. All good so far.

I would like to generate a custom TOC automatically via the HTML, i.e. when I am building the document. I need to insert a field code to do that, in the same way I do to add page numbering via the HTML, e.g.:

 <span style="mso-field-code: PAGE " class="page-field"></span>

If I save my HTML doc as docx and apply a template, I can make a TOC based on the styles in the way one would normally create a TOC in Word. I customised the TOC so the Title style is the top level, followed by H1, H2, then H3. If I then toggle the field code on the TOC, the field code looks like this:

{ TOC \t "Heading 1,2,Heading 2,3,Heading 3,4,Title,1" }
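As an illustration of how that `\t` switch is put together, here is a minimal Python sketch that builds the same field code from an ordered list of (style name, TOC level) pairs. The function name `toc_field_code` is hypothetical, and the sketch assumes style names contain no commas or double quotes and that the list separator is a comma (Word may expect a different separator under some regional settings).

```python
def toc_field_code(style_levels):
    # Build a TOC field code whose \t switch maps style names to TOC levels.
    # style_levels: iterable of (style_name, level) pairs, e.g. ("Title", 1).
    parts = []
    for style, level in style_levels:
        # Word's TOC supports levels 1 through 9.
        if not 1 <= level <= 9:
            raise ValueError(f"TOC level out of range for {style!r}: {level}")
        parts.append(f"{style},{level}")
    # The switch argument is one quoted string: style,level,style,level,...
    return 'TOC \\t "' + ",".join(parts) + '"'


if __name__ == "__main__":
    print(toc_field_code([
        ("Heading 1", 2),
        ("Heading 2", 3),
        ("Heading 3", 4),
        ("Title", 1),
    ]))
```

Run as a script, this prints the field code text without the surrounding braces, which Word adds itself when it displays a field.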

Now, I can add HTML like this to insert the TOC:

<div style="mso-field-code: TOC " class="toc-field">TOC goes HERE</div>

When I do that, if I right click the text "TOC goes HERE" I get the option to "Update field" and if I do that a TOC is generated using the default H1,H2,H3 tags.

But, what I can't work out is how to include the

\t "Heading 1,2,Heading 2,3,Heading 3,4,Title,1"

part so my custom style sequence is applied. I have tried all sorts of combinations and it seems that adding anything after TOC causes Word to not make a field code.

Does anyone have any suggestions?


Update: Based on the essential help from @slightlysnarky below, I thought I would summarise the outcome here because the information I needed was in a Microsoft chm file that was taken down many years ago. If you read the following extract from that help manual and compare it to the solution below you will see how this all works.

Word marks and stores information for simple fields by means of the Span element with the mso-field-code style. The mso-field-code value represents the string value of the field code. Formatting in the original field code might be lost when saving as HTML if only the string value of the code is necessary for its calculation.

Word has a different way of storing field information to HTML for more complex fields, such as ones that have formatted text or long values. Word marks these fields with <!--[if supportFields]> so the data is not displayed in the browser. Word uses the Span element with the mso-element: field-begin, mso-element: field-separator, and mso-element: field-end attributes to contain the three respective parts of the field code: the field start, the separator between field code and field results, and the field end. Whenever possible, Word will save the field to HTML in the method that uses the least file space.

So, basically, add tags as shown below to your HTML at the point you wish the TOC to appear.

:-)

So why not simply add a TOC field after you open the document in Word? You can specify the reference Styles at that time. Saving such a document in HTML format should give you the correct HTML code, too, for generating future documents with the required HTML code. - macropod
Thanks. Yes, of course users can add their own TOC manually. Most would not have a clue how to generate the custom version where the Title tags are selected first, etc. So I am attempting to automate that for those people via the field code. And, yes, I thought I could get the correct HTML format for the field code by saving as HTML but alas, all I get is the actual TOC content. I could not get Word to give me the TOC field code in HTML. Ta. - Murrah
What kind of settings did you use when saving as HTML? Word can save as "round-trip" HTML which is a lot more verbose than "standard" HTML... Beyond that: Is there some reason macro code can't be added to the template to achieve functionality that the HTML file format doesn't support? - Cindy Meister
@CindyMeister: The use-case is a database full of document fragments in HTML format. Users can pick and choose fragments which will be exported as one or more "Word" docs in HTML format. The fragment titles get MsoTitle style and we want them to be the first level in the TOC when the HTML Word gets opened and saved as a DOCX. Is it possible to add macro code via HTML? And, I will check out the Word HTML save options - I did not realise there were any. Thanks. - Murrah
The simplest way to achieve what you describe in your last comment is to format the top level as "Heading 1" - that's automatic. If your requirements are something the (intentionally undocumented) HTML file format conversion doesn't support, I recommend looking at Word Open XML - more complicated for the developer (maybe) but much more versatile - and documented. - Cindy Meister

1 Answer


Word recognises a "complex field format" in HTML, along the same lines as it does in the Office Open XML format. So you can use

<span style='mso-element:field-begin'></span>TOC \t "Heading 1,2,Heading 2,3,Heading 3,4,Title,1" 
<span style='mso-element:field-separator'></span>This text will show but the user will need to update the field 
<span style='mso-element:field-end'></span>
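If the HTML is being generated by a program, the three-span construct above could be produced with a small helper. The following is a minimal Python sketch, not a tested recipe for every Word version: the function name `complex_field_html` and its parameters are hypothetical, and it assumes the field code and placeholder are plain text that only needs `&`, `<` and `>` escaped (double quotes are left as they are, as in the snippet above).

```python
from html import escape


def complex_field_html(field_code, placeholder):
    # Wrap a Word field code in the begin / separator / end spans.
    # field_code: the text Word shows between braces, e.g. 'TOC \t "Title,1"'
    # placeholder: text displayed until the user updates the field
    return (
        "<span style='mso-element:field-begin'></span>"
        # quote=False keeps the double quotes of the \t argument unescaped.
        + escape(field_code, quote=False) + " "
        + "<span style='mso-element:field-separator'></span>"
        + escape(placeholder, quote=False)
        + "<span style='mso-element:field-end'></span>"
    )


if __name__ == "__main__":
    toc_code = 'TOC \\t "Heading 1,2,Heading 2,3,Heading 3,4,Title,1"'
    print(complex_field_html(toc_code, "Right-click and choose Update field"))
```

The placeholder is what the reader sees when the document first opens; the real TOC only appears once the field is updated in Word.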

This construct is outlined in a Microsoft document called "Microsoft Office HTML and XML Reference". It's a Windows .exe that unpacks to a .chm Help file. You can get it here

The information on encoding fields is under Getting Started with Microsoft Office 2000 HTML and XML -> Microsoft Word -> Fields.

There may be a later version but that's the only one I could find.