In my GitHub-flavoured Markdown file webkalk.md, I have a line:

<span custom-style="OS">something</span>

In reference.docx for pandoc I declared a style "OS".

When I generate my .docx with the command:

pandoc -s webkalk.md > webkalk.docx -f markdown -t docx --reference-doc="reference.docx"

the word "something" is styled the way I intended (style "OS"), but when I try the command:

pandoc -s webkalk.md > webkalk.docx -f gfm -t docx --reference-doc="reference.docx"

it is styled just like plain text.

Is it possible to use custom styles for docx in GitHub-Flavoured Markdown?

1 Answer

The gfm format does not support the native_spans extension. Pandoc's default markdown format supports most of the extensions Pandoc provides, and it enables native_spans by default.

However, as the documentation explains:

Note, however, that commonmark and gfm have limited support for extensions. Only those listed below (and smart, raw_tex, and hard_line_breaks) will work. The extensions can, however, all be individually disabled. Also, raw_tex only affects gfm output, not input.

gfm (GitHub-Flavored Markdown)

pipe_tables, raw_html, fenced_code_blocks, auto_identifiers, gfm_auto_identifiers, 
backtick_code_blocks, autolink_bare_uris, space_in_atx_header, 
intraword_underscores, strikeout, task_lists, emoji, shortcut_reference_links, 
angle_brackets_escapable, lists_without_preceding_blankline.

By way of explanation, the native_spans and native_divs extensions parse the raw HTML and convert it into Pandoc's native internal format. That allows the content and any associated attributes to be passed to the output format, if the output format includes support. However, without the extension, any output format which does not support HTML directly will only get the plain text content of the raw HTML, which is the behavior you are seeing.
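
To make the difference concrete, here is a toy sketch in Python. It is not how Pandoc is implemented; the SpanReader class, the read_inlines helper and the native_spans flag are names I made up for illustration. It only shows the idea: with the extension, the span's attributes travel along with the text; without it, only the text survives.

```python
from html.parser import HTMLParser


class SpanReader(HTMLParser):
    """Toy reader that turns inline HTML spans into (text, attributes) pairs.

    Hypothetical illustration only; Pandoc's real readers work differently.
    """

    def __init__(self, native_spans):
        super().__init__()
        self.native_spans = native_spans  # made-up flag mimicking the extension
        self.inlines = []                 # collected (text, attrs) pairs
        self._stack = []                  # attributes of currently open spans

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            # Keep the attributes only when the "extension" is enabled.
            self._stack.append(dict(attrs) if self.native_spans else {})

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attach the attributes of the innermost open span, if any.
        attrs = self._stack[-1] if self._stack else {}
        self.inlines.append((data, attrs))


def read_inlines(source, native_spans):
    reader = SpanReader(native_spans)
    reader.feed(source)
    reader.close()
    return reader.inlines


line = '<span custom-style="OS">something</span>'
with_ext = read_inlines(line, native_spans=True)
without_ext = read_inlines(line, native_spans=False)
print(with_ext)     # [('something', {'custom-style': 'OS'})]
print(without_ext)  # [('something', {})]
```

In the second case a docx writer has nothing left to map to the "OS" style, which matches what you see with -f gfm.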

commonmark and gfm are each defined by a strict specification, and it appears that Pandoc does not allow much divergence from those specs. As a result, the native_spans and native_divs extensions are not supported when using the gfm format.

The documentation warns about this:

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. ... While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

The important thing to remember here is that "pandoc's Markdown" (the markdown format) is the only input format whose conversions aspire to be perfect rather than "lossy." The gfm format is not "pandoc's Markdown" and therefore does not carry that expectation.

That said, it might seem like the native_spans extension should be supported by gfm, even if it is not enabled by default. However, the CommonMark spec (which GFM extends) completely reworked how raw HTML is parsed. Presumably, Pandoc needed to reimplement its raw HTML parsing for the commonmark and gfm formats, so any extensions which operate on raw HTML, including native_spans, would need to be rewritten to work with those formats. Until that happens, those extensions are not available when using commonmark or gfm. Whether Pandoc plans to add support in the future is not something I know, and it would be out of scope for this discussion.