When exporting to docx, pandoc sets the image label as its caption. Is there a way to set an alt-text different from the caption?

Context

We generate Markdown documents that we want to convert to PDF using Pandoc.

I installed BasicTeX and ran some tests with the following Markdown:

# This is the title

This is a paragraph.

![This is an image](image.jpg)

This is another paragraph.

You can download the file here.

Converting this code using Pandoc results in this PDF file.

As I work for an accessibility consultancy, I first checked the PDF file using PDF Accessibility Checker 2 (PAC2), and the results are devastating:

PAC2 results

More results

So I tried a workaround: exporting to Microsoft Word's docx format (which is highly accessible when the correct document styles are used).

From there I exported to PDF using AccessPDF; here is the resulting PDF file.

When checked using PAC2, this is the result:

Results

Much better, but the image is missing its alternative text! It seems that the alternative text is used as the caption of the figure (the same way it is done in HTML). While the HTML export sets the alt attribute properly, the docx export leaves the alternative text out.

How can we fix this? In fact, when there is a caption, the image itself should not repeat the same text as its alt text, which means that the HTML export is not ideal either. So: how much control do we have in Markdown to specify the caption and the alt text separately?

1 Answer

From the pandoc readme:

An image occurring by itself in a paragraph will be rendered as a figure with a caption. (In LaTeX, a figure environment will be used; in HTML, the image will be placed in a div with class figure, together with a caption in a p with class caption.) The image’s alt text will be used as the caption.

![This is the caption](/url/of/image.png)

Update: I see you've created an issue about this. Indeed, for some reason, the docx writer currently uses the title text instead of the alt text for the image description. I'm not sure whether this is intended behaviour. In the meantime, you can set the title text, as in:

![alt text](foo.jpg "title text")
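To check what actually ended up in the generated docx without opening Word, something like the following Python sketch may help. It assumes that the image description is stored in the `descr` attribute of each `wp:docPr` element in `word/document.xml`, which is where Word keeps alt text; which Markdown field pandoc writes there may depend on the pandoc version. The function name `image_descriptions` and the file name `mydoc.docx` are made up for this illustration.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML drawing elements. Each <wp:docPr>
# carries the picture's name and, if present, its description (alt text).
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def image_descriptions(docx_file):
    """Return a list of (name, descr) pairs, one per drawing in the body.

    docx_file may be a path or a file-like object. An empty descr
    means the image has no alternative text.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        (doc_pr.get("name", ""), doc_pr.get("descr", ""))
        for doc_pr in root.iter(f"{{{WP_NS}}}docPr")
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_alt.py mydoc.docx
    for name, descr in image_descriptions(sys.argv[1]):
        print(f"{name}: {descr if descr else '(no alt text)'}")
```

Running it once on a docx built from `![alt text](foo.jpg)` and once on one built from `![alt text](foo.jpg "title text")` should show whether the title text workaround has the intended effect with your pandoc version.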

Concerning accessibility: by default, pandoc uses LaTeX to generate PDFs (you can choose between a few engines), and unfortunately LaTeX isn't known for producing very accessible PDFs. Maybe ConTeXt is better?

$ pandoc -s -t context -o mydoc.tex input.md
$ context mydoc.tex