25
votes

I am trying to get special characters (for foreign surnames) working in pandoc. I followed the instructions here and made sure all special characters are represented using UTF encoding (as per this page). I chose the HTML Entity (decimal) option. The resulting files work well when converting to docx or pdf, but not to html. Is there an encoding that will work for all three output types, or do I need to include some other option?

Here is a line of markdown code for conversion using the special character encoding

some example text with special characters &#197;, &#228;, &#246;

which should print as

some example text with special characters Å, ä, ö

pandoc commands

pandoc example.md -o example.docx  # Works

pandoc example.md -o example.pdf   # Works

pandoc example.md -o example.html  # Doesn't work

Running the file through iconv does not change the output behaviour

iconv -t utf-8 example.md | pandoc -o example.html  # Doesn't work

4 Answers

41
votes

Try

pandoc example.md -s -o example.html

instead. The additional -s (for "stand-alone") makes pandoc insert the necessary metadata to create a full HTML file instead of just the HTML snippet that directly corresponds to the text in example.md. As part of the metadata, pandoc also declares that the file is UTF-8 encoded. Your browser needs this piece of information to display the special characters correctly.

If you cannot use the -s flag for some reason, make sure to tell the browser about the UTF-8 encoding some other way.
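One possible "other way" is to wrap the fragment yourself in a minimal page that declares the encoding. The following is only a sketch: wrap_utf8 is a hypothetical helper name (not a pandoc feature), and the page title is a placeholder.

```shell
# Wrap an HTML fragment read from stdin in a minimal page that declares UTF-8.
# wrap_utf8 is a hypothetical helper, not part of pandoc.
wrap_utf8() {
  printf '%s\n' '<!DOCTYPE html>' '<html>' '<head>' \
    '<meta charset="utf-8">' '<title>example</title>' '</head>' '<body>'
  cat   # pass the fragment through unchanged
  printf '%s\n' '</body>' '</html>'
}

# Usage, assuming pandoc is installed and example.md exists:
#   pandoc example.md | wrap_utf8 > example.html
```

The fragment itself is not modified; only the surrounding markup with the charset declaration is added.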

1
votes

If you are using summary.md and cannot use -s for standalone output, add the following inside the <head> tag of _layouts/default.html:

 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
1
votes

You could also use the --ascii option to produce pure ASCII output, with special characters encoded as entities.
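A minimal sketch of that approach, assuming pandoc is installed; to_ascii_html is a hypothetical wrapper name used only for illustration:

```shell
# Convert markdown to an HTML fragment containing only ASCII bytes.
# to_ascii_html is a hypothetical wrapper; --ascii is the pandoc option.
to_ascii_html() {
  pandoc --ascii -f markdown -t html "$@"
}

# Usage, assuming example.md exists:
#   to_ascii_html example.md -o example.html
```

Because the output then contains no bytes outside ASCII, the browser should render it the same way whichever charset it assumes.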

0
votes

In index.html, change data-charset="iso-8859-15" to data-charset="utf-8". For example:

    <section
        data-markdown="slides/demo.md"
        data-separator="\n---\n"
        data-separator-vertical="^\n\n"
        data-separator-notes="\n> >"
        data-charset="utf-8">
    </section>