1
votes

I have the following segment of Markdown with embedded LaTeX equations:

# Fisher's linear discriminant

\newcommand{\cov}{\mathrm{cov}}
\newcommand{\A}{\mathrm{A}}
\renewcommand{\B}{\mathrm{B}}
\renewcommand{\T}{^\top}

The first method to find an optimal linear discriminant was proposed by Fisher
(1936), using the ratio of the between-class variance to the within-class variance
of the projected data, $d(\vec x)$, as a criterion. Expressed in terms of the
sample properties, the $p$-dimensional centroids $\bar {\vec x}_\A$ and
$\bar {\vec x}_\B$ and the $p \times p$ covariance matrices
$S_A = \cov_i ( \vec x_{\A i} )$ and $S_B = \cov_i ( \vec x_{\B i} )$, the
optimal direction is given by 
$$
\vec w = \left ( \frac{ S_A + S_B }{2} \right ) ^{-1}
~ ( \bar {\vec x}_\B - \bar {\vec x}_\A ).
$$

When I convert it with pandoc to LaTeX and compile it with xelatex, I get the expected text with nicely rendered math. When I convert it with pandoc to MS Word using

pandoc test.text -o test.docx

and open it in MS Office Word 2007, I get the following:

[screenshot of the document as rendered in Word]

Only those parts of the equations that are symbols or upright text get rendered correctly, while variable names in italics are replaced by a question mark in a box.

How can I make this work?

Your input works for me with pandoc 1.12.2 on Mac OS X. Can you post a link to the Word file you get? Here's mine: fileswap.com/dl/wajeArZq4c – mb21
@mb21 Thanks for replying! Your docx looks identical to mine if I open it in Word. So maybe it's a problem with my copy/installation of Word, and not with the file. Btw. I found a workaround: I can switch equation display in Word to "linear" and then back to "professional", and all the symbols appear. Here's mine: dl.dropboxusercontent.com/u/14431931/test.docx – A. Donda
Oh well, this is what your doc looks like on my copy of Word on Mac: share.pho.to/4J6al I guess using the newest version of pandoc might help... – mb21
@mb21 Ah, no, that was just a mistake on my part: I omitted the last "$$". I've updated the file, please try again. – A. Donda
Ah, it looks just like mine now. Those question marks usually appear when the chosen font doesn't have that character. Do you have the font Cambria Math installed? – mb21

3 Answers

1
votes

In Word 2007, I see a result similar to yours, except that here, I don't see the "question marks in boxes" characters, just space.

If I then take one of the expressions, and use your trick of going to linear display and back, the characters reappear for that expression.

If I save and re-open, the other expressions still do not display correctly, but if I save and look at the XML, I notice that

  1. the Math font has been changed to Cambria Math
  2. additional run properties XML (w:rPr) specifying the Cambria Math font has been inserted in many of the runs (m:r) inside the oMath elements, even in the oMath expressions that do not display correctly. However, in the oMath expression that now displays correctly, this extra XML has been applied to every run. In the others, it has only been applied to some runs (I think I can see the pattern but I'm running out of time here right now...)
  3. If I manually add the XML to the other runs and re-open the document, the expressions appear correctly. Or at least, they do in the one case I have tried.
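
To check the observation above on your own file, something like the following might help. It is only a sketch: the function name `count_math_runs` is made up for illustration, and it assumes the math runs are `m:r` elements in `word/document.xml`, as in the XML shown further down.

```python
import xml.etree.ElementTree as ET

# Standard Office Open XML namespace URIs for math (m:) and main (w:) markup.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_math_runs(document_xml):
    """Return (runs with an explicit w:rFonts, runs without one)."""
    root = ET.fromstring(document_xml)
    with_font = without_font = 0
    for run in root.iter("{%s}r" % M_NS):
        # Look for w:rPr/w:rFonts directly inside the math run.
        fonts = run.find("{%s}rPr/{%s}rFonts" % (W_NS, W_NS))
        if fonts is not None:
            with_font += 1
        else:
            without_font += 1
    return with_font, without_font
```

You could feed it the bytes from `zipfile.ZipFile("test.docx").read("word/document.xml")`, once before and once after doing the linear/professional toggle and saving.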

Since Word 2010 displays the results correctly, I can only assume that it does not rely on these explicit font settings, whereas Word 2007 does. This doesn't really help you yet, because altering all those m:r elements would be even harder than what you are already doing. But it is possible that a default style/font needs to be set, either somewhere higher in the XML hierarchy, or perhaps elsewhere in the .zip (perhaps in fontTable.xml or styles.xml). I'm not familiar enough with Word's XML structures to guess what, if anything, might be missing, but I may be able to have a look tomorrow.

I suppose another possibility is that you just have to have all these extra rPr elements for this to work in Word 2007, which would suggest that pandoc may have been written for Word 2010, not 2007. (I don't know anything about the tool).

As an example, where you have

<m:r>
  <m:t>(</m:t>
</m:r>

what you need is

<m:r>
  <w:rPr>
    <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" />
  </w:rPr>
  <m:t>(</m:t>
</m:r>
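
If you did want to apply that change to every run, a rough Python sketch could look like the one below. This is not something pandoc or Word provides; `add_math_font` is a hypothetical helper, and I have only reasoned about it against small fragments like the one above. One caveat: ElementTree rewrites namespace prefixes it does not know about when serializing, so a real document.xml may need more care than this.

```python
import xml.etree.ElementTree as ET

# Standard Office Open XML namespace URIs for math (m:) and main (w:) markup.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Keep the familiar m: and w: prefixes when serializing.
ET.register_namespace("m", M_NS)
ET.register_namespace("w", W_NS)


def add_math_font(document_xml, font="Cambria Math"):
    """Insert a w:rPr with w:rFonts into every m:r that has no w:rPr yet."""
    root = ET.fromstring(document_xml)
    # Collect the runs first so the tree is not modified while iterating.
    for run in list(root.iter("{%s}r" % M_NS)):
        if run.find("{%s}rPr" % W_NS) is not None:
            continue  # already has run properties; left untouched here
        rpr = ET.Element("{%s}rPr" % W_NS)
        ET.SubElement(
            rpr,
            "{%s}rFonts" % W_NS,
            {"{%s}ascii" % W_NS: font, "{%s}hAnsi" % W_NS: font},
        )
        # If the run starts with math run properties (m:rPr), keep them first.
        has_math_props = len(run) > 0 and run[0].tag == "{%s}rPr" % M_NS
        run.insert(1 if has_math_props else 0, rpr)
    return ET.tostring(root, encoding="unicode")
```

The result would still have to be written back into the docx archive as word/document.xml, and runs that already carry a w:rPr without a font are deliberately skipped in this sketch.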
1
votes

I did the following to get rid of the font issue:

  1. Create a new empty word document.
  2. Copy all content to the new document.
  3. Choose Match Source Format.
0
votes

As discussed above, Windows doesn't have the font Lucida Grande, so substituting the Math Font with Cambria Math should work.

  1. Rename the test.docx to test.zip
  2. vim test.zip and select test/word/settings.xml
  3. find and change Lucida Grande to Cambria Math
  4. save and rename zip to docx. This results in something like this docx.

You can then also supply that file as a sort of docx template to pandoc with the --reference-docx option (called --reference-doc in pandoc 2.0 and later).
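
If you'd rather not edit the archive by hand, the rename/edit/rename steps above can be sketched in Python. This is just an illustration: `replace_math_font` is a made-up name, and it assumes the font name appears as plain text in word/settings.xml, as described in the steps.

```python
import zipfile


def replace_math_font(src, dst, old="Lucida Grande", new="Cambria Math"):
    """Copy the docx at src to dst, swapping the font name in word/settings.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/settings.xml":
                # Plain text substitution, same as the find-and-change step.
                data = data.decode("utf-8").replace(old, new).encode("utf-8")
            # All other archive members are copied unchanged.
            zout.writestr(item, data)
```

For example, `replace_math_font("test.docx", "reference.docx")` would give you a second file that can be opened in Word or passed to pandoc as the reference document.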