Convert HTML and inline MathJax math to LaTeX with pandoc-ruby

Question

I'm building a Rails app and I'm looking for a way to convert database entries containing HTML and inline MathJax math (TeX) to LaTeX for PDF creation.

I found questions similar to mine, and I see two options:

1. Create a Haskell executable that leaves math such as \(y=f(x)\) alone when converting HTML to LaTeX.
2. Write a Ruby method that does the following:
- Split the string into an array with a regex (string.split(regex)).
- Loop through the array and convert the parts that do not contain inline math to LaTeX with PandocRuby.html(string).to_latex.
- Concatenate everything back together (array.join).

I would prefer the Ruby method because I'm hosting my application on Heroku and I don't want to check binaries into git.
(Note: the pandoc binary is set up as described at http://www.petekeen.net/introduction-to-heroku-buildpacks)

So my question is: what should the regex look like to split the string on \(math\)?

E.g. string can look like this: text \(y=f(x) \iff \log_{10}(b)\) and \(a+b=c\) text
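The split, convert, join steps described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: INLINE_MATH and convert_around_math are hypothetical names, and the block passed to the method stands in for the real converter (for example PandocRuby.html(part).to_latex). Because the closing delimiter is \) rather than a bare ), a non-greedy match is enough and no parenthesis balancing is needed.

```ruby
# Matches one inline-math span delimited by \( and \).
# The non-greedy .*? stops at the first \) so parentheses inside the
# math, such as f(x), need no balancing. /m lets a span cross newlines.
INLINE_MATH = /(\\\(.*?\\\))/m

# Splits on inline math, passes only the non-math pieces to the given
# block, and joins everything back together.
# The block is a stand-in for the real converter (an assumption), e.g.
#   { |html| PandocRuby.html(html).to_latex }
def convert_around_math(string)
  # With one capture group, split keeps the delimiters, so the math
  # spans always sit at the odd indices of the resulting array.
  string.split(INLINE_MATH).each_with_index.map do |part, index|
    index.odd? ? part : yield(part)
  end.join
end

sample = 'text \(y=f(x) \iff \log_{10}(b)\) and \(a+b=c\) text'

# Five pieces: "text ", the first math span, " and ", the second, " text".
parts = sample.split(INLINE_MATH)

# A dummy converter (upcase) shows that the math spans are left untouched.
result = convert_around_math(sample) { |html| html.upcase }
```

One caveat with this approach: each piece is converted in isolation, so an HTML element that wraps a math span is cut into fragments, and whitespace or paragraph breaks around the math may not survive the round trip.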

And for the sake of completeness: if the Ruby method is not a viable solution, how should the Haskell script be written so that it leaves \(math\) alone when converting to LaTeX?

I'm not sure, but I don't think the standard Ruby regex engine has any recursion support, in which case matching arbitrary balanced parentheses becomes a lot trickier. — Qtax
@Qtax something like string.split(/(\\\(.*?\\\))/).each_slice(2).map { |a| [PandocRuby.html(a[0]).to_latex, PandocRuby.convert(a[1].to_s, {f: "html+tex_math_single_backslash", to: :latex})] }.join works. — Daniel

John MacFarlane · Accepted Answer · 2013-12-10T18:00:19

Get the very latest version of pandoc (1.12.2). Then you can run:

pandoc -f html+tex_math_dollars+tex_math_single_backslash -t latex
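From Ruby, the same command could be driven through the standard library's Open3, piping the HTML to pandoc on stdin. This is only a sketch: pandoc_command and html_to_latex are hypothetical helper names, and it assumes a pandoc binary that supports these extensions is on the PATH.

```ruby
require "open3"

# Reader extensions from the command above: treat $...$ and \(...\)
# as TeX math while reading HTML.
PANDOC_FROM = "html+tex_math_dollars+tex_math_single_backslash"

# Builds the argument vector for the pandoc call.
def pandoc_command(from: PANDOC_FROM, to: "latex")
  ["pandoc", "-f", from, "-t", to]
end

# Pipes html to pandoc on stdin and returns the LaTeX read from stdout.
# Assumes pandoc is installed; raises if it exits with a non-zero status.
def html_to_latex(html)
  output, status = Open3.capture2(*pandoc_command, stdin_data: html)
  raise "pandoc exited with status #{status.exitstatus}" unless status.success?
  output
end
```

A call such as html_to_latex('<p>text \(a+b=c\) text</p>') would then hand the whole fragment to pandoc in one pass, so no splitting around the math is needed.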
