I have a LaTeX-generated .toc file containing the table of contents of a large document. I would like to convert that TOC to a (GitHub-flavored) Markdown list, e.g. with pandoc.

For example, I have

\contentsline {chapter}{\numberline {1}Introduction}{1}{chapter.1}
\contentsline {section}{\numberline {1.1}Aim and Motivation}{1}{section.1.1}
\contentsline {section}{\numberline {1.2}State of the art}{1}{section.1.2}
\contentsline {section}{\numberline {1.3}Outline}{1}{section.1.3}
\contentsline {chapter}{\numberline {2}Fundamentals}{2}{chapter.2}
...

in my .toc file.

and would like to get something like this:

1. Introduction
  1.1. Aim and Motivation
  1.2. State of the art
  1.3. Outline
2. Fundamentals

Another alternative would be to extract this information (without the content) out of the tex-file directly. However, I could not get this working and I also think it would be more error-prone.

Any suggestions?

2 Answers

4
votes

> Another alternative would be to extract this information out of the tex-file directly.

Pandoc can do that:

$ pandoc -s --toc input.tex -o output.md

To exclude the document body content, you'll have to use a custom pandoc markdown template:

$ pandoc -D markdown > mytemplate.md

Modify mytemplate.md to keep $toc$ and remove $body$, then use with pandoc --template mytemplate.md ...

If you want to customize the result further, I would recommend outputting to HTML (pandoc -t html) instead of Markdown, and then writing a small script that traverses the HTML DOM and does your numbering etc.
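
The "small script" could look roughly like the sketch below. It uses only Python's standard-library html.parser and assumes the TOC arrives as nested <ul>/<li>/<a> elements whose links contain just the title text. The SAMPLE markup is a hand-written stand-in, not actual pandoc output, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TocNumberer(HTMLParser):
    """Collect (number, title) pairs from nested <ul>/<li>/<a> elements."""

    def __init__(self):
        super().__init__()
        self.counters = []    # one running counter per open <ul> level
        self.in_link = False
        self.title = []
        self.entries = []     # filled with pairs such as ("1.1", "Outline")

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.counters.append(0)          # start numbering a new level
        elif tag == "li" and self.counters:
            self.counters[-1] += 1           # next item on the current level
        elif tag == "a" and self.counters:
            self.in_link = True
            self.title = []

    def handle_endtag(self, tag):
        if tag == "ul" and self.counters:
            self.counters.pop()
        elif tag == "a" and self.in_link:
            self.in_link = False
            number = ".".join(str(n) for n in self.counters)
            self.entries.append((number, "".join(self.title).strip()))

    def handle_data(self, data):
        if self.in_link:
            self.title.append(data)


def toc_html_to_markdown(html):
    """Render the collected entries as an indented Markdown bullet list."""
    parser = TocNumberer()
    parser.feed(html)
    parser.close()
    return "\n".join(
        f"{'  ' * number.count('.')}* {number} {title}"
        for number, title in parser.entries
    )


# Hypothetical input, shaped like a nested HTML table of contents.
SAMPLE = """
<nav id="TOC"><ul>
<li><a href="#introduction">Introduction</a><ul>
<li><a href="#aim-and-motivation">Aim and Motivation</a></li>
<li><a href="#outline">Outline</a></li>
</ul></li>
<li><a href="#fundamentals">Fundamentals</a></li>
</ul></nav>
"""

print(toc_html_to_markdown(SAMPLE))
```

The numbers are derived from the list nesting rather than read from the document, so if the HTML already carries section numbers inside the links, they would show up as part of the title and need to be stripped separately.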

0
votes

Unfortunately, pandoc creates an empty Markdown file in my case. I have created an open-source CLI tool that performs this conversion: https://github.com/MaaxGr/latex-toc-markdown

Download the binary (see the README on the GitHub page) and execute the following command:

./latex-toc-markdown Input.toc Output.toc

The output file will look like this:

* 1 Introduction
  * 1.1 Aim and Motivation
  * 1.2 State of the art
  * 1.3 Outline
* 2 Fundamentals
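
For readers who would rather not download a binary, the same kind of conversion can be sketched in a few lines of Python. This is not the implementation of the tool above, only a minimal regex-based illustration under some assumptions: every entry sits on one line in the \contentsline format shown in the question, titles contain no nested braces or macros, and unnumbered entries (those without \numberline) are simply skipped. The function name toc_to_markdown is made up for this example.

```python
import re

# One numbered entry, e.g.
# \contentsline {section}{\numberline {1.1}Aim and Motivation}{1}{section.1.1}
ENTRY = re.compile(
    r"\\contentsline\s*\{(?P<level>\w+)\}"
    r"\{\\numberline\s*\{(?P<number>[^}]*)\}(?P<title>.*?)\}"
    r"\{(?P<page>[^}]*)\}"
)


def toc_to_markdown(toc_text):
    """Convert the text of a LaTeX .toc file to a nested Markdown list."""
    lines = []
    for raw in toc_text.splitlines():
        match = ENTRY.match(raw.strip())
        if match is None:
            continue  # not a numbered \contentsline entry
        number = match.group("number")
        depth = number.count(".")  # "1" -> 0, "1.1" -> 1, "1.1.1" -> 2
        lines.append(f"{'  ' * depth}* {number} {match.group('title')}")
    return "\n".join(lines)


# The entries from the question, used as sample input.
SAMPLE = r"""
\contentsline {chapter}{\numberline {1}Introduction}{1}{chapter.1}
\contentsline {section}{\numberline {1.1}Aim and Motivation}{1}{section.1.1}
\contentsline {section}{\numberline {1.2}State of the art}{1}{section.1.2}
\contentsline {section}{\numberline {1.3}Outline}{1}{section.1.3}
\contentsline {chapter}{\numberline {2}Fundamentals}{2}{chapter.2}
"""

print(toc_to_markdown(SAMPLE))
```

Indentation is taken from the number of dots in the section number rather than from the level name, so the sketch does not need to know whether the document class starts at chapters or sections.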