Pandoc and its little-known wonders

The last time I mentioned Haskell was in an article about XMonad. But XMonad is not the only remarkable thing written in it, and today I bring you another one.

Surely everyone already knows Markdown, but in case you don't, let me introduce it: it is a lightweight markup language that lets us write text faster and more productively. Enough said for now.

Well, Markdown is not alone: there are many languages out there that serve the same purpose. The usual idea is to take plain text with a few marks and turn it into HTML, LaTeX and other formats. So far, so good.

This raises several problems. First, and most importantly, some implementations lack features we want. Or the markup language that does implement them is one we don't like.

And before we give in to our whims, it is worth realizing that there is a better solution: something that converts any markup language into any other. Any of them.

This is where Haskell comes into the picture. The wonder I'm talking about exists: it's alive, it has a name, it works and it's amazing. It is called pandoc, and it comes from John MacFarlane, a philosopher at the University of California, Berkeley. Hold on, because this is where the good part begins.

All against all

Since it tries to cover everything, we might expect pandoc to do all of it poorly. But no: it is excellently crafted and has amazing features.

To keep it simple: you pass pandoc a source file (in Markdown, reStructuredText or any other format it supports) and it converts it into a finished format like (is everyone ready?)

LaTeX, plain HTML, PDF, DocBook, OpenDocument, docx, RTF, man pages, plain text and up to three different kinds of HTML slide shows; and my list is short, very short. Here is a diagram illustrating its power:

And last but not least, it is also a Haskell library that can be integrated into the code of other programs. Hakyll is one of the projects that makes the most of it: a static site generator that uses pandoc to turn harmless Markdown, with a bit of LaTeX, into pure HTML.

Here is a list of sites that already use it for their personal, blog-style pages.

And, to top it all off, it's pretty fast. Yet even with all these advantages, it seems to have spread only in English-speaking countries, and here there is hardly any information available, not even (and it hurts to say it) an introduction like this one. Maybe it's because the user guide is in English.

The cons

Of course there are some. Besides its relatively limited reach, most text editors don't fully support it.

Vim has syntax highlighting for Markdown and little else by default, so we miss out on some of the coolest things about pandoc: its extended syntax.

These are things the original Markdown never contemplated and that make our lives easier: tables, citations, footnotes, embedded HTML and LaTeX, metadata and more advanced features.
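To see several of these extensions side by side, here is a small sketch: a hypothetical shell function, write_extended_demo, that writes a Markdown file using a YAML metadata block, a pipe table, a footnote and inline LaTeX math, and then hands it to pandoc. It assumes a reasonably recent pandoc; citations are left out because they also need a bibliography file.

```shell
# Hypothetical helper (not part of pandoc): write a Markdown file that
# uses several pandoc extensions, then convert it to standalone HTML.
write_extended_demo() {
  local src=${1:-demo.md}
  # Quoted 'EOF' keeps the shell from expanding $\sqrt{2}$ and friends.
  cat > "$src" <<'EOF'
---
title: Pandoc extensions demo
author: Anonymous
---

Pandoc understands footnotes.[^1]

| Format       | Extension |
|--------------|-----------|
| OpenDocument | .odt      |
| LaTeX        | .tex      |

Inline LaTeX math also works: $\sqrt{2} \approx 1.414$.

[^1]: Footnotes were not part of the original Markdown.
EOF
  # -s produces a complete HTML page; its title comes from the YAML block.
  pandoc -s -o "${src%.md}.html" "$src"
}
```

Running write_extended_demo demo.md leaves both demo.md and demo.html in the current directory, so you can compare the source with the result.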

By the way, Emacs has an advantage here. It has a Markdown mode that gives us syntax highlighting and a few useful commands, but there is also a full-fledged pandoc-mode, and its Vim equivalent still can't compete with it.

If you're still interested in using it with Vim, here's the syntax file. For Emacs you have to install the markdown and pandoc modes, as already mentioned.

Straight to the point

I discovered pandoc while looking for the txt2tags package (another, more limited converter) on CrunchBang, and now I know that it is available in Debian stable under the name, guess what, pandoc. A simple aptitude install is enough. But those of us who use ArchLinux have to get past a couple of setbacks first.

That dependency hell

The first thing that comes to mind is a pacman -S pandoc. Well, no. There is no package in the official repositories, and the AUR one does not work because of the massive number of dependencies it requires. If you already know something about Haskell, you will be thinking that cabal solves this. It does, but with reservations. You have to run the following:

sudo pacman -S ghc cabal-install
cabal update
cabal install pandoc

This should work, but I don't recommend it, especially if you want to get into the world of Haskell, because it will cause you horrifying problems later on.

It may sound strange to hear me complain about ArchLinux and its philosophy, but removing the haskell-platform package from the repositories seems like complete nonsense to me: it provided the latest environment whose components were mature enough and compatible with one another. It was dropped because ghc and cabal-install were updated.

If you want to install other packages with cabal, it is best to download the old ghc and cabal-install packages from the Arch Rollback Machine.

We install them with a simple pacman -U path-to-package and make pacman skip them during system upgrades by listing them on the IgnorePkg line in the [options] section of /etc/pacman.conf (for example, IgnorePkg = ghc cabal-install).

Now we can use cabal to install pandoc, and so that the shell finds the programs cabal installs, we put this line in our .bashrc file:

export PATH="$HOME/.cabal/bin:$PATH"
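One small caveat: every time .bashrc is sourced again (in a nested shell, for instance), that line prepends ~/.cabal/bin once more. A minimal sketch of a guard that adds the directory only when it is missing; add_cabal_bin is a hypothetical name, and the case pattern is plain POSIX shell.

```shell
# Hypothetical helper for ~/.bashrc: prepend cabal's binary directory
# to PATH only if it is not already one of the PATH entries.
add_cabal_bin() {
  # Wrapping PATH in colons lets one pattern match first, middle or last entry.
  case ":$PATH:" in
    *":$HOME/.cabal/bin:"*) ;;  # already present: leave PATH untouched
    *) export PATH="$HOME/.cabal/bin:$PATH" ;;
  esac
}
add_cabal_bin
```

Calling add_cabal_bin any number of times leaves exactly one ~/.cabal/bin entry in PATH.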

And that's it. A bit convoluted, but it avoids problems. If you want to move on to other packages, instead of installing them locally you can use hsenv to create isolated environments and avoid headaches when installing, for example, Hakyll.

And I warn you once again: it is horrible. All of this is because Haskell and Cabal have not yet solved the dependency hell that other languages have already escaped, such as Ruby with Bundler and its gems. Anyway, I owe this little workaround to Ian Ross from the Hakyll group.

Be patient: it's a long installation, because cabal compiles everything for us.

Use and conclusions

Open a terminal and run a command like this:

pandoc -o output-file.ext original-file.md

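Here the Markdown input (.md is the extension I use) can be replaced by any other supported input format, and .ext by the extension of any available output format. Recent versions of pandoc guess both formats from the file extensions, and the -f and -t options set them explicitly.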
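As a sketch of how far a single source file goes, here is a hypothetical convert_all function that turns one Markdown file into several of the output formats listed earlier, letting pandoc pick each writer from the extension after -o. PDF is left out because pandoc needs a LaTeX installation to produce it.

```shell
# Hypothetical helper (not part of pandoc): convert one source file
# into several formats next to it.
# pandoc chooses each writer from the extension given to -o;
# -s asks for a complete (standalone) document rather than a fragment.
convert_all() {
  local src=$1
  local base=${src%.*}   # strip the extension: notes.md -> notes
  local ext
  for ext in html tex odt docx rtf; do
    pandoc -s -o "$base.$ext" "$src" || return 1
  done
}

# When the extension says nothing useful, name the formats explicitly,
# e.g. reStructuredText to Markdown:
#   pandoc -f rst -t markdown -o notes.md notes.rst
```

For example, convert_all article.md leaves article.html, article.tex, article.odt, article.docx and article.rtf next to the source, and stops at the first conversion that fails.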

For me, since I do this practically every day, it has helped a lot, especially for exporting to OpenDocument.
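For that daily export to OpenDocument, a small loop saves some typing. This is a sketch under the assumption that the sources are .md files sitting in one directory; md2odt is a hypothetical name, not a pandoc command.

```shell
# Hypothetical helper: convert every .md file in a directory
# (default: the current one) into an .odt file next to it.
md2odt() {
  local dir=${1:-.}
  local f
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue   # glob matched nothing: skip the literal pattern
    pandoc -o "${f%.md}.odt" "$f" || return 1
  done
}
```

Running md2odt ~/articles would produce one .odt per Markdown file, which can then be opened and polished in LibreOffice.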

And the ecosystem is incredible. MacFarlane himself has developed a wiki in Haskell, gitit, which uses pandoc to convert its pages, but that's a story for another day. By the way, this post was proudly written with pandoc, like most of the ones I've written so far.


4 comments

  1.   Gadi said

    I also use Markdown. Kate and Gedit can get syntax highlighting through a plugin. Then, to convert it to ODT, I go with MultiMarkdown: it is the one that, how to put it, "respects the text" the most, which makes copying it into a document with paragraph styles more comfortable. Pandoc has not given me the same results, or at least I didn't know how to get them 😛

    1.    anti said

      If that's what you mean, you can make Pandoc accept only standard Markdown by enabling the --strict option. Still, its main advantage is its versatility across formats.
      Given how little I've seen it used, recommending it didn't hurt.

  2.   erunamoJAZZ said

    I have used it to go from LaTeX to reStructuredText. It does it very well (most of the time xD)

  3.   msx said

    Interesting, thanks for sharing.