ACM Paper Publisher’s Toolkit

In this post, I’d like to present a few helpful tools for writing and publishing a paper in ACM format. I start by mentioning Vim (arguably the best text editor out there) and some optional, albeit useful, plugins for Markdown syntax highlighting. Markdown is a plain-text formatting syntax that can later be exported to various publishing formats, such as PDF or HTML. More precisely, I suggest using Pandoc’s extended Markdown together with the Pandoc conversion tool, as the combination is more expressive and powerful. I show how to create a custom ACM template to automate exporting a paper to PDF. Finally, I talk about how to use R and the ggplot2 library to export plots directly to Latex, so that their typesetting matches the ACM template.

Vim

Vim needs no lengthy introduction (I hope). If you have yet to make the switch, you’re missing out on the amazing productivity Vim has to offer. It is by far the most powerful tool in my writing arsenal: coding, scientific or creative writing, you name it—it does it all (indeed, I’m writing this very passage in Vim). On top of the power it gives you out of the box, it supports countless extensions, some of which I’ll mention here.

Admittedly, Vim has a fairly steep learning curve. However, once you master at least the major parts of the ecosystem, your productivity will benefit greatly. Here are some useful resources you should consider when familiarizing yourself with Vim:

However, constant practice is always your best bet. Remember when you first started touch typing? It was challenging at first, and slowed down your work considerably. But then you got the hang of it, and it’s been smooth sailing ever since. Learning Vim is like that.

This is not to say that you must use Vim to take advantage of the rest of the tools presented here. Nevertheless, I will assume that you do, if only to encourage you to try it.

Plugins

Of course, you may simply use Vim as it is, and there is nothing wrong with that approach. However, I recommend the vim-pandoc and vim-pandoc-syntax plugins (follow the links to the GitHub pages for installation instructions). They offer many useful features, including syntax highlighting: they use Vim’s conceal functionality to highlight Pandoc’s markdown format. I talk about some of their features in more detail in the following sections.

Pandoc

Latex is very powerful but can be cumbersome. I find that Pandoc and Latex together provide a good balance between simplicity and power of expression.

Pandoc is a document converter, to put it simply. It supports many common document formats, such as: markdown, HTML, docx, epub, latex, and PDF. It also introduces a special extension of markdown format. I recommend reading this really interesting article, which describes how to write ACM-style papers in Pandoc. However, I found some issues working with the latest ACM template, which I want to focus on in this article. I also intend to paint a more complete picture of what needs to be done to use Pandoc successfully, and where it falls short of native Latex instructions.

Although markdown is no more powerful than Latex (quite the opposite), I find it much more readable and pleasant. Isn’t it nice to simply start your item list with an asterisk instead of having to define an itemize environment?

\begin{itemize}
  \item A nice little list---\textbf{very} nice indeed.
  \begin{itemize}
    \item Once you \emph{compile} it, that is...
  \end{itemize}
\end{itemize}

* A nice little list---**very** nice indeed.
    * Somewhat more... _succinct_.

But what if I come across something that’s not as easily expressed in markdown? I’m glad you asked. See, the beautiful thing about all this is that, provided you export to PDF or Latex, you can simply embed Latex expressions in your documents, such as math equations. (The same is true of HTML tags when you export to HTML.)

To quote Albert Einstein: $E=mc^2$

Simple enough, isn’t it? Not only does it make for a powerful feature, in many cases the plugins handle the formatting for us:

Naturally, we can configure a lot. If you want to, say, disable converting double and triple dashes to en and em dashes (they tend to lose their expressiveness when using fixed-width fonts) or triple dots into ellipses, you can blacklist them in your .vimrc file.

let g:pandoc#syntax#conceal#blacklist = ["endashes", "emdashes", "ellipses"]

ACM Template

Now that we have our bases covered, let’s look into how to produce a paper conforming to the ACM standard template. First off, you need to download the Latex template from the ACM website. Put all the files in the same directory as your source file.

Pandoc Template

Pandoc provides many default templates, one for each supported format. When converting to PDF, Pandoc actually uses the default Latex template as an intermediate step, and instructs the chosen Latex processor (pdflatex by default) to produce the final output. The default Latex template is very generic, allowing you to define a lot of metadata in your source file, such as the title, the authors and so on. Pandoc also supports a citation processor (pandoc-citeproc) that produces the reference section in many formats (including the ACM format). Thus, we can use a direct markdown to PDF conversion, as nicely shown in the aforementioned article.

However, I found this method to be a little constricting with the latest ACM template, for two major reasons. First, the default template supports only basic author metadata, whereas the ACM template uses special commands to define authors and related information (institution, email or optional footnotes) as well as conference-specific information (such as name, year, and location). The second reason is the way the citations are produced by Pandoc. The processor simply outputs them as generated plain text, and I found that the formatting of the References section ends up being incorrect.

To fix it, I first define a custom—much simpler—Pandoc template; then, I apply a two-stage compilation: markdown to Latex, and Latex to PDF using pdflatex (which is what would happen internally anyway but it gives me an opportunity to use bibtex for references).

Below, I provide my custom template. Pandoc’s template directives are enclosed in dollar signs. The template language supports printing variables, conditional statements and loops.

\documentclass[$for(classoption)$$classoption$$sep$,$endfor$]{acmart}
\usepackage{booktabs}
\usepackage[utf8]{inputenc}

% ACM meta
\setcopyright{$if(copyright)$$copyright$$else$none$endif$}
\acmDOI{$doi$}
\acmISBN{$isbn$}
\acmConference[$conference.shortname$]{$conference.fullname$}{$conference.date$}{$conference.location$}
\acmYear{$year$}
\copyrightyear{$year$}
\acmPrice{$price$}

\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}

$for(header-includes)$
$header-includes$
$endfor$

$if(title)$
\title{$title$$if(thanks)$\thanks{$thanks$}$endif$}
$endif$
$if(subtitle)$
\providecommand{\subtitle}[1]{}
\subtitle{$subtitle$}
$endif$
$if(author)$
$for(author)$
\author{$author.name$}
$if(author.note)$\authornote{$author.note$}$endif$
\affiliation{%
  \institution{$author.institution$}%
}
\email{$author.email$}%
$endfor$
$endif$
$if(institute)$
\providecommand{\institute}[1]{}
\institute{$for(institute)$$institute$$sep$ \and $endfor$}
$endif$
\date{$date$}

\begin{document}
$if(abstract)$
\begin{abstract}
$abstract$
\end{abstract}
$endif$

\keywords{$keywords$}

\maketitle

$for(include-before)$
$include-before$
$endfor$

$body$

\bibliographystyle{ACM-Reference-Format}
\bibliography{$bibliography$}

$for(include-after)$
$include-after$

$endfor$
\end{document}
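To make the directives a bit more concrete, here is a rough Python analogue of how two lines of the template expand. It only illustrates the semantics as I understand them and is not how Pandoc implements templates; the function names are made up for this sketch, and `review` is just a second example option.

```python
def documentclass_line(classoptions):
    """Mimic $for(classoption)$$classoption$$sep$,$endfor$ for a list of options."""
    # $for(...)$ iterates over the values; $sep$ is emitted only between items.
    options = ",".join(classoptions)
    return "\\documentclass[" + options + "]{acmart}"


def setcopyright_line(copyright=None):
    """Mimic $if(copyright)$$copyright$$else$none$endif$."""
    # An unset (or empty) variable takes the $else$ branch.
    value = copyright if copyright else "none"
    return "\\setcopyright{" + value + "}"


print(documentclass_line(["sigconf"]))
print(documentclass_line(["sigconf", "review"]))
print(setcopyright_line())
print(setcopyright_line("rightsretained"))
```

The point to take away is that a loop with a separator never leaves a trailing comma, and that a conditional lets the template fall back to a default when the metadata field is absent.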

Paper

Before we start writing the content, we must provide the metadata. You do that at the beginning of the document between --- lines using YAML.

---
title: Your Title
author:
-
  name: First Author
  institution: University of Good Taste
  email: first@author.edu
  note: A footnote information
-
  name: Second Author
  institution: Polytechnic of Legoland
  email: second@author.edu
abstract: |
  Any field can take up many
  lines.
bibliography: bibliography-file.bib
classoption: sigconf
header-includes: | # loading additional Latex packages
  \usepackage{multirow}
  \usepackage{siunitx}
  \usepackage{pgfplots}
  \pgfplotsset{compat=1.8}
  \usepgfplotslibrary{statistics}
  \usepackage{graphicx}
# ACM-specific stuff
copyright: rightsretained
doi: 00.000/000_0
isbn: 000-0000-00-000/00/00
conference:
  shortname: SIGIR'18
  fullname: International ACM SIGIR Conference on Research and Development in Information Retrieval
  date: July 2018
  location: Ann Arbor, MI, USA
price: 00.00
year: 2018
keywords: bag-of-visual-words; image retrieval; top-k search
---
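Several variables in my template are used without an `$if(...)$` guard, so a field that is missing from the metadata silently renders as empty text. Below is a minimal sketch of a check for that, using only the Python standard library. It is not a real YAML parser: it merely looks for unindented `key:` lines in the metadata block. The names `UNGUARDED`, `top_level_keys` and `missing_variables` are my own, and the sample document is deliberately incomplete.

```python
import re

# Top-level variables the template uses without an $if(...)$ guard.
UNGUARDED = ["doi", "isbn", "conference", "year", "price",
             "keywords", "bibliography", "date"]


def top_level_keys(markdown_text):
    """Collect unindented 'key:' names from the leading --- metadata block."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() in ("---", "..."):
            break  # end of the metadata block
        match = re.match(r"([A-Za-z][\w-]*):", line)
        if match:
            keys.add(match.group(1))
    return keys


def missing_variables(markdown_text):
    """List the unguarded template variables absent from the metadata."""
    present = top_level_keys(markdown_text)
    return [name for name in UNGUARDED if name not in present]


sample = """---
title: Your Title
bibliography: bibliography-file.bib
doi: 00.000/000_0
conference:
  shortname: SIGIR'18
year: 2018
---

# Section 1
"""
print(missing_variables(sample))
```

Nested fields such as `conference.shortname` or `author.email` are not checked here; covering them would need a proper YAML library.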

After that, we simply use the Markdown syntax, which is very intuitive. As I will not cover the syntax here, please check out the official Pandoc User Guide. Here’s a small example of what it will look like:

# Section 1

The first paragraph.

## Subsection

Item list:

* item 1;
* item 2.

### Sub-Subsection

Block equation:
$$E = mc^2$$

# Section 2
...

Citations

The extended markdown format has its own syntax for citations: Sentence [@reference]. However, since I’ve decided to use the bibtex processor, I just embed \cite{reference} in the source file. Note, however, that you could still use the default markdown syntax and implement a custom filter that translates the citations into \cite commands. Read this tutorial to learn more about it.
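To give an idea of what such a filter could look like, here is a minimal sketch in Python that uses only the standard library. It relies on my understanding of Pandoc's JSON representation, in which a citation is a `Cite` element holding a list of citation records (each with a `citationId`) and raw Latex is a `RawInline` element. The function names and the hand-written fragment are illustrative, and the citation record is abbreviated to the one field the sketch needs.

```python
import json
import sys


def cite_to_latex(node):
    """Recursively replace Cite elements with raw Latex \\cite{...} inlines."""
    if isinstance(node, list):
        return [cite_to_latex(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            citations = node["c"][0]  # node["c"][1] is the fallback text
            keys = ",".join(c["citationId"] for c in citations)
            return {"t": "RawInline", "c": ["latex", "\\cite{" + keys + "}"]}
        return {key: cite_to_latex(value) for key, value in node.items()}
    return node


def main():
    """Filter entry point: read the JSON document on stdin, write it to stdout."""
    document = json.load(sys.stdin)
    json.dump(cite_to_latex(document), sys.stdout)


# Hand-written fragment standing in for "Sentence [@reference]".
fragment = [
    {"t": "Str", "c": "Sentence"},
    {"t": "Space"},
    {"t": "Cite", "c": [
        [{"citationId": "reference"}],
        [{"t": "Str", "c": "[@reference]"}],
    ]},
]
print(json.dumps(cite_to_latex(fragment)))
```

To try it as an actual filter, you would save it as an executable script, call `main()` at the bottom, and pass it to pandoc with the `--filter` option. I have not tested it against every Pandoc version, so treat it as a starting point.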

Tables and Figures

The ACM template uses a table environment along with tabular for tables. Since this is not supported by Pandoc, I chose to embed pure Latex for these as well as for the figures.
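Writing such tables by hand gets tedious, so one option is to generate the Latex and paste it into the source. The sketch below is one possible helper, not part of my actual workflow: `latex_table` is a made-up name, the layout (caption first, booktabs rules, left-aligned columns) is my assumption about what suits the ACM template, and the numbers are dummy placeholders rather than measurements.

```python
def latex_table(header, rows, caption, label):
    """Render rows as a table environment wrapping a booktabs-style tabular."""
    column_spec = "l" * len(header)  # one left-aligned column per header cell
    lines = [
        "\\begin{table}",
        "  \\caption{" + caption + "}",
        "  \\label{" + label + "}",
        "  \\begin{tabular}{" + column_spec + "}",
        "    \\toprule",
        "    " + " & ".join(header) + " \\\\",
        "    \\midrule",
    ]
    for row in rows:
        lines.append("    " + " & ".join(str(cell) for cell in row) + " \\\\")
    lines += ["    \\bottomrule", "  \\end{tabular}", "\\end{table}"]
    return "\n".join(lines)


# Placeholder values only, to show the shape of the output.
print(latex_table(
    header=["Method", "Latency"],
    rows=[("DAAT", "1.0"), ("TAAT", "2.0")],
    caption="Placeholder caption",
    label="tab:placeholder",
))
```

The helper does not escape special characters such as `&` or `%`, so cell contents must already be valid Latex.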

Compilation

I define a short script to export to PDF. First, I use pandoc to export to Latex using my own template. Then, I run pdflatex, which produces a PDF along with an .aux file listing the cited keys. Next, bibtex reads that file and generates the references. Finally, pdflatex must run again (twice, to settle all cross-references) before the new references show up in the PDF.

pandoc paper.pdc -o paper.tex --template template.acm
pdflatex paper.tex
bibtex paper
pdflatex paper.tex
pdflatex paper.tex
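Because bibtex only sees new citations after a pdflatex run, and pdflatex only picks up the generated bibliography on later runs, the full cycle is usually pdflatex, bibtex, pdflatex, pdflatex. Here is a sketch of a small Python driver for that cycle, which stops at the first failing step. The function names are hypothetical, and the file names simply follow the ones used in this post.

```python
import subprocess


def build_commands(stem, template="template.acm"):
    """Return the commands that turn <stem>.pdc into <stem>.pdf, in order."""
    tex = stem + ".tex"
    return [
        ["pandoc", stem + ".pdc", "-o", tex, "--template", template],
        ["pdflatex", tex],  # writes the .aux file with the cited keys
        ["bibtex", stem],   # builds the .bbl file from the .aux and .bib files
        ["pdflatex", tex],  # pulls the bibliography into the document
        ["pdflatex", tex],  # resolves the remaining cross-references
    ]


def run_build(stem):
    """Run every step; check=True raises CalledProcessError on failure."""
    for command in build_commands(stem):
        subprocess.run(command, check=True)


# Show the plan without running anything.
for command in build_commands("paper"):
    print(" ".join(command))
```

Calling `run_build("paper")` executes the steps for real, provided pandoc, pdflatex and bibtex are on your path.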

R & ggplot2

Most scientific papers are incomplete without a few plots and graphs for data visualization. Latex supports native plotting with the tikz and pgfplots packages; however, this can prove quite cumbersome and difficult to automate whenever a new set of data comes in.

The alternative is to use external packages such as Python’s Matplotlib or R’s ggplot2. The caveat is that it’s difficult to typeset these plots to match the style and fonts of the ACM template. A solution exists, however: exporting plots directly into Latex/tikz files, which can then be included in the source.

The package supporting that in R is tikzDevice. It defines an R device that writes directly into a Latex file. It works similarly to the PDF device; the only difference is that we must load the right package and use the tikz function to define the device along with some additional information:

tikz('output_file.tex', width=3.2, height=2, fg='black')

Below is an example of a script I’ve written. I first load all required packages, including tikzDevice for exporting to Latex. I read my data files and bind them together. Next, I define the device using the tikz function and set basic parameters, such as the filename, width, and height. Afterwards, I plot everything in the usual fashion, and finally close the device.

#! /usr/bin/env Rscript
library(ggplot2)
library(tikzDevice)

# Load data: a single latency column, no header row
daat = read.csv('daat.times', header=FALSE, col.names='latency')
daat$label = 'DAAT'
taat = read.csv('taat.times', header=FALSE, col.names='latency')
taat$label = 'TAAT'
# Bind the data frames that were loaded above
data = rbind(daat, taat)

# Open the tikz device; width and height are given in inches
tikz('output.tex', width=3.2, height=2, fg='black')
# print() makes sure the plot is drawn even when the script is source()d
print(ggplot(data = data, aes(x=reorder(label, -latency, median), y=latency)) +
    geom_boxplot())
dev.off()

You can read more about the package and its parameters in the manual.

Summary

My description and examples of these tools are hardly exhaustive. However, I hope I was able to point out their strengths and convince you to try at least some of them. Please refer to the links I’ve provided throughout the article for more detailed documentation. Do not hesitate to leave comments or suggestions below.
