My lua filters

A collection of lua filters
lua
pandoc
Author
Affiliation
Published

March 15, 2024

For a long time I thought that someone would write a complete Quarto extension for APA-style manuscripts. It turns out that someone was me.

Making apaquarto was more of a challenge than I imagined going in. At first the extension relied a lot of hacks—things I knew were fragile, but I did not know how to do it better. The better way was to use Lua filters. It took time before I wrapped my head around Pandoc Lua filters, but eventually I was able to make them work for me.

Pandoc converts text into an AST (abstract syntax tree). For example, a short paragraph like “I like Quarto.” would be

[Para [
   [ Str "I"
   , Space
   , Str "like"
   , Space
   , Str "Quarto"]
   ]
]

A Lua filter can select and transform specific elements For example, to transform all the Str (string) elements to uppercase:

Str = function(s)
  return string.upper(s)
end

After running this filter, the paragraph would be transformed like so:

[Para [
   [ Str "I"
   , Space
   , Str "LIKE"
   , Space
   , Str "QUARTO"]
   ]
]

Each filter you run will walk through the AST, making any changes to it. Quarto walks through the AST many times (e.g., normalizing metadata, processing figure references).

The documentation for Pandoc Lua filters is comprehensive and has instructive examples. I will be adding more examples to this post as I go.

Replacing ampersands in in-text citations.

When there are multiple authors, APA Style requires an “and” in in-text citations, but apa.csl cannot do that and still get parenthetical citations right. Lua to the rescue!

First Try

Adapted from Samuel Dodson:

function Cite (ct)
  if ct.citations[1].mode == "AuthorInText" then
    ct.content = ct.content:walk {
      Str = function(s)
        if s.text == "&" then
          s.text = "and"
          return s
        end
      end
    }
    return ct.content
  end
end

This function walks down a document and runs every time a citation is found.

The pandoc.Cite type has several fields in it, including citations and content. The citations field is a table that can contain several citations because parenthetical citations can have more than one citation. In-text citations have only one citation. To find the first citation in the citation field, we use square brackets and an index of 1. Thus, ct.citations[1] retrieves the first citation.

Each citation is a special element component that has several fields:

  • id has the citation identifier
  • mode tells the citation type (NormalCitation, AuthorInText, or SuppressAuthor)
  • prefix and suffix contain text before (e.g., e.g.,) and after the citation (e.g., pp. 33-49)

We only want to change AuthorInText citations. Thus, if ct.citations[1].mode == "AuthorInText" then means to only proceed if the citation is an in-text citation.

The content field of a pandoc.Cite contains the text that is going to be displayed from the citation (e.g., “Schneider & McGrew (2018)”). It is not actually a long string, but a sequence of “inline” elements like so:

[ Str "Schneider"
  , Space
  , Str "&"
  , Space
  , Str "McGrew"
  , Space
  , Str "(2018)"]

We want to “walk” through this sequence and find the Str elements that are ampersands and replace them.

So, the following command says, “Walk through the ct.content field, and find each Str element. If the text of the Str element has an ampersand, replace the ampersand with "and". Return the Str element.

ct.content = ct.content:walk {
  Str = function(s)
    if s.text == "&" then
      s.text = "and"
      return s
    end
  end
  }

Adding support for languages beyond English

I wanted people to use the filter in any language. In their metadata, they need to tell apaquarto what their word for and is. For example, Spanish speakers can substitute y for and like so:

lang: es
language:
   citation-last-author-separator: y

To make this work, the Lua filter needs to be able to access the citation-last-author-separator field in the metadata.

We set up a local variable andreplacement with a default value "and".

To pass metadata into a function, we need to set up two local functions, one to retrieve the replacement word for and in the metadata (get_and) and one replace ampersands in the citations (replace_and). Then the return function runs the get_and function on the metadata, followed by the replace_and function on each citation:

-- set default value for and
local andreplacement = "and"

-- get andreplacement if it is specified in metadata
local function get_and(m)
  -- check if m.language exists
  -- check if m.language["citation-last-author-separator"] exists
  if m.language and m.language["citation-last-author-separator"] then
     -- create string from metadata
     andreplacement = pandoc.utils.stringify(m.language["citation-last-author-separator"])
  end
end

-- Replace ampersand for in-text citations
local function replace_and (ct)
    if ct.citations[1].mode == "AuthorInText" then
    ct.content = ct.content:walk {
      Str = function(s)
        if s.text == "&" then
          s.text = andreplacement
          return s
        end
      end
    }
    return ct.content
  end
end

-- run the get_and function first, 
-- then the replace_and function
return {
    { Meta = get_and },
    { Cite = replace_and }
}

Citation

BibTeX citation:
@misc{schneider2024,
  author = {Schneider, W. Joel},
  title = {My Lua Filters},
  date = {2024-03-15},
  url = {https://wjschne.github.io/posts/2024-03-15-lua-filter/mylua.html},
  langid = {en}
}
For attribution, please cite this work as:
Schneider, W. J. (2024, March 15). My lua filters. Schneirographs. https://wjschne.github.io/posts/2024-03-15-lua-filter/mylua.html