Three Ways to Specify Your Concatenative Languages

This post is existential proof that even PL theory blogs can use those annoying listicle titles.

I’ve been thinking about how best to specify the syntax for concatenative languages. It would be nice if we could have a grammar as straightforward as the lambda calculus, but I don’t think we have that luxury. This is one of those places where dealing with concatenative languages is more annoying than dealing with standard applicative languages. I’ve come up with three solutions so far, each with its own set of good and bad properties.

Naive Style

We start with what I consider to be the ‘naive’ definition.

e ::= ϵ
    | [e]
    | call
    | let x in { e }
    | x
    | e e

This is the definition style we see in Christopher Diggins’s paper. What it has going for it is that it kinda looks like an extended lambda calculus (see that e\ e in there?). The first problem is that this is misleading, because concatenative languages do not behave quite like applicative languages: e\ e is not the syntax for function application, but instead for function composition. The second reason is because this definition complicates the reduction semantics because of the tree-like nature of the e\ e production. First, we have to add a bunch of noise rules for getting rid of \epsilon leaves that might have snuck in. But it’s more serious than that: suppose we have the following reduction rule for call:

(e_1\ [e_2])\ call \Rightarrow e_1\ e_2

We can reduce the below term just fine:

([\epsilon]\ [\epsilon])\ call \Rightarrow [\epsilon]\ \epsilon

But then imagine we are presented with this term:

[\epsilon]\ ([\epsilon]\ call)

Apart from the parentheses, this term looks very similar to the first. Given that postfix languages like ours don’t need parentheses, we’d like it to reduce to the same result as the first term. But the pattern of our reduction rule doesn’t match the structure of the term, so we can’t reduce it. We’re stuck! This problem can’t be solved simply by adding a rule that fits this structure, because the e\ e construct allows us to up aribtrary trees that, when flattened and stripped of any \epsilon, look like they have the pattern of our reduction rule. Because we want concatenation to be associative, we know we can flatten the tree into a list via a preorder traversal, but requiring a tree-flatten step before any reduction can take place means we would have to carefully design our reduction rules to maintain that invariant, or include some tree-flattening metafunction as part of the semantics. I don’t even want to bother, because we can actually enforce the flatness invariant within our grammar definition using the next style.

Nested Style

What we want is for trees to be ‘flat’ by definition, so that we don’t have to flatten an arbitrary binary tree before or during reduction. We structure the syntax in such a way that the resulting tree will be be guaranteed to only have one child at each step.

e ::= ϵ
    | e [e]
    | e call
    | e let x { e }
    | e x

Now we are guaranteed by the grammar to have flat expressions. The first benefit that we gain is that we no longer need to have a bunch of uninteresting reduction rules for getting rid of \epsilon. In this syntax, \epsilon can only appear at the ‘bottom’ of a tree, where it can be safely ignored by the reduction rules. The second benefit is that we no longer need to use parentheses when writing out our expressions. And the third, most important benefit is that we only have to write one reduction rule for call! Here it is:

e_1\ [e_2]\ call \Rightarrow e_1 \cdot e_2

Uh oh! What’s that e_1 \cdot e_2 syntax? This is the downside. We can’t write e_1\ e_2 since it’s not a part of our syntax, so to concatenate two programs we have to define a rather annoying \cdot binary operator, which is essentially a more verbose list append, replacing the bottom-most \epsilon in e_2 with e_1. This is still a more acceptable metafunction that tree-flattening, since it doesn’t really require us to be too careful.

List Style

Of course, using that type of representation in an actual implementation would be pretty cumbersome. We can take advantage of the predefined list functions available in many language standard libraries by splitting the syntax into two pieces:

e ::= ϵ
    | e t

t ::= [e]
    | call
    | let x { e }
    | x

Now the list-like nature of concatenative languages is apparent even from the grammar!

The major downside here is the mutual recursion, which can complicate proofs about properties of the reduction semantics (it certainly did for me in Coq, at least). But it makes sense as the abstract syntax in an implementation, since we don’t have to write our own append function and explicit recursion can often become higher-order functions on lists.

Of the three styles, I’d say the only the naive style is not suitable for use beyond defining the basic syntax of a language. The nested style and the list style both find applications in different settings.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s