A Minimal Concatenative Language with Variables

In one of my earlier posts I describe a syntax for concatenative languages with variables. The resulting language suffers from three apparent problems:

  1. The semantic separation between application and substitution is not particularly clear. It seems as though this ought to be the case in concatenative languages.
  2. The language did not look concatenative at all. Parentheses were everywhere! Why write in postfix if you still need a million parentheses?
  3. Variables can only be defined at the head of a function.

I’d like to remedy these three issues, but I’d also like this to be something of an introductory article, so I won’t refer back to my earlier system very much, and will instead start from the beginning.

Why not Stack-based Languages?

A few years ago there was a bit of strife on StackOverflow with regard to what ‘concatenative’ meant when applied to programming languages. This is pretty understandable, given the state of investigation at the time, and Norman Ramsey’s claims ring mostly true even now. But I do prefer the term to the colloquial alternative, stack-based languages, because a concatenative language being stack-based is an implementation detail, not necessarily a defining characteristic. It is quite possible to implement a concatenative language using something like graph reduction or term rewriting, and in some cases it may be preferable to do so to get the desired semantics, but that’s a topic for another day.

So what is the defining characteristic of concatenative languages if it is not operations that manipulate a stack? There’s something to be said for John Backus’s emphasis on function-level programming, and indeed many concatenative languages share an emphasis on tacit (or point-free) programming with Backus’s FP. But I think overemphasizing the combinatorial nature of concatenative languages is harmful in that without strict discipline it often leads to ugly, inscrutable code. Too often the ugliness of the point-free style seems plain unavoidable. Naming things is very useful, and substitution is arguably one of the most powerful operations ever invented.

So I tend to favor Jon Purdy’s definition. A concatenative language is a functional language in which the primary operation upon which programs are built is function composition, rather than function application as in the lambda calculus. Every expression denotes a function, even value names like numbers and strings. This definition almost gives us Backus’s emphasis on function-level programming (since everything is now a function) without restricting ourselves to point-free style.

A Minimal Concatenative Calculus

I will now present a minimal untyped concatenative calculus, similar to the untyped lambda calculus in that the only ‘values’ are function literals and the only operations are ‘calling’ a function and performing substitution of variables.

Let’s start with the syntax:

e ::= ϵ
    | e [e]
    | e call
    | e let x { e }
    | e x

There are no numbers, no strings, no booleans. Just function literals (represented by an expression enclosed in square brackets), a word for explicitly applying a function, and variable introduction and reference. The \epsilon production is there only to be a basis case for the syntax; it plays no role in reduction and can thus be elided when writing out expressions. Note also that this syntax already solves the last two of the problems I introduced above! Let’s finalize the system by solving the first problem.

The lambda calculus has one reduction rule: \beta-reduction. To get a system of equivalent power, we need two reduction rules:

  1. e_1\ [e_2]\ \text{call} \Rightarrow e_1 \cdot e_2
  2. e_1\ [e_2]\ \text{let}\ x\ \{ e_3 \} \Rightarrow e_1 \cdot ([e_2]/x)e_3

These rules rely on two metafunctions: substitution, which is defined much like substitution in the lambda calculus, and the \cdot operator, which works similar to a list append function in that it replaces the \epsilon in e_2 with e_1, thus conjoining the expressions so that e_1 ‘appears before’ e_2 if we write it from the most nested expression outward.

With these two reduction rules and metafunctions, we have a simple system of awesome power. But note that the way we’ve specified these rules neglected any notion of evaluation strategy: we can apply these rules at will anywhere inside the term. To get a stack-based evaluation strategy, we would apply these rules to the most deeply nested expression that is not inside square brackets at every step, always matching e_1 with \epsilon, and stop once we no longer find a match to one of the rules. This is not the only possible evaluation strategy, but it is probably the most familiar (and maybe simplest) to implement.

Examples

To show off that the tiny new language is still useful, we give some examples of how to do certain things. We start by defining Kerby’s eight basic combinators.

swap == let x { let y { x y } }
dup  == let x { x x }
zap  == let x { }

compose  == let f { let g { [g call f call] } }
partial  == let f { let g { [g f call] } }
constant == let f { [f] }

apply == call
dip   == let f { let x { f call x } }

I’ve renamed some of Kerby’s combinators to make it more clear just what they do. We can show an expression that never finishes reducing, proving that the system has the nonterminating property:

omega == [let x { x x } call] let x { x x } call

And one could imagine some of the functions above being written more succinctly if we added some syntactic sugar. Let’s imagine the language has ML/Haskell style lists and pattern matching (quite a leap, so familiary with Haskell or ML would be useful). We can then write higher-order functions over lists.

map []     f == []
map (x:xs) f == x f call [xs f map] dip Cons

fold []     i f == i
fold (x:xs) i f == x [xs i f fold] dip f call

Both of these functions assume that the function supplied to both map and fold consumes exactly the number of arguments it produces, but it need not be used that way since the language is untyped (so we could use map as a stack accumulator, for instance). This is really only recommended for motivated experts with a niche use-case, as the stack manipulation involved can quickly get hairy.

Conclusion

We’ve now defined a fairly intuitive, mostly readable, and noticeably concatenative minimal calculus. I find this core language to be immensely useful right now, and will likely use extended variants of it in my future concatenative endeavors.

As always, happy stacking!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s