TLA+ in Practice and Theory
Part 4: Order in TLA+

This is the last installment in the series. Part 1, Part 2, Part 3. A video of a 40-minute talk that covers parts of this series.

If you find TLA+ and formal methods in general interesting, I invite you to visit and participate in the new /r/tlaplus on Reddit.

In part 3 we saw how TLA, the temporal logic of actions serves as a universal mathematical framework for reasoning about discrete dynamical systems, much like ordinary differential equations are a framework for specifying and reasoning about continuous systems. Now, in our final installment in the series, we will mostly focus on those capabilities of TLA+ that are unique to the needs of reasoning about engineered discrete systems in general, and software in particular. We will discuss encapsulation, instantiation and information hiding, see how they can be used for composition and specifying open systems (think libraries), and introduce important notions of equivalence and order relations between algorithms.

In this post, as in part 3, I will sometimes use the word “set” loosely (as in a set of states or a set of behaviors) to mean any collection, rather than set in the ZFC set-theory sense. I wish to remind you yet again that, while comprehensive, this series is a deep-dive into the theory of TLA+ rather than a tutorial. Understanding of subjects we cover is not necessary to write good TLA+ specifications.


An important ability in programming is encapsulation, or modularity, which allows us to hide internal implementation details of components. Encapsulation can also serve as an important form of abstraction, letting us instantiate components with different parameters. In mathematics, encapsulation is not as important as in programming because mathematical theorems tend to be relatively short. While software specifications are not as large as programs, they can be far more elaborate than theorems used by mathematicians1, and so requires encapsulation. Encapsulation together with parameterized instantiation of components is also important from a theoretical perspective as a means to study composition in software.


Before we move on with the theory, we will cover the last missing technical detail of TLA+, the means by which encapsulation is achieved: the module system. Every TLA+ definition or declaration must reside inside a module. A module serves as (parameterized) namespace for the definitions, constant and variable declarations, and theorems and axioms it contain.

Modules are enclosed like so:

and can be nested to any depth. All definitions and declarations in an enclosing module made prior to a nested module are imported into the inner module.

Every name (definition or declaration) in a module is exported — i.e., made accessible to other modules — by default, unless prefaced by the keyword .

The first (and only first) line in a module may be (and often is) $\EXTENDS M1, M2, \ldots$ where $M1, M2,\ldots$ are modules. It imports all exported symbols from those modules as if they were all defined in the the extending module.

A module can be instantiated once or multiple times within another module, and the instance can be assigned a name with a definition. Every declaration — constant or variable, i.e. a free ordinary or temporal variable — in the instantiated module is treated as a parameter and must be assigned in a clause:

where $d_1 \ldots d_n$ are the declarations in the $Foo$ module, and $e_1 \ldots e_n$ are the meanings given to them. If a symbol with the same name is already defined or declared in the scope where is written, it need not be mentioned explicitly in the clause but is assigned by default to the symbol with same name in the instanting scope.

A module can’t be instantiated without all of its declarations assigned (it is a syntax error). Assignment substitutes the given meaning in place of the declarations. Like parameterized definitions (operators), instantiation is substitution, only at a module level. From a logic perspective, instantiation binds the free variables in the instantiated module (although it may bind them to free variables in the instantiating module, thus leaving them free).

Once a module is instantiated, its definitions can be accessed with the instance name followed by !, as in $MyFoo!Ordered(\ldots)$ (infix operator can be referenced like so: $MyFoo!\preceq(a, b)$). For each definition in $MyFoo$, $MyFoo!Def$ means the same as $Def$ inside $Foo$, except with all of the declarations of $Foo$ replaced by the meaning given to them in the clause (or implicitly). For brevity in the discussion, if module $Foo$ contains some definition $X$, we may sometimes call the definition after the substitution, $\overline{X}$, instead of $MyInstance!X$.

For example, in part 3 we defined “polymorphic” sorting algorithms that had the constants $S$ and declared. If the $QuickSort$ algorithm resides in the $QS$ module, then we can instantiate the module like this (assuming that the temporal variables $A, A0, done$ are declared in the instantiating module, and are therefore substituted implicitly):

Then, $IntegerQS!QuickSort$ will be a sorting algorithm for sequences of integers.

We can also instantiate a module inside a parameterized definition, like so:

and then $IntQS(\leq)!QuickSort$ sorts in ascending order, while $IntQs(\geq)!QuickSort$ sorts in descending order. Note how we don’t need to write $\WITH \ldots, \preceq \leftarrow \preceq$, as the name $\preceq$ exists in the scope of the construct, in this case as an argument of the operator $IntQS$.

We could then write the theorem:

Whereas $M1 \defeq \INSTANCE M \ldots$ creates a named instance — $M1$ — of the module $M$, we can also create unnamed instances simply with $\INSTANCE M \ldots$ This has the effect of importing all the definitions and declarations in $M$ into the instantiating module (with their assigned meanings given in the clause or implicitly), and they can then be accessed without qualification. This, in a sense, turns the instantiating module itself into an instance of the instantiated module. $\LOCAL \INSTANCE M \ldots$ has the same effect, except that the symbols of $M$ become local in the instantiating module, and are not exported. Of course, a module can only be instantiated anonymously once within another module, otherwise the names imported by its multiple instantiations would clash.

Unnamed is similar to , except that requires that all declarations in the instantiated module be assigned (or re-declared), whereas imports all declarations as if they were declared in the extending module (for example, if the instantiated module contains $\CONSTANT C$, then the result of would be as if $\CONSTANT C$ had been declared in the instantiating module, while requires that $C$ be already declared or defined in the instantiating module). Also, does not support substitution, and it allows naming multiple modules at once.

Because module names can only be mentioned in a , , or clause, and nothing but a module name can appear in those constructs, there is no possibility of confusing module names with other names (definitions, declarations), so TLA+ allows using the same name for a module and for a definition or a declaration.

Modules are not first-class objects in TLA+ in the sense that they cannot be passed as arguments to operators like modules in the programming language ML or objects in object-oriented programming languages. However, the pattern,

is very much analogous to (the illegal) $MyM2 \defeq \INSTANCE M2(M1)$, presuming that the “signatures” of $M1$ and $M2$ match, namely that $M1$ exports the names required by $M2$’s declarations.

In fact, module instances — like operator arity we saw in part 2 and TLA “levels” we saw in part 3 — are another case where TLA+ uses something similar to inferred types at the syntax level. The language tracks which definitions (top-level or in blocks) are module instances, and it is a syntax error to use either when the other is needed. Moreover, not only does the syntax track whether or not a symbol refers to a module instance, if it does it tracks which module, and so symbol resolution is never ambiguous (unlike the case in most untyped programming languages). It is fair to say that TLA+’s module system is actually typed in a manner similar to modules in ML.

Modules can be used to abstract away encoding (“implementation”) details of mathematical objects. Here’s a (very) contrived example that abstractly defines vectors without specifying any particular encoding:

Now we’ll declare a $Vector3D$ module that extends $VectorSpace$ by using an unnamed instantiation of $VectorSpace$:

The module assigns the dimension as well as the construction and destruction operations — that, for whatever reason, use a record to encode the 3D vector — but does leaves element addition/subtraction free constants. When $Vector3D$ is instantiated, they will need to be assigned.

The module hides the encoding of the vector. If we instantiate it, like so:

and then construct a 3D vector, $v \defeq V3D!ConsVector(\seq{4.5, 1.6, -18.99})$, we cannot use $v.x$ because the encoding is hidden from us (in other words, the $Vector3D$ does not export any theorem that says that its vectors are records in the set $[x : S, y : S, z : S]$)2. In addition, because we’ve hid $ConsVector$ and $Elem$’s definition inside operators, the details of the definitions cannot be examined by proofs with .

If a module $Foo$ contains $\ASSUME A$ (for some assumption $A$) and a $\THEOREM T$ and is instantiated with $MyFoo \defeq \INSTANCE Foo \;\WITH \ldots$, then $MyFoo!T$ would refer to the theorem $\ASSUME \overline{A} \;\PROVE \overline{T}$ (remember, $\overline{A}$ and $\overline{T}$ denote $A$ and $T$ after the substitutions supplied in the clause). In other words, all assumptions in a module are added as its theorem’s assumptions when instantiated. That’s because the theorems in a module depends on all assumptions in the module, which they treat as axioms.

It is the assignment of temporal meaning to temporal variable declarations that will serve as the basis for the theoretical discussion that will follow. The next two sections cover some technical corner cases of substitution. If you only care about the big picture, you may skip them and go directly to the chapter on composition.

Substitutivity of Operators

There is the technical matter of giving a temporal meaning (an expression that involves a temporal variable) to constant declarations. A constant module is a module where all declarations are constants (i.e., it does not declare any temporal variables), and all its definitions are constant-level, meaning that they do not make use of the prime operator or any temporal operator. If a module $M$ is not a constant module, its constant declarations can only be assigned a constant meaning — not a variable, state function or a temporal formula — and its operator declarations must be assigned a constant-level meaning. However, if a module is a constant module, then its constants may be given non-constant meaning, provided that the meanings assigned to its operator declarations are Leibniz, or substitutive, as I’ll explain. Consider the following module:

The module is constant and the theorem is valid (i.e., it is a tautology), but if we were to instantiate the module like so:

The meaning of the theorem would be $(x = y) \implies (x’ = y’)$ which is false, and we cannot allow a theorem to become false when we substitute its free variables, so that instantiation would be a syntax error.

An operator $F$ is said to be Leibniz (substitutive) iff $e = f \implies F(e) = F(f)$ (similarly for operators of higher arity). All builtin TLA+ operators are Leibniz except for prime and the temporal operators, and all constant-level user-defined operators are Leibniz.

A non-constant operator may be Leibniz, too, for example:

For an operator to be non-Leibniz, one of its parameters must appear in the definition as an argument of a non-Leibniz operator like prime.

The Distributivity of Substitution

We now get to another nuance in TLA+, which is of very minor importance in practice but as we’re diving deep into the theory of TLA+, we should explore its darkest corners (but feel free to skip this section). This nuance, however, sheds some light about the mathematical manipulation of programs in general.

We ask whether or not the substitution operation, or “barring”, is distributive over the various logical operations. Let me explain what I mean with an example. Suppose we have the following module:

which we then instantiate with

We ask, is $M!A \land M!B$ equivalent to $M!C$? In other words, is $\overline{A} \land \overline{B} \equiv \overline{A \land B}$? If so, we say that substitution, or barring, distributes over $\land$.

In The Temporal Logic of Actions Lamport points out that barring distributes over most TLA operations. For example, $\overline{\Box(F \lor G)} \equiv \Box(\overline F \lor \overline G)$ for any formulas $F$ and $G$. Indeed, for the normal form of algorithm specification, .

However, substitution may not distribute into the fairness conditions that make up $Fairness$. I.e. may not be equivalent to . The reason is that the weak and strong fairness operators and are defined in terms of the operator. is special in that it is a state predicate but is defined using two states ($\ENABLED A$ is true in state $s$ iff there exists some state $t$ such that the transition $s \rightharpoonup t$ satisfies the action $A$), and it is not the case that is always equivalent to .

For example, if $A \defeq (x’ = x) \land (y’ \neq y)$ and $v \defeq \seq{x , y}$, then is always true (as there always exists a possible next state regardless of the current values of $x$ and $y$), and so it is equivalent to . If our substitution is $\overline x = z$ and $\overline y = z$, then , but which is always false. This happens because the primed variables are no longer free in the substituted formula, and the substitution that binds them may place constraints on them. So barring does not distribute over and the fairness operators defined using it.

This is a technical subtlety but not a problem in practice, as we can compute from by means other than blindly substituting all variables; it’s easy to compute a state predicate that’s equivalent to but doesn’t mention primed variables.

Let’s place this case in a specification, and analyze the consequences. The example would also demonstrate why we must define such that it is still in the case above, rather than define it to be equal to ; in other words, it will help understand why barring must not distribute over . Consider the following nested modules (and note that module $Inner$ contains the free temporal variable $x$ as it’s defined in an enclosing module prior to $Inner$’s definition):

If we look at module $Inner$ in isolation, then obviously $P$ is a theorem. But what about module $Outer$’s $I!Spec$? What behaviors does it specify? If the fairness condition didn’t exist, then behaviors where both $x$ and $y$ start up equal to each other and stutter forever would satisfy the specification. But the we do have a fairness condition, so recall its definition:

It means that if the action is continuously enabled then it must occur infinitely often. One thing is certain: the action $Next$ cannot occur infinitely often — nor can it occur even once — in $I!Spec$. If barring distributed over , then would be equivalent to , which, as we saw in the discussion above, is always false in our “barred” formula, $I!Spec$. But if that were the case, then would be true (an implication is true when its left operand is false), which would make $I!Spec$ true even though the $Next$ action can never occur; a stuttering behavior where $x$ and $y$ are constant would satisfy it. This is clearly not what we want, and worse, if that were so, then $I!P$ is not a theorem because a behavior where $x$ and $y$ are always equal cannot eventually reach a state where they’re different. We cannot accept a situation where substitution of free variables turns a theorem false.

However, if is defined to be true in this case as in the discussion above (and indeed it is), then the fairness condition would become equivalent (by its definition) to , which is equivalent to in $I!Spec$, as $Next$ cannot occur. And if it’s equivalent to , then so is $I!Spec$. And if $I!Spec$ is , then $I!P$ is vacuously true, because implies anything. All of $I!Spec$’s behavior eventually reach the impossible state where $x \neq x$ because there are none!

Also note that $Spec$ is machine-closed, yet $I!Spec$ isn’t because it is the liveness property that becomes false and makes the entire formula $\FALSE!$ — clearly a very strong safety property (we are not allowed to do anything). Substitution can, therefore, turn a machine-closed specification into a non-machine-closed one, which is yet another reason why we want to allow expressing non-machine-closed specifications.

This annoyance of substitution not being distributive over led Lamport to observe that substitution in general does not distribute over common programming constructs (like sequential composition) in his short note, Substitution: Syntactic versus Semantic.


Engineered systems are often made of components. Describing each component separately, and then the method of their composition is how engineers tackle complexity. Even when a system is made of a single component, it interacts with its environment — the user, the physical world (through sensors and actuators), or other systems — which may be thought of as another component. In this chapter, we’ll learn how to model interaction and composition of components. While TLA can elegantly model arbitrary forms of composition, in the next chapter we’ll see that its powerful “first-class” representation of abstraction, makes composition an unnecessary complication in practice.

Model-Based Specifications and Information Hiding

How do we specify how a system or a component interacts with its environment or with other component? In other words, how do we specify an interface?

Let’s consider the following, very abstract, very mathematical specification of a stack. But, first, as a technical mathematical necessity (see Part 2, Data Refinement and Inductive Data Types), we’ll define an operator that can be used to specify a set by some properties; it selects some “smallest” set that satisfies the property $C$:

Now, the stack:

What’s going on here? We define $Stack$ as the “type” of stacks by defining its algebraic structure: a stack can be $\text{EMPTY}$, all stacks can be constructed by pushing elements, and $\text{EMPTY}$ cannot be a result of a push operation.

This description should be very familiar to people who use typed functional programming languages, as it is precisely how algebraic data types are defined. Axiom A1 (remember that is synonymous with ) specifies the “type” of the stack operations (defined as functions), and axioms A2 and A3 specify the relationships among the three stack operations.

The specification captures precisely what it means to be a stack without saying anything about how one is implemented. It’s true that the specification uses functions rather than algorithms in the TLA sense, but I chose that only to make the specification be recognizable to those who are familiar with algebraic specifications in typed functional languages. Axiomatic specification of algorithms in TLA+ could also be done by using temporal formulas in the axioms.

This stack specification is an example of an algebraic specification, which, in turn, is a special case of an axiomatic specification. Axiomatic specifications specify the external behavior of some system or data structure — its interface — as a list of properties is must satisfy. This approach seems very elegant at first sight, very abstract, very mathematical. But see if you can specify a FIFO queue in that way; or a B-Tree; or a concurrent B-Tree. It is very hard for people who are not logicians to figure out how to specify a system as a list of axioms. For example, the use of $Least$ is necessary to exclude infinite stacks, something which may not be obvious to non-logicians.

Here’s what Lamport says about axiomatic specifications:

The lesson I learned from the specification work of the early ’80s is that axiomatic specifications don’t work. The idea of specifying a system by writing down all the properties it satisfies seems perfect. We just list what the system must and must not do, and we have a completely abstract specification. It sounds wonderful; it just doesn’t work in practice.

After writing down a list of properties, it is very hard to figure out what one has written. It is hard to decide what additional properties are and are not implied by the list. As a result, it becomes impossible to tell whether a specification says all that it should about a system. With perseverance, one can write an axiomatic specification of a simple queue and convince oneself that it is correct. The task is hopeless for systems with any significant degree of complexity.

What works in practice — at least for engineers who are not logicians — is specifying by example. We write a specification of a particular, simple, implementation of a stack, and say that a stack is anything that exposes the same behavior. This approach is called model-based specification. In order for it to work, we need to hide the implementation details of the model because other implementations are not required to mimic those; hiding information is a cornerstone of abstraction.3

Here is a model-based specification of a stack (this time we will use algorithms):

However, we’re only interested in how $Push$, $Pop$ and $Top$ interact with the $in$ and $out$ variables. A stack implementation need not necessarily maintain a $buf$ variable containing a sequence of elements. To use our specification as one describing an interface we must hide the internal implementation detail of our model implementation, namely $buf$. This is done using module instantiation and temporal existential quantification like so:

If we have a specification $MyStack$, the statement $MyStack \implies Stack$ means that $MyStack$ implements $Stack$. We will take a close look of what it means for an algorithm to implement another later, but now I’ll explain how $\EE$ works to hide the $buf$ variable.

In part 3 we briefly encountered existential temporal quantification, $\EE x : F$ as saying that “there exists a temporal variable $x$ that may take a different value at each state such that $F$”. However, it’s easier to think of $\EE x : F$ as “$F$ with $x$ hidden”. This is similar to regular existential quantification in first-order logic where, for example, $\E y \in Int : y * 2 = x$ uses a hidden variable $y$ to say something about the free variable $x$. The bound variable $y$ is no longer part of the model at all, which only speaks of the free variable $x$. This is true for temporal quantification as well.

Note that hiding $buf$ in $Stack \defeq \EE buf : Internal(buf)!IStack$ must be done in a different module from InternalStack, one that does not declare $buf$ as a free variable in a clause, because TLA+ does not allow re-defining any names.

Open System Specifications

Our specifications describe a system’s behavior as it interacts with its environment. If formula $M$ describes how the system behaves, and $E$ describes how the environment does, then the full specification — expressed as a composition — is $E \land M$ (remember the equivalence between IO and concurrency we’ve seen in part 3, and that, in practice, it’s simpler to just write a single formula with nondeterminism that expresses the environment; but we’re talking theory here). An open-system specification describes the contract between the system and the environment — think specification of a library.

The system part, $M$, of the conjoined specification is insufficient as an open system specification because the correctness of $E \land M$ depends on the behavior of $E$ as well; $M$ by itself would assert that the system can behave correctly no matter what the environment does, in other words, it would take $E$ as $\TRUE$, meaning full nondeterminism that allows any sort of interaction with the system. Sometimes, this may be acceptable, as with our stack example above. But sometimes we’d want to specify a library or a system that only guarantees correct behavior if external components — be they other systems or other parts of the program — interact with it in a certain way. We would like to specify just what the system expects or assumes of the environment in order to function correctly rather than the system plus the environment as a whole. Such specifications are also called rely-guarantee or assume-guarantee, signifying the contract that under so and so assumptions, the system can make such and such guarantees. Such a contract requires that the system behave correctly only if the environment acts as expected.

We could use $E \implies M$ as the contract4. However, the contract $E \implies M$ is too lenient on the system. Consider a server that receives requests in the form of even numbers, and returns their value divided by two. If the request does not contain an even number, the system makes no guarantees on the outcome. We could specify $E$ and $M$ like so (in this specification the system does not necessarily respond to every – or even any – request, but that’s good enough for this example):

The specification $E \implies M$ appears to say what we want, but not quite. Suppose that in its first nine steps the environment sends the system even numbers, but in the tenth step the environment sends us an odd number. This would mean that for that behavior, the formula $E$ is false, and so (because $A \implies B$ is true for any $B$ if $A$ is false) the system may exhibit any behavior for which the formula $M$ is false. For example, set $y$ to 33.125 in the second step! This is not what we want. We want to say that $M$ would behave correctly as long as $E$ does, i.e., up until the step $E$ starts misbehaving.

The special temporal operator $\whileop$ says that if $F \whileop G$, then $G$ would not “become false” before $F$ does, i.e., any behavior that satisfies $F$ up to some finite prefix, must also satisfy $G$ up to that prefix. Our open system specification would, therefore, be $E \whileop M$.

In practice, however, simple implication (i.e. $E \implies M$) would suffice, because it is very hard to write a specification for $M$ — let alone a realizable, machine-closed one — that knows in advance that it may misbehave if the environment will misbehave sometime in the future.

Composing Specifications

In part 3 we saw some examples of specifications composed by the conjunction of their formulas. Those examples were of parallel composition, but we can compose specifications in many different ways.

Mathematical theories sometimes involve cumbersome constructions whose intent is just to show they are theoretically possible, and at the same time unnecessary. Turing machines are an example of that. Once their universality has been established with some tedious constructions, that construction need not be repeated, and computer scientists are free to refer to Python programs as Turing machines without writing a Python compiler to TM. We will follow a similar route. I will show how TLA+ can support compositions that emulate programming language constructs only to later show why they are unnecessary.

Ever since structured programming became popular, subroutines have been the primary form of decomposing programs in all mainstream programming languages and in almost all programming languages in general5. We can write specifications as a composition of subroutines by writing some definitions that express the meaning of a subroutine. Those constructs — in particular, the $Subroutine$ operator — are far more complex than anything you’re ever likely to encounter in a real TLA+ specification, but their importance lies only in showing that such compositions can be expressed.

We can now use those definitions to encapsulate modules as subroutines and call them:

It is an easy matter to add support for recursion by simulating a stack; PlusCal does exactly that.

In fact, if $a$ and $b$ are some syntactic constructs — program terms — in some deterministic programming language, however they’re combined — be it as subroutines, sequentially as in $a; b$, in parallel ($a$ || $b$), as rules in some declarative language — then their composition can be expressed in TLA+ as:

where $F_a$ and $F_b$ are formulas specifying the two components, $w$ is a tuple of variables used in the composition, $F$ is an operator that defines the composition, and $C_a$ and $C_b$ are constants (one reason constants may be necessary is that $\land$ is commutative, and the composition used — e.g. $a;b$ — may not be).

This ability means that, at least in principle, we should be able to extract from a program, written in some programming language, a TLA formula that specifies it exactly, by direct transformation on the syntax. The TLA formula — or rather, its simple semantics — would then constitute a semantics for the program. This translation from program to a logic formula in some program logic is called the characteristic formula approach6.

However, given the need for the constants $C_a$ and $C_b$, I am not certain that such a translation would be regarded as compositional by programming-language theorists, as the term $a$ may be represented in two different programs as $F({C_a}_1, F_a)$ and $F({C_a}_2, F_a)$, i.e., with two different constants. Without the constants, the target logic would need to have the same mechanisms of composing terms (sequence, parallel, function application, rule etc.) as that of the language, while TLA can be universal. Because of its applicability to both sequential and interactive or concurrent algorithms, and because of its ability to easily express rich abstraction/refinement and equivalence relations, TLA can be a particularly suitable logic for this approach.

I promised a solution to the “parallel or” problem from part 3. The composition presented there assumed that the given specifications have no fairness condition, for if they did, $Spec0$ could require that if its next-state action were enabled, then it must eventually be taken, yet our composition would terminate — i.e. would not take any further $Spec0$ steps — if $Spec1$ terminates first. In order to accommodate specifications with fairness conditions, we could still, however, write a composition of the kind described here, that counteracts the fairness conditions of the two specifications:

This, indeed, solves the problem as stated, but perhaps not entirely satisfactorily. A $ParallelOr$ specification would indeed terminate iff any of its two given specs terminate, but if $Spec1$ terminates first, and so the second disjunct of $AntiFairness$ for $Spec0$ is true, then $Spec0$ could be ignored completely, and $state0$ could take any values whatsoever. We may wish to create a stronger composition that truly interleaves the two specifications until one of them terminates. To do that — Stephan Merz showed me this solution — we can use the $\whileop$ operator we learned in the section Open System Specifications to define $AntiFairness$:

Now $AntiFairness$ states that its operand $Spec$ must hold as long as the alternator would eventually execute another step of $Spec$ (by setting $s$ to $Spec$’s index; either 0 or 1 in our case). The above formula can be read as, “as long as $s$ will eventually be equal to $x$, $Spec$ must hold.”

To understand why simple logical conjunction is a universal mechanism of composition7, like many things in TLA, it is helpful to think of nondeterminism. The specification of a component imposes certain constraints on the behavior of the system. But a component also interacts with its environment — the user, the network, the operating system or other components — on which the component generally imposes no constraints (except, possibly, as requirement for its correct behavior, as we’ve seen when discussing open system specifications). To the component, the environment appears nondeterministic. The composition of multiple components is, therefore, the intersection of the constraints imposed by all components, or, in logic terms, the conjunction of the formulas describing them. A conjunction of formulas — an intersection of their behavior — represents the combined constraints imposed by them, and forms a decrease in the total nondeterminism of the specification, as we specify the behavior of more of its parts. This observation becomes obvious and precise when we learn, in the next chapter, that there is a partial-order relation on TLA specifications defined by their nondeterminism — where “bigger” is more abstract or less deterministic — and, in fact, they form a boolean lattice (almost). The least deterministic (“biggest”) specification that still determines both $A$ and $B$ (i.e., “smaller” than both $A$ and $B$) is their meet, $A \land B$. This is just a semantic way of expressing this simple deduction:

It should not bother you or offend your aesthetic sensibilities that we need to do things that appear low-level to a programmer, like simulating a stack, as the entire mechanism can be hidden inside a module and thus completely abstracted. In fact, there is never a need to actually write a specification composed so finely. The whole point of this section was to demonstrate that TLA+ can represent the low-level mechanisms of programming languages; what matters now that we know it can be done, is demonstrating that it need not be done. As we’ll now see, we can equivalently and much more easily specify one component at a time, abstracting all others.

Equivalence, Abstraction and Implementation

Equality and Equivalence

In every mathematical theory that defines some object, an equality relation is of the utmost importance, as you don’t really define an object uniquely unless you also say when two descriptions in your theory refer to the same one. The objects TLA is concerned with are discrete dynamical systems, and so we must define when two of them are equal. As we’ve learned, a TLA in formula specifies a set of behaviors. Two algorithms are equivalent, then, iff their respective sets of behaviors are equal8. At the logic (i.e. syntax) level, two formulas denote the same set of behaviors iff they are logically equivalent (i.e., one is true iff the other is true), which is denoted by the simple logical connective $\equiv$ , called “if-and-only-if” or “equivalent”.

TLA’s equality is a very strong form of equivalence. We might sometimes care about weaker forms of equivalence. For example, we can give different significance to different parts of the algorithm’s state. We can say that an algorithm’s behavior is a sequence of states, where each state is a pair of visible state and hidden state, and that two algorithms are equivalent if the set of observable behaviors they denote — when only the observable part of the state is taken into account — are the same. You can think of this as a projection of behavior along some dimension that only captures the observable state.

For example, in part 3 we saw three specifications of algorithms computing the factorial function, and I mentioned that $Fact1$ and $Fact3$ are equivalent, i.e. they’re the same algorithm, only $Fact3$ has more detail (the stack). Let’s see how. We’ll assume that each specification is defined in its own module, named $Fact1$, $Fact2$, and $Fact3$ respectively. In a separate module, we can then write:


Meaning that $Fact1$ and $Fact3$ are equivalent if we disregard (hide) $Fact3$’s variables $pc$ and $stack$, considering them the hidden part of the state, and the variables $N$, $f$ and $i$ the visible part of the state (which we leave free). This is a stronger statement than $Fact3 \implies Fact1$. In ordinary (non-modal) logic, too, if $G$ is a formula with two free variables, $x$ and $y$ (and so, for clarity, we’ll write $G(x,y)$) and $F$ is a formula with just the free variable $x$ (so we’ll write $F(x)$), then,

but the entailment doesn’t work in the other direction (consider the case ).

We might care about even weaker forms of equivalence. A common form of weak equivalence, relevant for sequential programs and is central in the theory of pure functional programs is extensional equality, which says that two programs are equivalent if they produce (or “reduce to”, in PL-theory parlance) equal outputs for equal inputs. It is an equivalence that mimics the notion of equality of functions9. If we treat computations as behaviors, as we do in TLA, the functional form of extensional equality compares just the initial (input) and final (output) states by hiding all intermediate states.

First I’ll show how this can be done by changing the original specifications a bit; later I’ll show how this can be done without any changes.

Here are the two specification, modified by the addition of a new variable, $out$:

Now we’ll establish the extensional equivalence of $Fact1$ and $Fact2$:

By hiding variables $i$ and $f$ we’re still just projecting the states onto a lower dimension, but because we are left only with $N$ — which never changes — and $out$ which changes only at the last step, we get a projection with stuttering steps, which is equivalent to the following algorithm with a single step:

We can even extend this kind of extensional view to nonterminating computations. Remember the complicated subroutine construction we’ve seen? Well, when hiding the control variables with

our algorithm is equivalent to:

This is very important because it means that we are mathematically justified in replacing subroutines with functions that represent them extensionally if we’re not interested in the dynamics of the subroutine. If you’re interested in the theory of programming languages, this gives us a very precise relationship between denotational and operational semantics. We can regard the factorial completely denotationally, as a function, or operationally, as an algorithm or a subroutine, and the relationship between the two is exactly an equivalence with hidden state. Later we’ll see that this is a special case of abstraction/refinement relation.

We can take equivalence even further. Consider the following two specifications:

These two algorithms are clearly not equal, but they are also equivalent in a very clear sense. Here are their respective state diagrams (every state also leads to itself because of stuttering invariance):

%3 a a a->a b b a->b b->a b->b
%3 a 0 a->a b 1 a->b b->a b->b

The two algorithms only differ in the values of their states — also called labels in the theory of program analysis — but the structure of their state transitions is identical.

We can express their equivalence like so:

This equivalence is the equality of $Spec1$ and $\overline{Spec2}$, under a mapping of the states. In this case, the mapping is 1-1 and onto, i.e. a bijection, but it doesn’t have to be. Let’s go back to our factorials and do things a little differently this time. We’ll now specify the $FactAlg$ module like this:

And we’ll go back to our original definition of $Fact1$:

We can do something nearly identical with $Fact2$:

Instead of hiding variables in $Fact1$ and $Fact2$, we used a mapping of states, only this time, unlike in our example of the two-state machines, the mapping is contractive; multiple states of $Fact1$ (and $Fact2$) are mapped to a single state of $FactAlg$. The ability to replace hiding (i.e., temporal existential quantification) with a substitution, or a state mapping is related to an important theorem we’ll get to shortly.

Abstraction and Refinement

We’ve talked about what it means for algorithms to be equivalent to one another, and we’ve seen how we can define coarse or fine equivalence relations using hiding or substitution. But it is very common in computer science to speak of an algorithm implementing some interface, or of a type or an interface being an abstraction. TLA allows us to speak precisely about those concepts, and so we will give precise meaning to the familiar intuitive notions of implementation and abstraction.

There are many ways we think of abstraction. We can think of it as a distillation of the important features of some system, or we can think of it as identifying commonalities in a set of things and factoring them out, or we can think of them as a screen hiding irrelevant detail. There are probably other ways people think of when they talk of abstraction, but mathematically we can describe them all — as they all mean the same thing — as a superset. There are red apples, red cars and other red things, and the abstract notion of redness can be represented as a set that includes them all. Of course, nothing stops us from having another set of apples of all colors, representing the essence of “appleness”, or a set of all small things etc. An abstraction abstracts many things — called instances of the abstraction — and a thing can be abstracted by many abstractions. A map — which is an abstraction of a territory — corresponds to the large set of all possible territories that the map may describe; i.e., they may differ in many ways, but not in anything that is represented in the map. There are even sub-abstractions — red apples, or more detailed maps — that have their own instances (red apples of all sizes, a subset of the original set of territories) and their own sub-abstractions (small red apples, and even more detailed maps). A sub-abstraction, as well as an instance, is always more specific, provides more details, than its abstraction.

When we talk of algorithms and programs, we call an instance of an abstraction an implementation. We will also use another name that is common in program analysis: refinement. As we’ve seen in part 3, TLA makes no distinction between an algorithm and an algorithm property (i.e. a set of algorithms), so we will use the terms implementation and refinement interchangeably.

An algorithm, algorithm property, or any discrete system $A$, is an implementation or a refinement of a system $B$ iff the set of behaviors of $A$ is a subset of that of $B$. In which case we say that $B$ is an abstraction of $A$; sometimes people also call $B$ a specification of $A$, but I will not use that word as it may be confusing.

The abstraction/refinement relation in TLA defines a partial order on the universe of behaviors. Indeed that universe forms a bounded lattice, with refinement being set inclusion. Syntactically, the bottom element is , top is , and the join and meet are just $\lor$ and $\land$ respectively. In short, it forms a boolean algebra10.

The refinement relation, $A$ implements $B$ (or $A$’s set of behaviors is a subset of $B$’s) is represented by simple logical implication, $\implies$. Even if little in the previous paragraph made sense to you, you can appreciate that $\implies$ is a “less than” order relation in boolean algebra: look at its truth table with 0 as false and 1 as true, and see that it is the same as $\leq$.

For example, a web service that emits values of type int can be represented as $x \in Int \land \Box[x’ \in Int]_x$ (or just $\Box(x \in Int)$), which then forms an abstraction for a service that emits incrementing integer values. Indeed:

And, as we’ve seen in part 3, a clock that displays both the hour and the minute is a refinement of a clock that just displays the hour.

As we’ve done with equivalence, we can say that an algorithm’s behavior has a visible part and a hidden part, and that algorithm $A$ is a refinement of $B$ if the visible behaviors of $A$ form a subset of those of $B$. Hiding detail — with the temporal existential quantifier — is quite a general form of abstraction.

If the algorithms/systems are represented by formulas $F$ and $G$, in TLA the abstraction/refinement relation becomes:

where $w$ is a tuple of those variables in $F$ we wish to hide, and $v$ is a tuple of the variables of $G$ we want to hide. As with any existential quantification in a logic that permits free variables, the statement above is logically equivalent to:

In general $(\EE v : \Psi) \implies \Phi$ is eqivalent to $\Psi \implies \Phi$, provided that the variables $v$ do not appear free in $\Phi$. Intuitively, if the existence alone of appropriate values is enough to imply the right-hand side then so does every concrete satisfying assignment; e.g., the proposition $(\E x : x > 0 \land y \geq x) \implies y > 0$ is equivalent to the proposition $x > 0 \land y \geq x \implies y \geq 0$. Using the same ordinary logic analogy, it is easy to see that the converse does not hold, i.e., $\Psi \implies (\EE w : \Phi)$ is not equivalent to $\Psi \implies \Phi$.

To give the above formula a more concrete meaning, recall our discussion of model-based specification. We state that a specification $MyStack$ implements a stack, as specified by our model, with the proposition:

This is obviously something we are interested in verifying, but how do we do that? Unfortunately, the TLC model checker cannot check formulas containing temporal quantification. The problem is, in general, co-NP-hard in the number of states, which is big to begin with, although an algorithm that is efficient in practice may be found. Proving it directly is also hard.

But recall that in the previous section we saw how an equivalence with hidden variables (extensional equivalence of factorial algorithms) could also be expressed using substitution. The model checker can check formulas of the form $F \implies \overline{G}$, and as both $F$ and $G$ are presumably written as state machines in normal form and specify successor state step-by-step, proving such a mapping is not as hard as proving implication that uses states hidden with a temporal quantifier. In 1988, Martín Abadi and Leslie Lamport published an important result about that very idea in their paper, The Existence of Refinement Mappings (1991).

A function from $F$’s states to $G$’s states is called a refinement mapping. Unfortunately, even if $F$ implements $G$ in the sense above — namely that the visible portion of $F$’s behaviors is a subset of $G$’s — such a mapping does not always exist. Consider some examples:

  • Algorithm $B$ computes a running average of input numbers that flow into the system one by one. $B$ does so by maintaining a (hidden, internal) growing sequence of all inputs and emitting their average. But an algorithm $A$ exhibiting the same visible behavior — emitting a running average — is not required to maintain such a list, and, instead, can make do with (internally) keeping a running sum and count. There is no mapping from $A$’s state of a sum and a count to $B$’s state of a list of all previously observed numbers, as the latter requires more information.
  • Algorithm $B$ emits a sequence of ten nondeterministically chosen numbers. $B$ does so by (internally) nondeterministically choosing a sequence of ten numbers in the initial state, and then emitting them one by one. Algorithm $A$ could exhibit the same visible behavior by nondeterministically choosing a number at each state. There is no mapping from the state of $A$, that only has the currently emitted value, to that of $B$, containing all future outputs.
  • Algorithm $B$ specifies an hour clock, but internally keeps track of minutes, too, while algorithm $A$ only emits the time in hours. Both have the same visible external behavior, but $B$ “runs slower”. There is no function mapping $A$’s state — which changes only once an hour — to the multiple states representing that same hour in $B$.

Abadi and Lamport proved that if $A$ is machine-closed, $B$ has finite invisible nondeterminism (both discussed in part 3), plus another condition on $B$ that we didn’t cover, then there exists a refinement mapping from $A$ to $B$, or from a modified version of $A$ that contains auxiliary variables — either history or prophecy variables — to $B$, where a history variable records the history of the state (or a part of the state) and a prophecy variable nondeterministically chooses the future states. In other words, the theorem is that the addition of history and/or prophecy variables is enough to guarantee a refinement mapping for any abstraction/refinement.

Consider the following algorithm that emits ten nondeterministically chosen numbers:

Here is the algorithm with an added history variable, $h$:

And here it is with an added prophecy variable, $p$:

But how do we prove that adding the auxiliary variables does not restrict $F$’s behaviors, and that our refinement mapping is sound? We need to prove that which brings us back to the original problem.

The situation with history variables is simple because if is $F$ with a history variable added, and we define

(where $vars$ is the tuple of variables in $F$) then:

But unfortunately, this simple decomposition cannot be done for prophecy variables. Lamport and Stephan Merz recently published Auxiliary Variables in TLA+ which gives syntactic rules for using prophecy variables that are proven to be sound (i.e., don’t restrict behaviors).

Refinement mapping is a very powerful verification technique not only in theory, but also in practice. It is sometimes beneficial to specify an algorithm at several levels of detail, say, $A$, $B$ and $C$, with $A$ being the most detailed (refined) and $C$ the most abstract, and then verify that for some property $P$, $C \implies P$. Because $A \implies B \implies C$, we learn that $A \implies P$, and $A$ is specified at a level that is close enough to code to be readily implemented.

The Meaning of Abstraction

The equivalences and abstraction/refinement relations we’ve seen mean that there is rarely a need to compose specifications. A component can more easily be specified in isolation, with simple abstractions standing in for its cooperator components.

The definition of a refinement/abstraction relation in TLA+, $B \implies A$, is that the behaviors of $B$ are all behaviors of $A$. One way to look at it is that algorithm $B$ implements abstraction $A$ if all of $B$’s behaviors are allowed by $A$ or, conversely, that $B$ never exhibits a behavior that is forbidden by $A$. Looking at it from another perspective, abstraction — like input and concurrency — is just nondeterminism. $A$ is a nondeterministic descriptions of all algorithms that implement it.

An abstraction, then, is a description with less detail, or more nondeterminism, but one that fully delineates the possible behaviors of implementations. It’s a conservative approximation, one that helps reasoning by being on the one hand simpler than its concrete instances, and on the other, sound with respect to them, meaning a property that’s true for the abstraction must be true for all implementations.

It is therefore clear that we do not want to limit the how far our abstractions stray from physical realizability, as the more abstract they become, the easier it is to prove things about them. On the other hand, we want flexibility to lower our abstractions arbitrarily, to make sure that what we want to verify is still true for them. We want our abstractions just concrete enough for the properties we’re interested in verifying are true, but not lower. The ability to ignore detail flexibly and selectively — which is at the very core of TLA+’s design — is of the utmost important in formal methods. The cost of verifying the correctness of a program — be it with automatic tools or manually — grows very rapidly with the addition of detail.

Even if we set aside concerns of formal verification, formal methods may assist greatly in understanding why an algorithm works by providing insight. Formal proof, as a side effect of perfect verification, often provides great insight. It is very hard to prove that something works without getting to the bottom of why it works. But formal proofs are often prohibitively expensive. Constructing a more abstract specification and a refinement mapping provides a great deal of insight more cheaply than formal proof.

In the next section I’ll mention an even more general description of abstraction than TLA’s, employing something called Galois connections. It is perhaps the most general concept, one which does not rely on any specific description of computation and encompasses all other particular notions of abstraction, but the general principle holds: abstraction is nondeterminism.

Once you understand this, it becomes easy to see that the very application of a (sound) mathematical description of reality is an abstraction — in this very sense — of reality, or that reality is a refinement of the description. The question of how closely should the mathematical framework match reality becomes merely one of asking at which level of abstraction do we wish to think.

Other Notions of Equivalence and Order

There are other notions of equivalence and order (abstraction/refinement) on programs and algorithms in academic literature. I will mention a few of them, and show their relationship to TLA.

Process Algebras

Around the same time Lamport and others were working on specifying systems using temporal logic, others were working on an algebraic, programming-language-like approaches called process calculi or process algebras, most notable among them were Robin Milner and Tony Hoare.

In a programming language one usually doesn’t mention the program’s state, only the actions it performs. If we want to reason about algorithms in a pseudo-programming language like a process algebra, we also need notions of equivalence and order, but things may get complicated when we don’t wish to mention state. Comparing traces of events alone (I will not call them actions to not confuse with TLA actions) for equality or inclusion yields an imprecise equivalence. For example, the following two state machines share the same set of action traces, $\set{a \to b, a \to c}$, and yet are clearly very different:

%3 A B A->B a C B->C b D B->D c
%3 A B A->B a C A->C a E B->E b D C->D c

The order relation suggested by Milner to represent implementation is called simulation. Given two event-labeled state machines, $P$ and $Q$, a simulation over $\seq{P, Q}$ is a relation $R \subseteq P \times Q$, such that for every $\seq{p, q} \in R$, if $p$ is an initial state of $P$ then $q$ is an initial state of $Q$, and if $p \xrightarrow{a} p’$ then $q \xrightarrow{a} q’$ and $\seq{p’, q’} \in R$. If such a relation exists, we say that $Q$ simulates $P$, or that $P$ is similar to $Q$. Intuitively, we can think of simulation as a game: $Q$ simulates $P$ if for every move taken by $P$, $Q$ can match it with the same move.

Simulation can be used to define an equivalence called bisimulation: if both $R$ and $R^{-1}$ (the relation with its pairs reversed) are simulations, then $R$ is said to be a bisimulation. If there exists a bisimulation between systems $P$ and $Q$, they are said to be bisimilar. Using our game analogy, $P$ and $Q$ are bisimilar if at every turn, no matter which goes first, the other can match the move. Note that if $P$ simulates $Q$ and $Q$ simulates $P$, then they are not necessarily bisimilar, as bisimilarity requires that the same relation and its inverse are simulations. This is why simulation is a preorder rather than a partial order, as it is not antisymmetric — if two processes simulate one another, they are not necessarily equivalent.

Milner’s simulation relation is sometimes also called forward simulation, and a dual notion can be defined as follows: a relation $R \subseteq P \times Q$ is a backward simulation if for every $\seq{p’, q’} \in R$, if $p \xrightarrow{a} p’$ then there exists a $q$ such that $q \xrightarrow{a} q’$ and $\seq{p, q} \in R$.

Consider the following three state machines:

%3 M1 A A B B A->B a C C A->C b D D B->D c C->D d
%3 M2 D1 D 1 D2 D 2 A A B B A->B a C C A->C b B->D2 c C->D1 d
%3 M3 A1 A 1 B B A1->B a A2 A 2 C C A2->C b D D B->D c C->D d

M1 simulates both M2 and M3. M2 forward-simulates M1 (in fact, they are bisimilar) but does not backward-simulate it, and M3 does not forward-simulate M1 but it does backward-simulates it.

How are these relations related to those of TLA? $M2 \implies \overline{M1}$ and $M3 \implies \overline{M1}$ under the refinement mappings that map states $D_1$ and $D_2$ to $D$ and $A_1$ and $A_2$ to $A$, respectively. But what about the other direction? Note that M2 is exactly M1 with a history variable added (a single state in M1 corresponds to multiple states in M2 iff it can have different histories), so $M1_h \implies M2$ (in fact, they are equivalent), and M3 is exactly M1 with a prophecy variable added (a single state in M1 corresponds to multiple states in M3 iff it can have different futures), so $M1_p \implies M3$. Forward simulation, then, corresponds to the addition of a history variable (if necessary), and backward simulation corresponds to the addition of a prophecy variable.

TLA manages to be simpler than simulation relations — as those require some structural analysis of the processes rather than just defining relation on their traces — because it mentions state explicitly.

Categorical Algorithms and Refinement Calculus

Another approach for defining equivalence relations over algorithms can be found in the 2010 paper, Towards a Definition of an Algorithm, by Noson Yanofsky. The paper is quite readable, but I’ll quickly go over the salient details. Yanofsky defines an algorithm to be an equivalence class on programs. In the paper, he uses a pseudo-programming language that is made of a tree whose nodes are the primitive operations in the definition of primitive recursive functions, and calls those trees programs. Those programs compute (primitive-recursive) functions. He then defines a multigraph of objects that are similar to states, except they don’t represent intermediate stages of computation, but inputs and outputs. The programs are the edges of the graph, connecting inputs to outputs.

He then notes that the multigraph is transitive: if there is a program $f$ connecting node $a$ to node $b$, and program $g$ from node $b$ to node $c$, then there must be a program $f \circ g$ from $a$ to $c$. However, this multigraph is not a category because it is not associative: the two programs $h \circ (g \circ f)$ and $(h \circ g) \circ f$ are represented by edges connecting the same node, but as the two programs (i.e., trees) are not the same, the two edges are distinct. He then goes on to define a series of equivalence relations on programs (trees) which he says preserve their essence, and shows that if we take a quotient of the program graph with respect to the equivalence relations, we get a category of algorithms.

Yanofsky’s approach only covers equivalence, not order, but the three concentric equivalence relations he defines — on programs, algorithms and functions — essentially create an order consisting of exactly three tiers. Programs refine algorithms, and algorithms refine functions, but there is no general order relation that puts all three levels on some continuous spectrum, as seen in this image from the paper:

Programs, Algorithms and Functions

The reason Yanofsky needs to define programs structurally (or intensionally), and only then algorithms as a quotient of programs with respect to structural equivalence, rather than extensionally, by how they behave, is due to the original sin of those approaches that view programs (or algorithms) as denoting functions rather than discrete dynamical processes11.

A similarly structural approach, but one that does create an order relation on programs, is Ralph-Johan Back’s refinement calculus and Joakim von Wright’s refinement algebra. The idea behind refinement calculus is defining some transformations that make a program more refined or abstract. Refinement algebra is surprisingly similar to the ideas we’ve seen in TLA, only expressed as an algebra rather than a logic, and it does not extend to concurrent algorithms. Back and von Wright wrote a book together about their approach that is freely available online. Refinement algebra/calculus is similar in some ways to TLA (sometimes very similar), but is less universal and more complicated; on the other hand, it can — at least in principle — be applied directly to programs written in a programming language. This is another example of how much expressive power a simple formalism like TLA can gain by giving up on attempting to directly talk about programming languages.


Typed programming languages have not one but two notions of implementation order: subtypes and type inhabitants (instances). A type is an abstraction of both its subtypes as well as its concrete instances; conversely a subtype is a refinement of its supertypes, and a value is a refinement of its type. While we can create any hierarchy of abstraction relations using subtypes, we cannot directly show that one program — a type inhabitant — implements another, even in formalisms with dependent types; a program is a leaf in the order relation12. If our types are rich enough, we can simulate such a relation by writing our algorithm at the type level, only then it is no longer a program but a type. Equivalence in those formalisms is also complicated.

TLA makes no such distinctions between types (properties) and their inhabitants (algorithms). Both are just formulas — or, semantically, sets of behaviors — and are part of the same abstraction/refinement partial lattice, that of implication (or set inclusion). There are just different levels of detail.

The semantics of TLA is too simple to allow making any useful distinction between denotation13 and operation — an interesting distinction in the theory of programming languages — but as we’ve seen, it is a simple matter to hide any detail we deem irrelevant for the sake of a particular abstraction/refinement relation and choose the precise level of detail for the sake of comparisons, one of which may correspond exactly to the level of detail provided by your programming language of choice, be it SQL, Java, assembly, or digital circuits.

Of course, types are normally used for the more difficult task of specifying an actual runnable program, and they can potentially be used for some advanced techniques such as proof-carrying code. The distinction between types and their inhabitant programs is necessitated by the need to ensure programs are sufficiently detailed to be compilable to efficient executables. Types also have benefits beyond specification (for which, I believe, other approaches such as contracts are more appropriate) such as assisting with tooling, maintenance and code organization.

Galois Connections

All specific definitions of abstraction/implementations we’ve seen — TLA refinements, process simulations, subtyping and type inhabitants — are encompassed by the very general, very abstract concept of a Galois connection. A clear introduction to abstraction relations as Galois connections can be found in Binary Relations for Abstraction and Refinement by David A. Schmidt. The discussion in the paper (except for the last part, which specifically deals with temporal logic) is completely detached from any particular formalism, and helps to think of programs abstractly and mathematically, detached from their expression in any particular formalism. Another introduction to Galois connections and their power, although less in the context of algorithms, can be found in Peter Smith’s, The Galois Connection between Syntax and Semantics. I will not attempt to discuss Galois connections in any depth, just quickly define them and show their relevance to the subject of abstraction and refinement in the context of TLA.

If $\mathcal{P} = \seq{P, \preceq}$ and $\mathcal{Q} = \seq{Q, \sqsubseteq}$ are posets (partially ordered sets), and , , are functions, then the pair is a Galois connection if for all : iff . is then called the lower, or left, adjoint (of the connection), and the upper, or right, adjoint. and uniquely determine each other; setting one forces the other.

In the context of TLA, if we have a refinement mapping such that $F \implies \overline{G}$, we can call that mapping, when seen as a function on complete behaviors, $f$. We then we take $P$ and $Q$ to be the powersets14 of the behaviors of $F$ and $G$ respectively, and define as a function on sets, such that it applies $f$ element-wise to every behavior in the a set. is then a function from $P$ to $Q$, mapping the elements of $P$ — subsets of the behaviors of $F$ — to elements of $Q$ — subsets of $G$’s behaviors. We define to be the preimage of $f$. We get a Galois connection if we take the partial orders on both posets as just set inclusion.

Galois connections serve as the basis for abstract interpretation, a central concept in the theory of program analysis, developed by Patrick and Radhia Cousot in the 1970s. In TLA terms, abstract interpretation is a process by which the more abstract specification, $G$, is automatically generated from the more concrete one, $F$. Abstract interpretation places different kinds of programs semantics on a spectrum of abstraction, from behaviors to functional denotational semantics. Many ideas in program analysis, including type checking, model checking and deductive proofs can all be viewed as forms of abstract interpretation. In TLA+ this idea is made explicit and formally manipulable, though less general.

TLA manages to express equivalence and order extensionally — based on what the program does, not how it is constructed — rather than structurally, like the process calculi, refinement calculus and Yanofsky’s categories. And yet, unlike types, TLA is able to express arbitrary level of detail because of two features: it treats algorithms as denoting behaviors rather than functions, and it explicitly models state. The structural refinement calculus/algebra is similar in its expressiveness as far as sequential algorithms are concerned.

Parameterization vs. Nondeterminism

In part 1 I mentioned a claim made that TLA+ is not higher-order, suggesting it cannot easily describe higher-order algorithms. While I think that by this point it should be clear how universal TLA+ is, the point is still worth revisiting, if only to discuss what Lamport calls the “Whorfian syndrome” — the confusion of language with reality.

Whether a program is “higher-orpder” or not is not a property of what it does, but how it is described; in other words, higher-orderness is a property of a formalism, not of reality. Just as continuous dynamical systems that are themselves modified by other dynamical systems are usually not expressed as higher-order ODEs but as first-order ODEs of a higher state dimension, so too whether or not algorithms that are modified by other algorithms are expressed as higher-order constructs depends on the formalism, and in TLA, every composition of algorithms can be expressed as a conjunction of formulas (and adding variables, possibly hidden, to describe the state of each component; i.e., increasing the dimension of the state).

Let’s consider higher-order functions, in the functional programming sense. A higher-order function is a program (subroutine) that is parameterized by another program. We’ve seen how that could be expressed using a conjunction of formulas, but before we get back to that we’ll look at abstracting the “higher-order” parameter as a function — which is justified by an abstraction relation.

Here is one formulation of the flatmap operation familiar to functional programmers. Normally we wouldn’t specify such a simple operation as an algorithm but rather abstract it as a function or operator, but for the sake of discussion:

(we could have made $f$ a parameter of the module itself, with $\CONSTANT f$ and $\ASSUME f \in [S \to Seq(T)]$).

The following theorem expresses a very simple type property:

It is expressed analogously to how it would be expressed in a typed programming language: with a universal quantifier over the inhabitants of the input types. However, we could express the algorithm and theorem equivalently as follows:

You can think of the former formulation as a description of a single, parameterized, computation, and of the latter as a description of all possible computations or as a nondeterministic algorithm, but the theorem is equivalent in both formulations. Nondeterminism is just a different expression of parameterization.

From a formal logic perspective, the difference is one between a formalism that requires all variables to be bound (by quantifiers or lambda expressions) and one that allows free variables. For example, in the latter, instead of writing $\A x \in Int : x > 0 \implies -x < 0$ (all variables are bound) you can write $x \in Int \land x > 0 \implies -x < 0$, and instead of $\A y \in Real : (\E x \in Real : x > 0 \land y \geq x) \implies y > 0$, you can write $x \in Real \land y \in Real \land x > 0 \land y \geq x \implies y > 0$ .

Programming languages are formalisms that generally require all variables to be bound. This requires higher-order constructs where a logic allowing free variables doesn’t. Free variables are just how nondeterminism is expressed syntactically.

But you may object and say that my trick of abstracting the subroutine parameter as a function — while justified as a valid abstraction — is sidestepping the issue. In TLA+, algorithms (such as subroutines) are temporal formulas, whereas functions are simple values no different from integers. While abstraction allows us to evade the issue of “higher-order” algorithms, is the theory able to tackle them directly?

The answer is yes. Suppose we have the specification $Spec(G) \defeq F \land G$, which is parameterized by the algorithm specification $G$ (e.g. $G$ can be a subroutine like we saw above, or a concurrent process that interacts with $F$). If in a typed language we wanted to prove that $Spec(G : Q) : P$, or that for $G$ of type $Q$, $Spec$ is of type $P$, then in a logic like TLA we would think to express the same proposition like so:

However, the above is not a TLA formula: the data logic of TLA+ allows us to quantify over constants (i.e. non-temporal variables), and TLA additionally allows us to quantify over temporal variables, but $G$ is neither — it is a temporal formula. The model of a TLA formula is a set of behaviors, but an algorithm (or a property) is not a TLA object and we cannot quantify over it (nor have a free variable that denotes a set of behaviors). The formula above is a second-order proposition, as it universally quantifies over $G$, an algorithm (a set of behaviors).

However, as we’ve seen, while all TLA formulas are first-order, the TLA+ proof language does allow us to write second-order propositions that are not themselves formulas. We can therefore write the proposition like so:

But this is both overkill and cannot be checked by the TLC model checker. More importantly, the proposition is a very mundane, reasonable one about algorithm correctness, and it stands to reason that a good formalism would allow us to directly specify it as a formula.

As it turns out, this is easy. Because tautologically $\A A : (A \implies B) \implies (A \implies C)$ is equivalent to $B \implies C$15, the above is equivalent to the much simpler, first-order proposition:

which we can write as a TLA formula. Moreover, as $F$ is presumably written in normal form and $Q$ is likely a simple safety property, i.e., an invariant, $F \land Q$ could be easily rewritten as a single, normal form formula, which would allow us to check that it satisfies $P$ using the model checker.

We are saved, again, by nondeterminism. TLA does not distinguish between algorithms and properties of algorithms (or types) — they’re both just formulas, or semantically, sets of behaviors. We don’t need to quantify over all algorithms $G$ that satisfy the property $Q$; we just write $Q$.

Various Cool Stuff

Before concluding this treatment of the theory and design of TLA+, I want showcase some uses of TLA+ that you’re unlikely to encounter when modeling “ordinary” software but are of great importance for special classes of software, as they demonstrates the power of the formalism. Rather than a thorough treatment, I will just give a taste of what’s possible.

Real Time

TLA+ can be used to model all kinds of systems, among them realtime systems. The following example is taken from Specifying Systems.

Let’s revisit our specification of the hour clock from part 3:

While this describes the behavior of an hour clock at the level of detail that may be of interest to some, it says nothing about the actual time duration between ticks, which may be of great interest to others. To model real time, we do what scientists normally do and introduce a real-valued variable, $t$ to represent it. The variable $t$ (in second units) will change between ticks of the clock, but not during the tick — we’ll assume that the tick is instantaneous (this is just a modeling choice). If we wanted to implement our clock in a realtime system, we would want to specify that it ticks approximately every hour, allowing for some margin of error, which we’ll call $k$. We’ll introduce a variable, $elapsed$ to represent the elapsed time between now and the last tick:

We can model the requirement that no more than $3600 + k$ elapse between ticks like so:

We also want to specify that at least $3600 - k$ seconds elapse between consecutive ticks:

(remember, an action can only follow $\Box$ within $[]_e$ to ensure invariance under stuttering; in this case $e$ is just $h$ because we don’t care about actions that don’t change the value of $h$).

We can now write the specification for our realtime hour clock:

Finally, we need to specify how $t$ can change. Time can move forward by any amount as long as $h$ doesn’t change (the tick is instantaneous):

The safety part of the time specification would then be $(t \in Real) \land \Box[TNext]_t$, but what about liveness? Requiring that $TNext$ would occur infinitely often, thus ensuring $t$ increases indefinitely, is not enough. Because $t$ is a real-valued variable, even though $TNext$ requires that $h$ is unchanged and therefore $t$ cannot grow by more than $3600 + k$ at a time, $TNext$ can still occur infinitely often, and $t$ grow indefinitely while never reaching $3600 + k$ and the clock ticking even once. This is what’s known as a Zeno behavior, after Zeno’s paradox of Achilles and the tortoise. To rule it out, and to ensure $t$ grows without bound, we’ll use the following fairness condition

which is equivalent to $\A r \in Real : \Diamond(t’ > r)$ but makes it clear that this is indeed a fairness property. We can now finish the time-flow part of the specification:

The complete specification will then be:

It is also true that $RTHourClock \implies HourClock$; our realtime clock implements the less detailed $HourClock$ specification, which serves as an abstraction of the realtime clock.

Specifying Systems shows how we can define just two general realtime operators that can then be conjoined with an untimed specifications to create a realtime specification. The two operators are responsible for the two roles we’ve seen: specifying that actions happen within a certain time threshold from a deadline constraint, and specifying that time grows unbound. For even more information on using TLA+ for specifying realtime systems, see Lamport and Abadi’s 1993 paper, An Old-Fashioned Recipe for Real Time, or Lamport’s 2005 paper, Real Time is Really Simple.

Cyber-Physical Systems

Cyber-physical systems, also called hybrid systems, are discrete systems that interact with the physical world through sensors and/or actuators. While the digital system is specified as a (usually realtime) algorithm, the physical world it interacts with is best modeled as a continuous dynamical system.

The idea of how to do this is very similar to the one used in our realtime specification, only instead of a single continuous variable representing time, we have other variables representing continuous quantities that continuously change over time. The following example is also taken from Specifying Systems:

Consider a cyber-physical system where a discrete switch affects the behavior of some object moving along a single dimension. We will denote the object’s position as $x$, and its motion is described by the equation:

where $c$ and $k$ are some constants, $f$ is some continuous function, $t$ is time, and $\dot x$ and $\ddot x$ are the first and second derivatives, respectively, of $x$ with respect to $t$. In TLA+, we can write it as the equation $D[t, x, \dot x, \ddot x] = 0$ where $D$ is:

We then define an operator, $Integrate(t, D, t0, \seq{x_0, \ldots, x_n-1})$ whose value is the value at time $t$ of the tuple $\seq{x, \dot x, \ldots, \overset{\scriptscriptstyle (n-1)}{x}}$, the time-derivatives of $x$ that form the solution to the ordinary differential equation $D[t, x, \dot x, \ldots,\overset{\scriptscriptstyle (n-1)}{x}] = 0$, and $\seq{x_0, \ldots, x_n-1}$ are the values of those derivatives at time $t0$. You can find the definition of $Integrate$ on page 178 of Specifying Systems. The partial specification (showing only the continuous part) of our hybrid system, where $v$ stands for the particle velocity $\dot{x}$, is:

For an example of a hybrid specification with a correctness proof, see Hybrid Systems in TLA+.

A simplified version of TLA that is not stuttering-invariant, basically rTLA from part 3, was implemented in Coq as part of the VeriDrone project to reason about quadcopter controllers precisely in this manner.


A metabolic network can also be easily modeled as a nondeterministic state machine. For example:

We can then state propositions about the system, such as that it has three static attractors:

Of course, we can make much more complex propositions, such as that the system has no more than two periodical attractors of period less than 100.

See A Computational Framework For Complex Diseases for a brief mention of TLA in this context.


The goal of this series was to show why a mathematical specification of software systems, rather than a programming language-based one, is often a good choice for reasoning about systems as it is both very powerful and relatively simple, how TLA+ allows such mathematical reasoning, and how this form of reasoning provides insight into similarities between concepts and algorithms that are often obscured in programming languages.

However, I think that the choice of a general, ordinary math formalism for reasoning about algorithms is not enough in itself to rid us of the Whorfian syndrome, as that syndrome has two components. The first is due to the choice of formalisms, like those based on programming languages, that are intentionally restrictive, either for aesthetic or technical reasons, or both. This effect can be seen in a blog post by an accomplished engineer:

Over the years I’ve seen a few people argue that there’s something fundamentally wrong with the notion of the algorithm because it doesn’t apply to the kind of open-ended loop we see in operating systems and interactive applications. Some have even gone further to suggest that somehow mathematics and computer science are fundamentally different because mathematics can’t seek to describe these kinds of open-ended phenomena… [N]ot only are there nice ways to look at open-ended computations, but from a mathematical perspective they are precisely dual, in the category theoretical sense, to terminating computations. It may be true that mathematicians sometimes spend more time with things, and computer scientists with cothings. But this really isn’t such a big difference and the same language can be used to talk about both… [I realised] there was this entire co-universe out there…

By “co-universe”, he’s referring to pure-FP’s way of modeling infinite data streams and infinite computations. Clearly, “this entire co-universe” is not really “out there”, but very much deep inside his chosen formalism, one that chooses (for historical and aesthetic reasons) computable functions as the fundamental building blocks of computation. The notion of co-data or co-computation, doesn’t really provide any fundamental insight into nonterminating computations, but only into their description in that specific formalism. Bartosz Milewski — a physicist, no less! — also repeats this sentiment in his lectures on category theory in the context of functional programming, saying, “time is really hard to describe in mathematics”. But mathematics has no difficulty whatsoever in expressing time or “open-ended” discrete systems any more that it finds it difficult to describe open-ended continuous systems. As should now be clear, computations of all kinds are easily and elegantly, expressed in mathematics as infinite sequences of states. The difficulty arrises only when using specific formalisms that explicitly choose to disregard time. Whether or not that is a useful simplification is a separate discussion, but the difficulty is not with mathematics.

Formalisms are important, but their importance cannot be properly appreciated if one does not distinguish between insights a formalism provides into its own operation and the far more valuable insights a formalism offers that help discover features of reality. An ideological, almost religious, adherence to specific formalisms, seems to have been common among some logicians almost hundred years ago, long before it became common among programmers; that ideological perspective was no more conducive to understanding mathematics back then as it is for understanding computation now16. In this sense, the Whorfian syndrome is a form of an ideological conviction that blinds more than it enlightens.

But the Whorfian syndrome has another component, one that cannot be avoided even by choosing a very general, relatively simple, mathematical framework such as TLA for reasoning about computation. All formalisms are merely descriptions of computations. Computation is neither a “thing” nor a “cothing”; it isn’t an infinite sequence of states, either, but a discrete dynamical system that we can choose to describe and denote in many different formalisms, picking the one that is most convenient for the task at hand. Whichever language we choose to describe reality, we run the risk of confusing the description with reality. Understanding abstraction, and how it also relates a description of reality to reality helps, but it also takes practice.

This is a post to the TLA+ mailing list, where Lamport discusses this more general Whorfian syndrome in TLA+

You originally wrote:

My conundrum is linked to the fact that if the action is taken the primed variable equations have immediate effect, that is the state transition is instantaneous.

An action is a formula. It makes no sense to say that a formula has immediate effect. Does the formula 2+3=5 have immediate effect? That no one else has pointed this out shows that many people don’t appreciate the distinction between math and reality.

I have a digital clock on my desk. If I cover part of the display, the clock behaves as follows. It shows a pattern of light and dark regions. For about an hour, that pattern remains quite stable, with barely perceptible changes in light intensity. Then, for a fraction of a second, the pattern fluctuates wildly. It then reaches a different stable pattern. About an hour later, there is another brief period of wild fluctuation followed by another stable pattern. And so on. My brain has learned to recognize the stable patterns as numbers and to ignore the brief fluctuations. When I call the device on my desk a digital clock, I mean that I can think of it as showing the number 22 for about an hour, then the number 23 for about an hour, then the number 0, etc. This allows me to describe a behavior of the clock as a sequence of states — for example, a sequence starting with a state having x=22, followed by a state having x=23, etc. I can describe the set of all such state sequences by a TLA+ formula.

That same TLA+ formula can be interpreted to describe other things. For example, I called the elements of the sequence “states”. I could just as well have called them “events” and have interpreted the sequence as describing a system that releases a sequence of hot air balloons with numbers painted on them. The “event” with x=22 describes the release of a balloon with the number 22 on it. I could also interpret the state sequence as describing a clock falling into a black hole in an imaginary universe, where the clock speeds up as it falls, displaying an infinite sequence of numbers in the few seconds that it takes to reach the event horizon (from the point of view of an external observer). These are perfectly valid interpretations of the behaviors described by the TLA+ formula that I use to describe my clock. However, they are perfectly irrelevant to my clock specification. For that specification, the only important question is: does it provide a useful description of my clock? And the answer to that question depends on what I want to use the description for. It is useful only for applications in which the fact that the states change about once per hour is unimportant…

When people get used to describing systems mathematically in a certain way, they tend to confuse the mathematical description with the system.

I think that physicists and engineers of physical systems are very adept at describing reality with mathematics, and understanding the difference between them. Mathematicians don’t directly deal with reality, and programmers are used to thinking that their formal descriptions create reality. Even though TLA+ makes the very notion of levels of description through abstraction and refinement and hiding of variables, it still takes practice until one is comfortable with modeling reality with mathematics. But I can think of no better tool than TLA+ to afford programmers the practice required in mastering this important skill.

Discuss on Reddit

  1. It is said that math is narrow and deep, while formal verification is wide and shallow. 

  2. The model checker TLC does break the encapsulation, and $v.x$ would mean 4.5. However, that’s an implementation detail of the checker. The encoding is hidden from the proof system. 

  3. Unsurprisingly, for verification techniques like property-based randomized testing, model-based specifications are the most effective

  4. Because we’ve normally seen implication used in propositions, I will repeat the distinction I mentioned in part 2. By $E \implies M$ we do not mean the proposition $\vdash E \implies M$, i.e. that the environment implements the machine specification, but that the specification of the contract is itself is $E \implies M$. To explain the difference consider the logical formula over the integers $(x > 0) \implies (x > 2)$. This formula is not valid, i.e. $\nvdash (x > 0) \implies (x > 2)$ . But it does describe — or, its model is — all numbers that are either negative or greater than 2, or formally $\vdash ((x > 0) \implies (x > 2)) \implies (x < 0 \lor x > 2)$. 

  5. And long before that. In 1945 Alan Turing described how programs in his proposed ACE computer will be composed of subroutines, employing a stack and push/pop operations (which he called bury/unbury), before the first computer was ever built. Sadly, his design was never realized. 

  6. There have been two academic projects that do just that for C programs and for Java bytecode, although not by employing such “universal” — and inefficient — composition, but rather more direct means of translating expression. In fact, PlusCal[^pcal] can be thought of as a simple programming language that is easily translated to TLA. See Lamport’s, The PlusCal Algorithm Language, 2009, and the extended version of PlusCal described in Sabina Akhtar, Stephan Merz, Martin Quinson, A High-Level Language for Modeling Algorithms and their Properties, 2010. 

  7. Of course, if the language is not deterministic, some comoposition constructs may decrease determinism rather than increase it, and so connectives other than conjunction would be used to model the composition in TLA+

  8. With the important caveat that a machine-closed formula may be equivalent to a non-machine-closed one, as we’ve seen in part 3. However, we should only consider machine-closed formulas to be descriptions of algorithms. 

  9. Pure functional languages treat programs as something similar to functions. Programs and subroutines in pure functional languages that don’t enforce totality are partial constructive functions, which resemble the classical notion of functions in some senses: in that they define a one-valued mapping between objects and in their extensional notion of equality. They differ from ordinary functions in that they must be computable (i.e. constructive) and may not map an object to every object in their domain type (i.e. partial). 

  10. I believe that as the universe is a proper class rather than a formal set, it does not precisely form a boolean algebra but a relatively complemented lattice. But this is an unimportant detail. 

  11. It sometimes seems like endless ideas employing novel mathematics are employed just to get around the problems imposed by this original choice that predates even the precise descriptions of computation in 1936, and appears to have no justification today other than the aesthetic desire to unify constructive mathematics and programming. Then again, great technological breakthroughs have been known to occasionally come of seemingly impractical research. 

  12. I’ve been told that in the computational type theory of PRL, programs can refine one another. 

  13. A good introduction to what denotation means in programming language theory can be found in Chapter 1, “Sense, Denotation and Semantics”, of Jean-Yves Girard’s Proof and Types

  14. Again, I’m informally ignoring that the sets of behaviors may not be actual sets. 

  15. That is easy to see if we recall that implication is an order relation, and $a \leq b \implies a \leq c$ iff $b \leq c$. 

  16. As Turing himself strongly believed. See Juliet Floyd, Turing on “Common Sense”