The Church Encoding of Simple Algebraic Data Types

Representing data as functions

01 Dec 2021

When first learning about the lambda calculus, students are frequently introduced to Church numerals and Church-encoded booleans. These enable the expression of simple data types as pure functions. When I was first learning about them these encodings felt arbitrary, because I did not understand that they were expressions of a more general underlying principle. This post explores this more general form: the Church encoding of algebraic data types. To keep the length of this post manageable, only non-recursive algebraic data types will be covered. Recursive data types, such as lists and the natural numbers, will be covered in a future post. This post will cover the following topics:

The expression of booleans and tuples as algebraic data types
How to encode arbitrary non-recursive algebraic data types into the lambda calculus
Some potential benefits of using Church encodings in practical TypeScript development

This post assumes basic knowledge of algebraic data types and the lambda calculus.

Examples will be given in TypeScript with a tiny sprinkling of traditional lambda calculus syntax. The abbreviation “ADT” will be used to mean “algebraic data type”, not “abstract data type”.

Booleans as algebraic data types #

In TypeScript, we use a built-in primitive type for boolean. However, booleans may be expressed as algebraic data types as well. They are one of the simplest ADTs, a choice between two variants which each carry no additional data. Here’s booleans as an ADT in TypeScript:

type ADTBoolean = True | False;

type True = { tag: "true" };
type False = { tag: "false" };

We can use our ADT booleans in the same way as built-in booleans, with just a bit of added verbosity. For example, let’s declare some basic logical functions on booleans:

function and(left: ADTBoolean, right: ADTBoolean): ADTBoolean {
    return left.tag === "true" ? right : { tag: "false" };
}

function or(left: ADTBoolean, right: ADTBoolean): ADTBoolean {
    return left.tag === "true" ? { tag: "true" } : right;
}

These show the basic form of interacting with an ADT: when we interact with the value of an ADT, we inspect its tag, and then perform branching logic based on the tag. An ADTBoolean has two variants, so when we consume it we will specify logic in two branches. In this case, these branches correspond to True and False.

We can also move between our ADT and primitive representations easily:

function adtEncodeBoolean(bool: boolean): ADTBoolean {
    return bool ? { tag: "true" } : { tag: "false" };
}

function adtDecodeBoolean(bool: ADTBoolean): boolean {
    return bool.tag === "true" ? true : false;
}

The ability to losslessly map booleans between their primitive form and their ADT form tells us that these two forms are isomorphic, and contain equivalent information. In many functional languages booleans are represented as ADTs rather than as special-cased, built-in data structure.

Church-encoded booleans #

I think of the connection between ADTs and Church encodings as the connection between what data is on the inside and how data can be used by pure functions from the outside. When we branch on the tag of an ADT we’re checking what the data is. As an alternative, let’s focus on how we use the data, by specifying the contents of the branches without inspecting the data directly. Here’s a type that will let us do just that:

type ChurchBoolean = <A>(choices: { tru: A, fals: A}) => A;

function churchEncodeBoolean(bool: ADTBoolean): ChurchBoolean {
    return bool.tag === "true" ? ({ tru, fals }) => tru
    :                            ({ tru, fals }) => fals;
}

This is a Church encoding of booleans, albeit a slightly nontraditional one. A Church boolean is a function which takes two arguments¹, one for “true”, one for “false”, and then returns the one which matches the boolean’s “internal value”. Note that this “internal value” is encapsulated: We can’t inspect a Church-encoded boolean to see whether it’s true or false. We can only apply it to arguments and see how it behaves.

Note that the generic argument A is on the right side of the = in the type definition, not on the left. This is important, but unfortunately a full explanation is beyond the scope of this post.²

Here’s some sample functions on Church booleans:

const tru: ChurchBoolean = ({ tru, fals }) => tru;
const fals: ChurchBoolean = ({ tru, fals }) => fals;

function churchAnd(
    left: ChurchBoolean,
    right: ChurchBoolean): ChurchBoolean {

    return left({
        tru: right,
        fals
    })
}

function churchOr(
    left: ChurchBoolean,
    right: ChurchBoolean): ChurchBoolean {

    return left({
      tru,
      fals: right
    });
}

// Contrived example of how these might be used
const oneIsOdd = tru;
const twoIsOdd = fals;

// Using a Church boolean to create a string
const message = churchAnd(oneIsOdd, twoIsOdd)({
  tru: "both 1 and 2 are odd numbers",
  fals: "at least one of 1 or 2 are not an odd number"
});

// We can return other types from a Church Boolean as well
const undefinedIfOneIsOdd: undefined | number = oneIsOdd({
  tru: undefined,
  fals: 1
})

// Show that we can turn our Church boolean back into an ADT
// without losing information, which establishes that they are
// equivalent.
function churchDecodeBoolean(bool: ChurchBoolean): ADTBoolean {
    return bool({
        tru: { tag: "true" },
        fals: { tag: "false" }
    });
}

First, we have the functions tru and fals, which are our basic building blocks, representing the ADT variants True and False. We use these in our implementation of churchAnd and churchOr. With these, we can create and combine encoded boolean values and branch on them by providing one value for the “true” branch and one for the “false” branch.

The use of Church booleans to compute our message value shows that Church booleans are like “an if statement as a function”. Rather than inspecting a boolean in an if statement and assigning a value in each branch of the if, we call the Church boolean as a function and pass in the value corresponding to each branch.

Finally we have churchDecodeBoolean, which, together with churchEncodeBoolean, shows that our ADTBoolean type and our Church-encoded booleans are isomorphic. Anything we can do with one, we can do with the other, because we have the ability to move between these two encodings as needed.

We can generalize the approach we took with booleans to more complex types. This gives us rule #1 for Church-encoding ADTs:

To Church-encode an algebraic data type with multiple variants, create a function which accepts an argument for each variant. The function must return the argument corresponding to the variant which the function encodes.

Caveats on my approach #

Previously I described these encodings as “nontraditional”, and here’s why. The standard Church encodings of booleans in the (untyped) lambda calculus are as follows:

true: λt.λf.t
false: λt.λf.f

In TypeScript, these would look like this:

type ChurchBoolean = <A>(t: A) => (f: A) => A;

const tru: ChurchBoolean = <A>(t: A) => (f: A) => t;
const fals: ChurchBoolean = (t: A) => (f: A) => f;

Traditionally, Church booleans are fully-curried functions, as is standard for the lambda calculus. In the direct TypeScript encoding of this style tru takes two values and returns the first, while fals takes two values and returns the second. In contrast, I’ve taken the approach of using an object to simulate named arguments. It is trivial to map between these two approaches, and they both serve the purpose of representing data types as functions. However, in practical TypeScript development the use of positional parameters is less explicit and is more likely to cause confusion and maintainability issues. As we’ll see later, these issues get worse as the data types involved get more complex. Because the “named argument” approach still captures the essential aspects of a Church encoding, I consider it valid to refer to it as such.

A lambda calculus purist might also express concern over the application of Church encodings to data types other than functions, like how we called oneIsOdd with a number and undefined as arguments. While it is true that such data types do not exist in the most basic form of the lambda calculus, it is common to enrich the lambda calculus with additional data types when putting it to practical use. This does not compromise the calculus or its patterns in any way. It can be useful to treat a functional subset of JavaScript/TypeScript as one such enriched lambda calculus, and this is the approach I take here.

With these considerations out of the way, let’s continue to Church-encoding more complex data types.

Church-encoded tuples #

ADTs consist of sum and product types. We’ve covered the simplest sum type, Booleans. Now we’ll look at one of the simplest product types, the tuple. A tuple (or 2-tuple, to be very precise) is the simultaneous combination of two component data types. Here’s what a tuple might look like as an ADT:

type Tuple<A, B> = {
    tag: 'tuple',
    first: A,
    second: B
}

The Tuple type is generic, because it can be used to package any two types, without restriction. We can, for example, create a Tuple of numbers and strings:

const pair: Tuple<number, string> = {
    tag: 'tuple',
    first: 1,
    second: 'hello world'
}

Once you have a tuple, what can you do with it? You can take the values out of it and use them to compute a new value. Here’s a type and function for doing just that:

type ChurchTuple<A, B> = <C>(fn: (first: A, second: B) => C) => C;

function churchEncodeTuple<A, B>(tuple: Tuple<A, B>): ChurchTuple<A, B> {
    return fn => fn(tuple.first, tuple.second);
}

Here’s some examples comparing the use of ADT-based and Church-encoded tuples.

// Convenience functions, for readability
function adtTuple<A, B>(first: A, second: B): Tuple<A, B> {
    return {
        tag: 'tuple',
        first,
        second
    };
}

function churchTuple<A, B>(a: A, b: B): ChurchTuple<A, B> {
    return function(fn) {
        return fn(a, b);
    }
}

const fourFiveAdt = adtTuple(4, 5);
const fourFiveChurch = churchTuple(4, 5);

const twenty = fourFiveAdt.first * fourFiveAdt.second;
const twentyAgain = fourFiveChurch((first, second) => first * second);

const fortyFive = `${fourFiveAdt.first}{fourFiveAdt.second}`;
const fortyFiveAgain = fourFiveChurch(
    (first, second) => `${first}${second}`);

We use an ADT tuple by looking inside of it and pulling its values out; we use a Church tuple by passing it a function specifying what it should “do” with its contents, and let the tuple itself compute and return the result.

As before, we can decode a Church tuple, to get back an ADT tuple:

function decodeChurchTuple<A, B>(tuple: ChurchTuple<A, B>): Tuple<A, B> {
    return tuple((first, second) => ({
        tag: 'tuple',
        first,
        second
    }));
}

This shows us that our types are equivalent. Tuples give us rule #2 for Church-encoding algebraic data types:

To Chuch-encode a product type, create a higher-order function which takes a second function as an argument. This second function should accept multiple arguments, one for each field in the product type. The higher-order function must call the provided function with the encoded arguments and return the result.

Putting the pieces together: Slaying a UI antipattern with Church encodings #

We’ve seen the rules for representing sum and product types via Church encodings. In this section we’ll use both of them to Church-encode a more complicated data type, and we’ll see how this can help us solve practical problems. Our example will come from the blog posts How Elm Slays a UI Antipattern and Slaying a UI Antipattern in FantasyLand.

Say that we have a UI which will display a user’s country, represented as a string. Our UI needs to make a request to the backend to retrieve this string, and needs to accurately reflect the state of this request when doing so. There are four states such a request might be in:

Not yet made
In progress
Succeeded with a result
Failed with an error

We can represent these possibilities with an ADT. Here’s our type:

type RequestState =
    | NotYetSent
    | InProgress
    | Succeeded
    | Failed;

type NotYetSent = { tag: "NotYetSent" };
type InProgress = { tag: "InProgress" };
type Succeeded = {
    tag: "Succeeded",
    result: string
}
type Failed = {
    tag: "Failed",
    errorCode: number,
    errorMessage: string
}

These ADTs are the sum type of four case types. The “Succeeded” and “Failed” types are product types, which carry additional data.

When we define the UI displaying the user’s country, we need to provide a definition for each of these four cases. If we forget one, we’ve fallen into the antipattern of mis-communicating the state of the request. We could us an if statement, and check in-band which state we’re dealing with, but in TypeScript it’s easy to forget to cover one of the four cases. Instead, we can Church-encode our data type and see how that changes things.

Recall our two rules for Church-encoding ADTs:

To Church-encode an algebraic data type with multiple variants, create a function which accepts an argument for each variant. The function must return the argument corresponding to the variant which the function encodes.
To Chuch-encode a product type, create a higher-order function which takes a second function as an argument. This second function should accept multiple arguments, one for each field in the product type. The higher-order function must call the provided function on the encoded values and return the result.

Let’s use these to work out the type of our Church-encoded RequestState ADT.

First, we’ll encode the outer sum type:

type ChurchRequestState = <A>(choices: {
    notYetSent: A,
    inProgress: A,
    succeeded: _,
    failed: __   
}) => A

Rule #1 tells us the outer shape of this function. notYetSent and inProgress are variants of the union type, so they get the generic type A. However, if we were to use A as the type of succeeded or failed then our type would lose the information that there is additional data stored in the Succeeded and Failed ADT variants. We would no longer have a proper encoding, because we wouldn’t be able to turn our Church-encoded function back into an ADT value. To capture this information, we need to use rule #2 to encode these two variants as product types:

type ChurchSucceeded = <A>(fn: (result: string) => A) => A;
type ChurchFailed = <A>(fn: (errorCode: number, errorMessage: string) => A) => A;

Now, we can plug these into the sum type where they belong:

type ChurchRequestState = <A>(choices: {
    notYetSent: A,
    inProgress: A,
    succeeded: (result: string) => A,
    failed: (errorCode: number, errorMessage: string) => A
}) => A

To demonstrate that our encoding carries equivalent information to our original ADT, here are the corresponding encode and decode functions:

function churchEncodeRequestState(
    requestState: RequestState
    ): ChurchRequestState {

    return function<A>(choices: {
        notYetSent: A,
        inProgress: A,
        succeeded: (result: string) => A,
        failed: (errorCode: number, errorMessage: string) => A
    }): A {
        if (requestState.tag === "NotYetSent") {
            return choices.notYetSent;
        } 

        if (requestState.tag === "InProgress") {
            return choices.inProgress;
        }

        if (requestState.tag === "Succeeded") {
            return choices.succeeded(requestState.result);
        }

        return choices.failed(
            requestState.errorCode,
            requestState.errorMessage);
    }
}

function churchDecodeRequestState(
    churchRequestState: ChurchRequestState): RequestState {

    return churchRequestState<RequestState>({
        notYetSent: { tag: "NotYetSent" },
        inProgress: { tag: "InProgress" },
        succeeded: result => ({
            tag: "Succeeded",
            result
        }),
        failed: (errorCode, errorMessage) => ({
            tag: "Failed",
            errorCode,
            errorMessage
        })
    });
}

In our hypothetical frontend application, we could make our HTTP request method return data in Church-encoded form. This gives the following benefits:

The Church-encoded form requires that callers provide an implementation for every ADT variant. This is not the case for concrete data types which are consumed via a series of if statements.
The Church-encoded form is more succinct, and declaratively maps each variant tag to its corresponding value.
Using Church encodings prevents regressions when additional variants are added to the represented data type. When adding variants to an ADT which is manually inspected using if statements it is possible for existing if-chains to silently ignore the added variant. Adding a variant to a Church-encoded sum type is a breaking change to a function’s type, and will cause the TypeScript compiler to identify every location in the codebase where additional handling is required.
Using Church encodings keeps us in the functional, expression-based world, which can help readability and type inference (in some cases).

To illustrate these differences, here’s a contrived example of how UI code might handle data in each form:

// Using an ADT
let uiContents;
if(adtRequestState.tag === "NotYetSent") {
    uiContents = "request pending...";
} else if (adtRequestState.tag === "InProgress") {
    uiContents = "loading...";
} else if (adtRequestState.tag === "Succeeded") {
    uiContents = adtRequestState.result;
} else {
    const {errorCode, errorMessage} = adtRequestState;
    uiContents = `Received error ${errorCode}: ${errorMessage}`
}

// Using a Church encoding
const uiContents2 = churchRequestState({
    notYetSent: "request pending...",
    inProgress: "loading...",
    succeeded: result => result,
    failed: (code, msg) => `Recieved error ${code}: ${msg}` 
})

As with most coding style discussions, there are tradeoffs and contextual factors to consider. That said, in this case (and in many others) I believe that a codebase’s quality is improved by adding the safety and succinctness of the Church-encoded approach.

As an aside, were we using a traditional Church encoding (consisting solely of fully-curried functions and without record types) our type would look like this:

type PuristChurchRequestState =
    <T>(handleNotYetSent: T) =>
       (handleInProgress: T) =>
       (handleSuccess: (result: string) => T) =>
       (handleError: (errorCode: number) => (errorMessage: string) => T) => T

This loses the explicit, declarative mapping from tags to their handlers and would be nearly unusable in practice. While it may be more “conceptually pure”, conceptual purity does not inherently bring clarity or maintainability to a codebase.

Conclusion #

When operating on algebraic data types we can choose to represent these type directly as concrete data or as encapsulated functions using Church encodings. These two approaches are equivalent in expressive power but they can have different effects on code’s clarity and maintainability. When we understand how to move between them, we can select the best approach for each situation.

I’ve only scratched the surface of Church encodings here. In a future post I’ll show how we can Church-encode recursive data types, and explore how Church (and the related Scott) encodings can provide performance optimizations, as well as how they relate to object-oriented programming.

For a deeper dive on this topic, the paper “Programming in the λ-Calculus: From Church to Scott and Back” by Jan Martin Jansen explores Church encodings (and more) of algebraic data types in the pure lambda calculus.

All code samples in this post are released into the public domain under a Creative Commons 0 license. Code samples and full license text may be found here.

In this case I am taking these two “arguments” as named fields on a single object rather than as full-fledged arguments. This is for readability’s sake, more on this later. ↩
If we had the generic argument on the left side of the =, giving us the generic type ChurchBoolean<A>, then we would have to fix the type of values that a given Church boolean could be used with when that Church boolean was created. This is the difference between a generic type, which is a function, and the type of a generic function. The way we’ve actually implemented it, any value with the concrete type ChurchBoolean is a generic function, which can be used with different type arguments at different times. The ability to move generic arguments to the right is called “higher-rank polymorphism” (or sometimes “rankNTypes”) and is a relatively rare programming language feature. More advanced usages of this feature may be found in the post Existential Quantification in TypeScript, which uses higher-order polymorphism to encode existential types. ↩

Programming should be enjoyable

The Church Encoding of Simple Algebraic Data Types

Representing data as functions

Booleans as algebraic data types #

Church-encoded booleans #

Caveats on my approach #

Church-encoded tuples #

Putting the pieces together: Slaying a UI antipattern with Church encodings #

Conclusion #

Related Posts

Extensible TypeScript with Object Algebras 11 Aug 2024

Bridging the Object-Oriented and Functional Divide with the Visitor pattern 11 Feb 2023

The Church and Scott Encodings of Algebraic Data Types 02 Jan 2022