Type-Driven Development with TypeScript

I am always interested in making my coding process faster and more robust. I want to be confident that my code will work as expected. And I want to spend as little time as possible debugging, testing, and hunting down pieces of code that I forgot to update when I make changes to a project. That is why I am a fan of correct-by-construction methods: methods of constructing programs where the program will not build or compile unless it behaves the way that it is supposed to.

letterpress - Photo by Marcus dePaula on Unsplash

Correct-by-construction

Test-driven development is one form of correct-by-construction method. The philosophy of test-driven development is that your tests are the specification for how your program should behave. If you treat your test suite as a mandatory part of your build process, then a program whose tests do not pass does not build, because it is not correct. Of course the limitation is that the correctness of your program is only as certain as the completeness of your tests. Nevertheless studies have found that test-driven development can result in 40%-80% fewer bugs in production.

Writing your program around types

Type-driven development is the practice of writing your program around types, and of choosing types that make it easy for the type checker to catch logic errors. In type-driven development your data types and type signatures are the program specification. Types also serve as a form of documentation that is guaranteed to be up-to-date. Type-driven development is another correct-by-construction method. As with test-driven development, type-driven development can improve your confidence in your code, and can save you time when making changes to a large codebase.

Planning your types

But it takes more than writing type signatures and running the type checker to gain those benefits. The same studies that show the effectiveness of test-driven development found that writing tests after the fact is not as effective as the "test first, then implement" flow of test-driven development. That could be because writing tests first encourages programmers to write code that can be tested effectively. The same idea applies to type-driven development: you will get the most benefit from your type checker when you expose as much information as possible in type signatures. That means planning your types, and writing code to match.

Types vs Tests

Types and tests are not an either/or proposition. There are plenty of classes of errors that type-checking is bad at catching. You should catch those with tests! On the other hand there are classes of errors that are tedious to write tests for but that type-checking can catch easily. My opinion is that types are quick to write, quick to read, and do not require fixtures, mock data, test databases, setup, or teardown; so I lean on types when I can.

A simple example - sanitized user input

Let's look at a simple example in TypeScript. From time to time I end up working on a project where keeping track of whether some user input has already been sanitized becomes kinda tricky. Problems come up when HTML sanitization is run where it shouldn't be. That can lead to double-escaping issues, where content like "Types & Tests" ends up rendered as "Types &amp; Tests" (thankfully some sanitizers are smart enough to avoid this); or you can run into situations where functionality is lost because trusted HTML content is stripped. On the other hand if content is never sanitized you get a code injection vulnerability.

Distinguishing trusted vs untrusted content

From a type-driven perspective the problem is that the program specification (i.e. type signatures) makes no distinction between user-submitted content (type string), and trusted HTML content (type string). If we use two different types we can make the program specification more precise, providing essential information to the type checker and to programmers reading the code.

// HTML.ts

import sanitizeHtml from "sanitize-html"

export type HTML = {
  trustedHtml: string
}

export function sanitize(input: string): HTML {
  return {
    trustedHtml: sanitizeHtml(input),
  }
}

export function trustedHtml(input: string): HTML {
  return {
    trustedHtml: input,
  }
}

export function fromHtml(input: HTML): string {
  return input.trustedHtml
}

Untrusted input still has type string. The new type HTML represents content that has been sanitized or has been explicitly declared to be trusted. We don't want callers to have to know about the internal representation of the HTML type. (We might want to change that representation in the future.) So the HTML.ts module defines constructors for producing an HTML value from either untrusted or trusted input, and an extractor to get the underlying string from an HTML value. A function that ultimately outputs HTML might look like this:

import { HTML, fromHtml } from "./HTML"
import { ServerResponse } from "http"

function renderContent(
  response: ServerResponse,
  html: HTML,
  cb: (err?: Error) => void,
) {
  // Simplified for clarity
  response.write("<body>" + fromHtml(html) + "</body>", "utf8", cb)
}

Verifying content is sanitized exactly once

The result is that the type checker is able to do the hard work of checking that untrusted input is sanitized exactly once through the entire logic flow of the program. Content that has never been sanitized or declared trusted will not have the correct type when it is passed to renderContent, and you will get a type error. If content has already been sanitized once then you will get a type error if sanitize is called a second time because the input will have type HTML, not string.

const untrusted = "hello <script>alert('world')</script>"
renderContent(response, untrusted, () => {})

// error TS2345: Argument of type '"hello <script>alert('world')</script>"' is not assignable to parameter of type 'HTML'.

const trusted = sanitize(untrusted)
sanitize(trusted)

// error TS2345: Argument of type 'HTML' is not assignable to parameter of type 'string'.

You could make the same change in plain JavaScript without static type checking, and get similar errors at runtime, which would be almost as helpful. But without type checking you would need thorough test coverage to catch mismatches. With type checking, if we had an existing renderContent function that took a string input and changed it to take HTML input, the type checker would immediately highlight all of the call sites that need to be updated.
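To make the runtime comparison concrete, here is a rough sketch of what such a check might look like. It is an illustration only: the HTML class and the renderBody function are hypothetical names invented for this example, and the parameter is typed as unknown to imitate an untyped JavaScript call site.

```typescript
// Hypothetical runtime wrapper, standing in for the HTML type from HTML.ts.
class HTML {
  readonly trustedHtml: string
  constructor(trustedHtml: string) {
    this.trustedHtml = trustedHtml
  }
}

// `unknown` imitates plain JavaScript: the checker cannot help the caller,
// so the mismatch is only detected when this line actually executes.
function renderBody(html: unknown): string {
  if (!(html instanceof HTML)) {
    throw new TypeError("renderBody expects an HTML value")
  }
  return "<body>" + html.trustedHtml + "</body>"
}

const rendered = renderBody(new HTML("<p>ok</p>"))
```

A call like renderBody("raw string") compiles without complaint and throws only when that code path runs, which is why this approach depends on test coverage to find every mismatched call site.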

Leaning on the type checker pays off when the program accumulates multiple layers of content processing:

import { HTML, fromHtml, sanitize, trustedHtml } from "./HTML"

function logInjectionAttempts(content: string): string {
  if (content.match(/<\s*script|onclick\s*=/i)) {
    console.warn("injection attempt", content)
  }
  return content
}

function appendLikeButton(content: HTML): HTML {
  return trustedHtml(fromHtml(content) + '<button class="like">+1</button>')
}

function appendWordCount(content: HTML): HTML {
  const str = fromHtml(content)
  const count = str.split(/[\s]/).length
  return trustedHtml(str + `<aside>word count: ${count}</aside>`)
}

const responseContent = appendWordCount(
  appendLikeButton(sanitize(logInjectionAttempts(userInput))),
)
renderContent(response, responseContent, cb)

When a program gets big it can be easy to accidentally put content processing steps in the wrong order, or to lose track of what steps have been performed at any given point in a program. Type-driven development helps to set you straight.
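Here is a small sketch of a misordered pipeline being rejected. To keep it self-contained it re-declares a minimal version of the HTML module; escapeHtml is a hypothetical stand-in for the real sanitizer (it escapes markup rather than stripping it), and userInput is made-up sample data.

```typescript
// Minimal stand-in for HTML.ts so that this sketch runs on its own.
type HTML = { trustedHtml: string }

// Hypothetical stand-in for sanitize-html: escapes markup instead of stripping it.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
}

function sanitize(input: string): HTML {
  return { trustedHtml: escapeHtml(input) }
}

function trustedHtml(input: string): HTML {
  return { trustedHtml: input }
}

function fromHtml(input: HTML): string {
  return input.trustedHtml
}

function appendLikeButton(content: HTML): HTML {
  return trustedHtml(fromHtml(content) + '<button class="like">+1</button>')
}

const userInput = "1 < 2 & 3 > 2"

// Correct order: sanitize first, then append trusted markup.
const inOrder: HTML = appendLikeButton(sanitize(userInput))

// Wrong order: the button would be appended before sanitizing, so its markup
// would be escaped too. The checker rejects this line because appendLikeButton
// requires HTML and sanitize requires string. It is wrapped in a function so
// that it never runs.
// @ts-expect-error
const outOfOrder = () => sanitize(appendLikeButton(userInput))
```

The @ts-expect-error comment tells the compiler that the next line is supposed to fail type checking; if the line ever started to type-check, the compiler would report that instead.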

Opaque type aliases

The implementation of the HTML type in the examples here requires an object wrapper that is not strictly necessary. It would be OK for untrusted input and HTML to both be strings at runtime if we could assign them distinct types at type-checking time. We could use a trick described by Charles Pick at codemix to define an opaque type alias, which lets us declare types that TypeScript treats as incompatible, but that actually represent the same runtime type. That would be a "zero-cost abstraction": an abstraction that has no runtime overhead. I did not use an opaque type alias in the examples because it relies on TypeScript features that are a bit more advanced than I wanted to cover in this post. But maybe we will revisit that idea another time.
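For readers who want a preview, here is a minimal sketch of one common way to get that effect, often called a branded type. It is not necessarily the exact technique from the codemix article. The htmlBrand symbol, the escapeHtml stand-in for the sanitizer, and the render function are all assumptions made for illustration.

```typescript
// The brand is a phantom property: it exists only at type-checking time.
declare const htmlBrand: unique symbol
type HTML = string & { readonly [htmlBrand]: true }

// Hypothetical stand-in for a real sanitizer such as sanitize-html.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
}

function sanitize(input: string): HTML {
  // These casts are the only places where a plain string becomes HTML.
  return escapeHtml(input) as HTML
}

function trustedHtml(input: string): HTML {
  return input as HTML
}

function fromHtml(input: HTML): string {
  return input // already a string at runtime, so there is nothing to unwrap
}

function render(html: HTML): string {
  return "<body>" + html + "</body>"
}

const page = render(sanitize("<b>hi</b>"))

// @ts-expect-error: a plain string is not assignable to HTML
const rejected = () => render("<b>hi</b>")
```

One trade-off in this particular sketch: because HTML is a subtype of string here, passing an HTML value to sanitize type-checks, so the double-sanitization error that the object wrapper provides is lost unless untrusted input gets a brand of its own.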

Type-driven development and functional programming

Type-driven development goes hand-in-hand with functional programming (FP). That is because FP discourages side-effects and mutation. A type signature can tell you all about the inputs and outputs of a function or method; but most languages do not have options for declaring side-effects in type signatures. Writing code in a functional style is how you make important information available to your type checker. For example the processing steps above could have been written in a more imperative fashion:

type Result = {
  status: number
  content: string
}

function logInjectionAttempts(result: Result) {
  if (result.content.match(/<\s*script|onclick\s*=/i)) {
    console.warn("injection attempt", result.content)
    result.status = 401
  }
}

function appendLikeButton(result: Result) {
  result.content += '<button class="like">+1</button>'
}

function appendWordCount(result: Result) {
  const count = result.content.split(/[\s]/).length
  result.content += `<aside>word count: ${count}</aside>`
}

const result = { status: 200, content: userInput }
logInjectionAttempts(result)
appendLikeButton(result)
appendWordCount(result)
response.write("<body>" + result.content + "</body>", "utf8", cb)

Imperative mutations are weakly typed

This version uses a command pattern, where each function updates a data structure and does not return a value. Without return values, there is nowhere in the type signatures to indicate where content changes from an untrusted string value to a trusted HTML value. The input types in a type signature specify requirements that must be satisfied before a piece of the program can run. The return type describes changes to program state. A signature with a void return type does not give any information about state changes. void functions are sometimes useful and necessary; but if they are used excessively then the program ends up underspecified.
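As an illustration, the imperative steps could be rewritten so that their return types carry that information. This is a sketch under a few assumptions rather than the one right way to do it: the generic Result<C> type, the flagInjectionAttempts and sanitizeResult names, and the escapeHtml stand-in for the sanitizer are all introduced here for the example.

```typescript
type HTML = { trustedHtml: string }

// Result is generic in its content, so each signature records whether the
// content is still untrusted (string) or already trusted (HTML).
type Result<C> = { readonly status: number; readonly content: C }

// Hypothetical stand-in for a real sanitizer such as sanitize-html.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
}

// Returns a new Result instead of mutating its argument.
function flagInjectionAttempts(result: Result<string>): Result<string> {
  const suspicious = /<\s*script|onclick\s*=/i.test(result.content)
  return suspicious ? { ...result, status: 401 } : result
}

// The change from Result<string> to Result<HTML> is the point where
// untrusted content becomes trusted, and it is visible in the signature.
function sanitizeResult(result: Result<string>): Result<HTML> {
  return { ...result, content: { trustedHtml: escapeHtml(result.content) } }
}

function appendLikeButton(result: Result<HTML>): Result<HTML> {
  const html = result.content.trustedHtml + '<button class="like">+1</button>'
  return { ...result, content: { trustedHtml: html } }
}

const initial: Result<string> = { status: 200, content: "<script>x</script>" }
const finalResult = appendLikeButton(sanitizeResult(flagInjectionAttempts(initial)))
```

With these signatures, calling appendLikeButton on a Result<string> is a type error, so the steps can no longer be applied in an order that skips sanitization.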

This was an introduction to the idea of type-driven development, with some simple examples to give you a peek at what types can do for you. Watch this space for posts that will dive deeper into thinking in types.
