Benchmarking and Optimizing Slow JavaScript

While writing about optimizing data for performance, I rewrote my transformation function a few different ways and benchmarked the performance of each. Wow! The results were surprising.

Some background

Immutability and persistent data structures are all the rage lately. They can reduce errors by preventing modifications to existing data, so functions can rely on consistency.

Languages like Clojure have built-in immutable data types that use structural sharing.

JavaScript has no built in immutable data structures, but libraries like Immutable.js and Mori provide this. When you modify an array with Immutable.js, it returns a new array that references the remaining objects from the previous array.

But, Immutable.js requires some additional ceremony, like initializers, getters and setters:

const { Map } = require("immutable")
const map1 = Map({ a: 1, b: 2, c: 3 })
const map2 = map1.set("b", 50)
map1.get("b") // 2
map2.get("b") // 50

Using ES6 Spread Operator for Immutability

With the advent of ES6, you can avoid mutations using the spread operator, which looks like an ellipsis ...

With the spread operator, you can copy the elements of an existing object or array to a new one, thus you have a new object with your desired modification, without changing the original:

const newArray = [...oldArray, newItem]
const newObject = { ...oldObject, newItem: "foo" }

The best part about the spread operator is that it works with standard JavaScript objects and arrays, no new libraries, getters or setters needed. But, it makes a complete copy of every element from the previous array. Sometimes, that's exactly what you want, so it's very useful.

TypeScript, tslint-immutable

Why do we care? In our most recent frontend projects, we've been using TypeScript to great effect. Increased type safety, fewer run time bugs, and with tslint-immutable, we found a way to enforce immutability at compile time instead of at runtime and ditched the use of Immutable.js, opting for TypeScript readonly properties and ReadonlyArray's instead.

The only drawback is, sometimes you need to loop through one array and add items to another, repeatedly in a loop. If we enforce immutability on arrays, we can't use Array.push() to mutate an existing array. To work around that, we can create a new array and use the spread operator to copy the previous items, on every loop iteration. Wait a second...

Hmm, that might be slow

Question was, how slow was it?

So, I made a few variations of a little function that transforms arrays of objects into a map of objects and an array of keys, to see how fast they performed.

The task was to transform this structure:

const authorArray = [
  {
    id: "1",
    name: "Scott",
  },
  {
    id: "2",
    name: "Aron",
  },
]

into this one:

const authors = {
  // a map of items, keyed by id, for fast lookups
  items: {
    1: {
      id: "1",
      name: "Scott",
    },
    2: {
      id: "2",
      name: "Aron",
    },
  },

  // an ordered array of keys, for maintaining original sort order
  keys: ["1", "2"],
}

Turns out you can implement this in widely varying degrees of efficiency. To cut to the chase...

This was the most efficient implementation:

function parseUsers3(apiUsers) {
  let acc = { items: {}, keys: [] }
  for (let i = 0; i < apiUsers.length; i++) {
    acc.items[apiUsers[i].id] = apiUsers[i]
    acc.keys.push(apiUsers[i].id)
  }
  return acc
}

Note that besides the let keyword, we're using bog standard old school imperative JavaScript, with a for loop and mutable objects.

And here was the least efficient:

function parseUsers2(apiUsers) {
  return apiUsers.reduce(
    (acc, user) => ({
      items: {
        ...acc.items,
        [user.id]: user,
      },
      keys: [...acc.keys, user.id],
    }),
    {
      items: {},
      keys: [],
    },
  )
}

Here we are using a reduce function, but that's not the slow part. Note the use of the spread operator to avoid mutations to the existing accumulator object, acc. Each iteration creates a brand new accumulator object to hand back to the reducer, with fresh copies of all items and keys from the previous iterations. That, it turns out, is not a cheap operation.

Benchmarking

In my initial attempt at measuring code performance, I rolled my own janky benchmarking code that measured how long it took to call a function N number of times.

Then it occurred "perhaps this is not the best benchmarking technique ever conceived. Maybe there's already a library out there by people who thought about how to do it right."

I found that in Benchmark.js, "a library that supports high-resolution timers & returns statistically significant results".

Benchmark.js spits out some nice results like this:

parseUsers1(apiUsers) x 5.93 ops/sec ±7.56% (20 runs sampled)
parseUsers2(apiUsers) x 5.15 ops/sec ±1.83% (17 runs sampled)
parseUsers3(apiUsers) x 29,350 ops/sec ±1.37% (83 runs sampled)
parseUsers4(apiUsers) x 19,607 ops/sec ±3.26% (80 runs sampled)
parseUsers5(apiUsers) x 11,763 ops/sec ±8.97% (68 runs sampled)
parseUsers6(apiUsers) x 20,284 ops/sec ±1.66% (81 runs sampled)

Fastest is parseUsers3(apiUsers)

Note that the fastest implementation could run my benchmark test about 30,000 times a second, while the slowest would run about 5 times a second. That's an insane difference! I guess we're ditching the spread operator here and going with good old Array.push().

Writing Benchmark Tests

Writing a benchmark test with Benchmark.js is very simple. Create a suite and add a test case for each implementation that you want to compare:

var Benchmark = require("benchmark")
var suite = new Benchmark.Suite()

// add tests
suite
  .add("parseUsers1(apiUsers)", () => {
    parseUsers1(apiUsers)
  })
  .add("parseUsers2(apiUsers)", () => {
    parseUsers2(apiUsers)
  })
  .add("parseUsers3(apiUsers)", () => {
    parseUsers3(apiUsers)
  })
  .add("parseUsers4(apiUsers)", () => {
    parseUsers4(apiUsers)
  })
  .add("parseUsers5(apiUsers)", () => {
    parseUsers5(apiUsers)
  })
  .add("parseUsers6(apiUsers)", () => {
    parseUsers6(apiUsers)
  })

  // add listeners
  .on("cycle", (event) => {
    console.log(String(event.target))
  })
  .on("complete", () => {
    console.log("Fastest is " + this.filter("fastest").map("name"))
  })
  // run async
  .run({ async: true })

Here's the repository on Github containing all 6 implementations, and the tests which you can run yourself.

Conclusions

There are many ways of accomplishing a task in code. Premature optimization should generally be avoided, but if you suspect something might be slow, and it's going to occur frequently, measure performance! Solid benchmarking libraries exist in most languages these days that make it easy.

I certainly learned something from this experiment: use the right tool for the job. Don't blindly use immutability or the spread operator for every situation. When accumulating items into arrays or objects, it's silly, and kills performance.