Faster JS array includes

August 3rd, 2019

I am sure you’ve come across cases where you need to search for items multiple times in the same array. And you have written the code in an easy to read way, but not exactly the most performant way.

const productsToExclude = [4, 5, 10];
products.forEach((id) => {
  // check if product is in exclusion list before processing
  if (productsToExclude.includes(id)) {
    return;
  }
  // start processing..  
});

This isnt a problem in normal cases. But when you product list is numbering hundreds of thousands and you have multiple includes/indexOf calls inside the product iteration loop, they add up and start taking up non-negligible CPU time.

Essentially the time-order complexity of that code is O(n^2) (means if product.length is n, and productsToExclude.length == products.length then you end up searching each of the item within another array of same length.. which means the code does this n*n number of times). The common solution to make this faster, is to create an item “lookup” hash map. Querying a hashmap is O(1) time-complexity (i.e fast) because JS engine internally uses a hash function to find the memory location.

const productsToExclude = [4, 5, 10];
const productsToExcludeLookup = productsToExclude.reduce((acc, id) => { // O(n)
  acc[id] = true;
 return acc;
}, {});

products.forEach((id) => { // O(n)
  // check if product is in exclusion list before processing
  if (productsToExcludeLookup[id]) { // this operation is O(1)
    return;
  }
  // start processing..  
});

The time-complexity is O(n+n*1) = O(2n) = O(n)

However writing this requires more code and more thinking. Today I was re-thinking on whether it would be possible to avoid writing this boilerplate again and again, but rather have an easy utility function to do this optimization for us.

What it would need is to create a lookup map the first time it sees a new array and then keep it in memory until the next time the function is called with the same array. But that has a problem right? The lookup map does take some memory and it’s purpose ends when the initial array is no longer used and is garbage collected (GC). However a naive implementation would keep the lookup maps forever, and thus leak memory.

The solution to this I realized is a WeakMap (ES6 data structure/feature). A WeakMap’s key is garbage collectable if WeakMaps are the only ones keeping references to the key. And furthermore if the key is GCed, and the value has no other references in memory, then the value too would be GCed. So let’s put the pieces togeather

const includes = (() => {
  const itemtoIndexMaps = new WeakMap();
  return (arr, item) => {
    if (!itemtoIndexMaps.has(arr)) {
      itemtoIndexMaps.set(
        arr,
        arr.reduce((acc, item, index) => { // O(n)
          acc[item] = index;
          return acc;
        }, {})
      );
      console.log('created map and cached');
    } else {
      console.log('map fetched from cache');
    }
    return Boolean(itemtoIndexMaps.get(arr)[item]); // O(1)
		// overall O(n+1) for a new array
  };
})();

Let’s test it

> const arr = [1,2,3,4,5,6,7,8,9];
undefined
> includes(arr, 5)  // O(n)
created map and cached
true
> includes(arr, 3) // O(1)
map fetched from cache
true
> includes(arr, 10) // O(1)
map fetched from cache
false

Works. The first run is O(n) because it needs to go through each item in array to create the lookup map. But subsequent calls just fetches the cached map. Now we can change the original code to the following:

const productsToExclude = [4, 5, 10];
products.forEach((id) => { // O(n)
  // check if product is in exclusion list before processing
  if (includes(productsToExclude, id)) { // O(n+1) first time and O(1) thereafter
    return;
  }
  // start processing..  
});

Time-complexity is O((n+1) + (n-1)*1) = O(2n) = O(n). Cool, so we acheived a nice array.includes()-like interface that is performant by default and also doesn’t leak memory.

That’s all folks.

JavaScript