Mongo Gone Zalgo

Mongo Gone Zalgo

Abandon all Hope Ye Who Enter Here

Do not extend JavaScript built in prototypes! I repeat do not extend JavaScript built in prototypes!

Ok, wanted to get that out upfront, but for those with patience, stick around for a twisted tale.

It all started a few years ago when a colleague shared this post, Designing APIS For Asynchrony. It explains why you should not craft functions which sometimes operate asynchronously and other times take a purely synchronous path through to a callback. The post calls this anti-pattern, "releasing Zalgo". This is indeed something to be avoided, but it's not what I'm about to talk about.

I had never read about the mythical demon, Zalgo. Zalgo is a creation of internet, a god of chaos and destruction. His coming is heralded by proclamations of scrambled text like this

Zalgo Text Generator

I lost hours (after work of course), examining the internet's finest offerings to Zalgo.

I also became obsessed with striking this pattern from our codebase. I was quickly becoming known as the Zalgo guy at the office.

Then, for a while, I moved on, ... until one day, months later, while trying to track down a strange bug, I queried my local Mongo database and saw this!

WHAT THE FUCK!!! My stomach turned over as thoughts began racing through my head. "I've been running mongo with no password, clearly I've been hacked...no wait it only listens for local connections?! An office prank, but how did they get access to my machine?!"

I posted screen shots and db dumps to slack. Everyone at work found it funny (except me), but no one fessed up.

At this point I became worried. Most of the other developers probably assumed I had done something strange to cause this, but I knew that wasn't the case. All I could think was that, somehow our source had done this to a clean database under a normal use case. What if this had happened to a customer!?

Seeing that that the mongo document literally contained the key "zalgo", I began searching our code for zalgo. Fun times, Cloc output (line counts):

I didn't really expect to find this word in our code, but, turns out there was exactly one place that the word zalgo appeared. It was inside a dependency, colors.

It all starts with this one seemingly innocuous line

const colors = require('colors');

In fairness to the colorJS folks, it starts there, but getting to the end of this tangled web requires several more mistakes, but lets start by looking at what ColorJS does.

ColorsJS extends String.prototype with several useful "getters" that make it easy to add colors to your console output. For example,

const colors = require('colors');
console.log('welcome'.green);

This will output a pretty green welcome message on the CLI. Poking at this in the CLI,

'foo'.red
=>
'\u001b[31mfoo\u001b[39m'

This is the special ANSI escape string needed to color text red, makes sense, but my favorite bit of ColorJS code is,


"Please no," that's not what you want to see in the comments of dependency source. ColorJS conveniently offers the ability to output your logs in Zalgo mode! While perhaps not needed, I can't say I don't enjoy this feature. How on earth did attributes from the string prototype end up in Mongo though?

Skipping over the month it took to eventually reproduce this issue from a clean database, it turns out that in our model layer you can pass in a JavaScript object via the constructor and it will recursively build a model tree. Inside the instantiation code we use:

_.extend(this.attributes, newProperties);

_.extend comes from lodash and expects newProperties to be an object like { foo: 'bar' }, but in some cases we were passing a string.

It's not like we were obviously passing strings to _.extend. I do some dumb things but not that dumb.

How it worked was that when an array of objects was nested in the object being passed to a model constructor, the model layer intelligently tried to cast it to a collection of models. Something like

new ZalgoModel({
   name: 'sweet mother of god!!!'
   moreListsThanMobyDick: [
      {
        name: 'some nested model',
        modelType: 'NestedModel'
      },
      {
         name: 'some other model bro',
         modelType: 'NestedModel'
      }
   ],
   listWithString: [
      'nothing to see here'
      'i could not possibly be Zalgo'
   ],
   modelType: 'ZalgoModel'
});

Unfortunately a field which had previously been a collection of models had been changed to a collection of strings and migrations had not caught all cases. If the stars were aligned and you had a database with an existing collection of models and then applied the correct series of edits a string would get passed to the new Model constructor, eventually leading to the aforementioned misuse of _.extend.

Interestingly, in both UnderscoreJS and earlier versions of lodash a type check prevented this, but as of the current version (4.17.4), that is no longer the case. Passing a string as the source argument will do for (var key in value), where values is the string.

Try it yourself

const colors = require('colors');
const __ = require('lodash');
let obj = {};
__.extend(obj, 'bad wolf');

I assure you, nothing good comes of it.

I reported this to lodash to see if they consider it a regression. They explained that this tracks with the behavior of Object.assign and is an edge case. Point taken; so, in short, do your own type checking and do it carefully.

Here is very contrived example which lets you create your own Zalgo infested database. Run it, assuming you are a sick bastard.

Takeaways

  1. Zalgo is very real, he comes, and he will devour us all
  2. Be very careful when doing type checking in JavaScript
    • (typeof '') => 'string'
    • (typeof (new String('')) => 'object'
    • Don't assume anything with .length is an array, strings, any enumerable object, and many other objects can also contain length
    • And Just for fun, (typeof null) => 'object'
  3. Extending the prototype of JavaScript builtins while powerful should be exercised with great caution
    • It creates hard to debug code
    • In addition to the edge case of enumeration, you could override built properties (perhaps ones that will be added in a future ES version).
    • The more libraries that do this the greater the chance of collision, no one wants dependencies fighting over the same name space

Resolution

We now have much more careful type checking (haven't been won over to TypeScript yet, but I'm starting to see the light). We also switched away from Colors to ChalkJS. This fork of Color states on gibhub that it was created in response to this very issue. When writing my own APIs I am now very careful to avoid the temptation to extend built in prototypes. You can create the illusion of extending the language this way which is very cool, but the magic is not worth it.

Remember, around every corner, inside each and every function lurks Zalgo, waiting for the right moment to strike.