35

The Holey Array Problem

 5 years ago
source link: https://www.tuicool.com/articles/hit/URB3Avb
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

One of the "features" of JavaScript I hate the most is "holey" arrays. If you're unsure what that is, consider the following:

const array = [1, 2, 3];

That is what is called a "packed" array. Elements are contiguous and the array is made up of one element type: number .

On the C++ side: when using V8 (aka Node.js), and under the hood, this array is actually stored as PACKED_SMI_ELEMENTS , which is a way to store small integers in memory, and arguably the most efficient out of the myriad of ways V8 will store your arrays.

Now consider this innocuous line of code:

array.push(3.14); // push a floating point number to the array.

On the C++ side: your array has just been transformed from PACKED_SMI_ELEMENTS (integers) to PACKED_DOUBLE_ELEMENTS (doubles) in memory. It became slightly different, but is still tightly packed and performant. This transformation is irreversible.

Nothing has changed on the JavaScript side.

Moving on to the next step:

array.push('Hello world!'); // push a string to the array

On the C++ side: your array has just been irreversibly transformed again. This time from PACKED_DOUBLE_ELEMENTS to PACKED_ELEMENTS . A PACKED_ELEMENTS array can hold any JavaScript value; but has to sacrifice much more memory space to represent itself compared to a SMI or Double array.

Now let us proceed to the next line of code:

console.log(array.length); // 5
array[9] = true;
console.log(array.length); // 10

This is allowed in JavaScript, right? You may assign to an arbitrary index in the array, and your array will be padded. So what happens on the C++ side?

On the C++ side: your array has been irreversibly transformed yet again, this time to HOLEY_ELEMENTS . Much, much slower to work with; and you've just made the V8's JIT (Just-In-time compiler) optimisations much harder, as it will be unable to optimise your program to a large extent.

It's worthy of note that calling new Array(n) or Array(n) will always create this type of array and slow down your code.

But why stop here? Let me introduce Satan's special data structure:

array[999] = 'HAIL SATAN! ♥'

On the C++ side: your array has just transformed from HOLEY_ELEMENTS to DICTIONARY_ELEMENTS , and you've summoned a demon that can no longer be banished.

Let me quote the V8 source code directly:

// The "slow" kind.
  DICTIONARY_ELEMENTS,

From JavaScript's point of view: your array just became a dictionary, or in other words: a plain object. The literal worst case scenario of JavaScript arrays.

Why this is dangerous:

  • Such an operation will silently succeed and never throw an error.
  • Any form of loop-based enumeration or attempt at serialisation will most likely crash your server.
  • The array's keys will silently be converted to strings.
  • The array will still serialise to an array, not an object. ( JSON.stringify will trying padding all empty indices using null s)
  • Array.isArray(array) will return true for DICTIONARY_ELEMENTS arrays.

If you try to call JSON.stringify on the array above, this is what you'll get:

[1,2,3,3.14,"Hello world!",null,null,null,null,true,null,null,null,null,null,null,null,null,null,null,null,null,null,...,null,null,null,null,"HAIL SATAN! ♥"]

How this can be used against you:

Consider the following example of a REST API using express to manipulate a todo list:

// Naïve example of holey array potential vulnerability

class Todos {

  constructor(username, items) {
    this.username = username;
    this.items = items || Todos.load(username);
  }

  // add a new todo
  add(todo) {
    this.items.push(todo);
    return this.items.length - 1;
  }

  // update an existing todo
  update(index, todo) {
    // index is expected to be an integer
    // we're making the mistake of accepting an arbitrary/unbounded index here though
    // this operation will succeed silently, and node won't throw any errors with a huge index.
    // e.g. if an attacker passes 10000000, the program won't crash or show signs of instability, the array will silently become "DICTIONARY_ELEMENTS".
    this.items[index] = todo;
    return index;
  }

  remove(index) {
    return this.items.splice(index, 1);
  }

  // another common case:
  // you're keeping a list of todos and want to give the user the ability to reorder items.
  swap(i1, i2) {
    const temp = this.items[i1];
    this.items[i1] = this.items[i2];
    this.items[i2] = temp;
  }

  // load a list of the user's previously saved todos
  // we’re not using a database for simplicity’s sake
  static load(username) {
    const userPath = path.join('data', this.username + '.json');
    if (fs.existsSync(userPath) {
      return JSON.parse(fs.readFileSync(userPath, 'utf8'));
    }
    return [];
  }

  // this saves the array back to disk as JSON when the request is ending
  // holey/dictionary arrays with absurd indices will pad empty ranges with `null`.
  // this could result a multi-gigabyte file if passed a holey/dictionary array with a big enough (sparse) index in them. Most likely we’ll run out of memory first because the resulting string will be too big.
  save() {
    fs.writeFileSync(path.join('data', this.username + '.json'), JSON.stringify(this.items));
  }

}

app.use((req, res, next) => {
  // initialise/load previous todos
  req.todos = req.todos || new Todos(req.session.username);
  next();
});

// add new todo
app.post('/todos/new', (req, res, next) => {
  if (req.body.payload)
    res.json({ index: req.todos.add(req.body.payload) });
  else
    res.status(500).json({ error: 'empty input' });
});

/// update existing todo (vulnerable to unbound indices!)
app.post('/todos/:idx/update', (req, res, next) => {
  if (req.body.payload)
    res.json(req.todos.update(parseInt(req.params.idx, 10), req.body.payload));
  else
    res.status(500).json({ error: 'empty input' });
});

…

// save current todo list after request
// a better idea is to override res.end() via a thunk though.
app.use((req, res, next) => {
  next();
  req.todos.save();
});

Here’s an example malicious request: POST /todos/10000000/update payload="hi"

You now have an invisible problem (10000000 element dictionary array) in memory, when the request ends it will try to write out a huge JSON file, or your server will run out of memory trying to serialise the array to a string.

For further reading on V8 internals:


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK