You might have heard about the high-impact security vulnerability recently fixed and announced by the NodeJS team. This post will attempt to explain the issue, and how and why it happened.
Thanks to the amazing effort of the team at NodeJS, this vulnerability was fixed immediately across all affected release lines (4.x, 6.x, 7.x, and 8.x).
Make sure you upgrade your Node version as soon as you can (preferably when you finish reading this post!). For upgrade instructions, head to the official NodeJS blog.
The vulnerability is rooted in hash tables, so let us start with a quick recap on HashTables, just in case you missed that computer science class.
HashTables
HashTables fulfill the dream of *constant time* insertions and access.
Some of the places where HashTables are used:
- Method lookups.
- Sets, Objects, and Maps.
- HTTP headers.
- JSON representations.
- URL-encoded post form data.
A HashTable is a group of linked lists, and the linked lists must stay short for performance to be good.
The create HashTable function pre-allocates the number of linked lists it can contain. Throughout this post, the number of linked lists the HashTable can hold is denoted by l, and it is typically a power of two:
l = 2**k
For optimal performance, the best scenario is to have n/l entries in each linked list, where n is the number of entries in the HashTable. So if we have 256 entries in a HashTable with l = 256, a perfect hash function would distribute them into the 256 lists, each holding only 1 entry.
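To make this concrete, here is a minimal sketch of such a structure in JavaScript. The SimpleHashTable class, the bucket layout, and the hash-function parameter are all illustrative, not how V8 actually implements its HashTables.

```js
// A toy HashTable: an array of l buckets, where each bucket is a plain array
// standing in for a linked list. Illustrative only, not V8's implementation.
class SimpleHashTable {
  constructor(l, hashFn) {
    this.l = l;           // number of linked lists (buckets)
    this.hashFn = hashFn; // maps a string to an integer
    this.buckets = Array.from({ length: l }, () => []);
  }

  insert(key, value) {
    const index = this.hashFn(key) % this.l; // reduce the hash into [0, l)
    this.buckets[index].push([key, value]);  // O(1) append to that bucket
  }

  get(key) {
    const index = this.hashFn(key) % this.l;
    // This scan is O(length of the bucket): the whole point of a good hash
    // function is to keep every bucket short.
    const entry = this.buckets[index].find(([k]) => k === key);
    return entry ? entry[1] : undefined;
  }
}
```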
Example: Storing the strings “one” to “ten” in a HashTable
In this example, our hash function is the following:
H(s) = first byte of s
l = 256
In our “one” to “ten” example, inserting the values into the HashTable will result in something like the following:
{
"o": ["one"],
"t": ["two", "three", "ten"],
"f": ["four", "five"],
"s": ["six", "seven"],
"e": ["eight"],
"g": [],
"h": [],
"n": ["nine"],
"r": [],
…
}
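Using the toy SimpleHashTable from the sketch above with H(s) = first byte of s reproduces roughly this distribution (the firstByteHash helper is made up for the illustration):

```js
// H(s) = first byte of s: deliberately bad, used only to show the pile-up.
const firstByteHash = (s) => s.charCodeAt(0);

const table = new SimpleHashTable(256, firstByteHash);
for (const word of ["one", "two", "three", "four", "five",
                    "six", "seven", "eight", "nine", "ten"]) {
  table.insert(word, true);
}

// "two", "three" and "ten" all start with 't' (byte 116), so they pile up in
// the same bucket while most of the other 255 buckets stay empty.
console.log(table.buckets[116].map(([k]) => k)); // [ 'two', 'three', 'ten' ]
```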
Notice that some linked lists in the HashTable are empty because no value hashes to them, while there is a pile-up on the t list. This is a bad hash function: it does a poor job of distributing the strings across the linked lists. No matter how many linked lists you have, if the hash function distributes entries badly among them you will get bad performance. Linked lists are slow; hash tables are fast only because they keep every linked list short.
Since there are more possible H(s) results than l, the hash value is always reduced modulo l to avoid indexing past the end of the HashTable:
H(s) % l
H(s) = first byte of s is not a good hashing function. NodeJS implements a very good hash function that quickly mixes the whole string and gives random-looking results, spreading the entries across the HashTable.
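As an illustration of what a “random-looking” string hash does, here is djb2, a classic non-cryptographic hash, reduced modulo l. This is not V8's actual hash function; it only shows how mixing every byte of the input spreads entries across buckets.

```js
// djb2: a classic non-cryptographic string hash. NOT V8's hash function,
// just an illustration of mixing every byte of the input.
function djb2(s) {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = ((h * 33) ^ s.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit int
  }
  return h;
}

const l = 256;
// Three unrelated-looking bucket indexes instead of everything landing on 't'.
console.log(djb2("two") % l, djb2("three") % l, djb2("ten") % l);
```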
Since we have a good hashing function, normal usage should *never* flood any single linked list. However, an attacker just might.
Hashing malicious strings
The attacker provides strings for which H(s) % l results in the same HashTable index, hence all these strings are stored in the same linked list, making the app extremely slow.
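A rough sketch of the attacker's side, reusing the djb2 hash and l = 256 from the sketch above: when the hash function and l are known, colliding keys can be found by plain brute force (the findCollidingKeys helper and the key pattern are made up for the example).

```js
// If the attacker knows the hash function and l, finding colliding keys is a
// plain brute-force search. Uses djb2 and l = 256 from the previous sketch.
function findCollidingKeys(targetBucket, count, hashFn, l) {
  const collisions = [];
  for (let i = 0; collisions.length < count; i++) {
    const candidate = "key" + i;             // any enumerable key space works
    if (hashFn(candidate) % l === targetBucket) {
      collisions.push(candidate);
    }
  }
  return collisions;
}

// Every one of these strings lands in bucket 0, so inserting them all turns
// that bucket's linked list into one long O(n) scan.
console.log(findCollidingKeys(0, 5, djb2, 256));
```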
To solve such issues, programming languages have looked at different solutions, including the following:
Solution #1
Replace the linked lists with a data structure that degrades gracefully under collisions, such as a self-balancing tree. However, most programs go with the simpler approach of using plain linked lists to avoid bugs, heavy rewrites, debugging issues, and implementation complexity.
Solution #2
Limit the HashTable to a maximum of n entries and discard anything beyond that. This solution looks pleasant, but for real-world applications you cannot just discard newly arriving entries, so it is not viable in every situation; it can only be used in caching solutions.
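A minimal sketch of this idea, building on the illustrative SimpleHashTable from earlier (the CappedHashTable name and the drop-on-overflow behaviour are assumptions for the example):

```js
// Solution #2 sketched: refuse inserts past a fixed cap. Fine for a cache,
// unusable when every incoming entry must actually be stored.
class CappedHashTable extends SimpleHashTable {
  constructor(l, hashFn, maxEntries) {
    super(l, hashFn);
    this.maxEntries = maxEntries;
    this.size = 0;
  }

  insert(key, value) {
    if (this.size >= this.maxEntries) {
      return false;           // newly arriving data is silently dropped
    }
    super.insert(key, value);
    this.size++;
    return true;
  }
}
```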
Solution #3
Use a cryptographic hash function such as sha256. Crypto hash functions should be collision-resistant: you should not be able to find any two strings with the same output.
But this solution is troublesome for two main reasons:
- The hash function should be fast, and crypto hashes are not the fastest.
- Even if crypto hashes were fast enough, this does not solve the problem. The attacker does not need to find collisions in the crypto hash function itself; instead, they need collisions in the output after it is reduced to the l linked lists. H(s) % l is not collision-resistant no matter how strong the hash function is, since the output is always reduced to the l = 2**k lists. So the attacker can run a few million iterations to find a critical amount of collisions.
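To see why, here is a rough sketch using Node's built-in crypto module: even sha256, once reduced modulo l, can only produce l distinct results, so collisions in the reduced output are everywhere (the sha256Bucket helper and the key pattern are made up for the illustration).

```js
const crypto = require("crypto");

// Even a collision-resistant hash stops being collision-resistant once its
// output is reduced to l buckets: there are only l possible results left.
function sha256Bucket(s, l) {
  const digest = crypto.createHash("sha256").update(s).digest();
  return digest.readUInt32BE(0) % l; // take 32 bits of the digest, then mod l
}

const l = 256;
const byBucket = new Map();
for (let i = 0; i < 10000; i++) {
  const key = "key" + i;
  const bucket = sha256Bucket(key, l);
  byBucket.set(bucket, (byBucket.get(bucket) || []).concat(key));
}
// With 10,000 keys spread over only 256 buckets, every bucket already holds
// dozens of "collisions"; an attacker simply keeps the ones they like.
console.log((byBucket.get(0) || []).length);
```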
Current Solution
Randomizing the hash function with a secret key.
These attacks have been repeated many times throughout the years under different names: low-bandwidth denial-of-service attacks, hash flooding, algorithmic complexity attacks, etc.
Attacks have targeted many programming languages, such as Perl, Redis, and Python. The usual response is to secretly randomize the hash function on each application boot, or on each HashTable initialization. This way the attacker has no way to predict the hash function's output, and therefore cannot find collisions, because they do not know its secret key.
NodeJS uses the same approach by randomizing its “HashTable seed”.
(Whether this is actually secure or not is a debate for another time.)
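The idea can be sketched roughly like this, reusing the illustrative djb2-style mixing from earlier. This is not V8's actual seeded string hash, and production implementations typically use a keyed hash such as SipHash; the point is only that the seed is secret and chosen at startup.

```js
const crypto = require("crypto");

// Pick a secret seed at startup and mix it into every hash, so the attacker
// cannot predict which bucket a given string will land in. Sketch only:
// V8's real seeded string hash is different.
const HASH_SEED = crypto.randomBytes(4).readUInt32BE(0);

function seededHash(s, seed = HASH_SEED) {
  let h = seed;
  for (let i = 0; i < s.length; i++) {
    h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Colliding strings computed against one seed are useless against another,
// so a precomputed flood of keys no longer lands in a single bucket.
console.log(seededHash("two") % 256, seededHash("three") % 256);
```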
The NodeJS Constant HashTable vulnerability
The vulnerability came from a bug in the randomization of the HashTable seed: the seed ended up being the same for every instance of a given version of Node.
Because the HashTable seed was constant across a given released version of NodeJS, NodeJS was susceptible to remote hash flooding DoS attacks.
For example, an HTTP Node server is vulnerable to this attack since URL-encoded POST form data is stored in a HashTable on the heap. The attacker can send many small POST requests (say 10 bytes each), each with different input that yields the same hash key, and every request accumulates its payload in the same HashTable linked list on the heap.
After enough requests pile onto that same linked list, the server faces huge delays in its responses, since it is synchronously scanning one very long linked list in the HashTable.
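To see why request parsing is a natural target, here is a minimal (and deliberately simplified) Node HTTP server that parses a URL-encoded POST body with the built-in querystring module; every parsed field becomes an entry in a hash table on the V8 heap. This is an illustration, not the vulnerable code itself.

```js
const http = require("http");
const querystring = require("querystring");

// Every key in a URL-encoded POST body becomes a property of a plain object,
// i.e. an entry in a hash table on the V8 heap. With a constant hash seed, an
// attacker can precompute keys that all collide into the same bucket.
http.createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => { body += chunk; });
  req.on("end", () => {
    const fields = querystring.parse(body); // builds the hash table of fields
    res.end(`parsed ${Object.keys(fields).length} fields\n`);
  });
}).listen(3000);
```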
The Cause
The JavaScript specification includes a lot of built-in functionality, from math functions to a full-featured regular expression engine. Every newly-created V8 context has these functions available from the start. For this to work, the global object (for example, the window object in a browser) and all the built-in functionality must be set up and initialized into V8’s heap at the time the context is created. It takes quite some time to do this from scratch.
V8 solves this with “snapshots”: a snapshot is basically a saved, pre-initialized context that can be reused on future boots. Just like thawing a frozen pizza for a quick dinner, V8 deserializes a previously prepared snapshot directly into the heap to get an initialized context. On a regular desktop computer, this can bring the time to create a context from 40 ms down to less than 2 ms. On an average mobile phone, it can mean the difference between 270 ms and 10 ms.
Snapshots enabled NodeJS to reduce boot time by capturing the pre-initialized boot context and reusing it on subsequent runs. Additionally, users are allowed to build custom snapshots on top of the original context.
The root cause of the vulnerability is that, with the V8 snapshot feature enabled by default, the initially randomized HashTable seed was baked into the snapshot and was therefore constant across every instance of a given NodeJS version. This minor error left NodeJS susceptible to remote DoS attacks via hash flooding.