How Docker memory caching works & finding a hidden hapi vulnerability

It was a dark and stormy night in May when our internal metric readings for memory usage spiked in our Address Verification service for no obvious reason. Even weirder, our largest API and main service had suddenly started crashing and restarting every few hours. A dark night indeed. The following is a summary of our painful investigation over several weeks, involving multiple team members and endless :confused_dog: and :computer_rage_bang_head: emojis. But (spoiler alert!), after the storm comes a rainbow.

TL;DR: We learned how Docker memory caching works and found a 7-year-old file management bug that led us to a hidden hapi vulnerability, all of which helped stabilize our containers and product offering.

What you need to know about Docker

For those who have worked with Docker and Datadog, the usual way to measure memory usage in a container is to look at docker.mem.in_use. For reasons unknown to me, docker.mem.in_use used to measure only RSS memory usage rather than the combination of docker.mem.rss and docker.mem.cache:

docker.mem.cache: The amount of memory that is being used to cache data from disk (e.g., memory contents that can be associated precisely with a block on a block device).
docker.mem.rss: The amount of non-cache memory that belongs to the container's processes; used for stacks, heaps, etc.

RSS memory usage is essentially what most people think of when they see a memory leak: a piece of the heap that is never garbage collected, or an object that keeps growing without bound. That’s why a large chunk of our investigation looked at traditional sources of memory leaks. But after several weeks with little progress, I started to piece together why our container memory readings had spiked and why the memory leak in our main service had started at roughly the same time.

During my investigation I discovered that on May 9th, Datadog made a change in how memory data is read: docker.mem.in_use is now a combination of both RSS and cache usage.
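To see the two numbers separately, one option is to read the kernel’s own accounting from inside the container. The sketch below assumes cgroup v1, where the memory.stat file reports `cache` and `rss` in bytes; the path and the helper names (`parseMemoryStat`, `summarize`, `readContainerMemory`) are illustrative and not part of Datadog or our codebase.

```javascript
const fs = require('fs');

// Assumed cgroup v1 location; cgroup v2 hosts expose different keys (anon, file).
const MEMORY_STAT_PATH = '/sys/fs/cgroup/memory/memory.stat';

// Turn "key value" lines into an object of numbers.
function parseMemoryStat(text) {
  const stats = {};
  for (const line of text.split('\n')) {
    const [key, value] = line.trim().split(/\s+/);
    if (key && value !== undefined) stats[key] = Number(value);
  }
  return stats;
}

// Split usage into the rss and cache buckets, plus their sum.
function summarize(stats) {
  const rss = stats.rss || 0;
  const cache = stats.cache || 0;
  return { rss, cache, combined: rss + cache };
}

// Read the live values when the file exists; return null otherwise.
function readContainerMemory(path = MEMORY_STAT_PATH) {
  if (!fs.existsSync(path)) return null;
  return summarize(parseMemoryStat(fs.readFileSync(path, 'utf8')));
}

console.log(readContainerMemory());
```

A reading where cache dwarfs rss points at file writes or page cache rather than a heap leak.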

This was when a lightbulb went off: none of our Docker containers had any mounts or volumes attached, so whenever we wrote to disk, we were writing to memory instead. Logically it makes quite a bit of sense: if you don’t have disk space, the only other place you can write to is memory (RAM).

This goes a step further: there might be instances where you have a mount or volume but still notice docker.mem.cache going up. Certain Linux distributions will freely cache disk reads and writes in memory if there is ample space, and will automatically start freeing it up once you start running low. For a deeper breakdown, here’s a great article: Linux Ate my RAM!

So:

import { promises as fs } from 'fs' 

await fs.writeFile(...)

…wasn’t writing to disk but straight to memory. Specifically for our Address Verification Service, we load nearly 4 gigabytes of files into memory at runtime in order to operate.

This helped us identify and resolve the problem in our Address Verification service: we were improperly using cached Docker builds. Once we pushed a fix for that, our memory readings went back to normal. For those who are confused about why that might be the case, the details need to be kept in house, but in short, if our assets had been properly cached as a Docker layer, the 4 GB of data would have been counted as part of the image size and handled by Docker’s storage driver. That way those files would no longer be stored in memory and would instead properly be a part of the disk space.

What you need to know about hapi file management

With one service fixed, I just had to fix the memory problem in our core API. To understand how I fixed it, I think it’s relevant to provide some background on hapi.js. We use hapi as the main framework for our API and are currently running v20. As a company that deals heavily with turning assets into printable mail pieces, we needed a framework that makes it easy to work with parsing payloads.

The main options for dealing with payloads that hapi provides are:

Data
Streams
Files

The first option, Data, is fairly straightforward: if you pass a JSON body into a POST request, it’ll parse it into JSON data. Streams will do the same thing, but if you get a multipart upload (i.e., a file is uploaded), it’ll read that file into a stream without any additional tooling needed. The third, Files, is similar, but it’ll write the file to a specified directory rather than give you a stream:

‘File’: the incoming payload is written to a temporary file in the directory specified by the uploads settings. If the payload is 'multipart/form-data' and parse is true, field values are presented as text while files are saved to disk. Note that it is the sole responsibility of the application to clean up the files generated by the framework. This can be done by keeping track of which files are used (e.g. using the request.app object), and listening to the server 'response' event to perform cleanup.
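For reference, a route that opts into this behavior might look like the sketch below. The handler and the uploads directory are made up for illustration; `output`, `parse`, `multipart`, and `uploads` are the route payload options as I understand them from the hapi v20 docs.

```javascript
// Hypothetical route definition; in a real server it would be passed to server.route().
const uploadRoute = {
  method: 'POST',
  path: '/v1/upload',
  options: {
    payload: {
      output: 'file',         // write uploaded files to a temporary file
      parse: true,            // parse the multipart fields
      multipart: true,        // multipart must be enabled explicitly in recent hapi versions
      uploads: '/tmp/uploads' // assumed directory for the temporary files
    }
  },
  handler: (request, h) => {
    // Echo back which fields arrived; the files themselves are already on disk.
    return h.response({ received: Object.keys(request.payload || {}) }).code(200);
  }
};

module.exports = { uploadRoute };
```

Nothing in this configuration removes the temporary files; that part is left to the application.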

Now there are two things that developers should be aware of:

If you’re running a Docker container with no storage method, hapi will write your file to memory. This goes back to what I mentioned earlier. It’s quite intuitive, but you cannot assume that disk storage actually exists, given how popular services like S3 are now and how volumes are built in by default.
The second is that it is the sole responsibility of the application to clean up these files. Unfortunately we missed this line in the docs and hadn’t actually been cleaning up those files. Now this in and of itself isn’t the biggest deal breaker. There were several endpoints that were failing to delete these files after we had performed a major refactor, so I quickly cleaned those up. Immediately after those changes, our containers were leaking significantly less memory, but they were still leaking. Our containers were now crashing every 3 days rather than every 3 hours. This led me to a final discovery…

The Hapi.js request lifecycle doesn’t make a lot of sense

While the hapi docs do tell us that we should be deleting assets, they don’t stress how important it is to listen to the server 'response' event rather than handling file deletion manually within your normal code paths. The big issue is that payload parsing occurs early in the hapi lifecycle: steps like payload authentication and request validation happen after a payload is parsed, which creates somewhat of a vulnerability if you’re not careful.

Say a malicious actor, let’s call him Bob, had a personal vendetta against Lob. He could continually hit our endpoints with ill-formatted requests, each with a massive 200 MB asset attached. That request payload would be parsed and the file would be written to memory. Now, we’re smart enough to create validators that check any attached files to make sure they’re under a certain size, let’s say 20 MB in this case. That same request would certainly fail payload validation, and we would send a 422 response. But that 200 MB of data is now sitting in memory, and with a couple more requests it would start causing our containers to run out of memory. So rather than making sure these files were deleted in our normal code paths and controllers, we had to make sure they were deleted regardless of whether the request made it that far in the pipeline.

Figure: the hapi request lifecycle

All it took was a small chunk of code to clean up the files from a request payload on every response.

Say we’re handling a request like this:

/v1/upload

{
  "Upload_name": "Testing",
  "file_we_want_to_upload": "/tmp/file.html"
}

We need a response handler that’ll delete that file whenever a response is sent, regardless of whether the request was successful or not.


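const rimraf = require('rimraf'); // assumes rimraf v3 (callback API)

module.exports = {
  name: 'cleanUpAssets',
  register: async function (server) {
    // 'response' is emitted for every request, successful or not
    server.events.on('response', function (request) {
      const file = request.payload && request.payload.file_we_want_to_upload;
      if (file) rimraf(file, (err) => err && console.error(err));
    });
  }
};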
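The plugin above reads one known field. A slightly more general sketch, assuming hapi presents each uploaded file as an object with a `path` property when output is 'file' and parse is true, walks the top-level payload fields and removes every file it finds. The helper names (`collectUploadPaths`, `cleanUpUploads`) are hypothetical, and this version swaps rimraf for Node’s built-in `fs.promises.rm`.

```javascript
const fs = require('fs');

// Gather paths from top-level payload fields that look like file uploads.
// Assumption: each uploaded file is an object with a string `path`;
// repeated fields arrive as arrays of such objects.
function collectUploadPaths(payload) {
  if (!payload || typeof payload !== 'object') return [];
  const paths = [];
  for (const value of Object.values(payload)) {
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      if (item && typeof item === 'object' && typeof item.path === 'string') {
        paths.push(item.path);
      }
    }
  }
  return paths;
}

// Remove every collected file; `force: true` ignores files that are already gone.
async function cleanUpUploads(request) {
  const paths = collectUploadPaths(request.payload);
  await Promise.all(paths.map((p) => fs.promises.rm(p, { force: true })));
  return paths;
}

const cleanUpAssets = {
  name: 'cleanUpAssets',
  register: async function (server) {
    // The 'response' event fires whether the request succeeded or failed.
    server.events.on('response', (request) => {
      cleanUpUploads(request).catch((err) => console.error('upload cleanup failed', err));
    });
  }
};

module.exports = cleanUpAssets;
```

Because the cleanup hangs off the 'response' event rather than a controller, a request that fails validation still has its files removed.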

The end of the rainbow

And with these changes our memory was no longer spiking! The best part? Even after we shrunk our container memory limit by 1/4th, we were still seeing stable memory readings. Celebration ensued! Now the emojis were of the :partyporg: and :dancingpickle: variety.

This reduction in CPU and memory is likely to result in AWS savings from right-sizing containers; there may even be future savings when we complete our migration to Nomad.

Engineers may occasionally get called stubborn, but in the Case of the Mysterious Memory Leak, tenacity paid off. In leveling up our Docker and hapi knowledge, we uncovered a legacy bug and a potential vulnerability, and the changes we made resulted in a more stable, secure, and cost-effective operation.