For the address verification team at Lob, last year featured quite a few wins. We shipped a slew of accuracy-boosting features and machine learning models, introduced several low-code tools, scaled our service to handle hundreds of requests a second, and even launched an Address Autocomplete API. That API had done well under low load for a select set of customers, but heading into this year, scale was at the front of my mind. In load testing we were nowhere near our goals, but a recent investigation uncovered a few issues we were able to address (excuse the pun) for a 174% performance boost.
Autocomplete API wasn’t scaling
Address verification performance depends partly on the addresses fed in; “dirtier” addresses (spelling mistakes and/or incorrect info) take longer to process. So we typically take our load test results and assume an additional 10% performance hit to approximate real-world behavior. That said, we are never satisfied, and we keep stressing our systems incrementally until we see performance drop. Requests per second (rps) for our address autocomplete API are much lower than for verification, but stress testing it is just as important; our goal was 30-40 rps without latency going above 200 ms. At low volume the autocomplete service was serviceable, but in load tests at just 2x the current load, we saw latency hit nearly 2 seconds a request.
Metrics:
- Median: 63 ms
- Average: 149 ms
- 95th percentile: 548 ms
- At just 8 rps
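This isn't our actual load-testing stack, but as a rough sketch, here is one way to collect median / average / p95 numbers like the ones above in Python. The endpoint URL, payload shape, and API key are placeholders, not our real API contract.

```python
# Rough sketch only: fire concurrent autocomplete requests and report
# median / average / p95 latency. URL, payload, and auth are placeholders.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

AUTOCOMPLETE_URL = "https://example.test/v1/autocomplete"  # placeholder
API_KEY = "test_xxxxxxxxxxxx"                              # placeholder

def time_request(prefix: str) -> float:
    start = time.perf_counter()
    requests.post(AUTOCOMPLETE_URL, json={"address_prefix": prefix},
                  auth=(API_KEY, ""), timeout=10)
    return (time.perf_counter() - start) * 1000  # latency in ms

prefixes = ["185 Berry", "1600 Penn", "210 King"] * 100
with ThreadPoolExecutor(max_workers=8) as pool:  # ~8 requests in flight
    latencies = list(pool.map(time_request, prefixes))

print(f"median:  {statistics.median(latencies):.0f} ms")
print(f"average: {statistics.mean(latencies):.0f} ms")
print(f"p95:     {statistics.quantiles(latencies, n=20)[-1]:.0f} ms")
```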
High Elasticsearch CPU load and incorrect hardware profiles
Looking at Elasticsearch, I noticed that our Elastic Cloud deployments were running at 140% average CPU load on 6 vCPUs. If we wanted 2x the CPU capacity, we'd have to pay at least 2x the current cost (CPU pricing is not linear; 4x the CPU would cost nearly 5.8x more), so we couldn't just mindlessly scale the service horizontally. One fundamental question I had was: why did we only have 6 vCPUs?
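(Aside: Elastic Cloud's console surfaces CPU usage directly, but if you want to spot-check it yourself, Elasticsearch's nodes stats API reports per-node CPU. A minimal sketch; the deployment URL and credentials are placeholders.)

```python
# Spot-check sketch: pull per-node CPU usage from the nodes stats API.
import requests

ES_URL = "https://my-deployment.es.example.com:9243"  # placeholder
AUTH = ("elastic", "changeme")                        # placeholder

stats = requests.get(f"{ES_URL}/_nodes/stats/os,process", auth=AUTH, timeout=10).json()
for node in stats["nodes"].values():
    print(node["name"],
          "os cpu:", node["os"]["cpu"]["percent"], "%",
          "| process cpu:", node["process"]["cpu"]["percent"], "%")
```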
It turns out we were unaware that Elastic Cloud offers several different hardware profiles, each with its own blend of storage, memory, and vCPU. We were using an I/O-optimized profile, which is great for use cases like logging, where you store terabytes of data and only search it occasionally. Our autocomplete use case is the opposite: we only had 60 GB of data, but we make anywhere from 5 to 16 thousand Elasticsearch searches a second. We needed a CPU-optimized profile to better handle that query load.
The shift from an I/O-optimized to a Compute-optimized profile immediately gave us 4x the vCPUs for exactly the same cost.
Slow-Running Queries
Unfortunately, when we ran more load tests, the jump in performance wasn't as big as we expected. Our 95th percentile latency was still atrocious compared to our median. That ratio typically points to slow-running queries as the cause of the degradation, so we took both an automated and a manual approach to uncover the culprits.
We often rely on Elasticsearch's search slow log, but in this case we ran a filter for slow requests against our own request logs.
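As a rough sketch, that filter can be as simple as scanning structured request logs for anything over a latency threshold. The file path and field names (path, duration_ms, query) below are assumptions about the log schema, not our actual format.

```python
# Illustrative only: scan JSON-lines request logs for slow autocomplete calls.
import json

SLOW_MS = 500  # anything slower than this is worth a look

with open("requests.log") as log:
    for line in log:
        entry = json.loads(line)
        if "autocomplete" in entry.get("path", "") and entry.get("duration_ms", 0) >= SLOW_MS:
            print(f'{entry["duration_ms"]} ms  {entry.get("query", "")}')
```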
To test manually, we took an API key and wrote a script that emulated someone typing out multiple addresses letter by letter, saving the response times. It was obvious early on that requests containing only a primary number were nearly 10x slower than other queries.
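For reference, a minimal sketch of that kind of typing script; the endpoint, payload shape, and API key are placeholders, not our real API contract.

```python
# Sketch of the "typing" script: replay addresses one character at a time and
# record each response time. Endpoint, payload, and auth are placeholders.
import time

import requests

AUTOCOMPLETE_URL = "https://example.test/v1/autocomplete"  # placeholder
API_KEY = "test_xxxxxxxxxxxx"                              # placeholder

def type_out(address: str):
    """Yield (prefix, latency_ms) for each keystroke of the address."""
    for i in range(1, len(address) + 1):
        prefix = address[:i]
        start = time.perf_counter()
        requests.post(AUTOCOMPLETE_URL, json={"address_prefix": prefix},
                      auth=(API_KEY, ""), timeout=10)
        yield prefix, (time.perf_counter() - start) * 1000

for addr in ["1234 Main St", "185 Berry St", "40"]:
    for prefix, ms in type_out(addr):
        print(f"{ms:7.1f} ms  {prefix!r}")
```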
Our problem was an error in how we query for possible primary numbers. Each address record stores a range of primary numbers (the house number, like the 1234 in 1234 Main St) in two fields: primary_low (the lowest number in the record's range) and primary_high (the highest). To find records whose range could contain the user's number, we searched for any record with a primary_low less than or equal to that number and a primary_high greater than or equal to it.
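Sketched in Python against Elasticsearch's search API (the index name and connection details are placeholders; the primary_low and primary_high fields are as described above), the old query looked roughly like this:

```python
# Sketch of the old, one-sided query. Index name and connection details are
# placeholders; the key point is that each range clause has only one bound.
import requests

ES_URL = "https://my-deployment.es.example.com:9243"  # placeholder
AUTH = ("elastic", "changeme")                        # placeholder

primary_number = 1234  # parsed from the user's input, e.g. "1234 Main St"

unbounded_query = {
    "query": {
        "bool": {
            "must": [
                # matches every record from -infinity up to 1234
                {"range": {"primary_low": {"lte": primary_number}}},
                # matches every record from 1234 up to +infinity
                {"range": {"primary_high": {"gte": primary_number}}},
            ]
        }
    }
}

resp = requests.post(f"{ES_URL}/addresses/_search", json=unbounded_query,
                     auth=AUTH, timeout=10)
print(resp.json()["hits"]["total"])
```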
Because each range clause in the query above has only a single lte or gte, the result is an unbounded range search that sweeps our entire index. As you can imagine, depending on the search, that can be a very expensive query.
Streamlining the Query
So instead of a range query that would match all possible numbers, we bounded the search: take the number the user typed, then find anything whose primary_low or primary_high falls within +/- 100 of it.
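Sketched the same way, the bounded version pins both sides of each range clause to a window around the user's number. The exact window placement and clause layout here follow the description above but are still illustrative, not our exact production query.

```python
# Sketch of the bounded query: neither range clause is open-ended anymore.
WINDOW = 100
primary_number = 1234

bounded_query = {
    "query": {
        "bool": {
            "must": [
                # primary_low must sit within 100 below the typed number
                {"range": {"primary_low": {"gte": primary_number - WINDOW,
                                           "lte": primary_number}}},
                # primary_high must sit within 100 above the typed number
                {"range": {"primary_high": {"gte": primary_number,
                                            "lte": primary_number + WINDOW}}},
            ]
        }
    }
}
# Sent to /_search exactly like the unbounded version above.
```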
For comparison, let's say you are a mail carrier searching for house number 40, and pretend you have no other info because the letter is torn. You would have to search every block to find the address you were looking for. If instead you could immediately narrow the search to only the blocks whose address range could possibly contain 40, your search would go much faster.
The lesson: do not execute unbounded range queries. Elasticsearch documentation for range searches doesn’t mention this, but it made a world of difference for us.
Results
The new hardware profile and the streamlined primary-number query delivered the impact we were looking for: a significant improvement in address autocomplete speed.
"After" Metrics:
- Average: 38 ms
- 95th percentile: 218 ms
- At 120 rps
Our average latency improved by nearly 4x (149 ms to 38 ms) and our 95th percentile dropped to 218 ms, close to our 200 ms target, at 120 rps instead of 8! We met our API performance benchmarks, and got snappier type-ahead, with a handful of simple changes; it's worth a look to see whether they could work for you.