To start thinking about load testing

Written by 012parth | Published 2017/10/13
Tech Story Tags: aws | load-testing | apache-bench | wrk | distributed-load-testing


Apache Bench (ab)

This is our step 1 tool, but keep in mind that it is single-threaded. Think of ab as sanity testing for performance. For step 1 you can use a single machine with a good amount of RAM and CPU. Run loads for 15–30 minutes to check that nothing obvious is going wrong.
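
As a rough sketch (the host and endpoint below are placeholders), a step 1 run might look like this:

```bash
# 50,000 requests total, 100 concurrent connections, against one endpoint
ab -n 50000 -c 100 http://your-host.example.com/api/health
```

Even with -c set high, ab drives everything from a single thread, which is exactly why it should stay a sanity check rather than your real load test.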

wrk

wrk is our step 2 tool for load testing. It is similar to ab, but it puts proper concurrent load on the system. In step 2 we would use a big machine (think 64 GB RAM, 8–16 cores). We are still testing from a single machine in this phase, though. Run loads for about 30–60 minutes. This gives a good understanding of the system before moving on to the next phase.
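
A minimal sketch of a step 2 run (the thread/connection counts and the URL are placeholders to tune for your machine):

```bash
# 8 threads, 400 open connections, sustained for 30 minutes,
# with the latency distribution printed at the end
wrk -t8 -c400 -d30m --latency http://your-host.example.com/api/orders
```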

Up until steps 1 and 2, we are probably testing single endpoints of the application. This is easy to set up and gets results quickly. But things are not complete until we do distributed load testing plus complete application flows (scenarios where multiple APIs are called before a flow is completed). Distributed load testing takes longer to set up and is more expensive than steps 1 and 2.

Distributed load testing

Our go-to tool here is JMeter. JMeter might sound specific to Java, but it is not. It has been a good investment of time to pick up JMeter. Plus, since JMeter is a very old tool, it is very easy to find a lot of helpful guides and setups.
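
For reference, a distributed JMeter run is driven from one controller machine plus several load generators. A minimal sketch, where the test plan file name and generator IPs are placeholders:

```bash
# on each load-generator machine, start the JMeter agent
jmeter-server

# from the controller: non-GUI mode (-n), test plan (-t),
# remote generators (-R), results log (-l)
jmeter -n -t checkout_flow.jmx -R 10.0.0.11,10.0.0.12 -l results.jtl
```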

The goal of step 3 is to get a good estimate of requests per second per machine (do take note of the machine specs). Once you have this number, you can project your requirements based on expected and peak traffic; for example, if one machine sustains 500 RPS and your expected peak is 4,000 RPS, you need at least eight machines, plus headroom. It also helps you decide how much vertical/horizontal scaling you will need.

For vertical scaling

Choice of language

Once you have identified the main flows you need to focus on for performance, you can rewrite those pieces in a compiled language (this is almost always low-hanging fruit for a performance boost). But this only applies when your app is doing a lot of actual work and is not just a middle layer that talks to the database.

Architecture

You are already at a good stage when the setup is spread across a few services instead of one huge service. Try not to end up with too many services either. Take each network call into account (every call adds a minimum of about 10 ms). Sometimes coupling two services together is what gives you the gains.

Going into memory usage

It might be harder to get these numbers for nodejs, but at the code level there can be certain low-hanging fruits as well. Identify hotspots for gc; in many cases the algorithm and data structure itself will matter too.
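
For nodejs specifically, a couple of built-in flags can surface gc behaviour without extra tooling (app.js below is a placeholder for your entry point):

```bash
# log every garbage-collection event (V8 flag passed through node)
node --trace-gc app.js

# expose the inspector so you can take heap snapshots from Chrome DevTools
node --inspect app.js
```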

For horizontal scaling

Load balancing

Check out the arrangement of your HTTP calls. How many routers does a request go through after entering your network? How do you plan to evenly distribute the load? A round-robin approach is okay for a start, but as you move on, identify which calls are heavy and which are not, and configure the router accordingly.
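
As one concrete example (nginx is just one possible router here, and the IPs are hypothetical), weighted round robin lets a bigger machine take a larger share of the load:

```nginx
# round-robin by default; weight skews traffic toward bigger machines
upstream api_backend {
    server 10.0.0.21 weight=3;  # larger instance, takes ~3x the traffic
    server 10.0.0.22;
    server 10.0.0.23;
}

server {
    listen 80;
    location / {
        proxy_pass http://api_backend;
    }
}
```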

Caching

If the system is non-transactional/non-payment, where 99% accuracy is as good as 100%, then caching helps a lot. Be aware of all the edge cases for caching, though. Performance comes only second to the actual business. **A bad caching configuration can go really bad, including lower performance and maybe wrong results.** Also be aware of caches that are already present (a lot of frameworks/ORMs these days include hidden caching).

You can apply caching at all levels: database level, function level, controller level, API level.
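
To make the API level concrete, here is a minimal nginx sketch (reusing the hypothetical api_backend upstream from the load-balancing example) that caches successful responses for a short TTL:

```nginx
# a small shared cache zone for API responses
proxy_cache_path /var/cache/nginx keys_zone=api_cache:10m max_size=1g;

server {
    listen 80;
    location /api/ {
        proxy_cache api_cache;
        proxy_cache_valid 200 30s;  # short TTL: stale data is the main caching risk
        proxy_pass http://api_backend;
    }
}
```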

Database

For a majority-read/minority-write system, you can leverage database read replicas (slaves) to increase throughput (this sort of falls under caching, although it is a different type). Also make sure you account for what happens when a replica goes down.

- If the case is the opposite, majority-write/minority-read, you will want to check what database you are using (having multiple masters is helpful).

- Also check how you are storing data in that particular DB, and how much data needs to be written to disk for each DB transaction.

The above is a brief summary of the approach. For each point above, you should also consider the downsides:

  • What if one machine goes down?
  • What if five go down?
  • What if network latency increases for some reason?
  • What if IO speed goes down for some reason?

Some general points on the above:

  • Make note of all the calls going out of your system (to third parties).
  • If a third party cannot guarantee their RPS/QPS, you might want to shift to async flows (queues).
  • Keep note of which of your services are IO-heavy vs CPU-heavy vs network-heavy (many times you can couple two of these in a single service).
  • Try to minimise mocks when load testing.
