Tuesday, August 30, 2022

Quick, somewhat naïve but still useful: benchmarking HTTP services from the command line

Quite often, while developing HTTP services, you may find yourself looking for a quick and easy way to throw some load at them. Tools like Gatling and JMeter are the gold standard in the open-source world, but developing meaningful load test suites in either of them may take some time. It is very likely you are going to end up with one of these, but during development more approachable tooling gives you invaluable feedback much faster.

Command-line HTTP load testing tools are what we are going to talk about. There are a lot of them, but we will focus on just a few: ab, wrk, wrk2, rewrk, and vegeta. And since HTTP/2 is gaining more and more traction, the tools that support both HTTP/1.1 and HTTP/2 will be highlighted and awarded bonus points. The sample HTTP service we are going to run tests against exposes a single GET /services/catalog endpoint over HTTP/1.1, HTTP/2 and HTTP/2 over clear text.

Let us kick off with ab, the Apache HTTP server benchmarking tool and one of the oldest HTTP load testing tools out there. It is available on most Linux distributions and supports only HTTP/1.x (in fact, it does not implement HTTP/1.x fully). The standard set of parameters is supported: the desired number of requests (-n), concurrency (-c) and timeout (-s), while -k asks for HTTP keep-alive.

$> ab -n 1000 -c 10 -s 1 -k  http://localhost:19091/services/catalog

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
...
Completed 1000 requests
Finished 1000 requests

Server Software:
Server Hostname:        localhost
Server Port:            19091

Document Path:          /services/catalog
Document Length:        41 bytes

Concurrency Level:      10
Time taken for tests:   51.031 seconds
Complete requests:      1000
Failed requests:        0
Keep-Alive requests:    0
Total transferred:      146000 bytes
HTML transferred:       41000 bytes
Requests per second:    19.60 [#/sec] (mean)
Time per request:       510.310 [ms] (mean)
Time per request:       51.031 [ms] (mean, across all concurrent requests)
Transfer rate:          2.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       5
Processing:     3  498 289.3    497    1004
Waiting:        2  498 289.2    496    1003
Total:          3  499 289.3    497    1004

Percentage of the requests served within a certain time (ms)
  50%    497
  66%    645
  75%    744
  80%    803
  90%    914
  95%    955
  98%    979
  99%    994
 100%   1004 (longest request)

The report is pretty comprehensive, but if your service speaks HTTP/2 or uses HTTPS with self-signed certificates (not unusual in development), you are out of luck.
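
The two "Time per request" lines in the ab report are easy to mix up, so here is a minimal Python sketch that recomputes the summary figures from the raw numbers of the run above (1000 requests, concurrency 10, 51.031 seconds). The function name `ab_summary` is made up for illustration, and the formulas reflect my reading of how ab derives these values rather than anything stated in the report itself.

```python
def ab_summary(total_requests: int, concurrency: int, elapsed_seconds: float) -> dict:
    """Recompute ab-style summary figures from the raw run parameters."""
    # Overall throughput: completed requests divided by wall-clock time.
    requests_per_second = total_requests / elapsed_seconds
    # Mean time per request across all concurrent requests, in milliseconds.
    across_all_ms = elapsed_seconds * 1000 / total_requests
    # Mean time per request as seen by a single concurrent client.
    per_client_ms = concurrency * across_all_ms
    return {
        "requests_per_second": requests_per_second,
        "time_per_request_ms": per_client_ms,
        "time_per_request_across_all_ms": across_all_ms,
    }


# Numbers taken from the ab report above.
summary = ab_summary(total_requests=1000, concurrency=10, elapsed_seconds=51.031)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In other words, the larger figure is what one of the 10 concurrent clients waits for on average, while the smaller one is simply the wall-clock time divided by the number of requests.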

Let us move on to somewhat more advanced tools or, to be precise, a family of tools inspired by wrk, a modern HTTP benchmarking tool. There are no official binary releases of wrk available, so you are better off building it from source yourself.

$> wrk -c 50 -d 10 -t 5 --latency --timeout 5s https://localhost:19093/services/catalog

Running 10s test @ https://localhost:19093/services/catalog
  5 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   490.83ms  295.99ms   1.04s    57.06%
    Req/Sec    20.79     11.78    60.00     81.29%
  Latency Distribution
     50%  474.64ms
     75%  747.03ms
     90%  903.44ms
     99%  999.52ms
  978 requests in 10.02s, 165.23KB read
Requests/sec:     97.62
Transfer/sec:     16.49KB

Capability-wise, it is very close to ab, with slightly better HTTPS support. The report is as minimal as it gets; the distinguishing feature of wrk, however, is the ability to use LuaJIT scripts to generate HTTP requests. No HTTP/2 support, though.

wrk2 is an improved version of wrk (and is mostly based on its codebase), modified to produce a constant throughput load and accurate latency details. The target throughput, in requests per second, is set with the -R option. Unsurprisingly, you have to build this one from source as well (and the binary name is kept as wrk).

$> wrk -c 50 -d 10 -t 5 -L -R 100 --timeout 5s https://localhost:19093/services/catalog

Running 10s test @ https://localhost:19093/services/catalog
  5 threads and 50 connections
  Thread calibration: mean lat.: 821.804ms, rate sampling interval: 2693ms
  Thread calibration: mean lat.: 1077.276ms, rate sampling interval: 3698ms
  Thread calibration: mean lat.: 993.376ms, rate sampling interval: 3282ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   933.61ms  565.76ms   3.35s    70.42%
    Req/Sec       -nan      -nan   0.00      0.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  865.79ms
 75.000%    1.22s
 90.000%    1.69s
 99.000%    2.61s
 99.900%    3.35s
 99.990%    3.35s
 99.999%    3.35s
100.000%    3.35s

  Detailed Percentile spectrum:
       Value   Percentile   TotalCount 1/(1-Percentile)

      28.143     0.000000            1         1.00
     278.015     0.100000           36         1.11
     426.751     0.200000           71         1.25
                       ....
    2969.599     0.996875          354       320.00
    3348.479     0.997266          355       365.71
    3348.479     1.000000          355          inf
#[Mean    =      933.606, StdDeviation   =      565.759]
#[Max     =     3346.432, Total count    =          355]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  893 requests in 10.05s, 150.87KB read
Requests/sec:     88.85
Transfer/sec:     15.01KB
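
Why would a constant throughput load change the latency numbers at all? The wrk2 README describes the underlying problem as coordinated omission. The toy Python model below is only a sketch under simplifying assumptions (a single connection, made-up service times, a hypothetical `recorded_latencies` helper) and is not how wrk2 is implemented. It compares latencies measured from the moment a request was actually sent with latencies measured from the moment it was supposed to be sent at a fixed rate.

```python
def recorded_latencies(service_times_ms, interval_ms):
    """Toy single-connection model of two ways to record latency.

    closed_loop:   the next request is sent only when the previous one
                   finishes, and latency is measured from the actual send.
    constant_rate: request i is *intended* to go out at i * interval_ms,
                   and latency is measured from that intended time.
    """
    closed_loop = list(service_times_ms)
    constant_rate = []
    previous_finish = 0
    for i, service_time in enumerate(service_times_ms):
        intended_start = i * interval_ms
        # A request cannot start before the previous one has finished.
        actual_start = max(intended_start, previous_finish)
        previous_finish = actual_start + service_time
        constant_rate.append(previous_finish - intended_start)
    return closed_loop, constant_rate


# Hypothetical service times: one slow response (500 ms) among fast ones.
closed, constant = recorded_latencies([10, 10, 500, 10, 10], interval_ms=100)
print("closed loop:  ", closed)    # [10, 10, 500, 10, 10]
print("constant rate:", constant)  # [10, 10, 500, 410, 320]
```

In this model the single slow response also delays the requests queued behind it, and only the constant-rate measurement makes that visible.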

Besides these noticeable enhancements, the feature set of wrk2 largely matches that of wrk, so HTTP/2 is out of the picture. But do not give up just yet, we are not done.

The most recent addition to the wrk family is rewrk, a more modern HTTP framework benchmark utility, which could be thought of as wrk rewritten in beloved Rust with HTTP/2 support baked in.

$> rewrk -c 50 -d 10s -t 5 --http2 --pct --host https://localhost:19093/services/catalog

Beginning round 1...
Benchmarking 50 connections @ https://localhost:19093/services/catalog for 10 second(s)
  Latencies:
    Avg      Stdev    Min      Max
    494.62ms  281.78ms  5.56ms   1038.28ms
  Requests:
    Total:   972   Req/Sec:  97.26
  Transfer:
    Total: 192.20 KB Transfer Rate: 19.23 KB/Sec
+ --------------- + --------------- +
|   Percentile    |   Avg Latency   |
+ --------------- + --------------- +
|      99.9%      |    1038.28ms    |
|       99%       |    1006.26ms    |
|       95%       |    975.33ms     |
|       90%       |    942.94ms     |
|       75%       |    859.73ms     |
|       50%       |    737.45ms     |
+ --------------- + --------------- +

As you may have noticed, the report is very similar to the one produced by wrk. In my opinion, this tool strikes a perfect balance of features, simplicity and insights, at least while you are in the middle of development.

The last one we are going to look at is vegeta, an HTTP load testing tool and library written in Go. It supports not only HTTP/2 but also HTTP/2 over clear text, and has powerful reporting built in. It relies heavily on pipes to compose different steps together, for example:

$> echo "GET https://localhost:19093/services/catalog" | 
   vegeta attack -http2 -timeout 5s -workers 10 -insecure -duration 10s | 
   vegeta report

Requests      [total, rate, throughput]         500, 50.10, 45.96
Duration      [total, attack, wait]             10.88s, 9.98s, 899.836ms
Latencies     [min, mean, 50, 90, 95, 99, max]  14.244ms, 529.886ms, 537.689ms, 918.294ms, 962.448ms, 1.007s, 1.068s
Bytes In      [total, mean]                     20500, 41.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:500
Error Set:
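
The "rate" and "throughput" columns in the Requests row look alike but measure different things. As far as I understand vegeta's report, the rate is the number of requests issued over the attack period, while the throughput counts successful requests over the total period (attack plus wait). A small Python sketch, using the numbers from the report above and a hypothetical helper name, shows how the two values relate:

```python
def vegeta_requests_row(total, successes, attack_seconds, wait_seconds):
    """Recompute the rate and throughput figures of a vegeta text report."""
    # The total duration is the attack period plus the wait for the
    # last response to arrive.
    total_seconds = attack_seconds + wait_seconds
    # Rate: all issued requests over the attack period only.
    rate = total / attack_seconds
    # Throughput: successful requests over the total period.
    throughput = successes / total_seconds
    return rate, throughput


# 500 requests, all successful, attack 9.98s, wait 899.836ms (from the report).
rate, throughput = vegeta_requests_row(500, 500, 9.98, 0.899836)
print(f"rate={rate:.2f} throughput={throughput:.2f}")
```

The durations in the report are rounded, so recomputed values can only be expected to match to a couple of decimal places.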

You could also ask for histograms (with customizable reporting buckets), not to mention beautiful plots with the plot command:

$> echo "GET http://localhost:19092/services/catalog" | 
   vegeta attack -h2c -timeout 5s -workers 10 -insecure -duration 10s | 
   vegeta report  -type="hist[0,200ms,400ms,600ms,800ms]"

Bucket           #    %       Histogram
[0s,     200ms]  86   17.20%  ############
[200ms,  400ms]  102  20.40%  ###############
[400ms,  600ms]  98   19.60%  ##############
[600ms,  800ms]  113  22.60%  ################
[800ms,  +Inf]   101  20.20%  ###############
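
To make the bucket boundaries in the histogram report above more tangible, here is a rough Python sketch of the same kind of bucketing. It assumes each bucket includes its lower bound and excludes its upper bound, which is my assumption about the semantics rather than something the report states. The sample latencies and the helper names are invented for illustration, while the counts used for the percentage check come from the report.

```python
from bisect import bisect_right


def histogram(latencies_ms, lower_bounds_ms):
    """Count latencies per bucket; bucket i covers [bound[i], bound[i+1])."""
    counts = [0] * len(lower_bounds_ms)
    for latency in latencies_ms:
        # Index of the last lower bound that is <= latency.
        counts[bisect_right(lower_bounds_ms, latency) - 1] += 1
    return counts


def shares(counts):
    """Percentage of all requests that landed in each bucket."""
    total = sum(counts)
    return [round(100 * count / total, 2) for count in counts]


bounds = [0, 200, 400, 600, 800]  # the last bucket is open-ended (+Inf)
sample = [14, 150, 200, 399, 537, 650, 799, 918, 1068]  # made-up latencies
print(histogram(sample, bounds))           # [2, 2, 1, 2, 2]
print(shares([86, 102, 98, 113, 101]))     # [17.2, 20.4, 19.6, 22.6, 20.2]
```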

Of all the tools above, vegeta is clearly the most powerful in terms of capabilities and reporting. It has every chance of becoming a one-stop HTTP benchmarking harness, even for production.

So we have walked through a good number of tools. Which one is yours? My advice would be: for HTTP/1.x, use ab; for basic HTTP/2, look at rewrk; and if neither of these fits, turn to vegeta. Once your demands and the sophistication of your load tests grow, consider Gatling or JMeter.

With that, Happy HTTP Load Testing!