LOADTESTER
Published 2026-05-06 · By Kristian Razum

Load Testing Tool Benchmarks: Methodology

This page documents how we benchmark HTTP load testing tools — the target service, hardware, scenarios, metrics, and how raw data is published. Benchmark numbers without methodology are not benchmarks; they are claims. The goal of publishing the methodology separately is to make the runs reproducible by anyone with a Hetzner account and a few hours.

Status. DRAFT — first run pending. Methodology is fixed before any tool is run, then locked. The first published run will append a results section to this page and link the raw CSVs. We will not edit methodology after a result is published; if it changes, that is a new benchmark.

What this benchmark is and is not

The benchmark answers one question: given identical conditions, how many requests per second can each tool sustain against a known target before its own load generator becomes the bottleneck, and what does its latency reporting look like under that load?

It is not:

  1. A benchmark of any real application. The target is deliberately synthetic so that the load generator, not the system under test, is the variable (see "Target service" below).
  2. A claim that latency numbers from different tools are directly comparable as absolute values; each tool's measurement model is documented separately (see "What we measure").

Target service

The target is a minimal HTTP/1.1 server written in Go that does the cheapest thing possible: returns 200 OK with a fixed 256-byte JSON body, no logging, no allocation per request beyond what the standard library forces. Source is published in the benchmark repo.
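The published source in the repo is authoritative; as a sketch of the shape, assuming the `-addr` and `-body` flags used in the reproduction steps below, a minimal version looks like this (the body-padding scheme here is illustrative, not the actual implementation):

```go
// Minimal sketch of a cheap HTTP/1.1 target: 200 OK, fixed-size JSON
// body, no logging, body allocated once and reused for every request.
package main

import (
	"flag"
	"net/http"
	"strconv"
	"strings"
)

// fixedBody builds a JSON body padded to exactly n bytes.
func fixedBody(n int) []byte {
	const prefix = `{"status":"ok","pad":"`
	const suffix = `"}`
	pad := n - len(prefix) - len(suffix)
	if pad < 0 {
		pad = 0
	}
	return []byte(prefix + strings.Repeat("x", pad) + suffix)
}

func main() {
	addr := flag.String("addr", ":8080", "listen address")
	size := flag.Int("body", 256, "response body size in bytes")
	flag.Parse()

	body := fixedBody(*size) // built once; no per-request allocation here
	length := strconv.Itoa(len(body))

	http.HandleFunc("/api/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Content-Length", length)
		w.Write(body)
	})
	http.ListenAndServe(*addr, nil)
}
```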

Why a synthetic target instead of a real application? Because the question we are answering is about the load-generator, not the system under test. A real application introduces variance from the database, the runtime, the OS scheduler, and external dependencies. Those are interesting questions, but they belong to a different benchmark.

The target runs on a separate machine from each tool, on a 1 Gbps private network, and is warmed for 60 seconds before each run. Server CPU is monitored; any run where target CPU exceeds 70% is discarded as a target-bound run.
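The 70% rule implies sampling server CPU during the run window. The sampling interval and transport are not specified here; as a sketch of the arithmetic alone, assuming two samples of the aggregate "cpu" line from /proc/stat:

```go
// Sketch of the target-CPU check behind the 70% discard rule.
// Assumes two snapshots of the aggregate "cpu" line in /proc/stat;
// how and when snapshots are taken is a separate (unspecified) detail.
package main

import "fmt"

// busyFraction computes CPU utilization between two /proc/stat samples.
// Each sample holds the jiffy counters from the aggregate "cpu" line:
// user, nice, system, idle, iowait, irq, softirq, steal.
func busyFraction(prev, cur [8]uint64) float64 {
	var prevTotal, curTotal uint64
	for i := 0; i < 8; i++ {
		prevTotal += prev[i]
		curTotal += cur[i]
	}
	// idle + iowait count as "not busy".
	idleDelta := (cur[3] + cur[4]) - (prev[3] + prev[4])
	totalDelta := curTotal - prevTotal
	if totalDelta == 0 {
		return 0
	}
	return 1 - float64(idleDelta)/float64(totalDelta)
}

func main() {
	// Synthetic sample pair for illustration: 80% busy over the interval.
	prev := [8]uint64{100, 0, 50, 800, 50, 0, 0, 0}
	cur := [8]uint64{700, 0, 250, 1000, 50, 0, 0, 0}
	util := busyFraction(prev, cur)
	fmt.Printf("target CPU %.0f%%; discard run: %v\n", util*100, util > 0.70)
}
```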

Target reproducibility

git clone https://github.com/cloud-native/loadtester-benchmarks
cd loadtester-benchmarks/target
go build -o target .
./target -addr :8080 -body 256

Hardware

Both the target and the load generator run on Hetzner Cloud CCX33 instances (8 dedicated vCPU, 32 GB RAM, Ubuntu 24.04 LTS) in the same datacenter (HEL1). Network limit is documented as 10 Gbps shared but realistic sustained throughput is closer to 1–2 Gbps; we cap test rates well below this.

| Role | Instance | vCPU | RAM | OS |
|---|---|---|---|---|
| Target | CCX33 | 8 | 32 GB | Ubuntu 24.04 |
| Generator | CCX33 | 8 | 32 GB | Ubuntu 24.04 |

Kernel and TCP tunables (applied identically to both):

net.core.somaxconn = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
fs.file-max = 1048576
ulimit -n 1048576   # per-process fd limit (not a sysctl; set in the shell/service unit)
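One way to apply these persistently (the drop-in filename is illustrative; any name under /etc/sysctl.d/ works):

```shell
# Persist the network tunables via a sysctl drop-in, then reload.
sudo tee /etc/sysctl.d/99-loadtest.conf >/dev/null <<'EOF'
net.core.somaxconn = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
fs.file-max = 1048576
EOF
sudo sysctl --system

# Raise the fd limit for the current shell. For a systemd-managed
# target or generator, set LimitNOFILE=1048576 in the unit instead.
ulimit -n 1048576
```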

Scenarios

Each tool runs the same three scenarios. All use HTTP/1.1, keep-alive enabled, GET requests against /api/health.

| Scenario | Target RPS | Duration | Connections | Purpose |
|---|---|---|---|---|
| S1 — Steady 1k | 1,000 | 120s | 50 | Baseline: every modern tool should handle this without strain |
| S2 — Steady 10k | 10,000 | 120s | 500 | Stress: separates lightweight CLIs from production-grade generators |
| S3 — Burst | 0 → 20,000 → 0 ramp | 180s | 1,000 | Tests how each tool handles ramp behavior and reporting under varying load |

Each scenario runs 5 times per tool, with a 30-second cooldown between runs. We report median, p5, and p95 across the 5 runs (not within a single run — that is a separate metric we also publish).
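The exact S3 ramp shape is defined per tool in the committed scenario configs; as an illustration only, assuming a symmetric linear ramp over the 180-second window, the target rate at time t is:

```go
// Illustrative S3 rate profile: linear ramp 0 → 20,000 RPS over the
// first 90s, then back down to 0 over the last 90s. The authoritative
// per-tool ramp definitions live in scenarios/<tool>/; this only shows
// the arithmetic of a symmetric triangle ramp.
package main

import "fmt"

const (
	peakRPS  = 20000.0
	duration = 180.0 // seconds
)

// rampRPS returns the target request rate at t seconds into the run.
func rampRPS(t float64) float64 {
	if t < 0 || t > duration {
		return 0
	}
	half := duration / 2
	if t <= half {
		return peakRPS * t / half // ramp up
	}
	return peakRPS * (duration - t) / half // ramp down
}

func main() {
	for _, t := range []float64{0, 45, 90, 135, 180} {
		fmt.Printf("t=%3.0fs  target %.0f rps\n", t, rampRPS(t))
	}
}
```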

Tools and versions

| Tool | Version | Invocation style |
|---|---|---|
| LoadTester | TBD | API-triggered run, single-worker config |
| k6 | TBD | `k6 run script.js` |
| JMeter | TBD | Non-GUI: `jmeter -n -t plan.jmx` |
| Vegeta | TBD | `vegeta attack -rate=N -duration=120s` |
| hey | TBD | `hey -z 120s -c 50 -q N` |
| wrk | TBD | `wrk -t8 -c500 -d120s` |
| ApacheBench | TBD | `ab -n N -c 50` |

Exact invocations for each tool are committed in the benchmark repo under scenarios/<tool>/ alongside any required config.

What we measure

From the load generator (per tool):

  1. Achieved RPS as reported by the tool.
  2. Latency distribution as reported by the tool (each tool's measurement model is documented in the per-tool notes).
  3. Error and timeout counts.

From the target (validation):

  1. Received RPS, counted server-side.
  2. Target CPU utilization (runs above 70% are discarded).
  3. A server-side latency histogram from kernel-level tcp_info sampling, where possible.

Cross-validation: we compare the generator's "achieved RPS" with the target's "received RPS." Discrepancies above 2% are flagged in the published results — they usually mean the generator is over- or under-counting.
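The 2% rule is plain arithmetic; a sketch, assuming the generator's reported count is used as the denominator (that choice is ours, not something either tool dictates):

```go
// Sketch of the cross-validation check: flag runs where the generator's
// achieved RPS and the target's received RPS disagree by more than 2%.
package main

import (
	"fmt"
	"math"
)

// discrepancy returns the relative difference between the generator's
// achieved RPS and the target's received RPS.
func discrepancy(generatorRPS, targetRPS float64) float64 {
	if generatorRPS == 0 {
		return 0
	}
	return math.Abs(generatorRPS-targetRPS) / generatorRPS
}

func main() {
	gen, tgt := 9980.0, 9700.0 // example numbers, not real results
	d := discrepancy(gen, tgt)
	fmt.Printf("discrepancy %.2f%%, flagged: %v\n", d*100, d > 0.02)
}
```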

Why latency numbers from different tools are not directly comparable. Tools differ in how they measure latency. Some measure from send to last byte received; some include connection setup; some sample only a subset of requests. We document each tool's measurement model in the per-tool notes, and we cross-check against the target's own histogram (kernel-level tcp_info sampling on the server side) where possible. Published latency tables include a "measurement model" column so the comparison is honest.

How results are published

For every published benchmark, we commit:

  1. Raw CSVs for every run (per-request timing where available, per-second buckets where not).
  2. The exact tool invocation, scenario script, and config files used.
  3. Generator and target syslog excerpts for the run window.
  4. Hetzner instance ID and snapshot label so the run is auditable.
  5. Discarded-run log with the reason for each discard (target-bound, network blip, etc.).

All of this lives in cloud-native/loadtester-benchmarks. Issues and pull requests are open. If you find a methodology flaw or believe we have under-tuned a specific tool, file an issue with the proposed change and a re-run will follow.

Conflict of interest disclosure

This benchmark is run by LoadTester, which is one of the tools being measured. That is the unavoidable bias. The mitigations are: (1) the methodology is fixed before any run; (2) all raw data is published; (3) the per-tool invocations are reviewable and tunable by the community; (4) when LoadTester does worse than another tool on a given metric, the result section says so without softening. We treat the benchmark as a credibility instrument, not a marketing instrument — a benchmark that can only ever favor the publisher is worse than no benchmark at all.

Results

Pending first run. When published, results will appear below as a dated subsection (e.g., 2026-06 — initial run) with the tables, charts, and raw-data links. We do not back-fill or quietly update old results; corrections are appended as new dated subsections with a note explaining what changed.

Methodology changelog