Date: 2026-04-03
Load: 1000 req/s (constant arrival rate), 30 seconds, up to 1000 VUs
Tool: k6 (constant-arrival-rate, 1000 iter/s)
Endpoint: /bench — 10 SQL queries per request (PostgreSQL)
| Parameter | Value |
|---|---|
| Host OS | WSL2 (Linux 5.15.146.1-microsoft-standard-WSL2) |
| CPU | 16 cores |
| RAM | 7.8 GB |
| Docker | 29.2.0 |
| Database | PostgreSQL 16 Alpine (per-service, max_connections=500) |
| TrueAsync FrankenPHP | Octane Swoole (NTS) | Octane Swoole (ZTS) | Octane FrankenPHP | |
|---|---|---|---|---|
| PHP | 8.6.0-dev (ZTS) | 8.5.4 (NTS) | 8.5.4 (ZTS) | 8.5.4 (NTS) |
| Server | FrankenPHP v2.11.2 (true-async fork) | Swoole 6.2.0 | Swoole 6.2.0 | FrankenPHP (official) |
| Laravel | 13.2.0 | 13.2.0 | 13.2.0 | 13.2.0 |
| OPcache | validate_timestamps=0, 128MB | validate_timestamps=0, 128MB | validate_timestamps=0, 128MB | default |
| Worker model | Coroutines (libuv, buffer=50) | Processes (fork) | Threads (ZTS) | Processes (fork) |
| Port | 8083 | 8084 | 8084 | 8085 |
Each request executes 10 SQL queries against PostgreSQL:
| # | Type | Query |
|---|---|---|
| 1 | SELECT | User by ID (auth lookup) |
| 2 | SELECT | 10 posts by user (list) |
| 3 | INSERT | post_view record |
| 4 | UPDATE | post views_count +1 |
| 5 | SELECT | Aggregate — total views + last view |
| 6 | SELECT | Top 5 most viewed posts |
| 7 | SELECT | Post count per user, top 5 |
| 8 | SELECT | Another user profile |
| 9 | SELECT | 5 posts by the other user |
| 10 | SELECT | 10 most recent post_views |
Database: 100 users, 1000 posts, growing post_views table. Fresh seed before each test.
| Workers | TrueAsync | Swoole NTS | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|---|
| 4 | 989 | 183 | 185 | 189 |
| 8 | 993 | 342 | 341 | 346 |
| 12 | 990 | 483 | 476 | 489 |
| 16 | 987 | 599 | 601 | 556 |
| Workers | TrueAsync | Swoole NTS | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|---|
| 4 | 28 ms | 5,440 ms | 5,320 ms | 5,240 ms |
| 8 | 27 ms | 2,870 ms | 2,900 ms | 2,800 ms |
| 12 | 28 ms | 2,040 ms | 2,050 ms | 1,990 ms |
| 16 | 29 ms | 1,640 ms | 1,660 ms | 1,780 ms |
| Workers | TrueAsync | Swoole NTS | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|---|
| 4 | 60 ms | 5,630 ms | 5,520 ms | 5,390 ms |
| 8 | 70 ms | 3,110 ms | 2,990 ms | 3,190 ms |
| 12 | 76 ms | 2,240 ms | 2,220 ms | 2,160 ms |
| 16 | 79 ms | 1,900 ms | 1,790 ms | 1,910 ms |
| Workers | TrueAsync | Swoole NTS | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|---|
| 4 | 0.5% | 78% | 78% | 78% |
| 8 | 0.4% | 63% | 63% | 62% |
| 12 | 0.8% | 48% | 49% | 48% |
| 16 | 1.1% | 36% | 36% | 41% |
| Workers | TrueAsync | Swoole NTS | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|---|
| 4 | 147 MB | 481 MB | 512 MB | 357 MB |
| 8 | 248 MB | 555 MB | 604 MB | 366 MB |
| 12 | 271 MB | 633 MB | 687 MB | 388 MB |
| 16 | 326 MB | 762 MB | 765 MB | 421 MB |
| Workers | TrueAsync | Swoole NTS | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|---|
| 4 | 206 MB | 500 MB | 510 MB | 386 MB |
| 8 | 320 MB | 595 MB | 605 MB | 425 MB |
| 12 | 327 MB | 687 MB | 687 MB | 427 MB |
| 16 | 344 MB | 766 MB | 764 MB | 370 MB |
| TrueAsync | Swoole NTS | Swoole ZTS | FrankenPHP Octane | |
|---|---|---|---|---|
| MB/worker | ~15 | ~23 | ~21 | ~5 |
TrueAsync holds ~990 req/s (the k6 ceiling) with 4, 8, 12, or 16 workers. Median latency is stable at ~28 ms. Adding workers provides no benefit because the bottleneck is PostgreSQL throughput, not PHP concurrency.
With buffer=50, each worker runs up to 50 coroutines concurrently. At 4 workers that's 200 effective connections — more than enough to keep PostgreSQL saturated. The coroutine yields during every PDO::query() call, allowing other coroutines to run while waiting for the database response.
| Workers | NTS req/s | ZTS req/s | NTS mem | ZTS mem |
|---|---|---|---|---|
| 4 | 183 | 185 | 481 MB | 512 MB |
| 8 | 342 | 341 | 555 MB | 604 MB |
| 12 | 483 | 476 | 633 MB | 687 MB |
| 16 | 599 | 601 | 762 MB | 765 MB |
Threads give zero throughput improvement over processes.
This is expected because Swoole's thread mode (6.2) still runs each worker as a blocking event loop. Switching from fork() to pthread_create() changes the isolation boundary but not the concurrency model. Each worker — whether a process or a thread — still executes one request at a time, blocking on every SQL query.
Why ZTS uses slightly more memory (+5-10%):
- The ZTS (Zend Thread Safety) allocator wraps every global in a thread-local storage (TLS) lookup via
TSRMLSmacros. This adds per-thread overhead to every Zend engine structure. - Thread stacks are allocated from the same address space (no copy-on-write benefit that
fork()gets from shared read-only pages like the OPcache SHM segment). - Swoole's thread runtime allocates additional synchronization structures (mutexes, thread-local arena pools) that the process model doesn't need.
Why threads don't help throughput:
The bottleneck is not process creation overhead or context switching — it's I/O wait. Each worker spends ~95% of its time blocked in PDO::query() → poll()/epoll_wait() waiting for PostgreSQL. Threads and processes block identically on a file descriptor. The kernel scheduler treats both the same way. The only way to reclaim that idle time is to yield to another coroutine (what TrueAsync does) or to add more workers.
During testing we discovered that Swoole was running without OPcache in all initial tests. Swoole uses the cli SAPI, and opcache.enable_cli defaults to Off. TrueAsync and FrankenPHP Octane use the frankenphp SAPI where opcache.enable=On takes effect.
After enabling opcache.enable_cli=1 for Swoole, we re-ran all tests with full 3-phase warmup (10 sequential + 50 conn/10s + 200 conn/10s):
| Workers | Without OPcache req/s | With OPcache req/s | Change | Without mem | With mem |
|---|---|---|---|---|---|
| 4 | 185 | 162 | -12% | 512 MB | 471 MB |
| 8 | 341 | 266 | -22% | 604 MB | 558 MB |
| 12 | 476 | 397 | -17% | 687 MB | 643 MB |
| 16 | 601 | 514 | -14% | 765 MB | 719 MB |
OPcache reduced memory by ~5-8% but decreased throughput by 12-22%.
This counterintuitive result is confirmed with full warmup (OPcache reports 62-65 cached scripts, 9.6 MB used). The throughput loss is real, not a warmup artifact.
Why OPcache hurts Swoole:
- In a long-running worker (Octane), PHP scripts are compiled once at boot and the compiled bytecode stays in the worker's memory for the entire process lifetime. OPcache is designed for short-lived request-per-process models (PHP-FPM) where it prevents recompilation on every request. In Octane, there is no recompilation to prevent.
- With OPcache enabled, every
include/requiremust first check the SHM cache (hash lookup + lock acquisition). In ZTS mode, this requires thread-safe access to shared memory — additional mutex contention on every opcode fetch. - The 9.6 MB SHM segment adds memory-mapped pages that the kernel must manage, while providing no benefit since the bytecode is already resident in worker memory.
- PHP userland memory stayed at 20.6 MB regardless — OPcache doesn't reduce the heap used by Laravel's runtime objects (service container, config arrays, route table).
Conclusion: OPcache is counterproductive for Swoole in long-running worker mode. All Swoole numbers in the main comparison tables use the faster configuration (without OPcache, opcache.enable_cli=Off).
FrankenPHP via Octane and Swoole produce nearly identical throughput. Both are blocking servers with the same fundamental constraint: 1 request per worker at a time.
FrankenPHP Octane uses less memory because:
- FrankenPHP embeds PHP in a Go process; Go's runtime is more memory-efficient for the HTTP/scheduling layer
- No separate per-worker PHP process — workers run as goroutine-dispatched PHP threads within the single FrankenPHP binary
But this memory advantage doesn't translate to throughput because the PHP execution within each worker is still fully blocking.
All three blocking servers (Swoole NTS, Swoole ZTS, FrankenPHP Octane) scale linearly at approximately ~40-45 req/s per worker:
| Workers | Avg blocking req/s | Workers needed for 990 req/s |
|---|---|---|
| 4 | ~185 (46/worker) | 22 |
| 8 | ~343 (43/worker) | 23 |
| 12 | ~483 (40/worker) | 25 |
| 16 | ~585 (37/worker) | 27 |
Per-worker throughput slightly decreases at higher counts due to PostgreSQL contention and CPU scheduling overhead.
TrueAsync achieves with 4 workers what blocking servers need ~25 workers to match — while using 2-3x less memory.
For a single /bench request (10 SQL queries):
| Phase | TrueAsync (4w) | Swoole (4w) |
|---|---|---|
| PHP execution | ~5 ms | ~5 ms |
| SQL I/O wait (10 queries) | ~23 ms | ~23 ms |
| Queue wait | ~0 ms | ~5,400 ms |
| Total | ~28 ms | ~5,440 ms |
The PHP and SQL times are identical. The entire difference is queue wait — at 1000 req/s with 4 blocking workers, requests back up in the accept queue. TrueAsync has no queue because coroutines handle requests immediately.
Measured via memory_get_usage() / memory_get_usage(true) inside the /bench endpoint.
These values reflect one worker's PHP heap — the function reports memory for the current thread/process.
| Workers | TrueAsync | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|
| 4 | 2.5 MB | 22.2 MB | 5.6 MB |
| 8 | 1.7 MB | 22.2 MB | 4.7 MB |
| 16 | 2.2 MB | 22.2 MB | 4.8 MB |
| Workers | TrueAsync | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|
| 4 | 6 MB | 24 MB | 8 MB |
| 8 | 6 MB | 24 MB | 8 MB |
| 16 | 6 MB | 24 MB | 8 MB |
| Workers | TrueAsync | Swoole ZTS | FrankenPHP Octane |
|---|---|---|---|
| 4 | 277 MB | 508 MB | 401 MB |
| 8 | 286 MB | 600 MB | 417 MB |
| 16 | 308 MB | 765 MB | 403 MB |
Each Swoole worker — whether process or thread — bootstraps its own full copy of the Laravel application: service container, configuration, router, facades, middleware stack, database manager. This is by design: Octane calls $app->boot() in each worker independently, and the entire application state lives in worker-local PHP memory.
22 MB × 16 workers = 352 MB of PHP heaps alone, plus Swoole runtime, event loop, and OS overhead → 765 MB total.
TrueAsync coroutines share the Laravel bootstrap within the same worker thread. The application is booted once per worker. Each incoming request creates a lightweight coroutine that holds only request-scoped variables (route parameters, query results, response buffer). The coroutine's C-stack is 2 MB, but the PHP heap contribution is minimal because Laravel's service container, config, and router are shared.
4 workers × ~35 MB (Zend Engine + shared Laravel) + coroutine overhead → 277-308 MB total.
FrankenPHP embeds PHP inside a Go process. The Go runtime handles HTTP and worker dispatch efficiently. PHP workers show only 5-8 MB in userland because FrankenPHP's worker model reuses more of the shared memory. However, it's still blocking — each worker handles one request at a time.
| TrueAsync | Swoole NTS | Swoole ZTS | FrankenPHP Octane | |
|---|---|---|---|---|
| Peak req/s (16w) | 987 | 599 | 601 | 556 |
| req/s at 4 workers | 989 | 183 | 185 | 189 |
| P50 at 4 workers | 28 ms | 5,440 ms | 5,320 ms | 5,240 ms |
| Memory at 4w (load) | 206 MB | 500 MB | 510 MB | 386 MB |
| Memory at 16w (load) | 344 MB | 766 MB | 764 MB | 370 MB |
| Workers to reach 990 req/s | 4 | ~25 | ~25 | ~25 |
| Error rate | 0% | 0% | 0% | 0% |
- PHP versions differ: 8.6-dev (TrueAsync) vs 8.5.4 (Swoole/FrankenPHP) — inherent to TrueAsync being a PHP fork
- OPcache configured identically on TrueAsync and Swoole; default on Octane FrankenPHP
- No CPU/memory limits on containers
- PostgreSQL not tuned beyond
max_connections=500 - 0% error rate across all 16 tests
- Swoole ZTS uses
SWOOLE_THREADmode (confirmed:PHP_ZTS=1,SWOOLE_THREAD=1)


