-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Poor RPC performance #5821
Description
I'm running a Westend RPC node that exhibits a poor performance serving RPC calls (both WS and HTTP).
RPC calls execution time varies greatly on whether it is first or subsequent call. The most notable example being state_getMetadata method that can take anywhere from 8ms to 18s. I recorded two profiles using perf during the execution of state_getMetadata method and rendered a flamegraph:
-
Slow (first) RPC call
Source: https://transfer.sh/e52S3b/perf-slow.svg -
Fast (subsequent) RPC call
Source: https://transfer.sh/RQXIUs/perf-fast.svg
On the first flamegraph most of the CPU time was spent compiling a WASM code and only 2.39% of the time actually executing it. During the WASM compilation I observe a high CPU utilization (100% per vCPU, depending on the number of in-flight requests).
I plotted a graph for the median distribution of the RPC calls execution time over the past two days (y-axis being microseconds)

state_getMetadata method alone is being executed on average 1.23 times per second.

Each time a method execution falls into a slow RPC call bucket and requires WASM compilation, a node uses 100% of the vCPU. As there are other methods being called it significantly increases the resources usage and impacts end-user experience. It also leads to a possibility to cause a DoS by utilizing all of the available resources.
The above behavior is observed on both GCP VM instance and inside a container running in a K8s cluster
System info 1 (VM):
Debian 10
Linux kernel: 4.19.0-18-cloud-amd64
Polkadot binary v0.9.26
CLI parameters: polkadot --detailed-log-output --name westend-rpc-1 --unsafe-ws-external --rpc-methods Safe --rpc-cors * --chain westend --listen-addr=/ip4/0.0.0.0/tcp/30333 --public-addr=/ip4/<redacted>/tcp/30333 --in-peers 25 --out-peers 25 --db-cache 512 --telemetry-url wss://<redacted> --pruning=archive -lsync=trace,rpc_metrics=debug --prometheus-external --prometheus-port 9615 --ws-port 9944 --ws-max-connections 5000 --rpc-port 9933
System info 1 (container):
Google Kubernetes engine: v1.22.10-gke.600
Linux kernel 5.10.109+
Polkadot image v0.9.26
CLI parameters: polkadot --name=westend-rpc-node-0 --base-path=/data/ --chain=westend --database=rocksdb --pruning=archive --prometheus-external --prometheus-port 9615 --unsafe-rpc-external --unsafe-ws-external --rpc-methods=safe --rpc-cors=all --ws-max-connections 5000 --telemetry-url wss://<redacted> --db-cache 512 --in-peers 25 --out-peers 25 -lrpc_metrics=debug --listen-addr=/ip4/0.0.0.0/tcp/30333