Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Poor RPC performance #5821

@bakhtin

Description

@bakhtin

I'm running a Westend RPC node that exhibits a poor performance serving RPC calls (both WS and HTTP).

RPC calls execution time varies greatly on whether it is first or subsequent call. The most notable example being state_getMetadata method that can take anywhere from 8ms to 18s. I recorded two profiles using perf during the execution of state_getMetadata method and rendered a flamegraph:

  1. Slow (first) RPC call
    perf-slow
    Source: https://transfer.sh/e52S3b/perf-slow.svg

  2. Fast (subsequent) RPC call
    perf-fast
    Source: https://transfer.sh/RQXIUs/perf-fast.svg

On the first flamegraph most of the CPU time was spent compiling a WASM code and only 2.39% of the time actually executing it. During the WASM compilation I observe a high CPU utilization (100% per vCPU, depending on the number of in-flight requests).

I plotted a graph for the median distribution of the RPC calls execution time over the past two days (y-axis being microseconds)
image

state_getMetadata method alone is being executed on average 1.23 times per second.
image

Each time a method execution falls into a slow RPC call bucket and requires WASM compilation, a node uses 100% of the vCPU. As there are other methods being called it significantly increases the resources usage and impacts end-user experience. It also leads to a possibility to cause a DoS by utilizing all of the available resources.

The above behavior is observed on both GCP VM instance and inside a container running in a K8s cluster
System info 1 (VM):
Debian 10
Linux kernel: 4.19.0-18-cloud-amd64
Polkadot binary v0.9.26
CLI parameters: polkadot --detailed-log-output --name westend-rpc-1 --unsafe-ws-external --rpc-methods Safe --rpc-cors * --chain westend --listen-addr=/ip4/0.0.0.0/tcp/30333 --public-addr=/ip4/<redacted>/tcp/30333 --in-peers 25 --out-peers 25 --db-cache 512 --telemetry-url wss://<redacted> --pruning=archive -lsync=trace,rpc_metrics=debug --prometheus-external --prometheus-port 9615 --ws-port 9944 --ws-max-connections 5000 --rpc-port 9933

System info 1 (container):
Google Kubernetes engine: v1.22.10-gke.600
Linux kernel 5.10.109+
Polkadot image v0.9.26
CLI parameters: polkadot --name=westend-rpc-node-0 --base-path=/data/ --chain=westend --database=rocksdb --pruning=archive --prometheus-external --prometheus-port 9615 --unsafe-rpc-external --unsafe-ws-external --rpc-methods=safe --rpc-cors=all --ws-max-connections 5000 --telemetry-url wss://<redacted> --db-cache 512 --in-peers 25 --out-peers 25 -lrpc_metrics=debug --listen-addr=/ip4/0.0.0.0/tcp/30333

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions