UDP issues - Tons of RcvbufErrors and InErrors (same value for each) #4086
Description
Alert: 1m ipv4 udp receive buffer errors | 23650 errors
I have a cluster of three servers (Elasticsearch, Logstash, Kibana) receiving netflow/sflow/ipfix data. Each server runs nginx, load balancing a large volume of UDP data among the three servers.
Example:
A router will send netflow to one of the servers on UDP port 9995, and nginx (using the stream module) will distribute it among all of the servers on udp/2055. Logstash is configured to listen on UDP port 2055.
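The fan-out described above would look roughly like the following in nginx. This is a minimal sketch, not the reporter's actual configuration; the upstream addresses are hypothetical:

```nginx
# Hypothetical sketch of the described setup: routers send netflow to
# udp/9995 and nginx relays it to the logstash listeners on udp/2055
# on all three nodes. Addresses are illustrative, not from the report.
stream {
    upstream netflow_backends {
        server 10.0.0.1:2055;
        server 10.0.0.2:2055;
        server 10.0.0.3:2055;
    }

    server {
        listen 9995 udp;
        proxy_pass netflow_backends;
        proxy_responses 0;  # netflow is one-way; don't wait for replies
    }
}
```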
This all works, and without netdata one would assume it was working perfectly, but I'm seeing the following issue:
I know this is not a Linux tuning help forum, but I've been researching this for the better part of six hours now and am not making any progress whatsoever. I've tried tuning things with sysctl, with absolutely no effect: the same graph pattern continues relentlessly, and the RcvbufErrors and InErrors peak at about 700 events per second. Occasionally I'll see a spike or a dip while making changes, but the same pattern always prevails with the same peak values.
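For reference, the RcvbufErrors and InErrors counters that netdata graphs come from the Udp line of /proc/net/snmp, which is a header row followed by a value row. A quick way to pair names with values is sketched below; the sample is hypothetical (a subset of the real header fields), with the two counters set to the alert's value purely for illustration:

```shell
# On a live box: awk '/^Udp:/' /proc/net/snmp
# Hypothetical sample (subset of fields); 23650 matches the alert value.
sample='Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 120000 3 23650 90000 23650 0'

# Pair each header name with its counter value
printf '%s\n' "$sample" |
  awk 'NR==1 {for (i=2; i<=NF; i++) h[i]=$i}
       NR==2 {for (i=2; i<=NF; i++) print h[i], $i}'
```

When RcvbufErrors and InErrors climb in lockstep like this, the drops are happening because the receiving socket's buffer is full, which is why the rmem-related sysctls are the usual first suspects.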
The values I've tried increasing with sysctl and their current values are:
net.core.rmem_default = 8388608
net.core.rmem_max = 33554432
net.core.wmem_default = 52428800
net.core.wmem_max = 134217728
net.ipv4.udp_early_demux = 0 (was 1)
net.ipv4.udp_mem = 764304 1019072 1528608
net.ipv4.udp_rmem_min = 18192
net.ipv4.udp_wmem_min = 8192
net.core.netdev_budget = 10000
net.core.netdev_max_backlog = 2000
Note that I'm also getting the 10min netdev budget ran outs | 5929 events alert as well, though this is less of a concern; that's why I've increased net.core.netdev_budget and net.core.netdev_max_backlog.
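The "netdev budget ran outs" alert counts time_squeeze events, which appear as the third column of /proc/net/softnet_stat (one row per CPU, hex counters). Summing that column over time shows whether raising net.core.netdev_budget actually helped. A sketch, with hypothetical sample rows:

```shell
# Sum the time_squeeze counter (column 3) across CPUs. The two rows below
# are hypothetical; on a live box read /proc/net/softnet_stat directly.
# If this total keeps climbing after raising net.core.netdev_budget, the
# softirq side is still running out of budget.
total=0
while read -r _processed _dropped squeeze _rest; do
  total=$(( total + 0x$squeeze ))
done <<'EOF'
0008e5b3 00000000 000014b2
000430a9 00000000 00000f01
EOF
echo "time_squeeze total: $total"
```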
Any ideas on how to find out what needs to change, and how to actually determine what the values should be set to, would be appreciated.
Thanks for reading.
