otel: fixes and perf adjustments#4827
Conversation
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
to control how often we call batch cbs. Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
3c45564 to
c56a0a2
Compare
| response_status = ::grpc::Status(::grpc::StatusCode::UNAVAILABLE, "Server is unavailable"); | ||
| break; | ||
| } | ||
| worker.post(msg); |
There was a problem hiding this comment.
What exactly will happen to the clients if we block and don't respond within a reasonable time?
Previously, we answered UNAVAILABLE, which seemed more native to the gRPC/HTTP environment.
Also, what was the trigger of this change?
There was a problem hiding this comment.
If a request timeout is not set on the client side, they will wait indefinitely.
If timeout is set, the client will get a DEADLINE_EXCEEDED result, which should be handled as a temporary error (we do so in the otel dest).
We have seen that syslog-ng "gives up" too quickly if it cannot forward the message. If we have multiple otel source workers, and they receive requests at the same exact time, the destination, which grabs those messages might not be able to grab them quickly enough. So even if we used a devnull destination, which can process 2 million messages per sec, sometimes it can put backpressure on the otel source, if there is a high transient load coming from it. We can raise log-iw-size() to buffer more messages and flatten these transients, but it raises the memory footprint considerably.
We could have a timeout for this, like trying blocking post, but surrendering after x seconds and returning unavailable. But I believe the current implementation is completely functional, too, at least on the server side. It would be nice to have options for request timeout on the client side (otel dest).
There was a problem hiding this comment.
Sounds reasonable. My only concern is that the original otel-source implementation was not necessarily a "thread-per-client" implementation. By blocking, I think we make it so.
There was a problem hiding this comment.
True.
However, concurrent-requests() makes us able to scale to more clients than workers(), which should be sufficient. With it the workers() option detaches from limiting the number of clients (gRPC will handle them for us if there are enough ServiceCalls registered), and just configures the processing parallelization.
This Pull Request introduces config grammar changessyslog-ng/be526b484d8913b98edbe043e0d5a783d6d0e7d7 -> alltilla/otel-perf-adjustment Details--- a/destination
+++ b/destination
bigquery(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
)
loki(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
)
opentelemetry(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
)
syslog-ng-otlp(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
)
--- a/source
+++ b/source
opentelemetry(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
+ concurrent-requests(<positive-integer>)
+ log-fetch-limit(<nonnegative-integer>)
)
syslog-ng-otlp(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
+ concurrent-requests(<positive-integer>)
+ log-fetch-limit(<nonnegative-integer>)
)
|
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
c56a0a2 to
bdb6990
Compare
|
Pushed style fixes. |
Signed-off-by: Attila Szakacs <[email protected]>
Signed-off-by: Attila Szakacs <[email protected]>
No description provided.