haproxy icon indicating copy to clipboard operation
haproxy copied to clipboard

smtpchk option does not send a QUIT resulting in thousands of Remote(SocketError) log messages per day on Exchange servers

Open wrightlaw opened this issue 3 years ago • 10 comments

Detailed Description of the Problem

When using smtpchk for SMTP service checks, HAProxy does not send a QUIT at the end of the transaction and simply drops the TCP connection. This results in Microsoft Exchange (in our case) recording thousands of Remote(SocketError) log messages per day as it sees each HAProxy service check as a failed connection.

Expected Behavior

HAProxy should cleanly terminate the transaction with a QUIT after the test.

Steps to Reproduce the Behavior

Configure an SMTP backend with smtpchk service checking and point to a Microsoft Exchange server Monitor Exchange SMTP Receive Log for Remote(SocketError) messages

Do you have any idea what may have caused this?

No code exists to send a QUIT command to SMTP target

Do you have an idea how to solve the issue?

Believe src/tcpcheck.c needs updating with appropriate code to send a QUIT before closing the TCP connection

What is your configuration?

global
    maxconn         1048576
    stats socket    /var/run/haproxy.stat mode 600 level admin
    log             127.0.0.1:514 local2
    chroot          /var/empty
    pidfile         /var/run/haproxy.pid
    user            haproxy
    group           haproxy
    stats socket    /var/lib/haproxy/stats

resolvers dns
    parse-resolv-conf
    timeout retry   10s
    timeout resolve 10s
    hold valid 10s
    hold nx 10s
    hold other 10s
    hold obsolete 10s
    accepted_payload_size 8192

defaults
    mode            tcp
    log             global
    maxconn         256000
    timeout connect 5000ms
    timeout client  5m
    timeout server  5m
    default-server on-marked-down shutdown-sessions init-addr last,none resolve-opts allow-dup-ip
    option tcplog

backend exch-smtp-25-be
description Service ID XX exch-smtp backend
mode tcp
option smtpchk HELO xxxxxx.xxxxxxx.xxxxxxx.xxxxxx
balance leastconn
server xxxxxx.xxxxxxx.xxxxxxx.xxxxxx_25 xxxxxx.xxxxxxx.xxxxxxx.xxxxxx:25 check resolvers dns fall 6
server xxxxxx.xxxxxxx.xxxxxxx.xxxxxx_25 xxxxxx.xxxxxxx.xxxxxxx.xxxxxx:25 check resolvers dns fall 6
server xxxxxx.xxxxxxx.xxxxxxx.xxxxxx_25 xxxxxx.xxxxxxx.xxxxxxx.xxxxxx:25 check resolvers dns fall 6
server xxxxxx.xxxxxxx.xxxxxxx.xxxxxx_25 xxxxxx.xxxxxxx.xxxxxxx.xxxxxx:25 check resolvers dns fall 6
server xxxxxx.xxxxxxx.xxxxxxx.xxxxxx_25 xxxxxx.xxxxxxx.xxxxxxx.xxxxxx:25 check resolvers dns fall 6
server xxxxxx.xxxxxxx.xxxxxxx.xxxxxx_25 xxxxxx.xxxxxxx.xxxxxxx.xxxxxx:25 check resolvers dns fall 6

Output of haproxy -vv

HAProxy version 2.4.2-553dee3 2021/07/07 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2026.
Known bugs: http://www.haproxy.org/bugs/bugs-2.4.2.html
Running on: Linux 3.10.0-1160.71.1.el7.x86_64 #1 SMP Tue Jun 28 15:37:28 UTC 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_THREAD=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1
  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE +PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS -OT -QUIC -PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=2).
Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with network namespace support.
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 4.8.5 20150623 (Red Hat 4.8.5-44)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2       flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
            fcgi : mode=HTTP       side=BE        mux=FCGI     flags=HTX|HOL_RISK|NO_UPG
       <default> : mode=HTTP       side=FE|BE     mux=H1       flags=HTX
              h1 : mode=HTTP       side=FE|BE     mux=H1       flags=HTX|NO_UPG
       <default> : mode=TCP        side=FE|BE     mux=PASS     flags=
            none : mode=TCP        side=FE|BE     mux=PASS     flags=NO_UPG

Available services : none

Available filters :
        [SPOE] spoe
        [CACHE] cache
        [FCGI] fcgi-app
        [COMP] compression
        [TRACE] trace

Last Outputs and Backtraces

No response

Additional Information

No response

wrightlaw avatar Aug 03 '22 19:08 wrightlaw

Yep there is no "QUIT" in the check https://github.com/haproxy/haproxy/blob/master/src/tcpcheck.c#L4234

You can create your own smtpchk with a explicit QUIT.

Something like this, untested.

tcp-check connect default linger
tcp-check expect rstring "^[0-9]{3}[ \r]" min-recv 4
# the smtp check string
tcp-check send "EHLO <YOUR_SMTP_HELLO>\r\n"
tcp-check expect rstring  "^2[0-9]{2}[- \r]" min-recv 4
tcp-check send "QUIT\r\n"

Please check the docs for the explanation of tcp-check syntax https://docs.haproxy.org/2.4/configuration.html#4-tcp-check%20connect

git001 avatar Aug 03 '22 21:08 git001

Yep there is no "QUIT" in the check https://github.com/haproxy/haproxy/blob/master/src/tcpcheck.c#L4234

You can create your own smtpchk with a explicit QUIT. [..]

Many thanks for these lines, these will be a great workaround (subject to testing, obvs!).

However, given that "smtpchk" is designed to be a quick "one stop shop" method for generating SMTP checks without lots of tcp-check lines, I believe there is a valid justification for a QUIT being added to the tcpcheck.c code so that users can get clean behaviour by default.

wrightlaw avatar Aug 04 '22 07:08 wrightlaw

However, given that "smtpchk" is designed to be a quick "one stop shop" method for generating SMTP checks without lots of tcp-check lines, I believe there is a valid justification for a QUIT being added to the tcpcheck.c code so that users can get clean behaviour by default.

Agree and it's a nice option to get a first patch from your site to fix the issue :smile:

git001 avatar Aug 04 '22 07:08 git001

However, given that "smtpchk" is designed to be a quick "one stop shop" method for generating SMTP checks without lots of tcp-check lines, I believe there is a valid justification for a QUIT being added to the tcpcheck.c code so that users can get clean behaviour by default.

Agree and it's a nice option to get a first patch from your site to fix the issue 😄

Lol was waiting for that ;) Seriously though, I'll have a go and prepare to be mocked for my efforts ;)

wrightlaw avatar Aug 04 '22 08:08 wrightlaw

The reason for this to be missing is that prior to the health-checks rewrite based on TCP check, it was not possible to perform multiple exchanges, there was only one resquest and one response thus we couldn't send the QUIT command. The checks were then converted as-is and the ability to send QUIT was totally overlooked!

So please give Alex's proposal a try, and feel free to suggest a patch if that works for you. Don't worry you won't be mocked at all, quite the opposite. When you're about to produce a patch, please have a look at the CONTRIBUTING file which explains how to properly format a commit message. Thanks!

wtarreau avatar Aug 07 '22 15:08 wtarreau

Note, I'm marking this one as "feedback required" since we're waiting for your test (and possibly patch) but I do definitely think that it should be improved if that works for you!

wtarreau avatar Aug 07 '22 15:08 wtarreau

Quick update, I haven't forgotten about this - am just going through my organisation's process to obtain permission to contribute. Please bear with me :)

wrightlaw avatar Sep 02 '22 14:09 wrightlaw

Great, thanks for the update!

wtarreau avatar Sep 02 '22 17:09 wtarreau

Have just submitted a patch to the mailing list, title includes the text "Issue 1812" which'll hopefully make it relatively easy to track down :)

wrightlaw avatar Sep 21 '22 09:09 wrightlaw

For reference: https://www.mail-archive.com/[email protected]/msg42745.html

TimWolla avatar Sep 21 '22 10:09 TimWolla

Thanks ! now merged

capflam avatar Sep 22 '22 13:09 capflam

The fix was backported as far as 2.2. Thanks !

capflam avatar Oct 25 '22 17:10 capflam