Rocket.Chat vs. NodeJS 8.11.1 (or rather > 8.9.4): Random SEGV (segmentation violation) #10331
Description:
The SEGV occurs when running Rocket.Chat 0.61.2 or 0.63.0 on NodeJS 8.11.1. Neither version exhibits this behaviour on NodeJS 8.9.4. NodeJS 8.11.1 and Rocket.Chat also ran for a while on my testing instance without crashing. This leads me to suspect NodeJS 8.11.1 in combination with:
- either the load of the Rocket.Chat server
- or the data in MongoDB
Server Setup Information:
- Version of Rocket.Chat Server: 0.63.0 & 0.61.2 (this may affect other versions)
- Operating System: Oracle Linux 7
- Deployment Method(snap/docker/tar/etc): tar
- Number of Running Instances: 1
- DB Replicaset Oplog: -
- Node Version: 8.11.1
- mongoDB Version: 2.6.12
Steps to Reproduce:
I can only guess here:
- run a decently sized Rocket.Chat server (in terms of number of users) on NodeJS 8.11.1
Expected behavior:
No SEGV
Actual behavior:
SEGV at random intervals, followed by a restart of Rocket.Chat (due to the systemd unit definition)
Relevant logs:
strace of the NodeJS process is available, but I will only share it as a last resort with one of the Rocket.Chat developers, as it possibly contains private/sensitive information.
Last lines in strace before SEGV:
read(12, "\27\3\3\0\265\252\244\276\253\262\345\32\335\230b\255\311H\331p\2200\10\245\222.\26\313\2035\210\327"..., 16384) = 9462
rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++
auditd-log of failing node processes:
ANOM_ABEND: Triggered when a process ends abnormally (with a signal that could cause a core dump, if enabled).
root@chat01 [/var/log] # ausearch --comm node
----
time->Wed Apr 4 14:35:17 2018
type=ANOM_ABEND msg=audit(1522845317.790:75): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=1043 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:47:00 2018
type=ANOM_ABEND msg=audit(1522846020.846:1164): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=4595 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:49:03 2018
type=ANOM_ABEND msg=audit(1522846143.096:1227): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5458 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:50:34 2018
type=ANOM_ABEND msg=audit(1522846234.113:1269): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5562 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:57:58 2018
type=ANOM_ABEND msg=audit(1522846678.970:1448): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5643 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:58:05 2018
type=ANOM_ABEND msg=audit(1522846685.473:1460): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5878 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:59:29 2018
type=ANOM_ABEND msg=audit(1522846769.954:1477): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5929 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:02:32 2018
type=ANOM_ABEND msg=audit(1522846952.269:1538): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=6007 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:23:04 2018
type=ANOM_ABEND msg=audit(1522848184.122:2496): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=9055 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:35:35 2018
type=ANOM_ABEND msg=audit(1522848935.572:3405): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=11501 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:40:33 2018
type=ANOM_ABEND msg=audit(1522849233.904:3470): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=11899 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:43:02 2018
type=ANOM_ABEND msg=audit(1522849382.823:3519): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=12040 comm="node"
/var/log/messages (note that the times match the auditd logs above to within a second):
root@chat01 [~] # grep SEGV /var/log/messages
Apr 4 14:35:17 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:47:00 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:49:03 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:50:34 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:57:59 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:58:05 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:59:29 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:02:32 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:23:04 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:35:35 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:40:33 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:43:02 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
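
To quantify how irregular the crash intervals are, the epoch timestamps inside `msg=audit(...)` can be parsed and differenced. The following is only a minimal sketch: the three sample records are abbreviated copies of the first three ANOM_ABEND entries above, the helper names (`parse_audit_time`, `crash_intervals`) are made up for illustration, and the UTC+2 offset is an assumption inferred from comparing the epoch values with the `time->` lines.

```python
import re
from datetime import datetime, timedelta, timezone

# Abbreviated copies of the first three ANOM_ABEND records shown above.
AUDIT_LINES = [
    'type=ANOM_ABEND msg=audit(1522845317.790:75): pid=1043 comm="node" sig=11',
    'type=ANOM_ABEND msg=audit(1522846020.846:1164): pid=4595 comm="node" sig=11',
    'type=ANOM_ABEND msg=audit(1522846143.096:1227): pid=5458 comm="node" sig=11',
]

# Assumption: the host runs at UTC+2, inferred from the "time->" lines.
HOST_TZ = timezone(timedelta(hours=2))

# audit(<epoch seconds>.<milliseconds>:<event serial>)
AUDIT_RE = re.compile(r"msg=audit\((\d+)\.(\d{3}):(\d+)\)")


def parse_audit_time(line, tz=HOST_TZ):
    """Return the event time of one audit record as an aware datetime."""
    match = AUDIT_RE.search(line)
    if match is None:
        raise ValueError("no audit timestamp found in: " + line)
    seconds, millis, _serial = match.groups()
    return datetime.fromtimestamp(int(seconds), tz) + timedelta(
        milliseconds=int(millis)
    )


def crash_intervals(lines):
    """Return the seconds elapsed between consecutive audit records."""
    times = [parse_audit_time(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for line in AUDIT_LINES:
        print(parse_audit_time(line).strftime("%H:%M:%S"))
    print(crash_intervals(AUDIT_LINES))
```

Feeding the complete `ausearch --comm node` output into `crash_intervals` would give the full series of gaps between crashes, which could then be compared against load or connection counts at those times.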