
Timeouts / Segfaults or Endless run on Kubernetes pods #204

@Leyart

Describe the bug
We are trying to run Opengrep with the following CLI settings in a Kubernetes pod on our build agents, as a pipeline step in Buildkite:

opengrep scan --error --metrics=off --sarif-output=ogrep.sarif --config "rulesdir/community" --jobs=4 --max-memory=2000 --timeout=3 --timeout-threshold=3 path/to/repository

The repository to be scanned is a monorepo.

The rules are downloaded via semgrep-rules-manager into a cached folder.

Depending on the run, one of three things happens:

  • Cancelled by asyncio
Traceback (most recent call last):
  File "asyncio/subprocess.py", line 135, in wait
  File "asyncio/base_subprocess.py", line 235, in _wait
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "asyncio/tasks.py", line 490, in wait_for
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "semgrep/commands/wrapper.py", line 37, in wrapper
  File "semgrep/commands/scan.py", line 811, in scan
  File "semgrep/tracing.py", line 263, in inner
  File "semgrep/run_scan.py", line 789, in run_scan
  File "semgrep/tracing.py", line 263, in inner
  File "semgrep/run_scan.py", line 338, in run_rules
  File "semgrep/core_runner.py", line 1169, in invoke_semgrep_core
  File "semgrep/core_runner.py", line 1142, in _run_rules_direct_to_semgrep_core
  File "semgrep/core_runner.py", line 1107, in _run_rules_direct_to_semgrep_core
  File "semgrep/core_runner.py", line 1025, in _run_rules_direct_to_semgrep_core_helper
  File "semgrep/tracing.py", line 263, in inner
  File "semgrep/core_runner.py", line 497, in execute
  File "asyncio/runners.py", line 44, in run
  File "asyncio/base_events.py", line 647, in run_until_complete
  File "semgrep/core_runner.py", line 450, in _stream_exec_subprocess
  File "asyncio/tasks.py", line 492, in wait_for
asyncio.exceptions.TimeoutError
  • Crashing with a segfault
    /path/to/my/calling/script.sh: line 104: 2299804 Segmentation fault (core dumped)

  • Endless run: the scan keeps going for hours, up to 2 days, at which point our infrastructure kills the agent for running too long
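
For context on the first outcome: the last frames of the traceback show asyncio.wait_for raising TimeoutError while the CLI awaits a subprocess. Below is a minimal sketch of that general pattern, not Opengrep's actual code; wait_for_child and its parameters are hypothetical names used only for illustration.

```python
import asyncio


async def wait_for_child(cmd, timeout_seconds):
    """Start cmd and wait for it to exit, giving up after timeout_seconds."""
    proc = await asyncio.create_subprocess_exec(*cmd)
    try:
        # wait_for cancels the inner proc.wait() when the timeout expires;
        # that cancellation is the CancelledError seen at the top of the
        # traceback, and wait_for then raises TimeoutError from it.
        return await asyncio.wait_for(proc.wait(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Make sure the child does not outlive the caller.
        proc.kill()
        await proc.wait()
        raise
```

Read this way, the traceback says the Python wrapper gave up waiting for its child process. It does not say why the child was slow.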

With exactly the same CLI parameters, codebase, and tool version on an Apple M3 Pro, the scan completes in about 30 minutes.

To Reproduce
Unfortunately, this is very hard to reproduce because the repository is private.
The Kubernetes pods have no resource limits or requests.
I assume we are not the only ones trying to run semgrep / opengrep in K8s, but I was not able to find similar reports.
The --debug and --verbose options do not help identify the root cause.
The repository looks like this when running a scan:

Image

We already followed the suggestions in https://semgrep.dev/docs/kb/semgrep-code/semgrep-scan-troubleshooting#memory-usage-issues-oom-errors

Running with --jobs=1 seems to reduce how often the scan crashes, but it guarantees that we end up in an "endless" scan.

Expected behavior
I would expect the scan to finish in a reasonable time (at most a couple of hours), similar to what happens on my local machine, and definitely not after 17 to 18 hours.
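
Until the root cause is found, a pipeline step can at least fail fast instead of occupying an agent for days. The following is a minimal sketch of a wall-clock guard, assuming a POSIX agent; run_with_deadline, classify, and the 124 exit code (borrowed from the GNU timeout convention) are illustrative choices, not part of Opengrep.

```python
import os
import signal
import subprocess

TIMEOUT_EXIT_CODE = 124  # assumption: mirror the GNU timeout(1) convention


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd; return its exit code, or TIMEOUT_EXIT_CODE on deadline."""
    # A new session gives the child its own process group, so any worker
    # processes it spawns can be killed together with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return TIMEOUT_EXIT_CODE


def classify(returncode):
    """Describe how the process ended, e.g. to tell a segfault from a hang."""
    if returncode == TIMEOUT_EXIT_CODE:
        return "deadline exceeded"
    if returncode < 0:
        # Popen reports death by signal N as the negative value -N.
        return f"killed by signal {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with code {returncode}"


# Example (not executed here): a two-hour bound on the scan from this report.
# code = run_with_deadline(
#     ["opengrep", "scan", "--error", "--metrics=off",
#      "--config", "rulesdir/community", "path/to/repository"],
#     deadline_seconds=2 * 60 * 60,
# )
# print(classify(code))
```

The guard only bounds the damage. It also records which of the three outcomes occurred on a given run, which may help correlate failures with specific agents.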

What is the priority of the bug to you?

  • P0: blocking your adoption of Opengrep or workflow
  • P1: important to fix or quite annoying
  • P2: regular bug that should get fixed

Environment
Opengrep 1.0.2
Python 3.11.2
ulimit: unlimited
cat /sys/fs/cgroup/cpuset.cpus.effective: 0-15
free:
               total        used        free      shared  buff/cache   available
Mem:        65843208    45054644     8008620       29752    13561676    20788564
Swap:              0           0           0
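
The statement that the pods have no limits can be double-checked from inside the container. Here is a minimal sketch, assuming cgroup v2 mounted at /sys/fs/cgroup (which the cpuset.cpus.effective path above suggests); the helper names are hypothetical, and cpu.max and memory.max are the standard cgroup v2 interface files.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumption: cgroup v2 unified hierarchy


def count_cpus(cpuset):
    """Count the CPUs in a cpuset list such as '0-15' or '0-3,8'."""
    total = 0
    for part in cpuset.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        total += int(last or first) - int(first) + 1
    return total


def parse_cpu_max(text):
    """Return the CPU quota in cores, or None if cpu.max says 'max'."""
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)


def parse_memory_max(text):
    """Return the memory limit in bytes, or None if memory.max says 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_limits(root=CGROUP_ROOT):
    """Read the effective CPU set, CPU quota, and memory limit of this cgroup."""
    return {
        "cpus_in_cpuset": count_cpus((root / "cpuset.cpus.effective").read_text()),
        "cpu_quota_cores": parse_cpu_max((root / "cpu.max").read_text()),
        "memory_limit_bytes": parse_memory_max((root / "memory.max").read_text()),
    }
```

With the values reported above, count_cpus("0-15") gives 16. A None for the quota or the memory limit would confirm that no limit is applied at the container's cgroup level.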

Use case
Solving this issue will let us run daily or weekly jobs on our entire codebase to get a full report and track improvements in the quality of our source code.

Thanks a lot for the support. Let me know if there is anything else I can provide, such as logs, or if there are different CLI parameters I could try to improve the situation.

Labels: bug (Something isn't working)
