Timeouts / Segfaults or Endless run on Kubernetes pods #204

**Describe the bug**
We are trying to run Opengrep with the following CLI settings in a Kubernetes pod on our build agents, as a pipeline step in Buildkite:

```shell
opengrep scan --error --metrics=off --sarif-output=ogrep.sarif --config "rulesdir/community" --jobs=4 --max-memory=2000 --timeout=3 --timeout-threshold=3 path/to/repository
```

The repository to be scanned is a monorepo.

The rules are downloaded via semgrep-rules-manager into a cached folder.

Depending on the run, one of three things happens:

- Cancelled by asyncio

```shell
Traceback (most recent call last):
  File "asyncio/subprocess.py", line 135, in wait
  File "asyncio/base_subprocess.py", line 235, in _wait
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "asyncio/tasks.py", line 490, in wait_for
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "semgrep/commands/wrapper.py", line 37, in wrapper
  File "semgrep/commands/scan.py", line 811, in scan
  File "semgrep/tracing.py", line 263, in inner
  File "semgrep/run_scan.py", line 789, in run_scan
  File "semgrep/tracing.py", line 263, in inner
  File "semgrep/run_scan.py", line 338, in run_rules
  File "semgrep/core_runner.py", line 1169, in invoke_semgrep_core
  File "semgrep/core_runner.py", line 1142, in _run_rules_direct_to_semgrep_core
  File "semgrep/core_runner.py", line 1107, in _run_rules_direct_to_semgrep_core
  File "semgrep/core_runner.py", line 1025, in _run_rules_direct_to_semgrep_core_helper
  File "semgrep/tracing.py", line 263, in inner
  File "semgrep/core_runner.py", line 497, in execute
  File "asyncio/runners.py", line 44, in run
  File "asyncio/base_events.py", line 647, in run_until_complete
  File "semgrep/core_runner.py", line 450, in _stream_exec_subprocess
  File "asyncio/tasks.py", line 492, in wait_for
asyncio.exceptions.TimeoutError
``` 

- Crashing with a segfault

`/path/to/my/calling/script.sh: line 104: 2299804 Segmentation fault      (core dumped)`

- Running seemingly endlessly, for hours or up to 2 days, at which point our infrastructure kills the agents for running too long
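
For what it's worth, the first traceback has the shape that `asyncio.wait_for` produces when a deadline expires while awaiting a subprocess: the inner `wait()` is cancelled and a `TimeoutError` is raised from it. The following is a minimal sketch of that pattern, not Opengrep's actual code; `run_with_deadline` and the sleeping child command are hypothetical names made up for illustration.

```python
import asyncio
import sys
from typing import Optional, Sequence


async def run_with_deadline(cmd: Sequence[str], deadline_s: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if the deadline expired first."""
    proc = await asyncio.create_subprocess_exec(*cmd)
    try:
        # wait_for cancels proc.wait() when the deadline expires and raises
        # asyncio.TimeoutError, chained from the inner CancelledError.
        return await asyncio.wait_for(proc.wait(), timeout=deadline_s)
    except asyncio.TimeoutError:
        # Kill and reap the child so it does not outlive the wrapper.
        proc.kill()
        await proc.wait()
        return None


if __name__ == "__main__":
    # Hypothetical stand-in for a long-running scanner process.
    slow_child = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(asyncio.run(run_with_deadline(slow_child, deadline_s=0.5)))
```

If the wrapper works roughly like this, the traceback would only tell us that the core process did not finish before some internal deadline, not why it was slow.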

With the exact same CLI parameters, codebase, and tool version on an Apple M3 Pro, the scan completes in around 30 minutes.

**To Reproduce**
Unfortunately, this is very hard to do since the repository is private.

The Kubernetes pods have no resource requests or limits.

I assume we are not the only ones trying to run Semgrep / Opengrep in K8s, but I was not able to find similar reports.

The `--debug` and `--verbose` options do not help identify the root cause of the issue.

The repository looks like this when running a scan:

<img width="442" alt="Image" src="https://github.com/user-attachments/assets/468dc620-3cbf-413e-8698-206d29e8f8ed" />

We already followed the suggestions in https://semgrep.dev/docs/kb/semgrep-code/semgrep-scan-troubleshooting#memory-usage-issues-oom-errors

Running with `--jobs=1` seems to reduce how often the crashes occur, but it guarantees that we end up in an "endless" scan.

**Expected behavior**
I would expect the tool to finish in a reasonable time (at most a couple of hours), similar to what happens on my local machine, and definitely not 17 to 18 hours.

**What is the priority of the bug to you?**

- [X] P0: blocking your adoption of Opengrep or workflow
- [ ] P1: important to fix or quite annoying
- [ ] P2: regular bug that should get fixed

**Environment**
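- Opengrep 1.0.2
- Python 3.11.2
- `ulimit`: unlimited
- `cat /sys/fs/cgroup/cpuset.cpus.effective`: `0-15`
- `free` (values in KiB, the default unit):

| | total | used | free | shared | buff/cache | available |
|---|---|---|---|---|---|---|
| Mem: | 65843208 | 45054644 | 8008620 | 29752 | 13561676 | 20788564 |
| Swap: | 0 | 0 | 0 | | | |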
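
To put these numbers in context: `0-15` in `cpuset.cpus.effective` is a range list covering 16 CPUs, and `free` reports KiB by default. Below is a small sketch of the arithmetic. It assumes, from my reading of the `--max-memory` help text and not verified against the Opengrep source, that the limit is in MiB and applies to each job separately; `count_cpus` and `kib_to_gib` are helper names I made up.

```python
def count_cpus(cpuset: str) -> int:
    """Count CPUs in a cgroup cpuset list such as "0-15" or "0-3,8,10-11"."""
    total = 0
    for part in cpuset.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            total += int(high) - int(low) + 1
        else:
            total += 1
    return total


def kib_to_gib(kib: int) -> float:
    """Convert KiB (the default unit of `free`) to GiB."""
    return kib / (1024 * 1024)


cpus = count_cpus("0-15")
available_gib = kib_to_gib(20788564)
# Assumption: --max-memory is in MiB and applies to each of the --jobs workers.
worst_case_gib = 4 * 2000 / 1024

print(f"CPUs visible to the pod: {cpus}")
print(f"Memory available: {available_gib:.1f} GiB")
print(f"Worst case for 4 jobs at 2000 MiB each: {worst_case_gib:.1f} GiB")
```

By this rough estimate the configured limits should fit within the memory that `free` reports as available, which is part of why the behaviour surprises us.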

**Use case**
Solving this issue will enable us to run daily or weekly jobs on our entire codebase, so that we get a full report and can track improvements in the quality of our source code.

Thanks a lot for the support. Let me know if there is anything else I can provide, such as additional logs, or if there are different CLI parameters I could try to improve the situation.

