This repository contains PoCGen, a tool that generates proof-of-concept exploits for vulnerable npm packages. It also contains the datasets used for the evaluation.
- Clone the repository:

```sh
git clone https://github.com/sola-st/PoCGen
cd PoCGen
```

- Install dependencies (you need Node.js and npm installed):

```sh
npm install
```

You need Docker installed for the steps below.
- Option 1: Use the pre-built Docker image (recommended):

```sh
docker pull aryaze/pocgen:v1.0
docker tag aryaze/pocgen:v1.0 gen-poc_mnt
```

- Option 2: Build the Docker images (this may take a while):

```sh
docker build -t patched_node -f patched_node.Dockerfile .
docker build -t gen-poc_mnt .
```

The repository contains a wrapper script to run the tool in a Docker container.
The script requires an `.env` file in the current directory with the following content:

```
OPENAI_API_KEY=sk-proj-xxx # required for LLM calls
GITHUB_API_KEY=github_pat_xxx # required for fetching vulnerabilities from the GitHub Security Advisories database
```
The only required argument is the vulnerability ID, which can be a GitHub Advisory ID or a Snyk ID. The tool automatically fetches the vulnerability report from the corresponding API or scrapes it from the website.
Run this script from the repository root:
```sh
./run-mnt.sh output node index.js create -v GHSA-m7p2-ghfh-pjvx
```

This will create a test for GHSA-m7p2-ghfh-pjvx in `./output/GHSA-m7p2-ghfh-pjvx/test.js`.
For most vulnerabilities, it is recommended to run the test using the provided docker image:
```sh
./run-mnt.sh output node --test /output/<advisoryId>/test.js
```

For ReDoS vulnerabilities, the test should be run with the following flags:

```sh
./run-mnt.sh output node --test --enable-experimental-regexp-engine-on-excessive-backtracks --regexp-backtracks-before-fallback=30000 output/<advisoryId>/test.js
```

For vulnerabilities that involve long-running tasks (e.g., web servers), run the test with the following flags:

```sh
./run-mnt.sh output node --test --test-force-exit /output/<advisoryId>/test.js
```

The repository contains
- the source code of PoCGen in `src/`:
  - `analysis`: static and dynamic analyses used in PoCGen
  - `model`: LLM models and utilities
  - `models`: JavaScript classes
  - `npm`: utilities for dealing with npm packages
  - `pipeline`: the main pipeline for generating PoC exploits
  - `prompting`: prompt templates and prompt generation utilities
  - `resources`: CodeQL query templates and the command injection code
  - `runners`: various runners (e.g., the agent) used to generate PoC exploits
  - `utils`: general utility functions
  - `vulnerability-databases`: scripts to retrieve vulnerability data from various sources
- the datasets used for the evaluation in `dataset/`
- the scripts to summarize, aggregate, and visualize the results in `scripts/`
- some helper functions in `lib/`
We provide three levels for reproducing the results, depending on the time and monetary costs involved:
- Inspecting and visualizing the results based on logs from our runs (no LLM costs involved, and very low execution time).
- Running PoCGen on a single vulnerability (low LLM costs, and low execution time).
- Running PoCGen on the full dataset (high LLM costs, and high execution time).
To reproduce at level 1, download the evaluation results from Zenodo, and then follow the instructions labeled with "level 1" below.
To evaluate on level 2, follow the instructions in the previous sections.
To evaluate on level 3, follow the instructions in the "Setup" section, and then follow the instructions labeled with "level 3" below.
(level 3) To run PoCGen on the SecBench.js dataset, use the following command:
```sh
./run-mnt.sh output node index.js pipeline -v dataset/SecBench.js/*\.all
```

This creates a directory under `output` with the ID of each vulnerability as a subdirectory.
Each subdirectory contains the vulnerable package, an execution log file named `output_*.log` (showing the steps and execution outputs), an LLM interaction log file named `prompt.json` (showing the LLM interactions with all the metadata), a JSON file named `RunnerResult_*.json` containing all the information about the attempt, and the proof-of-concept exploit as a test file named `test.js`.
(level 3) To run Mini-SWE-agent on the SecBench.js dataset, use the following command:
```sh
./run-mnt.sh output node index.js pipeline --runner RunnerMiniSWEAgent -v dataset/SecBench.js/*\.all
```

This creates the same directory structure, with the difference that it creates a `mini_swe_workspace` subdirectory for each vulnerability and stores the PoC exploit in it as `poc.js`.
(level 1) The generated PoC exploits can be found in `eval_results/pocgen_*/<vulnerability_id>/test.js` and `eval_results/minisweagent_*/<vulnerability_id>/mini_swe_workspace/poc.js`.
(level 3) A refiner can be specified using `--refiner <refiner>`, e.g.:

```sh
./run-mnt.sh output node index.js pipeline -v dataset/SecBench.js/*\.all --refiner C0Refiner
```

The following values were used in the evaluation:

- `noTaint` for noTaint
- `C7Refiner` for noUsageSnippets
- `C6Refiner` for noFewShot
- `C3Refiner` for noDebugger
- `C2Refiner` for noErrorRefiner
(level 1) The generated PoC exploits of the ablation study can be found in `eval_results/pocgen_<refiner name>/<vulnerability_id>/test.js`.
(level 1 & 3)
For each vulnerability, the token costs are stored in the `RunnerResult_*.json` file under the `model.totalPromptTokens` and `model.totalCompletionTokens` fields, for request and response tokens respectively.
To get the average token costs for PoCGen, you can run
```sh
python scripts/count_tokens.py <output_directory>
```

`<output_directory>` can be any of the subdirectories in `eval_results` of the format `pocgen_*`.
To get the average costs for Mini-SWE-agent, you can run
```sh
python scripts/count_agent_tokens.py <output_directory>
```

`<output_directory>` can be any of the subdirectories in `eval_results` of the format `minisweagent_*`.
(level 3) To run PoCGen on vulnerabilities reported in 2025-2026, use the following command:
```sh
./run-mnt.sh output node index.js pipeline -v dataset/ghsa_2025-2026.txt
```

(level 1) The generated PoC exploits can be found in `eval_results/pocgen_2025-2026/<vulnerability_id>/test.js`.