Snopy: Bridging Sample Denoising with Causal Graph Learning for Effective Vulnerability Detection

Prerequisites

Install the necessary dependencies before running the project:

Environment Requirements

torch==1.9.0
torchvision==0.10.0
pytorch-lightning==1.4.2
tqdm>=4.62.1
wandb==0.12.0
pytest>=6.2.4
wget>=3.2
split-folders==0.4.3
omegaconf==2.1.1
torchmetrics==0.5.0
joblib>=1.0.1
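
To confirm that the active environment matches the pins above, a small check like the following may help. It is only a sketch: the function names (`parse_requirements`, `satisfies`, `check_environment`) are illustrative and not part of this repository, the version comparison is deliberately simplified (numeric components only), and it assumes Python 3.8+ for `importlib.metadata`.

```python
import re
from importlib import metadata

# Pins copied from the Environment Requirements list above.
REQUIREMENTS = """
torch==1.9.0
torchvision==0.10.0
pytorch-lightning==1.4.2
tqdm>=4.62.1
wandb==0.12.0
pytest>=6.2.4
wget>=3.2
split-folders==0.4.3
omegaconf==2.1.1
torchmetrics==0.5.0
joblib>=1.0.1
"""


def parse_requirements(text):
    """Return (name, operator, version) tuples, one per non-empty line."""
    parsed = []
    for line in text.strip().splitlines():
        line = line.strip()
        for op in ("==", ">="):
            if op in line:
                name, version = line.split(op, 1)
                parsed.append((name.strip(), op, version.strip()))
                break
    return parsed


def version_tuple(version):
    """Turn '1.9.0+cu111' into (1, 9, 0); local tags and suffixes are dropped."""
    parts = []
    for piece in version.split("+", 1)[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Simplified comparison: pad both versions to equal length, then compare."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a == b if op == "==" else a >= b


def check_environment(requirements=REQUIREMENTS):
    """Return a list of human-readable problems; empty if everything matches."""
    problems = []
    for name, op, required in parse_requirements(requirements):
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need {op}{required})")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name}: found {installed}, need {op}{required}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script prints one line per missing or mismatched package and nothing when the environment matches.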

Third Party Libraries

Dataset

The datasets used in the paper:

FFmpeg+QEMU [1]: https://drive.google.com/file/d/1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF

Big-Vul [2]: https://drive.google.com/file/d/1-0VhnHBp9IGh90s2wCNjeCMuy70HPl8X/view?usp=sharing

DiverseVul [3]: https://drive.google.com/file/d/12IWKhmLhq7qn5B_iXgn5YerOQtkH-6RG/view
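
The three links above use slightly different Google Drive URL forms. As a convenience, the sketch below extracts the file id from each link and builds a `uc?export=download` style URL from it. The helper names are illustrative and not part of this repository, and for large archives Google Drive may show a confirmation page first, so the resulting URL is not guaranteed to return the file with a plain HTTP client.

```python
import re
from urllib.parse import urlparse

# Links copied verbatim from the Dataset section above.
DATASET_LINKS = {
    "FFmpeg+QEMU": "https://drive.google.com/file/d/1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF",
    "Big-Vul": "https://drive.google.com/file/d/1-0VhnHBp9IGh90s2wCNjeCMuy70HPl8X/view?usp=sharing",
    "DiverseVul": "https://drive.google.com/file/d/12IWKhmLhq7qn5B_iXgn5YerOQtkH-6RG/view",
}


def drive_file_id(url):
    """Extract the file id from a 'drive.google.com/file/d/<id>' style link."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", urlparse(url).path)
    if match is None:
        raise ValueError(f"not a Google Drive file link: {url}")
    return match.group(1)


def direct_download_url(url):
    """Build a direct-download style URL (may still hit a confirmation page)."""
    return f"https://drive.google.com/uc?export=download&id={drive_file_id(url)}"


if __name__ == "__main__":
    for name, link in DATASET_LINKS.items():
        print(name, direct_download_url(link))
```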

Getting Started

This section gives the steps, explanations and examples for getting the project running.

1) Clone this repo

$ git clone https://github.com/SnopyArtifact/Snopy.git

2) Install the prerequisites listed under Environment Requirements

3) Run `preprocess/dataset_process/extract_vulContext.py` for sample denoising

4) Run `detectors/train.py` for model training and evaluation
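
Steps 3 and 4 can be chained from a single driver script. The sketch below is one possible way to do that. It assumes both scripts are run from the repository root and take no command-line arguments, which the README does not state, so dataset paths and hyperparameters may need to be set inside the scripts or in `detectors/configuration.py` first. `build_commands` and `run_pipeline` are illustrative names, not part of this repository.

```python
import subprocess
import sys
from pathlib import Path

# Script paths taken from steps 3 and 4 above.
STEPS = [
    ("sample denoising", "preprocess/dataset_process/extract_vulContext.py"),
    ("model training and evaluation", "detectors/train.py"),
]


def build_commands(repo_root, python=sys.executable):
    """Return (label, argv) pairs; assumes the scripts need no arguments."""
    root = Path(repo_root)
    return [(label, [python, str(root / script)]) for label, script in STEPS]


def run_pipeline(repo_root):
    """Run each step in order and stop at the first failure."""
    for label, command in build_commands(repo_root):
        print(f"[{label}] {' '.join(command)}")
        subprocess.run(command, cwd=repo_root, check=True)
```

Calling `run_pipeline(Path("Snopy").resolve())` after cloning would run denoising first and training second. Because `check=True` raises `subprocess.CalledProcessError` on a non-zero exit code, training does not start if preprocessing fails.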

Structure

├── README.md                         <- The top-level README for developers using this project.
├── detectors                         <- Detection model used to provide prediction labels.
│   ├── configuration.py              <- Configuration scripts.
│   ├── model.py                      <- Feature encoder.
│   └── train.py                      <- Detection model training.
└── preprocess                        <- Preprocessing scripts.

Reference

[1] Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. NeurIPS 2019.

[2] Jiahao Fan, Yi Li, Shaohua Wang, and Tien Nguyen. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. MSR 2020.

[3] Yizheng Chen, Zhoujie Ding, Lamya Alowain, Xinyun Chen, and David Wagner. DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection. RAID 2023.
