This project implements the NetRL model, a deep reinforcement learning-based method for network denoising that can be guided by a downstream task.
The script has been tested under Python 3.5.2 with the following packages installed (along with their dependencies):
tensorflow==1.14.0
numpy==1.17.0
pandas==0.24.2
sklearn==0.21.3
scipy==1.3.1
tqdm==4.32.1
Some of the Python dependencies are listed in requirements.txt and can be installed with pip:
pip install -r requirements.txt
In addition, our project was developed with CUDA 9.0. Not every dependency is covered by the installation instructions above, but most of the remaining libraries are available in the package repositories of common Linux distributions.
Example input files are provided in the data directory for reference. The input is expected to consist of three parts:
- Dataset details, where x is the node feature vectors, y is the one-hot labels, and graph is a dict in the format {index: [index_of_neighbor_nodes]}, where the neighbor nodes are organized as a list. In this example, we use the preprocessed citation networks Cora and Citeseer provided by https://github.com/kimiyoung/planetoid (Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov, Revisiting Semi-Supervised Learning with Graph Embeddings, ICML 2016).
- edgelist is the network edgelist with different noise ratios, e.g.,
    1 2
    1 3
    1 4
    ...
- embedding is the pretrained node representations, where the first line is a header (# of nodes, # of dimensions) and every other line is a node id followed by its d-dimensional representation:
    32 64
    1 0.016579 -0.033659 0.342167 -0.046998 ...
    2 -0.007003 0.265891 -0.351422 0.043923 ...
    ...
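The edgelist and embedding formats described above can be read with a few lines of Python. The following is a minimal sketch, not the project's own loader: the function names read_edgelist and read_embedding are hypothetical, and treating every edge as undirected is an assumption that may not match how NetRL builds its graph.

```python
import io
from collections import defaultdict


def read_edgelist(handle):
    """Parse whitespace-separated 'src dst' lines into an adjacency dict.

    Assumption: edges are undirected, so each edge is stored in both directions.
    """
    graph = defaultdict(list)
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        u, v = int(parts[0]), int(parts[1])
        graph[u].append(v)
        graph[v].append(u)
    return dict(graph)


def read_embedding(handle):
    """Parse an embedding file.

    The first line is the header 'num_nodes dim'; every other line is
    'node_id v1 v2 ... vd'. Returns {node_id: [v1, ..., vd]}.
    """
    num_nodes, dim = (int(tok) for tok in handle.readline().split())
    embedding = {}
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        vector = [float(tok) for tok in parts[1:]]
        if len(vector) != dim:
            raise ValueError("node %s has %d values, expected %d"
                             % (parts[0], len(vector), dim))
        embedding[int(parts[0])] = vector
    if len(embedding) != num_nodes:
        raise ValueError("header declares %d nodes, found %d"
                         % (num_nodes, len(embedding)))
    return embedding


# Small in-memory examples in place of real files.
edge_text = io.StringIO("1 2\n1 3\n1 4\n")
graph = read_edgelist(edge_text)

emb_text = io.StringIO(
    "2 4\n"
    "1 0.016579 -0.033659 0.342167 -0.046998\n"
    "2 -0.007003 0.265891 -0.351422 0.043923\n"
)
embedding = read_embedding(emb_text)
```

The adjacency dict returned by read_edgelist has the same {index: [index_of_neighbor_nodes]} shape as the graph object described in the first item, which makes it easy to compare a noisy edgelist against the original graph.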
The help information of the main script Main.py is as follows:
python Main.py -h
usage: Main.py [-h] [--env_name] [--use_gpu] [--gpu_id] [--gpu_fraction] [--random_seed]

optional arguments:
  -h, --help       show this help message and exit
  --env_name       str, select the dataset.
  --use_gpu        bool, whether to use gpu or not.
  --gpu_id         str, which gpu to use.
  --gpu_fraction   str, idx / # of gpu fraction e.g. 1/3, 2/3, 3/3
  --random_seed    int, value of random seed
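For readers who want to see how options like these are typically declared, here is a hedged argparse sketch. It is not taken from Main.py: the default values, the str2bool and parse_gpu_fraction helpers, and the dataset name "citeseer" are all hypothetical placeholders chosen for illustration.

```python
import argparse


def str2bool(value):
    """Convert a textual boolean; plain type=bool would treat any non-empty string as True."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def parse_gpu_fraction(text):
    """Split a string such as '1/3' into the integer pair (idx, total)."""
    idx, total = (int(part) for part in text.split("/"))
    if not 1 <= idx <= total:
        raise ValueError("gpu fraction must satisfy 1 <= idx <= total")
    return idx, total


def build_parser():
    # All defaults below are illustrative assumptions, not the project's values.
    parser = argparse.ArgumentParser(prog="Main.py")
    parser.add_argument("--env_name", type=str, default="cora",
                        help="str, select the dataset.")
    parser.add_argument("--use_gpu", type=str2bool, default=True,
                        help="bool, whether to use gpu or not.")
    parser.add_argument("--gpu_id", type=str, default="0",
                        help="str, which gpu to use.")
    parser.add_argument("--gpu_fraction", type=str, default="1/1",
                        help="str, idx / # of gpu fraction e.g. 1/3, 2/3, 3/3")
    parser.add_argument("--random_seed", type=int, default=123,
                        help="int, value of random seed")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--env_name", "citeseer", "--use_gpu", "false", "--gpu_fraction", "1/3"]
)
fraction = parse_gpu_fraction(args.gpu_fraction)
```

A custom converter is used for --use_gpu because passing type=bool to argparse would turn the string "false" into True.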
We set the default parameters in config.py. You can modify them when using your own dataset.