DBLP4k
- class dhg.data.DBLP4k(data_root=None)
Bases: dhg.data.base.BaseData
The DBLP-4k dataset is a citation network dataset for the node classification task. It is an academic network covering four research areas. There are 14,475 authors, 14,376 papers, and 20 conferences, among which 4,057 authors, 20 conferences, and 100 papers are labeled with one of the four research areas (database, data mining, machine learning, and information retrieval). Each vertex denotes an author, and three types of correlation (co-paper, co-term, co-conference) can be used to build hyperedges. For more details, see the paper PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks.
The content of the DBLP-4k dataset includes the following:
- num_classes: The number of classes: \(4\).
- num_vertices: The number of vertices: \(4,057\).
- num_paper_edges: The number of hyperedges constructed by the co-paper correlation: \(14,328\).
- num_term_edges: The number of hyperedges constructed by the co-term correlation: \(7,723\).
- num_conf_edges: The number of hyperedges constructed by the co-conference correlation: \(20\).
- dim_features: The dimension of author features: \(334\).
- features: The author feature matrix. torch.Tensor with size \((4,057 \times 334)\).
- labels: The label list. torch.LongTensor with size \((4,057, )\).
- edge_by_paper: The hyperedge list constructed by the co-paper correlation. List with length \((14,328)\).
- edge_by_term: The hyperedge list constructed by the co-term correlation. List with length \((7,723)\).
- edge_by_conf: The hyperedge list constructed by the co-conference correlation. List with length \((20)\).
- paper_author_dict: The dictionary of {paper_id: [author_id, ...]}. Dict with length \((14,328)\).
- term_paper_dict: The dictionary of {term_id: [paper_id, ...]}. Dict with length \((7,723)\).
- conf_paper_dict: The dictionary of {conf_id: [paper_id, ...]}. Dict with length \((20)\).
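The relationship between the three dictionaries and the three hyperedge lists can be illustrated with a small sketch on toy data. It assumes that a co-paper hyperedge groups the authors of one paper, and that a co-term or co-conference hyperedge groups the authors of every paper attached to that term or conference. This is an illustration of the idea only, not necessarily how the dataset was built; the helper names `edges_from_paper_author` and `edges_from_group_paper` are hypothetical and are not part of dhg.

```python
from typing import Dict, List, Tuple

Hyperedge = Tuple[int, ...]


def edges_from_paper_author(paper_author: Dict[int, List[int]]) -> List[Hyperedge]:
    """Hypothetical helper: one hyperedge per paper, holding that paper's authors."""
    return [
        tuple(sorted(set(authors)))
        for _, authors in sorted(paper_author.items())
    ]


def edges_from_group_paper(
    group_paper: Dict[int, List[int]],
    paper_author: Dict[int, List[int]],
) -> List[Hyperedge]:
    """Hypothetical helper: one hyperedge per term (or conference).

    Assumption: the hyperedge contains every author of every paper
    that is attached to the term (or conference).
    """
    edges = []
    for _, papers in sorted(group_paper.items()):
        authors = set()
        for paper_id in papers:
            authors.update(paper_author.get(paper_id, []))
        edges.append(tuple(sorted(authors)))
    return edges


# Toy data in the same shape as the documented dictionaries.
paper_author = {0: [0, 1], 1: [1, 2], 2: [3]}   # {paper_id: [author_id, ...]}
term_paper = {0: [0, 1], 1: [2]}                # {term_id: [paper_id, ...]}
conf_paper = {0: [0, 1, 2]}                     # {conf_id: [paper_id, ...]}

edge_by_paper = edges_from_paper_author(paper_author)
edge_by_term = edges_from_group_paper(term_paper, paper_author)
edge_by_conf = edges_from_group_paper(conf_paper, paper_author)

print(edge_by_paper)
print(edge_by_term)
print(edge_by_conf)
```

Under this assumption each dictionary yields exactly one hyperedge per key, which matches the documented lengths: paper_author_dict and edge_by_paper both have 14,328 entries, and likewise for the term and conference pairs.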
- Parameters
data_root (str, optional) – The directory in which the data is stored. If set to None, the dataset is downloaded automatically from the server and saved to the default directory ~/.dhg/datasets/. Defaults to None.
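A minimal loading sketch follows. The constructor call `DBLP4k(data_root=...)` comes from the signature above; `resolve_data_root` and `load_dblp4k` are hypothetical helpers written for this example. `resolve_data_root` only mirrors the documented fallback to ~/.dhg/datasets/ for illustration, since the library resolves the default itself. The dhg import is deferred so the helpers can be defined without the package installed.

```python
from pathlib import Path
from typing import Optional


def resolve_data_root(data_root: Optional[str] = None) -> Path:
    """Hypothetical helper mirroring the documented default location."""
    if data_root is None:
        # Documented default directory used when data_root is None.
        return Path.home() / ".dhg" / "datasets"
    return Path(data_root).expanduser()


def load_dblp4k(data_root: Optional[str] = None):
    """Hypothetical wrapper: build the dataset object with an optional root."""
    # Deferred import: requires the dhg package to be installed when called.
    from dhg.data import DBLP4k

    return DBLP4k(data_root=data_root)
```

Calling `load_dblp4k()` with no argument should trigger the automatic download described above, while `load_dblp4k("/path/to/data")` points the loader at an existing copy. Assuming the dataset object supports key lookup by item name, entries such as `data["features"]` or `data["edge_by_paper"]` can then be read from the returned object.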