NOTE: the models directory (which contains the trained machine learning models) is too big to be on GitHub and needs to be recreated when needed.
algorithms: the three original algorithms (clique, kdown, McSplit), and a combination of McSplit and clique called Fusion.data: two databases of graphs for MCS and SIP algorithms, and an AIDS dataset, which was never used.gnuplot: gnuplot scripts and generated cumulative plotsgraph_stats: a graph feature extractor taken from Portfolios of subgraph isomorphism algorithms (Kotthoff, McCreesh and Solnon, LION 2016)results: all (mostly CSV) data that was generated during this project..mcs.and.sip.denote which database the data came from.- a number between dots denotes the labelling percentage.
both.labelsmeans that both vertices and edges were labelled.vertex.labelsmeans that only vertex labels were labelled.associationfiles record features of the association graph.cliquefiles record the number of search nodes, runtime, and the size of the discovered MCS, for the clique algorithm.kdown,mcsplit, andmcsplitdowndo the same thing for other algorithms.costs.csvrecords the feature extraction costs, for all graphs.filtered_instancesis a random sample of MCS instances (30,000).filtered_instances2is a random sample offiltered_instances(10,000).filtered_instances_one_filenameisfiltered_instanceswithout the second of the two filenames.fusion1is the Fusion algorithm that switches after one decision.fusion2is the Fusion algorithm that switches after two decisions.labels.csv: each line corresponds to a different graph of the MCS instances. For each distinct label in the 33% labelling scheme, we record the number of vertices with that label.mcs_features.csvandsip_features.csvrecord the features for unlabelled MCS and SIP instances (ratio features are calculated later).mcs_features_individual.csvandsip_features_individual.csvrecord features of each individual graph separately.mcs_instancesandsip_instancescontain the complete lists of instances, for both graph databases.unlabelled.csv,vertex_labels.csv, andboth_labels.csvcontain summaries of the data, used for plotting cumulative plots
text: the dissertation, a short status report, and three sets of slides for different presentations.video: everything related to the video describing this project.