Algorithm Selection for Maximum Common Subgraph

NOTE: the models directory, which contains the trained machine learning models, is too large to host on GitHub and must be recreated when needed.

Directories and Files

algorithms: the three original algorithms (clique, kdown, and McSplit) and Fusion, a combination of McSplit and clique.
data: two databases of graphs for MCS and SIP algorithms, and an AIDS dataset, which was never used.
gnuplot: gnuplot scripts and the cumulative plots they generate.
graph_stats: a graph feature extractor taken from "Portfolios of subgraph isomorphism algorithms" (Kotthoff, McCreesh and Solnon, LION 2016).
results: all data generated during this project, mostly in CSV format.
- .mcs. and .sip. denote which database the data came from.
- a number between dots denotes the labelling percentage.
- both.labels means that both vertices and edges were labelled.
- vertex.labels means that only vertices were labelled.
- association files record features of the association graph.
- clique files record the number of search nodes, the runtime, and the size of the discovered MCS for the clique algorithm.
- kdown, mcsplit, and mcsplitdown files record the same data for the other algorithms.
- costs.csv records the feature extraction costs for all graphs.
- filtered_instances is a random sample of 30,000 MCS instances.
- filtered_instances2 is a random sample of 10,000 instances drawn from filtered_instances.
- filtered_instances_one_filename is filtered_instances without the second of the two filenames.
- fusion1 is the Fusion algorithm that switches after one decision.
- fusion2 is the Fusion algorithm that switches after two decisions.
- labels.csv: each line corresponds to a different graph of the MCS instances. For each distinct label in the 33% labelling scheme, we record the number of vertices with that label.
- mcs_features.csv and sip_features.csv record the features for unlabelled MCS and SIP instances (ratio features are calculated later).
- mcs_features_individual.csv and sip_features_individual.csv record features of each individual graph separately.
- mcs_instances and sip_instances contain the complete lists of instances, for both graph databases.
- unlabelled.csv, vertex_labels.csv, and both_labels.csv contain summaries of the data, used to draw the cumulative plots.
text: the dissertation, a short status report, and three sets of slides for different presentations.
video: everything related to the video describing this project.
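
The naming conventions listed under results can be decoded mechanically. The following Python function is a minimal sketch, not part of the repository: the function name describe_result_file and the example filename are hypothetical, and it assumes that the markers appear as dot-separated parts of the filename exactly as described above.

```python
def describe_result_file(name):
    """Decode the results naming conventions from a filename (illustrative sketch)."""
    tokens = name.split(".")
    inner = tokens[1:-1]  # tokens that have a dot on both sides
    info = {"database": None, "label_percentage": None, "labelling": None}

    for token in inner:
        if token in ("mcs", "sip"):
            # .mcs. and .sip. denote which database the data came from.
            info["database"] = token
        elif token.isdigit():
            # A number between dots denotes the labelling percentage.
            info["label_percentage"] = int(token)

    # both.labels / vertex.labels: look for the pair of adjacent tokens.
    for first, second in zip(tokens, tokens[1:]):
        if second == "labels" and first in ("both", "vertex"):
            info["labelling"] = first

    return info


# Hypothetical filename, built only from the conventions described above.
print(describe_result_file("clique.mcs.33.vertex.labels.csv"))
```

A value of None means that the corresponding marker was not found in the filename; the function does not check whether such a file actually exists.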
