This project provides two classes that illustrate the use of Lucene's HNSW (Hierarchical Navigable Small World) approximate nearest-neighbor (ANN) index. See the Lucene JIRA issue and the Elastic blog post for background, and the original HNSW paper for how the algorithm works.
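For intuition only, here is a minimal sketch of greedy search over a single proximity-graph layer. This is the routing step HNSW applies on each upper layer (effectively a beam width of 1) before running a wider candidate search on the bottom layer. It is a hypothetical, self-contained illustration and not Lucene's implementation: the class `GreedyLayerSearch`, its methods, the adjacency-list representation, and the use of squared Euclidean distance are all assumptions.

```java
import java.util.List;

// Hypothetical illustration, not Lucene's code: greedy routing over one
// proximity-graph layer, as HNSW does on its upper layers (ef = 1).
final class GreedyLayerSearch {

    // Squared Euclidean distance; assumes both vectors have the same length.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // neighbors.get(i) holds the ids adjacent to node i on this layer.
    // Moves to the closest neighbor while that reduces the distance, and
    // returns the node where no neighbor is closer (a local minimum).
    static int greedySearch(float[][] vectors, List<int[]> neighbors, int entryPoint, float[] query) {
        int current = entryPoint;
        float currentDist = squaredDistance(vectors[current], query);
        while (true) {
            int best = current;
            float bestDist = currentDist;
            for (int n : neighbors.get(current)) {
                float d = squaredDistance(vectors[n], query);
                if (d < bestDist) {
                    best = n;
                    bestDist = d;
                }
            }
            if (best == current) {
                return current;
            }
            current = best;
            currentDist = bestDist;
        }
    }

    public static void main(String[] args) {
        // Five 1-D points linked as a chain: 0 - 1 - 2 - 3 - 4.
        float[][] vectors = {{0f}, {1f}, {2f}, {3f}, {4f}};
        List<int[]> neighbors = List.of(
                new int[] {1}, new int[] {0, 2}, new int[] {1, 3}, new int[] {2, 4}, new int[] {3});
        System.out.println(greedySearch(vectors, neighbors, 0, new float[] {3.2f})); // prints 3
    }
}
```

Greedy search alone can stop at a local minimum, which is why HNSW keeps a larger candidate list (ef) when it searches the bottom layer.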
The classes provided are SimpleExample, which creates and searches a random graph, and Texmex, which tests the recall performance against a dataset with ground truth (i.e. known, exact nearest neighbors) precomputed for each query.
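The recall measurement can be sketched as follows: ground truth is the exact k nearest neighbors found by a brute-force scan, and recall@k is the fraction of those true neighbors that the approximate search returns. This is a hypothetical illustration assuming squared Euclidean distance and one common definition of recall; `RecallCheck` and its methods are not taken from the Texmex class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not taken from the Texmex class.
final class RecallCheck {

    // Squared Euclidean distance; assumes both vectors have the same length.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Exact k nearest neighbors by brute force: sort all ids by distance to the query.
    static int[] exactNeighbors(float[][] vectors, float[] query, int k) {
        Integer[] ids = new Integer[vectors.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, (x, y) -> Float.compare(
                squaredDistance(vectors[x], query), squaredDistance(vectors[y], query)));
        int[] result = new int[Math.min(k, ids.length)];
        for (int i = 0; i < result.length; i++) {
            result[i] = ids[i];
        }
        return result;
    }

    // Fraction of the true top-k ids found among the first k approximate results.
    // Assumes groundTruth has at least k entries; a repeated id is counted once.
    static double recallAtK(int[] approximate, int[] groundTruth, int k) {
        Set<Integer> truth = new HashSet<>();
        for (int i = 0; i < k; i++) {
            truth.add(groundTruth[i]);
        }
        int hits = 0;
        for (int i = 0; i < Math.min(k, approximate.length); i++) {
            if (truth.remove(approximate[i])) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    public static void main(String[] args) {
        float[][] vectors = {{0f}, {1f}, {2f}, {3f}, {4f}};
        int[] truth = exactNeighbors(vectors, new float[] {0.9f}, 2); // {1, 0}
        int[] approx = {1, 2}; // stand-in for ANN search output
        System.out.println(recallAtK(approx, truth, 2)); // prints 0.5
    }
}
```

Brute force costs one distance computation per indexed vector per query, which is why the Texmex datasets ship with ground truth already computed.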
To run SimpleExample:
$ ./gradlew runSimple

To run Texmex on the siftsmall dataset:
$ ./gradlew runTexmex -PsiftName=siftsmall

The Texmex datasets are available from the Texmex corpus website.
The Texmex class expects the data files in a subdirectory of the current working directory, laid out exactly as extracted from the dataset's .tgz archive (e.g. siftsmall or sift). The siftsmall dataset runs in about 2 seconds; the larger sift dataset takes about 10.5 minutes.
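For reference, the Texmex SIFT files use a simple binary layout: each .fvecs record is a little-endian 32-bit integer dimension followed by that many little-endian 32-bit floats, and .ivecs files (used for ground truth) store 32-bit integers the same way. The reader below is a minimal sketch under that assumption; `TexmexVecsReader` is a hypothetical helper, not part of this project or of Lucene.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for Texmex .fvecs/.ivecs files. Assumed layout per record:
// little-endian int32 dimension d, then d little-endian float32 (or int32) values.
final class TexmexVecsReader {

    // Loads the whole file into memory, then decodes records until the buffer is exhausted.
    static float[][] readFvecs(Path path) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(path)).order(ByteOrder.LITTLE_ENDIAN);
        List<float[]> vectors = new ArrayList<>();
        while (buf.hasRemaining()) {
            int dim = buf.getInt();
            float[] v = new float[dim];
            for (int i = 0; i < dim; i++) {
                v[i] = buf.getFloat();
            }
            vectors.add(v);
        }
        return vectors.toArray(new float[0][]);
    }

    // Same record layout with int32 values; ground-truth files list neighbor ids per query.
    static int[][] readIvecs(Path path) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(path)).order(ByteOrder.LITTLE_ENDIAN);
        List<int[]> rows = new ArrayList<>();
        while (buf.hasRemaining()) {
            int dim = buf.getInt();
            int[] row = new int[dim];
            for (int i = 0; i < dim; i++) {
                row[i] = buf.getInt();
            }
            rows.add(row);
        }
        return rows.toArray(new int[0][]);
    }
}
```

For example, `readFvecs(Path.of("siftsmall", "siftsmall_query.fvecs"))` would load the query vectors, assuming the archive's usual file naming. Because each file is read fully into memory, this suits siftsmall better than the roughly half-gigabyte sift base file.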