Neighbor S3

Nearest neighbor search for Ruby and S3 Vectors

Installation

Add this line to your application’s Gemfile:

gem "neighbor-s3"

Create a vector bucket and set your AWS credentials in your environment:

AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

Getting Started

Create an index

index = Neighbor::S3::Index.new("items", bucket: "my-bucket", dimensions: 3, distance: "cosine")
index.create

Add vectors

index.add(1, [1, 1, 1])
index.add(2, [2, 2, 2])
index.add(3, [1, 1, 2])

Search for nearest neighbors to a vector

index.search([1, 1, 1], count: 5)

Search for nearest neighbors to a vector already in the index, by ID

index.search_id(1, count: 5)
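To build intuition for what a "cosine" index ranks by, here is a minimal pure-Ruby sketch of the standard cosine distance, applied to the three example vectors. The `cosine_distance` helper is hypothetical and written only for illustration. It is not part of this gem, and it says nothing about how S3 Vectors computes or reports distances internally.

```ruby
# Cosine distance between two equal-length numeric vectors:
# 1 - (a . b) / (|a| * |b|)
# Hypothetical helper for illustration; not part of neighbor-s3.
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Same vectors as in the example above
vectors = {1 => [1, 1, 1], 2 => [2, 2, 2], 3 => [1, 1, 2]}
query = [1, 1, 1]

# Rank locally by ascending distance (smaller means more similar)
ranked = vectors
  .map { |id, v| {id: id, distance: cosine_distance(query, v)} }
  .sort_by { |r| r[:distance] }
```

Vectors 1 and 2 point in the same direction as the query, so their cosine distance is effectively zero (up to floating-point rounding). Vector 3 is slightly further away, at about 0.057. Cosine distance ignores magnitude, which is why `[2, 2, 2]` is as close to `[1, 1, 1]` as the query itself.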

IDs are treated as strings by default, but can also be treated as integers

Neighbor::S3::Index.new("items", id_type: "integer", ...)

Operations

Add or update a vector

index.add(id, vector)

Add or update multiple vectors

index.add_all([{id: 1, vector: [1, 2, 3]}, {id: 2, vector: [4, 5, 6]}])
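If your vectors live in a hash keyed by ID, a small helper can produce the array-of-hashes shape that `add_all` takes. This is a sketch only: `build_records` is a hypothetical name, and the dimension check is an assumption about what you may want to validate locally before sending anything to AWS.

```ruby
# Hypothetical helper: turn {id => vector} into the array-of-hashes shape
# passed to add_all above, checking dimensions first.
def build_records(vectors_by_id, dimensions:)
  vectors_by_id.map do |id, vector|
    unless vector.size == dimensions
      raise ArgumentError, "vector #{id} has #{vector.size} dimensions, expected #{dimensions}"
    end

    {id: id, vector: vector}
  end
end

records = build_records({1 => [1, 2, 3], 2 => [4, 5, 6]}, dimensions: 3)
# index.add_all(records)  # needs a real index and AWS credentials
```

Validating up front keeps a single malformed vector from failing partway through a larger upload.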

Get a vector

index.find(id)

Get all vectors

index.find_in_batches do |batch|
  # ...
end

Remove a vector

index.remove(id)

Remove multiple vectors

index.remove_all(ids)

Metadata

Add a vector with metadata

index.add(id, vector, metadata: {category: "A"})

Add multiple vectors with metadata

index.add_all([
  {id: 1, vector: [1, 2, 3], metadata: {category: "A"}},
  {id: 2, vector: [4, 5, 6], metadata: {category: "B"}}
])

Get metadata with search results

index.search(vector, with_metadata: true)

Filter by metadata

index.search(vector, filter: {category: "A"})

Filters support the metadata filtering operators documented for S3 Vectors
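As a rough illustration of how an operator-based filter selects records, here is a local, simplified evaluator in plain Ruby. Everything in it is an assumption made for illustration: the operator names (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`) follow the S3 Vectors metadata filtering documentation as I understand it, multiple keys are treated as an implicit AND, and the `matches?` and `compare` helpers are hypothetical. Filtering actually happens in the service, so check the AWS documentation for the authoritative semantics.

```ruby
# Simplified local evaluator for a subset of filter operators.
# Assumption: operator names mirror the S3 Vectors filtering docs.
# This is not the gem's or the service's implementation.
def compare(value, op, arg)
  case op.to_s
  when "$eq"  then value == arg
  when "$ne"  then value != arg
  when "$in"  then arg.include?(value)
  when "$nin" then !arg.include?(value)
  when "$gt"  then !value.nil? && value > arg
  when "$gte" then !value.nil? && value >= arg
  when "$lt"  then !value.nil? && value < arg
  when "$lte" then !value.nil? && value <= arg
  else raise ArgumentError, "unsupported operator: #{op}"
  end
end

# Returns true when the metadata hash (symbol keys) satisfies every condition
def matches?(metadata, filter)
  filter.all? do |key, condition|
    value = metadata[key.to_sym]
    # A bare value such as {category: "A"} is treated as equality
    condition = {"$eq" => condition} unless condition.is_a?(Hash)
    condition.all? { |op, arg| compare(value, op, arg) }
  end
end

movie = {category: "A", year: 1977}
simple = matches?(movie, {category: "A"})
combined = matches?(movie, {category: {"$in" => ["A", "B"]}, year: {"$gte" => 1970}})
```

Under these assumptions, the same hash shapes could be passed as `filter:` to `index.search`.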

Specify non-filterable metadata on index creation

Neighbor::S3::Index.new(name, non_filterable: ["category"], ...)

Example

You can use Neighbor S3 for online item-based recommendations with Disco. We’ll use MovieLens data for this example.

Create an index

index = Neighbor::S3::Index.new("movies", bucket: "my-bucket", dimensions: 20, distance: "cosine")

Fit the recommender

data = Disco.load_movielens
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)

Store the item factors

index.add_all(recommender.item_ids.map { |v| {id: v, vector: recommender.item_factors(v)} })

And get similar movies

index.search_id("Star Wars (1977)").map { |v| v[:id] }

See the complete code

Reference

Get index info

index.info

Check if an index exists

index.exists?

Drop an index

index.drop
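A common pattern is to create the index only when it does not exist yet, so setup code can run repeatedly. The sketch below uses only `exists?` and `create` from this README. `ensure_index` and `StandInIndex` are hypothetical names: the stand-in class exists purely so the example runs without AWS, and a real `Neighbor::S3::Index` would take its place.

```ruby
# Create the index only if it does not exist yet.
# Works with any object that responds to exists? and create.
def ensure_index(index)
  index.create unless index.exists?
  index
end

# Hypothetical in-memory stand-in so the sketch runs without AWS;
# replace with a real Neighbor::S3::Index.
class StandInIndex
  attr_reader :create_calls

  def initialize
    @created = false
    @create_calls = 0
  end

  def exists?
    @created
  end

  def create
    @created = true
    @create_calls += 1
  end
end

index = StandInIndex.new
2.times { ensure_index(index) }
```

Note that check-then-create is not atomic: two processes running this at the same time could both attempt to create the index.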

History

View the changelog

Contributing

Everyone is encouraged to help improve this project, for example by reporting bugs, fixing bugs and submitting pull requests, improving documentation, or suggesting new features.

To get started with development:

git clone https://github.com/ankane/neighbor-s3.git
cd neighbor-s3
bundle install
bundle exec rake test
