Many new embedding models output half-precision (and even single-byte-precision) vectors, and we would like to support these types natively within FAISS.
From what we can tell, the most straightforward way to do this would be to introduce a class-level template in `faiss::Index` (or an abstract class) that we can instantiate/implement for the index types we would like to support. It would always be instantiated for `float` so that we maintain compatibility with the existing FAISS APIs.
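To make the proposal concrete, here is a rough sketch of what a class-level template could look like. It is not the actual FAISS class layout: `IndexT`, `IndexFlatT`, `IndexFlatInt8`, and the `sketch` namespace are hypothetical names used only for illustration, and only `add` is shown. Half precision is left out because standard C++ has no portable half type; on the GPU side it would presumably map to whatever half type cuVS expects.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical templated base class; T is the element type of the
// vectors passed in by the caller.
template <typename T>
struct IndexT {
    using idx_t = int64_t;

    int d;            // vector dimensionality
    idx_t ntotal = 0; // number of vectors added so far

    explicit IndexT(int d) : d(d) {}
    virtual ~IndexT() = default;

    // n vectors of dimension d, stored contiguously in x
    virtual void add(idx_t n, const T* x) = 0;
};

// The float instantiation keeps the existing name, so code written
// against the current float-only API would not need to change.
using Index = IndexT<float>;

// Hypothetical flat index that stores vectors in their native type.
template <typename T>
struct IndexFlatT : IndexT<T> {
    using idx_t = typename IndexT<T>::idx_t;

    std::vector<T> codes; // size ntotal * d

    explicit IndexFlatT(int d) : IndexT<T>(d) {}

    void add(idx_t n, const T* x) override {
        codes.insert(codes.end(), x, x + n * this->d);
        this->ntotal += n;
    }
};

// Always instantiated for float; other types only where supported.
using IndexFlat = IndexFlatT<float>;
using IndexFlatInt8 = IndexFlatT<int8_t>;

} // namespace sketch
```

With aliases like these, `sketch::IndexFlat` still takes `const float*` exactly as before, while `sketch::IndexFlatInt8` accepts `const int8_t*` without a conversion to float. Each extra instantiation is also where the binary-size cost would come from.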
@mdouze @wickedfoo @algoriddle @alexanderguzhva any other ideas on how we could support this functionality? This is specifically being requested for CAGRA to start, but I suspect we will eventually want to support it more broadly. I also understand this can add to the binary size. At least on the GPU side, cuVS already supports half and byte precision, so it's just a matter of calling those APIs. Eventually cuVS will move some of the additional types to the new nvJitLink technology so that they are compiled and linked at runtime.