I'm working on a project using embeddings (text-embedding-ada-002).
Has anyone seen a significant jump in performance with text-embedding-3-large, specifically when using more dimensions (i.e. 3,072)?
I wonder if using more dimensions packs the meaning of chunks better and allows for more effective vector search. If anyone has seen a performance boost using the new large model, I'd love to know.
We've seen that the new models allow concepts to be orthogonal (cosine similarity near 0), which was nearly impossible with ada.
This lets you reject irrelevant documents outright instead of relying on an arbitrary similarity cutoff.
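As a minimal sketch of what that rejection could look like (the `zero_band` value here is an illustrative assumption, not a recommendation from anyone in this thread):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # OpenAI embeddings come back unit-normalized, so this is
    # effectively a dot product; we normalize defensively anyway.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_relevant(query_emb, doc_embs, zero_band=0.05):
    """Keep only documents whose similarity sits meaningfully above ~0.

    With text-embedding-3-large, unrelated concepts can score near 0,
    so anything inside the zero band can be dropped outright rather
    than tuned against an arbitrary cutoff. zero_band is hypothetical.
    """
    return [i for i, d in enumerate(doc_embs)
            if cosine_sim(query_emb, d) > zero_band]
```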
Whether you need that many dimensions depends on your actual use-case.
Of course, search slows down with more dimensions, but what you can do is use the dimensions parameter to shorten your embeddings while still taking advantage of the "smarter" model.
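For anyone who hasn't tried it, requesting shortened embeddings looks like this with the official Python SDK (the 1,024 here is just an example size, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="How do I shorten embeddings?",
    dimensions=1024,  # native size is 3072; the API truncates and renormalizes
)
vector = resp.data[0].embedding
print(len(vector))  # -> 1024
```

Note that if you instead truncate a full 3,072-dimension vector yourself, you need to renormalize it before computing cosine similarities; passing `dimensions` lets the API handle that for you.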
In my opinion there is a substantial difference from ada, but it doesn't necessarily have anything to do with the dimension count per se.
Thanks, yeah, I think it's a lot more nuanced than I originally thought.
It looks like there might be some gains in accuracy, and like you said, I may still be able to see those gains by using the new model with the same number of dimensions.
I guess it depends a lot on the specific use case; I'll run tests next week.