OpenAI recently released new embedding models and API updates, introducing two additions to its lineup: a smaller and highly efficient text-embedding-3-small model, and a larger and more powerful text-embedding-3-large model.
Interestingly, the technique behind the embeddings was developed by Indian developers Aditya Kusupati, a researcher at Google, and Prateek Jain, a Senior Staff Research Scientist at Google, who co-authored a paper titled ‘Matryoshka Representation Learning.’
🤯WOW🪆 Matryoshka Representation Learning enables "native support for shortening embs" & "very flexible usage"
Jokes aside, excited that @OpenAI serves MRL by default in v3 embedding API for retrieval & RAG!
Other models & services should catch-up soon😄 https://t.co/JcmTH8w8mh
- Aditya Kusupati 🪆 (@adityakusupati), January 26, 2024
“Wish OpenAI had referred to Matryoshka embeddings (or nested embeddings, as we call them in the paper and presentations) instead of avoiding any of the names we have mentioned in the paper,” wrote Prateek Jain, one of the paper's authors, on X.
Later, Owen Campbell-Moore, APIs PM at OpenAI, acknowledged that OpenAI did train the models using MRL. “Hey Prateek! We did train this based on MRL – I was responsible for the blog post, and it’s my mistake for not thinking or remembering to cite. We’re updating the blog post to add a citation now!” Campbell-Moore wrote on X.
OpenAI has since edited the blog post to credit the paper by Prateek Jain, Aditya Kusupati and their co-authors.
🪆 Matryoshka Representation Learning (MRL) 🪆
Hidden behind @OpenAI’s new embedding updates* is a cool embedding representation technique by @adityakusupati et al. that encodes information at coarse-to-fine granularities in a single vector. Short 🧵 on how it works.
Different…
- Jerry Liu (@jerryjliu0), January 26, 2024
What is MRL?
MRL is an embedding technique that encodes information at coarse-to-fine granularities in a single vector. It takes its name from Matryoshka, the Russian nesting dolls in which smaller dolls are encased within larger ones: a single high-dimensional embedding is trained so that its leading dimensions already form a usable, coarser embedding on their own.
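The nesting idea can be illustrated with a toy training objective. The sketch below follows the general shape of the objective described in the MRL paper for classification: one loss per nested prefix, summed. It is not OpenAI's training code, which has not been published. The function names, the `heads` layout and the toy numbers are all hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum_exp - logits[label]


def matryoshka_loss(embedding, label, heads, weights=None):
    """Sum the classification loss over nested prefixes of one embedding.

    heads maps a prefix length m to a hypothetical linear classifier:
    a list of rows (one per class), each row holding m weights.
    """
    total = 0.0
    for m, rows in heads.items():
        prefix = embedding[:m]  # the "smaller doll": first m values only
        logits = [sum(w * x for w, x in zip(row, prefix)) for row in rows]
        scale = 1.0 if weights is None else weights[m]
        total += scale * cross_entropy(logits, label)
    return total


# Toy example (made-up numbers): a 4-dimensional embedding, two classes,
# and nested sizes 2 and 4, each with an untrained all-zero classifier.
embedding = [0.5, -1.0, 2.0, 0.25]
heads = {
    2: [[0.0, 0.0], [0.0, 0.0]],
    4: [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
}
loss = matryoshka_loss(embedding, label=1, heads=heads)
print(loss)
```

Because every prefix length contributes its own term to the loss, training pushes the most useful information toward the front of the vector, which is what lets the shorter prefixes stand on their own.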
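Because the shorter representations are nested inside the full one, the same embedding can serve downstream tasks with different accuracy and cost budgets without being modified or recomputed, which saves computational resources and avoids the need for a separate model for each task.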
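On the consumer side, shortening a Matryoshka-style embedding can be as simple as keeping its first values and rescaling the result to unit length. The following is a minimal sketch of that step under stated assumptions: `shorten` and `cosine` are hypothetical helper names, and the vectors are random stand-ins rather than real model output, so the example shows the mechanics only and not the retrieval quality of a trained model.

```python
import math
import random


def shorten(embedding, dims):
    """Keep the first `dims` values and rescale them to unit length."""
    if not 0 < dims <= len(embedding):
        raise ValueError("dims must be between 1 and len(embedding)")
    prefix = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalise an all-zero prefix")
    return [x / norm for x in prefix]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-in data: random vectors, NOT embeddings from any real model.
rng = random.Random(0)
doc = [rng.gauss(0.0, 1.0) for _ in range(1536)]
query = [rng.gauss(0.0, 1.0) for _ in range(1536)]

# Compare at full size and at a shortened size using the same vectors.
full_score = cosine(doc, query)
short_score = cosine(shorten(doc, 256), shorten(query, 256))
print(len(shorten(doc, 256)), full_score, short_score)
```

With an MRL-trained model, the shortened score would be expected to track the full-size score closely; with the random stand-ins used here it will not. OpenAI's API also offers a `dimensions` request parameter on the v3 embedding models, so the shortening can be requested server-side instead of done by hand.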
The post OpenAI Uses Technique Created By Indian Developers appeared first on Analytics India Magazine.