LLM Embeddings — Explained Simply

Sandi Besen
Published in AI Mind · 4 min read · Jul 25, 2023


Embeddings are the fundamental reason why large language models such as OpenAI's GPT-4 and Anthropic's Claude are able to contextualize information quickly and effectively. By the end of this article you'll understand why, and you'll be able to follow the flow of data in the image below:

What are Embeddings?

In the context of large language models, embeddings are vectors stored in an index within a vector database. Before you freak out on me, let's break that sentence into digestible chunks. I promise that by the end you'll be able to read it and tell yourself, "I totally get that."

Embeddings are vectors stored in an index within a vector database:

Embeddings are a way to store data of all types (including images, audio files, text, documents, etc.) as arrays of numbers called vectors.

For instance, the sentence "There are 7 stages in the sales cycle: Prospecting, Making contact, Qualify the prospect, Nurture the prospect, Present offer, Overcome objections, and Close the Sale." could be represented as the embedding [ 0.00049438 0.11941205 0.00522949 ... 0.01687654 -0.02386115 0.00526433], with the "…" standing in for hundreds to thousands of other numbers. What precisely these numbers mean is known only to the transformer model (the embedding algorithm) that generated them, but we can be confident that they capture the words, their context (the relationship of the words to each other and to other embeddings), and their meaning. The embedding stores all the information the retrieval algorithm needs to search based on the user's query and find relevant information that the LLM can then use to answer the user's question.
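To make this concrete, here is a minimal sketch of turning that sentence into an embedding. It assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model, which are my illustrative choices, not something the article prescribes; any embedding model would do:

```python
from sentence_transformers import SentenceTransformer

# Load an embedding model (all-MiniLM-L6-v2 is a small open-source model).
model = SentenceTransformer("all-MiniLM-L6-v2")

sentence = ("There are 7 stages in the sales cycle: Prospecting, Making contact, "
            "Qualify the prospect, Nurture the prospect, Present offer, "
            "Overcome objections, and Close the Sale.")

embedding = model.encode(sentence)  # a NumPy array of floats
print(embedding[:3])                # first few numbers of the vector
print(len(embedding))               # 384 for this particular model
```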

Embeddings are vectors stored in an index within a vector database:

Vectors are a way to organize embeddings with meaning. Although these embeddings are displayed in one dimension on your computer screen, they actually have many dimensions. Humans struggle to visualize more than a 3-dimensional space: one dimension is a line, 2 dimensions can be drawn on a graph, 3 dimensions form a cube, but 4+ dimensions… exactly. In a vector, the number of dimensions is simply the length of the vector, "n". So if an embedding has 150 numbers, then n = 150 and the vector has 150 dimensions. Our human brains can't fathom what that might look like, but here is an image of a 5D space for context.

source: https://en.wikipedia.org/wiki/Five-dimensional_space
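If it helps to see that idea in code, here is a tiny sketch. The 150-number vector is a made-up placeholder; the point is that the dimensionality is just how many numbers the array holds:

```python
import numpy as np

# A made-up 150-number "embedding" -- on screen it prints as a flat list,
# but mathematically it is a single point in 150-dimensional space.
vector = np.random.rand(150)

print(vector.shape)  # (150,) -> n = 150, so 150 dimensions
```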

Embeddings are vectors stored in an index within a vector database:

An index is the order in which the vectors are stored. Why does the order matter? The origin (where the vector starts), direction (which way the vector points), and magnitude (the length of the vector) determine the vector's relationship and proximity to other vectors. This relationship is what the retrieval mechanism uses to compare two vectors and see how related they are. For example, the embedded vector representing the city of "Budapest" might be stored close to the embedded vector representing the city of "London" because they are both country capitals. Their distance from each other in the index reflects their relationship. When the retrieval algorithm queries the vector database for "Budapest", it might also return information on other country capitals because of their proximity in the vector database. The information returned from the retrieval search is then passed back to the application, which formats the RAG search results, user query, and system prompt for the LLM to generate an informed response.
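To see proximity in action, here is a toy sketch using cosine similarity, one common way a retrieval mechanism measures how related two vectors are. The 4-dimensional vectors below are invented for illustration; real embeddings have hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two vectors: 1.0 means "pointing the
    # same way" (closely related), values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

budapest = np.array([0.91, 0.10, 0.85, 0.02])  # hypothetical embedding
london   = np.array([0.88, 0.15, 0.80, 0.05])  # hypothetical embedding
banana   = np.array([0.02, 0.95, 0.01, 0.70])  # hypothetical embedding

print(cosine_similarity(budapest, london))  # ~0.998 -> strongly related
print(cosine_similarity(budapest, banana))  # ~0.09  -> weakly related
```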

Embeddings are vectors stored in an index within a vector database:

The most efficient way to store a vectorized index is in a vector database, also sometimes referred to as a vector store. Unlike traditional databases, which store information in rows and columns, vector databases use an algorithm to index the vectors (like we talked about in the previous section). When information is extracted from a traditional database, the query looks for the rows that exactly match the search. In a vector database, approximate nearest-neighbor search is used to find the vectors most similar to the query. Feeding those search results back to the model is the heart of Retrieval Augmented Generation (RAG), a method of enhancing the LLM's response by supplementing its context beyond what was provided in the LLM's training data.
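Here is a minimal retrieval sketch, assuming the open-source FAISS library as a stand-in for a full vector database. The random vectors are placeholders for real document embeddings, which in practice would all come from the same embedding model as the query:

```python
import numpy as np
import faiss

dim = 128                                                  # embedding dimensionality
doc_vectors = np.random.rand(1000, dim).astype("float32")  # 1,000 stored "documents"

index = faiss.IndexFlatL2(dim)  # an exact L2 (Euclidean distance) index
index.add(doc_vectors)          # store the vectors in the index

query = np.random.rand(1, dim).astype("float32")  # the embedded user query
distances, ids = index.search(query, 3)           # find the 3 nearest vectors
print(ids[0])  # positions of the 3 most similar documents

# In a RAG pipeline, the text behind these ids would be packed into the
# LLM prompt alongside the user's question -- the "augmented" part of RAG.
```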

Now that you can break down the sentence "Embeddings are vectors stored in an index within a vector database" into digestible and understandable chunks, take another look at the flow-of-data process chart.

Still have questions, or think something needs further clarification? Drop me a DM on LinkedIn! I'm always eager to engage with food for thought and iterate on my work.


Learn alongside me as I publish technical but digestible content for technical SMEs and novices alike. My opinions may not represent those of my employer.