How Vector Databases Work: Embeddings, Retrieval, and Matching in Practice
A technical explanation of how vector databases store embeddings, search them with ANN, and match results back to original documents.
To really understand vector databases, you need to go beyond “it can do semantic search.”
The real question is more concrete:
- Where are embeddings stored?
- What is the database actually searching?
- How is the original document matched back to the vector?
- Why do these systems retrieve the “closest” item instead of the exact one?
This post explains the mechanics in engineering terms.
Bottom line
- A vector DB stores original data + vector + metadata together.
- Retrieval is usually done through Approximate Nearest Neighbor (ANN) search.
- The result is not the vector itself; the database returns an ID linked to the original document.
1) Start with the storage layout
When you store documents in a vector DB, you are not just converting the whole document into a vector and calling it a day. The usual structure looks more like this:
- id: the unique key for a document or chunk
- vector: the embedding values
- payload / metadata: source, language, type, date, permissions, and more
- original text or reference: either the raw text or a pointer to it
For example:
{
  "id": "doc_123_chunk_4",
  "vector": [0.12, -0.44, 0.88, ...],
  "metadata": {
    "docId": "doc_123",
    "source": "handbook",
    "lang": "ko",
    "chunkIndex": 4
  },
  "text": "The original text for this chunk..."
}
The key point is that the vector is never truly alone. It is always tied to an identifier and metadata.
2) How embeddings are stored
An embedding is the output of turning text or images into a numeric array.
That array is usually a dense vector with hundreds to a few thousand dimensions, depending on the model.
That vector is a compressed representation of meaning, so semantically similar items tend to land near each other in vector space.
What matters at storage time
- which embedding model generated the vector
- how many dimensions it has
- how the source data was chunked
- which metadata was attached
Change any of those and the search quality can change a lot.
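As a sketch of why this matters at write time, here is a minimal validation step before inserting a record. The field names (`model`, `EXPECTED_MODEL`) and the record shape are illustrative, not from any specific database:

```python
EXPECTED_DIM = 3            # must match the embedding model's output size
EXPECTED_MODEL = "model-x"  # hypothetical model identifier

def validate_record(record):
    # Mixing embedding models or dimensions in one index silently corrupts
    # search quality, so reject records that do not match the index config.
    if record["metadata"].get("model") != EXPECTED_MODEL:
        raise ValueError("embedding produced by a different model")
    if len(record["vector"]) != EXPECTED_DIM:
        raise ValueError("dimension mismatch")
    return True
```

A check like this is cheap insurance: a dimension mismatch usually fails loudly, but a model mismatch does not, and it degrades results quietly.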
3) How retrieval works
At query time, the user’s question is embedded the same way.
Example flow:
- A user asks: “Why do vector DBs matter in RAG?”
- The question is converted into an embedding.
- The DB compares that query vector with stored vectors.
- It finds the nearest ones.
- It returns the document IDs or chunk IDs linked to those vectors.
- The application fetches the original text from those IDs.
So retrieval is not string matching. It is distance calculation in vector space.
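The flow above can be sketched in a few lines of Python. The embedding step is assumed to have already happened, and a toy dict stands in for a real ANN index:

```python
import math

def cosine(a, b):
    # Cosine similarity: how aligned two vectors are, ignoring magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy index: ID -> stored embedding (real systems use an ANN index, not a dict).
index = {
    "doc_123_chunk_4": [0.12, -0.44, 0.88],
    "doc_123_chunk_5": [0.90, 0.10, -0.20],
    "doc_777_chunk_0": [0.11, -0.40, 0.85],
}

def search(query_vector, top_k=2):
    # Score every stored vector against the query, return the closest IDs.
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vector, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

print(search([0.10, -0.45, 0.90]))  # IDs of the nearest chunks
```

Note that `search` returns IDs, not text; the application resolves those IDs to the original chunks.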
4) What does “closeness” mean?
Vector DBs compute similarity between vectors using distance or similarity metrics. Common ones include:
- cosine similarity: how aligned two vectors are
- dot product: inner-product scoring
- Euclidean distance: geometric distance in space
In practice, cosine similarity or dot product is common.
The important part is not the formula itself. The important part is that the embedding model was trained so that similar meaning → nearby vectors.
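For reference, all three metrics are simple to state in code. These are pure-Python sketches, not optimized implementations:

```python
import math

def cosine_similarity(a, b):
    # Alignment of two vectors, ignoring magnitude: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def dot_product(a, b):
    # Inner-product score; unlike cosine, sensitive to vector magnitude.
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Straight-line distance in the space; smaller means closer.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

One practical note: for vectors normalized to unit length, cosine similarity and dot product rank results identically, which is why many systems normalize embeddings at write time.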
5) Why ANN instead of exact search?
Embedding collections are large and the vectors are high-dimensional. Comparing the query against every stored vector (exact nearest-neighbor search) becomes too slow at scale.
That is why most vector DBs use Approximate Nearest Neighbor (ANN) search.
ANN means:
- do not compare against every vector exactly
- search for the most likely close candidates fast
Common indexing ideas include:
- graph-based traversal
- clustering
- quantization
- hierarchical search
You trade a bit of precision for a lot of speed.
6) How the original value is matched back
This is the part people often miss. A vector DB does not usually give you the vector as the answer. It returns the ID associated with that vector.
Example:
- The search returns a result ID such as doc_123_chunk_4.
- The app uses that ID to fetch the original document or chunk.
- The final answer shown to the user is the raw text, a summary, or metadata.
In other words, the matching flow is:
- storage: split text into chunks and assign IDs
- embedding: generate a vector for each chunk
- indexing: store vector + ID together
- search: run similarity search and return IDs
- lookup: fetch the original chunk by ID
That separation is what makes vector search practical.
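The separation can be made concrete with two toy stores; the names and shapes here are illustrative, not any particular database's API:

```python
# Two separate stores: the vector index holds embeddings keyed by ID,
# the document store holds the human-readable text keyed by the same ID.
vector_store = {"doc_123_chunk_4": [0.12, -0.44, 0.88]}
text_store = {"doc_123_chunk_4": "The original text for this chunk..."}

def lookup(ids):
    # Similarity search yields IDs; the app resolves them to raw text.
    return [text_store[i] for i in ids]
```

Many vector DBs bundle both roles in one record (as in the JSON layout earlier), but logically the search side only ever touches the vectors, and the ID is the bridge back.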
7) Why separate vectors from original text?
Vectors are good for search, but they are not human-readable.
Original text is human-readable, but not ideal for similarity search.
So systems keep them separate:
- vector: for finding things
- original text: for showing things
That is also why RAG works cleanly.
8) Practical things to watch
Chunk size
Too large loses precision. Too small loses context.
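A fixed-size character chunker with overlap is one common baseline; the sizes below are arbitrary, and production systems often split on sentence or token boundaries instead:

```python
def chunk_text(text, size=200, overlap=40):
    # Fixed-size windows with overlap, so a thought cut at one chunk's
    # boundary is still fully present at the start of the next chunk.
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

Tuning `size` and `overlap` is exactly the precision/context trade-off described above: bigger windows pack more topics into one vector, smaller ones strand answers across chunk boundaries.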
Metadata filtering
Without language, permission, type, or freshness filters, results get noisy.
Re-ranking
The first retrieved candidates are not always the best final results.
Freshness
If documents change, or you switch embedding models, existing vectors go stale and you need to re-embed and reindex.
Closing
A vector DB is not magic. But once you understand the mechanics, it becomes pretty straightforward.
- Split the documents.
- Create embeddings.
- Store vectors with IDs.
- Embed the query.
- Use ANN to find nearby vectors.
- Fetch the original text by ID.
That is the core loop.
My recommendation:
- separate vectors from original text in your design
- add metadata filtering and re-ranking early
- understand the speed/accuracy tradeoff of ANN indexing