Vector search, relational truth, and the real architecture LLMs force on us.

A popular story goes like this: SQL is brilliant at exactness, terrible at meaning, and vector databases exist because relational databases simply can't do what AI needs. It's a good story. It has heroes (HNSW), villains (B-trees), and a moral that fits in a tweet.

It's also incomplete in a way that matters more in 2026, when LLM systems aren't toys anymore—they're interfaces, workflow engines, and in some cases production decision makers.

The better story is less dramatic and more useful:

Vector databases exist because product teams asked systems to answer a new kind of question—"what feels like this?"—at production scale. Relational databases weren't built around that question. But in 2026, the line between "relational DB" and "vector DB" is no longer a wall. It's a seam, and you get to choose where to put it.

Let's build a mental model you can actually operate: when to use a "real" vector database, when Postgres is enough, when your search engine is secretly the correct vector store, and why the LLM landscape changes the decision.


1. The Classical Argument (and Why It Still Holds)

The foundational distinction is clean:

  • A relational DB + SQL is optimized for: "find the row where column equals value."
  • Similarity search is geometric: "find the points near this point." B-trees have no notion of "near" in high-dimensional embedding space, so a full scan with distance computations becomes the fallback.

That's the blind spot. And it explains why vector search needs an index with a different shape: HNSW (Hierarchical Navigable Small World), a layered graph that trades exact nearest neighbors for fast, good-enough ones by aggressively narrowing the candidate set at each layer.
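
To see what the fallback costs, here is the brute-force version as a minimal NumPy sketch (illustrative only): every query pays for a distance computation against every stored vector.

```python
import numpy as np

# Without an index built around "near", the only general answer to
# "find the k closest vectors" is to compute a distance to every row and sort.
def brute_force_knn(query: np.ndarray, vectors: np.ndarray, k: int = 10) -> np.ndarray:
    dists = np.linalg.norm(vectors - query, axis=1)   # one distance per stored vector: O(N * d)
    return np.argsort(dists)[:k]                      # indices of the k nearest rows

vectors = np.random.rand(100_000, 768).astype(np.float32)  # at 1M+ rows this is gigabytes in RAM
query = np.random.rand(768).astype(np.float32)
print(brute_force_knn(query, vectors))  # correct, but every query touches every row
```

HNSW exists to avoid exactly this scan: the graph lets a query visit a small fraction of the stored vectors and still land close to the true nearest neighbors.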

Two production truths get missed until you hit them:

  • HNSW is an index structure, not a product. By itself, it's memory-bound, process-local, and not a durable service.
  • A vector database is "HNSW plus adulthood": persistent storage, network API, metadata filtering, continuous updates, operational robustness.

That's a strong frame. It's correct. It's also missing the 2026 twist.


2. The 2026 Twist: SQL Is Learning Vectors, and Vectors Are Learning SQL

In 2026, you can't honestly say "SQL can't do semantic search." It can. The question is: at what scale, with what guarantees, with what operational burden, and with what failure modes?

2.1 Postgres Isn't Just a Relational Database Anymore

pgvector is now widely supported across managed Postgres offerings—AWS Aurora's support for pgvector 0.8.0 is one concrete milestone—and cloud vendors publish performance guidance because people are running real workloads on it.

That matters because it changes the architecture default:

  • If you already have Postgres as your system of record,
  • and your vector use case is "semantic retrieval with filters" (classic RAG),
  • then "one DB" is no longer naïve—it may be the most robust choice for small and medium systems, where complexity is your real enemy.

And yes, that is the opposite of what the clickbait "vector DBs exist because SQL is blind" headline implies. You can keep the headline and still be wrong in practice.

2.2 Search Engines Are Now Vector Engines

OpenSearch positions vector search as a first-class capability aimed at hybrid and semantic retrieval. Its k-NN project describes nearest-neighbor search at large scale, with filtering and aggregation support that looks suspiciously like... search engineering.

This is the second twist: a lot of "vector database" workloads are actually "search engine" workloads.

If you need:

  • hybrid ranking (BM25 + vectors),
  • faceting/aggregations,
  • multi-tenant query patterns,
  • strong operational controls for indexing pipelines,
  • query explainability and tuning,

then your "vector database" might be a search cluster that learned vector math.

2.3 Dedicated Vector DBs Are Specializing Upward

Milvus and its managed variants are positioning as high-performance vector infrastructure, with multiple ANN index families (HNSW, IVF variants, PQ, quantization strategies) and scaling modes.

That matches the real constraints:

  • ANN results are approximate (controllable)
  • Memory footprint grows fast at billion-vector scale
  • Sharding and quantization are the economics

Dedicated vector DBs are where you go when:

  • vector search is a core product feature,
  • latency is strict,
  • data volume is enormous,
  • you need specialized indexing and hardware options,
  • or you want to isolate vector workload risks from transactional workload risks.
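
At this end of the spectrum the client code still looks simple; the engineering moves into index choice (HNSW vs IVF vs PQ), quantization, and sharding. A best-effort pymilvus sketch using the MilvusClient quick-setup path; the dynamic fields and the filter expression are assumptions to verify against your Milvus version:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Quick-setup collection: an "id" primary key and a "vector" field by default.
client.create_collection(collection_name="docs", dimension=768)

client.insert(
    collection_name="docs",
    data=[{"id": 1, "vector": [0.01] * 768, "tenant": "acme", "body": "..."}],
)

hits = client.search(
    collection_name="docs",
    data=[[0.01] * 768],          # query embedding(s)
    limit=10,
    filter='tenant == "acme"',    # metadata filtering alongside ANN
    output_fields=["body"],
)
```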

3. The Real Blind Spot in 2026: Confusing Three Different Kinds of Retrieval

"SQL asks where an exact record lives; a vector DB asks what else lives nearby."

In 2026, you need a slightly richer taxonomy because LLM systems combine retrieval modes:

(A) Transactional Retrieval — Truth + Invariants

What it answers: "What is the user's account balance?" "Which invoice is unpaid?" Tool shape: relational DB, transactions, constraints.

(B) Semantic Retrieval — Meaning + Proximity

What it answers: "Find cases like this," "find paragraphs that explain X," "retrieve related policy exceptions." Tool shape: embeddings + ANN index + metadata filters.

(C) Lexical Retrieval — Keywords + Coverage

What it answers: "Find the document that literally contains 'NS 8407'," "search exact clause numbers." Tool shape: inverted index (BM25), analyzers, synonyms, highlighting.

Most real LLM applications need all three.

RAG isn't "vector search." It's orchestrated retrieval: semantic to get candidates, lexical to ensure coverage and precision, relational to apply permissions and business rules.

The architectural failure mode of 2026 is choosing a single tool and forcing it to cosplay as the other two.
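
To make "orchestrated retrieval" concrete, here is a minimal sketch of the flow. The helpers allowed_doc_ids, semantic_search, and lexical_search are hypothetical stand-ins for whichever clients your stack actually uses:

```python
def retrieve(query: str, query_vec: list[float], user_id: str, k: int = 8) -> list[dict]:
    allowed = allowed_doc_ids(user_id)                # (A) relational: permissions and business rules
    semantic = semantic_search(query_vec, limit=50)   # (B) ANN candidates by meaning
    lexical = lexical_search(query, limit=50)         # (C) BM25 candidates by exact terms

    merged: dict[str, dict] = {}
    for c in semantic + lexical:
        if c["doc_id"] in allowed:                    # enforce ACLs before anything is ranked
            merged.setdefault(c["doc_id"], c)

    # Scores from different systems are not directly comparable; a re-ranker
    # (or explicit score normalization) usually sits here in a real pipeline.
    return sorted(merged.values(), key=lambda c: c["score"], reverse=True)[:k]
```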


4. Where LLMs Change the Decision

LLMs don't just "use retrieval." They reshape what retrieval is for.

4.1 Retrieval Is Now Part of an Interaction Loop

In classic search, a user types a query and gets results. In LLM systems, retrieval is one step in a chain:

  1. Interpret query
  2. Retrieve candidates
  3. Re-rank or filter
  4. Synthesize response
  5. Cite sources
  6. Log evidence
  7. Iterate if the answer seems low-confidence
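
As a skeleton, that loop might look like the sketch below; interpret, retrieve, generate, confidence, rewrite_query, and log_evidence are all hypothetical placeholders for your own components, and the 0.7 threshold is arbitrary:

```python
def answer(question: str, max_rounds: int = 2) -> dict:
    query = interpret(question)                           # step 1: interpret the query
    draft: dict = {}
    for _ in range(max_rounds):
        candidates = retrieve(query)                       # steps 2-3: retrieve, filter, re-rank
        draft = generate(question, candidates)             # steps 4-5: synthesize and cite
        log_evidence(question, query, candidates, draft)   # step 6: keep the evidence trail
        if confidence(draft, candidates) >= 0.7:           # step 7: stop once it looks grounded
            break
        query = rewrite_query(question, draft)             # low confidence: reformulate and retry
    return draft
```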

This changes what "good database choice" means.

A vector DB that is fast but hard to filter correctly can produce answers that are smooth and wrong. A relational DB that is correct but too limited in ANN tuning can make your LLM feel dumb. A search engine that supports hybrid ranking can make your system feel uncannily competent—even if it's not the fashionable choice.

4.2 Permissions Are the Killer Feature, Not Embeddings

In regulated domains (legal, finance, healthcare), access control is not a nice-to-have: it's the system.

This pushes teams toward architectures where metadata filtering is not an afterthought: vectors and structured attributes queried together, in a single operation.

In practice, many teams already have their permission model in SQL. So the question becomes:

  • Do you duplicate permissions into the vector store?
  • Do you query SQL first to compute an "allowed set," then vector-search within it?
  • Do you build a hybrid index that includes ACL tags, tenant partitions, and time windows?
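
The middle pattern, computing an allowed set in SQL and then vector-searching within it, is the easiest to sketch. Continuing the psycopg2/pgvector style from section 2.1, with hypothetical table and column names:

```python
# 1) Authorization lives in SQL and stays there.
cur.execute("SELECT doc_id FROM doc_permissions WHERE user_id = %s", (user_id,))
allowed_ids = [row[0] for row in cur.fetchall()]

# 2) Similarity search is constrained to the allowed set, not filtered afterwards.
cur.execute(
    """
    SELECT id, body
    FROM chunks
    WHERE doc_id = ANY(%s)                -- ACLs applied before ranking
    ORDER BY embedding <=> %s::vector     -- cosine distance to the query embedding
    LIMIT 10
    """,
    (allowed_ids, query_vec),
)
```

If the allowed set grows into the millions, shipping an ID list per query stops being practical; that is when ACL tags, tenant partitions, and time windows baked into the index (the third pattern) start to earn their complexity.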

There isn't one correct answer. There is a correct principle:

The retrieval layer must be at least as correct as your authorization layer, or your LLM becomes a compliance incident generator.
That's why "just add a vector DB" is often the wrong first move.

4.3 Hybrid Search Is the Normal Case Now

Your retrieval stack in a serious 2026 system often becomes:

  • Relational DB: identity, permissions, transactions, audit
  • Search engine: lexical search, filters, aggregations, highlighting
  • Vector index: semantic similarity + reranking

Sometimes those collapse into two systems (OpenSearch as "search + vector," Postgres as "relational + vector"). Sometimes you keep all three.

The right move is the one that keeps your operational and cognitive load bounded.


5. A Practical Decision Framework

Start with your retrieval contract. In a paragraph, write down what retrieval must guarantee:

  • Must it be exact (transactions)?
  • Must it be high recall (search coverage)?
  • Must it be low latency (interactive)?
  • Must it be permission-perfect (regulated)?
  • Must it support hybrid ranking?
  • Must it run at edge or offline?

Now map that contract to an architecture.

Option 1: Postgres + pgvector (Small/Medium, Correctness-Heavy)

Good when:

  • you have modest vector volume,
  • strong metadata filtering needs,
  • desire for one operational system,
  • and retrieval failure is worse than slower retrieval.

Caution: At very large scales, vector indexes get memory-hungry and performance tuning becomes its own craft—the same trade-offs around memory footprint, sharding, and quantization that dedicated vector DBs were built to handle.

Option 2: OpenSearch as Hybrid Search Platform (Medium/Large, Search-Centric)

Good when:

  • you already need robust keyword search,
  • you want hybrid scoring (BM25 + kNN in a single query),
  • you want aggregations and search UX features,
  • you're building an "AI-powered search" product.

OpenSearch provides k-NN with filters and aggregations as part of a unified search platform—not a bolt-on.

Option 3: Dedicated Vector DB + Relational DB (Large, Vector-Centric)

Good when:

  • semantic retrieval is a core product feature,
  • vector count is in the hundreds of millions or billions,
  • strict latency SLAs apply,
  • you want ANN index flexibility (HNSW vs IVF vs PQ),
  • you want to scale vector ops independently of transactional ops.

Milvus-class systems emphasize multiple index families and scaling strategies precisely for this profile.

Option 4: Embedded Vector Search (Edge/Offline, Constraints-First)

The sleeper trend of 2026: SQLite vector extensions (sqlite-vss, sqlite-vec lineage) enable small semantic indexes that live close to the user—offline notes, device knowledge bases, local assistants.

That matters because "LLM landscape" increasingly includes on-device and edge workflows where shipping a full vector DB makes no operational sense.
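
A best-effort sketch with sqlite-vec; the vec0 virtual table and JSON-text vectors follow its documented usage, but verify the details against the version you actually ship:

```python
import json
import sqlite3
import sqlite_vec  # Python package for the sqlite-vec loadable extension

db = sqlite3.connect("notes.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# A tiny on-device semantic index: one virtual table, no server, no network.
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS note_vecs USING vec0(embedding float[384])")
db.execute(
    "INSERT INTO note_vecs(rowid, embedding) VALUES (?, ?)",
    (1, json.dumps([0.1] * 384)),        # vectors can be passed as JSON text
)

# "What feels like this?" entirely offline.
rows = db.execute(
    "SELECT rowid, distance FROM note_vecs WHERE embedding MATCH ? ORDER BY distance LIMIT 5",
    (json.dumps([0.1] * 384),),
).fetchall()
print(rows)
```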


6. A More Honest Critique of "SQL Has One Blind Spot"

The pedagogical claim is right: a similarity query against a B-tree degenerates into scanning every row and computing distances, which doesn't scale.

But the 2026 nuance is:

  • SQL databases are not frozen artifacts from 1998.
  • They're extensible engines, and the market has decided vectors are a first-class need.
  • So the "blind spot" is not semantic similarity itself; it's the native index structures and query planners that weren't built around vector distances.

That's why the world didn't replace SQL. It extended it—sometimes successfully, sometimes awkwardly.

A good heuristic:

If your organization is mostly fighting data correctness, governance, and complexity — you probably want vectors inside your existing DB or search platform. If your organization is mostly fighting latency, index performance, and scale — you probably want vectors in a dedicated system.
The trade-off is almost never "accuracy vs meaning." It's simplicity vs specialization.

7. What This Means for System Design in 2026

A production-grade RAG pipeline in 2026 looks like this:

Step 1 — Ingest: Chunk documents with stable IDs. Store text chunks, embeddings, and metadata (tenant, ACL tags, doc type, timestamps) together.

Step 2 — Retrieve with hybrid strategy:

  • Vector search to get candidates
  • Lexical search for precision and coverage
  • Relational filters for authorization and business rules

Step 3 — Re-rank (optional): Cross-encoder or model-based reranker over the candidate set.

Step 4 — Synthesize: Generate answer with citations and evidence log.

Step 5 — Evaluate continuously: Track retrieval hit-rate, citation correctness, hallucination rate.
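
The bluntest useful eval is a hand-labeled set of questions with the documents a correct answer must draw from, scored with hit rate at k. A minimal sketch, where retrieve_fn is whatever your pipeline exposes and each candidate carries a doc_id:

```python
def hit_rate_at_k(eval_set: list[tuple[str, set[str]]], retrieve_fn, k: int = 8) -> float:
    """eval_set: (question, set of doc_ids a correct answer must draw from) pairs."""
    hits = 0
    for question, relevant_ids in eval_set:
        retrieved_ids = {c["doc_id"] for c in retrieve_fn(question)[:k]}
        if retrieved_ids & relevant_ids:       # did anything relevant reach the candidate set?
            hits += 1
    return hits / len(eval_set)
```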

The non-negotiables that separate production from prototype:

  • Observability: what did we retrieve and why?
  • Evals: is this working?
  • Replayability: can we reproduce outputs?

LLMs are stochastic. Retrieval is your anchor. The database choice is part of how you constrain the model's imagination—not just its speed.


8. A "Do This Monday" Guide

If advising a team building a serious 2026 system:

1. Define retrieval correctness first. What constitutes an unacceptable retrieval result? (Usually: wrong tenant, wrong ACL, wrong time window.) Write this down before choosing a database.

2. Start with hybrid unless you have a reason not to. Pure vector search is rarely sufficient. OpenSearch-style platforms exist because the real world uses both BM25 and kNN.

3. Treat metadata filtering as first-class. Not a post-processing step. Filter during the ANN query itself (pre-filtering), not after retrieval (post-filtering).

4. Pick storage shape based on organizational pain:

  • Governance/correctness pain → Postgres + vectors
  • Search/hybrid relevance pain → search engine + vectors
  • Scale/latency pain → dedicated vector DB

5. Instrument retrieval. Log candidate sets, filters applied, and final citations. You can't improve what you can't observe.

6. Write evals before you write opinions. A retrieval system without evals is just a feeling.


The Database Was Never the Point

The most quietly important insight isn't about B-trees or HNSW. It's about deliberate design: systems that work pick the right tool for each kind of question, rather than crowning one database as "better" and forcing it everywhere.

SQL didn't fail. It stayed honest about what it was built to do.

Vector databases didn't replace SQL. They made meaning queryable at scale.

And the real architecture decision in 2026 is not "vector DB or relational DB."

It's:

Where do you want truth to live, and where are you willing to tolerate approximation?

Because approximation isn't a bug anymore. It's a design parameter—like latency, like cost, like risk.

And, inconveniently, like responsibility.