Late-Interaction Neural Retrieval

Late-interaction neural retrieval is a paradigm in neural information retrieval in which queries and documents are encoded independently into multi-vector representations, one contextualized embedding per token. Relevance is computed at query time through fine-grained token-level interactions rather than a single dot product between pooled embeddings. The standard scoring function is MaxSim: for each query token embedding, take the maximum similarity over all document token embeddings, then sum these per-query-token maxima to obtain the query-document score. Because encoding is decoupled from interaction, document embeddings can be precomputed and indexed offline while matching stays expressive and fine-grained. The cost is a substantially larger index than a single-vector dense retriever needs, since every document token stores its own embedding.
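
As a rough illustration of the MaxSim scoring described above, the score can be written as S(q, d) = sum over query tokens i of max over document tokens j of sim(q_i, d_j). The sketch below is a minimal pure-Python version under stated assumptions: similarity is taken to be a plain dot product (the text only says "similarity"), the token embeddings are hand-made 2-dimensional toy vectors rather than the output of a real encoder, and the names maxsim_score, rank, and the toy index are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product, used here as the token-level similarity (an assumption)."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best-matching document token similarity."""
    if not doc_vecs:
        return 0.0  # an empty document matches nothing
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: Sequence[Vector],
         index: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Score every precomputed document in the index, highest score first."""
    scored = [(doc_id, maxsim_score(query_vecs, vecs)) for doc_id, vecs in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy "index": document token embeddings are computed once, ahead of query time.
index = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],
    "doc_b": [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]],
}

# Toy query with two token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]

results = rank(query, index)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this toy index, doc_a contains an exact match for both query tokens, while doc_b matches the second query token only partially, so doc_a ranks first. The index stores one vector per document token, which is where the larger index size relative to a single-vector retriever comes from.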

Updated 2026-05-16

Tags

Science