1Cademy - all-MiniLM-L6-v2 Sentence Embedding Model

Learn Before

Sentence-BERT Siamese Sentence Embedding Framework (Reimers & Gurevych, 2019)

Concept

all-MiniLM-L6-v2 Sentence Embedding Model

all-MiniLM-L6-v2 is a public sentence-embedding checkpoint released through the sentence-transformers library. It is built on the nreimers/MiniLM-L6-H384-uncased backbone, a $6$-layer Transformer with hidden size $384$ and roughly 22.7M parameters, produced by MiniLM-style self-attention distillation. The backbone is fine-tuned with an SBERT-style contrastive objective on more than 1.17 billion sentence pairs aggregated from Reddit comments, S2ORC citation pairs, WikiAnswers duplicates, PAQ, Stack Exchange, MS MARCO, and many additional sources. The model maps a sentence or short paragraph to a $384$-dimensional dense vector suitable for semantic search, clustering, and retrieval; inputs longer than $256$ word pieces are truncated. The checkpoint is widely used as a default encoder in downstream retrieval pipelines.
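
To make the input-to-vector mapping concrete, the following is a minimal standard-library sketch of what happens after the Transformer has produced one vector per word piece. It assumes mask-aware mean pooling followed by L2 normalization, a pooling scheme the description above does not spell out. The helper names (`mean_pool`, `l2_normalize`, `embed`, `fake_token_vectors`) are hypothetical, and the token vectors are synthetic stand-ins rather than real model outputs.

```python
import math

MAX_TOKENS = 256  # truncation limit quoted in the description above
DIM = 384         # embedding width quoted in the description above


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    total = [0.0] * len(token_vectors[0])
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [x / max(count, 1) for x in total]


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def embed(token_vectors, attention_mask):
    """Assumed pipeline: truncate to MAX_TOKENS, mean-pool, then normalize."""
    kept_vectors = token_vectors[:MAX_TOKENS]
    kept_mask = attention_mask[:MAX_TOKENS]
    return l2_normalize(mean_pool(kept_vectors, kept_mask))


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def fake_token_vectors(n_tokens, seed):
    """Hypothetical stand-in for encoder outputs: deterministic pseudo-vectors."""
    return [
        [math.sin(seed + t * 0.37 + i * 0.11) for i in range(DIM)]
        for t in range(n_tokens)
    ]


# A 300-token input and its first 256 tokens yield the same embedding,
# because everything past MAX_TOKENS is dropped before pooling.
long_input = fake_token_vectors(300, seed=1)
e_long = embed(long_input, [1] * 300)
e_trunc = embed(long_input[:MAX_TOKENS], [1] * MAX_TOKENS)
similarity = cosine(e_long, e_trunc)
```

Under these assumptions, `similarity` comes out at 1 up to floating-point error, which illustrates a practical consequence of the truncation limit: text beyond the first 256 word pieces has no influence on the embedding, so longer documents are usually split into chunks before encoding.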