all-MiniLM-L6-v2 Sentence Embedding Model
all-MiniLM-L6-v2 is a public sentence-embedding checkpoint released through the sentence-transformers library. It is built on the nreimers/MiniLM-L6-H384-uncased backbone (a 6-layer, hidden-size-384 Transformer produced by MiniLM-style self-attention distillation, with roughly 22.7M parameters) and is fine-tuned with an SBERT-style contrastive objective on more than 1.17 billion sentence pairs aggregated from Reddit comments, S2ORC citation pairs, WikiAnswers duplicates, PAQ, Stack Exchange, MS MARCO, and many additional sources. The model maps a sentence or short paragraph to a 384-dimensional dense vector suitable for semantic search, clustering, and retrieval; by default, inputs longer than 256 word pieces are truncated. The released checkpoint is a common default encoder in downstream retrieval pipelines.
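The semantic-search use described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names (cosine, rank_by_cosine, embed_with_minilm, search) are hypothetical, the ranking is plain cosine similarity in pure Python, and the encoder call assumes the sentence-transformers package is installed and the checkpoint can be downloaded under the ID sentence-transformers/all-MiniLM-L6-v2.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(
    query_vec: Vector, doc_vecs: Sequence[Vector], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, highest similarity first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def embed_with_minilm(texts: List[str]) -> List[List[float]]:
    """Encode texts with the released checkpoint.

    Assumption: sentence-transformers is installed and the model can be
    downloaded. The import is lazy so the ranking helpers work without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return model.encode(texts).tolist()  # one 384-dimensional vector per text


def search(
    query: str,
    docs: List[str],
    embed: Callable[[List[str]], List[List[float]]] = embed_with_minilm,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Embed the query and documents together, then rank documents by cosine similarity."""
    vectors = embed([query] + docs)
    hits = rank_by_cosine(vectors[0], vectors[1:], top_k)
    return [(docs[i], score) for i, score in hits]
```

Calling search("how do I reset my password", docs) with a list of short passages would return the top-scoring passages with their similarity scores. The embed parameter is left pluggable so the ranking logic can be exercised with any function that returns fixed-length vectors; a production pipeline would more likely batch the encoding and use a vector index rather than a linear scan.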
Tags
Science
Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls