Last verified · v1.0
Cosine Similarity Calculator
Instantly compute cosine similarity between two vectors in up to 5 dimensions. Enter components for vectors A and B to get a similarity score from -1 to 1.
What Is Cosine Similarity?
Cosine similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. Rather than comparing absolute magnitudes, it captures directional alignment — making it ideal for comparing documents, embeddings, and feature vectors regardless of scale. A value of 1 indicates identical direction (perfect similarity), 0 indicates orthogonality (no directional overlap), and -1 indicates opposite directions.
The Formula
The cosine similarity between vectors A and B is defined as:
cos(θ) = (A · B) / (‖A‖ × ‖B‖) = Σ(Aᵢ × Bᵢ) / [√Σ(Aᵢ²) × √Σ(Bᵢ²)]
Variables Explained
- A · B (dot product): the sum of element-wise products — A₁B₁ + A₂B₂ + … + AₙBₙ.
- ‖A‖ (magnitude of A): the Euclidean norm √(A₁² + A₂² + … + Aₙ²).
- ‖B‖ (magnitude of B): the Euclidean norm √(B₁² + B₂² + … + Bₙ²).
- n (dimensions): the number of components per vector, selectable from 2 to 5 in this calculator.
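The formula and its variables translate directly into a few lines of Python. The sketch below is illustrative and is not the calculator's actual implementation; the function name `cosine_similarity` and the choice to raise `ValueError` for mismatched or zero-magnitude vectors are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")

    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))

    # Euclidean norm (magnitude) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The formula divides by both norms, so a zero vector has no defined score.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))  # orthogonal vectors -> 0.0
```

Raising an error for a zero vector is one possible design choice; returning 0.0 is another common convention, depending on the application.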
Why Normalization Matters
The denominator normalizes the dot product by the product of the two vectors' magnitudes. This is what makes cosine similarity scale-invariant: multiplying either vector by a positive constant leaves the score unchanged, so the result reflects only the angle between the vectors, not their lengths. Two vectors with the same proportions but different scales (e.g., a short document and a long document whose term frequencies are in the same ratio) yield a similarity of 1.0, whereas Euclidean distance would penalize the length difference heavily. This property is a major reason cosine similarity is widely used for sparse, high-dimensional text data.
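A small experiment makes the scale-invariance point concrete. The sketch below assumes Python 3.8+ (for `math.dist` and multi-argument `math.hypot`); the term-count vectors `short_doc` and `long_doc` are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


short_doc = [1, 2, 0, 3]                 # hypothetical term counts
long_doc = [10 * x for x in short_doc]   # same proportions, ten times longer

cos = cosine_similarity(short_doc, long_doc)
dist = math.dist(short_doc, long_doc)

print(round(cos, 6))   # direction is identical, so the score is 1.0
print(round(dist, 3))  # Euclidean distance reflects the length gap instead
```

Here the cosine score stays at 1.0 while the Euclidean distance is far from zero, even though the two vectors describe the same term proportions.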
Step-by-Step Example
Compute cosine similarity for A = [2, 3, 1] and B = [4, 0, 2] across 3 dimensions:
- Dot product: (2×4) + (3×0) + (1×2) = 8 + 0 + 2 = 10.
- ‖A‖: √(4 + 9 + 1) = √14 ≈ 3.742.
- ‖B‖: √(16 + 0 + 4) = √20 ≈ 4.472.
- Result: 10 ÷ (3.742 × 4.472) = 10 ÷ 16.733 ≈ 0.5976.
A score of approximately 0.60 indicates moderate directional alignment between the two vectors. Notice that the vectors do not need to have the same magnitude; the normalization step ensures the result depends only on direction.
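The worked example can be reproduced step by step in Python as a quick check. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import math

A = [2, 3, 1]
B = [4, 0, 2]

dot = sum(a * b for a, b in zip(A, B))       # 2*4 + 3*0 + 1*2
norm_a = math.sqrt(sum(a * a for a in A))    # sqrt(14)
norm_b = math.sqrt(sum(b * b for b in B))    # sqrt(20)
similarity = dot / (norm_a * norm_b)

print(dot)                   # 10
print(round(norm_a, 3))      # 3.742
print(round(norm_b, 3))      # 4.472
print(round(similarity, 4))  # 0.5976
```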
Interpreting the Score
- 1.0: perfectly aligned; the vectors point in the same direction.
- 0.7–0.99: high similarity, typical for closely related documents or feature sets.
- 0.3–0.69: moderate similarity, with partial content or feature overlap.
- 0.0: orthogonal; no shared directional information.
- -1.0: perfectly opposite; the vectors point in exactly opposite directions.
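For threshold-based code, the bands above could be turned into a small lookup. The sketch below is hypothetical: the function name `describe_score` is invented, and the labels for scores between 0 and 0.3 and for negative scores are assumptions, since the list above does not name those ranges.

```python
def describe_score(score):
    """Map a cosine similarity score to a rough descriptive label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("cosine similarity must lie in [-1, 1]")
    if score >= 0.7:
        return "high similarity"       # includes the perfectly aligned case, 1.0
    if score >= 0.3:
        return "moderate similarity"
    if score >= 0.0:
        return "low similarity"        # assumed label: band not named in the list
    return "opposing direction"        # assumed label for negative scores


print(describe_score(0.5976))  # moderate similarity
```

Floating-point rounding can push a computed score slightly outside [-1, 1], so real code may want to clamp the value before classifying it.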
Real-World Applications
Natural Language Processing and Search
TF-IDF vectors and dense word embeddings are routinely compared with cosine similarity to rank document relevance in information retrieval. As detailed in the Northwestern University LING 334 GloVe assignment, cosine similarity between 300-dimensional GloVe vectors effectively captures semantic relatedness — word pairs like king and queen score near 0.85 while semantically unrelated words score near 0.0, demonstrating the measure's discriminative power.
Plagiarism Detection and Document Comparison
Academic integrity systems convert essays into term-frequency vectors and flag document pairs that exceed a cosine similarity threshold of 0.8. Research published on PubMed Central (2024) confirms that cosine similarity outperforms Euclidean distance for high-dimensional sparse vectors — precisely the regime where text data lives — due to its inherent scale invariance and insensitivity to document length.
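A toy version of this pipeline might look like the sketch below. It assumes whitespace tokenization and raw word counts, which is far simpler than what a real integrity system would use; the function name `term_frequency_similarity` and the sample sentences are invented for illustration.

```python
import math
from collections import Counter


def term_frequency_similarity(text_a, text_b):
    """Cosine similarity between simple word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty text is treated as having no similarity
    return dot / (norm_a * norm_b)


THRESHOLD = 0.8  # the flagging threshold mentioned above

essay_1 = "the quick brown fox jumps over the lazy dog"
essay_2 = "the quick brown fox leaps over the lazy dog"
score = term_frequency_similarity(essay_1, essay_2)
print(round(score, 3), score > THRESHOLD)  # 0.909 True
```

Changing a single word leaves the score well above the threshold, which illustrates why near-verbatim copies are easy to flag with this measure.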
Recommendation Systems and Bioinformatics
Collaborative filtering engines represent user preferences as item-rating vectors and surface similar users through cosine similarity, powering recommendations on e-commerce and streaming platforms. In bioinformatics, gene expression profiles modeled as numeric vectors are clustered by cosine similarity to identify co-expressed gene sets across tissue samples. The measure's magnitude-independence means that a high-expression sample and a low-expression sample with proportionally identical profiles still register a similarity near 1.0. Financial analysts apply the same principle to compare portfolio return vectors across different fund sizes.
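The user-similarity idea can be sketched in a few lines. The ratings table below is invented, and treating an unrated item as 0 is a simplifying assumption that production recommenders usually refine. The code assumes Python 3.8+ for multi-argument `math.hypot`.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical ratings for the same four items, one vector per user.
ratings = {
    "alice": [5, 3, 0, 1],
    "bob": [4, 2, 0, 1],
    "carol": [0, 1, 5, 4],
}

target = "alice"
others = [name for name in ratings if name != target]

# Pick the user whose rating vector points in the most similar direction.
most_similar = max(
    others, key=lambda name: cosine_similarity(ratings[target], ratings[name])
)
print(most_similar)  # bob
```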
Computational Advantages
Beyond its mathematical elegance, cosine similarity offers practical computational advantages. For the sparse vectors common in text processing, a sparse dot product visits only the non-zero components, so the calculation stays fast even in thousands of dimensions. The bounded output range [-1, 1] also keeps scores comparable across vector pairs of any magnitude and simplifies threshold-based decision-making in production systems.
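One way to see the sparse-vector benefit is to store only the non-zero entries. The sketch below assumes a dictionary representation mapping index to value; the function name `sparse_cosine` and the sample vectors are illustrative, not taken from any particular library.

```python
import math


def sparse_cosine(a, b):
    """Cosine similarity for sparse vectors stored as {index: value} dicts."""
    # Iterate over the smaller dict; indices missing from the other vector
    # are zeros and contribute nothing, so they are never visited.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(value * b[i] for i, value in a.items() if i in b)

    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Two vectors in a nominal 10,000-dimensional space with few non-zero entries.
u = {3: 1.0, 42: 2.0, 9001: 1.0}
v = {42: 4.0, 777: 3.0}
print(round(sparse_cosine(u, v), 4))  # 0.6532
```

The work done is proportional to the number of stored entries, not to the nominal dimensionality of the space.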
About This Calculator
This cosine similarity calculator supports vectors with 2 to 5 dimensions. Select the number of dimensions, enter each component of vectors A and B, and receive the result instantly — no manual norm computation required. The foundational theory and geometric derivation are covered in detail in the UTA ITLab Cosine Similarity Tutorial, a comprehensive reference for students and practitioners alike.
Reference