terican

Last verified · v1.0

Cosine Similarity Calculator

Instantly compute cosine similarity between two vectors in up to 5 dimensions. Enter components for vectors A and B to get a similarity score from -1 to 1.

What Is Cosine Similarity?

Cosine similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. Rather than comparing absolute magnitudes, it captures directional alignment — making it ideal for comparing documents, embeddings, and feature vectors regardless of scale. A value of 1 indicates identical direction (perfect similarity), 0 indicates orthogonality (no directional overlap), and -1 indicates opposite directions.

The Formula

The cosine similarity between vectors A and B is defined as:

cos(θ) = (A · B) / (‖A‖ × ‖B‖) = Σ(Aᵢ × Bᵢ) / [√Σ(Aᵢ²) × √Σ(Bᵢ²)]

Variables Explained

  • A · B (dot product): the sum of element-wise products — A₁B₁ + A₂B₂ + … + AₙBₙ.
  • ‖A‖ (magnitude of A): the Euclidean norm √(A₁² + A₂² + … + Aₙ²).
  • ‖B‖ (magnitude of B): the Euclidean norm √(B₁² + B₂² + … + Bₙ²).
  • n (dimensions): the number of components per vector, selectable from 2 to 5 in this calculator.
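The formula and its variables translate almost line for line into code. The following is a minimal sketch in plain Python, not the calculator's own implementation; the function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return cos(theta) for two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    dot = sum(x * y for x, y in zip(a, b))      # A · B
    norm_a = math.sqrt(sum(x * x for x in a))   # ‖A‖
    norm_b = math.sqrt(sum(y * y for y in b))   # ‖B‖
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Raising on a zero vector mirrors the definition, which applies only to non-zero vectors; some libraries return 0 in that case instead, so the behavior is a design choice.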

Why Normalization Matters

The denominator of the formula divides the dot product by the product of the two magnitudes, which is equivalent to scaling both vectors to unit length before comparing them. This normalization is what makes cosine similarity scale-invariant: it measures only the angle between vectors, not their lengths. Two vectors that represent the same concept at different scales (for example, a short document and a long document whose term proportions are identical) still yield a similarity of 1.0, whereas Euclidean distance grows with the length difference. This property is a major reason cosine similarity is so widely used for sparse, high-dimensional text data.
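Scale invariance is easy to check numerically. The sketch below uses made-up term counts (`short_doc` and `long_doc` are hypothetical) and compares cosine similarity with Euclidean distance when one vector is simply ten times the other.

```python
import math

def cosine_similarity(a, b):
    # Assumes equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

short_doc = [2, 1, 0, 3]                  # hypothetical term counts
long_doc = [10 * c for c in short_doc]    # same proportions, ten times longer

cos_score = cosine_similarity(short_doc, long_doc)
distance = euclidean_distance(short_doc, long_doc)

print(round(cos_score, 6))   # direction is identical
print(round(distance, 3))    # length difference dominates
```

The cosine score stays at 1.0 (up to floating-point rounding), while the Euclidean distance is large purely because of the difference in length.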

Step-by-Step Example

Compute cosine similarity for A = [2, 3, 1] and B = [4, 0, 2] across 3 dimensions:

  1. Dot product: (2×4) + (3×0) + (1×2) = 8 + 0 + 2 = 10.
  2. ‖A‖: √(4 + 9 + 1) = √14 ≈ 3.742.
  3. ‖B‖: √(16 + 0 + 4) = √20 ≈ 4.472.
  4. Result: 10 ÷ (√14 × √20) = 10 ÷ √280 ≈ 10 ÷ 16.733 ≈ 0.5976.

A score of approximately 0.60 indicates moderate directional alignment between the two vectors. Notice that the vectors do not need to have the same magnitude; the normalization step ensures the result depends only on direction.
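The four steps above can be reproduced in a few lines of Python. This is only a sketch that mirrors the hand calculation; the variable names are chosen for illustration.

```python
import math

a = [2, 3, 1]
b = [4, 0, 2]

dot = sum(x * y for x, y in zip(a, b))      # step 1: 8 + 0 + 2
norm_a = math.sqrt(sum(x * x for x in a))   # step 2: sqrt(14)
norm_b = math.sqrt(sum(y * y for y in b))   # step 3: sqrt(20)
score = dot / (norm_a * norm_b)             # step 4

print(dot)               # 10
print(round(score, 4))   # 0.5976
```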

Interpreting the Score

  • 1.0: perfectly aligned; the vectors point in the same direction.
  • 0.7–0.99: high similarity; typical for closely related documents or feature sets.
  • 0.3–0.69: moderate similarity; partial content or feature overlap.
  • 0.0: orthogonal; no shared directional information.
  • -1.0: perfectly opposite; the vectors point in exactly opposite directions.
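As a rough illustration, the bands above can be turned into a small lookup helper. The function `describe_score` is hypothetical, and two choices in it are assumptions rather than part of the list: scores between 0.99 and 1.0 are treated as high similarity, and any score the list does not cover (for example 0.1 or -0.4) is reported as outside the listed bands.

```python
def describe_score(score, tol=1e-9):
    """Map a cosine similarity score to the interpretation bands listed above."""
    if not (-1 - tol <= score <= 1 + tol):
        raise ValueError("cosine similarity must lie in [-1, 1]")
    if score >= 1 - tol:
        return "perfectly aligned"
    if score >= 0.7:
        return "high similarity"
    if score >= 0.3:
        return "moderate similarity"
    if abs(score) <= tol:
        return "orthogonal"
    if score <= -1 + tol:
        return "perfectly opposite"
    # Assumption: the list gives no label for these values.
    return "outside the listed bands"

print(describe_score(0.5976))   # moderate similarity
```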

Real-World Applications

Natural Language Processing and Search

TF-IDF vectors and dense word embeddings are routinely compared with cosine similarity to rank document relevance in information retrieval. As detailed in the Northwestern University LING 334 GloVe assignment, cosine similarity between 300-dimensional GloVe vectors captures semantic relatedness: related word pairs such as king and queen score well above semantically unrelated pairs, whose scores tend to sit near 0.0. Exact values depend on the embedding set used.
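A toy version of relevance ranking might look like the sketch below. It uses raw term counts instead of TF-IDF weights to stay short, and the documents, the query, and the helper name `cosine_from_counts` are all invented for illustration.

```python
import math
from collections import Counter

def cosine_from_counts(c1, c2):
    """Cosine similarity between two term-count mappings."""
    shared = c1.keys() & c2.keys()              # only shared terms contribute
    dot = sum(c1[t] * c2[t] for t in shared)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0   # convention chosen here for empty texts
    return dot / (n1 * n2)

docs = {
    "doc_a": "cosine similarity compares vector direction",
    "doc_b": "euclidean distance compares vector length and position",
    "doc_c": "bread recipes need flour water and yeast",
}
query = Counter("cosine similarity of a vector".split())

scores = {name: cosine_from_counts(query, Counter(text.split()))
          for name, text in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

print(ranked)   # ['doc_a', 'doc_b', 'doc_c']
```

A real retrieval system would typically add tokenization, stop-word handling, and TF-IDF or embedding vectors; the ranking step itself stays the same.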

Plagiarism Detection and Document Comparison

Academic integrity systems can convert essays into term-frequency vectors and flag document pairs whose cosine similarity exceeds a chosen threshold, such as 0.8. Research published on PubMed Central (2024) reports that cosine similarity outperforms Euclidean distance for high-dimensional sparse vectors, which is precisely the regime where text data lives, because of its scale invariance and insensitivity to document length.
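The thresholding idea can be sketched as a pairwise comparison over term-frequency vectors. The essays below are invented, the 0.8 cutoff is taken from the text above, and `THRESHOLD` and `term_cosine` are illustrative names rather than part of any real system.

```python
import math
from collections import Counter
from itertools import combinations

THRESHOLD = 0.8   # cutoff mentioned in the text; real systems tune this

def term_cosine(text_a, text_b):
    """Cosine similarity between whitespace-tokenized term-frequency vectors."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)   # assumes neither text is empty

essays = {
    "essay_1": "the cell membrane controls what enters and leaves the cell",
    "essay_2": "the cell membrane controls what enters and exits the cell",
    "essay_3": "stock markets react quickly to changes in interest rates",
}

# Compare every unordered pair once and keep those above the threshold.
flagged = [(x, y) for x, y in combinations(essays, 2)
           if term_cosine(essays[x], essays[y]) > THRESHOLD]

print(flagged)   # [('essay_1', 'essay_2')]
```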

Recommendation Systems and Bioinformatics

Collaborative filtering engines represent user preferences as item-rating vectors and surface similar users through cosine similarity, powering recommendations on e-commerce and streaming platforms. In bioinformatics, gene expression profiles modeled as numeric vectors are clustered by cosine similarity to identify co-expressed gene sets across tissue samples. Because the measure is independent of magnitude, a high-expression sample and a low-expression sample with proportionally identical profiles still register a similarity near 1.0. Financial analysts apply the same principle to compare portfolio return vectors across funds of different sizes.
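For the collaborative filtering case, a minimal sketch of finding the most similar user could look like this. The users, ratings, and the helper `most_similar_user` are hypothetical, and treating unrated items as 0 is a simplifying assumption that production recommenders often handle differently.

```python
import math

def cosine_similarity(a, b):
    # Assumes equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical ratings for four items; 0 means "not rated".
ratings = {
    "ana": [5, 3, 0, 1],
    "ben": [4, 3, 0, 1],
    "cy":  [1, 0, 5, 4],
}

def most_similar_user(target, ratings):
    """Return the closest other user and all similarity scores."""
    scores = {other: cosine_similarity(ratings[target], vec)
              for other, vec in ratings.items() if other != target}
    return max(scores, key=scores.get), scores

nearest, scores = most_similar_user("ana", ratings)
print(nearest)   # ben
```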

Computational Advantages

Beyond its mathematical elegance, cosine similarity offers practical computational advantages. For the sparse vectors common in text processing, a sparse representation lets the dot product skip zero components entirely, so the calculation stays fast even in thousands of dimensions. When vectors are normalized to unit length ahead of time, the similarity reduces to a single dot product, and the bounded output range [-1, 1] simplifies threshold-based decision-making in production systems.
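One way to see the sparse-vector advantage is to store only the non-zero components. The sketch below assumes a dictionary mapping component index to value (the name `sparse_cosine` and the sample indices are illustrative); the dot product then touches only the entries of the smaller vector.

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity for sparse vectors stored as {index: value} dicts."""
    if len(a) > len(b):
        a, b = b, a                      # iterate over the smaller vector
    # Zero components are never stored, so they are never visited.
    dot = sum(value * b[index] for index, value in a.items() if index in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)       # assumes both vectors are non-empty

# Two vectors in a notional 100,000-dimensional space with two non-zeros each.
u = {3: 2.0, 1000: 1.0}
v = {3: 4.0, 52000: 2.0}

print(round(sparse_cosine(u, v), 4))   # 0.8
```

The cost depends on the number of stored entries rather than the nominal dimensionality, which is why the approach scales to large vocabularies.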

About This Calculator

This cosine similarity calculator supports vectors with 2 to 5 dimensions. Select the number of dimensions, enter each component of vectors A and B, and receive the result instantly — no manual norm computation required. The foundational theory and geometric derivation are covered in detail in the UTA ITLab Cosine Similarity Tutorial, a comprehensive reference for students and practitioners alike.

Reference

Frequently asked questions

What is cosine similarity and what does it measure?
Cosine similarity measures the cosine of the angle between two vectors in n-dimensional space, producing a value between -1 and 1. It focuses on directional alignment rather than magnitude, so two vectors pointing in the same direction yield a similarity of 1.0 even if their lengths differ greatly. This property makes cosine similarity especially useful for comparing text documents, word embeddings, and feature vectors where scale differences are irrelevant to content similarity.
What values can cosine similarity produce and what do they mean?
Cosine similarity produces values ranging from -1 to 1. A score of 1.0 means the vectors point in exactly the same direction (perfectly similar), 0.0 means they are perpendicular (no directional overlap), and -1.0 means they point in opposite directions (maximally dissimilar). For non-negative data such as word-frequency counts or user ratings, all vector components are zero or positive, so practical scores range from 0 to 1 in those contexts.
How is cosine similarity different from Euclidean distance?
Euclidean distance measures the straight-line distance between two points, so large vectors that are directionally identical but differently scaled appear dissimilar. Cosine similarity ignores magnitude entirely and compares only direction. For example, a 1,000-word document and a 100-word document on the same topic will have a large Euclidean distance but a cosine similarity close to 1.0 — making cosine similarity the preferred choice for text and embedding comparisons where document length should not penalize similarity.
How is cosine similarity used in natural language processing?
In NLP, text documents are converted into high-dimensional vectors using methods like TF-IDF or neural word embeddings, and cosine similarity quantifies how closely related they are. Search engines can rank results by computing cosine similarity between query and document vectors. Embedding models like GloVe and Word2Vec are commonly evaluated by checking how well the cosine similarity of word pairs tracks human similarity judgments, with semantically similar words expected to score higher than unrelated ones.
What dimensions does this cosine similarity calculator support?
This calculator supports vectors from 2 to 5 dimensions. Use the dimension selector to choose the appropriate size, then enter the corresponding components — A1 through A5 for Vector A, and B1 through B5 for Vector B. Components beyond the selected dimension count are automatically ignored; for example, a 3-dimensional calculation uses only A1, A2, A3 and B1, B2, B3, discarding any values entered in A4, A5, B4, or B5.
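The truncation behavior described above could be modeled as follows. This is a hypothetical sketch, not the calculator's actual code: the function name `calculator_similarity` and the five-slot input lists standing in for fields A1 to A5 and B1 to B5 are assumptions.

```python
import math

def calculator_similarity(a_fields, b_fields, dims):
    """Use only the first `dims` components of each five-slot input list."""
    if not 2 <= dims <= 5:
        raise ValueError("dims must be between 2 and 5")
    if len(a_fields) < dims or len(b_fields) < dims:
        raise ValueError("not enough components for the selected dimension")
    a, b = a_fields[:dims], b_fields[:dims]   # extra fields are ignored
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Values in the 4th and 5th slots are discarded for a 3-dimensional run.
print(round(calculator_similarity([2, 3, 1, 9, 9], [4, 0, 2, 7, 7], 3), 4))   # 0.5976
```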
Is cosine similarity the same as Pearson correlation?
Cosine similarity and Pearson correlation are closely related but not identical. Pearson correlation measures linear association between mean-centered variables, while cosine similarity computes directional agreement between raw vectors without subtracting the mean. When both vectors are mean-centered before comparison, cosine similarity and Pearson correlation produce identical results. In practice, cosine similarity is preferred for sparse, non-negative data like word counts, while Pearson correlation is the standard choice for continuous variables in classical statistics.
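The relationship with Pearson correlation can be checked numerically. In the sketch below, Pearson's r is computed from the standard sums-based formula and compared with the cosine similarity of the mean-centered vectors; the sample data and function names are chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    # Assumes equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_r(x, y):
    """Pearson correlation via the sums-based computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    syy = sum(q * q for q in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def center(v):
    mean = sum(v) / len(v)
    return [value - mean for value in v]

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

raw_cosine = cosine_similarity(x, y)
centered_cosine = cosine_similarity(center(x), center(y))
r = pearson_r(x, y)

print(round(raw_cosine, 4))        # 0.9636
print(round(centered_cosine, 4))   # 0.8
print(round(r, 4))                 # 0.8
```

The raw cosine differs from r because the vectors are not mean-centered; once they are, the two measures agree.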