The Jaccard Similarity, also known as the Jaccard Index or Jaccard Coefficient, is a measure of similarity between two sets. It is defined as the size of the intersection divided by the size of the union of the two sets:

Explanation

  1. Measure the similarity of two sets
  2. |X| Stands for the number of elements of set X
  3. Jaccard distance = 1 − Jaccard similarity

python

  def jaccard_similarity(set_a, set_b):
    """Compute the Jaccard Similarity between two sets."""
    intersection = len(set_a.intersection(set_b))
    union = len(set_a.union(set_b))
    return intersection / union if union != 0 else 0.0
 
# Example usage:
set1 = {"a", "b", "c"}
set2 = {"b", "c", "d"}
similarity = jaccard_similarity(set1, set2)
print(f"Jaccard Similarity: {similarity}")

Reference List

  1. A Comprehensive Survey of Clustering Algorithms Xu, D. & Tian, Y. Ann. Data. Sci. (2015)