
A Review of Common Similarity Measures

 2 years ago
source link: https://blogread.cn/it/article/8202?f=hot1


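   I recently studied some common methods for computing similarity, and while looking for material I came across a good blog post, notable mainly for its well-made figures. I am borrowing it here for a brief review.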

   

similarity-distance-measure.png

Euclidean distance

   

Euclidean-distance.png

   Euclidean distance is simply the straight-line distance between two points in space.

   Python implementation:

from math import sqrt

def euclidean_distance(x, y):
    # Square root of the sum of squared differences along each axis.
    return sqrt(sum(pow(a - b, 2) for a, b in zip(x, y)))

print(euclidean_distance([0, 3, 4, 5], [7, 6, 3, -1]))
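
As a sanity check, a hand-written Euclidean distance can be compared against `math.dist`, which the Python standard library has provided since version 3.8. The sketch below redefines the function so that it runs on its own; the helper name `euclidean` is an assumption chosen for this illustration and does not come from the original post.

```python
import math

def euclidean(x, y):
    # Square root of the summed squared per-coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

p, q = [0, 3, 4, 5], [7, 6, 3, -1]
hand_written = euclidean(p, q)
built_in = math.dist(p, q)  # requires Python 3.8 or newer

# Squared differences are 49 + 9 + 1 + 36 = 95, so both values equal sqrt(95).
print(hand_written, built_in)
```

On Python 3.8 or newer, `math.dist` is probably the simpler choice in practice; the explicit version mainly serves to show what is being computed.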

Manhattan distance

   

Manhattan-Distance.png

   Manhattan distance is simply the sum of the distances along each axis.

   Python implementation:

def manhattan_distance(x, y):
    # Sum of the absolute differences along each axis (no imports needed).
    return sum(abs(a - b) for a, b in zip(x, y))

print(manhattan_distance([10, 20, 10], [10, 20, 20]))

   

Minkowski distance

Minkowski-Distance.png

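   Minkowski distance can be regarded as a generalization of Euclidean distance and Manhattan distance. Its formula covers Manhattan distance (p = 1), Euclidean distance (p = 2), and Chebyshev distance (the limit as p grows without bound) as special cases.

   Python implementation: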

def nth_root(value, n_root):
    # n-th root of value, rounded to three decimal places.
    return round(value ** (1 / float(n_root)), 3)

def minkowski_distance(x, y, p_value):
    # p-th root of the sum of |a - b| ** p over all coordinates.
    return nth_root(sum(pow(abs(a - b), p_value) for a, b in zip(x, y)), p_value)

print(minkowski_distance([0, 3, 4, 5], [7, 6, 3, -1], 3))
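
The statement that Minkowski distance generalizes the other measures can be checked numerically. The following is a minimal sketch, not the original author's code: the names `minkowski` and `chebyshev` are assumptions for illustration, and p = 50 is an arbitrary "large" exponent standing in for the limit.

```python
def minkowski(x, y, p):
    # p-th root of the summed p-th powers of the absolute differences.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    # Largest absolute difference along any single axis.
    return max(abs(a - b) for a, b in zip(x, y))

x, y = [0, 3, 4, 5], [7, 6, 3, -1]

manhattan = minkowski(x, y, 1)   # p = 1: 7 + 3 + 1 + 6
euclid = minkowski(x, y, 2)      # p = 2: square root of 49 + 9 + 1 + 36
large_p = minkowski(x, y, 50)    # large p approaches the Chebyshev distance

print(manhattan, euclid, large_p, chebyshev(x, y))
```

With p = 1 the result is the Manhattan distance, with p = 2 it is the Euclidean distance, and as p increases the largest per-axis difference dominates the sum, so the value approaches the Chebyshev distance.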

Cosine similarity

   

Cosine-similarity.png

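   Cosine similarity is fairly easy to understand: it measures how much two vectors differ in direction, expressed as the cosine of the angle between them, regardless of their lengths.

   Python implementation: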

from math import sqrt

def square_rooted(x):
    # Euclidean length (L2 norm) of the vector, rounded to three decimals.
    return round(sqrt(sum(a * a for a in x)), 3)

def cosine_similarity(x, y):
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = square_rooted(x) * square_rooted(y)
    return round(numerator / float(denominator), 3)

print(cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))
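
Because cosine similarity depends only on direction, scaling one vector should leave the result unchanged. The sketch below illustrates this; the function name `cosine` and the decision to raise `ValueError` for a zero vector are assumptions made for this example, since the original post does not say how that case should be handled.

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # The angle is undefined for a zero vector; raising is one possible choice.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

v = [3, 45, 7, 2]
w = [2, 54, 13, 15]
scaled_v = [10 * a for a in v]  # same direction, ten times the length

print(cosine(v, w), cosine(scaled_v, w))  # the two values agree up to rounding
print(cosine([1, 0], [0, 1]))             # orthogonal vectors give 0
```

Unlike the version above, this sketch does not round the intermediate norms, which avoids small rounding errors in the final ratio.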

Jaccard similarity

   

Jaccard-similarity-1.png

   

Jaccard-similarity-2.png

   

Jaccard-similarity-3.png

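   Jaccard similarity is very easy to understand: it is the size of the intersection of two sets divided by the size of their union.

   Python implementation: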

def jaccard_similarity(x, y):
    # Size of the intersection divided by the size of the union.
    intersection_cardinality = len(set(x) & set(y))
    union_cardinality = len(set(x) | set(y))
    return intersection_cardinality / float(union_cardinality)

print(jaccard_similarity([0, 1, 2, 5, 6], [0, 2, 3, 5, 7, 9]))
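
Two practical details follow from the set-based definition: duplicate elements are ignored, and two empty inputs would cause a division by zero. The sketch below is one way to handle both. The name `jaccard` and the convention of returning 1.0 for two empty inputs are assumptions for illustration, not part of the original post.

```python
def jaccard(x, y):
    set_x, set_y = set(x), set(y)
    union = set_x | set_y
    if not union:
        # Both inputs empty: returning 1.0 is a convention assumed here.
        return 1.0
    return len(set_x & set_y) / len(union)

a = [0, 1, 2, 5, 6]
b = [0, 2, 3, 5, 7, 9]

similarity = jaccard(a, b)  # intersection {0, 2, 5}, union of 8 elements
distance = 1 - similarity   # the corresponding Jaccard distance
print(similarity, distance)

# Duplicates do not change the result because inputs are converted to sets.
print(jaccard([0, 0, 1], [0, 1, 1]))
```

For the two example lists, the intersection has 3 elements and the union has 8, so the similarity is 3/8.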

   Original article: http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/
