

常见相似度计算方法回顾
source link: https://blogread.cn/it/article/8202?f=hot1
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

常见相似度计算方法回顾
最近学习了常见的一些相似度计算的方法,在寻找资料的过程中找到了一篇较好的博客。主要是图做的比较好。所以拿过来做下简单的回顾与复习。

欧几里得距离

欧几里得距离其实就是空间内两点之间的直线距离。
Python实现:
from math import* def euclidean_distance(x,y): return sqrt(sum(pow(a-b,2) for a, b in zip(x, y))) print euclidean_distance([0,3,4,5],[7,6,3,-1])
曼哈顿距离

曼哈顿距离其实就是每一轴距离之和。
Python实现:
from math import* def manhattan_distance(x,y): return sum(abs(a-b) for a,b in zip(x,y)) print manhattan_distance([10,20,10],[10,20,20])

闵氏距离被看做是欧氏距离和曼哈顿距离的一种推广。公式中包含了欧氏距离、曼哈顿距离和切比雪夫距离。
Python实现:
from math import* from decimal import Decimal def nth_root(value, n_root): root_value = 1/float(n_root) return round (Decimal(value) ** Decimal(root_value),3) def minkowski_distance(x,y,p_value): return nth_root(sum(pow(abs(a-b),p_value) for a,b in zip(x, y)),p_value) print minkowski_distance([0,3,4,5],[7,6,3,-1],3)
余弦相似度

余弦相似度理解起来较为简单,就是向量在空间方向上的差异。
Python实现:
from math import* def square_rooted(x): return round(sqrt(sum([a*a for a in x])),3) def cosine_similarity(x,y): numerator = sum(a*b for a,b in zip(x,y)) denominator = square_rooted(x)*square_rooted(y) return round(numerator/float(denominator),3) print cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15])
杰卡德相似度



杰卡德相似度理解起来非常的简单,就是集合的交集除以并集。
Python实现:
def jaccard_similarity(x,y): intersection_cardinality = len(set.intersection(*[set(x), set(y)])) union_cardinality = len(set.union(*[set(x), set(y)])) return intersection_cardinality/float(union_cardinality) print jaccard_similarity([0,1,2,5,6],[0,2,3,5,7,9])
原文链接:http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/
建议继续学习:
扫一扫订阅我的微信号:IT技术博客大学习
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK