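1. Introduction
2. Metrics
2.1. Euclidean Distance
This metric computes the straight-line distance between two points in n-dimensional space: the square root of the sum of squared coordinate differences. It is commonly used for continuous numerical data and is easy to understand and implement. However, it can be sensitive to outliers, and it does not account for the relative importance of different features.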
from scipy.spatial import distance
# Calculate Euclidean distance between two points
point1 = [1, 2, 3]
point2 = [4, 5, 6]
# Use the euclidean function from scipy's distance module to calculate the Euclidean distance
euclidean_distance = distance.euclidean(point1, point2)
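The sensitivity to outliers mentioned in this section can be seen by computing the distance directly from its definition. The following is a minimal sketch using only the standard library; the helper name `euclidean` and the `outlier` point are made up for illustration and are not part of the original example.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

point1 = [1, 2, 3]
point2 = [4, 5, 6]
base = euclidean(point1, point2)

# Hypothetical point: identical to point2 except for one extreme coordinate
outlier = [4, 5, 600]
skewed = euclidean(point1, outlier)

print("Distance to point2:", base)
print("Distance to outlier:", skewed)
```

Because each difference is squared before summing, the single extreme coordinate dominates the result, and the other two dimensions contribute almost nothing to the second distance.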
2.2. Manhattan Distance
from scipy.spatial import distance
# Calculate Manhattan distance between two points
point1 = [1, 2, 3]
point2 = [4, 5, 6]
# Use the cityblock function from scipy's distance module to calculate the Manhattan distance
manhattan_distance = distance.cityblock(point1, point2)
# Print the result
print("Manhattan Distance between the given two points: " + str(manhattan_distance))
2.3. Cosine Similarity
from sklearn.metrics.pairwise import cosine_similarity
# Calculate cosine similarity between two vectors
vector1 = [1, 2, 3]
vector2 = [4, 5, 6]
# Use the cosine_similarity function from scikit-learn to calculate the similarity
cosine_sim = cosine_similarity([vector1], [vector2])[0][0]
# Print the result
print("Cosine Similarity between the given two vectors: " + str(cosine_sim))
2.4. Jaccard Similarity
def jaccard_similarity(list1, list2):
    """
    Calculates the Jaccard similarity between two lists.

    Args:
        list1 (list): The first list to compare.
        list2 (list): The second list to compare.

    Returns:
        float: The Jaccard similarity between the two lists.
    """
    # Convert the lists to sets for easier comparison (duplicates are dropped)
    s1 = set(list1)
    s2 = set(list2)
    # Calculate the Jaccard similarity by taking the length of the intersection of the sets
    # and dividing it by the length of the union of the sets
    return float(len(s1.intersection(s2)) / len(s1.union(s2)))
# Calculate Jaccard similarity between two sets
set1 = [1, 2, 3]
set2 = [2, 3, 4]
jaccard_sim = jaccard_similarity(set1, set2)
# Print the result
print("Jaccard Similarity between the given two sets: " + str(jaccard_sim))
2.5. Pearson Correlation Coefficient
import numpy as np
# Calculate Pearson correlation coefficient between two variables
x = [1, 2, 3, 4]
y = [2, 3, 4, 5]
# NumPy's corrcoef function returns the correlation matrix (it does not compute a p-value);
# the off-diagonal element [0][1] is the Pearson correlation coefficient between x and y
pearson_corr = np.corrcoef(x, y)[0][1]
# Print the result
print("Pearson Correlation between the given two variables: " + str(pearson_corr))
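For reference, the coefficient can also be reproduced from its definition: the covariance of the two variables divided by the product of their standard deviations. The sketch below is an illustration only; the helper name `pearson_r` and the reversed series are assumptions introduced here. If a p-value is also needed, `scipy.stats.pearsonr` is one function that reports both the coefficient and a p-value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the shared 1/n factors cancel in the ratio
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4]
y = [2, 3, 4, 5]
r_linear = pearson_r(x, y)               # y increases in step with x
r_inverse = pearson_r(x, [5, 4, 3, 2])   # hypothetical reversed series

print("Pearson r for x and y:", r_linear)
print("Pearson r for x and reversed series:", r_inverse)
```

The two cases mark the ends of the coefficient's range: a perfectly increasing linear relationship gives 1, and a perfectly decreasing one gives -1.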
