《Python机器学习——预测分析核心算法》——2.3　对“岩石vs.水雷”数据集属性的可视化展示

2.3　对“岩石vs.水雷”数据集属性的可视化展示

2.3.1　利用平行坐标图进行可视化展示

__author__ = 'mike_bowles'
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plot
target_url = ("https://archive.ics.uci.edu/ml/machine-learning-"
"databases/undocumented/connectionist-bench/sonar/sonar.all-data")

#read rocks versus mines data into pandas data frame

for i in range(208):
#assign color based on "M" or "R" labels
if rocksVMines.iat[i,60] == "M":
pcolor = "red"
else:
pcolor = "blue"

#plot rows of data as if they were series data
dataRow = rocksVMines.iloc[i,0:60]
dataRow.plot(color=pcolor)

plot.xlabel("Attribute Index")
plot.ylabel(("Attribute Values"))
plot.show()

<div style="text-align: center"><img src="https://yqfile.alicdn.com/650ab5cd482e35defd49513378ac50cbb96acf02.png" width="" height="">
</div>

####2.3.2　属性和标签的关系可视化

author = 'mike_bowles'
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plot
target_url = ("https://archive.ics.uci.edu/ml/machine-learning-"
"databases/undocumented/connectionist-bench/sonar/sonar.all-data")

calculate correlations between real-valued attributes

dataRow2 = rocksVMines.iloc[1,0:60]
dataRow3 = rocksVMines.iloc[2,0:60]

plot.scatter(dataRow2, dataRow3)

plot.xlabel("2nd Attribute")
plot.ylabel(("3rd Attribute"))
plot.show()

dataRow21 = rocksVMines.iloc[20,0:60]

plot.scatter(dataRow2, dataRow21)

plot.xlabel("2nd Attribute")
plot.ylabel(("21st Attribute"))
plot.show()

图2-4和图2-5为来自“岩石vs.水雷”数据集的两对属性的散点图。“岩石vs.水雷”数据集的属性是声纳返回的取样值。声纳返回的信号又叫啁啾信号 ，因为它是一个脉冲信号，开始在低频，然后上升到高频。这个数据集的属性就是声波由岩石或水雷反射回来的时间上的取样。这些返回的声学信号携带的时间与频率的关系与发出的信号是一样的。数据集的60个属性是返回的信号在60个不同时间点的取样（因此是60个不同的频率）。你可能会估计相邻的属性会比隔一个的属性更相关，因为在相邻时间上的取样在频率上的差别应该不大。

<div style="text-align: center"><img src="https://yqfile.alicdn.com/8a2f4ea1dceeae64e40ffd315e073bae593fd68b.png" width="" height="">
</div>

author = 'mike_bowles'
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plot
from random import uniform
target_url = ("https://archive.ics.uci.edu/ml/machine-learning-"
"databases/undocumented/connectionist-bench/sonar/sonar.all-data")

change the targets to numeric values

target = []
for i in range(208):

#assign 0 or 1 target value based on "M" or "R" labels
if rocksVMines.iat[i,60] == "M":
target.append(1.0)
else:
target.append(0.0)


plot 35th attribute

dataRow = rocksVMines.iloc[0:208,35]
plot.scatter(dataRow, target)

plot.xlabel("Attribute Value")
plot.ylabel("Target Value")
plot.show()

and makes them somewhat transparent

target = []
for i in range(208):

assign 0 or 1 target value based on "M" or "R" labels

# and add some dither

if rocksVMines.iat[i,60] == "M":
target.append(1.0 + uniform(-0.1, 0.1))
else:
target.append(0.0 + uniform(-0.1, 0.1))

#plot 35th attribute with semi-opaque points

dataRow = rocksVMines.iloc[0:208,35]
plot.scatter(dataRow, target, alpha=0.5, s=120)

plot.xlabel("Attribute Value")
plot.ylabel("Target Value")
plot.show()

__author__ = 'mike_bowles'
import pandas as pd
from pandas import DataFrame
from math import sqrt
import sys
target_url = ("https://archive.ics.uci.edu/ml/machine-learning-"
"databases/undocumented/connectionist-bench/sonar/sonar.all-data")

#read rocks versus mines data into pandas data frame

#calculate correlations between real-valued attributes
dataRow2 = rocksVMines.iloc[1,0:60]
dataRow3 = rocksVMines.iloc[2,0:60]
dataRow21 = rocksVMines.iloc[20,0:60]

mean2 = 0.0; mean3 = 0.0; mean21 = 0.0
numElt = len(dataRow2)
for i in range(numElt):
mean2 += dataRow2[i]/numElt
mean3 += dataRow3[i]/numElt
mean21 += dataRow21[i]/numElt

var2 = 0.0; var3 = 0.0; var21 = 0.0
for i in range(numElt):
var2 += (dataRow2[i] - mean2) * (dataRow2[i] - mean2)/numElt
var3 += (dataRow3[i] - mean3) * (dataRow3[i] - mean3)/numElt
var21 += (dataRow21[i] - mean21) * (dataRow21[i] - mean21)/numElt

corr23 = 0.0; corr221 = 0.0
for i in range(numElt):

corr23 += (dataRow2[i] - mean2) * \
(dataRow3[i] - mean3) / (sqrt(var2*var3) * numElt)
corr221 += (dataRow2[i] - mean2) * \
(dataRow21[i] - mean21) / (sqrt(var2*var21) * numElt)

sys.stdout.write("Correlation between attribute 2 and 3 \n")
print(corr23)
sys.stdout.write(" \n")
sys.stdout.write("Correlation between attribute 2 and 21 \n")
print(corr221)
sys.stdout.write(" \n")

Output:
Correlation between attribute 2 and 3
0.770938121191

Correlation between attribute 2 and 21
0.466548080789

2.3.3　用热图（heat map）展示属性和标签的相关性

__author__ = 'mike_bowles'
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plot
target_url = ("https://archive.ics.uci.edu/ml/machine-learning-"
"databases/undocumented/connectionist-bench/sonar/sonar.all-data")

#read rocks versus mines data into pandas data frame

#calculate correlations between real-valued attributes
corMat = DataFrame(rocksVMines.corr())

#visualize correlations using heatmap
plot.pcolor(corMat)
plot.show()

<div style="text-align: center"><img src="https://yqfile.alicdn.com/503adf13b79dcdac254303e279e43c3d21f90480.png" width="" height="">
</div>

####2.3.4　对“岩石vs.水雷”数据集探究过程小结`

|
1天前
|

Python 与机器学习：构建高效数据处理流程

8 2
|
1天前
|
Python

6 1
|
1天前
|

Python中的数据分析与可视化库Matplotlib简介

13 0
|
13天前
|

GEE机器学习——混淆矩阵Classifier.confusionMatrix()和errorMatrix()和exlain()的用法（js和python代码）
GEE机器学习——混淆矩阵Classifier.confusionMatrix()和errorMatrix()和exlain()的用法（js和python代码）
11 0
|
13天前
|

GEE机器学习——最大熵分类器案例分析（JavaScript和python代码）
GEE机器学习——最大熵分类器案例分析（JavaScript和python代码）
15 0
|
13天前
|
SQL 定位技术 API
GEE python:按照矢量中的几何位置、属性名称和字符串去筛选矢量集合
GEE python:按照矢量中的几何位置、属性名称和字符串去筛选矢量集合
16 0
|
13天前
|

29 0
|
16天前
|

11 0
|
17天前
|

Python与机器学习：开启智能应用的新纪元

16 2
|
17天前
|

18 2

• 机器翻译
• 工业大脑

更多

更多

更多